
The Betty & Gordon Moore Library


Tuesday 11 July 2017: Tools and Skills

With nine hours of Wikimedia training at the Moore Library over recent weeks behind me, I’m naturally in a meditative mood. Last Thursday I used “real data” on my trainees.

One exercise used 20 sheets of A4, each about a person called John Eyre. I asked the group to determine how many distinct people there were. One aim was to cultivate respect for the underlying issue of “disambiguation”. As in the case I posed, it may be by no means straightforward to answer the question “is A = B?”. Leibniz, who has come up before in these postings, said this is true if, and only if, A and B have all the same properties (his principle of the “identity of indiscernibles”). As mathematicians say, one direction is trivial: if A and B are the same thing, they share every property.

In practice, though, much research may be needed. My initial talk had gone over tools for disambiguation, on Wikipedia and Wikidata, and via “reconciliation”, so it was interesting to see that the group concentrated instead on close reading of the information they were given. One skill I therefore saw as missing was thinking outside the box. But then the exercise was a bit unfair on those who took it to be a packaged-up lesson. Perhaps that is another way of putting the same point. The toolbox can include dissatisfaction with progress down one avenue. A discussion about circumstantial evidence and identification was available, but didn’t happen.
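The “is A = B?” question can be sketched in a few lines of Python. This is only an illustration under my own assumptions: each sheet is reduced to a dictionary of properties, the values are invented placeholders rather than facts about any real John Eyre, and the function names `indiscernible` and `conflicts` are hypothetical.

```python
def indiscernible(a: dict, b: dict) -> bool:
    """Leibniz-style test: the two records list exactly the same
    properties with exactly the same values."""
    return a == b


def conflicts(a: dict, b: dict) -> set:
    """Properties recorded on both sheets but with different values."""
    return {key for key in a.keys() & b.keys() if a[key] != b[key]}


# Placeholder records (assumed data, not taken from the exercise sheets).
sheet_1 = {"name": "John Eyre", "occupation": "occupation-1", "born": "year-1"}
sheet_2 = {"name": "John Eyre", "occupation": "occupation-1"}
sheet_3 = {"name": "John Eyre", "born": "year-2"}

print(indiscernible(sheet_1, sheet_2))  # strict equality fails on partial data
print(conflicts(sheet_1, sheet_2))      # no clash: could be the same person
print(conflicts(sheet_1, sheet_3))      # a clash on a shared property
```

The strict test almost never succeeds on partial records, and the absence of a conflict shows only that two sheets could describe the same person. Closing that gap is where the research comes in.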

[IMAGE = Toolbox]

Another exercise was with newspaper headlines, largely from The Guardian. There I asked the group to mark up those that could, even in principle, be placed in Wikidata as facts. There were three classes. The first, R, represented a binary relation R(x,y) that could be read as “item x in Wikidata could carry the statement P(y)”, where y is another item and P is a property making the statement logically equivalent to R(x,y). For example, x could be David Beckham, y Victoria Beckham, and P “spouse”.

The two other classes were R+Q, for a potential statement with a qualifier (such as an attribution for a statement), and R+QQ, for more complex uses of qualifiers. It looked as though about 10% of headlines could be marked up, with my pre-selection: most headline writers in the mainstream press put some spin or interpretation into their work, to pep up the news. The exercise (10 minutes) seemed successful in surfacing a few issues around factuality and its expression.
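The three mark-up classes can be sketched as a small data structure. This is a rough Python illustration, not how Wikidata itself stores statements: the class names `Statement` and `Qualifier`, the function `markup_class`, and the rule that two or more qualifiers count as “R+QQ” are all my assumptions, and the qualifier values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Qualifier:
    prop: str    # e.g. an attribution property
    value: str


@dataclass
class Statement:
    subject: str                     # item x
    prop: str                        # property P
    value: str                       # item y
    qualifiers: list = field(default_factory=list)


def markup_class(statement: Statement) -> str:
    """Assumed rule: 0 qualifiers -> R, 1 -> R+Q, 2 or more -> R+QQ."""
    count = len(statement.qualifiers)
    if count == 0:
        return "R"
    if count == 1:
        return "R+Q"
    return "R+QQ"


plain = Statement("David Beckham", "spouse", "Victoria Beckham")
attributed = Statement(
    "David Beckham", "spouse", "Victoria Beckham",
    qualifiers=[Qualifier("attributed to", "placeholder-source")],
)

print(markup_class(plain))
print(markup_class(attributed))
```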

One other thing from the same session was a callback to my idea of “niche” (see my Reliability Ecology posting). Dealing with Wikidata’s reliability, there is a common remark that statements about external identifiers, making up a good 50% of current Wikidata content, “don’t need” references. Well, critical thinking is involved. Such statements amount to saying that item I on Wikidata is the same as item J in another online database D. Some sort of warranty that the identification I=J is correct would be valuable; but the real point is that there is a link from I to D, and simply re-expressing that linkage as a reference gets one no further forward. In fact it may camouflage an error in identification, which is why the niche terminology is apt.

Overall, the need for research skills came up in the first hands-on session as well as the third. Gender issues came up at the end of that session, and in the second talk. Actual proposed solutions to the “workflow issue”, or in other words how to do Wikimedia editing effectively, were seen to lie with serious tools: PetScan, SPARQL, Mix’n’match. I was not giving industrial-style training designed solely to move people up the learning curve with just one tool, though I hope what I did with SPARQL had some of that effect.

My link list for the third session is online. A general point that came up is that Venn diagrams are both useful, and just a bit unfamiliar in some contexts. I learned them via SMP maths back in the day. They are a very basic tool for “engaging with data”. This series of posts will come to an end here, at least for the present.
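For readers who think in code, the three regions of a two-circle Venn diagram map directly onto set operations. A minimal Python sketch, with a hypothetical function name `venn_regions` and invented placeholder item labels:

```python
def venn_regions(a: set, b: set) -> dict:
    """Split two sets into the three regions of a two-circle Venn diagram."""
    return {
        "only_a": a - b,   # in the first circle only
        "both": a & b,     # the overlap
        "only_b": b - a,   # in the second circle only
    }


# Placeholder labels standing in for, say, items found by two different queries.
first_list = {"item-1", "item-2", "item-3"}
second_list = {"item-2", "item-3", "item-4"}

regions = venn_regions(first_list, second_list)
for name, members in regions.items():
    print(name, sorted(members))
```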
