
Tuesday 27 June 2017: Reliability Ecology

Last Thursday I began a series of talks and training sessions, in the Glass Room downstairs at the Betty & Gordon Moore Library. I took as my talk title “Reliability of Wikipedia”, an old chestnut of a topic, and yet one rather neglected in terms of serious discussion. It provoked me into formulating the problem in ways that hadn’t occurred to me before. I’ll explore these further here, not just because they are fresh in my mind. I also hope they can lead to some insights into “noisy” data.

I intentionally covered much ground, starting with some conventional wisdom about the roles of traffic and referencing as positive factors in supporting reliable content. I went over article evaluation, adding T for traffic to an old acronym, to get STREWD. I talked about editing histories, and the major watershed in the English Wikipedia’s history a decade ago, for me the most important event for the site. I read it as a sign the community embraced quality over quantity in 2007, with a great deal of fallout.

To get further, I introduced a notion of “niche”: simply put, one should know the places on Wikipedia where mistakes can lurk. Trivia sections have long been notorious. In terms of reliability, they are obvious fire hazards. Supposed facts that are entertaining are often not scrutinised properly.

But I also wanted to make a quantitative point. In much the same way that minimum wages are usually discussed by economists in relation to median wages, the median time for removal of a mistake on Wikipedia is a useful rule of thumb. A blogpost, “Are you an academic who vandalises Wikipedia? Then stop it!”, makes a useful point. Doing a test vandal-style edit to Wikipedia, which is soon reverted, only confirms that the median time of survival of bad additions is low.

Where income inequality is high, the mean (average) wage is pulled up by those earning millions. In just the same way, though reasoning about “undetected mistakes” is as fallacious as talking about “undetected murders”, case studies show the mean time of survival of mistakes on Wikipedia is pulled up by unobvious errors. I showed examples, one where a misidentification had hung around for a decade, another where sources that would be considered authoritative had a mistake about an office-holder.
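To make the contrast between median and mean concrete, here is a minimal Python sketch. The survival times in it are invented purely for illustration and are not measurements from Wikipedia; the names `survival_days` and `summarise` are likewise hypothetical. It shows only how a couple of long-lived errors can dominate the mean while leaving the median almost untouched.

```python
from statistics import mean, median

# Hypothetical survival times in days, invented for illustration only:
# most bad edits are reverted within hours, while two unobvious errors
# linger for one year and for ten years.
survival_days = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 365, 3650]


def summarise(times):
    """Return (median, mean) for a list of survival times."""
    return median(times), mean(times)


med, avg = summarise(survival_days)
print(f"median: {med:.2f} days, mean: {avg:.2f} days")

# Dropping the two long-lived errors shows how much they pull up the mean.
quick_only = [t for t in survival_days if t < 30]
med_quick, avg_quick = summarise(quick_only)
print(f"without long-lived errors: median {med_quick:.2f}, mean {avg_quick:.2f}")
```

On these made-up figures the median stays well under a day, while the mean runs to more than a year. A quickly reverted test edit adds one more small value to such a list, which barely moves either figure and says nothing about the long tail.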

[IMAGE = Strawberries]

The niche is a classical architectural concept, as shown in the 1685 still life by Adriaen Coorte above. The ecological niche, a derived idea, is not one concept, but several, so I should clarify. Simply enough, we can think about mistakes in Wikipedia as prey, and those who look out for them as predators. So, mistakes are more likely, on the face of it, to be found in places with less traffic. We can deal with the issue of high-traffic articles that attract typical drive-by edits of low quality, by noting that those are not likely to push up the mean survival time.

Pushing on with the metaphor, one comes quickly to “camouflage”. Everything being in plain sight, what escapes attention? Unreferenced material may be apparent, but sometimes it is not, because the scope of a footnote may not be explicit. Here we encounter some useful ideas: mistakes coming from “upstream” (referenced to an external source that is mistaken); mistakes because the reference is wrongly interpreted; references to dead links. These ideas I touched on in my talk.

But another key notion is survival. The meme concept of Richard Dawkins is popular for the self-perpetuating misconception, or the “urban myth” that happened to a friend of a friend. I prefer, though, the factoid concept introduced a few years before the meme, by Norman Mailer. It has been done no favours by its common usage as synonymous with “trivia”. See “A factoid is not a small fact. Fact” by David Marsh in The Guardian.

I said on Thursday that factoid is the genus, trivia just one species of the genus, and that the niches I meant should be considered as tied to other species of the genus. That was a bit offhand. I did want to open a new avenue on “fakery” in the media. The challenging of common misconceptions is no small task: Wikipedia does its best.

What matters is that Mailer was on the trail of unfounded media content. Wikipedians do become sceptical about mainstream media, which may be fact-checked according to its own lights, but is not footnoted. One can classify my niches both by how they become populated in the first place, and how the mistakes in them escape scrutiny for long enough to become memes.

Journalists probably know the chinks in Wikipedia’s armour well by now. Academics perhaps could consider a view of the online world as deeply interconnected and factoid-riddled, before resorting to point-scoring. How about article evaluation as a basic skill? We are all in this together.

Links supporting my talk are here. I welcome comments by e-mail to the address on the Engaging with Data home page.
