The Betty & Gordon Moore Library

 

Tuesday 16 May 2017: Into the Unknown

At present, I’m working on bibliographic and citation data. Appropriately, my peaceful shared office in the Moore Library has two feet of books with titles like Essential Cataloguing. I’m mulling over Wikidata issues, particularly its way of handling patchy data by marking values as “unknown”, and some more traditional approaches.

Authors can be troublesome. Walter Scott’s first novel, Waverley, was published anonymously in 1814. Subsequent novels claimed to be by the “Author of Waverley”, and by the time Sir Walter Scott, now a baronet, owned up to the open secret of his authorship in 1827, two dozen books had appeared in the series. One can imagine librarians heaving a sigh of relief, that the Waverley novels could now snuggle up to Scott’s poetry books. Until then, that placeholder name had to be an “unknown thing”.

Swift change of scene, and imagine having to deal with a problem with a substantial number of “known unknowns”. One way to go about it is simply to research all those unknowns. Another is to hope for more insight into the problem’s structure, as a first step.

So, why and how would anybody do that, beyond procrastination? Certainly, “why” can be answered as “anti-blob”, or in other words a protest against drudgery and on behalf of brainwork and concepts. As to how, I think there are hints available from logical ideas, but also from everyday logistics, such as journey planning. We’ll get to that.

The thing that has most impressed me in Wikidata, over the past couple of months, has been the query that can “look round corners”, as I would put it. How the queries work is for another time: today I’m after the corner concept. From the logical perspective, this style of query is seeking out “attributes of attributes”. Help, please, an example!

Think about the concept of the non-playing team captain in some sport, for example Davis Cup tennis. The team has a captain; the captain has a sports career, and (realistically) has retired. If so, the retirement date would be an attribute of the captain, and who the captain is would be an attribute of the team. On Wikidata, the captain could appear on the Davis Cup team’s item, and the retirement date on the captain’s item, which is linked from the team’s item. We could reach the date by that route, and so assess how recent the captain’s active tennis experience is.
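As a rough illustration of “attributes of attributes”, here is a minimal Python sketch. It is not how Wikidata queries actually work: the data, the identifiers and the helper name `look_round_corner` are all made up for the purpose, and the retirement year is invented.

```python
# Hypothetical toy data standing in for two linked items.
# Identifiers, labels and the year are invented for illustration.
items = {
    "team": {"label": "Example Davis Cup team", "captain": "captain"},
    "captain": {"label": "Example captain", "retirement_year": 2010},
}


def look_round_corner(store, start, first, second):
    """Follow attribute `first` from `start` to a linked item,
    then read attribute `second` on that linked item."""
    linked_id = store[start].get(first)
    if linked_id is None:
        return None  # no corner to look round
    return store.get(linked_id, {}).get(second)


# The team's captain's retirement year, reached from the team's item.
print(look_round_corner(items, "team", "captain", "retirement_year"))
```

The point of the sketch is only that the second lookup happens on whatever the first lookup returns, so the question is asked of the team but answered by the captain.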

So, getting back to Walter Scott, the placeholder concept “Author of Waverley” isn’t mere dickering on. It gives a corner to “look round”, from an item about a novel. It was always possible, at least, to shelve Scott’s novels together. In problem-solving, there is a constructivist idea about implication that goes back to Andrey Kolmogorov, certainly one of the major mathematicians of the 20th century. It says that the implication from problem A to problem B has an operational form: a method that takes a solution to problem A as input and produces a solution to problem B.

What we can call a “Kolmogorov corner”, therefore, talks about an intermediate goal A on the way to solving B. The solution, if we can access it this way, would be a procedure applied to a procedure. This is nothing more than a heuristic. But it may yield insight, since A itself may not involve all the unknowns. So it can be, with timetables, which are rarely read cover-to-cover. How do we read them?
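One way to picture this, as a sketch only, is to treat the implication as a function. In the Python below, problem A is “who is behind the placeholder name?” and problem B is “where does this novel go on the shelves?”. The function names and the shelf-label format are assumptions of mine, chosen to echo the Walter Scott example.

```python
from typing import Callable


def shelve_by_author(author: str) -> str:
    """The method: turn a solution of A (an author name)
    into a solution of B (a shelf label). Label format is invented."""
    return f"Fiction / {author}"


def solve_b(solve_a: Callable[[], str]) -> str:
    """A procedure applied to a procedure: B is solved by
    running whatever currently solves A, then applying the method."""
    return shelve_by_author(solve_a())


# With only the placeholder available, the method still runs.
print(solve_b(lambda: "Author of Waverley"))

# Once A is properly solved, the same method gives the final answer.
print(solve_b(lambda: "Scott, Walter"))
```

Nothing in `shelve_by_author` needs to know whether the name it is given is a placeholder or the real thing, which is the sense in which the intermediate goal can be worked on separately.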

To make it on time to an appointment on the other side of town, when I know I’ll have to take a bus and then a train, I could work out a plausible train to catch and only then consult a bus timetable for a connecting service. That approach has advantages over reading both timetables at once. To make the analogy clearer, the “plausible train” is a placeholder, which is turned into a target time of arrival at the train station. We do try to economise on the effort of consulting timetables, and we do at an intuitive level gather the implications of the presence or absence of connections.
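The train-first reading of the timetables can be sketched in a few lines of Python. Everything here is hypothetical: the times (in minutes after midnight), the five-minute allowance for changing, and the function name `plan` are invented for illustration.

```python
# Hypothetical timetables, in minutes after midnight. All values are invented.
# Each entry is (departure, arrival).
trains = [(540, 565), (570, 595), (600, 625)]
buses = [(520, 545), (550, 568), (560, 585)]  # arrival is at the train station


def plan(appointment, trains, buses, change=5):
    """Pick a plausible train first, then look for a connecting bus."""
    # Step 1: the placeholder. Latest train that still arrives on time.
    on_time = [t for t in trains if t[1] <= appointment]
    if not on_time:
        return None
    train = max(on_time, key=lambda t: t[0])

    # Step 2: the placeholder becomes a target arrival time at the station.
    target = train[0] - change

    # Step 3: only now consult the bus timetable.
    connecting = [b for b in buses if b[1] <= target]
    if not connecting:
        return None
    bus = max(connecting, key=lambda b: b[0])
    return bus, train


# Appointment at 10:00 (600 minutes after midnight).
print(plan(600, trains, buses))
```

This sketch gives up if no bus connects with the chosen train. A fuller version would fall back to an earlier train, which is where the absence of a connection starts to tell us something.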

In the data world, it turns out, using placeholders for missing values is a “glass half full” approach to patchy information, though obviously a placeholder can have no warranty. It is a coping strategy, and parsimonious with effort. In contrast, the code world is “glass half empty” about values: the computer can be stuck in a loop, and Turing told us that we may be completely at the code’s mercy when it comes to how long it takes for an answer to turn up. The optimist says “you never know what’s just round the corner”, the pessimist worries that it is the accusation “you are the weakest link!”
