
The Accidental Data Scientist


The Accidental Data Scientist* by Amy Affelt is a clarion call to librarians and other information professionals to immerse themselves in the world of Big Data. As such, it is a solid introduction, emphasizing how the traditional skills of librarians are crucial in ensuring that Big Data are reliable, properly prepared, indexed, and abstracted, and intelligently interpreted.

Affelt reassuringly shows that the ‘problems’ of Big Data are not new, but very familiar to librarians, and indicates ways that librarians can add value to Big Data projects by ensuring that they deliver what is expected. Data scientists and computer scientists are good at writing algorithms to process data mathematically, but may not be trained in asking the right questions or knowing where to look for biases and flaws in data sets. A Big Data project that fails in these respects could prove an expensive disaster for an organization.

Chapters outlining the tools and techniques currently available for processing and visualizing Big Data, and applications and initiatives in various industry sectors are informative for those new to the issues, and a helpful guide for experienced librarians to demonstrate how their skills are transferable.

Affelt gives examples of specific projects and describes how the input of librarians – especially when ‘embedded’ in data project teams – is extremely beneficial. She suggests ways of proving the value of librarians in modern corporate settings and gives tips and suggestions on career development.

For information professionals unsure about how to engage with the opportunities Big Data offers, this is a wide-ranging and clear overview, and a great starting point.

With increasing media reports of algorithmic bias and amidst a deluge of fake news, it is more important than ever that Big Data projects include professionals with the skills to recognize and identify problematic sources and skewed datasets, and I hope that librarians and information professionals step up and hear Affelt’s call to action.

*Presumably named in the tradition of The Accidental Taxonomist by Heather Hedden.

Semantic Theatre gets practical


I have started to look into the CIDOC Conceptual Reference Model for cultural heritage metadata as part of my investigation of the concept of Semantic Theatre.

An events-based approach is used in a lot of ontological modelling. Thanks to Athanasios Velios, I learned that bookbinding can be broken down into a sequence of events, and this is an obvious route to try when thinking about how to model a performance event.

I think there is potential for relating the “objects” in the play – the performers as well as the set, props, etc. – to concepts within the play. So far, I have been focusing mainly on modelling relationships between ideas within the script (e.g. lines where a character uses the ocean as a metaphor for life) and possibly comparing across scripts (e.g. which lines reference King Lear), but it would be interesting to include props and actors as well (e.g. in which scenes a clock is used as a reference to death). The use of a prop could easily be modelled as a distinct event within a play, and this would facilitate relating literary and metaphorical ideas to the object rather than just to the words in the script.
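
As a rough illustration of the idea above, here is a minimal Python sketch that treats each use of a prop as an event linking an object to a concept. The class and function names (`PropUseEvent`, `scenes_where`) and the sample data are hypothetical; they are not taken from the CIDOC CRM, which defines its own classes and properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropUseEvent:
    """One use of an object (prop, performer, set piece) within a scene."""
    scene: int
    obj: str       # the physical object, e.g. "clock"
    concept: str   # the idea it refers to, e.g. "death"


def scenes_where(events, obj, concept):
    """Return the sorted scene numbers in which `obj` is used to reference `concept`."""
    return sorted({e.scene for e in events if e.obj == obj and e.concept == concept})


# Invented sample data, for illustration only.
events = [
    PropUseEvent(scene=1, obj="clock", concept="death"),
    PropUseEvent(scene=2, obj="clock", concept="punctuality"),
    PropUseEvent(scene=4, obj="clock", concept="death"),
    PropUseEvent(scene=4, obj="ocean backdrop", concept="life"),
]

print(scenes_where(events, "clock", "death"))  # [1, 4]
```

Because the concept hangs off the event rather than off a line of the script, the same query works whether or not anyone speaks while the prop is in use.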

The play itself – Ocean Opera – will be performed at the Montreal Fringe Festival in June.

Stories, effectiveness, and efficiency


I’ve not been writing much lately, having finished my dissertation on September 1st and hours later having handed in my notice at work, to take up a new post as Taxonomy Manager for the BBC. I was delighted to be offered a role that follows on directly from my studies of taxonomy work, and I can’t wait to get started.

I have been very busy during September handing over to my successor, so inevitably I have been thinking about knowledge transfer. Records management has been for the most part fairly straightforward, mainly due to the nature of the business, which has enabled us to be reasonably efficient records managers, but I found it very hard to express my tacit knowledge well except through stories. This reminded me of a post by Ron Baker on effectiveness as opposed to efficiency.

Good records management is the “baseline efficiency” you need to keep functioning. It is hard to gain a competitive advantage simply by having decent records management, because if you don’t, you won’t even meet basic professional standards. Effectiveness, however, is a much more elusive beast – relying on slippery concepts like tacit knowledge, judgement calls based on experience and intuition, even artistry.

Storytelling in business has become popular because it is such a natural way of communicating expressively, as has the use of scenarios and personas in marketing and design. However, what surprised me was how formulaic my stories were – even though they applied to different areas of the business and different situations. The same characters (including myself) followed the same patterns of behaviour, through technology upgrades, changing customer needs, and other staff coming and going. I have been facing the same dilemmas and worrying about the same things over and over again, while at the time believing that things were changing and situations were different, probably because I focused on the differences, not the similarities, each time.

This reminded me that managing characters is just as important as managing situations (or technologies or products) and also how useful it would have been to have tried some storytelling earlier on. However, it takes time to see patterns, so you need storytellers to stick around long enough to be able to grasp what is a repeating dynamic and what is coincidence. The fast turnover of knowledge managers is an obvious barrier to this. At the very least, it means the knowledge managers have to identify the people who have been around long enough to see the patterns in the stories, rather than expect to find it easy to pick up patterns themselves. In an organisation, there are many intertwined stories operating at different levels – from the stories of individual careers and single projects to the overall corporate history. The conflicts and resolutions in these stories – how the tanking project was salvaged, the difficult client appeased, the divided team reunified – and between the levels of stories, seem to me to be where you will find the secrets of organisational effectiveness.

It is very easy to see taxonomies solely as mechanisms of efficiency – classifying documentation related to very linear processes such as stages in a project – but they also embody characters and stories, reflecting what is culturally important, for example. Taxonomies for knowledge discovery in particular are most effective when they are able to work with stories – if you are looking for paint does that suggest a story in which you also want paintbrushes, white spirit, an easel, etc.?

An epistemological problem with folksonomies


I’m still mulling over Helen Longino’s criteria for objectivity in scientific enquiry (see previous post: Science as Social Knowledge) and it occurred to me that folksonomies are not really open and democratic, but are actually obscure and impenetrable. The “viewpoint” of any given folksonomy might be an averaged out majority consensus or some other way of aggregating tags might have been used, and so you can’t tell if it is skewed by a numerically small but prolifically tagging group. This is the point Judith Simon made in relation to ratings and review software systems at the ISKO conference, but it seems to me the problem for folksonomies is even worse, because of the echo chamber effect of people amplifying popular tags. Without some way of showing who is tagging what and why, the viewpoint expressed in the folksonomy is a mystery. This is not necessarily the case, but I think you’d need to collect huge amounts of data from every tagger, then database it along with the tags, then run all sorts of analyses and publish them in order to show the background assumptions driving the majority tags.
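
To make the skew concrete, here is a small Python sketch using invented data. It compares two hypothetical ways of aggregating tags: counting every application of a tag, and counting each tagger at most once per tag. Neither is claimed to be how any real folksonomy works; the point is only that the two give different "majority" views.

```python
from collections import Counter

# (tagger, item, tag) triples; invented data for illustration only.
taggings = [
    ("prolific", "photo1", "sunset"),
    ("prolific", "photo2", "sunset"),
    ("prolific", "photo3", "sunset"),
    ("prolific", "photo4", "sunset"),
    ("ann", "photo1", "beach"),
    ("bob", "photo2", "beach"),
    ("cat", "photo3", "beach"),
]


def by_applications(taggings):
    """Count every application of a tag, however many came from one person."""
    return Counter(tag for _, _, tag in taggings)


def by_taggers(taggings):
    """Count how many distinct people used each tag."""
    distinct = {(tagger, tag) for tagger, _, tag in taggings}
    return Counter(tag for _, tag in distinct)


print(by_applications(taggings).most_common(1))  # [('sunset', 4)]
print(by_taggers(taggings).most_common(1))       # [('beach', 3)]
```

Someone shown only the top tag has no way of telling which of these counts produced it, or whether one prolific tagger is behind it.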

If the folksonomic tags don’t help you find things, who could you complain to? How do you work out whether it doesn’t help you because you are a minority, or for some other reason? With a taxonomy, the structure is open – you may not like it but you can see what it is – and there will usually be someone “in charge” who you can challenge and criticise if you think your perspective has been overlooked. In many cases the process of construction will be known too. I don’t see an obvious way of challenging or criticising a folksonomy in this way, so presumably it fails Longino’s criteria for objectivity.

You can just stick your own tags into a folksonomy and use them yourself so there is some trace of your viewpoint in there, but if the rest of the folksonomy doesn’t help you search, that means you can only find things once you have tagged them yourself, which would presumably rule out large content repositories. So, you have to learn and live with the imposed system – just like with a taxonomy – but it’s never quite clear exactly what that system is.

You in the Dewey Decimal System

Dewey Decimal System Meme. Some seasonal silliness!

via Impressions Scholarcast.

Here’s mine:

Fran’s Dewey Decimal Section:
002 The book

000 Computer Science, Information & General Works

Encyclopedias, magazines, journals and books with quotations.

What it says about you:
You are very informative and up to date. You’re working on living in the here and now, not the past. You go through a lot of changes. When you make a decision you can be very sure of yourself, maybe even stubborn, but your friends appreciate your honesty and resolve.

Find your Dewey Decimal Section at

Puzzled by The Future of Information Architecture


I read a copy of The Future of Information Architecture: Conceiving a Better Way to Understand Taxonomy, Network and Intelligence because I couldn’t resist the title, but was left utterly baffled by the book. The author appears to have taught at some US universities, but no biography was provided and the preface declared that due to the “political incorrectness” of his ideas, no institution or establishment had supported him in writing and publishing the book. Nevertheless, he seems to have produced quite a few books over the last few years. The publisher, Chandos Press, apparently printed the book directly from camera ready copy supplied by the author.

He writes in an extremely dense and academic style using phrases like “existential dialectics” and “post-human post-civilization”. I usually pride myself on being able to “translate” philosophy into “normal” English, but could not work out what was going on. The gist seemed to be a description of taxonomies and networks in terms of six “principles” (opposites such as simplicity/complexity, order/chaos) and I had expected some kind of conclusion to draw these principles into a proposition. Instead, he suggested that there were many more principles that could be used.

From the title I had hoped for some predictions about how IA might develop under the influence of social media or cloud computing etc., but there was nothing like that in the book. Instead, there were some statements about post-human evolution and the impossibility of predicting what IA will be like when we cease to be humans and become “free floating consciousnesses”.

Metadata and Taxonomy Conference


The Essentials of Metadata and Taxonomy Conference in London on March 10th was a first for event organisers Henry Stewart Events. They were told that the subject was “too niche”, “no-one would turn up”, and “nobody would be interested”. They were not dissuaded, and went ahead with what turned out to be a wonderfully content-rich and fact-dense day. I’ve written a summary of the conference which is available here.

A host of big-name speakers gave fascinating and insightful talks: Madi Solomon, former Corporate Nomenclature Taxonomist of Walt Disney; Seth Earley of Earley & Associates; John Jordan of Siemens; and Chris Sizemore and Silver Oliver from the BBC were just a few. There were also lots of software overviews, which I found very helpful (including an assessment by Theresa Regli from CMS Watch), and, as is always a real treat at these events, there was the opportunity to meet lots of other taxonomists and information architects. The food was good too!

Research Methods


Research Methods in Information by Alison Jane Pickard (Facet Publishing, 2007) is a worthy reference tome covering topics from the various research paradigms through to how to present a dissertation. It reads primarily as a textbook for students, but would be a handy resource for anyone new to research.