Category Archives: KO

KO

In the beginning was the word: the evolution of knowledge organisation

    1 comment 
Estimated reading time 3–5 minutes

I was delighted to be introduced by Mark Davey to Leala Abbott on Monday. Leala is a smart and accomplished digital asset management consultant from the Metropolitan Museum of Art in New York and we were discussing how difficult it is to explain what we do. I told her about how I describe “the evolution of classification” to people and she asked me to write it up here. So, this is my first blog post “by commission”.

word
In the beginning there was the word, then words (and eventually sentences).

list
Then people realised words could be very useful when they were grouped into lists (and eventually controlled vocabularies, keyword lists, tag lists, and folksonomies).

taxonomy
But then the lists started to get a bit long and unwieldy, so people broke them up into sections, or categories, and lo and behold – the first taxonomy.

faceted taxonomy
People then realised you could join related taxonomies together for richer information structuring and they made faceted taxonomies, labelling different aspects of a concept in the different facets.

ontology
Then people noticed that if you specified and defined the relationships between the facets (or terms and concepts), you could do useful things with those relationships too, which becomes especially powerful when using computers to analyse content, and so ontologies were devised.

Here is a very simple example of how these different KO systems work:

I need some fruit – I think in words – apples, pears, bananas. Already I have a shopping list and that serves its purpose as a reminder to me of things to buy (I don’t need to build a fruit ontology expressing the relationships between apples and other foodstuffs, for example).

When I get to the shop, I want to find my way around. The shop has handy signs – a big one says “Fresh fruit”, so I know which section of the shop to head for. When I get there, a smaller sign says “Apples” and even smaller ones tell me the different types of apples (Gala, Braeburn, Granny Smith…). The shop signs form a simple taxonomy, which is very useful for helping me find my way around.

When I get home, I want to know how to cook apple pie, so I get my recipe book, but I’m not sure whether to look under “Apples” or “Pies”. Luckily, the index includes Apples: Pies, Puddings and Desserts as well as Pies, Puddings and Desserts: Apples. The book’s index has used a faceted taxonomy, so I can find the recipe in either place, whichever one I look in first.

After dinner, I wonder about the history of apple pies, so I go online to a website about apples, where a lot of content about apple pies has been structured using ontologies. I then can search the site for “apple pie” and get suggestions for lots of articles related to apples and pies that I can browse through, based on the ideas that the people who built the ontology have linked together. For example, if the article date has been included, I could also ask more complex questions such as “give me all the articles on apple pies written before 1910”, and if the author’s nationality has been included, I could ask for all the articles on apple pies written before 1910 by US authors.

People often ask me if a taxonomy is better than a controlled vocabulary, or if an ontology is the best of all, but the question doesn’t make sense out of context – it really depends what you are trying to do. Ontologies are the most complex and sophisticated KO classification tools we have at the moment, but when I just want a few things from the shop, it’s a good old fashioned list every time.

Using taxonomies to support ontologies

    9 comments 
Estimated reading time 4–6 minutes

What is an ontology?
Ontologies are emerging from the techie background into the knowledge organisation foreground and – as usually happens – being touted as the new panacea to solve all problems from content management to curing headaches. As with any tool, there are circumstances where they work brilliantly and some where they aren’t right for the job.

Basically, an ontology is a knowledge model (like a taxonomy or a flow chart) that describes relationships between things. The main difference between ontologies and taxonomies is that taxonomies are restricted to broader and narrower relationships whereas ontologies can hold any kind of relationship you give them.

One way of thinking about this is to see taxonomies as vertical navigation and ontologies as horizontal. In practice, they usually work together. When you add cross references to a taxonomy, you are adding horizontal pathways and effectively specifying ontological rather than taxonomical relationships.

The flexibility in the type of relationship that can be defined is what gives ontologies their strength, but is also their weakness in that they are difficult to build well and can be time consuming to manage because there are infinite relationships you could specify and if you are not careful, you will specify ones that keep changing. Ontologies can answer far more questions than taxonomies, but if the questions you wish to ask can be answered by a taxonomy, you may find a taxonomy simpler and easier to handle.

What are the differences between taxonomies and ontologies?
A good rule of thumb is to think of taxonomies as being about narrowing down, refining, and zooming in on precise pieces of information and ontologies as being about broadening out, aggregating, and linking information. So, a typical combination of ontologies and taxonomies would be to use ontologies to aggregate content and with taxonomies overlaid to help people drill down through the mass of content you have pulled together.

Ontologies can also be used as links to join taxonomies together. So, if you have a taxonomy of regions, towns, and villages and a taxonomy of birds and their habitats you could use an ontological relationship of “lives in” to show which birds live in which places. By using a taxonomy to support the ontology, you don’t have to define a relationship between every village and the birds that live there, you can link the birds’ habitats to regions via the ontology and the taxonomy will do the work of including all the relevant villages under that region.

Programmers love ontologies, because they can envisage a world where all sorts of relationships between pieces of content can be described and these diverse relationships can be used to produce lots of interesting collections of content that can’t easily be brought together otherwise. However, they leave it to other people to provide the content and metadata. Specifying all those relationships can be complicated and time-consuming so it is important to work out in advance what you want to link up and why. A good place to start is to choose a focal point of the network of relationships you need. For example, there are numerous ways you could gather content about films. You could focus on the actors so you can bring together the films they have appeared in to create content collections describing their careers, or focus on genres and release dates to create histories of stylistic developments, or you could link films that are adaptations of books to copies of those books. The choices you make determine the metadata you will need.

Know your metadata
At the moment, in practice, ontologies are typically built to string together pre-existing metadata that has been collected for navigational or archival taxonomies, but this is just because that metadata already exists to be harvested. There is a danger in this approach that you end up making connections just because you can, not because they are useful to anybody. As with all metadata-based automated systems, you also need to be careful with the “garbage in garbage out” problem. If the metadata you are harvesting was created for a different purpose, you need to make sure that you do not build false assumptions about its meaning or quality into your ontology – for example, if genre metadata has been created according to the department the commissioning editor worked for, instead of describing the content of the actual programme itself. That may not have been a problem when the genre metadata was used only by audience research to gather ratings information, but does not translate properly when you want to use it in an ontology for content-defining purposes.

Feeding your ontology with accurate and clearly defined taxonomies is likely to give you better results than using whatever metadata just happens to be lying about. Well-defined sets of provenance metadata – parametadata – about your taxonomies and ontologies is becoming more and more valuable so that you can understand what metadata sets were built for, when they were last updated, and who manages them.

Why choose one when you can have both?
Ontologies are very powerful. They perform different work to taxonomies, but ontologies and taxonomies can support and enhance each other. Don’t throw away your taxonomies just because you are moving into the ontology space. Ontologies can be (they aren’t always – see Steve’s comment below) big, tricky, and complicated, so use your taxonomies to support them.

Taxonomy as an application for an open world

    6 comments 
Estimated reading time 9–15 minutes

This post is based on the notes I made for the talk I gave at the LIKE dinner on February 25th. It covers a lot of themes I have discussed elsewhere on this blog, but I hope it will be useful as an overview.

Taxonomies have been around for ages
Pretty much the oldest form of recorded human writing is the list, back in ancient Sumeria, the Sumerian King list for example is about 4,000 years old. By the time of the ancient Greeks, taxonomies were familiar. We understand that something is a part of something else, and the notion of zooming in or narrowing down on the information we want is instinctive.
I am frequently frustrated by the limitations of free text search (see my earlier post Google is not perfect). The main limitation is to knowledge discovery – you can’t browse sensibly around a topic area and get any sense of overview of the field. Following link trails can be fun, but they leave out the obscure but important, the non-commercial, the unexpected.

The very brilliant Google staff are working on refining their algorithms all the time, but Google is a big commercial organisation and they are going to follow the money, which isn’t always where we need to be going. Other free text search issues include disambiguation/misspellings – so you need hefty synonym control, “aboutness” – you can’t find something with free text search if it doesn’t mention the word you’ve searched for, and audio-visual retrieval. The killer for heritage archives (and for highly regulated companies like pharmaceutical and law firms) is comprehensiveness – we don’t just want something on the subject, we want to know that we have retrieved everything on a particular subject.

Another myth is that search engines don’t use classification – they do, they use all sorts of classifications, it’s just that you don’t tend to notice them, partly because they are constantly being updated in response to user behaviour, giving the illusion that they don’t really exist. What is Google doing when it serves you up its best guesses, if not classifying the possible search results and serving you the categories it calculates are closest to what you want?

I’m a big fan of Google, it’s a true modern cathedral of intellectual power and I use Google all the time, but I seem to be unusual in that I don’t expect it to solve all my problems.
I also am aware of the fact that we can’t get to look at Google’s taxonomic processes arguably makes Google more political, more manipulable, and more big brother-ish than traditional open library classifications. We may not totally agree with the library classifications nor the viewpoints of their creators, but at least we know what those viewpoints are!

There was a lot of fuss about the rise of folksonomies and free tagging as being able to supersede traditional information management – and in an information overloaded world we need all the help we can get – the trouble is that folksonomies expand, coalesce, and collapse into taxonomies in the end. If they are to be effective – rather than just cheap – they need to do this – and either become self-policing or very frustrating. They are a great way of gathering information, but then you need to do something with it.

Folksonomies, just as much as taxonomies, represent a process of understanding what everyone else is talking about and negotiating some common ground. It may not be easy, but it is a necessary and indispensable part of human communication – not something we can simply outsource or computerise – algorithms just won’t do that for us. Once everything has been tagged with every term associated with every viewpoint, nothing might as well have been tagged at all. Folksonomies, just as much as taxonomies, collapse into giving a single viewpoint – it’s just that it is a viewpoint that is some obscure algorithmic calculation of popularity.

So, despite free text search and folksonomies, structured classification remains a very powerful and necessary part of your information strategy.

It’s an open world
Any information system – whatever retrieval methods it offers – has to meet the needs of its users. Current users can be approached, surveyed, talked to, but how do you meet the needs of future users? The business environment is not a closed, knowable constrained domain, but is an “open world”1 where change is the only certainty. (Open world is an expression from logic. It presumes that you can never have complete knowledge of truth or falsity. It is the opposite of the closed world, which works for constrained domains or tasks where rules can be applied – e.g. rules within a database).

So, how do you find the balance between stability, so your knowledge workers can learn and build on experience over time, while being able to react rapidly to changes?

Once upon a time, not much happened
The early library scientists such as Cutter, Kelley, Ranganathan, and Bliss, argued about which classification methods were the best, but they essentially presumed that it was possible to devise a system that maximised “user friendliness” and that once established, it would remain usable well into the future. By and large, that turned out to be the case, as it took many years for their assumptions about users to be seriously challenged.

Physical constraints tended to dictate the amount of updating that a system could handle. The time and labour required to re-mark books and update a card catalogue meant that it was worth making a huge effort to simply select or devise a classification and stick to it. It was easier to train staff to cope with the clunky technology of the time than adapt the technology to suit users. No doubt in the future, people will say exactly the same things about the clunky Internet and how awful it must have been to have to use keyboards to enter information.

So, it was sensible to plan your information project as one big chunk of upfront effort that would then be left largely alone. It is much easier to build systems based on the assumption that you can know everything in advance – you can have a simple linear project plan and fixed costs. However, it is very rare for this assumption to hold for very long, and the bigger the project, the messier it all gets.

Change now, change more
Everything is changing far more rapidly than it used to – from the development of new technologies to the rapid spread of ideas promoted by the emergence of social media and an “always on” culture. It’s harder than ever to stay cutting edge!

We all like to speak our own language and use our own names for things, and specialists and niche workers as well as fashionistas and trendsetters expect to be able to describe and discuss information in ways that make sense to them. The open philosophy of the Web 2.0 world means that they increasingly take this to be their right, but this is where folksonomic approaches can really help us.

What you need to do is to create a system that can include different pace layers so that you get the benefits of a stable taxonomy, with the rapid reactiveness of folksonomy as well as quick and easy free text search. You can think of your taxonomy as the centre of a coral reef, but coral is alive and grows following the currents and the behaviour of all the crazy fish and other organisms that dart about around it. It’s hard to pin down the crazy fish and other creatures, but they feed the central coral and keep it strong. In practice, this means incorporating multiple taxonomies and folksonomies and mapping them to one another, so that everyone can use the taxonomy and the terminology that they prefer. Taxonomy mapping tools require human training and human supervision, but they can lighten the load of the labour intensive process of mapping one taxonomy to another.

This means that taxonomy strategy does not have to be determined at a fixed point, but taxonomy creation is dynamic and organic. Folksonomies and new taxonomies can be harvested to feed back updates into the central taxonomy, breaking the traditional cycle of expensive major revision, gradual decline until the point of collapse, followed by subsequent expensive major revision…

There is a convergence between semi-automated mapping (we’ll be needing human editorial oversight for some time) and the semantic web project. This is the realisation of the “many perspectives, one repository” approach that should get round many problems of the subjective/objective divide. If you can’t agree on which viewpoint to adopt, why not have them all? Any arguments then become part of the mapping process – which is a bit of a fudge, but within organisations has the major benefit of removing a lot of politicking that surrounds information and knowledge management. It all becomes “something technical” to do with mapping that nobody other than information professionals is very interested in. Despite this, there is huge cultural potential when it comes to opening up public repositories and making them interoperate. The Europeana project is a good example.

Modern users demand that content is presented to them in a way that they feel comfortable with. The average search is a couple of words typed into Google, but they are willing to browse if they feel that they are closing in on what they want. To increase openness and usage means providing rich search and navigation experiences in a user-friendly way. If your repository is to be promoted to a wider audience future, the classification that will enable the creation of a rich navigation experience needs to be put in place now.

Your users should be able to wander about through the archive collections horizontally and vertically and to leave and delve into other collections, or to arrive at and move through the archive using their own organisation’s taxonomy and to tag where they want to tag, using whatever terms they like. The link points in the mappings provide crossroads in the navigation system for the users.

In this way the taxonomies are leveraged to become “hypertextual taxonomies” that provide rich links both horizontally and vertically.

Taxonomy as a spine
A core taxonomy that acts as an indexing language is the central spine to which other taxonomies can be attached and crucially – detached – as necessary. The automation of the bulk of the mapping process means that incorporating a new taxonomic view
becomes a task of checking the machine output for errors. Automated mapping processes can provide statistical calculations of likelihood of accuracy and so humans only need to examine those with a low likelihood of being correct.

Mapping software has the same problems as autoclassification software, so a mapping methodology, including workflow and approval processes, has to be defined and supported. The more important it is to get a fine-grained mapping, the more effort you will need to make, but a broad level mapping is easier to achieve.

Conclusion
If you start thinking of the taxonomy as an organic system in its own right – more like an open application that you can interact with – bolting on and removing elements as you choose, you do not need to attempt to account for every user viewpoint in the creation of the taxonomy, and that omission of a viewpoint at one stage does not preclude that collection from being incorporated later. Conversely, the mapping process allows “outsiders” to view your assets through their own taxonomies.

Our taxonomies represent huge edifices of intellectual effort. However, we can’t preserve them in aspic – hide them away as locked silos or like grand stately homes that won’t open their doors to the public. If we want them to thrive and grow we need to open them up to the light to let them expand, change and interact with other taxonomies and take in ideas from the outside.

Once you open up your taxonomy, share it and map it to other taxonomies, it becomes stronger. Rather than an isolated knowledge system that seems like a drain on resources, it becomes an embedded part of the information infrastructure, powering interactions between multiple systems. It ceases to be a part of document management, and becomes the way that the organisation interacts with knowledge globally. This means that the taxonomy gains strength from its associations but also gains prestige.
So our taxonomies can remain our friends for a little while longer. We won’t be hand cataloguing as we did in the past because all the wonders of the Google and automated world can be harnessed to help us.

KO

More on mapping

    Start a conversation 
Estimated reading time 1–2 minutes

When trying to integrate diverse vocabularies and repositories, the way to go is mapping – metadata crosswalks as they are known in the US. I’ve been looking for software that can handle mappings between taxonomies, of which there are a range on the market, but what is really exciting is the development of automated mapping tools to take much of the “heavy lifting” out of the work (for example Synaptica’s AutoMatch).

It seems to me that there is a convergence between semi-automated mapping (we’ll be needing human editorial oversight for some time) and the semantic web project. A combination of auto-mapping and RDF/OWL/SKOS should enable us to cross-navigate repositories using our own terminologies. This is the realisation of the “many perspectives, one repository” approach that should get round many problems of the subjective/objective divide. If you can’t agree on which viewpoint to adopt, why not have them all and save the arguments for the nuances of the mapping process. Within organisations this has immediate benefits, in removing a lot of politicking that surrounds information and knowledge management. However, there is also huge cultural potential when it comes to opening up public repositories and making them interoperate. The Europeana project is a good example.

Re-intermediating research

    Start a conversation 
Estimated reading time 2–2 minutes

A fine example of how much inspiration you can get from randomly talking to the people who are actually engaging with customers was given to me by our Research Guide last week.

She wants a video-tagging tool that includes chat functionality, some kind of interactive “pointing” facility, and plenty of metadata fields for adding and describing tags. When she is helping a customer to find the perfect bit of footage, she often finds herself in quite detailed discussions trying to explain why she thinks a shot meets their needs or in trying to understand what it is they don’t like about a particular scene. If they could both view the same footage in real time linked by some sort of online meeting functionality, they would be able to show each other what they meant and discuss and explain requirements far more easily and precisely.

This struck me as exactly how we should as information professionals be seizing new technologies to “re-intermediate” ourselves into the search process. Discussing bits of video footage is a particularly rich example, but what if an expert information professional could have a look at your search results and give you guidance via a little instant chat window? You could call up a real person to help you when you needed it without leaving your desk, in just the same way that online tech support chats work (I’ve had mixed experiences with those, but the principle is sound). I’m thinking especially of corporate settings, but wouldn’t it be a fantastic service for public libraries to offer?

It seems such a good idea I can’t believe it’s not already being done and would be very pleased to hear from anyone out there who is offering those sorts of services and in particular if there are any tools that support real time remote discussion around audio visual research.

‘We Like Lists Because We Don’t Want to Die’

    Start a conversation 
Estimated reading time 2–3 minutes

I heard Umberto Eco lecture on the search for a perfect language about 20 years ago and still find myself referencing him (trying to create a taxonomy that suits everyone would seem to be a similar quest). The lectures were nothing to do with my course really, so I benefited from that serendipitous knowledge discovery that just happens when you have time and space to explore ideas. So I was pleased when a few weeks ago this interview with Eco in der Spiegel happened upon me in the twittersphere (what’s the protocol for referencing tweets?). In the interview, Eco asserts that ‘We Like Lists Because We Don’t Want to Die’ .

It’s arguable that we do most things because we don’t want to die, but I was struck by the depiction of how fundamental the urge to collect and classify is to culture. At the LIKE dinner in early December, Cerys Hearsy said “we like hierarchies. We understand how they work” and she was talking about modern records management. Jan Wyllie in Taxonomies: Frameworks for Corporate Knowledge points out that taxonomies have been used for millennia (something I also reference frequently). Perhaps we like dualities because our brain has two hemispheres and we dream of a taxonomy of everything because then we would have conquered infinity and death itself, but such ideas are way beyond what I can speculate sensibly about. What I can say is that lists and taxonomies have been useful for so long that anyone who bets they are going to vanish anytime soon is facing very long odds. We will create them differently as technology advances, and we will manage without them in many situations where they would be helpful (if New Scientist had a taxonomy, I might have found the article about duality and the brain), but when we really need to be sure, we will create them.