Tag Archives: taxonomies


Keeping your Taxonomy Fresh and Relevant – SLA Chicago


Matt Johnson from EMC gave an extremely clear and useful presentation, and I gave an overview of the taxonomy migration and revision project I have been working on for the past couple of years.

Matt and I were delighted to have such a big and lively audience for our session, especially as it was at 8 am! Thank you to everyone who joined us, to SLA’s Taxonomy division for organizing the session, to the session sponsor Gale Cengage Learning, and to Larry Lempert for moderating.

SLA Conference in Chicago


Last month I had a wonderful time at the SLA (Special Libraries Association) conference in Chicago. I had never previously been to an SLA conference, even though there is a lively SLA Europe division. SLA is very keen to be seen as “not just for librarians” and the conference certainly spanned a vast range of information professions. The Taxonomy Division is thriving and there seem to be far more American than British taxonomists, which, although not surprising, was a pleasure as I don’t often find myself as one of a crowd! The conference has a plethora of receptions and social events, including the “legendary” IT division dance party.

There were well over 100 presentation sessions, as well as divisional meetings, panel discussions, and networking events that ranged from business breakfasts to tours of Chicago’s architectural sights. There was plenty of scope to avoid or embrace the wide range of issues and areas under discussion and I focused on taxonomies, Linked Data, image metadata, and then took a diversion into business research and propaganda.

I also thoroughly enjoyed the vendor demonstrations, especially Blekko, the editorially curated and spam-free search engine; the legal information vendors FastCase and Law360; and EOS library management systems.

My next posts will cover a few of the sessions I attended in more detail. Here’s the first:

Adding Value to Content through Linked Data

Joseph Busch of Taxonomy Strategies offered an overview of the world of Linked Data. The majority of Linked Data available in the “Linked Data Cloud” is US government data, with Life Sciences data in second place, which reflects the communities that are willing and able to make their data freely and publicly available. It is important to keep in mind the distinction between concept schemes – Dublin Core, FOAF, SKOS, which provide structures but no meanings – and semantic schemes – taxonomies, controlled vocabularies, ontologies, which provide meanings. Meanings are created through context and relationships, and many people assume that equivalence is simple and association is complex. However, establishing whether something is the “same” as something else is often far more difficult than simply asserting that two things are related to each other.

Many people also fail to use the full potential of their knowledge organization work. Vocabularies are tools that can be used to help solve problems by breaking down complex issues into key components, giving people ways of discussing ideas, and challenging perceptions.

The presentation by Joel Richard, web developer at the Smithsonian Libraries, focused on their botanic semantic project – digitizing and indexing Taxonomic Literature II. (I assume they have discussed taxonomies of taxonomy at some point!) This is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. The International Association for Plant Taxonomy (IAPT) granted permission to the Smithsonian to release the work on the web under an open licence.

The books were scanned and run through OCR, which achieved 99.97% accuracy. That sounds impressive, but it actually means 5,000-12,000 errors – far too many for serious researchers. Errors in general text were less of a concern than errors in citations and other structured information, where, for example, mistaking an 8 for a 3 could be very misleading. After some cleanup work, the team identified terms such as names and dates that could be parsed and tagged, and selected sets of pre-existing identifiers and vocabularies. They are continuing to look for ontologies that may be suitable for their data set. Other issues to think about are software and storage. They are using Drupal rather than a triplestore, but are concerned about scalability, so are trying to avoid creating billions of triples to manage.
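To see why 99.97% is less reassuring than it sounds, here is a rough back-of-envelope sketch. It assumes the accuracy figure is measured per character, which the talk did not specify, and the function names are my own for illustration only.

```python
# Assumption: 99.97% accuracy is per character, i.e. 3 wrong characters
# in every 10,000. Integer arithmetic avoids floating-point rounding noise.
ERRORS_PER_10K = 3


def expected_errors(characters: int) -> int:
    """Approximate number of OCR errors in a text of the given length."""
    return characters * ERRORS_PER_10K // 10_000


def implied_characters(errors: int) -> int:
    """Approximate text length that would produce the given error count."""
    return errors * 10_000 // ERRORS_PER_10K


if __name__ == "__main__":
    for errors in (5_000, 12_000):
        print(f"{errors:,} errors implies roughly {implied_characters(errors):,} characters")
```

On that assumption, the quoted range of 5,000-12,000 errors corresponds to somewhere between about 17 million and 40 million characters of text, so even a tiny error rate adds up quickly at this scale.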

Joel also outlined some of the benefits of using Linked Data, gave some examples of successful projects, and provided links to further resources.

Building, visualising and deploying taxonomies and ontologies; the reality – Content Intelligence Forum event


I have been trying to get to the Content Intelligence Forum meetups for some time as they always seem to offer excellent speakers on key topics that don’t tend to get the attention they deserve, so I was delighted to be able to attend Stephen D’Arcy’s talk a little while ago on taxonomies and ontologies.

Stephen has many years of experience designing semantic information systems for large organisations, ranging from health care providers to banks to media companies. His career illustrates the transferability of, and wide demand for, information skills.

His 8-point checklist for a taxonomy project was extremely helpful – Define, Audit, Tools, Plan, Build, Deploy, Governance, Documentation – as were his tips for managing stakeholders, IT departments in particular. He warned against the pitfalls of not including taxonomy management early enough in search systems design, and the problems that you can be left with if you do not have a flexible and dynamic way of managing your taxonomy and ontology structures. He also included a lot of examples that illustrated the fun aspects of ontologies when used to create interesting pathways through entertainment content in particular.

The conversation after the talk was very engaging and I enjoyed finding out about common problems that information professionals face, including how best to define terms, how to encourage clear thinking, and how to communicate good research techniques.


Review of The Accidental Taxonomist


Rather late to the party on this one, but I finally got around to reading The Accidental Taxonomist by Heather Hedden. I have to confess to bias as I was very pleased to see that my FUMSI article on folksonomies was mentioned in the recommended reading section. Writing in a clear, sensible and readable tone, Hedden gives a very thorough overview of practical taxonomy work. The book works as a textbook but reads pleasantly, and although I anticipate using it mainly as a reference, I enjoyed reading it chapter by chapter.

I am very pleased to have so much practical and useful information in one place (for example, lists of relevant standards; definitions of taxonomies, ontologies, and thesauri; and the functions of taxonomies), as I often have to explain the basics to people in my day-to-day work. I have been recommending the book to my team, especially those who are new to taxonomies, and they have appreciated its clarity and comprehensiveness as a “field guide”. It covered familiar ground, but for me much of that was “tacit knowledge” that I had never fully articulated to myself, so I am sure that this “knowledge capture” from the mind of an experienced taxonomy practitioner will be very useful.


Many to many


A wise taxonomist once said to me “taxonomies are technology agnostic” and I’ve been thinking about why systems are not taxonomy agnostic. If you underpin a taxonomy with a thesaurus, can you use that to map one taxonomy to another, without altering either taxonomy? You can keep both taxonomies as metadata attached to your asset and expose one or the other depending on user choice. It’s just an interface issue. The mapping would enable cross navigation, so you could wander down one taxonomy, skip to another, then pop back to the first one if you wanted.

You could attach folksonomies too if you wanted to, and just store those as extra metadata.
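To make the idea concrete, here is a minimal sketch of how it might look, assuming a tiny in-memory thesaurus. Everything in it (the concept IDs, the "editorial" and "archive" scheme names, the labels, and the function names) is made up for illustration and does not describe any real system.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical thesaurus: each concept ID records the label that each
# taxonomy uses for that concept. Neither taxonomy is altered by the mapping.
THESAURUS = {
    "c1": {"editorial": "Football", "archive": "Soccer"},
    "c2": {"editorial": "Film", "archive": "Motion pictures"},
}


@dataclass
class Asset:
    title: str
    concepts: set[str] = field(default_factory=set)   # thesaurus concept IDs
    folk_tags: set[str] = field(default_factory=set)  # user tags kept as extra metadata


def labels(asset: Asset, scheme: str) -> list[str]:
    """Show an asset's concepts using whichever taxonomy the user has chosen."""
    return sorted(
        THESAURUS[c][scheme] for c in asset.concepts if scheme in THESAURUS[c]
    )


def crosswalk(label: str, source: str, target: str) -> str | None:
    """Skip from a term in one taxonomy to its equivalent in another, if any."""
    for entry in THESAURUS.values():
        if entry.get(source) == label:
            return entry.get(target)
    return None


if __name__ == "__main__":
    report = Asset("Match report", {"c1"}, {"footie"})
    print(labels(report, "editorial"), labels(report, "archive"))
    print(crosswalk("Football", "editorial", "archive"))
```

The asset itself only stores concept IDs, so switching taxonomies is purely a matter of which labels the interface looks up, and the folksonomy tags sit alongside without touching either scheme.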

I can see that there might be terminology issues that need resolving (no small task), or perhaps software or storage issues, but I can’t see why the system itself couldn’t work in theory.

I’ve spent a lot of time thinking about mediating stakeholder needs to get the best taxonomy, and that is still a valid approach when you need management and control, but I don’t see any reason not to attach other taxonomies to your core taxonomy. Those satellite taxonomies can then serve minority interests or specialised needs. As long as you collect metadata about your taxonomies and make it clear to your user the provenance of the taxonomy or folksonomy they are viewing, you can offer a range of viewpoints.

Perhaps I am missing something obvious, but it seems there is still debate about getting the best taxonomy, or choosing to implement one instead of another. That debate seems to be based on the presumption that you can only have one taxonomy at a time, but why not have lots?