The Taxonomy Guide – Bibliography is a collection of core taxonomy reference materials, compiled by the University of Toronto.
The Taxonomy Tango discusses a multi-taxonomy management method and tool invented by Mobile Content Networks. “People are just getting comfortable with their own taxonomies and now they are realizing the world is full of taxonomies”, MCN CTO Phyllis Reuther is quoted as saying.
“MCN Query Broker and Taxonomy Engine enables MobileSearch.net to make real-time queries to any number of relevant content sources, return results from those sources, and then group, sort, and rank results according to advanced algorithms and partner rules”, according to the MCN website.
It would be interesting to know more about the rules that power this. MCN provide some pretty diagrams, which say things like “it classifies” and “it sorts”, but show only question marks where the “magic” happens!
Pace layering in ia is a paper by D. Grant Campbell and Karl V. Fast from the Faculty of Information and Media Studies, University of Western Ontario. They bring pace layering theories from ecology and environmental science into information architecture, viewing ia as an “ecology”. Basically, ecologists have noted that events occurring over different timescales interact to affect an environment – something like the lowering of the water table would be a slow event, but a flash flood would be a fast event. Only by looking at the ways these differently “paced layers” interact can you predict how the local environment will respond. They propose that the underlying ia of a website, with its taxonomies, embedded navigation structures, and so on, is a slow layer, while folksonomies bubble away as a fast layer of the site, changing rapidly and responding quickly. They suggest that the most stable structure will be one that can accommodate both fast-moving and slow-moving layers, and that the slow layer must be robust and flexible enough to adjust itself to pressures from the fast layer.
I don’t think I have grasped all the implications of this, but my first impression is that it fits well with the “best of both worlds” approach – encouraging social tagging but not relying on it for critical information management, while using the folksonomic tags as feedback for updating and reviewing taxonomies.
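As a rough illustration of that feedback loop, here is a minimal Python sketch of my own (not something proposed in the paper). It assumes a hypothetical taxonomy stored as a simple dictionary of preferred terms and the variants they already cover, plus a hypothetical list of user tags, and it just surfaces popular tags that the taxonomy doesn't yet account for, as candidates for a taxonomist to review.

```python
from collections import Counter

# Hypothetical slow layer: preferred taxonomy terms mapped to the variants they already cover.
TAXONOMY = {
    "information architecture": {"information architecture", "ia"},
    "taxonomy": {"taxonomy", "taxonomies", "classification scheme"},
}


def uncovered_tag_candidates(tags, taxonomy, min_count=3):
    """Return (tag, count) pairs for popular tags that no taxonomy term covers yet.

    The fast layer (free tags) is normalised to lower case and counted; tags used
    at least min_count times and missing from the taxonomy are returned, most
    frequent first, as candidates for a taxonomist to review.
    """
    known = {variant for variants in taxonomy.values() for variant in variants}
    counts = Counter(tag.strip().lower() for tag in tags)
    candidates = [(tag, n) for tag, n in counts.items() if n >= min_count and tag not in known]
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical tags collected from site users.
user_tags = ["IA", "folksonomy", "Folksonomy", "taxonomies", "folksonomy", "tagging", "tagging"]
print(uncovered_tag_candidates(user_tags, TAXONOMY, min_count=2))
# → [('folksonomy', 3), ('tagging', 2)]
```

The taxonomy itself is never changed automatically here: the fast layer only generates suggestions, and the slow layer is updated by a person, which is the “best of both worlds” balance.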
The ISKO event at UCL on Thursday was fascinating. It was a real treat to hear the eminent Brian Vickery summarise the last 75 years of information retrieval developments, setting out the key questions to be answered and the challenges still to be overcome. At 90 years old he has a unique overview, having been a key member of the Classification Research Group and director of SLAIS. He pointed out that most retrieval systems have a particular user community in mind and that this affects the choice of information collected as well as the way the collection is structured. He also argued that being accepted as part of a specialist community involves use of the specialist terminology. I am very interested in the reverse of this – that lack of access to the “right” terminology is exclusionary. It’s all about shibboleths! He said that key questions at the moment include – whether the costs and effort of building expensive retrieval systems like taxonomies are justified, whether the need for harmonisation is increasing, what is the future for general ontologies, and what needs to be done to improve statistical retrieval systems.
Stephen Robertson from Microsoft Research, who developed search algorithms that still power most of the big search engines today, talked about the TREC competition, which has almost always been won by statistically based searches. He drew a distinction between general purpose search and specialised search for highly specific contexts – such as individual organisations – adding that in general specialist search is lagging behind. He also said that we need to find ways of feeding other sources of knowledge – such as taxonomies – into statistical searching because only by yoking the power of both will we get marked improvements.
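To make the idea of yoking the two together a little more concrete, here is a minimal sketch. Okapi BM25 is the best-known ranking function to come out of Robertson's probabilistic retrieval work; the code implements a textbook version of it (with a commonly used IDF variant that never goes negative) and then expands the query with equivalent terms from a taxonomy before scoring. The SYNONYMS dictionary and the tiny document set are invented for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in docs against query_terms using Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # IDF variant with "+ 1" inside the log so common terms never score negatively.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalisation (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical taxonomy fragment: a preferred term and its equivalent terms.
SYNONYMS = {"car": ["automobile", "motorcar"]}


def expand_query(query_terms, synonyms):
    """Add taxonomy equivalents to the query, keeping the user's own terms first."""
    expanded = list(query_terms)
    for term in query_terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


docs = [
    "the automobile industry in europe".split(),
    "car sales rose sharply".split(),
    "fruit prices fell".split(),
]
plain = bm25_scores(["car"], docs)
expanded = bm25_scores(expand_query(["car"], SYNONYMS), docs)
print(plain)     # the "automobile" document scores zero: no keyword match
print(expanded)  # with taxonomy expansion it now gets a positive score
```

Here every expansion term counts as much as the user's own term; a real system would probably weight taxonomy-supplied terms lower so they nudge rather than swamp the statistical ranking.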
Ian Rowlands then talked about the much publicised JISC survey on the “Google generation”, which concluded that they are much the same as other generations. In all age groups about 20% are expert users of technology and 20% technophobes, with everyone else muddling along in the middle. The JISC project team observed that some people spend a long time looking at online navigation systems, sometimes without accessing any articles at all. It is hard to know whether this counts as success or failure. I can think of scenarios either way – often I just want to know what’s there and will return later, sometimes it means I can rule out a source as useless (which might be a good thing if it has saved me the time of reading through irrelevant articles or might be a bad thing if it means I can’t find what I need).
There was then a very interesting discussion in which people expressed concerns about information overload and the way that students find it hard to distinguish between authoritative and trivial sources. Ian lamented the fact that online you don’t have the visual cues that you had in physical libraries – big chunky leather-bound books have an obvious “weight” and authority. Personally, I wonder how much this has been driven by the desire of publishers and teachers to make educational resources “fun”. If all your textbooks look like adverts and all your online learning resources look like pop videos, how are you going to learn which is which? It is perfectly possible to have an authoritative online style and publishers will produce it if that is what sells best. Throughout my career I have urged “authoritativeness” in design and been told by marketing departments that it isn’t what parents, teachers and kids want – they’ll only buy it if it looks flashy and fluffy! Another issue is the lack of a canon in a post-modern world – but that’s another story!
Here’s a post on the event on Madi Solomon’s Taxonomy Society blog.
Language and Social Identity is a collection of fascinating sociolinguistic papers. Dealing with gender and ethnicity, the researchers seek to show how stereotypes often arise from simple linguistic misunderstandings. For example, one paper argues that speakers of Indian English tend to use pronouns, conjunctions, and intonation very differently to speakers of UK English. UK speakers typically fail to pick up on the Indian English speakers’ cues and assume that what they are saying is confused or incoherent. Conversely, Indian English speakers think the UK English speakers must be either daft or extremely patronising because of their apparent failure to understand very simple logic. Another paper claims that men and women typically use utterances like “mm hmm” to mean different things. Women mean simply “I’m listening”, whereas men mean emphatically “I agree”. Men then think that women keep changing their minds and women think men just aren’t listening!
The most relevant paper from a taxonomic point of view was one on the highly charged political nature of language use in Montreal. The need to cut across language differences and negotiate norms of communication when diverse groups feel they have something to lose through compromise mirrors the inter-departmental language mediation that usually needs to happen in taxonomy projects.
Last night I went to the ERBI IT special interest group meeting on text mining. It was a real treat. Richard Kidd from the Royal Society of Chemistry opened by describing their award-winning Prospect project which applies semantic web technologies to primary research publishing. Essentially, along with the Sciborg project they have developed software to identify chemical entities using text mining and ontologies, which provides rich sources of links and metadata and helps their editors validate texts. There is a fantastic tool called OSCAR that can extract all sorts of information from chemistry texts. Taxonomies and ontologies plug in to these tools and systems to provide the base data. Richard stressed the need for a taxonomy to be a living thing that keeps up with terminology changes, and also talked about the way the RSC use “Tiny Ontologies All Strung Together (TOAST)” as there is no over-arching comprehensive chemistry ontology.
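For anyone wondering what “plugging in” a taxonomy looks like at its very simplest, here is a dictionary-based tagging sketch. It is not how OSCAR works (OSCAR does far more than this); it merely strings together two hypothetical tiny ontologies, in the spirit of TOAST, into one lookup table and marks up matching terms in a sentence with concept IDs. All the labels and IDs are made up.

```python
import re

# Two hypothetical "tiny ontologies", each mapping surface labels to concept IDs.
SOLVENTS = {"ethanol": "CHEM:ethanol", "acetone": "CHEM:acetone"}
ACIDS = {"acetic acid": "CHEM:acetic_acid", "hydrochloric acid": "CHEM:hydrochloric_acid"}


def string_together(*ontologies):
    """Merge several small label->concept maps into one lookup; earlier maps win on clashes."""
    merged = {}
    for ontology in ontologies:
        for label, concept in ontology.items():
            merged.setdefault(label.lower(), concept)
    return merged


def tag_entities(text, lexicon):
    """Return (start, end, matched_text, concept_id) for every lexicon label found in text."""
    # Try longer labels first so a multi-word label beats a shorter label at the same position.
    labels = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
            for m in pattern.finditer(text)]


lexicon = string_together(SOLVENTS, ACIDS)
sentence = "The residue was washed with Ethanol and dilute acetic acid."
for start, end, found, concept in tag_entities(sentence, lexicon):
    print(f"{found!r} at {start}-{end} -> {concept}")
```

The point of the design is that the matching code never changes: keeping the taxonomy “living” just means editing the small ontologies it is built from.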
Phil Hastings then gave a summary of the work of Linguamatics, who have developed text-mining software for life sciences. They use natural language processing to allow “relationship searching” and the construction of complex queries, offering more sophisticated answers than can be provided by keyword searches across flat text by conventional search engines. They too use “bolt-on” taxonomies and ontologies that provide a sort of deep reference layer.
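Here is a deliberately crude sketch of the difference between relationship searching and keyword searching. Instead of just matching a word, it looks for sentences containing an entity of one type, an entity of another type, and a word signalling a particular relationship. The vocabularies and relation words are hypothetical, and real natural language processing tools analyse sentence structure rather than relying on simple co-occurrence as this does.

```python
import re

# Hypothetical bolt-on vocabularies: entity type -> terms of that type.
VOCAB = {
    "drug": {"aspirin", "ibuprofen"},
    "condition": {"headache", "ulcer", "inflammation"},
}
# Hypothetical words that signal each kind of relationship.
RELATION_WORDS = {
    "treats": {"treats", "relieves", "reduces"},
    "causes": {"causes", "induces"},
}


def relationship_search(text, subject_type, relation, object_type):
    """Find sentences linking a subject_type entity to an object_type entity via relation."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z]+", sentence.lower())
        subjects = [w for w in words if w in VOCAB[subject_type]]
        objects = [w for w in words if w in VOCAB[object_type]]
        # Crude test: both entity types plus a relation word somewhere in the same sentence.
        if subjects and objects and any(w in RELATION_WORDS[relation] for w in words):
            hits.append((subjects[0], relation, objects[0], sentence))
    return hits


corpus = (
    "Aspirin relieves headache in most patients. "
    "Ibuprofen induces ulcer formation in some cases. "
    "Headache and ulcer were both reported."
)
for hit in relationship_search(corpus, "drug", "causes", "condition"):
    print(hit[:3])
# → ('ibuprofen', 'causes', 'ulcer')
```

A keyword search for “ulcer” would return both the second and third sentences; asking “which drugs cause which conditions?” returns only the second.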
Julie Barnes from Biowisdom provided some practical examples of how “assertional metadata” can be used to help drug developers and clinicians assess the likely toxicity of certain compounds, side effects, etc. By focusing on creating high-quality metadata that captures relationships, rather than just describing the item itself, they can easily highlight relationships and associations, helping pharmacologists to pick out key correlations from the huge oceans of data available. I particularly liked her contention that “the name or label we give something sometimes holds us in a dogma that stops us seeing something new” and that using metadata to surface relationships can bring up unexpected links and so lead to shifts in thinking and new discoveries.
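One simple way to picture assertional metadata is as a set of subject, predicate, object statements that can be chained together. In the sketch below the assertions are invented placeholders (not real pharmacology), and the function just follows chains of up to two assertions from a starting compound. That is enough to show how an indirect link between a compound and an outcome can surface even though no single assertion states it.

```python
from collections import defaultdict

# Invented placeholder assertions (subject, predicate, object); not real pharmacology.
ASSERTIONS = [
    ("compound_A", "inhibits", "enzyme_X"),
    ("enzyme_X", "associated_with", "liver_toxicity"),
    ("compound_B", "binds", "receptor_Y"),
    ("receptor_Y", "associated_with", "drowsiness"),
]


def build_graph(triples):
    """Index assertions by subject so they can be followed from one node to the next."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def assertion_chains(graph, start, max_hops=2):
    """Return every chain of assertions starting at start, up to max_hops assertions long."""
    chains = []
    stack = [(start, [])]
    while stack:
        node, chain = stack.pop()
        for predicate, target in graph.get(node, []):
            extended = chain + [(node, predicate, target)]
            chains.append(extended)
            if len(extended) < max_hops:
                stack.append((target, extended))
    return chains


graph = build_graph(ASSERTIONS)
for chain in assertion_chains(graph, "compound_A"):
    if len(chain) > 1:  # indirect links only: no single assertion states these
        print(" -> ".join([chain[0][0]] + [obj for _, _, obj in chain]))
# → compound_A -> enzyme_X -> liver_toxicity
```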
Finally, the esteemed Dr Peter Murray-Rust from the University of Cambridge talked in more detail about his development of OSCAR and Chemical Markup Language (CML), an XML-based markup language for chemistry. He stressed the need for annotation standards in markup to minimise ambiguity. He also argued that, as humans rarely reach more than 90% agreement over ontological issues, any software vendor claiming its product can do better is unlikely to meet the challenge. However, he made the point that “if we can communicate well, we can communicate both to humans and machines”.
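Agreement between annotators can be measured rather than guessed at. The sketch below computes raw percentage agreement and Cohen's kappa, a standard measure that corrects for the agreement you would expect by chance, for two hypothetical annotators labelling the same ten terms. The labels are invented, and this is not necessarily how the 90% figure he quoted was calculated.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for the same ten terms.
annotator_1 = ["compound", "compound", "reaction", "other", "compound",
               "reaction", "compound", "other", "compound", "reaction"]
annotator_2 = ["compound", "compound", "reaction", "other", "compound",
               "compound", "compound", "other", "compound", "reaction"]

print(percent_agreement(annotator_1, annotator_2))       # → 0.9
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # → 0.833
```

The two annotators agree on nine of the ten terms (90%), but kappa comes out lower, at about 0.83, because some of that agreement would be expected by chance alone.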
This bears out my experience in reference publishing. We always used a mixture of automated and human processing, with the software doing the “heavy lifting” and the editors tidying up the anomalies and absurdities by hand afterwards. I think it will be a long time before we find something better than this “best of both worlds” approach. We also aim for consistent modes of expression to facilitate searching, databasing, and comparability. It is possible to use a rules-based approach to writing and still produce something that sounds natural and is easy to read. Classic formats, such as methodologies for writing up experiments, are a typical example of consistent structuring.
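The heavy-lifting-plus-human-tidying workflow can be sketched as a simple triage step: software classifies everything, and anything it is unsure about goes into a queue for an editor. The keyword classifier and the 0.8 confidence threshold below are hypothetical stand-ins for whatever automated processing a real publishing system would use.

```python
# Hypothetical keyword lists standing in for real automated classification.
KEYWORDS = {
    "chemistry": {"reagent", "solvent", "titration"},
    "biology": {"cell", "enzyme", "protein"},
}


def keyword_classifier(text):
    """Return (label, confidence): the best-matching label and its share of all keyword hits."""
    words = set(text.lower().split())
    hits = {label: len(words & terms) for label, terms in KEYWORDS.items()}
    best = max(hits, key=hits.get)
    total = sum(hits.values())
    return best, (hits[best] / total if total else 0.0)


def triage(items, classify, threshold=0.8):
    """Let software label everything; send low-confidence results to an editor's queue."""
    automatic, for_review = [], []
    for item in items:
        label, confidence = classify(item)
        queue = automatic if confidence >= threshold else for_review
        queue.append((item, label, confidence))
    return automatic, for_review


texts = [
    "the solvent and reagent were mixed",
    "enzyme activity in the cell",
    "the enzyme was added to the solvent",
    "a note on house style",
]
automatic, for_review = triage(texts, keyword_classifier)
for item, label, confidence in for_review:
    print(f"needs an editor: {item!r} (guessed {label}, confidence {confidence:.2f})")
```

Raising the threshold sends more work to the editors and fewer absurdities through; lowering it does the opposite, which is really the editorial trade-off in miniature.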
Taxonomies and thesauri: a list of references and resources for public sector applications is a bibliography compiled by Stella Dextre Clarke as an essential reading list for taxonomists working with UK government data. Plenty of reading material here, well organised (as one would expect!) into helpful categories.
Bibliography by Taxonomy Strategies. A carefully constructed list of useful papers, books, and articles, most available online, covering taxonomy, information architecture, metadata, knowledge management, etc. This lot will keep me busy over Easter!
The Essentials of Metadata and Taxonomy Conference in London on March 10th was a first for event organisers Henry Stewart Events. They were told that the subject was “too niche”, “no-one would turn up”, and “nobody would be interested”. They were not dissuaded, and went ahead with what turned out to be a wonderfully content-rich and fact-dense day. I’ve written a summary of the conference which is available here.
A host of big-name speakers (Madi Solomon, former Corporate Nomenclature Taxonomist at Walt Disney; Seth Earley of Earley & Associates; John Jordan of Siemens; and Chris Sizemore and Silver Oliver from the BBC, to name just a few) gave fascinating and insightful talks. There were also lots of software overviews, which I found very helpful (including an assessment by Theresa Regli from CMS Watch), and, as is always a real treat at these events, the opportunity to meet lots of other taxonomists and information architects. The food was good too!
Taxonomy – some guidelines for effective design of taxonomies offers a few basic tips on taxonomy development. There are also links to how the Drupal CMS handles taxonomy.