Taxonomy and Glossaries for Enterprise Search Terminology – Enterprise Search Practice Blog has a handy little glossary from indexer and heavy user of controlled vocabularies Lynda Moulton (via Taxonomy Watch).
Having spent years working as an editor fussing over consistency of style and orthography, I shouldn’t have been as surprised as I was to find my tags on even this little blog site, written solely by me, had already become a mess. It didn’t take too long to tidy them up, but there are only a handful of articles here so far.
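Part of that tidy-up could be automated. A minimal sketch of tag normalisation (the variant table and example tags are my own invention, not any real blogging platform's feature):

```python
# Normalise a messy set of blog tags: trim, lowercase, and fold
# simple spelling variants onto a preferred form. The variant
# table below is illustrative only.

VARIANTS = {
    "taxonomies": "taxonomy",
    "folksonomies": "folksonomy",
    "categorization": "categorisation",
}

def normalise_tag(tag: str) -> str:
    t = tag.strip().lower().replace("_", " ")
    return VARIANTS.get(t, t)

def tidy(tags):
    # Deduplicate after normalisation, keeping a stable order.
    seen, result = set(), []
    for tag in tags:
        t = normalise_tag(tag)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result
```

Even a crude mapping like this catches the case/spelling drift that creeps in over time, though, true to the "90%" rule below, the last judgement calls still need a human.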
I worked with some extremely clever people in my first “proper” job back in the 90s, and we used to have a “90%” rule regarding algorithm-based language processing (we mostly processed very well-structured text). However brilliant your program, you’d always have 10% of nonsense left over at the end that you needed to sort out by hand – mainly due to the vagaries of natural language and general human inconsistency. I’m no expert on natural language processing, but I get the impression that a lot of people still think 90% is really rather good. Certainly auto-classification software seems to run at a much lower success rate, even after manual training. It strikes me that there’s a parallel between folksonomies and this sort of software. Both process a lot of information cheaply, making possible processing on a scale that just couldn’t be done before, but you still need someone to tidy up around the edges if you want top quality.
I think the future of folksonomies depends on how this tidying-up process develops. There are various things happening to improve quality – like auto-complete predictive text. Google’s tag game is another approach, and ravelry.com use gentle human “shepherding” of taggers, personally suggesting tags and orthography (thanks to Elizabeth for pointing this one out to me).
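Auto-complete of the kind mentioned above is conceptually simple: suggest established tags as the user types, nudging everyone towards the same spellings. A sketch assuming a table of existing tags ranked by frequency (the tag counts here are hypothetical):

```python
from collections import Counter

def suggest(prefix, tag_counts, limit=3):
    """Suggest existing tags that start with the typed prefix,
    most-used first, to steer taggers towards established forms."""
    prefix = prefix.lower()
    matches = [(tag, n) for tag, n in tag_counts.items()
               if tag.startswith(prefix)]
    # Rank by descending frequency, breaking ties alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in matches[:limit]]

# Invented frequencies for illustration.
tag_counts = Counter({"taxonomy": 12, "taxonomy software": 3,
                      "tagging": 7, "usability": 5})
```

The design choice matters: ranking suggestions by frequency reinforces the popular forms, which improves consistency but also feeds the echo-chamber effect discussed below.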
I would really like to get hold of some percentages. If 75% is a decent showing for off-the-peg auto-categorisation/classification software, and we could get up to 90% with bespoke algorithms processing structured text, what percentages can you expect from a folksonomic approach?
I’m still mulling over Helen Longino’s criteria for objectivity in scientific enquiry (see previous post: Science as Social Knowledge) and it occurred to me that folksonomies are not really open and democratic, but are actually obscure and impenetrable. The “viewpoint” of any given folksonomy might be an averaged-out majority consensus, or some other way of aggregating tags might have been used; either way, you can’t tell whether it is skewed by a numerically small but prolifically tagging group. This is the point Judith Simon made in relation to ratings and review software systems at the ISKO conference, but it seems to me the problem for folksonomies is even worse, because of the echo chamber effect of people amplifying popular tags. Without some way of showing who is tagging what and why, the viewpoint expressed in the folksonomy is a mystery. This is not inevitable, but to reveal the background assumptions driving the majority tags I think you’d need to collect huge amounts of data from every tagger, database it along with the tags, then run all sorts of analyses and publish them.
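The aggregation point can be made concrete: counting raw tag applications and counting distinct taggers can produce quite different “majority” views. A toy example (all data invented for illustration):

```python
from collections import Counter

# Each entry is (tagger, tag). Two prolific taggers apply "web2.0"
# repeatedly, while four users each apply "search" once.
applications = (
    [("alice", "web2.0")] * 6 + [("bob", "web2.0")] * 5 +
    [(user, "search") for user in ["carol", "dan", "eve", "frank"]]
)

# Raw count of tag applications: dominated by the prolific pair.
raw = Counter(tag for _, tag in applications)

# Count of distinct taggers per tag: a different picture entirely.
by_tagger = Counter(tag for _, tag in set(applications))
```

Here the raw count crowns “web2.0” (11 applications from just 2 people) while the per-tagger count favours “search” (4 distinct taggers). Unless the folksonomy publishes which aggregation it uses, the consumer has no way to know which “majority” they are seeing.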
If the folksonomic tags don’t help you find things, who could you complain to? How do you work out whether it doesn’t help you because you are a minority, or for some other reason? With a taxonomy, the structure is open – you may not like it but you can see what it is – and there will usually be someone “in charge” who you can challenge and criticise if you think your perspective has been overlooked. In many cases the process of construction will be known too. I don’t see an obvious way of challenging or criticising a folksonomy in this way, so presumably it fails Longino’s criteria for objectivity.
You can just stick your own tags into a folksonomy and use them yourself so there is some trace of your viewpoint in there, but if the rest of the folksonomy doesn’t help you search, that means you can only find things once you have tagged them yourself, which would presumably rule out large content repositories. So, you have to learn and live with the imposed system – just like with a taxonomy – but it’s never quite clear exactly what that system is.
I can’t help thinking the information world has become very morbid. There was Green Chameleon’s Dead KM Walking debate, CMS Watch’s Taxonomies are dead punt, and now keyword search is dead, according to the Enterprise Search Center (via Taxonomy Watch).
Stephen Arnold says “Established system vendors and newcomers promise silver bullets that will kill the werewolves plaguing enterprise search. Taxonomies resonate in some vendors’ marketing spiels. Others focus on natural language processing… ” This makes taxonomies sound like they are some newfangled techie trick, rather than the traditional sorting out we’re all used to. He then states that users expect “a search system to … Offer a web page that gives users specific suggestions and options with hotlinks to topics, categories, and key subjects … provide the user with point-and-click options … Allow the user to drill down or jump across topics.” Are those not taxonomies for navigation?
I thoroughly enjoyed Science as Social Knowledge by the US philosopher Helen Longino. It was recommended to me by Judith Simon, a very smart researcher I met at the ISKO conference in Montreal last summer. She researches trust and social software and suggested that Longino’s analysis of objectivity would be helpful to me. It took me a while to get settled with the book, but I recognised an essentially Wittgensteinian take on the notion of shared meaning. Longino works this into a set of principles for establishing degrees of objectivity in scientific enquiry. If I have grasped it all correctly, she basically says that although there is no such thing as “ideal” objectivity – a one true perspective up in the sky – we do not have to collapse into an “anything goes” relativism. We can accept that background assumptions can be challenged and change, and embed the notion of challenge and criticism into the heart of scientific enquiry itself. That establishes a self-regulating system that is more or less objective, depending on how open it is to criticism and how responsive it is to legitimate challenges. Objectivity arises out of the process of consensus-building in an open, reflective, and self-challenging community.
Applying this to taxonomy work appears to mean that the process of taxonomy building can be more or less objective, depending on how open the process is to the community and to adapting to legitimate challenges or complaints. This seems to be very much like the practical advice offered by taxonomists expressed in terms of “get user buy-in”, “consult all stakeholders”, “ensure that you consider all relevant viewpoints”, or “ensure that you have regular reviews and updates”, so it’s reassuring to know we are basically epistemologically valid in our methods!
I haven’t read the book yet, but this blog post on screen templates presents 12 basic layouts and the sort of information they work best with. It could be a useful checklist if you want to manage or rationalise presentations across a large website, especially one that has evolved organically and could do with tidying up. The templates are simple (seasoned designers won’t find much they don’t know already) but could be handy for anyone new to web design and layout who wants some “off the peg” styles to get them started.
Thanks to Rey for the link!
I’ve been studying usability evaluation methods (UEMs), which although not directly related to taxonomy work, are relevant for anyone involved in information architecture (IA). I was surprised at how controversial a subject usability is, having assumed that everyone wants their sites to be as usable as possible. However, assessing usability does involve a lot of judgement calls and tradeoffs, which is one reason why some people seem to take against it.
You have to decide who you are going to focus your usability testing on, perhaps choosing a “core user group” rather than trying to please everybody. You have to decide what aspects of usability you are going to focus on – for example accessibility (everybody should be following minimum W3C standards anyway), but you might legitimately decide that you are not going to worry about making your site easy for children to read (e.g. if it is a postgraduate discussion forum). Then you need to decide if you are going to try to make individual tasks as efficient as possible (e.g. not using as many keystrokes) or look at the site as a whole (e.g. a social networking site might place a higher value on being fun and funky over being efficient to use).
Once you have decided who your target users are and what aspect of usability you are most interested in, you can choose a testing method. There seem to be over 100 different methods out there, ranging from fairly straightforward ones like Jakob Nielsen’s Heuristic Evaluation, which gives you a checklist of things to look at, to “expert inspection”, where you simply examine the site to try to find potential problems. These methods assume you know quite a lot about what makes a site usable or not.
You could do an experiment, where you set up a task or scenario and measure people’s performance at it. This is often described as laboratory testing, but you can have a “lab” that is just you, a notebook, and a computer for your participants. This sort of test is great if you have one specific function (e.g. an ecommerce function) and you want to check that people can follow the steps easily.
The methods I liked the most were the more abstract conceptual methods, like CASSM, where you try to get a picture of users’ expectations and then compare them with the website to see where there are gaps or conflicts.
Interestingly, the literature shows that for all methods there is a marked “evaluator effect”, with different evaluators getting different results even when using an identical process. I think this is because there is so much interpretation at all stages. The closest you’d get to a “scientific” set of original data would be to set up a carefully controlled usability lab test, but even then translating the results into redesign suggestions is really an art, not a science.
There also seems to be a “political correctness gone mad” brigade who say that accessibility means you can’t have any pictures on your site and that Jakob Nielsen’s site looks horrible and out of date. I think this is a misunderstanding of what usability is all about. Usability is about making a site easier for everyone to use, and accessibility isn’t about leaving features out because certain people can’t use them, it’s about providing a “Plan B” for anyone who doesn’t use the site in the way you expect. So, it seems to me that it’s fine to include fancy visualisations, as long as you also provide a text description for people who can’t see them, or a tricksy javascript feature as long as you include an alternative for browsers that don’t have javascript enabled. Nielsen’s site is old fashioned, but that doesn’t mean it is the only way to create a usable site. The BBC have aesthetically pleasing modern sites that are also well crafted for accessibility.
It is true that there are tradeoffs and quite a lot of art rather than science in usability evaluation, but I think there is a moral (not to mention legal in the UK – not sure about elsewhere) imperative to at least try to be inclusive and in most cases it is simply poor marketing to shut out or make life difficult for potential customers.
I’ve been mulling over what to say about CMS Watch’s “Taxonomies are Dead” teaser, but defer to Patrick Lambe of Green Chameleon, who has written a very good post in response: Organising Knowledge » What Are We?.
One thought of my own is that there seems to be increasing differentiation between taxonomy creators and implementers (which I take as a sign that taxonomies are thriving rather than dying). I’ve always been on the content side of things, so I see knowledge organisation as primary, and the technology you use as secondary. However, more and more it seems to be the case that people understand the word “taxonomist” to mean someone who is a sort of Sharepoint sysadmin.
Truevert: What is semantic about semantic search? is an easy introduction to the thinking behind the Truevert semantic search engine. I was heartened by the references to Wittgenstein and the attention Truevert have paid to the work of linguists and philosophers. So much commercial search seems to have been driven by computer scientists with little interest in philosophy – or, if they had any, they kept quiet about it (any counter-examples out there?)! Perhaps philosophers have not been so good at promoting themselves either. Perhaps the Chomskyian attempt to divide linguistics itself into “hard scientific” linguistics and “fuzzy” linguistic disciplines like sociolinguistics has not helped.
As a believer in interdisciplinary and collaborative approaches, I have always wondered why we seemed to be so bad at building these bridges and information science has always struck me as a natural crossing point. Of course, there has been a lot of collaboration, but my impression is that academia has been rather better at this than the commercial world, with organisations like ISKO UK working hard to forge links. Herbert Roitblat at Truevert is obviously proud of their philosophical and linguistic awareness, and more interestingly, thinks it is worth broadcasting in a promotional blog post.
Taxonomy and Records Management « Not Otherwise Categorized… is a blog post I wish I’d read a year ago when studying a records management module for my Masters. A lot of people seemed to think it was strange that I had chosen the RM option and I couldn’t understand why the records managers didn’t talk more about taxonomy. Of course, taxonomists often work on records management systems in one form or another, and are happy to discuss the differences between taxonomy as file plan, taxonomy for RM, taxonomy as classification, taxonomy for navigation, and so on.
I think it shows that there is really very little widespread understanding of what a taxonomy is. People assume it is something mysterious and technical in the heart of whichever system they encountered one in first and don’t realise that taxonomies crop up all over the place. It’s not even very easy to find an “official” definition.
Alan Gilchrist and Barry Mahon in Information Architecture: Designing Information Environments for Purpose say “TFPL takes the view that a ‘corporate taxonomy’ can be viewed as an enterprise-wide master file of the vocabularies and their structures, used or for use, across the enterprise, and from which specific tools may be derived for various purposes, of which navigation and search support are the most prominent.”
Patrick Lambe in Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness describes taxonomies as taking many forms, including “lists, trees, hierarchies, polyhierarchies, matrices, facets, system maps”, and Vanda Broughton in Essential Classification points out that taxonomy is now often taken to mean “any vaguely structured set of terms in a subject area”.
Settling on a single, popular definition of taxonomy might help promote taxonomists and taxonomy work, but as taxonomies need to do so much in so many different contexts, there just might not be a simple definition that works. Perhaps we need a taxonomy of taxonomies!