Language, thought, categorisation, and talking to yourself

The Voice of Reason (published online as What’s in a name?) is a fascinating article by David Robson in New Scientist on one of my favourite topics: how language affects the way we think. The “linguistic relativity” theory of Edward Sapir and Benjamin Whorf, one of my favourite hypotheses, is blamed for the “fall from grace” of the idea that language shapes thought. The work of Eleanor Rosch, one of my favourite psychologists, on categorisation appeared to contradict the Sapir-Whorf hypothesis by showing that categorisation rests more on the physiological characteristics of humans (how we see, what size things are, whether or not something is edible) than on the names we have for things.

Noam Chomsky’s quest for a universal grammar made the notion that language and thought were essentially common to all humanity more popular than linguistic relativity. However, psychologists have started to note that having names for categories helps infants put things into those categories. Children’s spatial reasoning also seems to improve when you remind them of spatial vocabulary (Dedre Gentner, Northwestern University, Evanston, Illinois: Cognitive Psychology, vol 50, p 315). People instinctively teach children by reminding them of what category words like “top”, “middle” and “bottom” mean. An experiment with “aliens” indicated that people categorised types of alien more quickly and accurately when they were given names for them than when they weren’t (Gary Lupyan, University of Wisconsin, Madison: Psychological Science, vol 18, p 1077).

Although the strong version of the Sapir-Whorf hypothesis (that language dictates and constrains thought) appears unlikely to be true, since it would mean you could never have a new idea or create a new category, the “weak” version (that having the words available encourages people to think in those terms) seems very plausible. An experiment has now indicated that Russian speakers, who have two different words for shades of blue, are faster at distinguishing those shades than English speakers (Lera Boroditsky, Stanford University, California: Proceedings of the National Academy of Sciences, vol 104, p 7780).

Labelling objects helps memory take “shortcuts”, so that minor details do not have to be remembered (Gary Lupyan: Journal of Experimental Psychology: General, vol 137, p 348). Political activists in many areas have argued that language use encourages stereotyping, hence the attempts to break down stereotypes by changing the names for groups. However, when applied to something like sets of documents, not bothering to see each one as an individual can be a useful shortcut. If you want to build a user-friendly taxonomy, using the categories people know and like will make your system quicker and easier to use. Of course, they could learn other ways of categorising (they could break the stereotypes) if they spent a bit of time and effort thinking it all through, but in many contexts the job of the taxonomist is to give people what they want quickly and efficiently, not to enter into debates about whether or not they conceptualise things in the most politically appropriate way.

Language has also been shown to affect perception. If you use words describing upward movement (climb, rise, etc.) while showing people patterns of randomly moving dots, they are more likely to detect the predominant direction of movement correctly when the words match that direction (Psychological Science, vol 18, p 1007). Conversely, showing people upward-moving dots while saying “fall” confused them. The words seem to “prime” the visual system of the brain.

Another effect is that it is easier to see something if you say its name, so muttering the name of an object you are looking for really can help you find it. According to Andy Clark, a philosopher at the University of Edinburgh, language was the original form of “augmented reality”: “an overlay that changes how we think, reason and see”.

Content Identifiers for Digital Rights Persistence

This is another write-up from the Henry Stewart DAM London conference.

Identity and identification

Robin Wilson discussed the issue of content identifiers, which are vitally important for digital rights management but tend to be overlooked. He argued that although people become engaged in debates about titles and the language used in labels and classification systems, they overlook the need to achieve consensus on basic identification.

(I was quite surprised, as I had always thought that people would argue passionately about what something should be called and how using the wrong terminology affects usability, but would settle on machine-readable IDs quite happily. Perhaps it is the very neutrality of such codes that makes the politics intractable. If you have invested huge amounts of money in a database that demands certain codes, you will argue that everyone else should use those codes to save you the costs of translation or of acquiring a compatible system, and there are no appeals to usability, or brokerage via editorial policy, that can be made. It simply becomes a matter of whoever shouts loudest getting to spend the least money in the short term.)

Robin argued that the only way to create an efficient digital marketplace is to have a trusted authority oversee a system of digital identifiers that are tightly bound within the digital asset, so they cannot easily be stripped out even when an asset is divided, split, shared, and copied. The authority needs to be trusted by consumers and creators/publishers in terms of political neutrality, stability, etc.

(I could understand how this system would make it easier for people who are willing to pay for content to see what rights they need to buy and whom they should pay, but I couldn’t see how it could help content owners identify plagiarism without an active search mechanism. Presumably a digital watermark would persist throughout copies of an asset, provided that it wasn’t deliberately stripped, but if the user simply decided not to pay, I don’t see how the system would help identify rights breaches. In conversation, Robin mentioned Turnitin’s plagiarism management service, which has become more lucrative than the company’s original work on content analysis, but it requires an active process instigated by the content owner to search for unauthorised use of their content. This is fine for the major publishers of the world, who can afford to pay for such services, but is less appealing to individuals, whether professional freelances or amateur content creators, who would need a cheap and easy solution that alerts them to breaches of copyright without their having to spend time searching.)

The identifiers themselves need to be independent of any specific technology. At the moment, DAM systems are often proprietary and therefore identifiers and metadata cannot easily flow from one system to another. Some systems even strip away any metadata associated with a file on import and export.

Robin described five types of identifier currently being used or developed:

  • Uniform Resource Name (URN)
  • Handle System
  • Digital Object Identifier (DOI)
  • Persistent URL (PURL)
  • ARK (Archival Resource Key)

He outlined three essential qualities for identifiers – that they be unique, globally registered, and locally resolved.
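
To make those three qualities concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `IdentifierRegistry` class, the `resolve` function, the identifier string and the storage path are illustrations, not the API of any URN, Handle, DOI, PURL or ARK service. The central registry only guarantees uniqueness and records who is responsible for each identifier; a local resolver (for example, a DAM system) then says where the asset actually lives.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IdentifierRegistry:
    """Hypothetical central registry: one entry per identifier, globally."""
    entries: Dict[str, str] = field(default_factory=dict)

    def register(self, identifier: str, resolver_name: str) -> None:
        # Unique: the same identifier can never be registered twice.
        if identifier in self.entries:
            raise ValueError(f"{identifier} is already registered")
        # Globally registered: record which local service answers for it.
        self.entries[identifier] = resolver_name


def resolve(identifier: str,
            registry: IdentifierRegistry,
            local_resolvers: Dict[str, Callable[[str], str]]) -> str:
    """Locally resolved: the registry says *who* knows, the local resolver says *where*."""
    resolver_name = registry.entries[identifier]
    return local_resolvers[resolver_name](identifier)


# Usage with a made-up identifier and a made-up storage path.
registry = IdentifierRegistry()
registry.register("example:asset/0001", "archive_dam")
archive_paths = {"example:asset/0001": "/store/images/0001.tif"}
local_resolvers = {"archive_dam": lambda ident: archive_paths[ident]}
print(resolve("example:asset/0001", registry, local_resolvers))
```

Splitting the job this way means an asset can move between storage systems without its identifier changing: only the local resolver's lookup needs updating.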

So why don’t we share?

Robin argued that it is easier for DAM vendors to build “safe” systems that lock all content within an enterprise environment; only organisations with a public service or archival remit tend to be collaborative and open. DAM vendors resist a federated approach online and prefer a one-to-one or directly intermediated transaction model. Federated identifier management services exist, but vendors and customers don’t trust them. The problem is mainly social, not technological.

One of the problems is agreeing to share the costs of services, such as infrastructure, registration and validation, governance and development of the system, administration, and outreach and marketing.

(Efforts to standardise may well benefit the big players more than the small players, so there is a strong argument for them bearing the initial costs and offering support for smaller players to join. Once enough people opt in, the system gains critical mass: it becomes easier to join, and the costs of joining become less of an unquantifiable risk because you can benefit from the experiences of others. The semantic web is currently attempting to acquire this “critical mass”. As marketers realise the potential of semantic web technology to make money, no doubt we will see an upsurge in interest. Facebook’s “like” button may well be heralding the advent of the ad-driven semantic web, which will probably drive uptake far faster than the worthy efforts of academics to improve the world by sharing research data!)

Are you a semantic romantic?

The “semantic web” is an expression that has been used for long enough now that I for one feel I ought to know what it means, but it is hard to know where to start when so much about it is presented in “techspeak”. I am trying to understand it all in my own non-technical terms, so this post is aimed at “semantic wannabes” rather than “semantic aficionados”. It suggests some ways of starting to think about the semantic web and linked open data without worrying about the technicalities.

At a very basic level, the semantic web is something that information professionals have been doing for years. We know about using common formats so that information can be exchanged electronically, from SGML to HTML and then XML. In the 90s, publishers used “field codes” to identify subject areas so that articles could be held in databases and re-used in multiple publications. In the library world, metadata standards like MARC and Dublin Core were devised to make it easier to share cataloguing data. The semantic web essentially just extends these principles.

So, why all the hype?

There is money to be made and lost on semantic web projects, and investors always want to try to predict the future so they can back winning horses. The recent Pew Report (thanks to Brendan for the link) shows the huge variety of opinions about what the semantic web will become.

On the one extreme, the semantic evangelists are hoping that we can create a highly sophisticated system that can make sense of our content by itself, with the familiar arguments that this will free humans from mundane tasks so that we can do more interesting things, be better informed and connected, and build a better and more intelligent world. They describe systems that “know” that when you book a holiday you need to get from your house to the airport, that you must remember to reschedule an appointment you made for that week, and that you need to send off your passport tomorrow to renew it in time. This is helpful and can seem spookily clever, but is no more mysterious than making sure my holiday booking system is connected to my diary. There are all sorts of commercial applications of such “convenience data management” and lots of ethical implications about privacy and data security too, but we have had these debates many times in the past.

A more business-focused example might be that a search engine will “realise” that when you search for “orange” you mean the mobile phone company, because it “knows” you are a market analyst working in telecoms. It will then work out that documents that contain the words “orange” and “fruit” are unlikely to be what you are after, and so won’t return them in search results. You will also be able to construct more complex questions, for example to query databases containing information on tantalum deposits and compare them with information about civil conflicts, to advise you on whether the price of mobile phone manufacture is likely to increase over the next five years.
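
In the simplest terms, the “realising” described above amounts to using what is known about the user to choose between senses of an ambiguous word. This Python sketch hard-codes that knowledge in a hypothetical rule table (`UNWANTED_SENSE_WORDS`); a real semantic search system would infer such rules from profiles and tagged content rather than have them typed in, so treat it purely as an illustration of the filtering step.

```python
# Hypothetical rule table: for an ambiguous query in a given user context,
# words that signal the *unwanted* sense of the query term.
UNWANTED_SENSE_WORDS = {
    ("orange", "telecoms"): {"fruit", "juice", "citrus"},
}


def filter_results(query: str, user_context: str, documents: list[str]) -> list[str]:
    """Keep documents that mention the query but none of the unwanted-sense words."""
    unwanted = UNWANTED_SENSE_WORDS.get((query, user_context), set())
    kept = []
    for doc in documents:
        words = set(doc.lower().split())  # naive tokenisation on whitespace
        if query in words and not (words & unwanted):
            kept.append(doc)
    return kept


docs = [
    "Orange launches new mobile tariff",
    "Orange and other citrus fruit prices",
    "Quarterly report from Orange on network coverage",
]
print(filter_results("orange", "telecoms", docs))
```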

Again, this sort of thing can sound almost magical, but is basically just compiling and comparing data from different data sets. This is familiar ground. The key difference is that for semantically tagged datasets much of the processing can be automated, so data crunching exercises that were simply too time-consuming to be worthwhile in the past become possible. The evangelists can make the semantic web project sound overwhelmingly revolutionary and utopian, especially when people start talking in sci-fi sounding phrases like “extended cognition” and “distributed intelligence”, but essentially this is the familiar territory of structuring content, adding metadata, and connecting databases. We have made the cost-benefit arguments for good quality metadata and efficient metadata management many times.
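
A minimal Python sketch of that “compiling and comparing”, assuming two independently published data sets that happen to use the same identifiers for regions. The region codes and tonnages are invented placeholders, not real figures. The point is the join: it only works because both data sets agree on what `region-a` means, which is exactly what shared semantic tagging is meant to secure.

```python
# Two hypothetical data sets from different publishers, sharing region identifiers.
deposits = [
    {"region": "region-a", "tantalum_tonnes": 120},
    {"region": "region-b", "tantalum_tonnes": 45},
    {"region": "region-c", "tantalum_tonnes": 300},
]
conflicts = [
    {"region": "region-a", "active_conflict": True},
    {"region": "region-c", "active_conflict": False},
]


def tonnage_in_conflict_zones(deposits, conflicts):
    """Join on the shared 'region' key and total deposits where conflict is active.

    Regions absent from the conflict data are treated as conflict-free.
    """
    conflict_regions = {row["region"] for row in conflicts if row["active_conflict"]}
    return sum(row["tantalum_tonnes"] for row in deposits if row["region"] in conflict_regions)


at_risk = tonnage_in_conflict_zones(deposits, conflicts)
total = sum(row["tantalum_tonnes"] for row in deposits)
print(f"{at_risk} of {total} tonnes lie in regions with active conflict")
```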

On the other extreme, the semantic web detractors claim that there is no point bothering with standardised metadata, because it is too difficult, politically and practically, to get people to co-operate and use common standards. In terms familiar to information professionals, you can’t get enough people to add enough good quality metadata to make the system work. Clay Shirky, in “Ontology is overrated”, argued that there is no point in trying to achieve commonality up front because it is just too expensive (there are no “tag police” to tidy up); you just have to let people tag however they like and then try to work out what they meant afterwards. This is a great way of harvesting cheap metadata, but it doesn’t help if you need to be sure that you are getting a sensible answer to a question. It only takes one person to mistag something for your dataset to be polluted and your complex query to generate false results. Shirky himself declares that he is talking about the web as a whole, which is fun to think about, but how many of us (apart from Google) are actually engaged in trying to sort out the entire web? Most of us just want to sort out our own little corner.

I expect the semantic web to follow the pattern of all other standardisation projects. There will always be a huge “non-semantic” web containing vast quantities of potentially useful information that can’t be accessed by semantic web systems, but that is no different from the situation today, where there are huge amounts of content that can’t be found by search engines (the “invisible web” or “deep web”), from proprietary databases to personal collections in unusual formats. No system has been able to include everything. No archive contains every jotting scrawled on a serviette, no bookshop stocks every photocopied fanzine, no telephone directory lists every phone number in existence. However, they contain enough to be useful for most people most of the time. No standard provides a perfect universal lingua franca, but common languages increase the number of people you can talk to easily. The adoption of XML is not universal, but for everyone who has “opted in” there are commercial benefits. Not everybody uses PDF files, but for many people they have saved hours of time previously spent converting and re-styling documents.

So, should I join in?

What you really need to ask is not “What is the future of the semantic web?” but “Is it worth my while joining in right now?”. How to answer that question depends on your particular context and circumstances. It is much easier to try to think about a project, product, or set of services that is relevant to you than to worry about what everyone else is doing. If you can build a product quickly and cheaply using what is available now, it doesn’t really matter whether the semantic web succeeds in its current form or gets superseded by something else later.

I have made a start by asking myself very basic questions like:

  • What sort of content/data do we have?
  • How much is there?
  • What format is it in at the moment?
  • What proportion of that would we like to share (is it all public domain, do we have some that is commercially sensitive, but some that isn’t, are there data protection or rights restrictions)?

If you have a lot of data in well-structured and open formats (e.g. XML), there is a good chance it will be fairly straightforward to link your own data sets to each other, and link your data to external data. If there are commercial and legal reasons why the data can’t be made public, it may still be worth using semantic web principles, but you might be limited to working with a small data set of your own that you can keep within a “walled garden” – whether or not this is a good idea is another story for another post.

A more creative approach is to ask questions like:

  • What content/data services are we seeking to provide?
  • Who are our key customers/consumers/clients and what could we offer them that we don’t offer now?
  • What new products or services would they like to see?
  • What other sources of information do they access (users usually have good suggestions for connections that wouldn’t occur to us)?

Some more concrete questions would be ones like:

  • What information could be presented on a map?
  • How can marketing data be connected to web usage statistics?
  • Where could we usefully add legacy content to new webpages?

It is also worth investigating what others are already providing:

  • What content/data out there is accessible? (e.g. recently released UK government data)
  • Could any of it work with our content/data?
  • Whose data would it be really interesting to have access to?
  • Who are we already working with who might be willing to share data (even if we aren’t sure yet what sort of joint products/projects we could devise)?

It’s not as scary as it seems

Don’t be put off by talk about RDF, OWL, and SPARQL, how to construct an ontology, and whether or not you need a triple store. The first questions to ask are familiar ones: who would you like to work with, what could you create if you could get your hands on their content, and what new creations might arise if you let them share yours? Once you can see the semantic web in terms of specific projects that make sense for your organisation, you can call on the technical teams to work out the details. What I have found is that technical teams are desperate to get their hands on high quality structured content (our content) and are more than happy to sort out the practicalities. As content creators and custodians, we are the ones who understand our content and how it works, so we are the ones who ought to be seizing the initiative and starting to imagine what we could create if we linked our data.
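
For anyone curious about what a “triple store” actually holds, here is a toy version in Python. It is a sketch, not a real RDF store: real linked data uses URIs and a query language such as SPARQL (libraries such as rdflib provide both), whereas here the identifiers are plain strings with made-up prefixes and `None` stands in for a query variable. It shows why shared identifiers matter: our record and a partner's record link up as soon as they refer to the same place.

```python
Triple = tuple[str, str, str]


class TinyTripleStore:
    """Toy in-memory store: every fact is a (subject, predicate, object) triple."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None) -> list[Triple]:
        """Return matching triples; None acts as a wildcard, loosely like a
        variable in a single SPARQL triple pattern."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TinyTripleStore()
store.add("ourpub:report-17", "about", "place:manchester")     # our data
store.add("partner:photo-9", "depicts", "place:manchester")    # a partner's data
store.add("partner:photo-9", "creator", "person:a-photographer")
print(store.match(obj="place:manchester"))
```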

A bit of further reading:
Linked Data.org
Linked Data is Blooming: Why You Should Care
What can Data.gov.uk do for me?

Using taxonomies to support ontologies

What is an ontology?
Ontologies are emerging from the techie background into the knowledge organisation foreground and – as usually happens – being touted as the new panacea to solve all problems from content management to curing headaches. As with any tool, there are circumstances where they work brilliantly and some where they aren’t right for the job.

Basically, an ontology is a knowledge model (like a taxonomy or a flow chart) that describes relationships between things. The main difference between ontologies and taxonomies is that taxonomies are restricted to broader and narrower relationships whereas ontologies can hold any kind of relationship you give them.

One way of thinking about this is to see taxonomies as vertical navigation and ontologies as horizontal. In practice, they usually work together. When you add cross references to a taxonomy, you are adding horizontal pathways and effectively specifying ontological rather than taxonomical relationships.
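
A small Python sketch of the vertical/horizontal distinction, using invented terms. The `broader` table is a pure taxonomy (each term points to its parent), while the `related` pairs are cross-references. Following `narrower` zooms in; following `see_also` moves sideways into a different branch, which is the ontological step.

```python
# Hypothetical taxonomy: each term points to its broader (parent) term.
broader = {
    "Jazz": "Music",
    "Blues": "Music",
    "New Orleans": "Cities",
}
# Hypothetical cross-references, stored in both directions.
related = {("Jazz", "New Orleans"), ("New Orleans", "Jazz")}


def narrower(term: str) -> list[str]:
    """Vertical navigation: zoom in from a term to its children."""
    return sorted(child for child, parent in broader.items() if parent == term)


def see_also(term: str) -> list[str]:
    """Horizontal navigation: jump sideways to a related term in another branch."""
    return sorted(other for this, other in related if this == term)


print(narrower("Music"))   # vertical
print(see_also("Jazz"))    # horizontal
```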

The flexibility in the type of relationship that can be defined is what gives ontologies their strength, but it is also their weakness. They are difficult to build well and can be time-consuming to manage, because there is no limit to the relationships you could specify and, if you are not careful, you will specify ones that keep changing. Ontologies can answer far more questions than taxonomies, but if the questions you wish to ask can be answered by a taxonomy, you may find a taxonomy simpler and easier to handle.

What are the differences between taxonomies and ontologies?
A good rule of thumb is to think of taxonomies as being about narrowing down, refining, and zooming in on precise pieces of information, and ontologies as being about broadening out, aggregating, and linking information. So, a typical combination of ontologies and taxonomies would be to use ontologies to aggregate content, with taxonomies overlaid to help people drill down through the mass of content you have pulled together.

Ontologies can also be used as links to join taxonomies together. So, if you have a taxonomy of regions, towns, and villages and a taxonomy of birds and their habitats you could use an ontological relationship of “lives in” to show which birds live in which places. By using a taxonomy to support the ontology, you don’t have to define a relationship between every village and the birds that live there, you can link the birds’ habitats to regions via the ontology and the taxonomy will do the work of including all the relevant villages under that region.
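
Here is the birds-and-villages example as a minimal Python sketch. The place names are placeholders and the habitats are simplified. Birds are linked to regions only, through a single `lives_in` relationship; the place taxonomy does the rest, so asking about a village walks up to its town and region without anyone having recorded birds against villages.

```python
# Place taxonomy (child -> parent). All names are placeholders.
place_parent = {
    "Village A": "Town X",
    "Village B": "Town X",
    "Town X": "Coastal Region",
    "Village C": "Upland Region",
}

# Ontological relationship (bird, "lives in", region), recorded only at region level.
lives_in = {
    ("avocet", "Coastal Region"),
    ("skylark", "Upland Region"),
}


def ancestors(place: str) -> list[str]:
    """Walk up the taxonomy: a village is also within its town and region."""
    chain = []
    while place in place_parent:
        place = place_parent[place]
        chain.append(place)
    return chain


def birds_in(place: str) -> list[str]:
    """Birds living in a place, inherited from any broader place that contains it."""
    places = {place, *ancestors(place)}
    return sorted(bird for bird, region in lives_in if region in places)


print(birds_in("Village A"))
print(birds_in("Village C"))
```

Adding a new village now means adding one line to `place_parent`; no bird records need touching.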

Programmers love ontologies, because they can envisage a world where all sorts of relationships between pieces of content can be described and these diverse relationships can be used to produce lots of interesting collections of content that can’t easily be brought together otherwise. However, they leave it to other people to provide the content and metadata. Specifying all those relationships can be complicated and time-consuming so it is important to work out in advance what you want to link up and why. A good place to start is to choose a focal point of the network of relationships you need. For example, there are numerous ways you could gather content about films. You could focus on the actors so you can bring together the films they have appeared in to create content collections describing their careers, or focus on genres and release dates to create histories of stylistic developments, or you could link films that are adaptations of books to copies of those books. The choices you make determine the metadata you will need.
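
To show how the focal point determines the metadata, here is a Python sketch with invented film records. Each function is a different focal point, and each depends on different fields: cast lists and years for careers, genre and year for stylistic histories, and a hypothetical `based_on` link for adaptations. If a field isn't captured, that whole collection can't be built.

```python
# Hypothetical film records; titles, actors and books are placeholders.
films = [
    {"title": "Film One", "year": 1995, "genre": "noir",
     "actors": ["Actor A", "Actor B"], "based_on": None},
    {"title": "Film Two", "year": 2001, "genre": "comedy",
     "actors": ["Actor A"], "based_on": "Novel X"},
    {"title": "Film Three", "year": 1990, "genre": "noir",
     "actors": ["Actor B"], "based_on": None},
]


def by_year(records):
    return sorted(records, key=lambda f: f["year"])


def career(actor: str) -> list[str]:
    """Actor as focal point: needs cast lists and release years."""
    return [f["title"] for f in by_year(films) if actor in f["actors"]]


def genre_history(genre: str) -> list[tuple[int, str]]:
    """Genre as focal point: needs genre and release-year metadata."""
    return [(f["year"], f["title"]) for f in by_year(films) if f["genre"] == genre]


def adaptations() -> dict[str, str]:
    """Adaptations as focal point: needs links from films to book records."""
    return {f["title"]: f["based_on"] for f in films if f["based_on"]}


print(career("Actor B"))
print(genre_history("noir"))
print(adaptations())
```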

Know your metadata
At the moment, in practice, ontologies are typically built to string together pre-existing metadata that has been collected for navigational or archival taxonomies, but this is just because that metadata already exists to be harvested. There is a danger in this approach that you end up making connections just because you can, not because they are useful to anybody. As with all metadata-based automated systems, you also need to be careful with the “garbage in garbage out” problem. If the metadata you are harvesting was created for a different purpose, you need to make sure that you do not build false assumptions about its meaning or quality into your ontology – for example, if genre metadata has been created according to the department the commissioning editor worked for, instead of describing the content of the actual programme itself. That may not have been a problem when the genre metadata was used only by audience research to gather ratings information, but does not translate properly when you want to use it in an ontology for content-defining purposes.

Feeding your ontology with accurate and clearly defined taxonomies is likely to give you better results than using whatever metadata happens to be lying about. Well-defined provenance metadata (parametadata) about your taxonomies and ontologies is becoming more and more valuable, because it lets you understand what metadata sets were built for, when they were last updated, and who manages them.
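
One way to hold that parametadata is a small record per metadata set, checked before the set is harvested. This Python sketch is hypothetical throughout: the field names, the `warnings_for` check and the one-year staleness threshold are assumptions for illustration, applied to the genre example above (codes built for audience research, reused for describing content).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    """Hypothetical parametadata record describing one metadata set."""
    name: str
    built_for: str        # the purpose the metadata was originally created for
    last_updated: date
    managed_by: str


def warnings_for(p: Provenance, intended_use: str, today: date,
                 max_age_days: int = 365) -> list[str]:
    """Flag reasons for caution before feeding this metadata into an ontology."""
    issues = []
    if p.built_for != intended_use:
        issues.append(f"{p.name} was built for {p.built_for}, not {intended_use}")
    if (today - p.last_updated).days > max_age_days:
        issues.append(f"{p.name} not updated since {p.last_updated.isoformat()}")
    return issues


genre_codes = Provenance("genre codes", "audience research",
                         date(2008, 1, 1), "audience research team")
for issue in warnings_for(genre_codes, "content description", today=date(2010, 3, 1)):
    print(issue)
```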

Why choose one when you can have both?
Ontologies are very powerful. They perform different work to taxonomies, but ontologies and taxonomies can support and enhance each other. Don’t throw away your taxonomies just because you are moving into the ontology space. Ontologies can be (they aren’t always – see Steve’s comment below) big, tricky, and complicated, so use your taxonomies to support them.

Taxonomy as an application for an open world

This post is based on the notes I made for the talk I gave at the LIKE dinner on February 25th. It covers a lot of themes I have discussed elsewhere on this blog, but I hope it will be useful as an overview.

Taxonomies have been around for ages
The list is pretty much the oldest form of recorded human writing: the Sumerian King List, for example, is about 4,000 years old. By the time of the ancient Greeks, taxonomies were familiar. We understand that something is a part of something else, and the notion of zooming in or narrowing down on the information we want is instinctive.

I am frequently frustrated by the limitations of free text search (see my earlier post Google is not perfect). The main limitation is for knowledge discovery: you can’t browse sensibly around a topic area and get any sense of an overview of the field. Following link trails can be fun, but they leave out the obscure but important, the non-commercial, and the unexpected.

The very brilliant Google staff are working on refining their algorithms all the time, but Google is a big commercial organisation and it is going to follow the money, which isn’t always where we need to be going. Other problems with free text search include disambiguation and misspellings (so you need hefty synonym control), “aboutness” (you can’t find something with free text search if it doesn’t mention the word you’ve searched for), and audio-visual retrieval. The killer for heritage archives (and for highly regulated businesses such as pharmaceutical companies and law firms) is comprehensiveness: we don’t just want something on the subject, we want to know that we have retrieved everything on a particular subject.
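
Synonym control is one of the places where a controlled vocabulary quietly props up free text search. Here is a minimal Python sketch of query expansion, assuming a hand-built set of synonym rings (`SYNONYM_RINGS` is hypothetical). Note what it cannot do: it still won't find a document that is about cars but never uses any of the listed words, which is the “aboutness” problem.

```python
# Hypothetical synonym rings; in practice these would come from a controlled vocabulary.
SYNONYM_RINGS = [
    {"car", "automobile", "motor vehicle"},
    {"colour", "color"},
]


def expand(term: str) -> set[str]:
    """Return every equivalent form of a term, or just the term if it has none."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return ring
    return {term}


def search(term: str, documents: list[str]) -> list[str]:
    """Naive substring search over all synonyms (so 'car' would also hit 'carpet')."""
    forms = expand(term)
    return [doc for doc in documents if any(form in doc.lower() for form in forms)]


docs = ["A history of the automobile", "Car maintenance tips", "Bicycle repair"]
print(search("car", docs))
```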

Another myth is that search engines don’t use classification – they do, they use all sorts of classifications, it’s just that you don’t tend to notice them, partly because they are constantly being updated in response to user behaviour, giving the illusion that they don’t really exist. What is Google doing when it serves you up its best guesses, if not classifying the possible search results and serving you the categories it calculates are closest to what you want?

I’m a big fan of Google: it’s a true modern cathedral of intellectual power and I use it all the time, but I seem to be unusual in that I don’t expect it to solve all my problems.

I am also aware that the fact we can’t look at Google’s taxonomic processes arguably makes Google more political, more manipulable, and more Big Brother-ish than traditional open library classifications. We may not entirely agree with the library classifications or with the viewpoints of their creators, but at least we know what those viewpoints are!

There was a lot of fuss about the rise of folksonomies and free tagging as being able to supersede traditional information management, and in an information-overloaded world we need all the help we can get. The trouble is that folksonomies expand, coalesce, and collapse into taxonomies in the end. If they are to be effective, rather than just cheap, they need to do this, and they either become self-policing or very frustrating. They are a great way of gathering information, but then you need to do something with it.

Folksonomies, just as much as taxonomies, represent a process of understanding what everyone else is talking about and negotiating some common ground. It may not be easy, but it is a necessary and indispensable part of human communication, not something we can simply outsource or computerise: algorithms just won’t do that for us. Once everything has been tagged with every term associated with every viewpoint, it might as well not have been tagged at all. Folksonomies, just as much as taxonomies, collapse into giving a single viewpoint; it’s just that it is a viewpoint produced by some obscure algorithmic calculation of popularity.

So, despite free text search and folksonomies, structured classification remains a very powerful and necessary part of your information strategy.

It’s an open world
Any information system, whatever retrieval methods it offers, has to meet the needs of its users. Current users can be approached, surveyed, and talked to, but how do you meet the needs of future users? The business environment is not a closed, knowable, constrained domain, but an “open world” where change is the only certainty. (Open world is an expression from logic. It presumes that you can never have complete knowledge of truth or falsity. It is the opposite of the closed world, which works for constrained domains or tasks where rules can be applied, e.g. rules within a database.)
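
The difference between the two assumptions is easiest to see when you ask about something that hasn't been recorded. In this Python sketch (the facts and names are invented), the closed-world function treats a missing fact as false, as a database query would, while the open-world function answers “unknown” unless it has been told explicitly that the fact is false.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical facts: what we know to be true, and what we know to be false.
known_true = {("Smith", "works_in", "Sales")}
known_false = {("Jones", "works_in", "Sales")}


def closed_world(fact) -> Answer:
    """Closed world: anything not recorded as true is assumed false."""
    return Answer.TRUE if fact in known_true else Answer.FALSE


def open_world(fact) -> Answer:
    """Open world: the absence of a fact means we simply don't know."""
    if fact in known_true:
        return Answer.TRUE
    if fact in known_false:
        return Answer.FALSE
    return Answer.UNKNOWN


unrecorded = ("Patel", "works_in", "Sales")
print(closed_world(unrecorded), open_world(unrecorded))
```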

So, how do you find the balance between stability, so your knowledge workers can learn and build on experience over time, while being able to react rapidly to changes?

Once upon a time, not much happened
The early library scientists such as Cutter, Kelley, Ranganathan, and Bliss, argued about which classification methods were the best, but they essentially presumed that it was possible to devise a system that maximised “user friendliness” and that once established, it would remain usable well into the future. By and large, that turned out to be the case, as it took many years for their assumptions about users to be seriously challenged.

Physical constraints tended to dictate the amount of updating that a system could handle. The time and labour required to re-mark books and update a card catalogue meant that it was worth making a huge effort to simply select or devise a classification and stick to it. It was easier to train staff to cope with the clunky technology of the time than adapt the technology to suit users. No doubt in the future, people will say exactly the same things about the clunky Internet and how awful it must have been to have to use keyboards to enter information.

So, it was sensible to plan your information project as one big chunk of upfront effort that would then be left largely alone. It is much easier to build systems based on the assumption that you can know everything in advance – you can have a simple linear project plan and fixed costs. However, it is very rare for this assumption to hold for very long, and the bigger the project, the messier it all gets.

Change now, change more
Everything is changing far more rapidly than it used to – from the development of new technologies to the rapid spread of ideas promoted by the emergence of social media and an “always on” culture. It’s harder than ever to stay cutting edge!

We all like to speak our own language and use our own names for things, and specialists and niche workers as well as fashionistas and trendsetters expect to be able to describe and discuss information in ways that make sense to them. The open philosophy of the Web 2.0 world means that they increasingly take this to be their right, and this is where folksonomic approaches can really help us.

What you need to do is to create a system that can include different pace layers, so that you get the benefits of a stable taxonomy together with the rapid reactiveness of folksonomy and quick and easy free text search. You can think of your taxonomy as the centre of a coral reef, but coral is alive and grows following the currents and the behaviour of all the crazy fish and other organisms that dart about around it. It’s hard to pin down the crazy fish and other creatures, but they feed the central coral and keep it strong. In practice, this means incorporating multiple taxonomies and folksonomies and mapping them to one another, so that everyone can use the taxonomy and the terminology that they prefer. Taxonomy mapping tools require human training and human supervision, but they can lighten the load of the labour-intensive process of mapping one taxonomy to another.
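
A minimal Python sketch of mapping several vocabularies onto one central taxonomy, with invented terms and concept IDs. Each community keeps its own terms; the mapping tables (which still need human supervision, even when tools propose them) translate those terms to central concepts, and each central concept becomes the point where the different vocabularies meet.

```python
# Hypothetical central taxonomy and per-community mapping tables.
central_concepts = {"C1": "Climate change", "C2": "Renewable energy"}
mappings = {
    "science_dept": {"Global warming": "C1", "Renewables": "C2"},
    "public_tags": {"climatechange": "C1", "solar": "C2", "wind power": "C2"},
}


def to_central(vocabulary: str, term: str):
    """Translate a term from any mapped vocabulary into a central concept ID, or None."""
    return mappings.get(vocabulary, {}).get(term)


def equivalents(concept_id: str) -> list[tuple[str, str]]:
    """Every (vocabulary, term) pair that maps to one central concept."""
    return sorted(
        (vocab, term)
        for vocab, table in mappings.items()
        for term, cid in table.items()
        if cid == concept_id
    )


cid = to_central("public_tags", "solar")
print(cid, central_concepts[cid])
print(equivalents(cid))
```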

This means that taxonomy strategy does not have to be fixed at a single point in time; instead, taxonomy creation becomes dynamic and organic. Folksonomies and new taxonomies can be harvested to feed updates back into the central taxonomy, breaking the traditional cycle of expensive major revision, gradual decline to the point of collapse, and then another expensive major revision…
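One simple way to picture harvesting a folksonomy is to count the tags people actually use and flag the frequent ones that the central taxonomy does not yet cover. The Python sketch below assumes tags arrive as one list per tagged item, treats tags case-insensitively, and uses an arbitrary `min_count` threshold. The function name and threshold are hypothetical, and the output is a list of suggestions for an editor, not automatic updates.

```python
from collections import Counter


def harvest_candidates(tagged_items, known_terms, min_count=3):
    """Suggest frequent folksonomy tags that are not yet in the taxonomy.

    tagged_items: iterable of tag lists, one list per tagged item.
    known_terms: terms already in (or mapped to) the central taxonomy.
    Returns (tag, count) pairs, most frequent first, for editorial review.
    """
    known = {t.casefold() for t in known_terms}
    counts = Counter(
        tag.strip().casefold()
        for tags in tagged_items
        for tag in tags
        if tag.strip()
    )
    return [
        (tag, n)
        for tag, n in counts.most_common()
        if n >= min_count and tag not in known
    ]
```

With items tagged `["upcycling", "Furniture"]`, `["upcycling", "vintage"]`, `["Upcycling", "vintage"]` and `["vintage"]`, and "vintage" already in the taxonomy, the only suggestion is `("upcycling", 3)`. Keeping a human in the loop at this step reflects the editorial oversight the article recommends.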

There is a convergence between semi-automated mapping (we’ll be needing human editorial oversight for some time yet) and the semantic web project. This is the realisation of the “many perspectives, one repository” approach, which should get round many of the problems of the subjective/objective divide. If you can’t agree on which viewpoint to adopt, why not have them all? Any arguments then become part of the mapping process. This is a bit of a fudge, but within organisations it has the major benefit of removing much of the politicking that surrounds information and knowledge management. It all becomes “something technical” to do with mapping that nobody other than information professionals is very interested in. Despite this, there is huge cultural potential in opening up public repositories and making them interoperate. The Europeana project is a good example.

Modern users demand that content is presented to them in a way that they feel comfortable with. The average search is a couple of words typed into Google, but users are willing to browse if they feel that they are closing in on what they want. Increasing openness and usage means providing rich search and navigation experiences in a user-friendly way. If your repository is to be promoted to a wider audience in future, the classification that will enable a rich navigation experience needs to be put in place now.

Your users should be able to wander through the archive collections horizontally and vertically, to leave and delve into other collections, to arrive at and move through the archive using their own organisation’s taxonomy, and to tag wherever they want, using whatever terms they like. The link points in the mappings provide crossroads in the navigation system for the users.

In this way the taxonomies are leveraged to become “hypertextual taxonomies” that provide rich links both horizontally and vertically.
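The "vertical" and "horizontal" links described above can be sketched as a small lookup. In this hypothetical Python example, concepts are `(taxonomy, term)` pairs: the broader/narrower hierarchy within each taxonomy provides the vertical links, and the mapping pairs (the link points) provide the horizontal crossroads into another taxonomy. The taxonomies, terms and data structures are invented for illustration.

```python
# Hypothetical concepts, keyed as (taxonomy, term). BROADER gives each
# concept's parent (vertical links); MAPPED lists equivalences between
# taxonomies (horizontal links, i.e. the crossroads).
BROADER = {
    ("archive", "Letters"): ("archive", "Correspondence"),
    ("archive", "Telegrams"): ("archive", "Correspondence"),
    ("museum", "Postal history"): ("museum", "Communication"),
}
MAPPED = [
    (("archive", "Correspondence"), ("museum", "Postal history")),
]


def neighbours(concept):
    """Return the navigation options available from one concept."""
    up = BROADER.get(concept)  # None at the top of a hierarchy
    down = sorted(c for c, parent in BROADER.items() if parent == concept)
    across = sorted(
        b if a == concept else a  # the other end of each mapping
        for a, b in MAPPED
        if concept in (a, b)
    )
    return {"broader": up, "narrower": down, "mapped": across}
```

A user browsing the archive's "Correspondence" heading could step down to "Letters" or "Telegrams", or cross over to the museum's "Postal history" and carry on browsing within the museum's own hierarchy from there.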

Taxonomy as a spine
A core taxonomy that acts as an indexing language is the central spine to which other taxonomies can be attached and, crucially, detached as necessary. Because the bulk of the mapping process can be automated, incorporating a new taxonomic view becomes a task of checking the machine output for errors. Automated mapping processes can provide a statistical estimate of how likely each mapping is to be correct, so humans only need to examine the mappings with a low likelihood of being correct.
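The spine idea and the confidence-based checking can be sketched together. This minimal Python example assumes a mapping tool that reports a confidence score between 0 and 1 for each proposed mapping; the `Spine` and `Mapping` classes and the 0.8 threshold are hypothetical choices for illustration, not the interface of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    source_term: str
    core_concept: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical mapping tool


@dataclass
class Spine:
    """A core taxonomy to which other taxonomies' mappings are attached."""
    attached: dict = field(default_factory=dict)  # taxonomy name -> [Mapping]

    def attach(self, name, mappings):
        self.attached[name] = list(mappings)

    def detach(self, name):
        # Removing one view leaves the core taxonomy and other views untouched.
        self.attached.pop(name, None)

    def triage(self, name, threshold=0.8):
        """Split one taxonomy's mappings into auto-accepted and needs-review."""
        accepted, review = [], []
        for m in self.attached.get(name, []):
            (accepted if m.confidence >= threshold else review).append(m)
        return accepted, review
```

Attaching a partner's taxonomy with `Mapping("Autos", "C1", 0.95)` and `Mapping("Velos", "C2", 0.55)` and calling `triage("partner")` accepts "Autos" and sends only "Velos" to a human checker. Where to set the threshold is an editorial decision: raising it sends more mappings for review, trading effort for accuracy.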

Mapping software has the same problems as autoclassification software, so a mapping methodology, including workflow and approval processes, has to be defined and supported. The more important it is to get a fine-grained mapping, the more effort you will need to make, but a broad-level mapping is easier to achieve.

Conclusion
If you start thinking of the taxonomy as an organic system in its own right, more like an open application that you can interact with, bolting on and removing elements as you choose, then you do not need to account for every user viewpoint when creating the taxonomy, and omitting a viewpoint at one stage does not prevent that collection from being incorporated later. Conversely, the mapping process allows “outsiders” to view your assets through their own taxonomies.

Our taxonomies represent huge edifices of intellectual effort. However, we can’t preserve them in aspic, hiding them away in locked silos or, like grand stately homes, refusing to open their doors to the public. If we want them to thrive and grow, we need to open them up to the light so that they can expand, change, interact with other taxonomies and take in ideas from the outside.

Once you open up your taxonomy, share it and map it to other taxonomies, it becomes stronger. Rather than an isolated knowledge system that seems like a drain on resources, it becomes an embedded part of the information infrastructure, powering interactions between multiple systems. It ceases to be a part of document management and becomes the way that the organisation interacts with knowledge globally. This means that the taxonomy gains not only strength from its associations but also prestige.
So our taxonomies can remain our friends for a little while longer. We won’t be hand cataloguing as we did in the past, because all the wonders of Google and the automated world can be harnessed to help us.