I went on Tuesday to Online 2008 at Olympia. It seemed quieter than last year, so I took advantage of some of the free presentations. I listened to Laurent Le Meur talking about Agence France-Presse (AFP)’s efforts to create a multimedia news database (Imageforum), Graham Beastall of Soutron Ltd on Taxonomy Development using Sharepoint, Scott Gavin on Knowledge Plaza, and Judith Lewis of i-level on the Dark Side of Social Media.

Le Meur described the need to create a common metadata language to bring together journalists and photographers, who tended to think about subjects very differently. AFP use autocategorisation software (supplied by Temis) but have invested heavily in training it to work well, which means a lot of human input. As a base, they imported a number of existing vocabularies from such sources as GeoNames, EuroVoc, and the IPTC’s taxonomy of news categories, selecting 300 of the IPTC’s 1,300 categories to improve software performance. They currently extract and autocategorise people, organisations, locations, points of interest, products, and brands. They would really like to be able to pick out news events, but language usage is too broad and diffuse for them to have managed this with any success.

Their documentalists and indexers were initially reluctant to work with the new system as it meant a dramatic reduction in the complexity of their indexing work. Previously they could use some 3,000 terms but this was reduced in order to be compatible with the entity extraction software.

For images, they found the key facets were expressions (e.g. smiling), action, aspect (e.g. profile, close-up) and style (e.g. backlit). They are happy with the Antidot faceted navigation system that allows them to choose index fields, but have not been able to incorporate image rights, as they are too complex and vary according to things like location of the user.

Beastall said that users of Sharepoint are not fully exploiting it, with only 4% using it as a tool for search, while 43% use it for collaborative working. He warned that you need to impose discipline in categorisation right from the start of an implementation, as once information has “grown wild” it is far harder to tidy it up retrospectively. He also pointed out that people tend to think they know where to find things, but then someone else reorganises the site, and key information that wasn’t well indexed can be lost.

There is also value in segmenting your information so that you have public areas separate from the main enterprise content management system. Such public areas can then be treated differently in terms of things like security and social working. An interesting take on the taxo/folkso synergy is to let people build their own sites, but to have a central team looking out for good candidates for inclusion in centralised systems, and to bring personal sites in when they are useful, amalgamating to remove duplicates, etc. He encouraged the use of folksonomies as a “fast track” to sit beside the core vocabularies and feed into them, as folksonomies are particularly useful for new and fast-moving areas, but not so good for long term management and control. He cited the websites contentandcode.com – a Microsoft solutions provider, The Information Architecture Institute, a useful article on taxonomies, thesauruses, etc., by metamodel, and the Sharepoint blog vitalskill.com.

Knowledge Plaza is a CMS [Scott Gavin has pointed out that it is actually more a Knowledge Management/Enterprise Search tool – see his comment] that allows a dual taxo/folkso approach, with three options: a totally open folkso system; a managed folkso system, where users can use any tag they like but an administrator effectively builds a thesaurus in the background to link synonyms and prompt future users with preferred terms; and a totally controlled vocabulary.
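
To make the managed option a little more concrete, here is a minimal Python sketch of how such a scheme might work. It is not Knowledge Plaza's implementation, which I have not seen; the class, the method names and the example tags are all hypothetical and only illustrate the idea of free tagging with an administrator-maintained synonym map behind it.

```python
class ManagedFolksonomy:
    """Hypothetical sketch: free tagging plus an admin-built synonym map."""

    def __init__(self):
        self.preferred = {}  # normalised synonym -> preferred term
        self.tags = {}       # item id -> set of tags actually stored

    @staticmethod
    def _norm(tag):
        # Lower-case and collapse whitespace so near-identical tags match.
        return " ".join(tag.lower().split())

    def link_synonym(self, synonym, preferred_term):
        """Administrator action: record that a tag maps to a preferred term."""
        self.preferred[self._norm(synonym)] = preferred_term

    def suggest(self, tag):
        """Return the preferred term to prompt the user with, or None."""
        return self.preferred.get(self._norm(tag))

    def add_tag(self, item_id, tag, accept_suggestion=True):
        """Users may enter any tag; a known synonym triggers a suggestion."""
        suggestion = self.suggest(tag)
        stored = suggestion if (suggestion and accept_suggestion) else tag
        self.tags.setdefault(item_id, set()).add(stored)
        return stored


folk = ManagedFolksonomy()
folk.link_synonym("KM", "knowledge management")
folk.link_synonym("knowledge mgmt", "knowledge management")

first = folk.add_tag("doc-1", "Knowledge  Mgmt")  # matches a linked synonym
second = folk.add_tag("doc-1", "folksonomy")      # unknown tag, stored as typed
```

In this sketch the open system would correspond to never calling `link_synonym`, and a totally controlled vocabulary would correspond to rejecting any tag that `suggest` does not recognise.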

Social search seemed to be a buzzword this year, and Gavin talked about a function that allowed you to “use people as search engines”, but as far as I could tell, this actually meant the system simply recorded everybody’s search results, websites visited, etc., and then allowed other people to run a search on specific people’s collections of information. [Actually, the system runs live Google searches on the websites particular people have looked at, as well as emails, documents, etc. associated with them or that are tagged a particular way – see Scott’s comment for more details].

Lewis’s talk wasn’t a cultural critique of the effect of social media on humanity, but a useful practical guide to how to avoid breaking the law and causing damage to brand reputation by using social media badly. Essentially – don’t pose as a genuine customer when you are working for a company and don’t disguise advertisements by making them indistinguishable from articles (which can be a bit of a grey area). She also suggested that it was better to have one company blog and get lots of people to contribute to it to keep it moving, than have lots of company blogs that are hardly ever updated.