Reductiones ad absurdum


In Beneath the Metadata: Some Philosophical Problems with Folksonomy, Elaine Peterson argues that because folksonomy is underpinned by relativism, it will always be flawed as an information retrieval method. On this view, folksonomy will collapse because everything ends up tagged with every conceivable tag, so the tags cancel each other out and you might as well not have bothered tagging anything.

On the other hand, David Weinberger, in Why tagging matters?, claims that taxonomy will fail because taxonomists want to invent a single taxonomy to classify everything in the entire world and insist, in totalitarian style, that this one true taxonomy is the only way to organise knowledge.

I have no idea who these mysterious megalomaniac taxonomists are. Most of the taxonomists I am aware of only advocate using one single taxonomy for fairly well defined and limited situations (e.g. a single company, or perhaps a department in a big corporation) and are quite happy with the notion that you need lots of different taxonomies suited to context, which makes them much more like Peterson’s relativists.

Conversely, I am fairly sure you can’t actually create an infinite folksonomy with infinite tags for all possible viewpoints of all possible documents (let alone smaller knowledge units). When your taggers are a specific community with a shared purpose, they probably will hit upon a shared vocabulary that is “universal” within the boundaries of that community and so the folksonomy will be meaningful.

I think that these reductio ad absurdum arguments are interesting because they highlight how both folksonomies and taxonomies are inherently flexible and even somewhat unstable, especially when they become large and very widely used. Intervention and management of both will help improve and maintain their usefulness. No matter whether you choose one or the other or a combination of the two, you still need knowledge workers to keep them in good working order!

Is knowledge stuff or love?


Stuff or love? How metaphors direct our efforts to manage knowledge in organisations by Daniel G. Andriessen, in the Journal of Knowledge Management Research & Practice, is a charming paper proposing that the metaphors we use to describe knowledge affect the way that it is managed. Managers often talk about knowledge as a commodity or resource to be exploited – it has a finite value, can be traded, conserved, wasted, and presumably can run out. Having discussed various metaphors of knowledge as a resource, Andriessen asked people to talk about knowledge thinking of it as love. He says: “The topic of conversations changed completely. Suddenly their conversations were about relationships within the organisation, trust, passion in work, the gap between their tasks and their personal aspirations, etc.”

He points out that “knowledge as a resource” is a very Western viewpoint, whereas knowledge as love is more akin to Eastern philosophies. Knowledge as love can be shared without running out, but it is much harder to direct or control. It is not difficult to guess which metaphor managers tend to prefer!

Andriessen points out that the metaphors we use tend to remain hidden and unquestioned in our subconscious. He urges us to think about the metaphors underlying our discussions and research on knowledge management and ask “What would have been the outcome of the research if we see knowledge not as stuff but as love?”

Social Media vs. Knowledge Management


I was drawn to Venkat’s post on the Enterprise 2.0 blog via What Ralph Knows. Venkat suggests that Knowledge Management and Social Media are in conflict, with younger people preferring an anarchic, organic approach to building knowledge repositories, while older people prefer highly planned structures, and Generation X (of which I am one) remain neutral. I’m always a bit suspicious of generational divisions, as there are plenty of older innovators and young reactionaries, but I must admit I take a “best of both worlds” approach – so I conform to my generational stereotype!

I think the “battle” mirrors the taxonomy/folksonomy debate and experts I’ve asked about this suggest that the best way is to find a synergy. It all depends on the context, what is being organised, and what is needed. So social media are obviously great for certain things, but I’d hate to trust the company’s financial records to a bunch of accountants who said – “oh we don’t bother sorting and storing our files – if we need to prove your tax payments we’ll just stick a post on a forum and see if anyone still has the figures….”


Meta Knowledge Mash-up 2.0 (2008)


This joint ISKO UK/KIDMM (Knowledge, Information, Data and Metadata Management) workshop, hosted by the British Computer Society on October 9th, boasted an impressive menu of speakers and delegates.

Alan Pollard (BCS president-elect) welcomed us and Conrad Taylor (KIDMM co-ordinator and organiser of the event) provided a very handy literature review and reading list and summarised concepts of knowledge and its management. He encouraged us to “turn data dumps into real knowledge stores”, referring to Etienne Wenger’s work on Communities of Practice and Karl Popper’s notions of “three worlds” (physical, internal/mental, and socio-cultural). The reification of knowledge was another theme and I was struck by the proposal that language is a reification of knowledge that enables participation.

As the spirit of the day was collaboration, we were seated “cabaret style” in small groups and encouraged to talk to each other and share ideas and formulate questions at the end of each presentation. This was a great way of meeting other delegates and giving the day an informal conversational feel.

Marilyn Leask (Brunel University) described her experiences building large-scale and international knowledge-sharing communities in education (TeacherNet, European SchoolNet, and I&DeA). She warned that projects that begin as community-based knowledge-sharing initiatives can be co-opted by the authorities and become accountability measurement instruments or mechanisms for disseminating information, rather than spaces for true collaboration. It is therefore important to know what you are trying to achieve with your community and who will ultimately be responsible. It may not be appropriate to have community areas on a site that is ultimately a government tool – there are no wikis on TeacherNet – as private professional discussions cannot take place freely when there is government awareness, if not systematic monitoring, of what is being said.

Where funding is required, patience is necessary as projects can take many years to get going. Often it helps to “seed” an idea, leave it to germinate, and wait until enough people start asking why the idea has not already been acted on before funding can be obtained. It takes a critical mass of people accepting the idea to give it momentum.

Another good tip is to find “champions” of the idea and allow them to provide proof of concept. There will always be “late adopters” who are unwilling to participate and there is no point in trying to convince them in the early stages. It also helps to have knowledge-sharing and participation in community sites built into people’s job descriptions and time allowed for them to learn and join in within their normal working day, so that participation does not become yet another additional burden.

It is possible to provide return on investment (ROI) and value for money figures – for example by using costs of purchasing documents from the British Library (about £30 each) or costs of re-writing existing policy documents unnecessarily (often in the region of £5,000-£10,000).

Lindsay Rees-Jones and Ed Mitchell from CILIP talked about the discussion forums and blog spaces they had created for CILIP members. They pointed out that members of the community are contributing a valuable resource – their time – and so need to feel they receive some benefits in return. It is important to ensure social as well as technical cross-sections to make communities work, and for administrators, establishing behaviours, protocols, and processes is more important than proposing topics for discussion.

The two presenters agreed that participation needs to be part of job roles, not a voluntary extra, and that buy-in will start slowly, with just a few early adopters, but that others will follow. They also felt that quality of content is more important than the number of contributors.

They found it useful to have some “walled gardens” that were kept completely private and some public and semi-public areas for different groups and different purposes. They also pointed out a difference between networks and communities – networks radiate out from a person, whereas communities occur collectively and separately from any individual.

Jan Wyllie (Open Intelligence Ltd) talked about a knowledge community formed in the mid-1990s to produce the “Tomorrow’s Company Inquiry” report for the Royal Society for the encouragement of Arts, Manufactures and Commerce (RSA). Content was gathered through paper-based questionnaires and letters (mainly sent by fax). Content analysis was used along with a pre-existing taxonomy to identify shared meanings and structure the content. However, a purpose-built taxonomy might have been preferable. Classification is a powerful tool in knowledge discovery, as well as organisation, allowing key questions to be brought to the surface of complex and large collections of information.

Using social networks to organise knowledge is a powerful way of rating discussion – by monitoring who is discussing a subject as well as what is being discussed, a far more detailed picture of trends in thinking and emergent topics can be produced.

Christopher Dean (Airbus SAS) described building knowledge-sharing communities within industry. He classified communities into three types – organic, declared, and manufactured. Organic communities typically grow in a stepped manner – a “punctuated equilibrium” – with bursts of growth followed by plateaus. For communities to succeed, membership needs to be an attractive proposition, with perceived benefits (such as socialising, co-learning, co-production); affordable (in terms of costs and risks – which may be in time as well as money); and voluntary (easy to join and leave). Communities are more successful when they have a clear purpose and attract “birds of a feather” rather than arise out of a process of “herding cats”. Empires fall when their citizens stop believing in them and communities tend to wither when they cease to have a clear purpose.

Communities can be disappointed when an imbalance arises between the state of knowledge within the community and within the wider organisation, as specialist communities often find that things they take for granted are not understood by outsiders.

Dialogic design is a hot new topic concerned with teaching people to participate effectively.

Sabine McNeill (3D Metrics) talked about creating communities to raise political issues and lobby governments. Focusing on a proposal for “Green Credit for Green Growth” and the “Forum for Stable Currencies”, she described the process of moving from data through knowledge to wisdom. She described online communities as being forums for “software-aided thinking”. She also talked about the community-building value in schemes like LETS that sidestep the problems associated with the wider economic system and environmental destruction.

In our groups we then took part in an entertaining card-sorting exercise to identify key features and requirements for a new knowledge-sharing software tool and website to be built for KIDMM by Susan Payne (De Montfort University). I thoroughly enjoyed this, although unsurprisingly – given that I was amongst expert taxonomists – we tended to focus on classification issues! Susan presented her plans for the Know*Ware software and called for participation and collaboration in its creation.

To round off, there was a panel session which focused on how the Know*Ware tool would be built and what its aims should be.

NKOS slides


Many thanks to Traugott Koch for these links:

NKOS Workshop at ECDL in Aarhus.

NKOS Special Session at DC 2008 in Berlin, all in one single pdf file.

The Joint NKOS/CENDI Workshop “New Dimensions in Knowledge Organization Systems”, in Washington, DC, USA on September 11, 2008. “Thanks to the contributors, programme committees, chairs and the large and very active audiences. We invite your active participation 2009 as well. Watch the website.”


The Popularity Contest: Taxonomy Development in the Petabyte Era


The Popularity Contest: Taxonomy Development in the Petabyte Era « Not Otherwise Categorized…. I really enjoyed this excellent analysis of a familiar argument (let’s just Google for information), especially the emphasis on the difficulty of answering the question “why?”. When and where are quite easy ones, but why is really tricky! I also liked the straightforward assertion that bias is not just inevitable in taxonomy, it is what makes a good taxonomy.

Web archiving


I went to an excellent Anglo-French scientific discussion seminar on web archiving on Friday at the Institut Français Cultural Centre in London. The speakers were Gildas Illien of the Bibliothèque Nationale de France (BnF) (Paris) and Dr Stephen Bury of the British Library (BL).

Gildas Illien described the web archiving project being undertaken by the BnF, which uses the Heritrix open source crawler to harvest the web from “seeds” (URLs). The crawler was charmingly illustrated with a picture of a “robot” (as people like to be able to see the “robot”), but the “robot” is a bit stupid – he sometimes misses things out and sometimes falls into traps and collects the same thing over and over again. The “robot” generates a lot of code for the librarians to assess. Problems include the short lifespan of websites – one figure puts this at only 44 days (although whether that refers to sites disappearing altogether or just changing through updates wasn’t clear) – and the “twilight zone” of what is public and what is private. In France the Legal Deposit Act was extended in 2006 to cover the web, so the BnF can collect any French website it wants to without having to ask permission. However, librarians have to choose whether to try to collect everything or just sites that are noteworthy in some way. It is also hard to guess who the future users of the archive will be and what sort of site they will want to access.
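The “robot’s” habit of falling into traps and collecting the same thing repeatedly can be illustrated with a minimal sketch. This is not how Heritrix itself works; it is a toy breadth-first harvest in Python over a made-up in-memory link table (`TOY_WEB`, `toy_links`, and `crawl` are all hypothetical names), showing two common safeguards: remembering which URLs have already been seen, and capping how deep the crawl may go.

```python
from collections import deque

def crawl(seeds, get_links, max_depth=3):
    """Breadth-first harvest from seed URLs, visiting each URL at most once."""
    seeds = list(dict.fromkeys(seeds))   # drop duplicate seeds, keep order
    seen = set(seeds)                    # URLs already queued or harvested
    queue = deque((url, 0) for url in seeds)
    harvested = []
    while queue:
        url, depth = queue.popleft()
        harvested.append(url)
        if depth == max_depth:
            continue                     # depth cap: do not follow links further
        for link in get_links(url):
            if link not in seen:         # skip anything collected before
                seen.add(link)
                queue.append((link, depth + 1))
    return harvested

# Hypothetical three-page site whose pages link back to each other.
TOY_WEB = {
    "a.example/": ["a.example/news", "a.example/about"],
    "a.example/news": ["a.example/", "a.example/about"],
    "a.example/about": [],
}

def toy_links(url):
    """Return the outgoing links of a page in the toy site."""
    return TOY_WEB.get(url, [])

print(crawl(["a.example/"], toy_links))
```

Here each of the three pages is harvested once despite the loop back to the home page. A “trap” such as an endless calendar of “next” links is cut off by `max_depth` rather than by the seen-set, which is why the sketch uses both.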

So far some 130 terabytes of data have been collected, and some 12 billion files stored.

Harvesting is done in three stages – bulk harvesting once a year; focused crawls of specific sites; and collections of e-deposits (such as e-books) directly from publishers. Some sites are harvested only occasionally – such as the website of the Festival de Cannes, which only needs to be collected once per year – while newspaper sites are collected more frequently.

The archive can be searched by URL or by text, although the text search is rudimentary at present.

Classification is another challenge, as traditional library classifications are not appropriate for much web content. For example, election campaign websites were classified by what the politicians were saying about themselves and by what the public were saying about them, as this was thought to be a useful distinction.

However, the problems of how to provide full and useful access to the collection and how to catalogue it properly remain unresolved.

The process was an interesting merging of traditional library skills and software engineering skills, with some stages clearly being either one or the other but a number of stages being “midway” requiring a cross-skilled approach.

Dr Stephen Bury explained that the BL is somewhat of a latecomer to web archiving, with the BnF, the Internet Archive, and the national libraries of Sweden and Australia all having more advanced web archiving programmes. Partly this is due to the state of UK legal deposit law, which has not yet been extended to include websites.

Just as there are many books about books and books about libraries, so there are many websites about the web. It is a very self-referential medium. However, there is a paradox in the BL’s current programme. Because the BL has to seek permission to collect each and every site, it may collect sites that it cannot then provide access to at all, and it cannot provide any access to sites except to readers in its reading rooms. To be able to collect the web but then not to be able to serve it back up to people through the web seems very strange.

Another issue of preservation is that the appearance of websites is browser-dependent, so a site may not look the same to people using different technology.

It is important that online information is preserved, as websites are now considered to be authentic sources of information – cited in PhDs for example – and so some way of verifying that they existed and what content they contained is needed.

Reports have been produced by JISC and the Wellcome Trust: Collecting and Preserving the World Wide Web (2002) and Legal issues relating to the archiving of Internet resources in the UK, EU, USA and Australia (2002) by Andrew Charlesworth.

The BL undertook a Domain UK project to establish what the scope of a web archiving project might be. The BL used Australian PANDAS software. The UK Web Archiving Consortium (UKWAC) was set up in 2003 but the need to obtain permissions has seriously limited its scope, as most website owners simply do not respond to permissions requests. Very few actively refuse permission; presumably most ignore the request as spam or simply fail to reply.

The data has now been migrated from the PANDAS format to WARC and an access tool is in development. There are some 6 million UK websites, growing at a rate of 16% per year, and they are also growing in size (on average they are about 25 MB, increasing at a rate of 5% per year).
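To get a feel for what those growth rates imply, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that both rates stay constant and compound annually, that a single snapshot of every site is kept, and that 1 TB is 1,000,000 MB. The function name `project_archive_size` is made up for this example.

```python
def project_archive_size(sites, avg_mb, site_growth, size_growth, years):
    """Return (site count, average size in MB, total size in TB) after `years`.

    Assumes constant annual compound growth, which is a simplification.
    """
    projected_sites = sites * (1 + site_growth) ** years
    projected_avg_mb = avg_mb * (1 + size_growth) ** years
    total_tb = projected_sites * projected_avg_mb / 1_000_000  # 1 TB = 1,000,000 MB
    return projected_sites, projected_avg_mb, total_tb

# Figures quoted in the talk: 6 million sites (+16% a year), 25 MB each (+5% a year).
for year in (0, 1, 5, 10):
    n, mb, tb = project_archive_size(6_000_000, 25, 0.16, 0.05, year)
    print(f"year {year}: {n:,.0f} sites, {mb:.1f} MB each, about {tb:,.0f} TB")
```

On these assumptions a single snapshot is about 150 TB today and roughly 183 TB a year later, because the two growth rates multiply. Repeated harvests of the same sites would push the real storage requirement higher.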

Decisions have to be made on frequency of collection, depth of collection, and quality. There are other peripheral legal issues, such as sites that fall under terrorism-related legislation. At present the BL can collect these sites but not provide access to them.

Resource discovery remains a major challenge, including how to combine cataloguing and search engine technology. So far, a thematic approach to organisation has been taken. Scalability is also a big issue. What works for a few thousand sites will not necessarily work for a few million.

This means that the nature of the “collecting institution” is changing. It is much harder to decide if a site is in or out of scope. A site may have parts that are clearly in scope and parts that clearly aren’t or it may change through time, sometimes being in scope and sometimes not.

The Digital Lives Project in association with UCL and the University of Bristol is looking at how the web is becoming an everyday part of our social and personal lifestyles.

The talks were followed by a question and answer session. I asked for more detail about the “twilight zone” of public and private websites. Both speakers agreed that there is a great need for more education on digital awareness, so that young people appreciate that putting things up on the Internet really is a form of publishing and their blogs and comments in public forums are not just private “chats” with friends. However, in France there has been little resistance to such personal material being collected. Most people are proud to have their sites considered to be part of the national heritage. A lot of outreach work has been done by the BnF to explain the aims of the archive and discuss any concerns. Gildas Illien also pointed out that people do not necessarily have “the right to be forgotten” and that this is in fact not new. It has happened in the past that people have asked for books and other information to be removed from libraries, perhaps because they have changed their political viewpoint, but a library would not simply remove a book from its shelves because the author decided that they had changed their mind about something in it.

There is a recent interview with Gildas Illien (in French) on YouTube called L’archivage d’Internet, un défi pour les bibliothécaires.

Information Architecture for Audio


Information Architecture for Audio: Doing It Right – Boxes and Arrows: The design behind the design. Another gem from the ever reliable Boxes and Arrows. The article highlights differences in the way that users interact with text and audio and sets out techniques for improving delivery of audio. For example, starting off a piece of audio by stating how long it will last and summarising its content is helpful, as you can’t scan audio in the way that you can scan text.