Classification and Ontology – UDCC Seminar 2011

I thoroughly enjoyed the third biennial International UDC Consortium seminar at the National Library of the Netherlands, The Hague, last Monday and Tuesday. The UDC conference website includes the full programme and slides and the proceedings have been published by Ergon Verlag.

This is the first in a series of posts covering the conference.

Aida Slavic, UDC editor-in-chief, opened the conference by pointing out that classification is supposed to be an ordered place, but systems and study of it are difficult and complex. We still lack terminology to express and discuss our work clearly. There is now an obvious need to instruct computers to use and interpret classifications and perhaps our work to make our classifications machine readable will also help us explain what we do to other humans.

On being the same as

Professor Patrick Hayes of the Florida Institute for Human and Machine Cognition delivered the keynote address, pointing out that something as simple as asserting that one thing is the same as another is actually incredibly difficult. One of the problems facing the development of the Semantic Web is that people are asserting that two things are the same when actually they are merely similar.

He explained that the formalisms and logic underpinning the Semantic Web are all slimmed down versions of modern 20th century logic based on a particular world view and set of assumptions. This works very well in theory, but once you start applying such logics to the real messy and complex world with real objects, processes, and ideas, the logics are put under increasing stress.

In logic, when two things are referred to as the same, this means they are two different names for the same thing, not that there are two things that are logically equivalent. So, Paris, the city of my dreams, and Paris the administrative area, Paris throughout history, and Paris – the capital of France are not necessarily all the same. This means that in logic we have to separate out into different versions aspects of an idea that in ordinary language we think of as the same thing.

He described this as the problem of “logic versus Occam” (as in Occam’s razor). Logic drives us to create complexity, in that we have to precisely define every aspect of a concept as a different entity. In order for the Semantic Web to work, we need to be very clear about our definitions so that we don’t muddle up different aspects of a concept.
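To make the Paris example concrete for myself, here is a minimal sketch of what goes wrong when two merely similar things are declared identical. The identifiers and facts are invented for illustration and are not taken from the talk. The sketch assumes that "the same" means every fact about one name also holds for the other, which is how an identity assertion such as owl:sameAs is read.

```python
from collections import defaultdict

# Hypothetical identifiers and facts, invented for illustration only.
facts = {
    "ex:ParisCity": {"type": "city", "country": "France"},
    "ex:ParisAdminArea": {"type": "administrative area", "country": "France"},
}


def merge_as_identical(facts, a, b):
    """Treat a and b as two names for one thing, so all facts are pooled."""
    merged = defaultdict(set)
    for name in (a, b):
        for prop, value in facts[name].items():
            merged[prop].add(value)
    return dict(merged)


def conflicts(merged):
    """Return the properties that ended up with more than one value."""
    return {prop: values for prop, values in merged.items() if len(values) > 1}


merged = merge_as_identical(facts, "ex:ParisCity", "ex:ParisAdminArea")
clashes = conflicts(merged)
print(clashes)
```

The shared fact (the country) merges cleanly, but the pooled "thing" now has two different types. Keeping the two identifiers separate and linking them with a weaker "similar to" relationship avoids the clash, at the price of the extra complexity Hayes described.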

IASA Conference 2011: Turning archives into assets

Semantic enrichment

Guy Maréchal continued the Linked Data theme by talking in more detail about how flat data models can be semantically enriched. He pointed out that if you have good structured catalogue records, it takes very little effort to give concepts URIs and to export this data as sets of relationships. This turns your database into a graph, ready for semantic search and querying.
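As a rough illustration of the idea (not Guy Maréchal's actual method), the sketch below turns flat catalogue records into subject-predicate-object relationships. The base URI, field names, and records are all hypothetical. The only assumption is that some fields, such as creator and subject, name concepts that deserve a URI of their own, while others, such as title, stay as plain text values.

```python
# Hypothetical base URI and catalogue records, invented for illustration.
BASE = "http://example.org/"

records = [
    {"id": "rec1", "title": "Evening News", "creator": "Studio A", "subject": "politics"},
    {"id": "rec2", "title": "Morning Show", "creator": "Studio A", "subject": "weather"},
]


def slug(value):
    """Make a value safe to use as the last part of a URI."""
    return value.strip().lower().replace(" ", "-")


def record_to_triples(record, concept_fields=("creator", "subject")):
    """Export one flat record as (subject, predicate, object) relationships."""
    subject = f"{BASE}record/{record['id']}"
    triples = []
    for field, value in record.items():
        if field == "id":
            continue
        predicate = f"{BASE}prop/{field}"
        if field in concept_fields:
            obj = f"{BASE}{field}/{slug(value)}"  # the concept gets its own URI
        else:
            obj = value  # kept as a plain text value
        triples.append((subject, predicate, obj))
    return triples


def records_linked_to(graph, concept_uri):
    """Find every record that points at a given concept URI."""
    return sorted({s for s, _, o in graph if o == concept_uri})


graph = [t for r in records for t in record_to_triples(r)]
print(records_linked_to(graph, BASE + "creator/studio-a"))
```

Because both records point at the same creator URI, they are now connected through it, which is what makes the exported data a graph and not just a list of rows.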

He argued that “going to semantics cannot be avoided” and that “born digital” works will increasingly be created with semantically modelled metadata.

From Mass Digitisation to Mass Content Enrichment

The next talk was a description of the SONUMA digitisation and metadata enhancement project. Sonuma and Memnon Archiving Services have been working on inventories and dictionaries to help them index audio visual assets. They have been converting speech to text, holding the text as XML files, and then associating sections of the XML with the appropriate point in the AV content, so that it can be searched.
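The talk did not go into the XML format used, so the following is only a sketch under assumed names: a hypothetical transcript element containing segment elements, each carrying a start attribute giving its offset in seconds. It shows how a text search over the transcript can return points in the AV content.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript layout; the element and attribute names are assumptions.
TRANSCRIPT_XML = """
<transcript>
  <segment start="0.0">good evening and welcome to the news</segment>
  <segment start="12.5">the election results were announced today</segment>
  <segment start="47.0">and now the weather forecast</segment>
</transcript>
""".strip()


def find_in_transcript(xml_text, term):
    """Return the start offsets (in seconds) of segments whose text contains term."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    return [
        float(seg.get("start"))
        for seg in root.iter("segment")
        if term in (seg.text or "").lower()
    ]


print(find_in_transcript(TRANSCRIPT_XML, "weather"))
```

A player could then jump straight to the returned offset instead of making the user scrub through the whole recording.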

They identify breaks in programmes by looking for the time stamps using OCR techniques, and then looking for jumps in the numerical sequences. They assume that jumps in the numbers are breaks in programmes. This enables them to break up long tapes into sections, which usually correspond to programmes.
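The jump-detection step can be sketched as follows, assuming the OCR stage has already turned the on-screen time stamps into a list of numbers, one per sampled frame. The readings and the max_step threshold are hypothetical. Any reading that goes backwards, or forwards by more than the threshold, is treated as the start of a new section.

```python
def find_breaks(timecodes, max_step=1):
    """Return the indices at which the timecode sequence jumps."""
    breaks = []
    for i in range(1, len(timecodes)):
        step = timecodes[i] - timecodes[i - 1]
        if step < 0 or step > max_step:
            breaks.append(i)
    return breaks


def split_at_breaks(timecodes, breaks):
    """Cut the sequence into sections at the given break indices."""
    bounds = [0] + breaks + [len(timecodes)]
    return [timecodes[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical OCR readings from one long tape.
readings = [100, 101, 102, 103, 500, 501, 502, 20, 21]
breaks = find_breaks(readings)
sections = split_at_breaks(readings, breaks)
print(sections)
```

In practice OCR misreads would produce false jumps, so a real system would presumably need some smoothing or a check that the new sequence persists for several frames before accepting a break.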

Social networking and Knowledge Management

Tom Adami described Knowledge Management projects at the United Nations Mission in Sudan (Best Practice Lessons Learnt: How the Exit Interview and Oral History Project at UNMIS is building a knowledge database). The UN in Africa faces problems of high staff turnover, remote locations, and difficulties in maintaining infrastructure. However, they have been using social networking to encourage people to share their knowledge and experience in a user-friendly way and so add to the official knowledge base.

Archive as a social media lab: Creative dissemination of digital sound and audiovisual collections

Budhaditya Chattopadhyay talked about a project to bring together archival practice, artistic practice, and social media. He also referred to the problems of preserving social media which is in essence ephemeral but may be an integral part of an artwork.

IASA Conference 2011: Keynote speech on Linked Open Data

Kevin Bradley, IASA president, gave the welcome address to the 42nd IASA annual conference. He characterised the digital revolution as one that will continue reverberating for years. He reminded us that it is not always easy to sort the sense from the nonsense and that we are often surprised by what turns out to be valid and how easy it is not to see the wood for the trees – or perhaps to “lose the word to the bits”, or the “picture to the pixels”.

Keynote address – on Linked Open Data

The keynote speech was given by Ute Schwens, deputy director of the Deutsche Nationalbibliothek / German National Library (DNB). She opened with a lovely visualisation by the Opte Project of various routes through a portion of the Internet. It looked a bit like visualisations of neurons in a brain or stars and galaxies.

Ute’s talk was in support of publishing Linked Open Data. She outlined some of the concerns – lack of money, open access versus intellectual property rights, poor quality of data and assets themselves, and inadequate legal frameworks. She said that we shouldn’t be trying to select for digitisation, because everything in an archive has already been selected or it wouldn’t have been kept in the first place. She also highlighted the benefits of making digital versions of unique or fragile artefacts, in order to allow access without risk to the original. She talked about how there are many ways to digitise and that these produce different versions, so an archival master that is as close to the original as possible should always be preserved.

She used as an illustration original piano rolls. These can only be played on very specialised electrical pianos and users cannot practically be given access to them directly, but they were played by specialists and the music recorded, so users can be given access to that. The recordings are not the same as the piano rolls, but are a new and interesting product. It seems obvious that you would not destroy the original piano rolls simply because the music from them had been recorded and now exists in a digital version, so why should you destroy other forms of media such as film, simply because you have a digital version? The digital version in such cases is for access, not preservation.

One fear is that free access to information will diminish usage of an archive or library, but by opening up you can gain new users, especially by providing free access to catalogues and metadata (I like to think of these as “advertising” – shops make their catalogues freely available because they see them primarily as marketing tools).

Another fear is loss of control, but new scientific ideas often arise when diverse strands of thought are brought together and unexpected uses are made of existing data. The unusual and the unforeseen are often the source of the greatest innovation.

She pointed out that we have drafted searching and indexing rules over centuries to try to make objects as findable as possible, so Linked Open Data is merely the next logical step. We can combine automatically generated information with data we already have to provide multiple access points. We need to describe to put objects into context, but we don’t have to describe what they look like in the ways that we used to for catalogues. Good metadata is metadata that is useful for users, not metadata merely for maintaining catalogues.

She ended by calling for more open access to data as a way to promote our collections and their value, adding that in uncertain times, our only security is our ability to change.

In the discussion afterwards, she said that Google needs our data and the best way to engage with – and even influence – Google is by gaining recognition as a valued supplier and making sure Google understands how much it needs us to provide it with good quality data.

The conference was hosted by Deutsche Nationalbibliothek / German National Library (DNB), Hessischer Rundfunk / Hessian Broadcasting (hr), and the Deutsches Rundfunkarchiv / German Public Broadcasting Archives (DRA). The sponsors were EMC2 (gold); Memnon Archiving Services, NOA audio solutions, and Arvato digital services (Bertelsmann) (silver); and Cedar audio, Cube-tec International, Front Porch Digital, and Syylex Digital Storage (bronze).

Building Enterprise Taxonomies – Book Review

There aren’t many books on taxonomies, so it is good to have another on the shelf. Darin L Stewart’s book is based on a series of lectures and provides a good introduction to key topics. As a format, that means you can pick the sections that are relevant to you. It has a very American student textbook tone, with pop quotes and definitions of key concepts in information science (e.g. precision and recall), but that doesn’t mean it isn’t a useful refresher for professionals. I particularly enjoyed the sections on XML, RDF, and ontologies as most of the coverage of these topics is either highly technical or very abstract. As the title suggests, it has a very corporate focus and so doesn’t really cover scientific taxonomies or library classifications.

The chapters introduce the concept of findability, cover the basics of metadata, types of taxonomies, how to go about developing a taxonomy and performing a content audit, general guidance on choosing terms and structures, some of the technical issues – introducing XML, XSLT, RDF, and OWL, and summarising ontologies and folksonomies.

I found a few typos and a few places slightly odd – for example I found the use of “Google whacking” to illustrate “teleporting” confusing and the descriptions of how to go about taxonomy work to be a little prescriptive. However, textbooks have to simplify the world in order to provide students with a starting point. Overall the book covers a good range of topics and concepts and is a light but informative read.

Conversations about conversation – Gurteen knowledge café

Last Wednesday evening I attended my first “Knowledge Café” hosted by David Gurteen. I have heard a lot about these cafés at various information events and so was pleased to finally be able to attend one in person. The idea appears to be twofold – firstly that knowledge and information professionals can find out what such cafés are for and how to run them and secondly simply to participate in them for their own sake. The “meta-ness” of the theme – conversations about conversation – appealed to me. (I’ve always liked metacognition – essentially thinking about thinking – too.)

We had plenty of time to get a drink and network before the event started, which is always a good thing, then David gave us a short introduction to the topic. He talked about Theodore Zeldin’s book Conversation: How Talk Can Change Our Lives and reminisced about a conversation from his own childhood that had held personal significance. He then set us three questions to discuss, about whether conversations can help us to see the world differently and how we can use them to bring about change for the better.

We then had a quick round of “speed networking” and formed groups to talk about the first question, moving on to different groups subsequently, so that we were well mixed by the end of the evening. To conclude we gathered into one large circle to talk further. This way we spiralled out from a single speaker, to speaking in pairs, then small groups, then all of us together.

Some common strands that everyone seemed to touch on at some point included discussing whether conversation was medium agnostic. Some people felt quite strongly that only a face-to-face discussion was a real conversation and that chatting via email, by text, by IM, and even by telephone were not the same. Others felt that the medium was irrelevant; it was the nature and quality of the communication that mattered. They agreed that signals, such as body language, shared environment, and instant interactivity were lost when not face to face, but that other factors, such as power imbalances between participants, could be minimised by talking remotely and unseen. Most people agreed that it was far easier to chat in highly constrained media, such as texting, with people one already knew well and had talked to frequently face to face, as that acquaintance helped smooth over misunderstandings due to lack of tone of voice or hastily chosen and ambiguous words. Clarity of vocabulary was also seen as key, especially when dealing with diverse groups or communities of practice.

Trust, power, empathy, and the ability to listen were noted as important factors in productive conversations, as was persuasion. People also felt that participants needed to be open and receptive if change – and perhaps even communication at all – was to be achieved.

I was surprised that fewer people mentioned the physical surroundings and settings of good conversations. I remembered Plato, with Socrates sometimes in the marketplace and sometimes going off to sit in a quiet place under a tree. I find the best conversations need a calm neutral space, without interruptions, where participants can be comfortable, can hear each other clearly, can see each other easily, and have space to move about, perhaps to draw, gesture, etc. if they want to emphasise or illustrate a point. Poor acoustics in restaurants can be disastrous for dinner conversations if all you can hear is clattering chairs and clinking cutlery. Chirruping mobile phones, staff requesting answers, and children needing attention break conversational rhythm and flow, not to mention trains of thought.

Interestingly, in the group discussion, and as so often happens in all conversations, people drifted off topic and became increasingly animated by discussion of something unintended and not particularly relevant. In this case it was a purely political debate about whether the competitive nature of humans was a good or bad thing. Despite mutterings that we are becoming less politically engaged, people seem to want to wear their politics very much on their sleeves.

On the way home, I wondered whether the conversations I had participated in that evening had changed me or the world. In a small way, every experience we have changes the world. I met some interesting new people. I had some new ideas and learned a few new pieces of information (apparently it is less tiring to listen to a telephone conversation using both ears – e.g. through a pair of headphones instead of a single earpiece). This blog post exists as a result of the evening. However, I took to heart the point that change has to come from within and I resolved to try to remember to stay adaptable and open to new viewpoints. I also resolved to listen more attentively and to try to facilitate better, more productive conversations while at work. I certainly hope this will change the world for the better, albeit in a very subtle way.

There’s no such thing as digital privacy

I was asked to write about privacy for Information Today Europe.

In under 1,000 words it is not easy to cover such a huge topic, so I tried to take a bird’s eye view and put just a few of the issues into a broad context. Most people focus on quite a narrow angle – for example information security, or libel cases – but the topic covers far wider socio-cultural issues. From hacking, to government surveillance, to Facebook as a marketing tool, to family logins for online services, to personalisation, and even neuroscience, what is known about us and who may know it runs right through the heart of our interactions and transactions.

Privacy is also a very hot topic, with Radio Four’s PM running a series – The Privacy Commission, and legislators trying to figure out what sort of a legal framework we need to balance the often competing interests in privacy of the rich and famous, the ordinary citizen, the family member, the child, commercial organisations, the government. There is much to consider as we rely more and more on “black boxed” algorithms and processors as our information mediators.

Digital Asset Management – DAM EU Conference – Third Session

Sustaining your DAM

Sara Winmill from the V&A talked about the huge shifts in mindset that were needed to accompany their DAM work. They needed to stop thinking about storing pictures of things and start thinking about managing those digital images as the things. Their needs for storage were vastly underestimated at first. Contrary to the myth, storage is not cheap – the V&A need some £330K for storage annually. They have been investigating innovative approaches to “backup bartering” – finding a similar organisation and storing a copy of each other’s data, so that the backups exist offsite but without the expense of using commercial storage companies.

Despite having a semantically enabled website, they have not been able to link their Library Catalogue’s MARC records with the images, and have three sets of identifiers that are not mapped.

One of their major DAM problems is trying to stop people storing multiple copies and refusing to delete anything. The core collections images need to be kept, but publicity and marketing material is now being stored in the system without any selection and disposal policies in place. The original system was designed without a delete button altogether.

Can we fix it? Yes we Can! Successfully Implementing a Multi-faceted DAM system at HiT entertainment

It was a pleasure to hear of Tabitha Yorke’s successful DAM implementation at HiT as they built their first digital library. This was a relatively constrained collection and two fulltime members of staff were able to catalogue it in a year. This provided the metadata they needed for a straightforward taxonomy-based search system that is simple and easy to use. This meant that self-research was supported, saving the team much time and increasing productivity hugely. They are now working to integrate the library with rights systems. They worked hard at getting users to test the metadata and made sure that they were cataloguing with terms the users wanted to search with, rather than those that occurred first to the cataloguers. They now have two digital librarians managing 150,000 assets.

Tabitha stayed on the stage and was joined in a panel session by David Bercovic, Digital Project Manager at Hachette UK, and Fearghal Kelly of Kit digital. The afternoon ended with David Lipsey’s concluding remarks.

Digital Asset Management – DAM EU Conference – Second Session

Serco Artemis Digital – Realising the Value of Archives and Rehabilitating Prisoners

Bruce Hellman from Serco described the work they have been doing to employ prisoners as cataloguers and transcribers. The work, which varied from project to project, but which included typing up handwritten archival documents that were not suitable for OCR capture techniques and adding metadata, was very popular with prisoners.

Bruce argued that it gave them a chance to develop skills that would be useful in the workplace on their release, and allowed organisations to get work done more cheaply than by paying standard market rates.

How Metadata and Semantic Technologies will Revolutionise your Workflow

John O’Donovan of the Press Association gave an entertaining presentation about using semantic technologies to index or re-index and publish to the web content from a range of systems, including legacy systems and external feeds. He pointed out – with a series of amusing ambiguities and unintentional innuendos – that simple text search lacks context, and that newspaper headlines often contain jokes, ambiguous terms, and terms that quickly become obsolete. So, metadata is vital in assembling assets that are about the same topic.

He stressed the importance of keeping your metadata management separate from your content management, so that metadata can be changed without having to re-index assets. (An exception is rights and other non-subjective metadata that needs to be embedded in the asset for further tracking. This is not a major concern for the Press Association as they do not track assets once they are published onto the web. I wasn’t sure what would happen if you decided to repurpose your content and so needed a new set of metadata, nor how you link content and metadata and manage each within its separate store.)

The PA are using MarkLogic as the content repository and a BigOWLIM triplestore to handle the associated metadata. Content is fed into the content store, then out again to a suite of indexing technologies, including concept extraction and other text-processing systems, as well as facial recognition software, to create semantic metadata. Simple ontologies are used to model the content, mainly indexing people, places, and events – themes chosen as covering the most popular search terms entered by users of the website.
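To picture the separation of content from metadata, here is a toy sketch using two in-memory structures as stand-ins for a content repository and a triplestore. It is not the PA's architecture or the API of either product. The asset identifiers, the "about" property, and the helper functions are all hypothetical. The point is that the two stores are linked only by asset identifier, so a statement about an asset can be changed without touching the asset itself.

```python
# Hypothetical in-memory stand-ins for a content repository and a metadata store.
content_store = {
    "asset-1": "Full text of a story about a football final.",
    "asset-2": "Full text of a story about a budget announcement.",
}

# Metadata held separately as (asset_id, property, value) statements.
metadata_store = {
    ("asset-1", "about", "sport"),
    ("asset-2", "about", "politics"),
}


def assets_about(metadata, topic):
    """Return the identifiers of assets tagged with the given topic."""
    return sorted(a for a, p, v in metadata if p == "about" and v == topic)


def retag(metadata, asset_id, old, new):
    """Return a new metadata set with one 'about' statement replaced."""
    updated = set(metadata)
    updated.discard((asset_id, "about", old))
    updated.add((asset_id, "about", new))
    return updated


content_before = dict(content_store)
metadata_store = retag(metadata_store, "asset-2", "politics", "economy")
print(assets_about(metadata_store, "economy"))
```

After the retagging, the content store is exactly as it was. This also hints at one answer to my own question about repurposing: a second set of statements about the same identifiers could sit alongside the first.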

John argued that such gathering and indexing of assets in order to automatically create and publish collections of associated content was simpler and easier than ingesting diverse content and metadata into traditional search, content management, and online publishing systems.

DAM for Content Marketing, Curation, and Knowledge Organisation

Mark Davey of the DAM Foundation took us on an animated and musical tour of different perspectives on metadata, engagement, social media, and how different the “digital natives” – young people who have grown up with digital technologies – will be from previous generations. Kids of the future will be able to have an idea in the morning, go to an online website app and create their site, their brand, and their marketing strategy in the afternoon, and be engaging with their potential clients by the evening.

Mark pointed out that people have moved on from the initial narcissism of social media and self-publishing and now want compelling stories they can engage with. He pointed out that as semantic technologies advance, we are caught in a feedback loop with them – we are the ontology that is driving the machines – and so we should be aware and vigilant. As the technologies become more powerful and all pervasive, we may lose sight of how they are working to serve us, rather than how we are serving up information about ourselves to them.

Marketing will have to become more sophisticated. Amongst the many statistics he quoted, I noted that 84% of 25-34 year olds have left a favourite website because of ads. At the same time, our networks become more interconnected. In a “six degrees of separation” game, we discovered that three people in the audience had met the Dalai Lama, and we are linking to more and more people through social media sites every day.

The metaphor of information as water is a familiar one, especially in the knowledge management area, but Mark’s colleague Dave pointed out how appropriate it is when talking about a DAM/dam. The DAM system forms the reservoir of content.

(I couldn’t help comparing and contrasting the ever-changing semantic seas of information at the Press Association with the more manageable streams of content that flow within smaller organisations, and how very different approaches are needed for such different contexts. The other day I saw the metaphor used again, in an interview with – apparently – one of the LulzSec hackers who talked about their pirate boat and “copywrong” as an enemy of the seas.)

Black Holes and Revelations: DAM and a museum collection

As if to continue the water metaphor, the next speaker was Douglas McCarthy from the National Maritime Museum. However, he took the metaphor up a stage, to space ships and black holes, with their content assets hidden in black holes as 100,000 uncatalogued image files.

Having catalogued and improved their DAM system, the Museum’s Picture Library is now showing a healthy profit. Many sales come from the “long tail” of images that no-one anticipated anyone would want. Rather than saturating the market, putting the images online has been stimulating demand, with customers calling for more collections to be made available.