I was delighted to be invited by Simon Rooks, Multimedia Archivist at the BBC, to talk to the Organizing Knowledge Taskforce of IASA (International Association of Sound and Audiovisual Archives) about Linked Data and the Semantic Web. As someone from the content rather than the technical side, I focussed on introducing the basic concepts and trying to encourage people to think creatively about their data. I go to a lot of talks billed as introductions to Linked and Open Data that turn out to be rather technical; even when they focus on projects, they tend to assume there are a lot of programmers in the audience. However, we as information professionals are the ones who understand our data, and we should be taking the lead in devising exciting projects. This doesn’t mean we all need to become coders, but we should be coming up with wonderful ideas and asking the technical teams to build them, not sitting around expecting the coders to learn about our content and suggest ideas to us.

I don’t advocate leaping in with all guns blazing and throwing away established methods and practices. One of the appealing things about Linked Open Data, to me, is that it is just publishing – so you can preserve your core data in any format you like – and that it is ideally suited to a pilot project approach. You can try it out with a small set of uncontentious data and see what happens. People complain that it is not clear how publishing data openly makes money, but if your aim is to increase access and outreach, that is less of a concern.

Simon then talked about the taxonomy training workshop my colleagues Vic Cowan and Janis Mcanallen produced to help us introduce and explain metadata and taxonomies.

During the debate after the talks, questions of trust and authority came up and Simon made the very valid point that as information professionals we should be able to identify good and trustworthy data sets – if we can’t assess data quality then who can?

Broadcast archives session – semantic search and metadata standards

I enjoyed the two presentations in the Broadcast Archives session. The first was about the CONTENTUS research and development project. It is part of the THESEUS project of the German government and emphasises the importance of context and good semantic metadata.

The project seeks to tackle six aspects of multimedia accessibility:

1. Digitization
2. Automatic quality control
3. Automatic content analysis
4. Semantic metadata linking
5. Formation of knowledge networks
6. Semantic multimedia search

It struck me that semantic search with links and disambiguation now seems pretty mainstream, and simple text search is looking somewhat old-fashioned.

I was interested in the assertion that manual indexing takes 4-10 hours per hour of video. That figure would seem to cover full cataloguing with rich descriptions. I doubt that key frame tagging with supported vocabularies takes anything like that long, but it is a useful figure for comparison.
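To get a feel for what that figure means at archive scale, here is a quick back-of-the-envelope sketch. The 1,000-hour collection size is purely my own assumption for illustration, and the function name is made up; only the 4-10 hours range comes from the presentation.

```python
def indexing_hours(video_hours, low_rate=4, high_rate=10):
    """Estimate manual indexing effort for a collection.

    low_rate and high_rate are hours of indexing work per hour of video,
    taken from the 4-10 hours figure quoted in the talk.
    """
    return video_hours * low_rate, video_hours * high_rate


# Assumed collection size, for illustration only.
low, high = indexing_hours(1000)
print(f"Indexing 1000 hours of video: {low} to {high} hours of work")
```

Even at the low end, that is several person-years of effort for a modest collection, which is presumably why the automatic analysis steps matter so much to the project.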

The second presentation was about EBU Core, a simple metadata standard that is also very flexible. It is compatible with SKOS and RDF and has been adopted by Europeana and other projects.

(I am always interested in how subject descriptions and “aboutness” are handled in such standards, but in this case there just seemed to be a field for “subject” into which you can put as simple or complex a classification system as you choose. The standard itself makes no attempt to achieve semantic equivalence – to make sure that when I put “jaguar” in the subject field it is about the cat, not the car, and when you use “jaguar” you mean the same thing. Presumably we have to forge our own agreements on common terminologies and identifiers as a separate exercise. I suppose Linked Data is one way of approaching the consensus-building process.)
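For anyone who finds it easier to see this in code, here is a minimal sketch of the "jaguar" problem. It is not EBU Core and not a real vocabulary: the records, the field names and the example.org identifiers are all hypothetical, made up to show why a shared identifier does a job that a text label alone cannot.

```python
# Hypothetical identifiers: these example.org URIs are placeholders,
# not entries in any real vocabulary.
JAGUAR_CAT = "http://example.org/concepts/jaguar-animal"
JAGUAR_CAR = "http://example.org/concepts/jaguar-car"

# Hypothetical catalogue records with a free-text subject label
# and an agreed subject identifier.
records = [
    {"title": "Big cats of the Amazon", "subject_label": "jaguar", "subject_id": JAGUAR_CAT},
    {"title": "British motoring classics", "subject_label": "Jaguar", "subject_id": JAGUAR_CAR},
    {"title": "Rainforest predators", "subject_label": "jaguar", "subject_id": JAGUAR_CAT},
]


def find_by_label(records, label):
    """Match on the text label only: cats and cars come back together."""
    return [r["title"] for r in records if r["subject_label"].lower() == label.lower()]


def find_by_id(records, subject_id):
    """Match on the agreed identifier: only the intended concept comes back."""
    return [r["title"] for r in records if r["subject_id"] == subject_id]


print(find_by_label(records, "jaguar"))
print(find_by_id(records, JAGUAR_CAT))
```

The label search returns all three records, while the identifier search returns only the two about the animal. The code is the easy part; agreeing on which identifiers to use is the consensus-building exercise.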