Classification meets the Web – UDCC Seminar 2011

This post is the fourth in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011.

Interoperability of knowledge organization systems with and through ontologies

Daniel Kless from the University of Melbourne pointed out that problems arise when ontologies are combined, as errors in the combination can have disastrous effects on subsequent reasoning. A well-defined modelling method is needed to minimise such errors. Standards such as OWL and RDF do not address the problems of methodology or terminology control.

Towards the integration of knowledge organization systems with the linked data cloud

Vincenzo Maltese of the University of Trento, Italy, explained that it is vital to make clear the semantics and purpose of any ontology when attempting to share Linked Data. Ontologies may differ in their scope, purpose, structure, terminology, language, coverage, formality, and conceptualization. He drew a distinction between descriptive ontologies and classification ontologies. Converting a descriptive ontology to a classification ontology is very easy and can be automated. Converting a classification ontology to a descriptive one is extremely difficult and requires human intellectual and editorial effort.

Classification and reference vocabulary in linked environment data

Joachim Fock of the Federal Environment Agency (Germany) talked about how they transformed their keyword thesaurus to a Linked Data format.

Classifications and ontologies on their own terms – UDCC Seminar 2011

This post is the third in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011.

Approaches to providing context in knowledge representation structures

Barbara Kwasnik, Syracuse University (USA), talked about ways that context can be used as a disambiguation tool, and described different kinds of contexts: warrant, scientific, educational, cultural, etc. However, interdisciplinary approaches can be difficult. It is easy to have different ontological commitments, but a mapping is needed to establish when, and for which parts, ontologies have to work across domains. Ontologies will need updating as the world and world views shift and change, so we need ways of defining their scope, as well as provenance and mappings. There are also difficulties in establishing the neutrality of ontologies.

Interaction between elementary structures in universes of knowledge

Richard P. Smiraglia, University of Wisconsin (USA), talked about how people want to turn the multidimensional world into a unidimensional top-down model. He pointed out that people tend to assume UDC is like Dewey, but it actually works far more like Ranganathan’s Colon Classification. He called for new theories of organizing knowledge in shifting contexts and theories about how to mediate between concepts and structures like UDC.

Demystifying ontology

Emad Khazraee, Drexel University (USA), talked about how ontological approaches are as old as literature itself, showing a picture of what I think was the ancient Sumerian king list. He talked about boundary objects and the overlap between different academic areas that are interested in knowledge organisation and learning. He also discussed the differences between ontology-as-categorial-analysis and ontology-as-technology.

The role of classification and ontology on the Web – UDCC Seminar 2011

This post is the second in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011.

Knowledge Organization Systems (KOSs) as hubs in the Web of Data

In a minor change of schedule, Thomas Baker from the DCMI talked about some of the practical issues with using Linked Data. Provenance data can be recorded as additional information, but it is not standardised or an integral part of RDF. This is a growing concern that is receiving attention from the W3C. URI persistence and alignment remain concerns for data management and governance.

Aligning web vocabularies

Guus Schreiber also dealt with the problem of making sure we are all talking about the same thing when we try to align our vocabularies. He called for ontologists to be modest about what they can achieve and not to try to hide the problems that occur when you try to transfer an ontology from one domain to another. Errors typically occur due to failures to notice subtle differences between domains.

Vocabulary alignment is a complex business that requires a lot of intellectual effort, and multiple techniques should be used to reinforce and support each other. It is much better to map small vocabularies to large ones that can then act as “pivots”.
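
To illustrate the “pivot” idea, here is a minimal Python sketch, assuming two small vocabularies have each already been mapped by hand to one large vocabulary. All term names and the function `align_via_pivot` are invented for illustration and do not come from the talk. Candidate links between the two small vocabularies are derived by composing their mappings through the pivot.

```python
# Hypothetical hand-made mappings from two small vocabularies to one large pivot.
small_a_to_pivot = {"a:cars": "pivot:automobiles", "a:boats": "pivot:watercraft"}
small_b_to_pivot = {"b:autos": "pivot:automobiles", "b:trains": "pivot:rail-vehicles"}


def align_via_pivot(left, right):
    """Return candidate left-to-right links for terms that share a pivot concept."""
    # Invert the right-hand mapping: pivot concept -> list of right-hand terms.
    pivot_to_right = {}
    for term, pivot in right.items():
        pivot_to_right.setdefault(pivot, []).append(term)
    # Keep only the left-hand terms whose pivot concept is also used on the right.
    return {
        term: sorted(pivot_to_right[pivot])
        for term, pivot in left.items()
        if pivot in pivot_to_right
    }


candidates = align_via_pivot(small_a_to_pivot, small_b_to_pivot)
print(candidates)
```

Links derived this way are only candidates. Each one still needs the kind of intellectual checking described above, because two terms that share a pivot concept may differ subtly in their own domains.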

There is still no adequate methodology for evaluating alignments nor for mediating consensus between observers. Perhaps there should be a way of recording the strength of consensus and the presence of disagreements and alternative views.

Classification, Collaboration and the Web of Data

Dan Brickley described three types of graph – the hypertext graph of the Internet’s links between documents, the social graph of links between people, and the factual graph of links between data. Currently Linked Data is bringing together the hypertext and factual graphs, and another step would be to add in the social dimension.

He called for a focus on what the various tools can actually do, to be wary of over-evangelical ontologists, and to remember that subject classifications are strong and robust tools that are more appropriate for many types of work than ontologies.

He said that you could expect Linked Data to solve about a third of your information linking problems.

Classification and Ontology – UDCC Seminar 2011

I thoroughly enjoyed the third biennial International UDC Consortium seminar at the National Library of the Netherlands, The Hague, last Monday and Tuesday. The UDC conference website includes the full programme and slides, and the proceedings have been published by Ergon Verlag.

This is the first in a series of posts covering the conference.

Aida Slavic, UDC editor-in-chief, opened the conference by pointing out that classification is supposed to be an ordered place, but systems and study of it are difficult and complex. We still lack terminology to express and discuss our work clearly. There is now an obvious need to instruct computers to use and interpret classifications and perhaps our work to make our classifications machine readable will also help us explain what we do to other humans.

On being the same as

Professor Patrick Hayes of the Florida Institute for Human and Machine Cognition delivered the keynote address. He pointed out that something as simple as asserting that one thing is the same as another is actually incredibly difficult, and that one of the problems facing the development of the Semantic Web is that people assert that two things are the same when they are merely similar.

He explained that the formalisms and logic underpinning the Semantic Web are all slimmed-down versions of modern 20th-century logic, based on a particular world view and set of assumptions. This works very well in theory, but once you start applying such logics to the real, messy, and complex world with real objects, processes, and ideas, the logics are put under increasing stress.

In logic, when two things are referred to as the same, this means they are two different names for the same thing, not that there are two things that are logically equivalent. So Paris the city of my dreams, Paris the administrative area, Paris throughout history, and Paris the capital of France are not necessarily all the same. This means that in logic we have to separate out into different versions aspects of an idea that in ordinary language we think of as the same thing.

He described this as the problem of “logic versus Occam” (as in Occam’s razor). Logic drives us to create complexity, in that we have to precisely define every aspect of a concept as a different entity. In order for the Semantic Web to work, we need to be very clear about our definitions so that we don’t muddle up different aspects of a concept.
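
A small Python sketch can show why a careless “same as” assertion causes trouble. It assumes that strict identity (what `owl:sameAs` asserts in OWL) means every fact about one name also holds for the other. The names, properties, and values below are invented for illustration and do not come from the keynote.

```python
# Illustrative facts only; the names and values are invented for this sketch.
facts = {
    "ex:ParisAdmin": {"boundary": "fixed by law"},
    "ex:ParisHistory": {"boundary": "shifts over the centuries"},
    "ex:CapitalOfFrance": {"role": "capital"},
}


def merge_identical(facts, same_as_pairs):
    """Pool the facts of all names that are asserted to be strictly identical."""
    parent = {}

    def find(name):
        # Follow parent links up to the representative name of the group.
        parent.setdefault(name, name)
        while parent[name] != name:
            name = parent[name]
        return name

    for left, right in same_as_pairs:
        parent[find(left)] = find(right)

    merged = {}
    for name, props in facts.items():
        bucket = merged.setdefault(find(name), {})
        for key, value in props.items():
            bucket.setdefault(key, set()).add(value)
    return merged


def contradictions(merged):
    """Report properties that end up with more than one value after merging."""
    found = {}
    for root, props in merged.items():
        clashing = {key: values for key, values in props.items() if len(values) > 1}
        if clashing:
            found[root] = clashing
    return found


# Kept apart, the facts are consistent; asserting identity makes them clash.
print(contradictions(merge_identical(facts, [])))
print(contradictions(merge_identical(facts, [("ex:ParisAdmin", "ex:ParisHistory")])))
```

Keeping the different aspects of Paris as separate entities avoids the clash, at the cost of the extra complexity that the “logic versus Occam” problem describes.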

IASA Conference 2011: Turning archives into assets

Semantic enrichment

Guy Maréchal continued the Linked Data theme by talking in more detail about how flat data models can be semantically enriched. He pointed out that if you have good structured catalogue records, it takes very little effort to give concepts URIs and to export this data as sets of relationships. This turns your database into a graph, ready for semantic search and querying.
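
As a rough sketch of that step, the Python below turns flat catalogue records into subject, predicate, object relationships. The record fields, the `http://example.org/` namespace, and the function names are all placeholders invented for illustration, not anything shown in the talk.

```python
# Hypothetical flat catalogue records; the field names are invented.
records = [
    {"id": "rec1", "title": "Evening News", "creator": "Jane Doe", "subject": "Elections"},
    {"id": "rec2", "title": "Election Special", "creator": "Jane Doe", "subject": "Elections"},
]

BASE = "http://example.org/"  # placeholder namespace, not a real service


def slug(value):
    """Turn a label such as 'Jane Doe' into a URI-friendly 'jane-doe'."""
    return "-".join(value.lower().split())


def record_to_triples(record):
    """Export one record as (subject, predicate, object) relationships."""
    subject = BASE + "record/" + record["id"]
    # The title stays a plain literal value.
    triples = [(subject, BASE + "prop/title", record["title"])]
    # Creators and subjects are concepts, so each one gets its own URI.
    for field in ("creator", "subject"):
        concept_uri = BASE + field + "/" + slug(record[field])
        triples.append((subject, BASE + "prop/" + field, concept_uri))
    return triples


graph = [triple for record in records for triple in record_to_triples(record)]


def records_about(concept_uri):
    """Find every record linked to a concept, whatever the relationship."""
    return sorted({s for s, _p, o in graph if o == concept_uri})


print(records_about(BASE + "subject/elections"))
```

Because both records point at the same concept URI rather than at a repeated text string, the shared subject becomes a node in the graph that queries can follow.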

He argued that “going to semantics cannot be avoided” and that “born digital” works will increasingly be created with semantically modelled metadata.

From Mass Digitisation to Mass Content Enrichment

The next talk was a description of the SONUMA digitisation and metadata enhancement project. Sonuma and Memnon Archiving Services have been working on inventories and dictionaries to help them index audio visual assets. They have been converting speech to text, holding the text as XML files, and then associating sections of the XML with the appropriate point in the AV content, so that it can be searched.
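
A minimal Python sketch of the idea follows, assuming the speech-to-text output is held as XML segments that carry start and end times in seconds. The element names, attribute names, and transcript text are invented for illustration. The actual SONUMA file format was not described in this detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript format: one <segment> per stretch of recognised speech.
TRANSCRIPT_XML = """
<transcript asset="tape-001">
  <segment start="0.0" end="4.2">Good evening and welcome to the programme</segment>
  <segment start="4.2" end="9.8">Tonight we report on the election results</segment>
  <segment start="9.8" end="15.0">And later the weather forecast</segment>
</transcript>
"""


def search_transcript(xml_text, query):
    """Return (start_time_in_seconds, text) for every segment containing the query."""
    root = ET.fromstring(xml_text.strip())
    query = query.lower()
    hits = []
    for segment in root.iter("segment"):
        text = segment.text or ""
        if query in text.lower():
            hits.append((float(segment.get("start")), text))
    return hits


for start, text in search_transcript(TRANSCRIPT_XML, "election"):
    print(f"{start:>6.1f}s  {text}")
```

Each hit carries a time offset, so a player could jump straight to the matching point in the audiovisual content.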

They identify breaks in programmes by looking for the time stamps using OCR techniques, and then looking for jumps in the numerical sequences. They assume that jumps in the numbers are breaks in programmes. This enables them to break up long tapes into sections, which usually correspond to programmes.
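
The jump detection can be sketched in a few lines of Python. This assumes the OCR step has already produced one `HH:MM:SS` reading per sampled frame. The function names, the `max_step` threshold, and the sample readings are assumptions made for illustration rather than details from the project.

```python
def timecode_to_seconds(timecode):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_breaks(timecodes, max_step=2):
    """Return the indices where the on-screen clock jumps.

    A jump is any step backwards, or any step forwards larger than max_step
    seconds. max_step is an assumed tolerance for missed or misread frames.
    """
    values = [timecode_to_seconds(tc) for tc in timecodes]
    breaks = []
    for i in range(1, len(values)):
        step = values[i] - values[i - 1]
        if step < 0 or step > max_step:
            breaks.append(i)
    return breaks


def split_segments(timecodes, max_step=2):
    """Cut the sequence of readings into runs that probably belong to one programme."""
    bounds = [0] + find_breaks(timecodes, max_step) + [len(timecodes)]
    return [timecodes[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical readings from one long tape.
readings = ["10:00:00", "10:00:01", "10:00:02", "14:30:00", "14:30:01", "09:15:00"]
print(find_breaks(readings))
print(split_segments(readings))
```

Real OCR output will contain misreadings, so a production version would need to smooth out isolated bad frames before treating a jump as a programme break.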

Social networking and Knowledge Management

Tom Adami described Knowledge Management projects at the United Nations Mission in Sudan (Best Practice Lessons Learnt: How the Exit Interview and Oral History Project at UNMIS is building a knowledge database). The UN in Africa faces problems of high staff turnover, remote locations, and difficulties in maintaining infrastructure. However, they have been using social networking to encourage people to share their knowledge and experience in a user-friendly way and so add to the official knowledge base.

Archive as a social media lab: Creative dissemination of digital sound and audiovisual collections

Budhaditya Chattopadhyay talked about a project to bring together archival practice, artistic practice, and social media. He also referred to the problems of preserving social media which is in essence ephemeral but may be an integral part of an artwork.

IASA Conference 2011: Keynote speech on Linked Open Data

Kevin Bradley, IASA president, gave the welcome address to the 42nd IASA annual conference. He characterised the digital revolution as one that will continue reverberating for years. He reminded us that it is not always easy to sort the sense from the nonsense and that we are often surprised by what turns out to be valid and how easy it is not to see the wood for the trees – or perhaps to “lose the word to the bits”, or the “picture to the pixels”.

Keynote address – on Linked Open Data

The keynote speech was given by Ute Schwens, deputy director of the Deutsche Nationalbibliothek / German National Library (DNB). She opened with a lovely visualisation by the Opte Project of various routes through a portion of the Internet. It looked a bit like visualisations of neurons in a brain or stars and galaxies.

Ute’s talk was in support of publishing Linked Open Data. She outlined some of the concerns – lack of money, open access versus intellectual property rights, poor quality of data and assets themselves, and inadequate legal frameworks. She said that we shouldn’t be trying to select for digitisation, because everything in an archive has already been selected or it wouldn’t have been kept in the first place. She also highlighted the benefits of making digital versions of unique or fragile artefacts, in order to allow access without risk to the original. She talked about how there are many ways to digitise and that these produce different versions, so an archival master that is as close to the original as possible should always be preserved.

She used original piano rolls as an illustration. These can only be played on very specialised electrical pianos and users cannot practically be given access to them directly, but they were played by specialists and the music recorded, so users can be given access to that. The recordings are not the same as the piano rolls, but are a new and interesting product. It seems obvious that you would not destroy the original piano rolls simply because the music from them had been recorded and now exists in a digital version, so why should you destroy other forms of media such as film, simply because you have a digital version? The digital version in such cases is for access, not preservation.

One fear is that free access to information will diminish usage of an archive or library, but by opening up you can gain new users, especially by providing free access to catalogues and metadata (I like to think of these as “advertising” – shops make their catalogues freely available because they see them primarily as marketing tools).

Another fear is loss of control, but new scientific ideas often arise when diverse strands of thought are brought together and unexpected uses are made of existing data. The unusual and the unforeseen is often the source of the greatest innovation.

She pointed out that we have drafted searching and indexing rules over centuries to try to make objects as findable as possible, so Linked Open Data is merely the next logical step. We can combine automatically generated information with data we already have to provide multiple access points. We need to describe objects to put them into context, but we don’t have to describe what they look like in the ways that we used to for catalogues. Good metadata is metadata that is useful for users, not metadata merely for maintaining catalogues.

She ended by calling for more open access to data as a way to promote our collections and their value, adding that in uncertain times, our only security is our ability to change.

In the discussion afterwards, she said that Google needs our data and the best way to engage with – and even influence – Google is by gaining recognition as a valued supplier and making sure Google understands how much it needs us to provide it with good quality data.

The conference was hosted by Deutsche Nationalbibliothek / German National Library (DNB), Hessischer Rundfunk / Hessian Broadcasting (hr), and the Deutsches Rundfunkarchiv / German Public Broadcasting Archives (DRA). The sponsors were EMC2 (gold); Memnon Archiving Services, NOA audio solutions, and Arvato digital services (Bertelsmann) (silver); and Cedar audio, Cube-tec International, Front Porch Digital, and Syylex Digital Storage (bronze).