Category Archives: KO

Your organization is not the Internet


Many people find it very difficult to understand why search within an organization can’t “just be like Google”. This is often because they haven’t thought about the differences between an organization and the Internet.

Your organization is smaller than the Internet

Search engines like Google work because they have access to big data. Google gets billions of searches to process, from billions of users. Even if your organization is a large one, it won’t have that many users either searching or contributing content, so it cannot number crunch on the same scale as Google. Your IT department is probably a lot smaller than Google’s and your enterprise search team’s daily budget is unlikely to cover more than the tiniest fraction of what Google spends. Last, but by no means least, your organization doesn’t have as much content as the Internet, so it probably needs to be far more careful about not losing any that is valuable.

Surfing the net is not many people’s job

There are important differences between how and why people search when they are at work and when they are not, and between how and why they search the Internet and their organization’s Intranet or archives. People rarely surf their organization’s Intranet for fun, to be entertained, or to while away the time. The differences between serious research behaviour and leisure searching are well documented, so I am going to write about another difference between the Internet and organizations that is often overlooked.

Putting stuff online is not the same as writing a business report

There are vast differences in the ways that people create and curate content on the Internet and within an organization. These differences have a significant effect on the way search functions. The key difference is in how much they link their content to that of others. Of course, there are people whose jobs are to create and curate online content – all the web editors, content strategists, copywriters, social media marketers, etc. – but they will be the first to explain that they have a very specialised set of skills focused on making their content searchable, commercial, or otherwise user friendly. They do a whole lot of things that most people as part of the day job neither know how nor have the time to do.

Links are a form of Knowledge Organization that Google gets for free

One of the key things that web professionals and unpaid web enthusiasts do with their content is to add and manage links. Links are what organize the web. Links are what group sites into clusters by content. Links are the web’s classification scheme. Clay Shirky back in 2005 said “there is no shelf” but it makes just as much sense to think of millions of shelves – infinite shelves going off in all directions, with new ones being created and old ones being discarded. The web is not linear – like a shelf – but it is not without structure. Google effectively picks one of the near infinity of shelves and offers it up as a linear list whenever you do a search. It chooses the shelf that seems to be the most popular, or that fits its commercial model. First on the shelf is often a paid-for advertisement or a Wikipedia entry, followed by other big well-established commercial sites. Out there on the Internet, people do an awful lot of shopping, and not much work, so that’s fine. (If they are doing more shopping than work when they are at work, your organization probably has bigger problems than search to deal with.)

For many other searches, especially more thematic research, people would be disappointed with the results, were it not for the magic of the way the web works – the links. As long as Google slings a site at you that has lots of links to other sites, it doesn’t have to take you straight to what you want, it lets you and the links do the rest of the work. Links gather together similar content, so they function like a classification scheme. The links associate content that is aimed at similar audiences, is on similar topics, is of a similar age. The links represent a huge amount of sorting, cataloguing, and classification work. Google did not have to pay for this work (genius business model). People do this work for Google for free. They do this work as part of creating and curating their content.
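To make the idea of links as free classification work a little more concrete, here is a minimal sketch in Python. It simply counts inbound links in a tiny, invented link graph and ranks pages by that count. The page names and both functions are assumptions made up for the illustration, and this is emphatically not Google’s actual algorithm, just the simplest possible way of letting other people’s links do the sorting.

```python
from collections import Counter

# Hypothetical pages and the pages they link to (invented for illustration).
LINKS = {
    "fan-site": ["band-home", "wikipedia-band"],
    "wikipedia-band": ["band-home", "record-label"],
    "record-label": ["band-home"],
    "band-home": ["record-label"],
}


def inbound_link_counts(links):
    """Count how many links point at each page; unlinked pages score 0."""
    counts = Counter({page: 0 for page in links})
    for targets in links.values():
        counts.update(targets)
    return counts


def rank_by_inbound_links(links):
    """Order pages by inbound links (most first), then alphabetically."""
    counts = inbound_link_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))
```

On an Intranet where nobody links their documents to anyone else’s, every count in a graph like this would be zero and the ranking would tell you nothing.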

Many of Google’s volunteer librarians do this work for fun. They create fan sites, they write Wikipedia articles, they produce lists and generate indexes to their favourite content. They provide cataloguing descriptions and context. They do all this work partly because they enjoy it and partly because they hope to get “repaid” by their site becoming popular. They hope this will either lead to monetary reward (their band will get signed, they’ll get a better job, they’ll sell advertising) or social reward (they’ll make online “friends”, get positive feedback from comments, etc.).

From the commercial angle, people do this work because they expect to gain financial reward. They want to sell more products and make money. This is why there are howls of pain whenever Google tweaks its algorithms. Companies that balk at investing in internal search systems will spend fortunes chasing SEO.

Are your staff content curators?

If you want your organization’s search to be “just like Google” you need to think about how linked your content is. Do people who create content in your organization do so for the same reasons and with the same motivations as people who create and link content on the web? It is very unlikely that you have lots of “fans” who will spend their free time creating lists of your company’s best information resources, or collecting and rating and reviewing reports and documents. Most employees are too busy getting on with their day jobs to spend office hours pursuing their “fan” projects. Even if your staff have plenty of spare time, how many of them are big enough fans of some aspect of work to treat it like a hobby? If you want people to start looking out for similar documents on your Intranet and linking their own documents to them, you will probably have to find ways of motivating them to do this as a special initiative. It is not likely to come “for free”, like it does for the web search engines.

For some organizations, encouraging and incentivising “fan”-type behaviour may work. If the organization already has a strong collaborative culture, with people sharing ideas and using social media, it may be a small step to get them to think of their documents and presentations as blog posts. Including content creation and curation in people’s job roles and rewarding those who do well will foster a link-rich Intranet. By recognising and rewarding people who promote useful links and lists and get them to rank highly in your enterprise searches, you could bring an element of gamification to encourage this sort of behaviour. For other organizations, the culture may support this kind of web-style content creation, but people are generally too busy, have skill sets too far from what is required, or need training and encouragement. In such organizations it may make sense to have the equivalent of web editors, content strategists, user experience specialists, search engine optimizers, etc. working with the organization’s internal content to promote the most valuable resources. In other words, a layer of “linkers” who work alongside the content originators.

For other organizations, where it would be inappropriate, too time consuming, or too far from established culture to encourage web-like information behaviour, enterprise search will never work “just like Google”. More formal and standardized metadata management processes are likely to be needed. Organizations that generate a lot of very specific content that is unlikely to be useful in broader contexts, confidential content, or large volumes of very similar structured content are likely to find it hard to move away from directed and standardized searching.

Many organizations will have a “mixed economy” with different types of content and different departments operating with different styles (e.g. what works in a marketing department is unlikely to work in the same way in a finance department).

Without links, search is a lot of dead ends

Without links, each search result is isolated. This stops the searcher in their tracks and means they cannot surf in the way they do on the Internet. They will have to check search results one after another in a linear fashion. If your search engine is not getting the most relevant results to the top of that list, your staff will be spending a huge amount of time working their way through that list. They cannot plump for one likely looking result then follow the trail of links, as they do on the web. The links as a form of classification do not exist, so you need another mechanism (taxonomy, ontology, index, directory) to help people find groups of related content and browse through from one document to another.
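As a rough illustration of that last point, the sketch below uses shared taxonomy terms to suggest “related” documents for any given search result, standing in for the links that are missing. The document titles, the terms, and the functions are all invented for the example; it is a minimal sketch in Python under those assumptions, not a description of any particular enterprise search product.

```python
from collections import defaultdict

# Hypothetical Intranet documents, each tagged with terms from a taxonomy.
DOCUMENTS = {
    "Q3 sales report": {"Sales", "Europe"},
    "Q4 sales forecast": {"Sales", "Forecasting"},
    "Europe market overview": {"Europe", "Market research"},
    "Staff handbook": {"HR"},
}


def build_term_index(documents):
    """Map each taxonomy term to the set of documents tagged with it."""
    index = defaultdict(set)
    for title, terms in documents.items():
        for term in terms:
            index[term].add(title)
    return index


def related_documents(title, documents, index):
    """Return other documents sharing at least one term, most shared terms first."""
    shared = defaultdict(int)
    for term in documents[title]:
        for other in index[term]:
            if other != title:
                shared[other] += 1
    # Sort by number of shared terms (descending), then alphabetically.
    return sorted(shared, key=lambda other: (-shared[other], other))


TERM_INDEX = build_term_index(DOCUMENTS)
```

A searcher who lands on the hypothetical “Q3 sales report” can then be offered the other Sales and Europe documents to browse, rather than having to go back to the results list and start again.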

So, even though you may have the technology and the budget to match Google’s, unless your content creators are linking freely, you will never completely succeed in turning your Intranet into a mini-Internet.

Transforming and extending classification systems – UDCC Seminar


This post is the last in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011

Joan S. Mitchell, OCLC (USA), and Marcia Lei Zeng, Kent State University (USA), supported by Maja Žumer, University of Ljubljana (Slovenia), talked about extending models for controlled vocabularies to classification systems: modelling DDC with FRSAD, which led to interesting discussions about their concepts of “nomen” and “thema”.

Along with my former colleague Andy Heather, now CTO at DODS Parliamentary Communications Ltd, I talked about our work on the data migration of classifications from a legacy database into new taxonomy management software, presenting our paper: Transformation of a legacy UDC-based classification system: exploiting and remodelling semantic relationships.

Conclusions

The key ideas I took away from the conference were:
1) Classifications and ontologies are not an either/or choice. They have different properties and different strengths and weaknesses and so should be chosen according to the task in hand.
2) It is difficult to turn a classification into an ontology, but easy to turn an ontology into a taxonomy, so if you don’t have either to start with and can’t decide, an ontology is a safer bet. If you already have a classification, you need to think carefully about whether it is worth turning it into a fully modelled ontology, as converting it to RDF or SKOS is likely to be much easier. However, at the moment, RDF and SKOS have limitations, especially in handling faceted taxonomies, so beware of losing semantic richness in the conversion process. Polyhierarchies offer a way of expressing facets in SKOS.
3) Vocabulary control and alignment continue to be significant issues for the Semantic Web.
4) Ontology curation, management, and semantic alignment will be increasingly important issues for the Semantic Web.

Slides and audio recordings of all 21 talks can now be downloaded from the conference website.

Conference proceedings are published by Ergon Verlag and can now be purchased/ordered online from http://seminar.udcc.org/2011/php/proceedings.php.

Classification and ontology in specific subjects – UDCC Seminar


Day two of the UDC consortium international seminar opened with two subject-specific talks – Wolfram Sperber described a classification of mathematics and Andrew Buxton showed how similar chemistry classification and ontologies are, using the ChEBI ontology. He also described the different ways classifications and ontologies could be used to support each other and talked about the lack of good graphical tools and visualisations to represent ontologies.

Categories and relations: key elements of ontologies – Categorial Distinctions

Roberto Poli, University of Trento (Italy), talked about the complexities of part-whole relationships. There are simple wholes, composed of a sum of their parts, but some parts of wholes cannot simply be added together – for example, the social, psychological, and physical aspects of a person. He also discussed the difference between science as epistemological – dealing with what can be known – and ontological – dealing with what exists.

Towards a relation ontology for the Semantic Web

Dagobert Soergel made a bold claim that the only way for the Semantic Web to deliver its promise is if we adopt a relation ontology and map each dataset to the standard, to allow interoperability. He pointed out that you “do not get semantics from syntax alone”.

Relations in the notational hierarchy of the Dewey Decimal Classification

Rebecca Green from OCLC described the difficulties encountered when trying to automatically create ontologies from the Dewey Decimal Classification. These included semantic differences in the way subclasses had been defined, meaning that no single rule would handle them all appropriately.

Modelling concepts and structures in analytico-synthetic classifications

The eminent Ingetraut Dahlberg compared Aristotle and Ranganathan’s key facets and UDC and Colon Classification systems. She also presented a survey of academic subject areas analysed into facets.

Representing the structural elements of a freely faceted classification

Claudio Gnoli of the University of Pavia talked about freely faceted classifications, in comparison with systems such as UDC. He emphasised the urgency of publishing classifications online, but highlighted the limitations of SKOS and OWL in fully expressing faceted systems, despite the fact that faceted systems are extremely good tools for obtaining precise search results. Faceted systems are also excellent for combining information across disciplines, allowing you to combine aspects of one subject area with aspects of a different one, and interdisciplinarity is becoming increasingly important as an approach, as innovation often happens at the boundaries between disciplines.

He pointed out that a polyhierarchical approach can be modelled in SKOS as a way of representing facets, but that this approach is often overlooked. He also called for more work to be done on SKOS so that it can represent facets directly.
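To give a feel for what the polyhierarchical approach might look like in practice, here is a minimal sketch in Python that gives one compound concept several skos:broader parents, one per facet, and prints the result as Turtle. The concept names, the example.org namespace, and the two helper functions are all invented for illustration, and this is only one possible reading of the approach, not the modelling presented in the talk. The only real vocabulary used is SKOS itself (skos:Concept and skos:broader).

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
EX_PREFIX = "@prefix ex: <http://example.org/scheme/> ."  # hypothetical namespace

# Hypothetical concepts: the compound concept has one broader parent per facet
# (form, language, period), which is what makes the structure polyhierarchical.
BROADER = {
    "MedievalFrenchPoetry": ["Poetry", "FrenchLanguage", "MedievalPeriod"],
    "Poetry": ["LiteraryForm"],
    "FrenchLanguage": ["Language"],
    "MedievalPeriod": ["Period"],
}


def to_turtle(broader):
    """Serialise the concepts and their skos:broader links as Turtle text."""
    lines = [SKOS_PREFIX, EX_PREFIX, ""]
    concepts = sorted(set(broader) | {p for ps in broader.values() for p in ps})
    for concept in concepts:
        lines.append(f"ex:{concept} a skos:Concept .")
        for parent in broader.get(concept, []):
            lines.append(f"ex:{concept} skos:broader ex:{parent} .")
    return "\n".join(lines)


def ancestors(concept, broader):
    """Collect every concept reachable by following skos:broader upwards."""
    found = set()
    stack = list(broader.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(broader.get(parent, []))
    return found


if __name__ == "__main__":
    print(to_turtle(BROADER))
```

Note what is lost: nothing in the output says which parent belongs to which facet, or in what order the facets combine. That information lives only in the heads of the people who built the scheme, which is one way of seeing why direct support for facets would be useful.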

Facet analysis as a tool for modelling subject domains and terminologies

Vanda Broughton, University College London, offered the Bliss Classification as a useful tool for online subject classification, but called for help in how best to publish it for general use. Should it be released as a text document, database, or should work be done to convert it to an ontology – and if so, in what form?

She stressed how the logical approach of facet analysis and regular syntax makes it predictable and hence ideal for machine manipulation.

Analytico-synthetic approach for handling knowledge diversity in media content analysis

Devika P. Madalli, Indian Statistical Institute, DRTC (India), described the Living Knowledge project, which used an analytico-synthetic approach to bring together, around useful themes, diverse content from different sources that used varied means of expression. This supported a rich faceted search system.


Classification meets the Web – UDCC Seminar 2011


This post is the fourth in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011.

Interoperability of knowledge organization systems with and through ontologies

Daniel Kless from the University of Melbourne pointed out that problems with ontologies arise when combining them, as errors in combination can have disastrous effects on subsequent reasoning. A well-defined modelling method is needed to minimise this. Standards such as OWL and RDF do not address the problems of methodology or terminology control.

Towards the integration of knowledge organization systems with the linked data cloud

Vincenzo Maltese of the University of Trento, Italy, explained how it is vital to make clear the semantics and purpose of any ontology when attempting to share Linked Data. Ontologies may differ in their scope, purpose, structure, terminology, language, coverage, formality, and conceptualization. He drew a distinction between descriptive ontologies and classification ontologies. It is very easy to convert a descriptive ontology to a classification ontology and the process can be automated, but extremely difficult to convert a classification ontology to a descriptive one and the process requires human intellectual and editorial effort.

Classification and reference vocabulary in linked environment data

Joachim Fock of the Federal Environment Agency (Germany) talked about how they transformed their keyword thesaurus to a Linked Data format.

Classifications and ontologies on their own terms – UDCC Seminar 2011


This post is the third in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011.

Approaches to providing context in knowledge representation structures

Barbara Kwasnik, Syracuse University (USA), talked about ways that context can be used as a disambiguation tool, and described different kinds of contexts: warrant, scientific, educational, cultural, etc. However, interdisciplinary approaches can be difficult. It is easy to have different ontological commitments, but you need a mapping to know when and which bits need to work across domains. Ontologies will need updating as the world and world views shift and change, so we need ways of defining their scope, as well as provenance and mappings. There are also difficulties in establishing the neutrality of ontologies.

Interaction between elementary structures in universes of knowledge

Richard P. Smiraglia, University of Wisconsin (USA), talked about how people want to turn the multidimensional world into a unidimensional top-down model. He pointed out that people tend to assume UDC is like Dewey, but it actually works far more like Ranganathan’s Colon Classification. He called for new theories of organizing knowledge in shifting contexts and theories about how to mediate between concepts and structures like UDC.

Demystifying ontology

Emad Khazraee, Drexel University (USA), talked about how ontological approaches are as old as literature itself, showing a picture of what I think was the ancient Sumerian king list. He talked about boundary objects and the overlap between different academic areas that are interested in knowledge organisation and learning. He also discussed the differences between ontology-as-categorial-analysis and ontology-as-technology.

The role of classification and ontology on the Web – UDCC Seminar 2011


This post is the second in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011.

Knowledge Organization Systems (KOSs) as hubs in the Web of Data

In a minor change of schedule, Thomas Baker from the DCMI talked about some of the practical issues with using Linked Data. Provenance data can be recorded as additional information, but it is not standardised or an integral part of RDF. This is a growing concern and is receiving attention from W3C. URI persistence and alignment remain concerns for data management and governance.

Aligning web vocabularies

Guus Schreiber also dealt with the problem of making sure we are all talking about the same thing when we try to align our vocabularies. He called for ontologists to be modest about what they can achieve and not to try to hide the problems that occur when you try to transfer an ontology from one domain to another. Errors typically occur due to failures to notice subtle differences between domains.

Vocabulary alignment is a complex business that requires a lot of intellectual effort and multiple techniques should be used to reinforce and support each other. It is much better to map small vocabularies to large ones that can then act as “pivots”.
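The “pivot” idea can be sketched in a few lines of Python. Two small vocabularies are each mapped to one large vocabulary, and candidate alignments between the small ones are then derived by looking for terms that land on the same pivot concept. The vocabularies, the terms, the pivot identifiers, and the function name are all hypothetical, and the output is only a list of candidates: as the talk stressed, a person still has to check whether each pair really means the same thing.

```python
# Hypothetical mappings from two small vocabularies to one large pivot vocabulary.
LIBRARY_TO_PIVOT = {
    "Motor cars": "pivot:Automobiles",
    "Felines": "pivot:Cats",
}
MUSEUM_TO_PIVOT = {
    "Automobile": "pivot:Automobiles",
    "Steam engines": "pivot:SteamEngines",
}


def align_via_pivot(source_to_pivot, target_to_pivot):
    """Return (source term, target term, pivot concept) candidates for review."""
    # Invert the target mapping so each pivot concept lists its target terms.
    pivot_to_target = {}
    for term, pivot in target_to_pivot.items():
        pivot_to_target.setdefault(pivot, []).append(term)

    candidates = []
    for term, pivot in source_to_pivot.items():
        for target in pivot_to_target.get(pivot, []):
            candidates.append((term, target, pivot))
    return candidates
```

With a pivot, each small vocabulary needs only one mapping, to the pivot, rather than a separate mapping to every other small vocabulary.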

There is still no adequate methodology for evaluating alignments nor for mediating consensus between observers. Perhaps there should be a way of recording the strength of consensus and the presence of disagreements and alternative views.

Classification, Collaboration and the Web of Data

Dan Brickley described three types of graph – the hypertext graph of the Internet’s links between documents, the social graph of links between people, and the factual graph of links between data. Currently Linked Data is bringing together the hypertext and factual graphs, and another step would be to add in the social dimension.

He called for a focus on what the various tools can actually do, to be wary of over-evangelical ontologists, and to remember that subject classifications are strong and robust tools that are more appropriate for many types of work than ontologies.

He said that you could expect Linked Data to solve about a third of your information linking problems.

Classification and Ontology – UDCC Seminar 2011


I thoroughly enjoyed the third biennial International UDC Consortium seminar at the National Library of the Netherlands, The Hague, last Monday and Tuesday. The UDC conference website includes the full programme and slides and the proceedings have been published by Ergon Verlag.

This is the first in a series of posts covering the conference.

Aida Slavic, UDC editor-in-chief, opened the conference by pointing out that classification is supposed to be an ordered place, but the systems themselves and the study of them are difficult and complex. We still lack terminology to express and discuss our work clearly. There is now an obvious need to instruct computers to use and interpret classifications and perhaps our work to make our classifications machine readable will also help us explain what we do to other humans.

On being the same as

Professor Patrick Hayes of the Florida Institute for Human and Machine Cognition delivered the keynote address, pointing out that something as simple as asserting that one thing is the same as another is actually incredibly difficult. One of the problems facing the development of the Semantic Web is that people are asserting that two things are the same when actually they are merely similar.

He explained that the formalisms and logic underpinning the Semantic Web are all slimmed down versions of modern 20th century logic based on a particular world view and set of assumptions. This works very well in theory, but once you start applying such logics to the real messy and complex world with real objects, processes, and ideas, the logics are put under increasing stress.

In logic, when two things are referred to as the same, this means they are two different names for the same thing, not that there are two things that are logically equivalent. So Paris the city of my dreams, Paris the administrative area, Paris throughout history, and Paris the capital of France are not necessarily all the same. This means that in logic we have to separate out into different versions aspects of an idea that in ordinary language we think of as the same thing.
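A small sketch in Python shows why this matters once data is involved. If two descriptions are declared to be the same thing (as an identity assertion such as owl:sameAs would do on the Semantic Web), every property of one becomes a property of the other, and any differences turn into contradictions. The two descriptions of Paris, their properties, and the merge_as_same function are invented for the illustration and are deliberately simplistic.

```python
# Hypothetical descriptions of two "Parises" that are similar but not identical.
PARIS_CAPITAL = {"label": "Paris", "kind": "capital city", "country": "France"}
PARIS_ADMIN = {"label": "Paris", "kind": "administrative area", "country": "France"}


def merge_as_same(a, b):
    """Merge two descriptions as if they named one thing.

    Returns the merged description and a sorted list of the properties on
    which the two descriptions disagree (the first value is kept).
    """
    merged = dict(a)
    conflicts = []
    for key, value in b.items():
        if key in merged and merged[key] != value:
            conflicts.append(key)
        else:
            merged[key] = value
    return merged, sorted(conflicts)
```

Merging the two hypothetical descriptions reports a conflict on “kind”: the merged Paris cannot be both a capital city and an administrative area in the strict sense that logic requires, even though in ordinary conversation nobody would notice the difference.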

He described this as the problem of “logic versus Occam” (as in Occam’s razor). Logic drives us to create complexity, in that we have to precisely define every aspect of a concept as a different entity. In order for the Semantic Web to work, we need to be very clear about our definitions so that we don’t muddle up different aspects of a concept.