Category Archives: search

Making KO Work: integrating taxonomies into technology

Estimated reading time 6–10 minutes

The recent ISKO UK event Making KO Work: integrating taxonomies into technology offered four very different but complementary talks, followed by a panel session. These provided a good overview of current practice and largely concluded that although technology has advanced, there is still a need for human intervention in KO work.

Can You Really Implement Taxonomies in Native SharePoint?

Marc Stephenson from Metataxis gave a clear and helpful overview of the key terms and principles you need to know when using taxonomies and folksonomies in SharePoint. SharePoint is very widely used as an enterprise document repository, and although its taxonomy management capabilities are limited, when combined with an external taxonomy management solution, it can enable very effective metadata capture.

The first step is to become familiar with the specialised terminology that SharePoint uses. Metadata in SharePoint is held as “Columns”, which can be System Columns that are fixed and integral to SharePoint functionality, or Custom Columns, which can be changed and which need to be managed by an information architecture role. For example, Columns can be set as “Mandatory” to ensure users fill them in. Columns can be configured to provide picklists or lookups, as well as being free text, and can be specified as “numeric”, “date” etc. Taxonomies can be included as “Managed Metadata”.

Different “Content Types” can be defined, for example to apply standardised headers and footers to documents, enforce workflow, or apply a retention/disposal policy, and many different pre-defined Content Types are available. Taxonomies are referred to as “Managed Term Sets”, and these can be controlled by a taxonomist role. “Managed Keywords” are essentially folksonomic tags, but SharePoint allows these to be transferred into Managed Term Sets, enabling a taxonomist to choose folksonomic tags to become part of more formal taxonomies.

The “Term Store Manager” provides some functionality for taxonomy management, such as adding synonyms (“Other Labels”), or deprecating terms so that they can no longer be found by users when tagging (but remain available for search). Terms can also be deleted, but that should only be done if there is a process for re-tagging documents, because a deleted tag will generate a metadata error the next time someone tries to save the document. Limited polyhierarchy is possible, because the same term can exist in more than one “Managed Term Set”.
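
The synonym and deprecation behaviour described above can be pictured with a small model. This is a minimal Python sketch and not SharePoint's API: the `Term` and `TermSet` classes, their method names, and the example terms are all hypothetical, invented only to show how a deprecated term can be hidden from tagging while still resolving in search.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One term in a term set (hypothetical model, not a SharePoint class)."""
    label: str
    other_labels: set = field(default_factory=set)  # synonyms ("Other Labels")
    deprecated: bool = False


class TermSet:
    """A flat, illustrative term set keyed by preferred label."""

    def __init__(self, name):
        self.name = name
        self._terms = {}

    def add(self, label, other_labels=()):
        self._terms[label] = Term(label, set(other_labels))

    def deprecate(self, label):
        # Deprecated terms stay in the set, so existing tags remain valid.
        self._terms[label].deprecated = True

    def _matches(self, text):
        wanted = text.lower()
        for term in self._terms.values():
            labels = {term.label, *term.other_labels}
            if any(wanted == candidate.lower() for candidate in labels):
                yield term

    def available_for_tagging(self, text):
        """Preferred labels offered when tagging: deprecated terms are hidden."""
        return [t.label for t in self._matches(text) if not t.deprecated]

    def available_for_search(self, text):
        """Preferred labels usable in search: deprecated terms still resolve."""
        return [t.label for t in self._matches(text)]


# Invented example data.
buildings = TermSet("Buildings")
buildings.add("Cathedral", other_labels={"Minster"})
buildings.add("Chapel")
buildings.deprecate("Chapel")
```

In this sketch a synonym resolves to its preferred label, and deprecating a term only changes what is offered at tagging time. Deleting a term outright would instead leave existing tags pointing at nothing, which is the situation the re-tagging process is meant to avoid.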

“Term Groups” can be defined, which can be useful if different departments want to manage their own taxonomies.

There are various limitations. A Term Set can hold a maximum of 30,000 Managed Terms, and if SharePoint is deployed online across a large organisation, changes can take some time to propagate throughout the system. The process of importing taxonomies needs to be managed carefully, as there is no way to re-import or over-write Term Sets (you would end up with duplicate sets) and there is no easy way to export taxonomies. There is no provision for term history or scope notes, and no analytics, so SharePoint lacks full taxonomy management functionality.

There are companion taxonomy management products (e.g. SmartLogic’s Semaphore, or Concept Searching) and it is possible to use other taxonomy management tools (such as PoolParty, Synaptica, or MultiTes) but an additional import/export process would need to be built.

So, SharePoint offers a lot of options for metadata management, but is better as a taxonomy deployment tool than a master taxonomy management tool.

Integrating Taxonomy with Easy, Semantic Authoring

Joe Pairman of Mekon Ltd demonstrated a very user-friendly, lightweight set of tagging tools that let non-expert users add rich metadata to content as they work. This addresses a key problem for taxonomists – how to ensure subject matter experts or authors who are more focused on content than metadata are able to tag consistently, quickly, and easily. By taking a form-based approach to content creation, authors are able to add structural metadata as they work, and add tags to specific words with a couple of clicks. This is particularly effective with a pre-defined controlled vocabulary.

The example Joe showed us was a very clear commercial use case of Linked Data, because the controlled vocabulary was very specific – products for sale. Each product was associated with a DBpedia concept, which provided the URI, and where a match to the text was detected the relevant word was highlighted. The user could then click on that word, see the suggested DBpedia concept, and click to tag. The tool (using FontoXML and Congility technology) then applied the relevant RDF to the underlying XML document “behind the scenes”, in a process of “inline semantic enrichment”. This approach enables accurate, author-mediated tagging at a very granular level. The customers reading the content online could then click on the highlighted text and the relevant products could be displayed with an “add to cart” function, with the aim of increasing sales. As an added bonus, the tags are also available for search engines, helping to surface highly relevant content in search results. (Schema.org tags could also be included.)
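
The suggest-then-confirm flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FontoXML or Congility implementation: the vocabulary entries, the example URIs, the function names, and the choice of an RDFa-style `resource` attribute on a `span` are all assumptions made for the example.

```python
import re
from html import escape

# Hypothetical controlled vocabulary: surface form -> concept URI.
# The URIs are illustrative and have not been checked against DBpedia.
VOCABULARY = {
    "espresso machine": "http://dbpedia.org/resource/Espresso_machine",
    "kettle": "http://dbpedia.org/resource/Kettle",
}


def suggest_tags(text, vocabulary=VOCABULARY):
    """Return (start, end, uri) for each vocabulary term found in the text."""
    suggestions = []
    for label, uri in vocabulary.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            suggestions.append((match.start(), match.end(), uri))
    return sorted(suggestions)


def apply_tags(text, accepted):
    """Wrap each accepted span in an inline element carrying the concept URI.

    Assumes the accepted spans do not overlap.
    """
    parts, cursor = [], 0
    for start, end, uri in sorted(accepted):
        parts.append(escape(text[cursor:start]))
        parts.append(
            '<span resource="{}">{}</span>'.format(
                escape(uri, quote=True), escape(text[start:end])
            )
        )
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


sentence = "Descale your kettle before using the espresso machine."
suggested = suggest_tags(sentence)
# In the workflow described above the author confirms each suggestion;
# here every suggestion is simply accepted.
enriched = apply_tags(sentence, suggested)
```

Keeping suggestion and application as separate steps mirrors the author-mediated approach: the matcher only proposes spans, and nothing is written into the document until a person has accepted it.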

Enhancement of User Journeys with SOLR at Historic England

Richard Worthington of Historic England described the problems they had when deploying a SOLR/Lucene search to their documents without any taxonomy or thesaurus support for searching. They soon found that SQL searches were too blunt an instrument to provide useful results – for example, searching for “Grant” at first would bring up the page about the grants that were offered, but as soon as they added more data sets, this frequently searched-for page became buried under references to Grantchester, Grantham, etc.

Although they could manage relevancy to a certain extent at the data set level and by selecting “top results” for specific searches, the search team realised that this would be a painstaking and rigid process. It would also not address the problem that many terms used by the subject matter expert authors were not the same as the terms general users were searching for. For example, general users would search for “Lincoln Cathedral” rather than “Cathedral Church of St Mary of Lincoln”. So, they have much work for human taxonomists and thesaurus editors to do.
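
Both problems described above, blunt matching and the gap between authors' terms and users' terms, can be shown with a toy example. This is a minimal Python sketch and not Historic England's configuration or SOLR's query syntax: the document texts, the thesaurus entry, and the `expand` and `search` functions are invented for illustration.

```python
# Hypothetical thesaurus: the term users type -> the terms authors use.
THESAURUS = {
    "lincoln cathedral": ["cathedral church of st mary of lincoln"],
}

# Invented miniature collection.
DOCUMENTS = {
    "grants": "Grant: funding we offer for heritage projects",
    "grantham": "Listed buildings in Grantham",
    "grantchester": "Grantchester conservation area",
    "lincoln": "Cathedral Church of St Mary of Lincoln: list entry",
}


def expand(query, thesaurus=THESAURUS):
    """Return the normalised query plus any equivalent terms from the thesaurus."""
    normalised = query.lower().strip()
    return [normalised] + thesaurus.get(normalised, [])


def search(query, documents=DOCUMENTS):
    """Return ids of documents containing the query or one of its equivalents.

    Matching is deliberately naive substring matching, to show the problem.
    """
    variants = expand(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(variant in text.lower() for variant in variants)
    ]
```

Here `search("grant")` returns the grants page along with Grantham and Grantchester, which is the burying effect described above, while `search("Lincoln Cathedral")` finds the list entry only because a human editor has supplied the equivalence in the thesaurus.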

Applied Taxonomy Frameworks: Your Mileage May Vary

Alan Flett of SmartLogic took us through the latest enhancements to their products, showcasing a new feature called “Fact Extraction”. This works by identifying the context around specific data and information, in order to drive Business Intelligence and Analytics. The tool is essentially a user-friendly simplified algorithm builder that allows very specific searches to be constructed using pre-defined “building blocks”, such as “Facts”, “Entities”, and “Skips”. This means a specific piece of information, words to ignore, and entities such as a number or a date can be specified to construct a complex search query. This allows the search results to be defined by context and returned in context, and is especially effective for well-structured data sets. It also means that results are framed in a standardized format, which is useful for analytics.
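
The idea of assembling a query from building blocks can be sketched with regular expressions. This is an illustration only and not SmartLogic's implementation: the `fact`, `skip`, `entity`, and `build` helpers, the entity patterns, and the sample sentence are all assumptions made for the example.

```python
import re

# Illustrative entity patterns only.
ENTITIES = {
    "number": r"\d+(?:,\d{3})*(?:\.\d+)?",
    "date": r"\d{1,2} [A-Z][a-z]+ \d{4}",
}


def fact(phrase):
    """A literal piece of text that must appear."""
    return re.escape(phrase)


def skip(max_words):
    """Up to max_words intervening words to ignore."""
    return r"(?:\s+\S+){0,%d}?" % max_words


def entity(kind, name):
    """A typed value to capture under the given name."""
    return r"\s+(?P<%s>%s)" % (name, ENTITIES[kind])


def build(*blocks):
    """Join the blocks, in order, into one compiled pattern."""
    return re.compile("".join(blocks))


grant_rule = build(
    fact("grant"),
    skip(3),
    entity("number", "amount"),
    skip(4),
    entity("date", "awarded"),
)

# Invented sample sentence.
text = "A grant of up to 25,000 pounds was awarded on 3 March 2015 to the trust."
match = grant_rule.search(text)
result = match.groupdict() if match else {}
```

Because every captured value comes back under a fixed name, each match has the same shape, which is the property that makes this style of extraction convenient for analytics.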

Concluding Panel

Although techniques such as automated classification, machine learning, and AI are progressing all the time, these still work best when combined with a well-structured knowledge base. Creating that knowledge base relies on human intelligence, especially for the familiar problems of disambiguation and synonym collection, in particular where the content authors have a different approach or level of domain expertise to the end users of the search systems. The panel agreed that both for the creation of thesauruses, taxonomies, and ontologies and for the deployment of these in tagging, semi-automated approaches remain necessary, and so there is still much to be done by human taxonomists, ontologists, and information architects in order to make knowledge organisation work.

Image: Lincoln Cathedral. Photo by Zaphad1

Inadvertent Cruelty – Algorithmic or Organizational?

Estimated reading time 3–4 minutes

In 2013 I asked whether social media were mature enough to handle bereavement in a sensitive manner. Last week Facebook released the options either to have your account deleted when you die or to nominate a trusted legacy manager to take it on for you as a memorial (Facebook rolls out feature for users when they die).

This was in response to the distress of relatives who wished to retrieve a lost loved one’s account or did not want to undergo the eerie experience of receiving automated reminders of their birthday or seeing their name or image appear unexpectedly in advertising. The enforced “Year in Review” offerings at the end of last year brought some publicity to the issue, as they also inadvertently caused distress by failing to consider the feelings of people who had suffered bereavements during the year. The original blog post about this (Inadvertent Algorithmic Cruelty) went viral last Christmas. The author quickly called for an end to a wave of casual responses that jumped to glib conclusions about young privileged staff just not thinking about anything bad ever happening (Well, That Escalated Quickly).

A more cynical response is that the minority of people who would not want to have year in review posts were deliberately dismissed as ‘edge cases’ – possibly even as a coldly calculated costs v. benefits decision, as providing “opt out” options might have required additional work or been seen as dispensable mouseclicks.

I have no idea what happened at Facebook, or what discussions, processes, and procedures they go through; the public apologies from Facebook do not go into that level of detail. However, although “algorithmic cruelty” may be unintentional, it is not a new phenomenon, and there are plenty of opportunities during the design and implementation of any project to think through the potential adverse impacts or pitfalls.

David Crystal at an ISKO UK conference in 2009 talked about the problem of avoiding inappropriate automated search engine placement of advertisements, for example ads for a set of kitchen knives alongside a story about a fatal stabbing. There was a certain naivety with early automated systems, but it did not take long for the industry in general to realise that unfortunate juxtapositions are not unusual incidents. Most people who have worked in semantics have plenty of anecdotes of either cringeworthy or hilarious mismatches and errors arising from algorithmic insensitivity to linguistic ambiguity.

Facebook’s latest thoughtlessness arises more from a failure to respect their users than through lack of sophistication in their algorithm (there doesn’t seem to be anything particularly complex about selecting photos and bunging some automated captions on them). Simply offering users the choice to look or not look or giving users the tools to build their own would have spared much heartache.

The origins of UX, championed by people such as Don Norman, Peter Morville, and Louis Rosenfeld, placed user needs front and centre. Good design was about seeing your users as real people with physical and emotional needs as human beings, and designing to help their lives go more smoothly, rather than designing to exploit them as much as possible.


Aggregations and basic categories

Estimated reading time 2–3 minutes

I recently enjoyed reading about the work Safari are currently doing to create a controlled vocabulary and topic aggregation pages to underpin navigation and discovery of their content.

Iterate, again

I very much liked the mix of manual and automated techniques the team used to maximise capturing value from existing resources while using machine processing to help and support the human editorial curation work. Lightweight iterative approaches have become standard in some areas of design, but achieving high quality information structures also usually requires several stages of revision and refinement. It is not always possible to predict what will happen in attempts to index or repurpose existing content, nor how users will respond to different information structures, and so the ability to iterate, correct, re-index, correct, adjust indexing methods, re-index, correct… is vital. Small samples of content are often not sufficient to find all potential issues or challenges, so it is always worth being prepared for surprises once you scale up.

Basics, as always

The Safari team identified the huge intellectual value locked into the existing human-created indexes and it is great to see them being able to extract some of that value, but then augment it using automated techniques. I was very interested to read about how the level of granularity in the individual indexes was too fine for overall aggregation. The team realised that there were “missing subtopics” – key topics that tended to be the subjects of entire books. These “missing subtopics” were found at the level of book titles and it struck me that this vital level of conceptualization aligns directly with Eleanor Rosch’s work on basic categories and prototype theory. It is not surprising that the concepts that are “basic categories” to the likely readership would be found at book title level, rather than index level.

This is further illustrated by the fact that the very broad high level topics such as “business” did not work well either. These needed not to be “clustered up”, but broken down and refined to the level of the “basic categories” that people naturally think of first.

So, the Safari team’s work is a very clear illustration of not only how to combine manual and automated techniques but also how to find the “basic categories” that match users’ natural level of thinking about the subject area.


How semantic search helps girls and boys, but in different ways

Estimated reading time 2–4 minutes

While researching something else, I happened upon this rather cheering paper: The effect of semantic technologies on the exploration of the web of knowledge by female and male users. Gender issues only tangentially affect my core research, as I generally focus on linguistic communities that are defined by organizational or professional context, so gender effects are rather diluted by that point. I also prefer collapsing false dichotomies rather than emphasizing difference and division, and so I was very heartened that this article shows how semantic techniques can be unifying.

The study is based on observing the search strategies of a group of male and a group of female students in Taiwan. Given non-semantic search systems to use, the male students tended to search very broadly and shallowly, skimming large numbers of results and following links and going off on tangents to find other results. This enabled them to cover a lot of ground, often finding something useful, but also often left them with a rather chaotic collection of results and references. The female students tended to search very deeply and narrowly, often stopping to read in depth a paper that they had found, and trying to fully grasp the nature of the results that had been returned. This meant they tended to collect fewer results overall, the results tended to be clustered around a single concept, and they risked falling into the rather wonderfully named “similarity holes”. These “similarity holes” are search traps where a single search term or collection of terms leads to a small set of results and are essentially “dead ends”.

How did semantic search help?

When the students were given semantic search tools, the male students continued to search broadly and shallowly but the semantic associations helped them to conceptualize and organize what they were doing. This meant that they ended up with a far more coherent, relevant, and useful set of search results and references. In contrast, the female students, using the semantic associations offered, found it far easier to broaden their searches and to come up with alternative search terms and approaches, enabling them to avoid and break out of any “similarity holes” they fell into.

Gender effects dissipate

I was very heartened that improvements in technology can be gender-neutral: they can simply be improvements that benefit everyone in different ways, and they don’t have to deliberately try to account for gender difference. I was also very heartened to note that the researchers found that gender differences in search strategies dissipated once students were taught advanced information seeking and knowledge management strategies. Gender differences were only apparent in novice, inexperienced searchers. So, in information seeking work at least, any biological or socially created gender differences are weak and easily overcome with some well-directed instruction, and semantic techniques are a help rather than a hindrance.

Semantic Search – Call for Papers for Special Issue on Semantic Search for Aslib Journal

Estimated reading time 4–6 minutes

This special issue aims to explore the possibilities and limitations of Semantic Search. We are particularly interested in papers that place carefully conducted studies into the wider framework of current Semantic Search research in the broader context of Linked Open Data.

Research into Semantic Search and its applications has been gaining momentum over the last few years, with an increasing number of studies on general principles, proof of concept and prototypical applications. The market for Semantic Search applications and its role within the general development of (internet) technologies and its impact on different areas of private and public life have attracted attention. Simultaneously, many publicly funded projects in the field of cultural heritage were initiated. Researchers in many disciplines have been making progress in the establishment of both theories and methods for Semantic Search. However, there is still a lack of comparison across individual studies, as well as a need for standardisation regarding the differentiation of Semantic Search from other search solutions, agreed-upon definitions, and technologies and interfaces.

Semantic Search research is often based on large and rich data sets and a combination of techniques ranging from statistical bag of words approaches and natural-language-processing enriched via a subtle utilisation of metadata over classificatory approaches right up to ontological reasoning. Over the last 10 years a lot of initial technical and conceptual obstacles in the field of Semantic Search have been overcome. After the initial euphoria for Semantic Search that resulted in a technically driven supply of search solutions, appraisal of successful and less successful approaches is needed. Amongst other things the limitations of working with open world solutions on – only apparently comprehensive – linked open data sets compared to small domain specific solutions need to be determined.
One ongoing challenge for semantic search solutions is their usability and user acceptance, as only highly usable walk-up-and-use-approaches stand a chance in the field of general search.

For this special issue, we invite articles which address the opportunities and challenges of Semantic Search from theoretical and practical, conceptual and empirical perspectives.

Topics of interest include but are not restricted to:

  • The history of semantic search – how the latest techniques and technologies have come out of developments over the last 5, 10, 20, 100, 2000… years
  • Technical approaches to semantic search: linguistic/NLP, probabilistic, artificial intelligence, conceptual/ontological …
  • Current trends in Semantic Search
  • Best practice – how far along the road from ‘early adopters’ to ‘mainstream users’ has semantic search gone so far?
  • Semantic Search and cultural heritage
  • Usability and user experience of Semantic Search
  • Visualisation and Semantic Search
  • Quality criteria for Semantic Search
  • Impact of norms and standardisation (for instance, ISO 25964 “Thesauri for information retrieval”) and the potential of Semantic Search
  • How are semantic technologies fostering a need for cross-industry collaboration and standardisation?
  • How are Semantic Search techniques and technologies being used in practice?
  • Practical problems in brokering consensus and agreement – defining concepts, terms and classes, etc.
  • Curation and management of ontologies
  • Differences between web-scale, enterprise scale, and collection-specific scale techniques
  • Evaluation of Semantic Search solutions
  • Comparison of data collection approaches
  • User behaviour and the evolution of norms and conventions
  • Information behaviour and information literacy
  • User surveys
  • Usage scenarios and case studies

Submissions

Papers should clearly connect their studies to the wider body of Semantic Search scholarship, and spell out the implications of their findings for future research. In general, only research-based submissions including case studies and best practice will be considered. Viewpoints, literature reviews or general reviews are generally not acceptable.

Papers should be 4,000 to 6,000 words in length (including references). Citations and references should be in our journal style.

Please see the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=ap for more details and submission instructions.
Submissions to Aslib Proceedings are made using ScholarOne Manuscripts, the online submission and peer review system. Registration and access is available at http://mc.manuscriptcentral.com/ap.

Important Dates

Paper submission: 15.12.2013
Notice of review results: 15.02.2014
Revisions due: 31.03.2014
Publication: Aslib Proceedings, issue 5, 2014.

About the Journal

Aslib Proceedings (ISSN: 0001-253X) is a peer-reviewed high-quality journal covering international research and practice in library and information science, and information management. The journal is the major publication for ASLIB – the Association for Information Management in the United Kingdom – a membership association for people who manage information and knowledge in organisations and the information industry.
Information about the journal can be found at
http://www.emeraldinsight.com/products/journals/journals.htm?id=ap

Contact the guest editors

Prof. Dr. Ulrike Spree
- Hamburg University of Applied Sciences -
Faculty Design, Medien and Information
Department Information
Finkenau 35
20081 Hamburg
Phone: +49/40/42875/3607
Email: ulrike.spree@haw-hamburg.de

Fran Alexander
Information Architect, BCA Research (2013- )
Taxonomy Manager, BBC Information and Archives (2009-13)
Email: fran@vocabcontrol.com
Twitter: @frangle

To index is to translate

Estimated reading time 3–4 minutes

Living in Montreal means I am trying to improve my very limited French and in trying to communicate with my Francophone neighbours I have become aware of a process of attempting to simplify my thoughts and express them using the limited vocabulary and grammar that I have available. I only have a few nouns, fewer verbs, and a couple of conjunctions that I can use so far and so trying to talk to people is not so much a process of thinking in English and translating that into French, as considering the basic core concepts that I need to convey and finding the simplest ways of expressing relationships. So I will say something like “The sun shone. It was big. People were happy” because I can’t properly translate “We all loved the great weather today”.

This made me realise how similar this is to the process of breaking down content into key concepts for indexing. My limited vocabulary is much like the controlled vocabulary of an indexing system, forcing me to analyse and decompose my ideas into simple components and basic relationships. This means I am doing quite well at fact-based communication, but my storytelling has suffered as I have only one very simple emotional register to work with. The best I can offer is a rather laconic style with some simple metaphors: “It was like a horror movie.”

It is regularly noted that ontology work in the sciences has forged ahead of that in the humanities, and the parallel with my ability to express facts but not tell stories struck me. When I tell my simplified stories I rely on shared understanding of a broad cultural context that provides the emotional aspect – I can use the simple expression “horror movie” because the concept has rich emotional associations, connotations, and resonances for people. The concept itself is rather vague, broad, and open to interpretation, so the shared understanding is rather thin. The opposite is true of scientific concepts, which are honed into precision and a very constrained definitive shared understanding. So, I wonder how much of my sense that I can express facts well is actually an illusion, and it is just that those factual concepts have few emotional resonances.

A major aspect of poetry is about extending the meanings of words to their limits, to allow for the maximum emotional resonance and personal interpretation. Perhaps poetry speaks to individuals precisely because it doesn’t evoke a shared understanding but calls out new meanings and challenges the reader to think differently, to find new meanings? This is the opposite of indexing, which is about simplifying and constraining to the point at which all the fuzziness is driven away and you are left with nothing but “dead metaphors”. The only reason indexing the sciences seems easier is because so many scientific concepts have been analyzed and defined to this point already, doing much of the indexer’s work for them.

I am not sure if these musings have any practical applications. People sometimes ask me if I think my previous studies of languages and literature have helped in my current work. I have known many excellent monolingual indexers but am also aware that many people who are good at semantics speak more than one language. However, I am sure it is helpful to think of the process of indexing as a form of translation, even if the idea of removing all the poetry from language in order to create a usable, useful index is not at all romantic!

Tagging the cart before the horse – Getting your project plan in order

Estimated reading time 6–9 minutes

When people launch search improvement or information organization projects, one of the commonest mistakes is to be over-eager to “just get the content indexed or tagged” without spending enough time and thought on the structure of an index, what should be tagged, and how the tags themselves should be structured.

This typically happens for two reasons:
1. The project managers – often encouraged by service providers who just want to get their hands on the cheque – simply underestimate the amount of preparatory work involved, whether it is structuring and testing a taxonomy, setting up and checking automated concept extraction rules, or developing a comprehensive domain model and tag set, so they fail to include enough – if any – of a development and testing stage in the plan. This often happens when the project is led by people who do not work closely with the content itself. Projects led by marketing or IT departments often fall into this trap.

2. The project managers include development and testing, with iterative correction and improvement phases, but are put under pressure to cut corners, or to compress deadlines. This tends to happen when external forces affect timescales – for example local government projects that have to spend the budget before the end of the financial year. It can also happen when stakeholder power is unevenly distributed – for example, the advice of information professionals is sought but then over-ruled by more powerful stakeholders who have a fixed deadline in mind, such as launching a new website in time for the Christmas market.

Forewarned is forearmed

Prevention is better than cure in both these scenarios, but easier said than done. Your best defence is to understand organizational culture, politics, and history and to evangelize the role and importance of information work and your department. Find out which departments have initiated information projects in the past, which have the biggest budgets, which have the most proactive leadership teams, then actively seek allies in those departments. Find out if there are meetings on information issues you could attend, offer to help, or even do something like conduct a survey on information use and needs and ask for volunteers to be interviewed.  Simply by talking to people at any level in those departments you will start to find out what is going on, and you will remind people in those departments of your existence and areas of expertise.

On a more formal level, you can look at organizational structures and hierarchies and make sure that you have effective chains of communication that follow chains of command. This may mean supporting your boss in promoting the work of your department to their boss. This is especially important in organizations with lots of layers of middle management, as middle managers can get so caught up in day to day work that longer term strategy can get put on the back burner, so offer support.

If you find out about projects early enough, you have a chance of influencing the project planning stages to make sure information and content issues are given the attention they need, right from the start.

Shutting the stable door…

Sometimes despite our best efforts we end up in a project that is already tripping over itself. A common scenario is for tagging work to be presented as a fait accompli. This is particularly likely with fully automated tagging work, as processing can be done far faster than any manual tagging effort. However, it is highly unusual for any project to be undertaken without its being intended to offer some sort of service or solve some recognized problem.

Firstly, assess how well it achieves its intended goals. If you have only been called into the project at the late stage, is this because it is going off the rails and the team want a salvage solution, or is it because it works well in one context and the team want to see if it can be used more widely? If it is the latter, that’s great – you can enjoy coming up with lots of positive and creative proposals. However, the core business planning principles are pretty much the same whether you are proposing to extend a successful project or corralling one that is running out of control.

Once you know what the project was meant to achieve, assess how much budget and time you have left, as that will determine the scope to make changes and improvements. Work out what sort of changes are feasible. Can you get an additional set of tags applied for example? Can you get sets of tags deleted? Are you only able to make manual adjustments or can you re-run automated processes? How labour intensive are the adjustment processes? Is chronology a factor – in other words can you keep the first run for legacy content but evolve the processes for future content?

These assessments are especially valuable for projects that are at an intermediate stage as there is much more scope to alter their direction. In these cases it is vital to prioritize and focus on what can be changed in a pragmatic way. For example, if the team are working chronologically through a set of documents, you may have time to undertake planning and assessment work focused on the most recent and have that ready before they get to a logical break point. So, you prioritize developing a schema relevant to the current year, and make a clean break on a logical date, such as January 1. If they have been working topic by topic, is there a new search facet you could introduce and get a really good set for that run as a fresh iteration?

If there are no clean breakpoints or clear sets of changes to be made, focus on anything that is likely to cause confusion for users or serious information management problems in future. What is likely to cause real pain points? Which of those are the worst?

Once you have identified the worst issues and clarified the resources you have for making the changes, you have the basis for working up the time and money you need to carry them out. This can form the basis of your business case and project plan either to improve a faltering project and pull it back on track or to add scope to a project that is going well.

…after the horse has bolted

If there is limited scope to make changes, and the project is presented as already complete, it is still worth assessing how well it meets its goals as this will help you work out how you can best use and present the work that has been done. For example, can it be offered as an “optional extra” to existing search systems?

It is also worth assessing the costs and resources involved in making the changes you would recommend, even if it seems there is no immediate prospect of getting that work done. It is likely that sooner or later someone will want to re-visit the work, especially if it is not meeting its goals. Then it will be useful to know whether it can be fixed with a small injection of resource or whether it requires a major re-working, or even abandoning and starting afresh. Such a prospect may seem daunting, but if you can learn lessons and avoid repeating mistakes the next time around, then that can be seen as a positive. If one of the problems with the project was the lack of input from the information team early on, then it is worth making sure, for the sake of the information department and the organization as a whole, that the same mistake does not happen again. If you demonstrate well enough how you would have done things differently, you might even get to be in charge next time!

ISKO UK 2013 – provisional programme

Estimated reading time 2 minutes

I will probably be on the other side of the Atlantic when the ISKO UK conference takes place in July in London, UK. I will be sorry to miss it, because the committee have brought together a diverse, topical, and fascinating collection of speakers.

ISKO UK excels in unifying academic and practitioner communities, and the conference promises to investigate the barriers that separate research from practice and to seek out boundary objects that can bring the communities together.

This is demonstrated in person by the keynote speakers Patrick Lambe of Straits Knowledge and Martin White of Intranet Focus Ltd – both respected for their commercial as well as academic contributions to the field of Knowledge Organization.

Amidst what is already shaping up to be a very full and varied programme, the presentations by Jeremy Tarling and Matt Shearer (BBC News) and Jarred McGinnis and Helen Lippell (Press Association) will show how research in semantic techniques is now being put to practical use in managing the fast-flowing oceans of information that news organizations handle.

The programme also includes a whole session on combining ontologies with other tools, as well as papers on facet analysis and construction of controlled vocabularies. There’s even some epistemology to please pure theoreticians.

Libraries, Media, and the Semantic Web meetup at the BBC

Estimated reading time 3–4 minutes

In a bit of a blog cleanup, I discovered this post languishing unpublished. The event took place earlier this year but the videos of the presentations are still well worth watching. It was an excellent session with short but highly informative talks by some of the smartest people currently working in the semantic web arena. Videos of the event are available on YouTube.

Historypin

Jon Voss of Historypin was a true “information altruist”, describing libraries as a “radical idea”. The concept that people should be able to get information for free at the point of access, paid for by general taxation, has huge political implications. (Many of our libraries were funded by Victorian philanthropists who realised that an educated workforce was a more productive workforce, something that appears to have been largely forgotten today.) Historypin is seeking to build a new library, based on personal collections of content and metadata – a “memory-sharing” project. Jon eloquently explained how the Semantic Web reflects the principles of the first librarians in that it seeks ways to encourage people to open up and share knowledge as widely as possible.

MIMAS

Adrian Stevenson of MIMAS described various projects including Archives Hub, an excellent project helping archives, and in particular small archives that don’t have much funding, to share content and catalogues.

rNews

Evan Sandhaus of the New York Times explained the IPTC’s rNews – a news markup standard that should help search engines and search analytics tools to index news content more effectively.

schema.org

Dan Brickley’s “compare and contrast” of Universal Decimal Classification with schema.org was wonderful, and he reminded technologists that it is very easy to forget that librarians and classification theorists were attempting to solve search problems far in advance of the invention of computers. He showed an example of “search log analysis” from 1912, queries sent to the Belgian international bibliographic service – an early “semantic question answering service”. The “search terms” were fascinating and not so very different to the sort of things you’d expect people to be asking today. He also gave an excellent overview of Lonclass, the BBC Archive’s largest classification scheme, which is based on UDC.

BBC Olympics online

Silver Oliver described how BBC Future Media is pioneering semantic technologies and using the Olympic Games to showcase this work on a huge and fast-paced scale. By using semantic techniques, dynamic rich websites can be built and kept up to the minute, even once results start to pour in.

World Service audio archives

Yves Raimond talked about a BBC Research & Development project to automatically index World Service audio archives. The World Service, having been a separate organisation to the core BBC, has not traditionally been part of the main BBC Archive, and most of its content has little or no useful metadata. Nevertheless, the content itself is highly valuable, so anything that can be done to preserve it and make it accessible is a benefit. The audio files were processed through speech-to-text software, and then automated indexing was applied to generate suggested tags. The accuracy rate is about 70%, so human help is needed to sort out the good tags from the bad (and occasionally offensive!) tags, but this is still a lot easier than tagging everything from scratch.
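The human review step can be pictured with a small sketch. This is not the BBC's implementation: the function names `review_tags` and `acceptance_rate`, the suggested tags, and the reviewer's rejections are all invented, with the sample chosen only so that seven of ten suggestions survive, in line with the roughly 70% accuracy mentioned in the talk.

```python
def review_tags(suggested, rejected):
    """Split machine-suggested tags into kept and discarded lists,
    based on the set of tags a human reviewer has rejected."""
    kept = [tag for tag in suggested if tag not in rejected]
    discarded = [tag for tag in suggested if tag in rejected]
    return kept, discarded


def acceptance_rate(kept, discarded):
    """Fraction of suggested tags that the reviewer kept."""
    total = len(kept) + len(discarded)
    return len(kept) / total if total else 0.0


# Invented example: ten tags suggested for one programme by automated indexing.
suggested = [
    "elections", "Nigeria", "radio", "agriculture", "cricket",
    "interview", "Lagos", "banana", "trade", "weather",
]

# Invented reviewer decisions: three suggestions judged wrong for this programme.
rejected = {"cricket", "banana", "weather"}

kept, discarded = review_tags(suggested, rejected)
rate = acceptance_rate(kept, discarded)
```

The reviewer only has to strike out the bad suggestions rather than think up every tag, which is where the saving over tagging from scratch comes from.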