Category Archives: libraries and museums

The Information Master – Louis XIV’s Knowledge Manager


I recently read The Information Master: Jean-Baptiste Colbert’s Secret State Intelligence System by Jacob Soll. It is a very readable but scholarly book that tells the story of how Colbert used the accumulation of knowledge to build a highly efficient administrative system and to promote his own political career. He seems to have been the first person to seize upon the notion of “evidence-based” politics: the idea that knowledge, information and data collection, and scholarship could be put to the service of statecraft. In this way he is an ancestor of much of the thinking that is commonplace not only in today’s political administrations but also in any organization that values the collection and management of information. The principle sits at the heart of what we mean by the “knowledge economy”.

The grim librarian

Jean-Baptiste Colbert (1619-83) is depicted as ruthless, determined, fierce, and serious. He was an ambitious man and saw his ability to control and organize information as a way of gaining and then keeping political influence. By first persuading the King that an informed leadership was a strong and efficient leadership, and then by being the person who best understood and knew how to use the libraries and resources he collected, Colbert rose to political prominence. However, his work eventually fell victim to the political machinations of his rivals and after his death his collection was scattered.

Using knowledge to serve the state

Before Colbert, the scholarly academic tradition in France had existed independently from the monarchy, but Colbert brought the world of scholarship into the service of the state, believing that all knowledge – even from the most unlikely of sources – had potential value. This is very much in line with modern thinking about Big Data and how it can be used in the service of corporations. Even the most unlikely of sources might contain useful insights into customer preferences or previously unseen supply chain inefficiencies, for example.

Colbert’s career was caught up in the political machinations of the time. He worked as a kind of accountant to Cardinal Mazarin, but when Mazarin’s library was ransacked by political rivals and his librarian fell out of favour, Colbert restored the library and built a unified information system that combined scholarship with administrative documentation, ending the former division between academia and government bureaucracy.

Importance of metadata

Colbert also instinctively grasped the importance of good metadata, cataloguing, and an accurate network of links and cross references for obtaining relevant and comprehensive information quickly; these issues are more urgent than ever given the information explosion that modern organizations – and indeed nations – now face. This made him a better administrator than his rivals, and by becoming not only the source of politically expedient information but also the person who knew how to use those information resources most effectively, he gained political influence and became a key minister under Louis XIV.

A personal vision

I was struck by how much his vast library, archive, and document management system was the result of his own personal vision, how it was built on the dismantling and rebuilding of the work of his predecessors, and how, after his death, the system itself fell victim to political changes and was left to collapse. This pattern is repeated frequently in modern information projects. So often the work of the original system’s champion is wasted as infighting that often has nothing to do with the information project itself leads to budget cuts, staff changes, or other problems that leave the system to decay.

Soll argues that the loss of Colbert’s system hampered political administration in France for generations. Ironically, it was Colbert’s own archives that enabled successive generations of political rivals to find the documents with which to undermine the power of the crown, showing the double-edged nature of information work. It is often the same collections that can both redeem and condemn.

Secrecy or transparency?

Another theme that ran throughout Colbert’s career, with both political and practical implications, was the tension between demands for transparent government and the desire for a secret state. Much of the distinction between public and private archives was simply a matter of who was in control of them and who had set them up, so the situation in France under the monarchy was different to the situation in England where Parliament and the Monarchy maintained entirely separate information systems. In France, an insistence on keeping government financial records secret eventually undermined trust in the economy. Throughout his career Colbert was involved in debates over which and how much government information should be made public, with different factions arguing over the issue – arguments that are especially resonant today.

Semantic Search – Call for Papers for Special Issue on Semantic Search for Aslib Journal


This special issue aims to explore the possibilities and limitations of Semantic Search. We are particularly interested in papers that place carefully conducted studies within the wider framework of current Semantic Search research and the broader context of Linked Open Data.

Research into Semantic Search and its applications has been gaining momentum over the last few years, with an increasing number of studies on general principles, proofs of concept, and prototypical applications. The market for Semantic Search applications, its role within the general development of (internet) technologies, and its impact on different areas of private and public life have all attracted attention. At the same time, many publicly funded projects in the field of cultural heritage have been initiated. Researchers in many disciplines have been making progress in establishing both theories and methods for Semantic Search. However, there is still a lack of comparison across individual studies, as well as a need for standardisation: clearer criteria for distinguishing Semantic Search from other search solutions, agreed definitions, and common technologies and interfaces.

Semantic Search research is often based on large and rich data sets and on a combination of techniques, ranging from statistical bag-of-words approaches and natural language processing enriched by subtle use of metadata, through classificatory approaches, right up to ontological reasoning. Over the last 10 years many of the initial technical and conceptual obstacles in the field of Semantic Search have been overcome. After the initial euphoria for Semantic Search, which resulted in a technically driven supply of search solutions, an appraisal of the more and less successful approaches is needed. Amongst other things, the limitations of working with open-world solutions on – only apparently comprehensive – linked open data sets, compared with small domain-specific solutions, need to be determined.
One ongoing challenge for Semantic Search solutions is their usability and user acceptance, as only highly usable walk-up-and-use approaches stand a chance in the field of general search.
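
To make the lower end of that spectrum concrete, here is a deliberately simplistic sketch of ontological query expansion layered over plain keyword matching; the ontology, documents, and function names are entirely invented for illustration.

```python
# Toy illustration: expand query terms with narrower terms from a small,
# invented ontology before doing a plain keyword match over documents.
ONTOLOGY_NARROWER = {
    "vehicle": {"car", "bicycle", "lorry"},
    "car": {"hatchback", "estate"},
}

DOCS = {
    1: "review of a new hatchback model",
    2: "cycling to work and choosing a bicycle",
    3: "train timetable changes this spring",
}

def expand(term):
    """Collect a term plus all of its narrower terms, recursively."""
    terms = {term}
    for narrower in ONTOLOGY_NARROWER.get(term, set()):
        terms |= expand(narrower)
    return terms

def search(query):
    terms = set()
    for word in query.lower().split():
        terms |= expand(word)
    return [doc_id for doc_id, text in DOCS.items()
            if terms & set(text.lower().split())]

print(search("vehicle"))  # [1, 2] - matched via narrower terms, never the literal word
```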

For this special issue, we invite articles which address the opportunities and challenges of Semantic Search from theoretical and practical, conceptual and empirical perspectives.

Topics of interest include but are not restricted to:

  • The history of semantic search – how the latest techniques and technologies have come out of developments over the last 5, 10, 20, 100, 2000… years
  • Technical approaches to semantic search: linguistic/NLP, probabilistic, artificial intelligence, conceptual/ontological …
  • Current trends in Semantic Search
  • Best practice – how far along the road from ‘early adopters’ to ‘mainstream users’ has semantic search gone so far?
  • Semantic Search and cultural heritage
  • Usability and user experience of Semantic Search
  • Visualisation and Semantic Search
  • Quality criteria for Semantic Search
  • The impact of norms and standards (such as ISO 25964 “Thesauri for information retrieval”) on the potential of Semantic Search
  • How are semantic technologies fostering a need for cross-industry collaboration and standardisation?
  • How are Semantic Search techniques and technologies being used in practice?
  • Practical problems in brokering consensus and agreement – defining concepts, terms and classes, etc.
  • Curation and management of ontologies
  • Differences between web-scale, enterprise-scale, and collection-specific techniques
  • Evaluation of Semantic Search solutions
  • Comparison of data collection approaches
  • User behaviour and the evolution of norms and conventions
  • Information behaviour and information literacy
  • User surveys
  • Usage scenarios and case studies

Submissions

Papers should clearly connect their studies to the wider body of Semantic Search scholarship, and spell out the implications of their findings for future research. In general, only research-based submissions including case studies and best practice will be considered. Viewpoints, literature reviews or general reviews are generally not acceptable.

Papers should be 4,000 to 6,000 words in length (including references). Citations and references should be in our journal style.

Please see the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=ap for more details and submission instructions.
Submissions to Aslib Proceedings are made using ScholarOne Manuscripts, the online submission and peer review system. Registration and access is available at http://mc.manuscriptcentral.com/ap.

Important Dates

Paper submission: 15.12.2013
Notice of review results: 15.02.2014
Revisions due: 31.03.2014
Publication: Aslib Proceedings, issue 5, 2014.

About the Journal

Aslib Proceedings (ISSN: 0001-253X) is a peer-reviewed high-quality journal covering international research and practice in library and information science, and information management. The journal is the major publication for ASLIB – the Association for Information Management in the United Kingdom – a membership association for people who manage information and knowledge in organisations and the information industry.
Information about the journal can be found at
http://www.emeraldinsight.com/products/journals/journals.htm?id=ap

Contact the guest editors

Prof. Dr. Ulrike Spree
Hamburg University of Applied Sciences
Faculty of Design, Media and Information
Department of Information
Finkenau 35
20081 Hamburg
Phone: +49/40/42875/3607
Email: ulrike.spree@haw-hamburg.de

Fran Alexander
Information Architect, BCA Research (2013- )
Taxonomy Manager, BBC Information and Archives (2009-13)
Email: fran@vocabcontrol.com
Twitter: @frangle

Online Information Conference – day two


Linked Data in Libraries

I stayed in the Linked Data track for Day 2 of the Online Information Conference, very much enjoying Karen Coyle’s presentation on metadata standards – FRBR, FRSAR, FRAD, RDA – and Sarah Bartlett’s enthusiasm for using Linked Data to throw open bibliographic data to the world so that fascinating connections can be made. She explained that while the physical sciences have been well mapped and a number of ontologies are available, far less work has been done in the humanities. She encouraged humanities researchers to extend RDF and develop it.

In the world of literature, the potential connections are infinite and very little numerical analysis has been done by academics. For example, “intertextuality” is a key topic in literary criticism, and Linked Data that exposes the references one author makes to another can be analysed to show the patterns of influence a particular author had on others. (Google ngrams is a step in this direction, part index, part concordance.)
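
As a rough illustration of the kind of analysis Sarah was describing, here is a minimal sketch using the rdflib library; the ex:references and ex:author predicates and the data are invented for the example, not taken from any real bibliographic dataset.

```python
from collections import Counter
from rdflib import Graph

# A tiny, invented set of triples: works referencing other authors' works.
data = """
@prefix ex: <http://example.org/> .
ex:MrsDalloway   ex:author ex:Woolf ;  ex:references ex:Ulysses .
ex:TheWasteLand  ex:author ex:Eliot ;  ex:references ex:Ulysses .
ex:Ulysses       ex:author ex:Joyce ;  ex:references ex:Odyssey .
ex:Odyssey       ex:author ex:Homer .
"""

g = Graph()
g.parse(data=data, format="turtle")

# For each reference, find the author of the cited work and count how often
# each author is cited - a crude proxy for patterns of influence.
query = """
PREFIX ex: <http://example.org/>
SELECT ?citedAuthor WHERE {
    ?work ex:references ?cited .
    ?cited ex:author ?citedAuthor .
}
"""
influence = Counter(str(row.citedAuthor) for row in g.query(query))
for author, count in influence.most_common():
    print(author, count)
```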

She stressed that libraries and librarians have a duty of care to understand, curate, and manage ontologies as part of their professional role.

Karen and Sarah’s eagerness to make the world a better place by making sure that the thoughtfully curated and well-managed bibliographic data held by libraries is made available to all was especially poignant at a time when library services in the UK are being savaged.

The Swedish Union Catalogue is another library project that has benefited from a Linked Data approach. With a concern to give users more access to and pathways into the collections, Martin Malmsten asked if APIs are enough. He stressed the popularity of just chucking the data out there in a quick and dirty form and making it as simple as possible for people to interact with it. However, he pointed out that licences need to be changed and updated, as copyright law designed for a print world is not always applicable for online content.

Martin pointed out that in a commercialised world, giving anything away seems crazy, but that allowing others to link to your data does not destroy your data. If provenance (parametadata) is kept and curated, you can distinguish between the metadata you assert about content and anything that anybody else asserts.
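
A minimal sketch of that idea, using named graphs to keep each assertion attached to whoever made it; the graph names, predicates, and rdflib usage here are my own illustrative assumptions, not Martin’s actual implementation.

```python
from rdflib import ConjunctiveGraph

# Two named graphs: one holding the library's own assertions, one holding
# assertions contributed by an external party. All URIs are invented.
trig = """
@prefix ex: <http://example.org/> .

ex:libraryAssertions {
    ex:book123 ex:title "Pippi Longstocking" ;
               ex:creator ex:AstridLindgren .
}
ex:externalAssertions {
    ex:book123 ex:subject ex:ChildrensFiction .
}
"""

g = ConjunctiveGraph()
g.parse(data=trig, format="trig")

# Each statement can be traced back to the graph (source) that asserted it.
for s, p, o, ctx in g.quads():
    print(ctx.identifier, "asserts:", s, p, o)
```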

During the panel discussion, provenance and traceability – which the W3C is now focusing on (parametadata) – were discussed, and it was noted again that allowing other people to link to your data does not destroy it, and often makes it more valuable. The question of what the “killer app” for the semantic web might be was raised, as was the question of how we might create user interfaces that allow the kinds of multiple-pathway browsing that can render multiple relationships and connections comprehensible to people. This could be something a bit like topic maps – but we probably need a 13-year-old who takes all this data for granted to have a clear vision of its potential!

Tackling Linked Data Challenges

The second session of day two was missing Georgi Kobilarov of Uberblic who was caught up in the bad weather. However, the remaining speakers filled the time admirably.

Paul Nelson of Search Technologies pointed out that Google is not “free” to companies, as they pay billions in search engine optimisation (SEO) to help Google. Google is essentially providing a marketing service, and companies are paying huge amounts trying to present their data in the way that suits Google. It is therefore worth bearing in mind that Google’s algorithms are not resulting in a neutral view of available information resources, but are providing a highly commercial view of the web.

John Sheridan described using Linked Data at the National Archives to open up documentation that previously had very little easily searchable metadata. Much of the documentation in the National Archives is structured – forms, lists, directories, and so on – material that presents particular problems for free-text search but is a prime source for mashing up and querying.

Taxonomies, Metadata, and Semantics: Frameworks and Approaches

The third session offered some sensible presentations on how to use taxonomies and ontologies to improve search results.
Tom Reamy of KAPS noted the end of the “religious fervour” about folksonomy that flourished a few years ago, now that people have realised that folksonomies have no mechanism for improving over time and offer little help to infrequent users of a system. They are still useful as a way of getting insights into the kinds of search terms that people use, and can be easier to analyse than search logs. A hybrid approach, using a lightweight faceted taxonomy over the top of folksonomic tags, is proving more useful.
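
A toy sketch of the kind of hybrid Tom described: folksonomic tags are kept as entered, but mapped onto a small faceted taxonomy so they can be browsed and analysed consistently. The facets, tags, and mappings below are invented for illustration.

```python
# A lightweight faceted taxonomy layered over uncontrolled tags.
TAXONOMY = {
    "format": {"photo", "video", "audio", "article"},
    "topic":  {"politics", "sport", "science", "music"},
    "region": {"uk", "europe", "asia", "americas"},
}

# Folksonomic tags (with synonyms and variants) mapped to preferred terms.
TAG_TO_TERM = {
    "pic": "photo", "photograph": "photo", "clip": "video",
    "footie": "sport", "football": "sport", "gig": "music",
    "britain": "uk", "england": "uk",
}

def facet_tags(raw_tags):
    """Return {facet: set(preferred terms)} for a bundle of free-text tags."""
    facets = {facet: set() for facet in TAXONOMY}
    for tag in raw_tags:
        term = TAG_TO_TERM.get(tag.lower(), tag.lower())
        for facet, terms in TAXONOMY.items():
            if term in terms:
                facets[facet].add(term)
    return facets

print(facet_tags(["Pic", "footie", "England", "funny"]))
# {'format': {'photo'}, 'topic': {'sport'}, 'region': {'uk'}}
# Unmapped tags like "funny" simply stay outside the taxonomy layer.
```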

Taxonomies remain key in providing the structure on which autocategorisation and text analytics are based, so having a central taxonomy team that engages in regular and active dialogue with users is vital. Understanding the “basic concepts” (i.e. Lakoff and Rosch’s “basic categories”) that are the most familiar terms to the community of users is essential for constructing a helpful taxonomy, and labels should be as short and simple as possible. Labels should be chosen for their distinctiveness and expressiveness.

He also pointed out that adults and children have different learning strategies, which is worth remembering. I was also pleased to hear his clear and emphatic distinction between leisure and workplace search needs. It’s a personal bugbear of mine that people don’t realise that looking for a hairdresser in central London – where any one of a number will do – is not the same as trying to find a specific shot of a particular celebrity shortly after that controversial haircut a couple of years ago from the interview they gave about it on a chat show.

Tom highlighted four key functions for taxonomies:

  • knowledge organisation systems (for asset management)
  • labelling systems (for asset management)
  • navigation systems (for retrieval and discovery)
  • search systems (for retrieval)

He pointed out that text analytics needs taxonomy to underpin it, to base contextualisation rules on. He also stressed the importance of data quality, as data quality problems cause the majority of search project failures. People often focus on cool new features and fail to pay attention to the underlying data structures they need to put in place for effective searching.
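
As an illustration of what such a contextualisation rule might look like, here is a hypothetical sketch; the category, trigger, and evidence terms are invented, not taken from any real product.

```python
# Hypothetical autocategorisation rule: assign the taxonomy node "Banking"
# only if the trigger term is supported by contextual evidence terms,
# so that "bank" in the river sense is not tagged. All terms are illustrative.
RULES = [
    {
        "category": "Banking",
        "trigger": "bank",
        "evidence": {"loan", "interest", "account", "deposit"},
        "min_evidence": 1,
    },
]

def categorise(text):
    words = set(text.lower().split())
    categories = []
    for rule in RULES:
        if rule["trigger"] in words:
            if len(words & rule["evidence"]) >= rule["min_evidence"]:
                categories.append(rule["category"])
    return categories

print(categorise("The bank raised its interest rate on the loan"))  # ['Banking']
print(categorise("We walked along the river bank at dusk"))         # []
```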

He noted that the volumes of data and metadata that need to be processed are growing at a furious rate. He highlighted Comcast as a company that is very highly advanced in the search and data management arena, managing multiple streams of data that are constantly being updated, for an audience that expects instant and accurate information.

He stated that structure will remain the key to findability for the foreseeable future. Autonomy is often hailed as doing something different to other search engines because it uses statistical methods, but at heart it still relies on structure in the data.

Richard Padley made it through the snow despite a four-hour train journey from Brighton, and spoke at length about the importance of knowledge organisation to support search. He explained the differences between controlled vocabularies, indexes, taxonomies, and ontologies and how each performs a different function.

Marianne Lykke then talked about information architecture and persuasive design. She also referred to “basic categories” as well as the need to guide people to where you want them to go via simple and clear steps.

Taxonomies, Metadata, and Semantics in Action

I spoke in the final session of the day, on metadata life cycles, asset lifecycles, parametadata, and managing data flows in complex information “ecosystems” with different “pace layers”.

Neil Blue from Biowisdom gave a fascinating and detailed overview of Biowisdom’s use of semantic technologies, in particular ontology-driven concept extraction. Biowisdom handle huge complex databases of information to do with the biological sciences and pharmaceuticals, so face very domain-specific issues, such as how to bridge the gap between “hard” scientific descriptions and “soft” descriptions of symptoms and side-effects typically given by patients.

In the final presentation of the day, Alessandro Pica outlined the use of semantic technologies by the Italian news agency AGI.

Online Information Conference 2010


Despite the recession, tube strikes, and snow, there was a fine collection of speakers, exhibitors, and delegates at a smaller than usual Online Information Conference and Exhibition this year.

Librarians seem to be getting heavily into Linked Data, while the corporate sector is still mainly concerned with business intelligence and search.

On day one I enjoyed the practical explanations of how Linked Data principles have been made to work at The Guardian, The Press Association, the Dutch Parliament, and the ALISS health project in Scotland.

Linked Data tags are a form of metadata that can be used to automatically generate content aggregations for web pages. This means that not only can you re-use your own content, increasing its lifespan, but you can also gather cheap content that is openly available online. This is very familiar territory to me, as we used to build products in the same way back in the 90s, the difference being that we didn’t have much of an external web to link to back then. In the meantime, using a linkable, interoperable format for your tags has many benefits, and whether your focus is primarily on content within or beyond a firewall, the arguments for using standards that have the potential to link to the wider world seem very compelling. I can’t see any logical reason not to standardise the format your metadata is held in (technical and practical issues are another matter), although standardising the semantic content of the metadata is a far more difficult problem.
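
As a minimal sketch of that aggregation idea, the snippet below groups invented content items by shared tag URIs; the DBpedia identifiers simply stand in for whatever Linked Data vocabulary the tags actually use.

```python
from collections import defaultdict

# Content items tagged with shared Linked Data identifiers (invented examples).
ITEMS = [
    {"title": "Budget analysis",
     "tags": ["http://dbpedia.org/resource/Economy_of_the_United_Kingdom"]},
    {"title": "Interview with the Chancellor",
     "tags": ["http://dbpedia.org/resource/Economy_of_the_United_Kingdom"]},
    {"title": "Match report",
     "tags": ["http://dbpedia.org/resource/Premier_League"]},
]

# Build topic pages automatically: one aggregation per tag URI.
pages = defaultdict(list)
for item in ITEMS:
    for tag in item["tags"]:
        pages[tag].append(item["title"])

for tag, titles in pages.items():
    print(tag, "->", titles)
```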

It was reassuring to hear that everyone else is struggling with the problems of who mints IDs and URIs, who settles arguments about what exactly the IDs refer to – especially across domains – and who resolves and manages mappings. Such issues are difficult enough to resolve within a firewall; out on the Web they become vast. The W3C is starting to work on provenance standards (the parametadata or meta-metadata), a pet worry of mine, because I am certain we need to get that layer of semantic information into our tags as soon as possible if we are going to advance the semantic web beyond crunching databases together.

In the meantime, Linked Data is working very well, especially for mashups and information visualisations. I particularly liked the Dutch Parliament’s “Attaquograms” – diagrams showing how often MPs were interrupted in debates and how much they interrupted others – although they don’t appear to have changed anyone’s behaviour yet. I also enjoyed The Guardian’s “league tables” of MPs’ performance. When MPs protested that such analyses ignored qualitative issues, The Guardian pointed out that if MPs advocate such data crunching as a way to judge schools and hospitals, then it must be good enough to apply to MPs themselves.

Andy Hyde from the ALISS project is working on ways to use Linked Data to help people manage their health, especially for patients with long term conditions such as diabetes. He stressed the importance of involving the users in any information project and addressing them on their terms, stating “The most collaborative tool we have is the cup of tea”.

My only concern about using Linked Data to pull in Wikipedia content is whether the audience will start recognising it. If every website that mentions a topic has the same Wikipedia content attached to it, won’t people get bored? Perhaps there are just so many people out there contributing, so many varieties of algorithmic aggregation, and so much content to read that it will never happen!

There is a related Guardian technology blog post.

I will post summaries of days two and three shortly.

Re-intermediating research


A fine example of how much inspiration you can get from randomly talking to the people who are actually engaging with customers was given to me by our Research Guide last week.

She wants a video-tagging tool that includes chat functionality, some kind of interactive “pointing” facility, and plenty of metadata fields for adding and describing tags. When she is helping a customer to find the perfect bit of footage, she often finds herself in quite detailed discussions trying to explain why she thinks a shot meets their needs or in trying to understand what it is they don’t like about a particular scene. If they could both view the same footage in real time linked by some sort of online meeting functionality, they would be able to show each other what they meant and discuss and explain requirements far more easily and precisely.

This struck me as exactly how we, as information professionals, should be seizing new technologies to “re-intermediate” ourselves into the search process. Discussing bits of video footage is a particularly rich example, but what if an expert information professional could have a look at your search results and give you guidance via a little instant chat window? You could call up a real person to help you when you needed it without leaving your desk, in just the same way that online tech support chats work (I’ve had mixed experiences with those, but the principle is sound). I’m thinking especially of corporate settings, but wouldn’t it be a fantastic service for public libraries to offer?

It seems such a good idea that I can’t believe it’s not already being done. I would be very pleased to hear from anyone out there who is offering these sorts of services, and in particular about any tools that support real-time remote discussion around audiovisual research.

From Walled Garden to Amazon Jungle


I enjoyed the LIKE dinner the other Thursday. The speaker, Tim Buckley-Owen, spoke on the theme “From Walled Garden to Amazon Jungle”, describing the changing environment that information professionals find themselves in. He spoke of how disintermediation is often perceived as a threat in the information world, but argued that this is a mistake, because out in the jungle the services of an expert guide become indispensable if you are to avoid getting completely lost and falling prey to poisonous snakes and other hazards. He pointed out that at least one other profession is facing a similarly shifting environment – the legal profession. We, however, should be in a better position than lawyers because they believe they are masters of the universe, whereas we see ourselves as merely useful. The Trafigura affair showed that information can act as a force that even the lawyers can’t contain.

Although I would never have dreamt of comparing myself to a lawyer, I could see the similarity in the way that disintermediation enabled by an online world is affecting the two professions. For lawyers, disintermediation arises out of the increasing ease of self-representation – for example, the availability of online forms so that you can manage your own simple legal processes. As Tim pointed out, going to the small claims court can already be handled online by the claimant alone. Conveyancing is becoming increasingly straightforward for non-lawyers, as it is largely a question of being able to search effectively (anybody need an information specialist – cheaper than a solicitor?). Perhaps even the processing of divorces and wills can be administered via online forms. (That might not prevent family disputes, but it would certainly make them cheaper!) The smart lawyers are, of course, responding by focusing on tailor-made specialised services for unusual cases or one-off situations. This is exactly what information professionals are doing too. Librarians have always offered bespoke research services, and the value they add over and above trawling through millions of results on Google is their knowledge of which sources are the best and which will best answer your specific question (and of figuring out the question you really want the answer to, instead of the one you actually asked, which is much harder than it sounds). In a world where information is proliferating while the quality of sources is not necessarily improving, the knowledge of where to look is increasingly rather than decreasingly valuable.

Tim described some research indicating that the people who are least likely to delegate their research are the most senior executives (middle managers are too busy and like having people do things for them). In particular, top execs like to do their own competitor research. His hot tip for the information profession was to work with software developers to produce really effective competitor research services and tools.

Virginia Henry and David Holme have also blogged about the evening.

LIKE 9 is on December 3rd.

UDC Seminar 2009 – call for papers


UDC Seminar 2009 – call for papers. The “Classification at a Crossroads” conference will address the potential of classification, the Universal Decimal Classification in particular, in supporting information organization, management and resource discovery in the networked environment. It will explore solutions for better subject access control and vocabulary sharing services.

Digital Humanities 2009 – call for papers


Digital Humanities 2009 » Call for Papers. Digital Humanities 2009, the annual joint meeting of the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Society for Digital Humanities / Société pour l’étude des médias interactifs, will be hosted by the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland in College Park, USA.

Suitable subjects for proposals include, for example,

* text analysis, corpora, corpus linguistics, language processing, language learning
* libraries, archives and the creation, delivery, management and preservation of humanities digital resources
* computer-based research and computing applications in all areas of literary, linguistic, cultural, and historical studies, including electronic literature and interdisciplinary aspects of modern scholarship
* use of computation in such areas as the arts, architecture, music, film, theatre, new media, and other areas reflecting our cultural heritage
* research issues such as: information design and modelling; the cultural impact of the new media; software studies; Human-Computer interaction
* the role of digital humanities in academic curricula
* digital humanities and diversity