The Information Master – Louis XIV’s Knowledge Manager

I recently read The Information Master: Jean-Baptiste Colbert’s Secret State Intelligence System by Jacob Soll. It is a very readable but scholarly book that tells the story of how Colbert used the accumulation of knowledge to build a highly efficient administrative system and to promote his own political career. He seems to have been the first person to seize upon the notion of “evidence-based” politics, and upon the idea that knowledge, information and data collection, and scholarship could be used to serve the interests of statecraft. In this way he is an ancestor of much of the thinking that is commonplace not only in today’s political administrations but also in all organizations that value the collection and management of information. The principle sits at the heart of what we mean by the “knowledge economy”.

The grim librarian

Jean-Baptiste Colbert (1619-83) is depicted as ruthless, determined, fierce, and serious. He was an ambitious man and saw his ability to control and organize information as a way of gaining and then keeping political influence. By first persuading the King that an informed leadership was a strong and efficient leadership, and then by being the person who best understood and knew how to use the libraries and resources he collected, Colbert rose to political prominence. However, his work eventually fell victim to the political machinations of his rivals and after his death his collection was scattered.

Using knowledge to serve the state

Before Colbert, the scholarly academic tradition in France had existed independently from the monarchy, but Colbert brought the world of scholarship into the service of the state, believing that all knowledge – even from the most unlikely of sources – had potential value. This is very much in line with modern thinking about Big Data and how it can be used in the service of corporations. Even the most unlikely of sources might contain useful insights into customer preferences or previously unseen supply chain inefficiencies, for example.

Colbert’s career was caught up in the political machinations of the time. He worked as a kind of accountant for Cardinal Mazarin, but when Mazarin’s library was ransacked by political rivals and his librarian fell out of favour, Colbert restored the library and built a unified information system based on the combination of scholarship and administrative documentation, ending the former division between academia and government bureaucracy.

Importance of metadata

Colbert also instinctively grasped the importance of good metadata, cataloguing, and an accurate network of links and cross references for obtaining relevant and comprehensive information quickly, issues that are more urgent than ever given the information explosion that modern organizations – and indeed nations – face. This enabled him to become a better administrator than his rivals, and by becoming not only the source of politically expedient information but also the person who knew how to use the information resources most effectively, he was able to gain political influence and become a key minister under Louis XIV.

A personal vision

I was struck by how much his vast library, archive, and document management system was the result of his own personal vision, how it was built on the dismantling and rebuilding of the work of his predecessors, but also how, after his death, the system itself fell victim to political changes and was left to collapse. This pattern is repeated frequently in modern information projects. All too often the work of the original system’s champion is wasted when infighting, frequently not directly connected to the information project itself, leads to budget cuts, staff changes, or other problems that cause the system to decay.

Soll argues that the loss of Colbert’s system hampered political administration in France for generations. Ironically, it was Colbert’s own archives that enabled successive generations of political rivals to find the documents with which to undermine the power of the crown, showing the double-edged nature of information work. It is often the same collections that can both redeem and condemn.

Secrecy or transparency?

Another theme that ran throughout Colbert’s career, with both political and practical implications, was the tension between demands for transparent government and the desire for a secret state. Much of the distinction between public and private archives was simply a matter of who was in control of them and who had set them up, so the situation in France under the monarchy was different to the situation in England, where Parliament and the monarchy maintained entirely separate information systems. In France, an insistence on keeping government financial records secret eventually undermined trust in the economy. Throughout his career Colbert was involved in debates over which and how much government information should be made public, with different factions arguing over the issue – arguments that are especially resonant today.

On being the only girl in the room

Perhaps it is because I am settling into a new culture, or perhaps it is because my new time zone has altered the nature of what I see in my Twitter feed, but there seems to have been a spate of articles lately about sexism faced by women working in technology, which makes me very sad. This was on my mind when I received from a former colleague a copy of a report we had co-authored. As I read the list of names, I was struck by how wonderful a group of guys they were, how intelligent, creative, and technically knowledgeable, and what a pleasure it had been to be the only girl in the room. Those guys were utterly supportive, thoughtful, generous of spirit, and full of interest in and encouragement of my contributions.

I am from an editorial background and I don’t really write code, but never once in that group did I experience any kind of tech snobbery. Whenever there was something that I didn’t know about, or unfamiliar acronyms or jargon, someone would provide a clear explanation, without ever being patronising, appearing bored or impatient, or making any assumptions about what anyone “ought” to know. I was never made to feel I had asked a stupid question, said something foolish, or that I did not belong. At the same time, these men were always keen and interested to hear my perspectives, and to learn from my experiences. The group dynamic was one of free and open exchange of ideas and of working collaboratively to find solutions to problems. All contributions were valued and everything was considered jointly and equally authored.

I didn’t remain the only girl in the room. I was learning so much in the meetings that I invited my (now former) colleague to join us, bringing a new set of expertise and skills that were welcomed. I had not a moment of concern about inviting a younger and even less technical female colleague into the group, because I knew she would be made welcome and would have a fantastic opportunity to learn from some brilliant minds.

Of course I have encountered much sexism in my career, but it is not necessary and it is not inevitable. I hate the thought of young women being put off technology as a career because of fears of sexism and discrimination. I know this happens a lot – it happened to me, although I found my way into tech eventually. I do not know whether there is “more” sexism in technology than anywhere else – a charge some of the posts I have read have levelled – but I do know that there is sexism in all industries, so you might as well ignore it as a factor and choose a career based on aspects like intellectual stimulation or good career prospects. Technology certainly offers those. I personally have encountered sexism in so-called “female friendly” industries such as publishing and teaching, and I am quite sure it is suffered by nurses, waitresses, actresses, pop singers… Since I have been working in technology, I have often been the only girl in the room, but almost always that room has been a fascinating, welcoming, and inspiring place to be.

This is not primarily written for the specific individuals nor for all the other fantastic guys in tech I have met or worked with (there are so many I can’t possibly name them all), although I hope they enjoy it. This post is intended to promote positive male role models and examples of decent male behaviour for boys and young men to follow, and as a mythbuster for anyone who thinks sexism and geekiness are somehow intrinsically linked.

It is also written for women, as a reminder that although we must speak out against sexist and otherwise toxic behaviour when we encounter it, approval and affirmation are very powerful motivators of change, so we also help by shouting about and celebrating when we find fabulous guys in tech and in life who are getting it right.

How semantic search helps girls and boys, but in different ways

While researching something else, I happened upon this rather cheering paper: The effect of semantic technologies on the exploration of the web of knowledge by female and male users. Gender issues only tangentially affect my core research, as I generally focus on linguistic communities that are defined by organizational or professional context, so gender effects are rather diluted by that point. I also prefer collapsing false dichotomies rather than emphasizing difference and division, and so I was very heartened that this article shows how semantic techniques can be unifying.

The study is based on observing the search strategies of a group of male and a group of female students in Taiwan. Given non-semantic search systems to use, the male students tended to search very broadly and shallowly, skimming large numbers of results and following links and going off on tangents to find other results. This enabled them to cover a lot of ground, often finding something useful, but also often left them with a rather chaotic collection of results and references. The female students tended to search very deeply and narrowly, often stopping to read in depth a paper that they had found, and trying to fully grasp the nature of the results that had been returned. This meant they tended to collect fewer results overall, the results tended to be clustered around a single concept, and they risked falling into the rather wonderfully named “similarity holes”. These “similarity holes” are search traps in which a single search term or set of terms leads only to a small set of results: essentially “dead ends”.

How did semantic search help?

When the students were given semantic search tools, the male students continued to search broadly and shallowly, but the semantic associations helped them to conceptualize and organize what they were doing. This meant that they ended up with a far more coherent, relevant, and useful set of search results and references. In contrast, the female students, using the semantic associations offered, found it far easier to broaden their searches and to come up with alternative search terms and approaches, enabling them to avoid, or break out of, any “similarity holes” they fell into.
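As a rough illustration of how semantic associations can offer a way out of a “similarity hole”, here is a minimal Python sketch of thesaurus-based query expansion. It is my own example, not the tooling used in the study: the `RELATED` mapping and the `expand_query` function are hypothetical, and a real system would draw its associations from a proper thesaurus or ontology.

```python
# Hypothetical map from a concept to semantically related concepts
# (broader, narrower, or associated terms), for illustration only.
RELATED: dict[str, list[str]] = {
    "photosynthesis": ["chlorophyll", "plant biology", "carbon fixation"],
    "chlorophyll": ["pigment", "photosynthesis"],
    "carbon fixation": ["calvin cycle", "photosynthesis"],
}


def expand_query(terms: list[str], depth: int = 1) -> list[str]:
    """Return the original terms plus related terms up to `depth` hops away.

    Order is preserved and duplicates are dropped, so the searcher's own
    terms always come first and the suggested alternatives follow.
    """
    expanded: list[str] = []
    frontier = list(terms)
    for _ in range(depth + 1):
        next_frontier: list[str] = []
        for term in frontier:
            if term not in expanded:
                expanded.append(term)
                next_frontier.extend(RELATED.get(term, []))
        frontier = next_frontier
    return expanded


print(expand_query(["photosynthesis"]))
# ['photosynthesis', 'chlorophyll', 'plant biology', 'carbon fixation']
```

The same associations can serve both styles described above: a narrow searcher gets fresh terms to try, while a broad searcher can use the links between concepts to group what they have already found.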

Gender effects dissipate

I was very heartened that improvements in technology can be gender-neutral – they can simply be improvements of benefit in different ways to everyone, they don’t have to deliberately try to account for gender difference. I was also very heartened to note that the researchers found that gender differences in search strategies dissipated once students were taught advanced information seeking and knowledge management strategies. Gender differences were only apparent in novice, inexperienced searchers. So, in information seeking work at least, any biological or socially created gender differences are weak and easily overcome with some well directed instruction and semantic techniques are a help rather than a hindrance.

Semantic Search – Call for Papers for Special Issue on Semantic Search for Aslib Journal

This special issue aims to explore the possibilities and limitations of Semantic Search. We are particularly interested in papers that place carefully conducted studies into the wider framework of current Semantic Search research in the broader context of Linked Open Data.

Research into Semantic Search and its applications has been gaining momentum over the last few years, with an increasing number of studies on general principles, proofs of concept, and prototype applications. The market for Semantic Search applications, its role within the general development of (internet) technologies, and its impact on different areas of private and public life have attracted attention. At the same time, many publicly funded projects in the field of cultural heritage have been initiated. Researchers in many disciplines have been making progress in establishing both theories and methods for Semantic Search. However, there is still a lack of comparison across individual studies, as well as a need for standardisation regarding how Semantic Search is distinguished from other search solutions, agreed definitions, and technologies and interfaces.

Semantic Search research is often based on large, rich data sets and a combination of techniques, ranging from statistical bag-of-words approaches and natural language processing enriched through subtle use of metadata, through classificatory approaches, right up to ontological reasoning. Over the last 10 years many of the initial technical and conceptual obstacles in the field of Semantic Search have been overcome. After the initial euphoria for Semantic Search, which resulted in a technology-driven supply of search solutions, an appraisal of successful and less successful approaches is needed. Amongst other things, the limitations of working with open-world solutions on linked open data sets that are only apparently comprehensive, compared with small domain-specific solutions, need to be determined.

One ongoing challenge for semantic search solutions is their usability and user acceptance, as only highly usable walk-up-and-use approaches stand a chance in the field of general search.

For this special issue, we invite articles which address the opportunities and challenges of Semantic Search from theoretical and practical, conceptual and empirical perspectives.

Topics of interest include but are not restricted to:

  • The history of semantic search – how the latest techniques and technologies have come out of developments over the last 5, 10, 20, 100, 2000… years
  • Technical approaches to semantic search: linguistic/NLP, probabilistic, artificial intelligence, conceptual/ontological …
  • Current trends in Semantic Search
  • Best practice – how far along the road from ‘early adopters’ to ‘mainstream users’ has semantic search gone so far?
  • Semantic Search and cultural heritage
  • Usability and user experience of Semantic Search
  • Visualisation and Semantic Search
  • Quality criteria for Semantic Search
  • Impact of norms and standardisation (for instance ISO 25964 “Thesauri for information retrieval“) on the potential of Semantic Search
  • How are semantic technologies fostering a need for cross-industry collaboration and standardisation?
  • How are Semantic Search techniques and technologies being used in practice?
  • Practical problems in brokering consensus and agreement – defining concepts, terms and classes, etc.
  • Curation and management of ontologies
  • Differences between web-scale, enterprise scale, and collection-specific scale techniques
  • Evaluation of Semantic Search solutions
  • Comparison of data collection approaches
  • User behaviour and the evolution of norms and conventions
  • Information behaviour and information literacy
  • User surveys
  • Usage scenarios and case studies

Submissions

Papers should clearly connect their studies to the wider body of Semantic Search scholarship, and spell out the implications of their findings for future research. In general, only research-based submissions, including case studies and best practice, will be considered. Viewpoints, literature reviews, and general reviews are not normally acceptable.

Papers should be 4,000 to 6,000 words in length (including references). Citations and references should be in our journal style.

Please see the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=ap for more details and submission instructions.
Submissions to Aslib Proceedings are made using ScholarOne Manuscripts, the online submission and peer review system. Registration and access are available at http://mc.manuscriptcentral.com/ap.

Important Dates

Paper submission: 15.12.2013
Notice of review results: 15.02.2014
Revisions due: 31.03.2014
Publication: Aslib Proceedings, issue 5, 2014.

About the Journal

Aslib Proceedings (ISSN: 0001-253X) is a peer-reviewed high-quality journal covering international research and practice in library and information science, and information management. The journal is the major publication for ASLIB – the Association for Information Management in the United Kingdom – a membership association for people who manage information and knowledge in organisations and the information industry.
Information about the journal can be found at
http://www.emeraldinsight.com/products/journals/journals.htm?id=ap

Contact the guest editors

Prof. Dr. Ulrike Spree
- Hamburg University of Applied Sciences -
Faculty of Design, Media and Information
Department Information
Finkenau 35
20081 Hamburg
Phone: +49/40/42875/3607
Email: ulrike.spree@haw-hamburg.de

Fran Alexander
Information Architect, BCA Research (2013- )
Taxonomy Manager, BBC Information and Archives (2009-13)
Email: fran@vocabcontrol.com
Twitter: @frangle

To index is to translate

Living in Montreal means I am trying to improve my very limited French, and in trying to communicate with my Francophone neighbours I have become aware of a process of simplifying my thoughts and expressing them using the limited vocabulary and grammar that I have available. So far I only have a few nouns, fewer verbs, and a couple of conjunctions that I can use. Talking to people is therefore not so much a process of thinking in English and translating that into French as of considering the basic core concepts that I need to convey and finding the simplest ways of expressing the relationships between them. So I will say something like “The sun shone. It was big. People were happy” because I can’t properly translate “We all loved the great weather today”.

This made me realise how similar this is to the process of breaking down content into key concepts for indexing. My limited vocabulary is much like the controlled vocabulary of an indexing system, forcing me to analyse and decompose my ideas into simple components and basic relationships. This means I am doing quite well at fact-based communication, but my storytelling has suffered as I have only one very simple emotional register to work with. The best I can offer is a rather laconic style with some simple metaphors: “It was like a horror movie.”
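The comparison with a controlled vocabulary can be made concrete. Below is a minimal Python sketch, assuming a hypothetical vocabulary in which several entry terms (synonyms and near-synonyms) all point to one preferred term. Anything that cannot be mapped is simply dropped, much as nuance is lost when speaking with a small vocabulary.

```python
import re

# Hypothetical controlled vocabulary: entry term -> preferred term.
VOCABULARY: dict[str, str] = {
    "sun": "Weather",
    "sunny": "Weather",
    "weather": "Weather",
    "happy": "Happiness",
    "loved": "Happiness",
    "people": "People",
    "we": "People",
}


def index_terms(text: str) -> list[str]:
    """Map free text to preferred terms, in first-seen order, without duplicates.

    Words with no entry in the vocabulary are ignored entirely.
    """
    terms: list[str] = []
    for word in re.findall(r"[a-z]+", text.lower()):
        preferred = VOCABULARY.get(word)
        if preferred and preferred not in terms:
            terms.append(preferred)
    return terms


print(index_terms("We all loved the great weather today"))
# ['People', 'Happiness', 'Weather']
print(index_terms("The sun shone. It was big. People were happy"))
# ['Weather', 'People', 'Happiness']
```

Both sentences reduce to the same three concepts: the index captures the facts but not the difference in tone, which is exactly the loss of storytelling described above.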

It is regularly noted that ontology work in the sciences has forged ahead of that in the humanities, and the parallel with my ability to express facts but not tell stories struck me. When I tell my simplified stories I rely on a shared understanding of a broad cultural context that provides the emotional aspect – I can use the simple expression “horror movie” because the concept has rich emotional associations, connotations, and resonances for people. The concept itself is rather vague, broad, and open to interpretation, so the shared understanding is rather thin. The opposite is true of scientific concepts, which are honed into precision and a very constrained, definitive shared understanding. So I wonder how much my sense that I can express facts well is actually an illusion, and whether it is simply that those factual concepts have few emotional resonances.

A major aspect of poetry is about extending the meanings of words to their limits, to allow for the maximum emotional resonance and personal interpretation. Perhaps poetry speaks to individuals precisely because it doesn’t evoke a shared understanding but calls out new meanings and challenges the reader to think differently, to find new meanings? This is the opposite of indexing, which is about simplifying and constraining to the point at which all the fuzziness is driven away and you are left with nothing but “dead metaphors”. The only reason indexing the sciences seems easier is because so many scientific concepts have been analyzed and defined to this point already, doing much of the indexer’s work for them.

I am not sure if these musings have any practical applications. People sometimes ask me if I think my previous studies of languages and literature have helped in my current work. I have known many excellent monolingual indexers but am also aware that many people who are good at semantics speak more than one language. However, I am sure it is helpful to think of the process of indexing as a form of translation, even if the idea of removing all the poetry from language in order to create a usable, useful index is not at all romantic!

Can you use statistics to find meaning?

I enjoyed this article in New Scientist about using statistical analysis on the Voynich manuscript to try to work out whether it is a meaningful but secret code or just gibberish.

Ultimately, I remain puzzled as to what the statistics actually tell us. They identify patterns, but meaning is more than simply patterns. However, the fact that certain sets of symbols in the Voynich text appear to cluster in sections with common illustrations suggests it is code. The counter-argument that you could deliberately fake such clustering by mechanical means is intriguing. Without far larger samples, and an understanding of random clusterings, I have no idea whether this sort of faking would produce the same patterns as natural language. I am sure clusters must appear all over the place, without bearing any meaning whatsoever.
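To make the clustering question a little more concrete, one simple measure of whether a symbol “clusters” is how evenly its occurrences are spread across sections, for example using Shannon entropy: a token confined to one section scores 0 bits, while one spread evenly over n sections scores log2(n). The Python sketch below is a generic illustration under that assumption, not the method used in the study the New Scientist article describes. The tokens and section labels are invented toy data, and `shuffled_baseline` is one hypothetical way of estimating how much clustering chance alone would produce.

```python
import math
import random


def section_entropy(token: str, sections: dict[str, list[str]]) -> float:
    """Shannon entropy, in bits, of how a token's occurrences are spread across sections.

    0.0 means every occurrence falls in a single section (maximal clustering);
    log2(n) means the token is spread evenly over n sections.
    """
    counts = [words.count(token) for words in sections.values()]
    total = sum(counts)
    if total == 0:
        raise ValueError(f"{token!r} does not occur in any section")
    return sum((c / total) * math.log2(total / c) for c in counts if c)


def shuffled_baseline(
    token: str, sections: dict[str, list[str]], trials: int = 200, seed: int = 0
) -> float:
    """Mean entropy after randomly redistributing all tokens across sections.

    Section lengths are kept fixed, so this estimates how spread out the
    token would be if its placement were purely a matter of chance.
    """
    rng = random.Random(seed)
    sizes = [(name, len(words)) for name, words in sections.items()]
    pool = [w for words in sections.values() for w in words]
    total = 0.0
    for _ in range(trials):
        rng.shuffle(pool)
        shuffled, start = {}, 0
        for name, size in sizes:
            shuffled[name] = pool[start:start + size]
            start += size
        total += section_entropy(token, shuffled)
    return total / trials


# Toy data: invented tokens under illustrative section labels.
sections = {
    "plants": ["alpha", "alpha", "beta", "gamma"],
    "stars": ["beta", "gamma", "delta", "beta"],
    "bathers": ["gamma", "beta", "delta", "delta"],
}

print(section_entropy("alpha", sections))            # 0.0: confined to one section
print(round(section_entropy("gamma", sections), 3))  # 1.585: evenly spread
print(shuffled_baseline("alpha", sections) > 0.0)    # True: chance spreads it out
```

Note that a low score only shows a pattern. Comparing it with a random baseline, over far larger samples, is the missing step before anyone can say whether the pattern is unusual, and even then it says nothing about meaning.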

I also thought it was interesting that one of the arguments in favour of gibberish was that there were no mistakes. It strikes me there could be many reasons for the lack of proofing and correction, and I would want to know more about the rate of correction in similar works before I could assess that argument. I know that standardization of spelling came relatively late, so presumably far more “mistakes” would have been tolerated before then.

Nevertheless, a fascinating mystery and one that perhaps cannot be resolved by analysis but by coincidental discovery of the key (if it exists!) – if it is gibberish, perhaps we will never know. Either way, I am sure it would have amused the author to know that their work would still be a controversial topic hundreds of years after it was written!

This time it’s personal data – Indiverses and Personal APIs

Sooner or later I was bound to find some other Semanticists in Canada and on Thursday I attended a Semantic Web meetup in Montreal. The audience was small, but that led to more of a group discussion atmosphere than a formal talk. The presenter, Dr Joan Yess Kahn, has coined the term Indiverse – Individual Information Universe – to facilitate her thinking about the set of personal information and data that we accumulate through our lives.

She pointed out that some of this information is created by us, some about us, some with our knowledge and consent, some without, and our entire digital lives can be stolen and abused. She made some interesting observations about how our personal and public information spaces were essentially one and the same before the industrial revolution, when most people’s work and home lives were intertwined (e.g. artisans living in their workshops), and that changes such as the industrial revolution and public education split those apart as people left home to work somewhere else. However, in the information age more people are returning to working from home while others are increasingly using their computers at work to carry out personal tasks, such as online shopping.

This blurring of the public and private has many social and commercial implications. We discussed the potential monetary value of personal attention and intention data to advertisers, and implications for surveillance of individuals by governments and other organizations.

We also talked about information overload and information anxiety. Joan has written about ways of categorizing, indexing, and managing our personal information – our address books, calendars, to-do lists, etc. – and this led us to consider ideas of how to construct sharable, standardized Personal Data Lockers (for example The Locker Project) and to take back control of our online identity and information management, for example by shifting from Customer Relations Management (CRM) to Vendor Relations Management (VRM).

In previous posts I have talked about our need to become our own personal digital archivists as well. Mark sent me a link to a Personal API developed by Naveen. This takes personal information curation to the data level, as Naveen is seeking an easy way to manage the huge amounts of data that he generates simply by being a person in the world – his fitness routines, diet, etc.

There is a clear convergence here with the work done by such medical innovators as Patients Know Best, with its electronic patient health records. The moral and social implications of who is responsible for curating and protecting such data are huge and wide-ranging. At the moment doting parents using apps to monitor their babies, or fitness enthusiasts using apps such as Map My Run, are doing this for fun, but will we start seeing this as a social duty? Will we have right-wing campaigns to deny treatment to people who have failed to look after their health data, or mass class actions to sue hospitals that get hacked? If you think biometric passports are information dense, just wait until every heartbeat from ultrasound to grave is encoded somewhere in your Indiverse.

Tagging the cart before the horse – Getting your project plan in order

When people launch search improvement or information organization projects, one of the commonest mistakes is to be over-eager to “just get the content indexed or tagged” without spending enough time and thought on the structure of an index, what should be tagged, and how the tags themselves should be structured.

This typically happens for two reasons:
1. The project managers – often encouraged by service providers who just want to get their hands on the cheque – simply underestimate the amount of preparatory work involved, whether it is structuring and testing a taxonomy, setting up and checking automated concept extraction rules, or developing a comprehensive domain model and tag set, so they fail to include enough – if any – of a development and testing stage in the plan. This often happens when the project is led by people who do not work closely with the content itself. Projects led by marketing or IT departments often fall into this trap.

2. The project managers include development and testing, with iterative correction and improvement phases, but are put under pressure to cut corners or to compress deadlines. This tends to happen when external forces affect timescales – for example local government projects that have to spend the budget before the end of the financial year. It can also happen when stakeholder power is unevenly distributed – for example, the advice of information professionals is sought but then over-ruled by more powerful stakeholders who have a fixed deadline in mind, such as launching a new website in time for the Christmas market.

Forewarned is forearmed

Prevention is better than cure in both these scenarios, but easier said than done. Your best defence is to understand organizational culture, politics, and history and to evangelize the role and importance of information work and your department. Find out which departments have initiated information projects in the past, which have the biggest budgets, which have the most proactive leadership teams, then actively seek allies in those departments. Find out if there are meetings on information issues you could attend, offer to help, or even do something like conduct a survey on information use and needs and ask for volunteers to be interviewed. Simply by talking to people at any level in those departments you will start to find out what is going on, and you will remind people in those departments of your existence and areas of expertise.

On a more formal level, you can look at organizational structures and hierarchies and make sure that you have effective chains of communication that follow chains of command. This may mean supporting your boss in promoting the work of your department to their boss. This is especially important in organizations with lots of layers of middle management, as middle managers can get so caught up in day to day work that longer term strategy can get put on the back burner, so offer support.

If you find out about projects early enough, you have a chance of influencing the project planning stages to make sure information and content issues are given the attention they need, right from the start.

Shutting the stable door…

Sometimes despite our best efforts we end up in a project that is already tripping over itself. A common scenario is for tagging work to be presented as a fait accompli. This is particularly likely with fully automated tagging work, as processing can be done far faster than any manual tagging effort. However, it is highly unusual for any project to be undertaken without its being intended to offer some sort of service or solve some recognized problem.

Firstly, assess how well it achieves its intended goals. If you have only been called into the project at a late stage, is this because it is going off the rails and the team want a salvage solution, or is it because it works well in one context and the team want to see if it can be used more widely? If it is the latter, that’s great – you can enjoy coming up with lots of positive and creative proposals. However, the core business planning principles are pretty much the same whether you are proposing to extend a successful project or corralling one that is running out of control.

Once you know what the project was meant to achieve, assess how much budget and time you have left, as that will determine the scope to make changes and improvements. Work out what sort of changes are feasible. Can you get an additional set of tags applied for example? Can you get sets of tags deleted? Are you only able to make manual adjustments or can you re-run automated processes? How labour intensive are the adjustment processes? Is chronology a factor – in other words can you keep the first run for legacy content but evolve the processes for future content?

These assessments are especially valuable for projects that are at an intermediate stage, as there is much more scope to alter their direction. In these cases it is vital to prioritize and focus on what can be changed in a pragmatic way. For example, if the team are working chronologically through a set of documents, you may have time to undertake planning and assessment work focused on the most recent documents and have that ready before they get to a logical break point. So you prioritize developing a schema relevant to the current year, and make a clean break on a logical date, such as January 1. If they have been working topic by topic, is there a new search facet you could introduce, with a really good tag set for that run as a fresh iteration?
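The idea of keeping the first tagging run for legacy content while applying a revised schema from a clean break date can be expressed as a simple routing rule. This is a minimal Python sketch; the `Document` type, the schema names, and the 1 January 2014 cut-off are all hypothetical placeholders, not part of any particular tagging product.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cut-off: content published on or after this date is tagged
# with the revised schema; older content keeps the original tagging run.
CUTOFF = date(2014, 1, 1)


@dataclass
class Document:
    doc_id: str
    published: date


def schema_for(doc: Document, cutoff: date = CUTOFF) -> str:
    """Return which (hypothetical) tagging schema should apply to a document."""
    return "schema_v2" if doc.published >= cutoff else "schema_v1_legacy"


def plan_retagging(docs: list[Document], cutoff: date = CUTOFF) -> dict[str, list[str]]:
    """Group document IDs by the schema they should be tagged with."""
    plan: dict[str, list[str]] = {"schema_v1_legacy": [], "schema_v2": []}
    for doc in docs:
        plan[schema_for(doc, cutoff)].append(doc.doc_id)
    return plan


docs = [
    Document("a", date(2013, 6, 30)),
    Document("b", date(2013, 12, 31)),
    Document("c", date(2014, 1, 1)),
]
print(plan_retagging(docs))
# {'schema_v1_legacy': ['a', 'b'], 'schema_v2': ['c']}
```

Keeping the rule in one place makes the break point explicit, and it is easy to move if the team reach a logical break point sooner or later than expected.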

If there are no clean break points or clear sets of changes to be made, focus on anything that is likely to cause user problems or confusion, or serious information management problems in future. What is likely to cause real pain points? What are the worst of those?

Once you have identified the worst issues and clarified the resources you have for making the changes, you have the basis for working up the time and money you need to carry them out. This can form the basis of your business case and project plan either to improve a faltering project and pull it back on track or to add scope to a project that is going well.

…after the horse has bolted

If there is limited scope to make changes, and the project is presented as already complete, it is still worth assessing how well it meets its goals as this will help you work out how you can best use and present the work that has been done. For example, can it be offered as an “optional extra” to existing search systems?

It is also worth assessing the costs and resource involved in order to make changes you would recommend even if it seems there is no immediate prospect of getting that work done. It is likely that sooner or later someone will want to re-visit the work, especially if it is not meeting its goals. Then it will be useful to know whether it can be fixed with a small injection of resource or whether it requires a major re-working, or even abandoning and starting afresh. Such a prospect may seem daunting, but if you can learn lessons and avoid repeating mistakes the next time around, then that can be seen as a positive. If one of the problems with the project was the lack of input from the information team early on, then it is worth making sure for the sake of the information department and the organization as a whole that the same mistake does not happen again. If you demonstrate well enough how you would have done things differently, you might even get to be in charge next time!