Building, visualising and deploying taxonomies and ontologies; the reality – Content Intelligence Forum event


I have been trying to get to the Content Intelligence Forum meetups for some time as they always seem to offer excellent speakers on key topics that don’t tend to get the attention they deserve, so I was delighted to be able to attend Stephen D’Arcy’s talk a little while ago on taxonomies and ontologies.

Stephen has many years of experience designing semantic information systems for large organisations, ranging from health care providers, to banks, to media companies. His career illustrates the transferability of, and wide demand for, information skills.

His 8-point checklist for a taxonomy project was extremely helpful – Define, Audit, Tools, Plan, Build, Deploy, Governance, Documentation – as were his tips for managing stakeholders, IT departments in particular. He warned against the pitfalls of not including taxonomy management early enough in search systems design, and the problems that you can be left with if you do not have a flexible and dynamic way of managing your taxonomy and ontology structures. He also included a lot of examples that illustrated the fun aspects of ontologies when used to create interesting pathways through entertainment content in particular.

The conversation after the talk was very engaging and I enjoyed finding out about common problems that information professionals face, including how best to define terms, how to encourage clear thinking, and how to communicate good research techniques.

I friend dead people – Are social media mature enough to cope with bereavement?


This is a very personal post about topics in which I am not an expert, so I welcome comments and suggestions.

When “like” and “lol” don’t help

In February, a young man I had never met died in sad circumstances. He was a friend of a friend and I was supposed to meet him on the day he died. Completely coincidentally, within a fortnight I myself lost a dear friend, someone I had known for over 20 years.

The closeness in timing has thrown out sharp contrasts in the way that these deaths have reverberated around my social media worlds (obviously the real world impacts have been huge, but I am not going to discuss those here).

In many ways, dealing with the death of my own friend on social media has been easier. Being well known to her family and her circle of closest friends has meant that I have felt able to post messages of condolence and remembrance as I instinctively know what is appropriate, and I know that most of the people reading them will know me. It has been strange to see her name pop up as a “friend available on chat” when I know any activity in her account must be one of her family members logging in to maintain the page. Yesterday was her birthday, and the reminders in my calendar and the little birthday gift “event reminder” were bittersweet, but not unwelcome. I think of her and her family often, and do not want to forget.

Just after she died, I received a message through a social media site from someone I had never met or even heard of, who had been a schoolfriend of hers long ago, asking what had happened to our mutual friend, and I felt comfortable in answering. It helped me to talk about her with this stranger. I even flattered myself that I was doing some good, in that they clearly felt awkward about contacting her family directly while I was able to act as an “information resource” meaning the family and closest friends could focus on their own grieving.

I friend dead people

In contrast, how to cope with the loss of an almost-friend on social media has been strange and unnerving. One social media application has tactlessly and repeatedly suggested him as a friend, noting how many friends we had (have?) in common. Somehow I didn’t have the heart to click on “ignore”. I realise now I should have done just that, because I was anguished when I accidentally clicked on “confirm”. I worried that his friends and relatives might see my “friend request” and be distressed by it. Maybe they would never spot the notification, maybe they would assume it was sent at a time before his death – just another reminder of what might have been, maybe they would even be comforted by the continuation of these distant social interactions with almost-strangers. (I immediately emailed the site in question asking them to retrieve my suggestion, but received no reply.)

My uncertainty about the appropriate “social media etiquette” was no doubt increased rather than diminished by our social distance. I do not know his family and friends well enough to mention this casually in passing, to express that this had been a mistake and was not intended to distress, or even to know what sort of people they are and whether this is the sort of thing that might upset them. However, it is exactly this sort of loose “one degree of separation” relationship that online social media foster, and this incident struck me as illustrating how inadequate such media are when interactions need to go beyond chirruping about the weather, saying a website is cool, or asking whether or not someone wants to go to a party.

Digital memorials

My friend’s social media pages have slipped into being a form of digital memorial, but this also raises new issues. There have been stories in the press of “trolls” deliberately desecrating memorial pages in an online equivalent of upturning flowers left on a grave or kicking over and spraying graffiti on a headstone (e.g. http://gawker.com/5868503/why-people-troll-dead-kids-on-facebook). The only way to deal with this seems to be to remove the page, which is a shame and in a way seems to mean the bullies have won. It also highlights a strange transition from personal to public. Our graveyards are either public spaces that the authorities monitor and maintain or privately curated grounds. I have previously thought of my social media pages as more like a private garden – people may peer over the wall, but it is essentially “my” space to maintain. People are starting to think more and more about their digital legacies (the British Computer Society recently held an event on this theme).

There are already “digital memorial” companies offering guarantees of “permanent” archiving and access to sites (e.g. Much Loved). Other sites offer memorial pages that allow people to make donations to charity, but presumably these are not expected to remain in place forever.

However, these sites are aimed at those who remain setting up the sites, not taking over the sites that belonged to their loved ones. The value of someone’s posts and pages changes dramatically when they become precious memories, and not just ephemeral chatter. If we (or our loved ones) want our own sites to go on after us, do we need to bequeath our passwords to trusted friends or family? How does that affect our contracts with hosts and service providers? What rights do families have to “reclaim” the pages and content if there is no such bequest? How would disputes over inheritance of such sites be decided? What recourse do we have if the site owner decides to shut down and delete the content or simply loses it?

It seems to me that such issues have the potential to cause far more distress than the strangenesses we encounter when automated reminders and friend suggestions behave as if we are all immortal.

Change, technology, understanding, and the information professions


Not being a morning person, I was unsure whether a networking breakfast would suit me, but the recruitment agent Sue Hill’s event offered good food and interesting conversation, so I thought I would give it a try. I wasn’t disappointed – the food was excellent and the big round tables promoted lively group discussion.

We were a mix of information professionals from public and private sector, at different stages of our careers, but three key themes prompted the most debate.

Change management

Managing technology change and bridging the cultural and political divisions within organisations in order to bring about change were key concerns. Information professionals can contribute by explaining how new technologies work, how technologies can be catalysts of changes in behaviour, and how they mitigate or increase informational and archival risks. Even simply letting people know new technology is out there can be hugely valuable. Knowledge and information workers can help manage change on political and cultural levels by understanding the corporate culture they are working in and helping their organisation to understand itself and so make good decisions about systems procurement. Information professionals can also often help to break down cultural barriers, for example barriers to sharing information.

Social media

Social media are now being used to differing degrees within organisations – some having embraced the technologies wholeheartedly, others seeing them as a problem or a threat. There was a general concern that technology is being adopted and used faster than we can understand its impacts and devise strategies for mitigating any risks.

Personal and cultural understanding of the divisions between the public and the private seemed to be a problematic area. Young people in particular were perceived as being vulnerable to “over exposure” as they seemed not to notice that postings about them – pictures especially – would remain available for decades to come and could compromise them in their future careers. Recruitment agents use social media to find out about potential job candidates, and notice inconsistencies between a very professional image presented in a CV or at interview and a Twitter feed that paints a picture of carelessness, foolishness, or irresponsibility.

Information literacy

Awareness of how to use and abuse social media, search engines and research tools, and data and statistics was seen as an arena in which information professionals can offer advice and mentoring, not only to young people but also to organisations. Information professionals should also set good examples of how to use social media tools, adopt new working practices, and evaluate new technologies. They should also be able to explain how search engines work, what the pitfalls of poorly planned or too narrow research strategies are, and how to research in a more efficient and effective manner.

A new area that information professionals also need to understand is data analytics and how statistics and algorithmic data mining can be used or abused. Information professionals need not be advanced mathematicians to contribute in this area – an understanding of how to interpret data, the political and cultural issues that can bias interpretations, how to frame questions to get mathematically and statistically significant results, and how to understand the importance of outliers and statistical anomalies are skills that are becoming more important every day.

Overall, I thoroughly enjoyed being woken up by such thoughtful and interesting breakfast companions and went about the rest of the day with a head full of fresh ideas.

Isn’t search the same as browse?


I nearly wept when one of our young rising IT stars queried in a meeting why we had separated “search” and “browse” as headings for our discussions on archive navigation functionality. So, to spare me further tears, here are some distinctions and similarities. There won’t be anything new for information professionals, but I hope it will be useful if any of your colleagues in IT need a little help. I am sure this is far from comprehensive, so please leave additions and comments!

Differences between search and browse

Search is making a beeline to a known target, browse is wandering around and exploring.
Search is for when you know what you are looking for, browse is for when you don’t.
Search is for when you know what you are looking for exists, browse is for when you don’t.

Search expects you to look for something that is findable, browse shows you the sort of thing you can find.
Search is for when you already know what is available in a collection or repository, browse is how you find out what is there, especially if you are a newcomer.
Search is difficult when you don’t know the right words to use, browse offers suggestions.
Search is a quickfire answer, browse is educative.
Search is about one-off actions, browse is about establishing familiar pathways that can be followed again or varied with predictable results.

Search relies on the seeker to do all the thinking, browse offers suggestions.
Search is a tricky way of finding content on related topics, browse is an easy way of finding related content.
Search is difficult when you are trying to distinguish between almost identical content, browse can highlight subtle distinctions.
Search rarely offers completeness, browse often offers completeness.

Search is pretty much a “black box” to most people, so it is hard to tell how well it has worked, browse systems are visible so it is easy to judge them.
Search uses complex processing that most people don’t want to see, browse uses links and connections that most people like to see.
Search is based on calculations and assumptions that are under the surface, browse systems offer frameworks that are more open.

Search works well on the web, because the web is so big no-one has had time to build an easy way to browse it, browse works well on smaller structured collections.
Search can run across vast collections, browse needs to be offered at human-readable scales.
Search does not usually give an indication of the size or scope of a collection, browse can be designed to indicate scale.

Similarities between search and browse

Search and browse are both ways of finding content.
Search and browse can both be configured in a huge variety of ways.
Search and browse both have many different mechanisms and implementations.
Search and browse should both be tailored to users’ needs.
Search and browse systems both require thought and editorial judgement in their creation so that they work effectively for any particular collection.
Search and browse systems can often both be created largely automatically.
Search and browse often both involve metadata.
Search and browse behaviours may be intertwined, with users switching from one to the other.
Search and browse may be used by the same users for different tasks at different times.
Search and browse both offer serendipity, although serendipitous opportunities are often hidden by interface design.

Should I offer my users search or browse?

Almost always, you should offer both, unless you are very sure that your users will always be performing the same kind of task and have the same level of familiarity with your content. With small static collections of content, it may not matter too much, but for most content collections users will probably want both. Which one you make your main focus depends on the context and collection.

Shops might have lots of images and very little text, so a beautifully designed navigation system will help customers find – and buy – products they might not know about, while only a simple search system might be needed to cover searches for product names. A library will need to support lots of searches for titles and across catalogue text with a good search system, but will also need to help educate and inform users with a clear user-friendly browsable navigation system. A large incoherent collection of unstructured text with no particular purpose is likely to be difficult to navigate no matter what you design, so will need good search, but – apart from the web itself – such unbounded and unmanaged collections tend to be quite unusual.

Data: The New Black Gold?


Last week I attended a seminar organised by The British Screen Advisory Council and Intellect, the technology trade association, and hosted by the law firm SNR Denton. The panellists included Derek Wyatt, internet visionary and former politician; Dr Rob Reid, Science Policy Adviser, Which?; Nick Graham, of SNR Denton; Steve Taylor, creative mentor; Donna Whitehead, Government Affairs Manager, Microsoft; Theo Bertram, UK Policy Manager, Google; David Boyle, Head of Insight, Zeebox; and Louisa Wong, Aegis Media.

Data as oil

The event was chaired by Adam Singer, BSAC chairman, who explored the metaphor of “data as oil”. Like oil, raw data is a valuable commodity, but usually needs processing and refining before it can be used, especially by individual consumers. Like oil, data can leak and spill, and if mishandled can be toxic.

It struck me through the course of the evening that, just like oil, we are in danger of allowing control of data to fall into the hands of a very small number of companies, who could easily form cartels and lock out competition. It became increasingly obvious during the seminar that Google has immense power because of the size of the “data fields” it controls, with Facebook and others trying to stake their claims. All the power Big Data offers – through data mining, analytics, etc. – is dependent on scale. If you don’t have access to data on a huge scale, you cannot get statistically significant results, so you cannot fine-tune your algorithms in the way that Google can. The implication is that individual companies will never be able to compete in the Big Data arena, because no matter how much data they gather on their customers, they will only ever have data on a comparatively small number of people.
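To put a rough number on the point about scale, here is a minimal sketch in Python. It is my own illustration rather than anything presented at the seminar, and it assumes the textbook normal-approximation margin of error for a proportion measured on a simple random sample. The function name and the sample sizes are made up for the example.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample and the normal approximation;
    proportion=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Hypothetical data set sizes: a small firm, a large firm, a web giant.
for n in (1_000, 10_000, 1_000_000):
    print(f"{n:>9,} records: +/- {margin_of_error(n):.2%}")
```

The margin shrinks only with the square root of the sample size, so detecting small effects (the kind used to fine-tune an algorithm) needs disproportionately more data. That is one way of seeing why scale matters, although it says nothing about bias in how the data was collected.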

How much is my data worth?

At an individual level, people seemed to think that “their” data had a value, but could not really see how they could get any benefit from it, other than by trading it for “free” services in an essentially hugely asymmetrical arrangement. The value of “my” data on its own – i.e. what I could sell it for as an individual – is little, but when aggregated, as on Facebook, the whole becomes worth far more than the sum of its parts.

At the same time, the issue of who actually owns data becomes commercially significant. Do I have any rights to data about my shopping habits, for example? There are many facts about ourselves that are simply public, whether we like it or not. If I walk down a public street, anybody can see how tall I am, guess my age, weight, probably work out my gender, social status, where I buy my clothes, even such “personal” details as whether I am confident or nervous. If they then observe that I go into a certain supermarket and purchase several bags of shopping, do I have any right to demand that they “forget” or do not use such observations?

New data, new laws?

It was repeatedly stated that the law as it stands is not keeping up with the implications of technological change. It was suggested that we need to re-think laws about privacy, intellectual property, and personal data.

It occurred to me that we may need laws that deal with malicious use of data, rather than ownership of data. I don’t mind people merely seeing me when I walk down the street, but I don’t want them shouting out observations about me, following me home, or trying to sell me things, as in the “Minority Report” scenario of street signs acting like market hawkers, calling out your name as you walk by.

What sort of a place is the Internet?

Technological change has always provoked psychological and political unease, and some speakers mentioned that younger people are simply adapting to the idea that the online space is a completely open public space. The idea that “on the Internet, no-one knows you are a dog” will be seen as a temporary quirk – a rather quaint notion amongst a few early idealists. Nowadays, not only does everyone know you are a dog, they know which other dogs you hang out with, what your favourite dog food is, and when you last went to the vet.

The focus of the evening seemed to be on how to make marketing more effective, with a few mentions of using Big Data to drive business process efficiencies. A few examples of how Big Data analytics can be used to promote social goods, such as monitoring outbreaks of disease, were also offered.

There were clear differences in attitudes. Some people wanted to keep their data private, and accept in return less personalised marketing. They also seemed to be more willing to pay for ad-free services. Others were far more concerned that data about them should be accurate and they wanted easy ways of correcting their own records. This was not just to ensure factual accuracies, but also because they wanted targeted, personalised advertising and so actively wanted to engage with companies to tell them their preferences and interests. They were quite happy with “Minority Report” style personalisation, provided that it was really good at offering them products they genuinely wanted. They were remarkably intolerant of “mistakes”. The complaint “I bought a book as a present for a friend on Amazon about something I have no interest in, now all it recommends to me are more books on that subject” was common. Off-target recommendations seemed to upset people far more than the thought of companies amassing vast data sets in the first place.

Lifting the lid of the Big Data black box

The issue that I like to raise in these discussions is one that Knowledge Organisation theorists have been concerned about for some time – that we build hidden biases so deeply into our data collection methods, our algorithms, and processes, that our analyses of Big Data only ever give us answers we already knew.

We already know you are more likely to sell luxury cars to people who live in affluent areas, and we already know where those areas are. If all our Big Data analysis does is refine the granularity of this information, it probably won’t gain us that many more sales or improve our lives. If we want Big Data to do more for us, we need to ask better questions – questions that will challenge rather than confirm our existing prejudices and assumptions and promote innovation and creativity, not easy questions that merely consolidate the status quo.

Your organization is not the Internet


Many people find it very difficult to understand why search within an organization can’t “just be like Google”. This is often because they haven’t thought about the differences between an organization and the Internet.

Your organization is smaller than the Internet

Search engines like Google work because they have access to big data. Google gets billions of searches to process, from billions of users. Even if your organization is a large one, it won’t have that many users either searching or contributing content, so it cannot number crunch on the same scale as Google. Your IT department is probably a lot smaller than Google’s and your enterprise search team’s daily budget is unlikely to cover more than the tiniest fraction of what Google spends. Last, but by no means least, your organization doesn’t have as much content as the Internet, so it probably needs to be far more careful about not losing any that is valuable.

Surfing the net is not many people’s job

There are important differences between how and why people search when they are at work and when they are not, and between how and why they search the Internet and their organization’s Intranet or archives. People rarely surf their organization’s Intranet for fun, to be entertained, or to while away the time. The differences in serious research behaviour and leisure searching are well documented, so I am going to write about another aspect of differences between the Internet and organizations that is often overlooked.

Putting stuff online is not the same as writing a business report

There are vast differences in the ways that people create and curate content on the Internet and within an organization. These differences have a significant effect on the way search functions. The key difference is in how much they link their content to that of others. Of course, there are people whose jobs are to create and curate online content – all the web editors, content strategists, copywriters, social media marketers, etc. – but they will be the first to explain that they have a very specialised set of skills focused on making their content searchable, commercial, or otherwise user friendly. They do a whole lot of things that most people as part of the day job neither know how nor have the time to do.

Links are a form of Knowledge Organization that Google gets for free

One of the key things that web professionals and unpaid web enthusiasts do with their content is to add and manage links. Links are what organize the web. Links are what group sites into clusters by content. Links are the web’s classification scheme. Clay Shirky back in 2005 said “there is no shelf” but it makes just as much sense to think of millions of shelves – infinite shelves going off in all directions, with new ones being created and old ones being discarded. The web is not linear – like a shelf – but it is not without structure. Google effectively picks one of the near infinity of shelves and offers it up as a linear list whenever you do a search. It chooses the shelf that seems to be the most popular, or that fits its commercial model. First on the shelf is often a paid-for advertisement or a Wikipedia entry, followed by other big well-established commercial sites. Out there on the Internet, people do an awful lot of shopping, and not much work, so that’s fine. (If they are doing more shopping than work when they are at work, your organization probably has bigger problems than search to deal with.)

For many other searches, especially more thematic research, people would be disappointed with the results, were it not for the magic of the way the web works – the links. As long as Google slings a site at you that has lots of links to other sites, it doesn’t have to take you straight to what you want, it lets you and the links do the rest of the work. Links gather together similar content, so they function like a classification scheme. The links associate content that is aimed at similar audiences, is on similar topics, is of a similar age. The links represent a huge amount of sorting, cataloguing, and classification work. Google did not have to pay for this work (genius business model). People do this work for Google for free. They do this work as part of creating and curating their content.

Many of Google’s volunteer librarians do this work for fun. They create fan sites, they write Wikipedia articles, they produce lists and generate indexes to their favourite content. They provide cataloguing descriptions and context. They do all this work partly because they enjoy it and partly because they hope to get “repaid” by their site becoming popular. They hope this will either lead to monetary reward (their band will get signed, they’ll get a better job, they’ll sell advertising) or social reward (they’ll make online “friends”, get positive feedback from comments, etc.).

From the commercial angle, people do this work because they expect to gain financial reward. They want to sell more products and make money. This is why there are howls of pain whenever Google tweaks its algorithms. Companies that balk at investing in internal search systems will spend fortunes chasing SEO.

Are your staff content curators?

If you want your organization’s search to be “just like Google” you need to think about how linked your content is. Do people who create content in your organization do so for the same reasons and with the same motivations as people create and link content on the web? It is very unlikely that you have lots of “fans” who will spend their free time creating lists of your company’s best information resources, or collecting and rating and reviewing reports and documents. Most employees are too busy getting on with their day jobs to spend office hours pursuing their “fan” projects. Even if your staff have plenty of spare time, how many of them are big enough fans of some aspect of work to treat it like a hobby? If you want people to start looking out for similar documents on your Intranet and linking their own documents to them, you will probably have to find ways of motivating them to do this as a special initiative. It is not likely to come “for free”, like it does for the web search engines.

For some organizations, encouraging and incentivising “fan”-type behaviour may work. If the organization already has a strong collaborative culture, with people sharing ideas and using social media, it may be a small step to get them to think of their documents and presentations as blog posts. Including content creation and curation in people’s job roles and rewarding those who do well will foster a link-rich Intranet. By recognising and rewarding people who promote useful links and lists and get them to rank highly in your enterprise searches, you could bring an element of gamification to encourage this sort of behaviour. For other organizations, the culture may support this kind of web-style content creation, but people are generally too busy, have skill sets too far from what is required, or need training and encouragement. In such organizations it may make sense to have the equivalent of web editors, content strategists, user experience specialists, search engine optimizers, etc. working with the organization’s internal content to promote the most valuable resources. In other words, a layer of “linkers” who work alongside the content originators.

For other organizations, where it would be inappropriate, too time consuming, or too far from established culture to encourage web-like information behaviour, enterprise search will never work “just like Google”. More formal and standardized metadata management processes are likely to be needed. Organizations that generate a lot of very specific content that is unlikely to be useful in broader contexts, confidential content, or large volumes of very similar structured content are likely to find it hard to move away from directed and standardized searching.

Many organizations will have a “mixed economy” with different types of content and different departments operating with different styles (e.g. what works in a marketing department is unlikely to work in the same way in a finance department).

Without links, search is a lot of dead ends

Without links, each search result is isolated. This stops the searcher in their tracks and means they cannot surf in the way they do on the Internet. They will have to check search results one after another in a linear fashion. If your search engine is not getting the most relevant results to the top of that list, your staff will be spending a huge amount of time working their way through that list. They cannot plump for one likely looking result then follow the trail of links, as they do on the web. The links as a form of classification do not exist, so you need another mechanism (taxonomy, ontology, index, directory) to help people find groups of related content and browse through from one document to another.
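For any IT colleagues who would rather see this as code, here is a minimal Python sketch of the idea that a taxonomy can stand in for missing links. It is an illustration under my own assumptions, not a description of any particular enterprise search product: the document names, the taxonomy terms, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical Intranet documents, each tagged with terms from a
# small made-up taxonomy. None of them link to one another.
DOCUMENTS = {
    "q3-sales-report": {"finance", "sales"},
    "expenses-policy": {"finance", "hr"},
    "onboarding-guide": {"hr"},
}

def build_term_index(documents):
    """Map each taxonomy term to the set of documents tagged with it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def related_documents(doc_id, documents, index):
    """Return the other documents that share at least one taxonomy term."""
    related = set()
    for term in documents[doc_id]:
        related |= index[term]
    related.discard(doc_id)  # a document is not "related" to itself
    return sorted(related)

index = build_term_index(DOCUMENTS)
print(related_documents("q3-sales-report", DOCUMENTS, index))
# prints ['expenses-policy']
```

The shared terms do the job that hyperlinks do on the web: from any one search result, the searcher can step sideways to related documents instead of going back to the results list. The catch is that someone still has to apply the terms consistently, which is exactly the work the web gets for free.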

So, even though you may have the technology and the budget to match Google’s, unless your content creators are linking freely, you will never completely succeed in turning your Intranet into a mini-Internet.