Building, visualising and deploying taxonomies and ontologies; the reality – Content Intelligence Forum event

I have been trying to get to the Content Intelligence Forum meetups for some time as they always seem to offer excellent speakers on key topics that don’t tend to get the attention they deserve, so I was delighted to be able to attend Stephen D’Arcy’s talk a little while ago on taxonomies and ontologies.

Stephen has many years of experience designing semantic information systems for large organisations, ranging from health care providers to banks to media companies. His career illustrates the transferability of, and wide demand for, information skills.

His 8-point checklist for a taxonomy project was extremely helpful – Define, Audit, Tools, Plan, Build, Deploy, Governance, Documentation – as were his tips for managing stakeholders, IT departments in particular. He warned against the pitfalls of not including taxonomy management early enough in search systems design, and the problems that you can be left with if you do not have a flexible and dynamic way of managing your taxonomy and ontology structures. He also included a lot of examples that illustrated the fun aspects of ontologies when used to create interesting pathways through entertainment content in particular.

The conversation after the talk was very engaging and I enjoyed finding out about common problems that information professionals face, including how best to define terms, how to encourage clear thinking, and how to communicate good research techniques.

I friend dead people – Are social media mature enough to cope with bereavement?

This is a very personal post about topics in which I am not an expert, so I welcome comments and suggestions.

When “like” and “lol” don’t help

In February, a young man I had never met died in sad circumstances. He was a friend of a friend and I was supposed to meet him on the day he died. Completely coincidentally, within a fortnight I myself lost a dear friend, someone I had known for over 20 years.

The closeness in timing has thrown up sharp contrasts in the way that these deaths have reverberated around my social media worlds (obviously the real world impacts have been huge, but I am not going to discuss those here).

In many ways, dealing with the death of my own friend on social media has been easier. Being well known to her family and her circle of closest friends has meant that I have felt able to post messages of condolence and remembrance as I instinctively know what is appropriate, and I know that most of the people reading them will know me. It has been strange to see her name pop up as a “friend available on chat” when I know any activity in her account must be one of her family members logging in to maintain the page. Yesterday was her birthday, and the reminders in my calendar and the little birthday gift “event reminder” were bittersweet, but not unwelcome. I think of her and her family often, and do not want to forget.

Just after she died, I received a message through a social media site from someone I had never met or even heard of, who had been a schoolfriend of hers long ago, asking what had happened to our mutual friend, and I felt comfortable in answering. It helped me to talk about her with this stranger. I even flattered myself that I was doing some good, in that they clearly felt awkward about contacting her family directly while I was able to act as an “information resource” meaning the family and closest friends could focus on their own grieving.

I friend dead people

In contrast, how to cope with the loss of an almost-friend on social media has been strange and unnerving. One social media application has tactlessly and repeatedly suggested him as a friend, noting how many friends we had (have?) in common. Somehow I didn’t have the heart to click on “ignore”. I realise now I should have done just that, because I was anguished when I accidentally clicked on “confirm”. I worried that his friends and relatives might see my “friend request” and be distressed by it. Maybe they would never spot the notification; maybe they would assume it was sent before his death, just another reminder of what might have been; maybe they would even be comforted by the continuation of these distant social interactions with almost-strangers. (I immediately emailed the site in question asking them to withdraw my request, but received no reply.)

My uncertainty about the appropriate “social media etiquette” was no doubt increased rather than diminished by our social distance. I do not know his family and friends well enough to mention this casually in passing, to express that this had been a mistake and was not intended to distress, or even to know what sort of people they are and whether this is the sort of thing that might upset them. However, it is exactly this sort of loose “one degree of separation” relationship that online social media foster, and this incident struck me as illustrating how inadequate such media are when interactions need to go beyond chirruping about the weather, saying a website is cool, or asking whether or not someone wants to go to a party.

Digital memorials

My friend’s social media pages have slipped into being a form of digital memorial, but this also raises new issues. There have been stories in the press of “trolls” deliberately desecrating memorial pages in an online equivalent of upturning flowers left on a grave or kicking over and spraying graffiti on a headstone (e.g. http://gawker.com/5868503/why-people-troll-dead-kids-on-facebook). The only way to deal with this seems to be to remove the page, which is a shame and in a way seems to mean the bullies have won. It also highlights a strange transition from personal to public. Our graveyards are either public spaces that the authorities monitor and maintain or privately curated grounds. I have previously thought of my social media pages as more like a private garden – people may peer over the wall, but it is essentially “my” space to maintain. People are starting to think more and more about their digital legacies (the British Computer Society recently held an event on this theme).

There are already “digital memorial” companies offering guarantees of “permanent” archiving and access to sites (e.g. Much Loved). Other sites offer memorial pages that allow people to make donations to charity, but presumably these are not expected to remain in place forever.

However, these sites are aimed at those who remain setting up the sites, not taking over the sites that belonged to their loved ones. The value of someone’s posts and pages changes dramatically when they become precious memories, and not just ephemeral chatter. If we (or our loved ones) want our own sites to go on after us, do we need to bequeath our passwords to trusted friends or family? How does that affect our contracts with hosts and service providers? What rights do families have to “reclaim” the pages and content if there is no such bequest? How would disputes over inheritance of such sites be decided? What recourse do we have if the site owner decides to shut down and delete the content or simply loses it?

It seems to me that such issues have the potential to cause far more distress than the strangenesses we encounter when automated reminders and friend suggestions behave as if we are all immortal.

Change, technology, understanding, and the information professions

Not being a morning person, I was unsure whether a networking breakfast would suit me, but the recruitment agent Sue Hill’s event offered good food and interesting conversation, so I thought I would give it a try. I wasn’t disappointed – the food was excellent and the big round tables promoted lively group discussion.

We were a mix of information professionals from the public and private sectors, at different stages of our careers, but three key themes prompted the most debate.

Change management

Managing technology change and bridging the cultural and political divisions within organisations in order to bring about change were key concerns. Information professionals can contribute by explaining how new technologies work, how technologies can be catalysts of changes in behaviour, and how they mitigate or increase informational and archival risks. Even simply letting people know new technology is out there can be hugely valuable. Knowledge and information workers can help manage change on political and cultural levels by understanding the corporate culture they are working in and helping their organisation to understand itself and so make good decisions about systems procurement. Information professionals can also often help to break down cultural barriers, such as barriers to sharing information.

Social media

Social media are now being used to differing degrees within organisations – some having embraced the technologies wholeheartedly, others seeing them as a problem or a threat. There was a general concern that technology is being adopted and used faster than we can understand its impacts and devise strategies for mitigating any risks.

Personal and cultural understanding of the divisions between the public and the private seemed to be a problematic area. Young people in particular were perceived as being vulnerable to “over exposure” as they seemed not to notice that postings about them – pictures especially – would remain available for decades to come and could compromise them in their future careers. Recruitment agents use social media to find out about potential job candidates, and notice inconsistencies between a very professional image presented in a CV or at interview and a Twitter feed that paints a picture of carelessness, foolishness, or irresponsibility.

Information literacy

Awareness of how to use and abuse social media, search engines and research tools, and data and statistics was seen as an arena in which information professionals can offer advice and mentoring, not only to young people but also to organisations. Information professionals should also set good examples of how to use social media tools, adopt new working practices, and evaluate new technologies. They should also be able to explain how search engines work, what the pitfalls of poorly planned or too narrow research strategies are, and how to research in a more efficient and effective manner.

A new area that information professionals also need to understand is data analytics and how statistics and algorithmic data mining can be used or abused. Information professionals need not be advanced mathematicians to contribute in this area. Knowing how to interpret data, which political and cultural issues can bias interpretations, how to frame questions to get statistically significant results, and why outliers and statistical anomalies matter are all skills that are becoming more important every day.

Overall, I thoroughly enjoyed being woken up by such thoughtful and interesting breakfast companions and went about the rest of the day with a head full of fresh ideas.

Isn’t search the same as browse?

I nearly wept when one of our young rising IT stars queried in a meeting why we had separated “search” and “browse” as headings for our discussions on archive navigation functionality. So, to spare me further tears, here are some distinctions and similarities. There won’t be anything new for information professionals, but I hope it will be useful if any of your colleagues in IT need a little help. I am sure this is far from comprehensive, so please leave additions and comments!

Differences between search and browse

Search is making a beeline to a known target, browse is wandering around and exploring.
Search is for when you know what you are looking for, browse is for when you don’t.
Search is for when you know what you are looking for exists, browse is for when you don’t.

Search expects you to look for something that is findable, browse shows you the sort of thing you can find.
Search is for when you already know what is available in a collection or repository, browse is how you find out what is there, especially if you are a newcomer.
Search is difficult when you don’t know the right words to use, browse offers suggestions.
Search is a quickfire answer, browse is educative.
Search is about one-off actions, browse is about establishing familiar pathways that can be followed again or varied with predictable results.

Search relies on the seeker to do all the thinking, browse offers suggestions.
Search is a tricky way of finding content on related topics, browse is an easy way of finding related content.
Search is difficult when you are trying to distinguish between almost identical content, browse can highlight subtle distinctions.
Search rarely offers completeness, browse often offers completeness.

Search is pretty much a “black box” to most people, so it is hard to tell how well it has worked, browse systems are visible so it is easy to judge them.
Search uses complex processing that most people don’t want to see, browse uses links and connections that most people like to see.
Search is based on calculations and assumptions that are under the surface, browse systems offer frameworks that are more open.

Search works well on the web, because the web is so big no-one has had time to build an easy way to browse it, browse works well on smaller structured collections.
Search can run across vast collections, browse needs to be offered at human-readable scales.
Search does not usually give an indication of the size or scope of a collection, browse can be designed to indicate scale.

Similarities between search and browse

Search and browse are both ways of finding content.
Search and browse can both be configured in a huge variety of ways.
Search and browse both have many different mechanisms and implementations.
Search and browse should both be tailored to users’ needs.
Search and browse systems both require thought and editorial judgement in their creation so that they work effectively for any particular collection.
Search and browse systems can often both be created largely automatically.
Search and browse often both involve metadata.
Search and browse behaviours may be intertwined, with users switching from one to the other.
Search and browse may be used by the same users for different tasks at different times.
Search and browse both offer serendipity, although serendipitous opportunities are often hidden by interface design.
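
For colleagues who prefer to see things in code, here is a minimal sketch of the two ideas built on the same metadata. Everything in it is an assumption made for illustration: the four catalogue records, the two-level category scheme, and the function names are hypothetical and do not describe any particular system.

```python
from collections import defaultdict

# Hypothetical catalogue records: a title plus one two-level category per item.
ITEMS = [
    {"title": "Garden Birds of Britain", "category": ("Nature", "Birds")},
    {"title": "Owls at Night", "category": ("Nature", "Birds")},
    {"title": "Deep Sea Fishing", "category": ("Nature", "Fish")},
    {"title": "Steam Railways", "category": ("Transport", "Trains")},
]


def search(items, query):
    """Search: return titles containing every query word.

    The user has to supply the right words; nothing is suggested.
    """
    words = query.lower().split()
    return [
        item["title"]
        for item in items
        if all(word in item["title"].lower() for word in words)
    ]


def browse_tree(items):
    """Browse: group titles under their categories so the structure is visible."""
    tree = defaultdict(lambda: defaultdict(list))
    for item in items:
        top, sub = item["category"]
        tree[top][sub].append(item["title"])
    return tree


def browse_counts(items):
    """Browse can indicate scale: count the items under each top-level category."""
    counts = defaultdict(int)
    for item in items:
        counts[item["category"][0]] += 1
    return dict(counts)
```

In this toy collection, searching for "birds" finds only the title that contains the word, while browsing to Nature, then Birds, also surfaces "Owls at Night". The counts show at a glance how much sits under each heading, which a plain search box does not.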

Should I offer my users search or browse?

Almost always, you should offer both, unless you are very sure that your users will always be performing the same kind of task and have the same level of familiarity with your content. With small static collections of content, it may not matter too much, but for most content collections users will probably want both. Which you make your main focus depends on the context and collection.

Shops might have lots of images and very little text, so a beautifully designed navigation system will help customers find – and buy – products they might not know about, while only a simple search system might be needed to cover searches for product names. A library will need to support lots of searches for titles and across catalogue text with a good search system, but will also need to help educate and inform users with a clear user-friendly browsable navigation system. A large incoherent collection of unstructured text with no particular purpose is likely to be difficult to navigate no matter what you design, so will need good search, but – apart from the web itself – such unbounded and unmanaged collections tend to be quite unusual.

Data: The New Black Gold?

Last week I attended a seminar organised by The British Screen Advisory Council and Intellect, the technology trade association, and hosted by the law firm SNR Denton. The panellists included Derek Wyatt, internet visionary and former politician; Dr Rob Reid, Science Policy Adviser, Which?; Nick Graham of SNR Denton; Steve Taylor, creative mentor; Donna Whitehead, Government Affairs Manager, Microsoft; Theo Bertram, UK Policy Manager, Google; David Boyle, Head of Insight, Zeebox; and Louisa Wong, Aegis Media.

Data as oil

The event was chaired by Adam Singer, BSAC chairman, who explored the metaphor of “data as oil”. Like oil, raw data is a valuable commodity, but usually needs processing and refining before it can be used, especially by individual consumers. Like oil, data can leak and spill, and if mishandled can be toxic.

It struck me over the course of the evening that, just as with oil, we are in danger of allowing control of data to fall into the hands of a very small number of companies, who could easily form cartels and lock out competition. It became increasingly obvious during the seminar that Google has immense power because of the size of the “data fields” it controls, with Facebook and others trying to stake their claims. All the power Big Data offers – through data mining, analytics, etc. – is dependent on scale. If you don’t have access to data on a huge scale, you cannot get statistically significant results, so you cannot fine tune your algorithms in the way that Google can. The implication is that individual companies will never be able to compete in the Big Data arena, because no matter how much data they gather on their customers, they will only ever have data on a comparatively small number of people.
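
As a rough illustration of why scale matters, the sketch below uses the standard normal approximation for the margin of error of an observed proportion. It is my own simplification, not something presented at the seminar: the sample sizes are made up, and real analytics involve far more than a single proportion.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for an observed proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); z = 1.96
    corresponds to roughly 95% confidence. The default proportion of 0.5
    is the worst case, giving the widest margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical customer counts, chosen only to show the trend.
for n in (100, 10_000, 1_000_000):
    points = margin_of_error(n) * 100
    print(f"{n:>9,} people: +/- {points:.2f} percentage points")
```

The margin shrinks with the square root of the sample size, so a hundred times more data gives an estimate only ten times tighter. Small differences between groups therefore become detectable only with very large data sets.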

How much is my data worth?

At an individual level, people seemed to think that “their” data had a value, but could not really see how they could get any benefit from it, other than by trading it for “free” services in an essentially hugely asymmetrical arrangement. The value of “my” data on its own – i.e. what I could sell it for as an individual – is little, but when aggregated, as on Facebook, the whole becomes worth far more than the sum of its parts.

At the same time, the issue of who actually owns data becomes commercially significant. Do I have any rights to data about my shopping habits, for example? There are many facts about ourselves that are simply public, whether we like it or not. If I walk down a public street, anybody can see how tall I am, guess my age, weight, probably work out my gender, social status, where I buy my clothes, even such “personal” details as whether I am confident or nervous. If they then observe that I go into a certain supermarket and purchase several bags of shopping, do I have any right to demand that they “forget” or do not use such observations?

New data, new laws?

It was repeatedly stated that the law as it stands is not keeping up with the implications of technological change. It was suggested that we need to re-think laws about privacy, intellectual property, and personal data.

It occurred to me that we may need laws that deal with malicious use of data, rather than ownership of data. I don’t mind people merely seeing me when I walk down the street, but I don’t want them shouting out observations about me, following me home, or trying to sell me things, as in the “Minority Report” scenario of street signs acting like market hawkers, calling out your name as you walk by.

What sort of a place is the Internet?

Technological change has always provoked psychological and political unease, and some speakers mentioned that younger people are simply adapting to the idea that the online space is a completely open public space. The idea that “on the Internet, no-one knows you are a dog” will be seen as a temporary quirk – a rather quaint notion amongst a few early idealists. Nowadays, not only does everyone know you are a dog, they know which other dogs you hang out with, what your favourite dog food is, and when you last went to the vet.

The focus of the evening seemed to be on how to make marketing more effective, with a few mentions of using Big Data to drive business process efficiencies. A few examples of how Big Data analytics can be used to promote social goods, such as monitoring outbreaks of disease, were also offered.

There were clear differences in attitudes. Some people wanted to keep their data private, and accept in return less personalised marketing. They also seemed to be more willing to pay for ad-free services. Others were far more concerned that data about them should be accurate and they wanted easy ways of correcting their own records. This was not just to ensure factual accuracies, but also because they wanted targeted, personalised advertising and so actively wanted to engage with companies to tell them their preferences and interests. They were quite happy with “Minority Report” style personalisation, provided that it was really good at offering them products they genuinely wanted. They were remarkably intolerant of “mistakes”. The complaint “I bought a book as a present for a friend on Amazon about something I have no interest in, now all it recommends to me are more books on that subject” was common. Off-target recommendations seemed to upset people far more than the thought of companies amassing vast data sets in the first place.

Lifting the lid of the Big Data black box

The issue that I like to raise in these discussions is one that Knowledge Organisation theorists have been concerned about for some time – that we build hidden biases so deeply into our data collection methods, our algorithms, and processes, that our analyses of Big Data only ever give us answers we already knew.

We already know you are more likely to sell luxury cars to people who live in affluent areas, and we already know where those areas are. If all our Big Data analysis does is refine the granularity of this information, it probably won’t gain us that many more sales or improve our lives. If we want Big Data to do more for us, we need to ask better questions – questions that will challenge rather than confirm our existing prejudices and assumptions and promote innovation and creativity, not easy questions that merely consolidate the status quo.