Data: The New Black Gold?

Last week I attended a seminar organised by The British Screen Advisory Council and Intellect, the technology trade association, and hosted by the law firm SNR Denton. The panellists included Derek Wyatt, internet visionary and former politician, Dr Rob Reid, Science Policy Adviser, Which?, Nick Graham, of SNR Denton, Steve Taylor, creative mentor, Donna Whitehead, Government Affairs Manager, Microsoft, Theo Bertram, UK Policy Manager, Google, David Boyle, Head of Insight, Zeebox, and Louisa Wong, Aegis Media.

Data as oil

The event was chaired by Adam Singer, BSAC chairman, who explored the metaphor of “data as oil”. Like oil, raw data is a valuable commodity, but usually needs processing and refining before it can be used, especially by individual consumers. Like oil, data can leak and spill, and if mishandled can be toxic.

It struck me through the course of the evening that, just like oil, we are in danger of allowing control of data to fall into the hands of a very small number of companies, which could easily form cartels and lock out competition. It became increasingly obvious during the seminar that Google has immense power because of the size of the “data fields” it controls, with Facebook and others trying to stake their claims. All the power Big Data offers – through data mining, analytics, etc. – is dependent on scale. If you don’t have access to data on a huge scale, you cannot get statistically significant results, so you cannot fine-tune your algorithms in the way that Google can. The implication is that individual companies will never be able to compete in the Big Data arena, because no matter how much data they gather on their customers, they will only ever have data on a comparatively small number of people.
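To make the scale argument concrete, here is a minimal Python sketch, with an entirely invented 2% click-through rate, of how the uncertainty around a measured rate shrinks only with the square root of the sample size – which is why fine-tuning an algorithm on a small customer base quickly hits a statistical floor.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion p over n trials."""
    return z * math.sqrt(p * (1 - p) / n)

baseline_ctr = 0.02  # a hypothetical 2% click-through rate
for n in (1_000, 100_000, 10_000_000):
    moe = margin_of_error(baseline_ctr, n)
    print(f"n = {n:>10,}: 2.0% ± {moe * 100:.2f} percentage points")
# With 1,000 users the noise swamps any realistic improvement; only at
# millions of users can a 0.1-point gain be told apart from chance.
```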

How much is my data worth?

At an individual level, people seemed to think that “their” data had a value, but could not really see how they could get any benefit from it, other than by trading it for “free” services in a hugely asymmetrical arrangement. The value of “my” data on its own – i.e. what I could sell it for as an individual – is little, but when aggregated, as on Facebook, the whole becomes worth far more than the sum of its parts.

At the same time, the issue of who actually owns data becomes commercially significant. Do I have any rights to data about my shopping habits, for example? There are many facts about ourselves that are simply public, whether we like it or not. If I walk down a public street, anybody can see how tall I am, guess my age, weight, probably work out my gender, social status, where I buy my clothes, even such “personal” details as whether I am confident or nervous. If they then observe that I go into a certain supermarket and purchase several bags of shopping, do I have any right to demand that they “forget” or do not use such observations?

New data, new laws?

It was repeatedly stated that the law as it stands is not keeping up with the implications of technological change. It was suggested that we need to re-think laws about privacy, intellectual property, and personal data.

It occurred to me that we may need laws that deal with malicious use of data, rather than ownership of data. I don’t mind people merely seeing me when I walk down the street, but I don’t want them shouting out observations about me, following me home, or trying to sell me things, as in the “Minority Report” scenario of street signs acting like market hawkers, calling out your name as you walk by.

What sort of a place is the Internet?

Technological change has always provoked psychological and political unease, and some speakers mentioned that younger people are simply adapting to the idea that the online space is a completely open public space. The idea that “on the Internet, no-one knows you are a dog” will be seen as a temporary quirk – a rather quaint notion amongst a few early idealists. Nowadays, not only does everyone know you are a dog, they know which other dogs you hang out with, what your favourite dog food is, and when you last went to the vet.

The focus of the evening seemed to be on how to make marketing more effective, with a few mentions of using Big Data to drive business process efficiencies. A few examples of how Big Data analytics can be used to promote social goods, such as monitoring outbreaks of disease, were also offered.

There were clear differences in attitudes. Some people wanted to keep their data private and to accept less personalised marketing in return. They also seemed to be more willing to pay for ad-free services. Others were far more concerned that data about them should be accurate, and they wanted easy ways of correcting their own records. This was not just to ensure factual accuracy, but also because they wanted targeted, personalised advertising, and so they actively wanted to engage with companies to tell them their preferences and interests. They were quite happy with “Minority Report”-style personalisation, provided that it was really good at offering them products they genuinely wanted. They were remarkably intolerant of “mistakes”. The complaint “I bought a book on Amazon as a present for a friend, about something I have no interest in, and now all it recommends to me are more books on that subject” was common. Off-target recommendations seemed to upset people far more than the thought of companies amassing vast data sets in the first place.

Lifting the lid of the Big Data black box

The issue that I like to raise in these discussions is one that Knowledge Organisation theorists have been concerned about for some time – that we build hidden biases so deeply into our data collection methods, our algorithms, and processes, that our analyses of Big Data only ever give us answers we already knew.

We already know you are more likely to sell luxury cars to people who live in affluent areas, and we already know where those areas are. If all our Big Data analysis does is refine the granularity of this information, it probably won’t gain us that many more sales or improve our lives. If we want Big Data to do more for us, we need to ask better questions – questions that will challenge rather than confirm our existing prejudices and assumptions and promote innovation and creativity, not easy questions that merely consolidate the status quo.
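A toy example makes the circularity visible: if the collection step has already reduced every customer to the one attribute we assumed mattered, the analysis can only hand that assumption back to us. The records below are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: each customer has already been reduced to the
# single attribute the analyst assumed would be predictive.
sales_records = [
    {"area": "affluent", "bought_luxury_car": True},
    {"area": "affluent", "bought_luxury_car": True},
    {"area": "affluent", "bought_luxury_car": False},
    {"area": "other", "bought_luxury_car": True},
    {"area": "other", "bought_luxury_car": False},
    {"area": "other", "bought_luxury_car": False},
]

totals = defaultdict(lambda: [0, 0])  # area -> [purchases, customers]
for record in sales_records:
    totals[record["area"]][0] += int(record["bought_luxury_car"])
    totals[record["area"]][1] += 1

for area, (bought, customers) in totals.items():
    print(f"{area}: {bought}/{customers} bought a luxury car")
# Whatever the numbers, the only "insight" this analysis can produce is a
# ranking of areas by affluence - the assumption built into the data.
```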

Data Ghosts in the Facebook Machine by Fantasticlife

Understanding how data mining works is going to become increasingly important. There is a huge gap in popular and even professional knowledge about what organisations can now do “under the surface” with our data. For a very clear and straightforward explanation of how social graphs work and why we should be paying attention read Data Ghosts in the Facebook Machine.
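As a rough, invented illustration of the kind of inference social graphs make possible – not the article’s or Facebook’s actual method – here is a small Python sketch that guesses an interest someone never declared from the declared interests of their friends.

```python
from collections import Counter

# Invented people, friendships, and interests.
friendships = {
    "alice": ["bob", "carol", "dave"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob", "dave"],
    "dave": ["alice", "carol"],
}
declared_interest = {"bob": "cycling", "carol": "cycling", "dave": "opera"}

def infer_interest(person):
    """Guess an undeclared interest by majority vote among declared friends."""
    votes = Counter(
        declared_interest[friend]
        for friend in friendships.get(person, [])
        if friend in declared_interest
    )
    return votes.most_common(1)[0][0] if votes else None

print(infer_interest("alice"))  # "cycling" - never stated, merely inferred
```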

Conversations about conversation – Gurteen knowledge café

Last Wednesday evening I attended my first “Knowledge Café”, hosted by David Gurteen. I have heard a lot about these cafés at various information events and so was pleased to finally be able to attend one in person. The idea appears to be twofold – firstly that knowledge and information professionals can find out what such cafés are for and how to run them, and secondly simply to participate in them for their own sake. The “meta-ness” of the theme – conversations about conversation – appealed to me. (I’ve always liked metacognition – essentially thinking about thinking – too.)

We had plenty of time to get a drink and network before the event started, which is always a good thing, then David gave us a short introduction to the topic. He talked about Theodore Zeldin’s book Conversation: How Talk Can Change Our Lives and reminisced about a conversation from his own childhood that had held personal significance. He then set us three questions to discuss, about whether conversations can help us to see the world differently and how we can use them to bring about change for the better.

We then had a quick round of “speed networking” and formed groups to talk about the first question, moving on to different groups subsequently, so that we were well mixed by the end of the evening. To conclude we gathered into one large circle to talk further. This way we spiralled out from a single speaker, to speaking in pairs, then small groups, then all of us together.

Some common strands that everyone seemed to touch on at some point included whether conversation is medium-agnostic. Some people felt quite strongly that only a face-to-face discussion was a real conversation and that chatting via email, by text, by IM, and even by telephone was not the same. Others felt that the medium was irrelevant and that it was the nature and quality of the communication that mattered. They agreed that signals such as body language, shared environment, and instant interactivity were lost when not face to face, but that other factors, such as power imbalances between participants, could be minimised by talking remotely and unseen. Most people agreed that it was far easier to chat in highly constrained media, such as texting, with people one already knew well and had talked to frequently face to face, as that acquaintance helped smooth over misunderstandings due to lack of tone of voice or hastily chosen and ambiguous words. Clarity of vocabulary was also seen as key, especially when dealing with diverse groups or communities of practice.

Trust, power, empathy, and the ability to listen were noted as important factors in productive conversations, as was persuasion, but also that people needed to be open and receptive if change – and perhaps even communication at all – were to be achieved.

I was surprised that fewer people mentioned the physical surroundings and settings of good conversations. I remembered Plato, with Socrates sometimes in the marketplace and sometimes going off to sit in a quiet place under a tree. I find the best conversations need a calm neutral space, without interruptions, where participants can be comfortable, can hear each other clearly, can see each other easily, and have space to move about, perhaps to draw, gesture, etc. if they want to emphasise or illustrate a point. Poor acoustics in restaurants can be disastrous for dinner conversations if all you can hear is clattering chairs and clinking cutlery. Chirruping mobile phones, staff requesting answers, and children needing attention break conversational rhythm and flow, not to mention trains of thought.

Interestingly, in the group discussion, and as so often happens in all conversations, people drifted off topic and became increasingly animated by discussion of something unintended and not particularly relevant. In this case it was a purely political debate about whether the competitive nature of humans was a good or bad thing. Despite mutterings that we are becoming less politically engaged, people seem to want to wear their politics very much on their sleeves.

On the way home, I wondered whether the conversations I had participated in that evening had changed me or the world. In a small way, every experience we have changes the world. I met some interesting new people. I had some new ideas and learned a few new pieces of information (apparently it is less tiring to listen to a telephone conversation using both ears – e.g. through a pair of headphones instead of a single earpiece). This blog post exists as a result of the evening. However, I took to heart the point that change has to come from within, and I resolved to try to remember to stay adaptable and open to new viewpoints. I also resolved to listen more attentively and to try to facilitate better, more productive conversations while at work. I certainly hope this will change the world for the better, albeit in a very subtle way.

There’s no such thing as digital privacy

I was asked to write about privacy for Information Today Europe.

In under 1,000 words it is not easy to cover such a huge topic, so I tried to take a bird’s eye view and put just a few of the issues into a broad context. Most people focus on quite a narrow angle – for example information security, or libel cases – but the topic covers far wider socio-cultural issues. From hacking, to government surveillance, to Facebook as a marketing tool, to family logins for online services, to personalisation, and even neuroscience, what is known about us and who may know it runs right through the heart of our interactions and transactions.

Privacy is also a very hot topic, with Radio Four’s PM running a series – The Privacy Commission, and legislators trying to figure out what sort of a legal framework we need to balance the often competing interests in privacy of the rich and famous, the ordinary citizen, the family member, the child, commercial organisations, the government. There is much to consider as we rely more and more on “black boxed” algorithms and processors as our information mediators.

The Organizational Digital Divide

Catching up on my reading, I found this post by Jonah Bossewitch: Pick a Corpus, Any Corpus and was particularly struck by his clear articulation of the growing information gulf between organizations and individuals.

I have since been thinking about the contrast between our localised knowledge organization systems and the semantic super-trawlers of the information oceans that are only affordable – let alone accessible – to the megawealthy. It is hard not to see this as a huge disempowerment of ordinary people, swamping the democratizing promise of the web as a connector of individuals. The theme has also cropped up in KIDMM discussions about the fragmentation of the information professions. The problem goes far beyond the familiar digital divide, beyond just keeping our personal data safe, to how we can render such meta-industrial scale technologies open for ordinary people to use. Perhaps we need public data mines to replace public libraries? It seems particularly bad timing that our public institutions – our libraries and universities – are under political and financial attack just at the point when we need them to be at the technological (and expensive) cutting edge.

We rely on scientists and experts to advise us on how to use, store and transport potentially hazardous but generally useful chemicals, radioactive substances, even weapons, and information professionals need to step up to the challenges of handling our new potentially hazardous data and data analysis tools and systems. I am reassured that there are smart people like Jonah rising to the call, but we all need to engage with the issues.

There’s no such thing as originality

Back in 1995 my brother wrote his MA dissertation on copyright: Of Cows and Calves: An Analysis of Copyright and Authorship (with Implications for Future Developments in Communications Media) or How I Learned To Stop Worrying and Love Home-Taping. It is interesting how relevant it remains today, especially in the light of the Hargreaves Report delivered to the government in May. Essentially, nothing much has changed in the intervening 16 years. My brother reflected on the predictions of the profound changes that digital technologies would make – and were already making – to the creative industries. Although details such as the excitement over ISDN lines and the absence of any mention of mobile technologies date his work, the core issues he covers – who owns an idea and who should get paid for it – remain remarkably current.

I’ve written a brief overview of the Hargreaves Report for Information Today, Europe. The two aspects of most interest to me are the proposals for a Digital Copyright Exchange and for handling of orphan works.

Ideas as objects

My brother argues that creative works are all part of an ongoing cultural dialogue that no one individual can really “own”, and that copyright only made sense for the short period when technology reified ideas as artefacts that could be traded as commodities (like potatoes or coal). The business model of “content” as “physical item” started to fail with the invention of the printing press, as the process of copying ceased to be a creative act, so each individual copy was not a “new” work in its own right. Copyright law was developed to commodify the “idea” within a book, not the physical book, and was enforceable only for as long as access to the copying technologies – printing presses – could be limited. The digital age has made control of the copying process impossible: the computer replaced the printing press, so one could sit on every desk and now, thanks to mobile technology, in every pocket. He notes that in pre-literate societies, authorship of a myth or folk tale was not important, and I find it interesting that crowdsourcing (e.g. Wikipedia, citizen journalism) has in some ways returned us to the notion of a culturally held store of knowledge contributed and curated by volunteers, rather than by paid professionals.

Music is not the only art

Many of the discussions of free versus paid-for content seem to run to extremes, and seem to be coloured by the popular music industry’s taste for excess. The music industry inflated commodity prices far beyond what consumers were willing to pay just as cheap copying technologies became widely available, making the pirates feel morally justified. It is hard to feel sympathy for people living a millionaire rockstar lifestyle. The inevitable increase in piracy was met not by lower prices, but by the industry issuing alarmist statements about home taping killing music. It didn’t: music industry profits have risen steadily, and the industry has simply turned its attention to charging more for merchandise and live events. The lesson to be learned is that people are willing to pay for experiences, services, and commodities that they perceive as being worth the price and better than the alternatives. Most music fans would rather pay for an easy, virus-free, reliable download service than deal with illegal download sites, just as back in the 1980s sticking a microphone in front of the radio to record the charts wasn’t as good as buying the vinyl. The effects on the reference publishing industry were very different, affecting many small businesses and people on far less than rockstar wages, but most displaced people found ways to transfer their skills to numerous new areas of work – obvious examples are content strategy and user experience design – that simply didn’t exist in the 1990s.

It has become a bit of a cliché that there aren’t any business models that work, or have been shown to work, in the new digital economy, but are things really so different? In order to have a business, somebody somewhere has to be persuaded to pay for something; everything else is just a complication. If your free content is supported by advertising, it just means that someone needs to be persuaded to pay for the advertised product, instead of the free bit that appears on the way. Similarly, “freemium” is really just old-style free samples and loss leaders: you can’t have the “free” bit without someone paying for the “premium”. The two key questions for producers remain, as they have always been: how do you produce content that is so useful, entertaining, or attractive that people are willing to pay for it, and how do you deliver it in ways that make the buying process as easy as possible?

Hargreaves suggests a light touch towards enforcement of rights and anti-piracy, firmly supporting the view that if content and services are good enough, people will pay, and that education about why artists have a right to be paid for their work is as important as catching the pirates. Attempts to “lock” copies with Digital Rights Management systems certainly don’t seem to have been very successful: they are expensive to implement, unpopular, and pirates always manage to hack them. Watermarking doesn’t attempt to prevent copying, but does help prove origin if a breach is discovered. Piracy is less of a worry for business-to-business trade, as most legitimate businesses want to be sure they have the correct rights and licences for content they use, rather than face embarrassing and expensive lawsuits, and a simplified, secure Digital Copyright Exchange would presumably be in their interests.

Digital Copyright Exchange

Hargreaves proposes the Digital Copyright Exchange as a mechanism to make the buying and selling of rights far easier. At the moment, piracy can be a temptation because of the time and effort required to attempt to purchase rights. Collections agencies and the law form layers of bureaucracy that hamper start-ups from developing new products and simply confuse ordinary users. This represents real lost revenue to the content providers.

Metadata analyst and music fan Sam Kuper made an interesting proposal for setting fair prices: artists would put a “reserve price” on their work, with an initial fee for purchasers. Once the “reserve price” has been reached, the income from any subsequent purchase is shared between the artist and the early purchasers. This would guarantee a level of income for artists, allow keen fans to get hold of new material quickly, and allow those less sure to wait to see if the price drops before purchase. Such a system sounds complex, but could work through some kind of centralised system, so that early purchasers’ shares would be returned as credits to their accounts.
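To make the mechanism concrete, here is a rough Python sketch of how such a centralised system might account for sales. The unit fee, the 50/50 split after the reserve is reached, and the equal division of credits among early purchasers are all my assumptions; the proposal itself leaves those details open.

```python
import math

def simulate_sales(reserve_price, unit_fee, num_buyers, artist_share=0.5):
    """Return (artist revenue, credit per early buyer) after num_buyers purchases."""
    # Number of buyers, at unit_fee each, needed before the reserve is reached.
    early_buyer_count = math.ceil(reserve_price / unit_fee)
    artist_revenue = 0.0
    rebate_pool = 0.0
    for buyer in range(1, num_buyers + 1):
        if buyer <= early_buyer_count:
            artist_revenue += unit_fee                    # artist keeps everything
        else:
            artist_revenue += unit_fee * artist_share     # assumed 50/50 split
            rebate_pool += unit_fee * (1 - artist_share)  # shared with early buyers
    credited_buyers = min(num_buyers, early_buyer_count)
    per_buyer_credit = rebate_pool / credited_buyers if credited_buyers else 0.0
    return artist_revenue, per_buyer_credit

artist_total, credit = simulate_sales(reserve_price=1000.0, unit_fee=5.0, num_buyers=1000)
print(f"Artist earns £{artist_total:.2f}; each early buyer is credited £{credit:.2f}")
# With these invented numbers: £3000.00 for the artist, £10.00 back per early buyer.
```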

From an archive point of view, Hargreaves’s call to allow the digitisation and release of orphan works without endless detective work in trying to trace origins would be a huge boon.

So, there is much to think about in the Hargreaves report and some very sensible practical suggestions, but much detail to be worked out as well. I wonder if in another 16 years, my brother and I will have seen any real change or will we still be going through the same debate?

Serendipity and large video collections

I enjoyed this blog post: On Serendipity. Ironically, it was recommended to me, and I am now recommending it!

Serendipity is rarely of use to the asset manager, who wants to find exactly what they expect to find, but is a delight for the consumer or leisure searcher. People sometimes cite serendipity as a reason to abandon classification, but in my experience classification often enhances serendipity, and serendipity can be lost in simple online search systems.

For example, when browsing an alphabetically ordered collection in print, such as an encyclopedia or dictionary, you just can’t help noticing the entries that sit next to the one you were looking for. This can lead you to all sorts of interesting connections – for example, looking up crescendo, I couldn’t help noticing that crepuscular means relating to twilight, and that there is a connection between crepe paper and the crepes you can eat (from the French for “wrinkly”), but crepinette has a different derivation (from the French for “caul”). What was really interesting was the fact that there was no connection, other than an accident of alphabetical order. I wasn’t interested in things crepuscular, or crepes and crepinettes, and I can’t imagine anyone deliberately modelling connections between all these things as “related concepts”.

Wikipedia’s “random article” function is an attempt to generate serendipity algorithmically. On other sites, the “what people are reading/borrowing/watching now” functions use chronological order to throw up unsought items from a collection in the hope that they will be interesting. Twitter’s “trending topics” use a combination of chronological order and statistics, on the assumption that what is popular just now is intrinsically interesting. These techniques look for “interestingness” in what can be calculated, and it is easy to see how they work, but the semantic web enthusiasts aim to open up to automated processing the kind of free-associative links that human brains are so good at generating.
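As a toy illustration of those two calculated routes to serendipity – not how Wikipedia or Twitter actually implement them – here is a short Python sketch of a purely random pick and a crude “trending” score that weights recent views more heavily. The catalogue, view log, and half-life are invented.

```python
import random
import time

catalogue = ["crescendo", "crepuscular", "crepe paper", "crepinette", "serendipity"]

now = time.time()
view_log = {  # item -> timestamps of recent views (invented)
    "crescendo": [now - 60, now - 7200],
    "crepuscular": [now - 30, now - 45, now - 90],
    "crepe paper": [now - 86_400],
    "serendipity": [now - 10],
}

def random_article(items):
    """Wikipedia-style serendipity: any item at all, regardless of interest."""
    return random.choice(items)

def trending(log, half_life_seconds=3600):
    """Trending-topics-style serendipity: recent views count for more."""
    def score(timestamps):
        return sum(0.5 ** ((now - t) / half_life_seconds) for t in timestamps)
    return max(log, key=lambda item: score(log[item]))

print("Random pick:", random_article(catalogue))
print("Trending now:", trending(view_log))  # "crepuscular" with these numbers
```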

Online Information Conference – day three

Managing content in a mobile world

On Day 3, I decided to try the “Mobile content” track. The speakers were Alan Pelz Sharpe from The Real Story Group, Russell Reeder from Libre Digital and Dave Kellogg from MarkLogic.

Augmented reality is one of my pet topics, so I was pleased to hear the speakers confirm it is all about data and meaning. It is just one aspect of how consumers want more and more data and information presented to them without delay on smaller and simpler devices. However, this means a greater need for metadata and more work for usability specialists.

The whole way people interact with content is very different when they are on the move, so simply trying to re-render websites is not enough. Entire patterns of information needs and user behaviour have to be considered. “A mobile person is not the same as a mobile device… the person needs to be mobile, not necessarily the content.” For example, mobile workers often prefer to contact a deskbound researcher and get answers sent to them, rather than do the research themselves while on the move.

It is not enough just to worry about the technological aspects, or even just the information architecture aspects of versions of content for mobile users. A different editorial style needs to be used for small screens, so content needs to be edited to a very granular level for mobile – no long words!

Users don’t care about formats, so they may get a very bad impression of your service if you allow them to access the wrong content. One customer was cited as complaining that they could watch YouTube videos easily on their phone (YouTube transcodes uploads so they are low-res and streamable), but another video caused huge problems and took ages (it turned out to be a download of an entire HD feature film).
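A minimal sketch of the underlying point – that the service, not the user, should choose a rendition the device can cope with – might look like the following. The renditions, bitrates, and bandwidth figures are invented purely for illustration.

```python
RENDITIONS = [  # invented renditions of the same asset
    {"name": "hd_download", "bitrate_kbps": 8000, "streamable": False},
    {"name": "sd_stream", "bitrate_kbps": 1500, "streamable": True},
    {"name": "mobile_stream", "bitrate_kbps": 400, "streamable": True},
]

def pick_rendition(device_bandwidth_kbps, renditions=RENDITIONS):
    """Choose the best rendition the device's connection can actually stream."""
    candidates = [r for r in renditions
                  if r["streamable"] and r["bitrate_kbps"] <= device_bandwidth_kbps]
    if not candidates:
        # Fall back to the lowest-bitrate stream rather than the raw HD file.
        return min((r for r in renditions if r["streamable"]),
                   key=lambda r: r["bitrate_kbps"])
    return max(candidates, key=lambda r: r["bitrate_kbps"])

print(pick_rendition(600)["name"])   # mobile_stream - a phone on a slow connection
print(pick_rendition(5000)["name"])  # sd_stream - broadband, but never the raw download
```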

The session made me feel quite nostalgic, as back in 1999 we spent much time pondering how we would adapt our content to mobile devices. Of course, then we were trying to present everything on tiny text-only screens – 140 characters was seen as a luxury. There is just no comparison with today’s multifunctional multicoloured shiny touch screen devices.

New World of Search – Closing Keynote

I think every conference should include a speaker who rises above day-to-day business concerns and looks at really big pictures. Stephen Arnold outlined the latest trends in search technologies, both open source and proprietary, and how people are now getting better at picking elements from different systems, perhaps combining open source with proprietary options and building modular, integrated systems to prevent lock-in. However, he also talked engagingly about digital identity, privacy, and the balance of power between individuals and large organisations in the digital world.

He reiterated the point that Google (and other search engines) are not free. “Free is not what it seems to the hopeful customer” but that we haven’t yet worked out the value of data and content – let alone information and knowledge – in the light of the digital revolution. Traditional business models do not work and old economic theories no longer apply: “19th century accounting rules don’t work in the 21st century knowledge economy”.

He noted that Facebook has managed to entice users and developers to give up their content, work, time, and intellectual property to a locked-in proprietary walled garden. People have done this willingly and apparently without prompting, enabling Facebook to achieve something that software and content vendors such as IBM and AOL have been trying unsuccessfully to do for decades.

There is no clear way of evaluating the service that Facebook provides against the value of the content that is supplied to it free by users. However, it is clear that it is of huge value in terms of marketing. It is possibly the biggest and most sophisticated marketing database ever built.

As well as content, people are willing to surrender personal information, apparently with minimal concerns for privacy and security. It is not clear what the implications of this are: “What is the true cost of getting people to give up their digital identities?” It is clear that the combined data held by Facebook, Google, Foursquare, mobile phone companies, credit card companies, travel card companies, etc. creates a hugely detailed “digital profile” of human lives. Stephen urged the audience to take seriously the potential for this to be used to cause harm to individuals or society – from cyberstalking to covert government surveillance – because technology is moving so fast that any negative social and political effects may already be irreversible.

Augmented reality

I went to a British Computer Society talk on Augmented Reality a few weeks ago. The BCS audience is typically highly technical, but the talks themselves are always accessible and entertaining. People often wonder why I am interested in augmented reality, because they assume it has nothing to do with information, but to me it is all about information. I would love to be able to serve up archive content to someone’s mobile phone using location data – a clip of a scene from an episode of their favourite programme that was filmed in that location, or an old news report about an event that took place there. Managing vast data sets containing huge amounts of content in a searchable form will form the backbone of many augmented reality tools and applications. If this isn’t an area that information scientists should be exploring, I don’t know what is!
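As a sketch of what such location-based serving might involve – my illustration, not a description of any existing system – the following Python snippet finds archive clips tagged with coordinates near a phone’s reported position. The clip metadata and search radius are invented, and a real service would use a spatial index rather than a linear scan.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

archive_clips = [  # invented metadata: title plus filming-location coordinates
    {"title": "Drama scene, Trafalgar Square", "lat": 51.5080, "lon": -0.1281},
    {"title": "News report, Greenwich", "lat": 51.4826, "lon": -0.0077},
]

def clips_near(lat, lon, radius_km=1.0):
    """Linear scan for clips filmed within radius_km of the given position."""
    return [clip for clip in archive_clips
            if haversine_km(lat, lon, clip["lat"], clip["lon"]) <= radius_km]

print(clips_near(51.5074, -0.1278))  # a phone standing near Trafalgar Square
```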

The speakers were Professor Anthony Steed, Head of Virtual Environments and Computer Graphics at UCL, and Lester Madden, founder and director of Augmented Planet.

They explained the difference between visual search, true augmented reality, and virtual reality. Visual search is using an image as a search term (as in Google Goggles) and then returning results. Because this can be done via a camera, the image can be one that is in the searcher’s immediate environment, and the results can be returned as an overlay on the original image. True augmented reality is not just adding graphics to an unrelated camera feed, but is responsive to the real surroundings. Virtual reality is an entirely computer-generated environment.

3-D models of the world are being built, but keeping them up to date is proving a challenge, and crowdsourcing may be the only pragmatic option. Another technical challenge suggested was how to render the augmentation visually indistinguishable from “real” vision, which raises all sorts of interesting philosophical and ethical questions about how we handle the behaviour of people who become confused or cease to be able to tell the difference, either temporarily or permanently. At the moment, augmented reality is quite distinct from virtual reality, but eventually the two will presumably meet. However, nobody seems to think that is likely anytime soon.

In the meantime, there was a rather lovely video of an augmented reality audience, designed to help people who have difficulty speaking in public. Apparently, this is a particular problem for those people in the software industry who are not natural extroverts but find that their careers can only advance if they get out from behind the screen and start talking at conferences, trade shows, etc., where audiences can be quite hostile. University students are hopeless at pretending to be a hostile audience – they are too polite, apparently (this week’s events notwithstanding!) – and actors are too expensive. Avatars, however, can be programmed to look bored, chat on their mobiles, get up and walk out, etc., and real people tend to have similar emotional reactions to the behaviour of avatars as they do to other humans, making an augmented reality theatre a perfect place for practising speaking and building confidence.

Augmented reality is also finding practical applications in the construction industry, to create visualisations of buildings before they are constructed, in medicine to help surgeons, and for improved videoconferencing. There are also many ways that augmented reality can be used to sell things – show me information about the restaurant or shoe shop in this street. Amusingly, identifying unique buildings is quite easy, but for the branded chains disambiguation is proving a challenge – their outlets look the same in every town – which brings us back again to familiar information science territory.

There is also a BCS blog post about the event.