Big_Data | VocabControl

Last week I attended a seminar organised by The British Screen Advisory Council and Intellect , the technology trade association, and hosted by the law firm SNR Denton. The panellists included Derek Wyatt, internet visionary and former politician, Dr Rob Reid, Science Policy Adviser, Which?, Nick Graham, of SNR Denton, Steve Taylor, creative mentor, Donna Whitehead, Government Affairs Manager, Microsoft, Theo Bertram, UK Policy Manager, Google, David Boyle, Head of Insight, Zeebox, and Louisa Wong, Aegis Media.

Data as oil

The event was chaired by Adam Singer, BSAC chairman, who explored the metaphor of “data as oil”. Like oil, raw data is a valuable commodity, but usually needs processing and refining before it can be used, especially by individual consumers. Like oil, data can leak and spill, and if mishandled can be toxic.

It struck me through the course of the evening, that just like oil, we are in danger of allowing control of data to fall into the hands of a very small number of companies, who could easily form cartels and lock out competition. It became increasingly obvious during the seminar that Google has immense power because of the size of the “data fields” it controls, with Facebook and others trying to stake their claims. All the power Big Data offers – through data mining, analytics, etc. – is dependent on scale. If you don’t have access to data on a huge scale, you cannot get statistically significant results, so you cannot fine tune your algorithms in the way that Google can. The implication is that individual companies will never be able to compete in the Big Data arena, because no matter how much data they gather on their customers, they will only ever have data on a comparatively small number of people.

How much is my data worth?

At a individual level, people seemed to think that “their” data had a value, but could not really see how they could get any benefit from it, other than by trading it for “free” services in an essentially hugely asymmetrical arrangement. The value of “my” data on its own – i.e. what I could sell it for as an individual – is little, but when aggregated, as on Facebook, the whole becomes worth far more than the sum of its parts.

At the same time, the issue of who actually owns data becomes commercially significant. Do I have any rights to data about my shopping habits, for example? There are many facts about ourselves that are simply public, whether we like it or not. If I walk down a public street, anybody can see how tall I am, guess my age, weight, probably work out my gender, social status, where I buy my clothes, even such “personal” details as whether I am confident or nervous. If they then observe that I go into a certain supermarket and purchase several bags of shopping, do I have any right to demand that they “forget” or do not use such observations?

New data, new laws?

It was repeatedly stated that the law as it stands is not keeping up with the implications of technological change. It was suggested that we need to re-think laws about privacy, intellectual property, and personal data.

It occurred to me that we may need laws that deal with malicious use of data, rather than ownership of data. I don’t mind people merely seeing me when I walk down the street, but I don’t want them shouting out observations about me, following me home, or trying to sell me things, as in the “Minority Report” scenario of street signs acting like market hawkers, calling out your name as you walk by.

What sort of a place is the Internet?

Technological change has always provoked psychological and political unease, and some speakers mentioned that younger people are simply adapting to the idea that the online space is a completely open public space. The idea that “on the Internet, no-one knows you are a dog” will be seen as a temporary quirk – a rather quaint notion amongst a few early idealists. Nowadays, not only does everyone know you are a dog, they know which other dogs you hang out with, what your favourite dog food is, and when you last went to the vet.

The focus of the evening seemed to be on how to make marketing more effective, with a few mentions of using Big Data to drive business process efficiencies. A few examples of how Big Data analytics can be used to promote social goods, such as monitoring outbreaks of disease, were also offered.

There were clear differences in attitudes. Some people wanted to keep their data private, and accept in return less personalised marketing. They also seemed to be more willing to pay for ad-free services. Others were far more concerned that data about them should be accurate and they wanted easy ways of correcting their own records. This was not just to ensure factual accuracies, but also because they wanted targeted, personalised advertising and so actively wanted to engage with companies to tell them their preferences and interests. They were quite happy with “Minority Report” style personalisation, provided that it was really good at offering them products they genuinely wanted. They were remarkably intolerant of “mistakes”. The complaint “I bought a book as a present for a friend on Amazon about something I have no interest in, now all it recommends to me are more books on that subject” was common. Off-target recommendations seemed to upset people far more than the thought of companies amassing vast data sets in the first place.

Lifting the lid of the Big Data black box

The issue that I like to raise in these discussions is one that Knowledge Organisation theorists have been concerned about for some time – that we build hidden biases so deeply into our data collection methods, our algorithms, and processes, that our analyses of Big Data only ever give us answers we already knew.

We already know you are more likely to sell luxury cars to people who live in affluent areas, and we already know where those areas are. If all our Big Data analysis does is refine the granularity of this information, it probably won’t gain us that many more sales or improve our lives. If we want Big Data to do more for us, we need to ask better questions – questions that will challenge rather than confirm our existing prejudices and assumptions and promote innovation and creativity, not easy questions that merely consolidate the status quo.

I recently attended a gem of an event at the British Computer Society. There were three top class speakers and great networking with interesting people afterwards. I am always a little concerned that I’m not “hard” IT enough for these events, but metadata and taxonomies are such a cross-domain topic I always end up in fascinating conversations.

Why Information Professionals should care about Big Data

There are huge skills shortages in Big Data, mainly on the understanding and interpretation side, rather than the IT angle. The feeling seemed to be that knowing how to build the machines to do the work is the easy bit, but understanding the meanings of data, avoiding the semantic traps and pitfalls, the interpretation of statistics, and above all the appreciation of risk in order to make sound judgement calls are far harder. I believe this is where information scientists and professionals, taxonomists and ontologists, librarians, archivists, and historians have a vital role to play. Our skills of understanding and interpretation are vital, so we need to step up and embrace Big Data collections as a new form of libraries and archives.

Big Data is scary because there is so much of it, it changes so fast, and it has the power to create feedback loops. An especial danger, and one that is familiar to classificationists (as Bowker and Star discuss in their book Sorting Things Out), is that interpretations, biases, and assumptions can be “hard coded” into raw data right from the point of collection. We know that the way you ask a survey question skews the answers you are likely to get, and which data sets you decide to collect and compare will lead you to certain interpretations above others. Make the wrong choices and the algorithms will send you down false alleyways while appearing to be coldly neutral and scientific (a theme that Jonah Bossewitch inspired me to think about). Miss a key factor, and you will make an apparently unarguable leap to a wrong conclusion.

For example, if you see some statistics that indicate a recent increase in clickthroughs to advertisements, you might conclude that your new mobile marketing strategy is a success. However, further investigation might reveal that most of those clickthroughs don’t result in a sale. If you look again you might see that most of the clickthroughs are followed by people immediately reverting back to the site they were on before, making it far more likely that they are just accidentally clicking on adverts. If you refine the results to identify those who were using a specific new device, it seems likely most of the hits were because people were getting used to an unfamiliar touchscreen. Your apparently successful marketing strategy might in fact just be annoying your users.

How Big Data is changing the world

The first speaker, John Morton, Chief Technology Officer, SAS UK, described how increases in processing power have meant that data capture, collection, and analysis are now taking place on a mind-boggling scale at incredible speeds. One example of how this has changed over the last decade is a processing job that used to take 167 hours can now be carried out in 84 seconds. At the same time, huge amounts of unstructured and unmanaged data that used to be uncollectable ephemera are now being stored and can be analysed. He spoke of an urgent need for “data curators” and suggested they would be the librarians of the future, as without management and curation the value of the data is lost. Organisations are typically only manging 5% of their data. Issues familiar to librarians and archivists, such as the importance of metadata, quality tracking, and provenance, are vital in a Big Data world, where the quality of your analysis depends on the quality of the data you mine. Much of this quality lies within the structure of the data and its metadata.

Big Data and Artificial Intelligence

Peter Waggett, Emerging Technology Programme Leader, IBM, talked about how Big Data analysis was used to power the “intelligence” of Watson, the computer that won the US quiz game Jeopardy, beating human contestants. The mistakes Watson made were in many ways more interesting than the questions it answered correctly, as was the speed of answering. Watson always needed some processing time, whereas humans were much faster when they just “knew” the answer.

Although Watson just seems like fun, there are many examples of how Big Data techniques can be used in practice. The retail industry is one of the leaders in Big Data analysis, using data on shopping behaviour gathered from sources like loyalty cards and online shopping orders. Some companies are now using RFID chips (e.g. clothing company Zara) to gather data about the physical movement of products.

(I wondered if retailers were leaping ahead because they can use Big Data to reap rewards with comparatively little risk. In retail, there are huge profits to be made by better tailoring stock to consumer demands, but mistakes are generally not disastrous – a product line that doesn’t sell well and has to be dropped is not a new problem and is one that business models and buyers’ expertise already make allowances for.)

Another example of Big Data analysis is the football team AC Milan, where analysing data about players’ physiology and movements has helped predict injury rates and manage players in order to minimise risks.

The Internet of Things is going to generate even more Big Data and understanding its applications in new arenas – sporting or otherwise – is going to be a huge challenge for managers of the future.

Big Data and Bias

Brijesh Malkan, Founding Partner, EGO Ventures, highlighted some issues to be resolved as we move into a Big Data age. The nature of privacy is changing as oceans of personal data sweep around the world (Facebook already knows you), and so organisations are going to need transparent ethical policies about handling such data. We have reached our “cognitive limits” with so much data to read and so information visualisation is going to be of increasing importance. Data quality also needs to be managed if data mining techniques are to be effective and if algorithmic processing is to produce sensible, useful results. Brijesh talked about “Bayesian filters” and “cognitive scrubbers” to help compensate for biases in data, whether these biases are embedded in the data capture process, in the choices of data used, in the algorithms processing the data, or ultimately in the decisions made by the humans who are interpreting the data.

He spoke of the need for more understanding of psychology, especially of groupthink, echo chambers, and risk perception. Financial markets in particular are prone to “stampede” behaviour, creating bubbles and panics in markets. Data mining of social networking can be prone to creating feedback loops and encouraging risky behaviour. He also spoke of a desperate shortage of people who understand statistics and probability, even within the scientific community.

(This reminded me of the question Patrick Lambe raised at the ISKO UK conference in the summer, asking how information professionals can do something useful for science and for society. Understanding how to interpret and capture data and account for biases, explaining how easy it is to manipulate people’s perceptions through the way information is presented, and teaching how knowledge requires judgement as well as number crunching would seem to be skills that we can offer to the Big Data world already.)

Tag Archives: Big_Data

Data: The New Black Gold?

Data as oil

How much is my data worth?

New data, new laws?

What sort of a place is the Internet?

Lifting the lid of the Big Data black box

Big Data

Why Information Professionals should care about Big Data

How Big Data is changing the world

Big Data and Artificial Intelligence

Big Data and Bias

Pages

Recently

Blogroll