Category Archives: culture

Assumptions, mass data, and ghosts in the machine


Back in the summer, I was very lucky to meet Jonah Bossewitch (thanks Sam!), an inspiring social scientist, technical architect, software developer, metadatician, and futurologist. His article The Bionic Social Scientist is a call to arms for the social sciences to recognise that technological advances have led to a proliferation of data. This proliferation is assumed to be unequivocally good, but it is also fuelling a shadow science of analysis that uses data while failing to challenge the underlying assumptions that went into collecting it. As I learned from Bowker and Star, assumptions – even at the most basic stage of data collection – can skew the results obtained, so any analysis of such data may well be built on shaky (or at the very least prejudiced) foundations. When this is compounded by software that analyses data, the presuppositions of the programmers, the developers of the algorithms, and so on stack assumption on top of assumption. Jonah points out that if nobody studies this phenomenon, we are in danger of losing any possibility of transparency in our theories and analyses.
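The point about collection-stage assumptions can be made concrete with a toy sketch (entirely my own illustration, not drawn from Bossewitch or Bowker and Star): the very same survey answers support opposite conclusions depending on how a tool silently codes missing responses.

```python
# Toy illustration: identical survey responses analysed under two
# different collection-stage assumptions about missing answers.
raw_responses = [4, 5, None, 3, None, 5]  # 1-5 scale; None = no answer

# Assumption A: code missing answers as 0 (a silent default in many tools)
coded_as_zero = [r if r is not None else 0 for r in raw_responses]
mean_a = sum(coded_as_zero) / len(coded_as_zero)

# Assumption B: drop missing answers entirely
answered = [r for r in raw_responses if r is not None]
mean_b = sum(answered) / len(answered)

print(mean_a)  # about 2.83 - "respondents are lukewarm"
print(mean_b)  # 4.25       - "respondents are enthusiastic"
```

Neither number is "wrong"; each simply inherits an assumption made before any analysis began, which is exactly what a downstream reader of the report never sees.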

As software becomes more complex and data sets become larger, it is harder for human beings to perform “sanity checks” or apply “common sense” to the reports produced. Results that emerge from de facto “black boxes” of calculation, based on collections of information so huge that no lone, unaided human can hope to grasp them, are very hard to dispute. The only possibility of equal debate is amongst other scientists, and probably only those working in the same field. Helen Longino’s work on science as social practice emphasised the need for equality of intellectual authority, but how do we measure that if the only possible intellectual peer is another computer? The danger is that the humans in the scientific community become even more like high priests guarding machines that utter inscrutable pronouncements than they are already. What can we do about this? More education, of course: the academic community needs to devise ways of exposing underlying assumptions, and the lay community needs to become more aware of how software and algorithms can “code in” biases.

This appears to be a rather obscure academic debate about subjectivity in software development, but it strikes to the heart of the nature of science itself. If science cannot be self-correcting and self-criticising, can it still claim to be science?

A more accessible example is offered by a recent article claiming that Facebook filters and selects updates. It illustrates how easy it is to let people assume a system is doing one thing with massed data when in fact it is doing something quite different. Most people think that Facebook’s “Most Recent” feed provides a snapshot of the latest postings by all their friends, and that if you haven’t seen updates from someone for a while, it is because they haven’t posted anything. The article claims that Facebook prioritises certain types of update over others (links take precedence over plain text) and updates from certain people. Doing this risks creating an echo chamber effect, steering you towards the people who behave how Facebook wants them to (essentially, posting a lot of monetisable links) in a way that most people would never notice.
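To show how little code it takes to “code in” a preference like this, here is a deliberately crude ranking sketch. The weights, field names, and update types are all invented for illustration; Facebook’s real algorithm is not public. The point is only that a few lines of scoring logic can quietly override recency.

```python
# Hypothetical feed ranker - invented weights, NOT Facebook's real algorithm.
# It shows how a type-based score quietly reorders a "most recent" feed.
updates = [
    {"author": "alice", "type": "link",  "minutes_ago": 50},
    {"author": "bob",   "type": "text",  "minutes_ago": 5},
    {"author": "carol", "type": "photo", "minutes_ago": 20},
]

TYPE_WEIGHT = {"link": 3.0, "photo": 2.0, "text": 1.0}  # links favoured

def score(update):
    recency = 1.0 / (1 + update["minutes_ago"])  # newer is slightly better
    return TYPE_WEIGHT[update["type"]] + recency

ranked = sorted(updates, key=score, reverse=True)
print([u["author"] for u in ranked])  # ['alice', 'carol', 'bob']
```

Bob posted most recently, yet he ranks last; a reader who assumes the feed is chronological would simply conclude he had gone quiet.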

Another familiar example is automated news aggregation – an apparently neutral process that actually involves sets of selection and prioritisation decisions. Automated aggregations used to be based on very simple algorithms, so it was easy to see why certain articles were chosen and others excluded, but such processing has advanced so rapidly that it is now almost impossible (and almost certainly impractical) for a reader to unpick the complex chain of choices.

In other words, there certainly is a ghost in the machine, it might not be doing what we expect, and so we really ought to be paying attention to it.

‘We Like Lists Because We Don’t Want to Die’


I heard Umberto Eco lecture on the search for a perfect language about 20 years ago and still find myself referencing him (trying to create a taxonomy that suits everyone would seem to be a similar quest). The lectures had nothing to do with my course really, so I benefited from the serendipitous knowledge discovery that just happens when you have time and space to explore ideas. So I was pleased when, a few weeks ago, this interview with Eco in Der Spiegel happened upon me in the twittersphere (what’s the protocol for referencing tweets?). In the interview, Eco asserts that ‘We Like Lists Because We Don’t Want to Die’.

It’s arguable that we do most things because we don’t want to die, but I was struck by the depiction of how fundamental the urge to collect and classify is to culture. At the LIKE dinner in early December, Cerys Hearsy said “we like hierarchies. We understand how they work” and she was talking about modern records management. Jan Wyllie in Taxonomies: Frameworks for Corporate Knowledge points out that taxonomies have been used for millennia (something I also reference frequently). Perhaps we like dualities because our brain has two hemispheres and we dream of a taxonomy of everything because then we would have conquered infinity and death itself, but such ideas are way beyond what I can speculate sensibly about. What I can say is that lists and taxonomies have been useful for so long that anyone who bets they are going to vanish anytime soon is facing very long odds. We will create them differently as technology advances, and we will manage without them in many situations where they would be helpful (if New Scientist had a taxonomy, I might have found the article about duality and the brain), but when we really need to be sure, we will create them.

World Audio Visual Archives Heritage Day


I went to an interesting event last Monday night for UNESCO World Audio Visual Archives Heritage Day, held at BAFTA in London.

Professor John Ellis (Department of Media Arts, Royal Holloway, University of London) talked about the growing use of TV archives, particularly news footage, in academia. Over time, such material becomes increasingly valuable in areas as diverse as physiology (for example, studying the effects of ageing by analysing footage of presenters and actors who have had long careers) and town planning (footage can reveal the buildings that previously occupied a site being considered for redevelopment).

As UK law permits academic institutions to record and keep TV and radio broadcasts for purely educational purposes, a database of material has been collected. Academia currently remains a verbal rather than a visual culture, but this seems to be changing. All politicians, for example, are now so TV literate that to study them without reference to their TV appearances would be strange.

Fiona Maxwell (Director of Operations at ITV Global Entertainment), then talked about the painstaking restoration of the 1948 film The Red Shoes. She provided lots of technical details about removing mould and correcting registration errors, but also showed “before and after” clips so we could see the huge improvements.

The Internet of things


Internet of Things — An action plan for Europe is an EU document describing the EU’s response to “The Internet of Things” (IoT), as technologies such as RFID, Near Field Communication (NFC), and wireless sensors/actuators now allow objects to be tagged and linked to information.

The EC is financing “research projects in the area of IoT, putting an emphasis on important technological aspects such as microelectronics, non-silicon based components, energy harvesting technologies, ubiquitous positioning, networks of wirelessly communicating smart systems, semantics, privacy- and security-by-design, software emulating human reasoning and on novel applications.”

As well as obvious information management issues, there are interesting implications for privacy and security. For example, will the IoT reduce property crime or just create a black market for false tags or fake URIs and geolocators? Will criminals set up their own systems to track shipments of contraband? Will we get “object identity theft” with contraband labelled as legitimate goods? This seems to me to be a categorisation issue.

It might be fun to be able to tag my stuff with my own folksonomic labels to help me sort my house out or pack to go on holiday, and then make sure I don’t leave things in hotel rooms, but I suspect it might waste more time than it saves!

Another issue is how long before we extend this kind of tracking to ourselves? A friend said to me the other day that we should all have our own URI, which would save having to update our records when people change their phone numbers, email, addresses, etc. Add that to the geolocation tracking that is already happening, and no-one will get to be anywhere without it being recorded. Is that really useful, or scarily Big Brotherish?

There is a lovely metaphor of “Favela chic” (subversive, non-commercial) versus “Gothic High Tech” (repressive regime) in Twitter and The Web of Flow: Talking with Stowe Boyd & Bruce Sterling about Microsyntax, Squelettes, Favela Chic and the State of Now, which I found via Open Intelligence (on Twitter!).

Understanding Computers and Cognition


Understanding Computers and Cognition: A New Foundation for Design is another classic I should almost certainly have read ages ago. It gives very straightforward explanations of why language and cognition are complex social processes and how this presents huge challenges for designers and for the whole field of AI.

I also enjoyed the wonderful predictions that by 1988 we would have “thinking computers” and advertisements from 1982 offering “programs that understand you so that you don’t have to understand them”. Technology progresses, hype remains a constant!

It is also interesting that “not having to understand” was being promoted, rather than “being easy to understand”, even back then. I’ve always thought of usability as being about helpfulness and increased clarity, rather than about encouraging people not to think at all.

Wave goodbye to Twitter and Facebook?


Is Google Wave a Twitter Killer? heralds the new kid soon to be on the block. Facebook seemed to fall into the “so over” category a while back, and now that Twitter has hit the mainstream, clearly it is next to go.

Google Wave: Five Things You Must Know says a bit more about what Wave will do.

The combination of easy transfer of time-invested work – such as carefully written documents – with instant communication should appeal to businesses but will present some records management challenges, I’m sure. I can’t decide whether we need social media coalescence – just give me everything in one place – or clearer fragmentation – Wave for work, Facebook for family, etc. to help with information overload.

UPDATE: Amplified clip on Wave.

Tools to analyse weak signals


I liked the way this Pasta&Vinegar post highlighted the different information sources used to generate different measures of technology adoption. It also reminded me of Dave Snowden’s emphasis on the importance of detecting weak signals. At the “prophecy/fantasy” stage the important signals will inevitably be weak, and surrounded by a lot of noise. Spotting trends once they have happened is one thing, but the prediction game is quite different.

Communities of Practice


I found Communities of Practice (CoP) by Etienne Wenger to be one of those strange books that lots of people told me I must read – and it is relevant to taxonomy work (although this post digresses) – but when I did read it, it all seemed so totally obvious I could hardly believe it had taken until the 1980s to be formulated. Barbara Rogoff and Jean Lave also pioneered the thinking, but I feel sure the ideas must date back at least to medieval trade guilds. It is one of the odd features of academia that sometimes the obvious has simply not been noticed and it is the recognition of the obvious that is revolutionary.

The core ideas are that we don’t just learn about doing something or even how to do something, we learn to be a person that does those things, and this shapes our identities. So, I can get my editorial assistants to read Judith Butcher on copy editing to teach them about editing, I can give them practical exercises so they learn how to copy edit, but it is only after they have been given real copy editing work, amongst other copy editors, that they experience how copy editors behave, and so learn how to be copy editors. Learning is therefore a continuous lifelong process.

In the UK there has traditionally been a divide between learning about (academic) and learning how (vocational), with learning to be happening outside the educational system, in workplaces (e.g. via apprenticeships). Wenger emphasises the need to encourage learning to be, and of course it is vital, but politically it worries me that too much responsibility for this is currently falling on academia and not enough on employers (I’m probably misrepresenting Wenger here). As an employer I think I ought to invest in training new staff (and in ongoing staff development), mainly because I can train staff to be exactly the way they need to be in the specific employment context. There is no practical way that a national education system could be so specific, unless it only caters to a handful of big corporations, which don’t need the help or the additional social power. On the other hand, I really don’t want to have to teach new staff lots of learning about – grammar and spelling, for example – that can be taught perfectly well in the classroom.

I think a civilised society should be willing to pay collectively for some essentially uncommercialised public spaces (e.g. universities) where people can just think in order to get better at thinking. A vocational element is great (I have personally enjoyed and benefited from the vocational aspects of my course) but part of my motivation for returning to university was to have time to explore questions and experiment with ideas without limiting myself to only those that I could show in advance would bring in some cash.

How does all this relate to taxonomy work? A taxonomy may be needed within a single community of practice, in which case recognising the user group as a CoP may help make sense of the project and the terminology required. Conversely, a taxonomy may need to be a boundary object between CoPs, perhaps even linking numerous CoPs together. By recognising and identifying different CoPs in an organisation, a taxonomist can get a picture of the different dialects and practices that exist and need to be taken into account.
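As a minimal sketch of the boundary-object idea (all the terms, codes, and communities here are invented for illustration), a shared taxonomy can sit between the local dialects of two CoPs like this:

```python
# Sketch: a shared taxonomy acting as a boundary object between two
# communities of practice, each with its own local dialect.
# All terms and mappings are invented for illustration.
shared_taxonomy = {"CUST-001": "Customer complaint record"}

local_dialects = {
    "legal":       {"grievance file": "CUST-001"},
    "call_centre": {"ticket": "CUST-001"},
}

def shared_concept(cop, local_term):
    """Translate a CoP's local term into the shared taxonomy concept."""
    concept_id = local_dialects[cop][local_term]
    return shared_taxonomy[concept_id]

# Two dialects, one underlying concept:
print(shared_concept("legal", "grievance file"))  # Customer complaint record
print(shared_concept("call_centre", "ticket"))    # Customer complaint record
```

The taxonomist’s work is largely in building those dialect mappings: each community keeps its own working language, while the shared layer lets material flow between them.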

A new taxonomist also needs to learn to be a taxonomist, and the taxonomy communities of practice (both specific and theoretical) already out there play a vital role in this process.