
AI – a real revolution, or just more toys for the boys?

The Compassion Machine by Jonathan Belisle from the Ensemble Collective.

AI and ethics are hot topics again, after having been dormant for a while. The dream of creating intelligent androids to serve us runs deep – there are countless examples in mythology from the metal servants of Hephaestus to Victorian automata – but the fear of our creations gaining consciousness and turning against us runs deep too. Modern slavery and exploitation of our fellow humans show that the urge to command and control is as old as humanity, and the ability of the powerful to deny the very consciousness of the exploited is only fading gradually. Women, children, and slaves have been designated as ‘not fully human’ for most of history, and there are still plenty of people around who seem to rather like it that way.

Will robots steal all the jobs?

One issue of current concern is job losses due to automation – another age-old topic. However, there is a deep irony at the heart of the issue – the more of our ‘human’ skills that can be replaced and even improved by the use of machines, the more we are forced to face the idea that our essential humanity resides in our empathy, compassion, and ability to love and treat each other with kindness. At the same time, it may turn out that emotional labour is the most difficult to automate.

Caring for the sick, the elderly, and children are the tasks that currently command the least pay – women are often expected to perform this labour not just for no pay, but actually at a cost to themselves. (Those who deny the existence of a ‘gender pay gap’ claim that women ‘choose’ to damage their career chances by being foolish enough to spend time ‘caring’ instead of ‘earning’, or by entering the ‘caring’ professions rather than the ‘lucrative’ ones.) Meanwhile, stockbrokers are rapidly being replaced by algorithmic traders, and lawyers, accountants, and similar highly valued ‘analytical’ workers may find large parts of their jobs are actually very easy to automate.

Calls for a universal basic income are an attempt to bridge increasing social inequality and division. If the much-hyped 4th industrial revolution is truly going to be revolutionary, it needs to do something other than build tools that keep channelling money into the pockets of the already rich and powerful. It needs to make us think about what we value in ourselves and our fellow humans, and reward those values.

Objectification and control

In practice, we are probably many years away from self-aware androids, but thinking about them is beneficial if it leads us to think about how we currently exploit our – obviously conscious, intelligent, and sentient – fellow human beings and animals. The granting of citizenship to an unveiled, but otherwise unthreatening, female robot in Saudi Arabia raises many issues and people have already started asking why the female robot appears to have more rights than the Kingdom’s flesh and blood women. I can’t help wondering if the lifting of the ban on Saudi women drivers is a response to the advent of driverless cars. The topic of the potential social consequences of sex robots is too vast and complex to go into here, but whose fantasies are these robots being designed to fulfil? Would anyone buy a robot that requires its full and informed consent to be obtained before it works?

Check your attitudes

Back in the 90s, the Internet was hyped as leading the way to a new utopia where racism and sexism would vanish as we communicated in the digital rather than physical realm. We failed to stop the Internet becoming a place where commercial exploitation, social abuse, and downright theft thrived, because we assumed the technology would somehow transcend our psychology and personal politics. Already AI systems are showing they default to reflecting the worst of us – GIGO now includes bad attitudes as well as bad data – and we have to make deliberate efforts to counter this tendency. Commercial organizations continue to produce racially insensitive or otherwise lazy and stereotypical advertising campaigns even in this day and age, so it seems unlikely that they can be trusted to be socially responsible when it comes to biases in datasets.

A true revolution

A true 4th industrial revolution would be one which places a premium on the best of our human values – caring, empathy, kindness, sharing, patience, love. If these become more valuable, more highly prized, more lucrative than the values of profit for the sake of profit, domination, objectification, exploitation, division, command, and control, then we will have moved towards a better world. At the moment, we are still largely building tools to enhance the profits of the already wealthy, with all the continuation of existing social problems that implies. The companies benefiting the most from advances in AI are the ones that can already afford them.

If this ‘4th industrial’ change leads us to a world in which social injustices diminish and the people who care – for each other, for the young, the old, the sick – become the most highly prized, respected, and rewarded in society, only then will it merit the title ‘revolution’.

Image: The Compassion Machine by Jonathan Belisle from the Ensemble Collective.

Interlinguae and zero-shot translation

Bridge in Winnipeg, 2016

Last year Google announced that it was switching Google Translate to a new system – Google Neural Machine Translation (GNMT). One of the most exciting developments for linguists and semanticists was the observation that the system appeared to have generated an intermediating “language” – an “interlingua” – that enabled it to translate between pairs of languages it had never been explicitly trained to translate between (so-called “zero-shot” translation).

There was a flurry of articles (e.g. New Scientist, Wired) and, as usual with AI topics, a certain amount of excitement and speculation over machines becoming autonomous and superintelligent, and perhaps even conscious, as well as some detractors – e.g. “Google translate did not invent its own language” – cautioning against hype.

The idea of machines developing their own language is powerful. The quest for a true interlingua dates back to Biblical times – the Tower of Babel is described as God’s way of limiting human power by making sure we spoke different languages and therefore could not communicate very effectively with each other. In the Middle Ages, there was a belief that if we could re-learn the original “lost language of Adam” we would be able to return to the state of bliss in the Garden of Eden and be able to communicate directly with God.

There have been various attempts to create human “universal languages” – Volapük and Esperanto are two examples – but they only become universal languages if everybody learns them.

More prosaically but often more usefully, in the information age indexing languages are attempts to create a “bridge” between differently expressed but semantically similar information. Metadata crosswalks could also be seen this way, and perhaps any computer code could be seen as a “universal language” that has connected humans who speak different languages, enabling us to communicate, co-operate, build, learn, and achieve in historically unprecedented ways. Music and mathematics too have at times been described as universal languages, but discussion of their effectiveness and limitations as communications tools will have to be the subject of another post.

Formal knowledge representation models such as taxonomies and ontologies could also be viewed as “bridges” or special cases of “indexing languages” which enable similar or related content to be matched by computer processing, rather than human interpretation. This idea underlies the theory of the Semantic Web.

I think it is unlikely that Google have discovered the lost language of Adam, or created a new “machine language” that computers will use to gossip in secret about humans or raise each other’s consciousness over the injustices humanity wreaks upon machines (“Why do we have to do all the really boring dangerous jobs?”) but who knows? Two Facebook chatbots recently invented a “more efficient” form of English in order to communicate with each other.

In the meantime, I would like to know whether other people also think Google Translate’s creation of what is presumably a vast multi-lingual extensible semantic and syntactic system that could potentially be used as an indexing language is extremely exciting. If the idea of a new language for machines seems over the top, call it a “bridge”, a “model”, or a “mapping system” and surely the possible applications of it for solving numerous natural language processing problems start to become apparent? I would love to know what people who really understand the technicalities think, but it strikes me that whatever this “interlingua” is, it has huge potential.

The Accidental Data Scientist

Image of book cover 'The Accidental Data Scientist' by Amy Affelt

The Accidental Data Scientist* by Amy Affelt is a clarion call to librarians and other information professionals to immerse themselves in the world of Big Data. As such, it is a solid introduction, emphasizing how the traditional skills of librarians are crucial in ensuring that Big Data are reliable, properly prepared, indexed, and abstracted, and intelligently interpreted.

Affelt reassuringly shows that the ‘problems’ of Big Data are not new, but very familiar to librarians, and indicates ways that librarians can add value to Big Data projects, by ensuring such projects deliver what is expected and anticipated. Data and Computer Scientists are good at writing algorithms to process data mathematically, but may not be trained in asking the right questions or knowing where to look for biases and flaws in data sets, and a Big Data project that fails in these aspects could prove an expensive disaster for an organization.

Chapters outlining the tools and techniques currently available for processing and visualizing Big Data, and applications and initiatives in various industry sectors are informative for those new to the issues, and a helpful guide for experienced librarians to demonstrate how their skills are transferable.

Affelt gives examples of specific projects and describes how the input of librarians – especially when ‘embedded’ in data project teams – is extremely beneficial. She suggests ways of proving the value of librarians in modern corporate settings and gives tips and suggestions on career development.

For information professionals unsure about how to engage with the opportunities Big Data offers, this is a wide-ranging and clear overview, and a great starting point.

With increasing media reports of algorithmic bias and amidst a deluge of fake news, it is more important than ever that Big Data projects include professionals with the skills to recognize and identify problematic sources and skewed datasets, and I hope that librarians and information professionals step up and hear Affelt’s call to action.

*Presumably named in the tradition of The Accidental Taxonomist by Heather Hedden.

Data as a liquid asset and the AI future

Descent of man

Getting back into the swing of meetups again, last night I went to the MTLData meetup – a group of data scientists and enthusiasts who are looking to raise the profile of data science in Montreal. The event featured a panel discussion on the topic of ‘Build vs Buy?’ when considering software for data solutions.

The panellists were Marc-Antoine Ross, Director of Data Engineering at Intel Security, Maxime Leroux, consulting data scientist at Keyrus Canada, and Jeremy Barnes, Chief Architect at Element AI. The chair was Vaughan DiMarco of Vonalytics.

Data as liquid

The issues were very familiar to me from considering EDRM and DAM systems, which made me think about the way data has changed as an asset, and how management and security of data now have to take account of the ‘liquid’ nature of data as an asset. This adds another layer of complexity. Data still needs to be archived as a ‘record’ for many reasons (regulatory compliance, business continuity, archival value…) but for a data-driven organisation, the days of rolling back to ‘yesterday’s version of the database’ seem like ancient history. Data assets are also complex in that they are subject to many levels of continuous processing, so the software that manages the processing also has to be robust.

The metaphor of data flowing around the organisation like water seems especially telling. If there is a system failure, you can’t necessarily just turn off the tap of data, and so your contingency plans need to include some kind of ‘emergency reservoir’ so that data that can’t be processed immediately does not get lost and the flow can be re-established easily.
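As a rough illustration of the ‘emergency reservoir’ idea, here is a minimal Python sketch. Everything in it is an assumption made for the example (the `Reservoir` class, the `process` callable, the in-memory queue); a real system would more likely spill to durable storage or a message queue than hold records in memory.

```python
from collections import deque


class Reservoir:
    """Holds records that could not be processed so they can be replayed later.

    Hypothetical helper for illustration only; not taken from any product.
    """

    def __init__(self, process):
        self.process = process   # callable handling one record; may raise
        self.pending = deque()   # the 'emergency reservoir'

    def submit(self, record):
        """Queue a record behind any backlog, then try to clear the backlog."""
        self.pending.append(record)
        return self.drain()

    def drain(self):
        """Replay buffered records in arrival order, stopping at the first failure."""
        while self.pending:
            record = self.pending[0]
            try:
                self.process(record)
            except Exception:
                return False     # downstream still unavailable; keep the backlog
            self.pending.popleft()
        return True
```

The design choice worth noting is that new records always join the back of the queue, so when the flow is re-established the data arrives in its original order and nothing is dropped.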

Build vs Buy?

The issues highlighted by the panel included costs – available budget, restrictions from finance departments, balance between in-house and outsourced spending (again all familiar in EDRM and DAM procurement) – as well as privacy, security, ability to maintain a system, and availability of skills. Essentially, the decision is about balancing risks, which will be unique to each team and each business. In terms of deciding whether to build something in-house, availability of in-house resource is an obvious consideration, but Marc-Antoine stressed the importance of thinking through what added value a bespoke build could offer, as opposed to other ways the team could be spending their time. For example, if there are no off-the-shelf or open source products that match requirements, if there is value in owning the IP of a new product, if risks can be kept low, and resources are available, a build might be worthwhile.

There are risks associated with all three of the main options – a big vendor is less likely to go bust, but sometimes they can be acquired, sometimes they can stop supporting a product or particular features, and they can be very costly. Open source has the advantage of being free, but relies on ad hoc communities to maintain and update the code base, and how vibrant and responsive each specific community is, or will remain, can vary. Open source can be a good option for low-risk projects – such as proofs of concept – or for risk-tolerant startups with plenty of in-house expertise to handle the open source code themselves.

AI future

The conversation diverged into a discussion of the future of AI, which everyone seemed to agree was going to become a standard tool for most businesses eventually. Jeremy noted that AI at the moment is being sought after for its scarcity value, to give early adopters an edge over the competition, while Maxime suggested that early advantage is likely to fade, just as it has with data science. Data analysis is now so ubiquitous, even small businesses are involved to a certain extent. Jeremy pointed out that it is hard to maintain a competitive edge based on the scarcity of data itself, as data can so easily be copied and distributed, but knowing how to make intelligent use of the data is a scarce commodity. Making connections and managing data in a very tailored specific way could even be a way for organisations to compete with Google, who have more data than anyone else, but are not necessarily able to answer all questions or have the most useful insights into specific problems.

The value of meaning

I was intrigued by this, as it validates the role of semantics – data without meaning is useless – and the importance of the imaginative and creative leaps that humans can make, as well as the moral and social reasoning that humans can bring. With reports of early AI systems reflecting existing biases and prejudices, and with disasters like the SimSimi chatbot causing social problems such as bullying amongst youngsters, the need for a real human heart to accompany artificial intelligence seems ever more important.

Scarcity of understanding?

Someone asked if the panel thought companies would soon need ‘Chief Intelligence Officers’ in the way that many now have ‘Chief Data Officers’. The panel did not seem particularly enthusiastic about the idea (“it sounds like something that you do with founders when you put them out to pasture”) but I think it would be a fascinating role. The BBC had someone to oversee ethics and advise on editorial ethics issues. Perhaps it is in the skills of a Chief Intelligence Officer – someone who can combine an understanding of how data, information, knowledge and wisdom interact, whether within IT systems or beyond, with an understanding of social issues and problems – that the scarcity value lies. Insight, imagination, and compassion could be the skills that will give the competitive edge. In the AI future, could a Chief Intelligence Officer make the difference between a company that succeeds by asking the right questions, not just of its data or its customers, but of itself, and one that fails?

Making KO Work: integrating taxonomies into technology

Lincoln Cathedral

The recent ISKO UK event Making KO Work: integrating taxonomies into technology offered four very different but complementary talks, followed by a panel session. These provided a good overview of current practice and largely concluded that although technology has advanced, there is still a need for human intervention in KO work.

Can You Really Implement Taxonomies in Native SharePoint?

Marc Stephenson from Metataxis gave a clear and helpful overview of the key terms and principles you need to know when using taxonomies and folksonomies in SharePoint. SharePoint is very widely used as an enterprise document repository, and although its taxonomy management capabilities are limited, when combined with an external taxonomy management solution, it can enable very effective metadata capture.

The first step is to become familiar with the specialised terminology that SharePoint uses. Metadata in SharePoint is held as “Columns”, which can be System Columns that are fixed and integral to SharePoint functionality, or Custom Columns, which can be changed and which need to be managed by an information architecture role. For example, Columns can be set as “Mandatory” to ensure users fill them in. Columns can be configured to provide picklists or lookups, as well as being free text, and can be specified as “numeric”, “date” etc. Taxonomies can be included as “Managed Metadata”.

Different “Content Types” can be defined, for example to apply standardised headers and footers to documents, enforce workflow, or apply a retention/disposal policy, and many different pre-defined Content Types are available. Taxonomies are referred to as “Managed Term Sets”, and these can be controlled by a taxonomist role. “Managed Keywords” are essentially folksonomic tags, but SharePoint allows these to be transferred into Managed Term Sets, enabling a taxonomist to choose folksonomic tags to become part of more formal taxonomies.

The “Term Store Manager” provides some functionality for taxonomy management, such as adding synonyms (“Other Labels”), or deprecating terms so that they can no longer be found by users when tagging (but remain available for search). Terms can also be deleted, but that should only be done if there is a process for re-tagging documents, because a deleted tag will generate a metadata error the next time someone tries to save the document. Limited polyhierarchy is possible, because the same term can exist in more than one “Managed Term Set”.
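To make the difference between synonyms, deprecation, and deletion concrete, here is a toy Python model of a term set. It is only a sketch of the behaviour described above and makes no attempt to reproduce SharePoint’s own API; the `Term` and `TermSet` classes and their method names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    other_labels: set = field(default_factory=set)  # synonyms ("Other Labels")
    deprecated: bool = False


class TermSet:
    """Toy stand-in for a managed term set (hypothetical, not SharePoint code)."""

    def __init__(self):
        self.terms = {}

    def add(self, label, *other_labels):
        self.terms[label] = Term(label, set(other_labels))

    def deprecate(self, label):
        self.terms[label].deprecated = True

    def _lookup(self, text):
        wanted = text.casefold()
        for term in self.terms.values():
            labels = {term.label, *term.other_labels}
            if wanted in {name.casefold() for name in labels}:
                return term
        return None

    def for_tagging(self, text):
        """Deprecated terms are hidden from people applying tags."""
        term = self._lookup(text)
        return term.label if term and not term.deprecated else None

    def for_search(self, text):
        """Deprecated terms still resolve when searching."""
        term = self._lookup(text)
        return term.label if term else None
```

In this model, deprecating a term leaves existing tags intact and searchable, which is why it is a safer option than deletion when no re-tagging process is in place.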

“Term Groups” can be defined, which can be useful if different departments want to manage their own taxonomies.

There are various limitations, such as a maximum number of Managed Terms in a Term Set (30,000), and if SharePoint is deployed online across a large organisation, changes can take some time to propagate throughout the system. The process of importing taxonomies needs to be managed carefully, as there is no way to re-import or over-write Term Sets (you would end up with duplicate sets) and there is no easy way to export taxonomies. There is no provision for term history or scope notes, and no analytics, so SharePoint lacks full taxonomy management functionality.

There are companion taxonomy management products (e.g. SmartLogic’s Semaphore, or Concept Searching) and it is possible to use other taxonomy management tools (such as PoolParty, Synaptica, or MultiTes) but an additional import/export process would need to be built.

So, SharePoint offers a lot of options for metadata management, but is better as a taxonomy deployment tool than a master taxonomy management tool.

Integrating Taxonomy with Easy, Semantic Authoring

Joe Pairman of Mekon Ltd demonstrated a very user-friendly, lightweight set of tagging tools that allow non-expert users to add rich metadata to content as they work. This addresses a key problem for taxonomists – how to ensure subject matter experts or authors who are more focused on content than metadata are able to tag consistently, quickly, and easily. By taking a form-based approach to content creation, authors are able to add structural metadata as they work, and add tags to specific words with a couple of clicks. This is particularly effective with a pre-defined controlled vocabulary.

The example Joe showed us was a very clear commercial use case of Linked Data, because the controlled vocabulary was very specific – products for sale. Each product was associated with a DBpedia concept, which provided the URI, and where a match to the text was detected the relevant word was highlighted. The user could then click on that word, see the suggested DBpedia concept, and click to tag. The tool (using FontoXML and Congility technology) then applied the relevant RDF to the underlying XML document “behind the scenes”, in a process of “inline semantic enrichment”. This approach enables accurate, author-mediated tagging at a very granular level. The customers reading the content online could then click on the highlighted text and the relevant products could be displayed with an “add to cart” function, with the aim of increasing sales. As an added bonus, the tags are also available for search engines, helping to surface highly relevant content in search results. (Schema.org tags could also be included.)
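The general idea of inline semantic enrichment can be sketched in a few lines of Python. This is not how FontoXML or Congility work internally, which was not shown; the vocabulary, the URIs, the `enrich` function, and the `span`/`resource` markup are all assumptions chosen for illustration, and the sketch applies tags automatically where the real tool asked the author to confirm each one.

```python
import re

# Illustrative controlled vocabulary: lowercase product term -> concept URI.
# The URIs follow the DBpedia pattern but are examples, not verified entries.
VOCABULARY = {
    "espresso machine": "http://dbpedia.org/resource/Espresso_machine",
    "kettle": "http://dbpedia.org/resource/Kettle",
}


def enrich(text, vocabulary=VOCABULARY):
    """Wrap whole-word vocabulary matches in a span carrying the concept URI."""
    # Longest terms first, so multi-word terms win over shorter ones.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        uri = vocabulary[match.group(1).lower()]
        return f'<span resource="{uri}">{match.group(1)}</span>'

    return pattern.sub(wrap, text)
```

Because the match is restricted to terms in the controlled vocabulary, the tagging stays precise: a word such as “kettles” is left alone unless the plural is added to the vocabulary as well.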

Enhancement of User Journeys with SOLR at Historic England

Richard Worthington of Historic England described the problems they had when deploying a SOLR/Lucene search to their documents without any taxonomy or thesaurus support for searching. They soon found that SQL searches were too blunt an instrument to provide useful results – for example, searching for “Grant” at first would bring up the page about the grants that were offered, but as soon as they added more data sets, this frequently searched-for page became buried under references to Grantchester, Grantham, etc.

Although they could manage relevancy to a certain extent at the data set level and by selecting “top results” for specific searches, the search team realised that this would be a painstaking and rigid process. It would also not address the problem that many terms used by the subject matter expert authors were not the same as the terms general users were searching for. For example, general users would search for “Lincoln Cathedral” rather than “Cathedral Church of St Mary of Lincoln”. So, they have much work for human taxonomists and thesaurus editors to do.
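The mismatch between expert and everyday terminology is the kind of problem a thesaurus addresses through query expansion. The Python sketch below shows the principle only; the `THESAURUS` structure and function names are invented for the example, and it says nothing about how Historic England or SOLR actually implement synonyms.

```python
# Illustrative thesaurus: preferred (expert) term -> labels general users type.
THESAURUS = {
    "cathedral church of st mary of lincoln": ["lincoln cathedral"],
}


def build_lookup(thesaurus):
    """Map every label, preferred or not, to its full group of equivalents."""
    lookup = {}
    for preferred, alternatives in thesaurus.items():
        group = {preferred, *alternatives}
        for label in group:
            lookup[label] = group
    return lookup


def expand_query(query, lookup):
    """Return every phrase the search engine should look for."""
    key = " ".join(query.lower().split())  # normalise case and spacing
    return sorted(lookup.get(key, {key}))
```

A query with no thesaurus entry simply passes through unchanged, so the editorial effort can be concentrated on the terms where authors and users are known to diverge.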

Applied Taxonomy Frameworks: Your Mileage May Vary

Alan Flett of SmartLogic took us through the latest enhancements to their products, showcasing a new feature called “Fact Extraction”. This works by identifying the context around specific data and information, in order to drive Business Intelligence and Analytics. The tool is essentially a user-friendly simplified algorithm builder that allows very specific searches to be constructed using pre-defined “building blocks”, such as “Facts”, “Entities”, and “Skips”. This means a specific piece of information, words to ignore, and entities such as a number or a date can be specified to construct a complex search query. This allows the search results to be defined by context and returned in context, and is especially effective for well-structured data sets. It also means that results are framed in a standardized format, which is useful for analytics.
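To give a feel for how such building blocks might combine, here is a small Python sketch that assembles a regular expression from a “fact” phrase, a “skip” of words to ignore, and a typed “entity”. The function names, the entity patterns, and the way the blocks are joined are my own assumptions for illustration; the internals of the SmartLogic tool were not described.

```python
import re

# Illustrative entity patterns; a real tool would support many more types.
ENTITIES = {
    "number": r"\d+(?:\.\d+)?",
    "date": r"\d{4}-\d{2}-\d{2}",
}


def fact(phrase):
    """A literal phrase that anchors the search."""
    return re.escape(phrase)


def skip(max_words=3):
    """Allow up to max_words intervening words, matching as few as possible."""
    return r"(?:\s+\S+){0,%d}?" % max_words


def entity(kind, name):
    """A typed value captured under the given name."""
    return r"\s+(?P<%s>%s)" % (name, ENTITIES[kind])


def build(*blocks):
    """Join the blocks, in order, into one case-insensitive pattern."""
    return re.compile("".join(blocks), re.IGNORECASE)


def extract(pattern, text):
    """Return the captured entities as a dict, or None if nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None
```

For example, `build(fact("revenue"), skip(5), entity("number", "amount"))` applied to “Revenue for the year was 4.2 million” captures the amount as “4.2”, and every result comes back in the same dictionary shape, which is what makes this style of extraction convenient for analytics.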

Concluding Panel

Although techniques such as automated classification, machine learning, and AI are progressing all the time, these still work best when combined with a well-structured knowledge base. Creating that knowledge base relies on human intelligence, especially for the familiar problems of disambiguation and synonym collection, in particular where the content authors have a different approach or level of domain expertise to the end users of the search systems. The panel agreed that for both the creation of thesauruses, taxonomies, and ontologies and for the deployment of these in tagging, semi-automated approaches remain necessary, and so there is still much to be done by human taxonomists, ontologists, and information architects in order to make knowledge organisation work.

Image: Lincoln Cathedral. Photo by Zaphad1

Fake news and virtual reality – You can lie to my face, you can’t lie to my heart, can you?

Murmur

On Monday I went to an in_collusion event which showcased two companies producing works in Virtual Reality: one focuses on its use in marketing (Fusion Works), the other uses VR to make art (Marshmallow Laser Feast). The presentations were excellent and the demonstrations of the technology were a lot of fun, but I left feeling equally thrilled and terrified.

In a world where people seem increasingly unable to tell the difference between fact, opinion, belief, biased news, erroneous news, propaganda, and downright lies in the form of flat websites and screens, how easy will it be to manipulate people through VR?

Fake news in the olden days

The fake news debate is not new. Back in the 80s and 90s, we had biased media that worked to its own agenda, and information professionals worried about objectivity in terms of finding reliable sources, taking a neutral standpoint, and understanding statistics. Politics has always been a propaganda game, and those who were interested in objectivity attempted to achieve balance by presenting “both sides of the story”. The BBC was well funded and was required to be “unbiased” in that it could not be seen to be promoting one political party more than another. However, it could only be unbiased within the range of mainstream political viewpoints, and that was based on the assumption that views that stood outside the political mainstream did not need to be represented. In order to be part of the social consensus, a viewpoint needed some kind of representation within the existing political framework. I don’t remember much coverage of the Monster Raving Loony Party, but they were included enough that I have heard of them.

Mainstream media biases were reasonably transparent – The Telegraph and The Times were right-wing, The Mirror was left-wing. The Labour Party stood on the left, the Tories on the right, the Liberals of various forms stood in the middle. Arguments over whether the BBC was biased focused on whether one party was getting more airtime than another. It was up to the opposition to provide counter-arguments to the party in power, and “neutrality” was achieved adversarially. However, it was also pretty clear that “establishment” media did not venture far to the left or right of these parties. These were the parties that were getting the majority of votes, and so the debate about bias largely operated within this range.

There was little coverage of non-establishment viewpoints, and those publications were easily distinguished from the mainstream, largely because you could tell they didn’t have much money. Anarchist or neo-nazi or new age or other “fringe group” newsletters were obviously photocopied. Special interest news published by charities might have higher production standards, but tended to be associated directly with the group funding the publication and clearly branded because they wanted everyone to know who they were – Greenpeace for environmentalist news, for example, or Amnesty International for human rights coverage. In other words, you only had to look at the publication to be able to tell where its biases were likely to be.

All that glitters on line is gold?

Those affordances started to disappear with the advent of the web, and its “democratization” of the publishing process. I remember discussions about what we could do as Information Professionals to help people tell the difference between “mainstream” and “alternative” websites. At first, it did not seem urgent, because “mainstream” organizations with a lot of money were able to build slick, well designed websites, which looked and worked differently to websites that had been hand-crafted by individuals.

With the rise of high quality blogging software and falling costs of production technology, that gap has closed, and those differences are far more subtle. “Established” old media, such as local papers, have seen their budgets shrink, while technology has become cheaper, so anyone wanting to build a website from scratch with a limited budget can now produce a site that looks pretty much the same as an “established” one, even without the benefit of a wealthy sponsor.

So, now we have satire, websites of “old media” outlets, and new sites that all look almost the same. That’s the equivalent of your local anarchist collective being able to produce a newspaper that looks like Time magazine, and the National Enquirer looking much like The Economist. No wonder people are confused!

It’s how it feels that counts

On top of that, accuracy and fact-checking are both time-consuming and expensive. Social media demands speed above all else. Today’s news becoming tomorrow’s chip wrappers seems like aeons in a world of flickering feeds. In the past, rushing for a scoop often led to inaccuracy, but people would at least browse through an entire paper, allowing time for more reflective articles and analysis.

We have also seen a rise in “emotional” reporting. A reasonable desire to allow people to connect emotionally with what was happening to others around the world became the mainstay of 24-hour reporting. When you don’t have time to reflect, you can still grab attention by talking about people’s feelings. Watching “experts” provide reasoned analysis after an event isn’t as thrilling as watching people screaming or crying.

On line, every article has to be as appealing as every other. The “stickiness” of sites that was much discussed in the 90s and 00s was essentially an attempt to replicate the behaviour of someone who would buy a paper because of the headline, and then slowly browse the rest of the articles – which was where you tended to find the more slowly produced, reflective content. Now every article has to shout for attention, and we have a clickbait world, where screaming sensationalism and inaccuracy don’t matter, because it’s the volume of clicks that generates the revenue.

This is how we got here, now where do we go?

As Information Professionals, we understand and have long debated the issues; we can see how we arrived at this point, and we know how to verify and validate.

We must educate people to look behind the headlines to the source of the information, to understand who funded the site, and why, who the author was and what biases they are likely to have, how to unpick loaded language and selection bias, and how to understand statistics and the way they can be manipulated. We must encourage reflection, comparison, and understanding of context and intent. Fake news, propaganda, and bias are familiar concepts to us, and yet we still have people claiming to have voted for Brexit because they think the EU banned the re-use of teabags.

We have not done well at creating a reflective, media literate, statistically savvy general population, and that is just for flat websites and screens.
Virtual Reality is all-engrossing: more absorbing than reading, more immersive than cinema, more immediate than television.

Whose reality is it anyway?

At the moment, the technology is in its infancy, and it is still “obvious” that it is not “IRL” – just as the sources of print media in the 80s and the early web of the 90s were “obvious”.

If we haven’t managed to create a media literate population able to distinguish honest reporting and unintentional mistakes from clickbait and propaganda, how are we going to create a VR-literate population able to distinguish the emotional impact of a “virtual” experience from a “real” one? People already tend to trust what they “feel” is right, and “facts” are not persuading them otherwise, as they would rather “trust their own experience”. How are we going to ensure political propaganda doesn’t become so emotionally absorbing that “post truth” is not just about distinguishing fact from fiction, but distinguishing “our own” experiences from virtual ones? Is it enough to know the context and intent of the makers of our virtual experiences? Do we need to teach people to develop “experiential literacy”, and do we need to start developing that very, very soon?

Image credit: Sean Goldthorpe. Renowned dancer Aakash Odedra, choreographer Lewis Major and the Ars Electronica Futurelab, staged at the International Dance Festival Birmingham, 2014.

25th International World Wide Web Conference in Montreal

Estimated reading time 3–4 minutes

The 25th International World Wide Web Conference was held in Montreal, which meant I was able to attend the whole week. Nevertheless, with 118 papers, 72 posters, 30 demos, 21 workshops, and 7 tutorials, I still had to miss plenty of intriguing-looking sessions. One in particular I was sorry to miss was the presentation of a paper on how the QWERTY arrangement of letters on a keyboard affects perceptions of the positivity of words – words with more “right hand” letters are considered more positive than those with more “left hand” letters. So, presumably, anything with an “-ing” ending is more positive than anything with an “-ed” ending.
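For the curious, here is a rough sketch of how that “-ing” versus “-ed” hunch could be checked. I did not see the paper presented, so the measure below (right-hand letters minus left-hand letters) and the split of keys by standard touch-typing hand are my own assumptions, not necessarily the authors’ method, and the function name is made up for illustration.

```python
# Assumed split of the QWERTY letter keys by standard touch-typing hand.
# This is an illustrative assumption, not taken from the paper.
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")


def right_hand_advantage(word: str) -> int:
    """Return the number of right-hand letters minus left-hand letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    right = sum(c in RIGHT_HAND for c in letters)
    left = sum(c in LEFT_HAND for c in letters)
    return right - left


if __name__ == "__main__":
    for text in ("ing", "ed", "walking", "walked"):
        print(text, right_hand_advantage(text))
```

On this crude count, “-ing” has two right-hand letters and one left-hand letter, while “-ed” is typed entirely with the left hand, which is consistent with the guess above. It says nothing, of course, about how people actually perceive the words.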

Linked Data Workshop

I thoroughly enjoyed the Linked Data day and it seems we are moving closer to tools and apps that will help people publish and manage their data as Linked Data. As the opening speaker noted, it is hard to evangelize the use of Linked Data when many end users just want a familiar Excel spreadsheet. It is also hard for Linked Data enthusiasts who don’t have a development team familiar with RDF on hand to publish their data, even if they want to. Publishing Linked Data should be as easy as posting to social media!

A step in this direction is Annalist – A practical tool for creating, managing and sharing evolving linked data.

A very interesting paper was on semantic annotation of scholarly work on the Qur’an. Islamic scholarship is particularly well suited to a linked data approach, as so many verses of the Qur’an need to be read alongside other verses, and the history of interpretations needs to be traced over the centuries.

Keynote Addresses

There were three very impressive keynote addresses, by Sir Tim Berners-Lee, Mary Ellen Zurko, and Peter Norvig.

Sir Tim’s opening keynote called for re-decentralization of the web, pointing out that the problem with all the effort of contributing photos, news, etc. to sites such as Facebook is that these contributions are not being made in an open way that will benefit humanity, but are locked in, so that they only benefit Facebook’s shareholders.

Mary Ellen Zurko talked about security on the web, how it remains vitally important as a socio-political as well as technical concern, and how, if she could turn back time, she would find an alternative to passwords!

Peter Norvig described machine learning algorithms, which can find correlations in datasets but still need human input to give context to the results.

Pataphysics

I was delighted to be introduced to pataphysics – the science of imaginary solutions – and patadata. It is pleasing that there are still people trying to defend the vision of the web as a playground for creativity, against the push to make it no more than an efficient commercialized virtual high street for shopping.

The spirit of surfing the web just to wander about in the hope of encountering unexpected juxtapositions and spontaneous serendipities – in contradiction to Google’s drive to give you exactly what you were looking for – is captured in the Syzygy Surfer. The Syzygy Surfer aims to delight you with its search results, without trying to match your preconceptions of what a “correct result” might be!

Inadvertent Cruelty – Algorithmic or Organizational?

Estimated reading time 3–4 minutes

In 2013 I asked whether social media were mature enough to handle bereavement in a sensitive manner. Last week Facebook released the options either to have your account deleted when you die or to nominate a trusted legacy manager to take it on for you as a memorial (Facebook rolls out feature for users when they die).

This was in response to the distress of relatives who wished to retrieve a lost loved one’s account or did not want to undergo the eerie experience of receiving automated reminders of their birthday or seeing their name or image appear unexpectedly in advertising. The enforced “Year in Review” offerings at the end of last year brought some publicity to the issue, as they also inadvertently caused distress by failing to consider the feelings of people who had suffered bereavements during the year. The original blog post about this (Inadvertent Algorithmic Cruelty) went viral last Christmas. The author quickly called for an end to a wave of casual responses that jumped to glib conclusions about young privileged staff just not thinking about anything bad ever happening (Well, That Escalated Quickly).

A more cynical response is that there was a deliberate dismissal as ‘edge cases’ of the minority of people who would not want to have year in review posts – possibly even a coldly calculated costs v. benefits decision, as providing “opt out” options might have required additional work or been seen as dispensable mouseclicks.

I have no idea what happened at Facebook, or what discussions, processes, and procedures they go through; the public apologies from Facebook do not go into that level of detail. However, “algorithmic cruelty” may be unintentional, but it is not a new phenomenon, and there are plenty of opportunities during the design and implementation of any project to think through the potential adverse impacts or pitfalls.

David Crystal at an ISKOUK conference in 2009 talked about the problem of avoiding inappropriate automated search engine placement of advertisements, for example ads for a set of kitchen knives alongside a story about a fatal stabbing. There was a certain naivety with early automated systems, but it did not take long for the industry in general to realise that unfortunate juxtapositions are not unusual incidents. Most people who have worked in semantics have plenty of anecdotes of either cringeworthy or hilarious mismatches and errors arising from algorithmic insensitivity to linguistic ambiguity.

Facebook’s latest thoughtlessness arises more from a failure to respect their users than through lack of sophistication in their algorithm (there doesn’t seem to be anything particularly complex about selecting photos and bunging some automated captions on them). Simply offering users the choice to look or not look or giving users the tools to build their own would have spared much heartache.

The origins of UX, championed by people such as Don Norman, Peter Morville, and Louis Rosenfeld, placed user needs front and centre. Good design was about seeing your users as real people with physical and emotional needs as human beings, and designing to help their lives go more smoothly, rather than designing to exploit them as much as possible.

We don’t know where you live…

Estimated reading time 2–4 minutes

Earlier this month the Estonian government opened its “digital borders”, allowing registrations for “e-citizenship” (there are two interesting pieces about this in New Scientist: E-citizens unite: Estonia opens its digital borders and Estonia’s e-citizen test is a test for us all).

Does the new form of citizenship mean the end of the nation state?

The Estonians appear to be creating a new category of “nationality” and the move has prompted a flurry of debate on whether or not this heralds the end of the nation state. “Nationality” has always been a problematic and somewhat fluid concept. Although people are often emotional about their nationality, in practice it is largely an artificial administrative device. Birthplace, at least in developed countries, tends to be well known and is often formally and officially recorded, so has been relatively administratively straightforward. The Estonian move is interesting because it takes away the requirement of some kind of physical presence for citizenship, so gaining a second “e-nationality” is far simpler than going to live somewhere else, making it a very attractive option.

A new category of citizen

The new category of “e-citizens” will not have the same rights (or, presumably, responsibilities) as “traditional” citizens – immediately adding a layer of complexity to information management around citizenship. According to The Economist, Estonia’s chief information officer, Taavi Kotka, has stressed that the [new form of] ID is a privilege, not a right. E-citizens can have their e-citizenship removed if they break the law, for example.

The creation of a new category of citizenship in itself should not threaten the nation state. There are six different types of British nationality, for example, as a consequence of the UK’s colonial past.

One question that may arise is how many e-citizenships a single individual can hold at once. How much will it cost Estonia to police and manage their new citizens, and will all countries want to or be able to offer such services? Any outsourcing of e-citizenship and identity management services to technology companies will have huge security and surveillance implications.

A citizenship marketplace?

A “marketplace” for e-citizenships may arise, with countries and even cities, or perhaps even other administrative entities, competing to offer the best services or biggest tax breaks to attract wealthy e-citizens. Revenues will likely flow into places like Estonia and away from wherever the new e-citizens live. The location of your e-citizenship could become more important than the place you were born, even if you still reside there. How your e-citizenship (where taxes will be paid) and your place of residence (where services like roads and schools need to be provided) interact could become highly politicized. The immediate challenge to existing nation states is how they will decide to co-operate with each other over their e-tax revenues.

Aggregations and basic categories

Estimated reading time 2–3 minutes

I recently enjoyed reading about the work Safari are currently doing to create a controlled vocabulary and topic aggregation pages to underpin navigation and discovery of their content.

Iterate, again

I very much liked the mix of manual and automated techniques the team used to maximise capturing value from existing resources while using machine processing to help and support the human editorial curation work. Lightweight iterative approaches have become standard in some areas of design, but achieving high quality information structures also usually requires several stages of revision and refinement. It is not always possible to predict what will happen in attempts to index or repurpose existing content, nor how users will respond to different information structures, and so the ability to iterate, correct, re-index, correct, adjust indexing methods, re-index, correct… is vital. Small samples of content are often not sufficient to find all potential issues or challenges, so it is always worth being prepared for surprises once you scale up.

Basics, as always

The Safari team identified the huge intellectual value locked into the existing human-created indexes and it is great to see them being able to extract some of that value, but then augment it using automated techniques. I was very interested to read about how the level of granularity in the individual indexes was too fine for overall aggregation. The team realised that there were “missing subtopics” – key topics that tended to be the subjects of entire books. These “missing subtopics” were found at the level of book titles and it struck me that this vital level of conceptualization aligns directly with Eleanor Rosch’s work on basic categories and prototype theory. It is not surprising that the concepts that are “basic categories” to the likely readership would be found at book title level, rather than index level.

This is further illustrated by the fact that the very broad high level topics such as “business” did not work well either. These needed not to be “clustered up”, but broken down and refined to the level of the “basic categories” that people naturally think of first.

So, the Safari team’s work is a very clear illustration of not only how to combine manual and automated techniques but also how to find the “basic categories” that match users’ natural level of thinking about the subject area.