
We don’t know where you live…


Earlier this month the Estonian government opened its “digital borders”, allowing registrations for “e-citizenship” (there are two interesting pieces about this in New Scientist: E-citizens unite: Estonia opens its digital borders and Estonia’s e-citizen test is a test for us all).

Does the new form of citizenship mean the end of the nation state?

The Estonians appear to be creating a new category of “nationality” and the move has prompted a flurry of debate on whether or not this heralds the end of the nation state. “Nationality” has always been a problematic and somewhat fluid concept. Although people are often emotional about their nationality, in practice it is largely an artificial administrative device. Birthplace, at least in developed countries, tends to be well known and is often formally and officially recorded, so has been relatively administratively straightforward. The Estonian move is interesting because it takes away the requirement of some kind of physical presence for citizenship, so gaining a second “e-nationality” is far simpler than going to live somewhere else, making it a very attractive option.

A new category of citizen

The new category of “e-citizens” will not have the same rights (or, presumably, responsibilities) as “traditional” citizens – immediately adding a layer of complexity to information management around citizenship. According to The Economist, Estonia’s chief information officer, Taavi Kotka, has stressed that the [new form of] ID is a privilege, not a right. E-citizens can have their e-citizenship removed if they break the law, for example.

The creation of a new category of citizenship in itself should not threaten the nation state. There are six different types of British nationality, for example, as a consequence of the UK’s colonial past.

One question that may arise is how many e-citizenships a single individual can hold at once. How much will it cost Estonia to police and manage its new citizens, and will all countries want to or be able to offer such services? Any outsourcing of e-citizenship and identity management services to technology companies will have huge security and surveillance implications.

A citizenship marketplace?

A “marketplace” for e-citizenships may arise, with countries and even cities, or perhaps even other administrative entities, competing to offer the best services or biggest tax breaks to attract wealthy e-citizens. Revenues will likely flow into places like Estonia and away from wherever the new e-citizens live. The location of your e-citizenship could become more important than the place you were born, even if you still reside there. How your e-citizenship (where taxes will be paid) and your place of residence (where services like roads and schools need to be provided) interact could become highly politicized. The immediate challenge to existing nation states is how they will decide to co-operate with each other over their e-tax revenues.

Aggregations and basic categories


I recently enjoyed reading about the work Safari are currently doing to create a controlled vocabulary and topic aggregation pages to underpin navigation and discovery of their content.

Iterate, again

I very much liked the mix of manual and automated techniques the team used to maximise capturing value from existing resources while using machine processing to help and support the human editorial curation work. Lightweight iterative approaches have become standard in some areas of design, but achieving high quality information structures also usually requires several stages of revision and refinement. It is not always possible to predict what will happen in attempts to index or repurpose existing content, nor how users will respond to different information structures, and so the ability to iterate, correct, re-index, correct, adjust indexing methods, re-index, correct… is vital. Small samples of content are often not sufficient to find all potential issues or challenges, so it is always worth being prepared for surprises once you scale up.

Basics, as always

The Safari team identified the huge intellectual value locked into the existing human-created indexes and it is great to see them being able to extract some of that value, but then augment it using automated techniques. I was very interested to read about how the level of granularity in the individual indexes was too fine for overall aggregation. The team realised that there were “missing subtopics” – key topics that tended to be the subjects of entire books. These “missing subtopics” were found at the level of book titles and it struck me that this vital level of conceptualization aligns directly with Eleanor Rosch’s work on basic categories and prototype theory. It is not surprising that the concepts that are “basic categories” to the likely readership would be found at book title level, rather than index level.

This is further illustrated by the fact that the very broad high level topics such as “business” did not work well either. These needed not to be “clustered up”, but broken down and refined to the level of the “basic categories” that people naturally think of first.

So, the Safari team’s work is a very clear illustration of not only how to combine manual and automated techniques but also how to find the “basic categories” that match users’ natural level of thinking about the subject area.

Semantic Theatre gets practical


I have started to look into the CIDOC Conceptual Reference Model for cultural heritage metadata as part of my investigation of the concept of Semantic Theatre.

An events-based approach is used in a lot of ontological modelling. Thanks to Athanasios Velios, I learned that bookbinding can be broken down into a sequence of events, and this is an obvious route to try when thinking about how to model a performance event.

I think there is potential for relating the “objects” in the play – the performers as well as set, props, etc. – to concepts within the play. So far, I have been focusing mainly on modelling relationships between ideas within the script (e.g. lines where this character uses the ocean as a metaphor for life) and possibly comparing across scripts (e.g. which lines reference King Lear), but it would be interesting to include props and actors as well (e.g. in which scenes is a clock used as a reference to death). The use of a prop could easily be modelled as a distinct event within a play, and this would facilitate relating literary and metaphorical ideas to the object rather than just to the words in the script.
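As a rough illustration of treating the use of a prop as an event, here is a minimal Python sketch. It is not based on the CIDOC model itself; the `PropUse` class, its field names, and the sample data are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropUse:
    """One event in a performance: a prop is used in a scene to evoke a concept."""
    scene: str
    prop: str
    evokes: str  # the literary or metaphorical idea the use points to


def scenes_where(events, prop, concept):
    """Return the scenes, in order of first appearance, where `prop` evokes `concept`."""
    seen = []
    for event in events:
        if event.prop == prop and event.evokes == concept and event.scene not in seen:
            seen.append(event.scene)
    return seen


# Invented sample data, not taken from any real script.
events = [
    PropUse("Act 1, Scene 2", "clock", "death"),
    PropUse("Act 1, Scene 3", "clock", "punctuality"),
    PropUse("Act 2, Scene 1", "lantern", "hope"),
    PropUse("Act 2, Scene 4", "clock", "death"),
]

print(scenes_where(events, "clock", "death"))
```

Because each use of the prop is its own record, the same clock can point to different ideas in different scenes, which a single annotation on the object would not allow.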

The play itself – Ocean Opera – will be performed at the Montreal Fringe Festival in June.

Adventures in Semantic Theatre


I have been investigating the idea of using semantic techniques and technologies to enhance plays, along with the Montreal Semantic Web meetup group. There have been far fewer Semantic Web projects for the humanities than the sciences and even fewer that have examined the literary aspects of the theatre. Linked Open Data sets associated with the theatre are mostly bibliographic, library catalogue metadata, which treat plays from the point of view of simple objective properties of the artefact of a play, not its content: a play has an author, a publisher, a publication date, etc. Sometimes a nod towards the content is made by including genre, and there has been work on markup of scripts from a structural perspective – acts, characters, etc. There are obvious and sound reasons for these kinds of approaches, meeting bibliographic and structural use cases (e.g. “give me all the plays written by French authors between 1850-1890”; “give me the act, scene, and line references for all the speeches over ten lines long by a particular character”; “give me all the scenes in which more than three characters appear on stage at once”).

Modelling literary rather than physical connections

Once we started discussing at the meetups how we could model the content itself, especially in a qualitative manner, we quickly became embroiled in questions of whether or not we needed to create entire worldviews for each play and how we could relate things in the play to their real world counterparts.

One of the plays we are working on – Ocean Opera by Alex Gelfand (to be performed at the Montreal Fringe Festival this June) – included the Moon as a character. How and by what relationships could we link the Moon of the play to the Moon in the sky, and then how could we link it to other fictional and literary Moons?

Another play we analysed – Going Back Home by Rachel Jury – was a dramatization based on real people and historical events. It seemed obvious these should be linked to their real counterparts, and would a simple “is a fictional representation of” suffice? How should we relate depictions of historical events in the play to eyewitness accounts from the time or to newspaper reports?
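To make the linking question a little more concrete, here is a small sketch of the “is a fictional representation of” relationship stored as subject, predicate, object triples in plain Python. The identifiers and the predicate name are hypothetical, chosen only for illustration, and a real project would more likely use an RDF store and published vocabularies.

```python
# Hypothetical predicate name for the relationship discussed above.
REPRESENTS = "isFictionalRepresentationOf"

# (subject, predicate, object) triples. All identifiers are invented
# placeholders, not real URIs.
triples = [
    ("play:OceanOpera/Moon", REPRESENTS, "world:Moon"),
    ("play:ExamplePlay/Moon", REPRESENTS, "world:Moon"),
    ("play:ExamplePlay/Narrator", REPRESENTS, "world:ExamplePerson"),
]


def fictional_counterparts(triples, real_thing):
    """Return every fictional entity linked to the given real-world entity."""
    return sorted(s for s, p, o in triples if p == REPRESENTS and o == real_thing)


print(fictional_counterparts(triples, "world:Moon"))
```

Linking each fictional Moon to the same real-world entity is one possible way of connecting the fictional Moons to each other without asserting anything directly between the plays.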

Should we define the world view of each play? Would it matter when defining relationships if there were events in the play that were counterfactual or scientifically impossible?

How could we capture intertextuality and references to other plays? Should there be a differentiation between quotations and overt references by the author to other texts and less explicit allusions and shared cultural influences?

Artistic Use Cases

One of the most appealing aspects of this project to me is that we have no strict commercial or business requirements to meet. A starting point was the idea of a “literary search engine” that ranked relevance not according to information retrieval best practice, but under its own terms as art, or perhaps even defined its own “relevance within the world of the play”. In other words, we would be trying to produce results that were beautiful rather than results that best answered a query.

However, there are also a number of very practical use cases for modelling the literary world of a play, rather than just modelling a play as an object.

Querying within a play

Navigating within the text by answering such queries as ‘in which scenes do these two characters appear together’ answers one set of use cases. The BBC’s Mythology Engine was designed to help users find their way around within a lot of brands, series, and episodes, and characters and events were modelled as central.

An equivalent set of queries for literary aspects would be “how many scenes feature metaphors for anger and ambition” or “which monologues include references to Milton”.
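Both kinds of query can be sketched over simple per-scene annotations. The following Python example assumes a human editor has already tagged each scene; the scene numbers, character names, and tags are invented, and the function names are illustrative rather than part of any existing tool.

```python
# Invented annotations: which characters are on stage in each scene...
cast = {
    "1.1": {"Moon", "Sailor"},
    "1.2": {"Sailor", "Captain"},
    "2.1": {"Moon", "Sailor", "Captain"},
}
# ...and which literary features an editor has tagged in each scene.
features = {
    "1.1": {("metaphor", "anger")},
    "1.2": {("reference", "Milton"), ("metaphor", "ambition")},
    "2.1": {("metaphor", "anger"), ("metaphor", "ambition")},
}


def scenes_with_characters(cast, *names):
    """Structural query: scenes in which all the named characters appear."""
    wanted = set(names)
    return sorted(scene for scene, present in cast.items() if wanted <= present)


def scenes_with_features(features, *wanted):
    """Literary query: scenes tagged with all the given (kind, topic) features."""
    wanted = set(wanted)
    return sorted(scene for scene, tags in features.items() if wanted <= tags)


print(scenes_with_characters(cast, "Moon", "Sailor"))
print(scenes_with_features(features, ("metaphor", "anger"), ("metaphor", "ambition")))
```

The two functions are deliberately identical in shape: once the literary annotations exist, querying them is no harder than querying the cast list. The difficult part is deciding what to annotate.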

Querying across many plays

If you extend such use cases across a body of plays, recommendation scenarios become possible. For example, “if you liked this play which frequently references Voltaire and includes nautical metaphors, then you might also like this play…” and there are clear commercial implications for the arts in terms of marketing and promotion, finding new audiences, and even in planning new work.
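One simple way such a recommendation might work is to compare the sets of literary tags attached to each play. The sketch below uses Jaccard similarity (shared tags divided by all tags) as a stand-in; the play titles and tags are invented, and a real system could weight or rank features quite differently.

```python
def jaccard(a, b):
    """Share of tags two plays have in common, from 0.0 (none) to 1.0 (all)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked, catalogue):
    """Rank the other plays by tag overlap with the liked play, best first."""
    scores = [
        (jaccard(catalogue[liked], tags), title)
        for title, tags in catalogue.items()
        if title != liked
    ]
    ranked = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in ranked if score > 0]


# Invented catalogue of plays and their literary tags.
catalogue = {
    "Play A": {"ref:Voltaire", "metaphor:nautical", "theme:exile"},
    "Play B": {"ref:Voltaire", "metaphor:nautical"},
    "Play C": {"metaphor:nautical", "theme:revenge"},
    "Play D": {"ref:Milton"},
}

print(recommend("Play A", catalogue))
```

Plays with no tags in common are left out altogether, so a reader is never recommended something purely because the catalogue is small.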

These kinds of “metaphorical use cases” could also serve as a rich seam for generating interesting user journeys through a literary archive and as a way of promoting serendipitous discovery for students and researchers.

Storyline use cases

A lot of work that has been done at the BBC has been based around the concept of an ‘event’, and the relationship of events to storylines. This is particularly relevant for many practical and creative aspects of writing, compiling, broadcasting, archiving, and re-using content. For example, it is useful to be able to distinguish the name of the journalist from the names of people who are mentioned within the story, and to distinguish between more and less significant people within a story according to whether they are mentioned as part of the main event or only in association with consequent or secondary events.

Literary and metaphorical use cases might take a similar approach but decompose the events in a story in terms of the emotional development of the characters.

Fictional worlds use cases

One of the ideas that I find the most appealing, but is the hardest to pin down, is the idea of modelling the internal ontological world of a work of fiction. In a fictional ontology, you can have relationships that make no sense in the ‘real’ world, so modelling them cannot rely on the kind of sense-testing and meeting of requirements that we use so much in commercial contexts.

In discussions, some people reacted very strongly against the idea of even attempting to model fictional worlds, which I found fascinating, while others immediately saw the idea as just another aspect of literary creation – an artistic endeavour in its own right.

There is an epistemological tangent in ontological thinking that goes into a debate about realism versus anti-realism that I haven’t fully got to grips with yet.

Where next?

I am at the very early stages of thinking through all this, and not sure where it will go, but am enjoying starting to gather a community of interest. If you would like to know more, I have written in more detail about it all on the project blog: http://www.semantictheatre.org.

The value of forgetting


Two years ago I was thinking a lot about social media and bereavement and I wrote a post (I friend dead people – Are Social Media Mature Enough to Cope with Bereavement?). Today, by strange coincidence, I happened upon this post: AI resurrector lets people Skype their dead relatives. As the post points out, this appears to be an incarnation of an episode of Black Mirror by Charlie Brooker, so apart from worrying about which other of his dystopias people are going to invoke next, I was again prompted to think about forgetting and remembrance as information processes.

In the past, human beings have found it very easy to forget and have struggled to remember. Oral histories and stories preserved by poets and carvings in stone to record conquests and kings were early memorializations and were important precisely because so little was recorded. Pre-Renaissance librarians and archivists were often more concerned with gathering and preserving scant records than with information overload, or even systematic organization of knowledge, simply because the volume of materials they had to work with was limited. As printing technologies developed and more informational records were paper-based, archivists had to balance the urge to preserve with practical considerations such as the costs of space required to store documents. During the 20th century, the massive surge in the volume of paper documents generated meant that we had to start thinking carefully about what we would deliberately forget.

The digital age seemed to suggest that somehow storage would become so cheap and search engines so intelligent that we would be able to save everything and find it again without a worry, and many people seemed to see this as a good thing – archival management becomes a lot easier if you do not bother to select and manage a collection. Professional archivists have pointed out the pitfalls of this attitude and on a personal level, so far, we have – by and large – used our PCs, cameraphones, scanners, etc., to generate huge unmanageable collections of data without regard for what we want to remember and what we want to forget. The urge to “just keep everything” is strong. Charlie Brooker’s dystopias are valuable in showing us the psychological pressures we will have to deal with in this new world.

Our traditions of marking anniversaries, building memorials, and remembering our past have led us to equate memorializations with respect and love, against a background where most things get forgotten. However, as humans we need to forget pain and grief, we need to “let go” and “move on”, otherwise we cause ourselves psychological problems, so we need to be careful with our digital memorializations as extensions of our social networks (for example Facebook video memorials). They may seem like works of love and respect, but there is a danger they will lead people into unhealthy obsession with the past. We have, after all, never before lived in a world where it is harder to forget than it is to remember.

Update: More Google ‘forget’ requests emerge after EU ruling

For Claire – not forgotten.

The Information Master – Louis XIV’s Knowledge Manager


I recently read The Information Master: Jean-Baptiste Colbert’s Secret State Intelligence System by Jacob Soll. It is a very readable but scholarly book that tells the story of how Colbert used the accumulation of knowledge to build a highly efficient administrative system and to promote his own political career. He seems to have been the first person to seize upon the notion of “evidence-based” politics and the idea that knowledge, information and data collection, and scholarship could be used to serve the interests of statecraft. In this way he is an ancestor of much of the thinking that is commonplace not only in today’s political administrations but also in all organizations that value the collection and management of information. The principle sits at the heart of what we mean by the “knowledge economy”.

The grim librarian

Jean-Baptiste Colbert (1619-83) is depicted as ruthless, determined, fierce, and serious. He was an ambitious man and saw his ability to control and organize information as a way of gaining and then keeping political influence. By first persuading the King that an informed leadership was a strong and efficient leadership, and then by being the person who best understood and knew how to use the libraries and resources he collected, Colbert rose to political prominence. However, his work eventually fell victim to the political machinations of his rivals and after his death his collection was scattered.

Using knowledge to serve the state

Before Colbert, the scholarly academic tradition in France had existed independently from the monarchy, but Colbert brought the world of scholarship into the service of the state, believing that all knowledge – even from the most unlikely of sources – had potential value. This is very much in line with modern thinking about Big Data and how that can be used in the service of corporations. Even the most unlikely of sources might contain useful insights into customer preferences or previously unseen supply chain inefficiencies, for example.

Colbert’s career was caught up with the political machinations of the time. He worked as a kind of accountant of Cardinal Mazarin, but when Mazarin’s library was ransacked by political rivals and his librarian fell out of favour, Colbert restored the library and built a unified information system based on the combination of scholarship and administrative documentation, ending the former division between academia and government bureaucracy.

Importance of metadata

Colbert also instinctively grasped the importance of good metadata, cataloguing, and an accurate network of links and cross references in order to be able to obtain relevant and comprehensive information quickly, issues that are more urgent than ever given the information explosions modern organizations – and indeed nations – face. This enabled him to become a better administrator than his rivals and by becoming not only the source of politically expedient information but also the person who knew how to use the information resources most effectively, he was able to gain political influence and become a key minister under Louis XIV.

A personal vision

I was struck by how much his vast library, archive, and document management system was the result of his own personal vision, how it was built on the dismantling and rebuilding of work of predecessors, but also how, after his death, the system itself fell victim to political changes and was left to collapse. This pattern is repeated frequently in modern information projects. So often the work of the champion of the original system is wasted as infighting that is often not directly connected to the information project itself leads to budget cuts, staff changes, or other problems that lead to the system decaying.

Soll argues that the loss of Colbert’s system hampered political administration in France for generations. Ironically, it was Colbert’s own archives that enabled successive generations of political rivals to find the documents with which to undermine the power of the crown, showing the double-edged nature of information work. It is often the same collections that can both redeem and condemn.

Secrecy or transparency?

Another theme that ran throughout Colbert’s career, with both political and practical implications, was the tension between demands for transparent government and the desire for a secret state. Much of the distinction between public and private archives was simply a matter of who was in control of them and who had set them up, so the situation in France under the monarchy was different to the situation in England where Parliament and the Monarchy maintained entirely separate information systems. In France, an insistence on keeping government financial records secret eventually undermined trust in the economy. Throughout his career Colbert was involved in debates over which and how much government information should be made public, with different factions arguing over the issue – arguments that are especially resonant today.

On being the only girl in the room


Perhaps it is because I am settling into a new culture, or perhaps it is because my new time zone has altered the nature of what I see in my Twitter feed, but there seems to have been a spate of articles lately about sexism faced by women working in technology, which makes me very sad. This was on my mind when I received from a former colleague a copy of a report we had co-authored. As I read the list of names, I was struck by how wonderful a group of guys they were, how intelligent, creative, and technically knowledgeable, and what a pleasure it had been to be the only girl in the room. Those guys were utterly supportive, thoughtful, generous of spirit, and full of interest in and encouragement of my contributions.

I am from an editorial background and I don’t really write code, but never once in that group did I experience any kind of tech snobbery. Whenever there was something that I didn’t know about, or unfamiliar acronyms or jargon, someone would provide a clear explanation, without ever being patronising, appearing bored or impatient, or making any assumptions about what anyone “ought” to know. I was never made to feel I had asked a stupid question, said something foolish, or that I did not belong. At the same time, these men were always keen and interested to hear my perspectives, and to learn from my experiences. The group dynamic was one of free and open exchange of ideas and of working collaboratively to find solutions to problems. All contributions were valued and everything was considered jointly and equally authored.

I didn’t remain the only girl in the room. I was learning so much in the meetings that I invited my (now former) colleague to join us, bringing a new set of expertise and skills that were welcomed. I had not a moment of concern about inviting a younger and even less technical female colleague into the group, because I knew she would be made welcome and would have a fantastic opportunity to learn from some brilliant minds.

Of course I have encountered much sexism in my career, but it is not necessary and it is not inevitable. I hate the thought of young women being put off technology as a career because of fears of sexism and discrimination. I know this happens a lot – it happened to me, although I found my way into tech eventually. I do not know whether there is “more” sexism in technology – a charge some of the posts I have read have levelled – than there is anywhere else, but I do know that there is sexism in all industries, so you might as well ignore it as a factor and choose a career based on aspects like intellectual stimulation or good career prospects. Technology certainly offers those. I personally have encountered sexism in so-called “female friendly” industries such as publishing and teaching, and I am quite sure it is suffered by nurses, waitresses, actresses, pop singers…. Since I have been working in technology, I have often been the only girl in the room but almost always that room has been a fascinating, welcoming, and inspiring place to be.

This is not primarily written for the specific individuals nor for all the other fantastic guys in tech I have met or worked with (there are so many I can’t possibly name them all), although I hope they enjoy it. This post is intended to promote positive male role models and examples of decent male behaviour for boys and young men to follow, and as a mythbuster for anyone who thinks sexism and geekiness are somehow intrinsically linked.

It is also written for women, as a reminder that although we must speak out against sexist and otherwise toxic behaviour when we encounter it, approval and affirmation are very powerful motivators of change, so we also help by shouting about and celebrating when we find fabulous guys in tech and in life who are getting it right.

How semantic search helps girls and boys, but in different ways


While researching something else, I happened upon this rather cheering paper: The effect of semantic technologies on the exploration of the web of knowledge by female and male users. Gender issues only tangentially affect my core research, as I generally focus on linguistic communities that are defined by organizational or professional context, so gender effects are rather diluted by that point. I also prefer collapsing false dichotomies rather than emphasizing difference and division, and so I was very heartened that this article shows how semantic techniques can be unifying.

The study is based on observing the search strategies of a group of male and a group of female students in Taiwan. Given non-semantic search systems to use, the male students tended to search very broadly and shallowly, skimming large numbers of results and following links and going off on tangents to find other results. This enabled them to cover a lot of ground, often finding something useful, but also often left them with a rather chaotic collection of results and references. The female students tended to search very deeply and narrowly, often stopping to read in depth a paper that they had found, and trying to fully grasp the nature of the results that had been returned. This meant they tended to collect fewer results overall, the results tended to be clustered around a single concept, and they risked falling into the rather wonderfully named “similarity holes”. These “similarity holes” are search traps where a single search term or collection of terms leads to a small set of results and are essentially “dead ends”.

How did semantic search help?

When the students were given semantic search tools, the male students continued to search broadly and shallowly but the semantic associations helped them to conceptualize and organize what they were doing. This meant that they ended up with a far more coherent, relevant, and useful set of search results and references. In contrast, the female students, using the semantic associations offered, found it far easier to broaden their searches and to come up with alternative search terms and approaches, enabling them to avoid and break out of any “similarity holes” they fell into.

Gender effects dissipate

I was very heartened that improvements in technology can be gender-neutral – they can simply be improvements of benefit in different ways to everyone, they don’t have to deliberately try to account for gender difference. I was also very heartened to note that the researchers found that gender differences in search strategies dissipated once students were taught advanced information seeking and knowledge management strategies. Gender differences were only apparent in novice, inexperienced searchers. So, in information seeking work at least, any biological or socially created gender differences are weak and easily overcome with some well directed instruction and semantic techniques are a help rather than a hindrance.