Category Archives: information architecture

Making KO Work: integrating taxonomies into technology

Estimated reading time 6–10 minutes

The recent ISKO UK event Making KO Work: integrating taxonomies into technology offered four very different but complementary talks, followed by a panel session. These provided a good overview of current practice and largely concluded that although technology has advanced, there is still a need for human intervention in KO work.

Can You Really Implement Taxonomies in Native SharePoint?

Marc Stephenson from Metataxis gave a clear and helpful overview of the key terms and principles you need to know when using taxonomies and folksonomies in SharePoint. SharePoint is very widely used as an enterprise document repository, and although its taxonomy management capabilities are limited, when combined with an external taxonomy management solution, it can enable very effective metadata capture.

The first step is to become familiar with the specialised terminology that SharePoint uses. Metadata in SharePoint is held as “Columns”, which can be System Columns that are fixed and integral to SharePoint functionality, or Custom Columns, which can be changed and which need to be managed by an information architecture role. For example, Columns can be set as “Mandatory” to ensure users fill them in. Columns can be configured to provide picklists or lookups, as well as being free text, and can be specified as “numeric”, “date” etc. Taxonomies can be included as “Managed Metadata”.
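By way of illustration, here is a minimal Python sketch (my own, not Metataxis's code) of what adding a mandatory Custom Column to a document library might look like through SharePoint's REST API. The site URL, list name, and column name are invented, and authentication and error handling are omitted; in practice much of this would be done through the SharePoint admin UI or a client library.

```python
import requests

SITE = "https://example.sharepoint.com/sites/records"   # hypothetical site
LIST = "Project Documents"                               # hypothetical library

def create_required_text_column(session: requests.Session, digest: str) -> None:
    """Add a mandatory free-text Custom Column to a document library.

    FieldTypeKind 2 is a single line of text; setting 'Required' to True
    makes the Column mandatory, so users must fill it in before saving.
    """
    url = f"{SITE}/_api/web/lists/getbytitle('{LIST}')/fields"
    payload = {
        "__metadata": {"type": "SP.Field"},
        "Title": "ProjectCode",   # illustrative Custom Column name
        "FieldTypeKind": 2,       # other kinds cover dates, numbers, choices, etc.
        "Required": True,
    }
    headers = {
        "Accept": "application/json;odata=verbose",
        "Content-Type": "application/json;odata=verbose",
        "X-RequestDigest": digest,  # form digest obtained from /_api/contextinfo
    }
    session.post(url, json=payload, headers=headers).raise_for_status()
```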

Different “Content Types” can be defined, for example to apply standardised headers and footers to documents, enforce workflow, or apply a retention/disposal policy, and many different pre-defined Content Types are available. Taxonomies are referred to as “Managed Term Sets”, and these can be controlled by a taxonomist role. “Managed Keywords” are essentially folksonomic tags, but SharePoint allows these to be transferred into Managed Term Sets, enabling a taxonomist to choose folksonomic tags to become part of more formal taxonomies.

The “Term Store Manager” provides some functionality for taxonomy management, such as adding synonyms (“Other Labels”), or deprecating terms so that they can no longer be found by users when tagging (but remain available for search). Terms can also be deleted, but that should only be done if there is a process for re-tagging documents, because a deleted tag will generate a metadata error the next time someone tries to save the document. Limited polyhierarchy is possible, because the same term can exist in more than one “Managed Term Set”.

“Term Groups” can be defined, which can be useful if different departments want to manage their own taxonomies.

There are various limitations – such as a maximum number of Managed Terms in a Term Set (30,000) and if SharePoint is deployed online across a large organisation, changes can take some time to propagate throughout the system. The process of importing taxonomies needs to be managed carefully, as there is no way to re-import or over-write Term Sets (you would end up with duplicate sets) and there is no easy way to export taxonomies. There is no provision for term history or scope notes, and no analytics, so SharePoint lacks full taxonomy management functionality.

There are companion taxonomy management products (e.g. SmartLogic’s Semaphore, or Concept Searching) and it is possible to use other taxonomy management tools (such as PoolParty, Synaptica, or MultiTes) but an additional import/export process would need to be built.
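To give a feel for what building such a process involves, here is a small sketch, assuming the flat, level-per-column CSV layout that the Term Store import dialogue accepts, which writes a toy two-level taxonomy out as an import file. The term set and terms are invented, and a real export/import round trip would also need to handle synonyms, GUIDs, and re-imports, which is exactly where the built-in tooling falls short.

```python
import csv

# A toy taxonomy: top-level terms mapped to lists of narrower terms.
TAXONOMY = {
    "Buildings": ["Cathedrals", "Castles"],
    "Monuments": ["Standing stones"],
}

HEADER = [
    "Term Set Name", "Term Set Description", "LCID", "Available for Tagging",
    "Term Description", "Level 1 Term", "Level 2 Term", "Level 3 Term",
    "Level 4 Term", "Level 5 Term", "Level 6 Term", "Level 7 Term",
]

def write_import_csv(term_set_name: str, path: str) -> None:
    """Write the taxonomy in the flat CSV layout used for Term Store imports."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        # The first data row names the Managed Term Set itself (LCID 1033 = English).
        writer.writerow([term_set_name, "", "1033", "TRUE"] + [""] * 8)
        for level1, children in TAXONOMY.items():
            writer.writerow(["", "", "", "TRUE", "", level1] + [""] * 6)
            for level2 in children:
                writer.writerow(["", "", "", "TRUE", "", level1, level2] + [""] * 5)

write_import_csv("Heritage Assets", "heritage_assets.csv")
```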

So, SharePoint offers a lot of options for metadata management, but is better as a taxonomy deployment tool than a master taxonomy management tool.

Integrating Taxonomy with Easy, Semantic Authoring

Joe Pairman of Mekon Ltd demonstrated a very user-friendly, lightweight set of tagging tools that allows non-expert users to add rich metadata to content as they work. This addresses a key problem for taxonomists – how to ensure that subject matter experts or authors, who are more focused on content than metadata, can tag consistently, quickly, and easily. By taking a form-based approach to content creation, authors can add structural metadata as they work, and add tags to specific words with a couple of clicks. This is particularly effective with a pre-defined controlled vocabulary.

The example Joe showed us was a very clear commercial use case of Linked Data, because the controlled vocabulary was very specific – products for sale. Each product was associated with a DBPedia concept, which provided the URI, and where a match to the text was detected the relevant word was highlighted. The user could then click on that word, see the suggested DBPedia concept, and click to tag. The tool (using FontoXML and Congility technology) then applied the relevant RDF to the underlying XML document “behind the scenes”, in a process of “inline semantic enrichment”. This approach enables accurate, author-mediated tagging at a very granular level. Customers reading the content online could then click on the highlighted text and the relevant products would be displayed with an “add to cart” function, with the aim of increasing sales. As an added bonus, the tags are also available to search engines, helping to surface accurately relevant content in search results. (Schema.org tags could also be included.)
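The FontoXML/Congility implementation itself was not shown in code, but the general suggest-then-confirm pattern can be sketched in a few lines of Python. The vocabulary, the product names, and the use of a schema.org "about" property in RDFa-style spans are all my own illustrative assumptions, not the actual tool.

```python
import re

# Hypothetical controlled vocabulary: product names mapped to DBpedia URIs.
VOCABULARY = {
    "espresso machine": "http://dbpedia.org/resource/Espresso_machine",
    "coffee grinder": "http://dbpedia.org/resource/Coffee_grinder",
}

def suggest_annotations(text: str):
    """Return (matched text, URI) pairs for vocabulary terms found in the text.

    In an authoring tool these would appear as highlights for the author to
    confirm with a click, rather than being applied automatically.
    """
    suggestions = []
    for term, uri in VOCABULARY.items():
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            suggestions.append((match.group(0), uri))
    return suggestions

def enrich(text: str) -> str:
    """Wrap each confirmed match in an RDFa-style span 'behind the scenes'."""
    for matched, uri in suggest_annotations(text):
        text = text.replace(
            matched,
            f'<span property="schema:about" resource="{uri}">{matched}</span>',
            1,
        )
    return text

print(enrich("Our new espresso machine pairs well with a coffee grinder."))
```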

Enhancement of User Journeys with SOLR at Historic England

Richard Worthington of Historic England described the problems they had when deploying a SOLR/Lucene search over their documents without any taxonomy or thesaurus support. They soon found that simple keyword searches were too blunt an instrument to provide useful results – for example, searching for “Grant” at first brought up the page about the grants on offer, but as soon as more data sets were added, this frequently searched-for page became buried under references to Grantchester, Grantham, and so on.

Although they could manage relevancy to a certain extent at the data-set level and by selecting “top results” for specific searches, the search team realised that this would be a painstaking and rigid process. It would also not address the problem that many terms used by the subject matter expert authors were not the same as the terms general users were searching for. For example, general users would search for “Lincoln Cathedral” rather than “Cathedral Church of St Mary of Lincoln”. So there is still plenty of work for human taxonomists and thesaurus editors to do.
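To make the idea concrete, here is a minimal sketch of the kind of thesaurus-supported search this implies, with a small synonym mapping expanded into the query sent to SOLR. The core name, field weights, and thesaurus entries are invented; in a production setup the same effect would more typically be achieved with a synonyms file wired into the SOLR analysis chain rather than client-side expansion.

```python
import requests

SOLR_URL = "http://localhost:8983/solr/heritage/select"  # hypothetical core

# A tiny thesaurus mapping general users' terms to the experts' preferred terms.
THESAURUS = {
    "lincoln cathedral": ["Cathedral Church of St Mary of Lincoln"],
    "grants": ["funding", "grant aid"],
}

def expanded_query(user_query: str) -> str:
    """Build an OR query covering the user's term plus any known synonyms."""
    variants = [user_query] + THESAURUS.get(user_query.lower(), [])
    return " OR ".join(f'"{v}"' for v in variants)

def search(user_query: str) -> dict:
    params = {"q": expanded_query(user_query), "defType": "edismax", "qf": "title^4 body"}
    return requests.get(SOLR_URL, params=params).json()

results = search("Lincoln Cathedral")
```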

Applied Taxonomy Frameworks: Your Mileage May Vary

Alan Flett of SmartLogic took us through the latest enhancements to their products, showcasing a new feature called “Fact Extraction”. This works by identifying the context around specific data and information, in order to drive Business Intelligence and Analytics. The tool is essentially a user-friendly, simplified algorithm builder that allows very specific searches to be constructed from pre-defined “building blocks”, such as “Facts”, “Entities”, and “Skips”. In other words, a specific piece of information, typed entities such as a number or a date, and words to ignore can be combined into a complex search query. This allows search results to be defined by context and returned in context, and is especially effective for well-structured data sets. It also means that results are framed in a standardised format, which is useful for analytics.
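SmartLogic's own implementation was only demonstrated, not dissected, so the following is purely an illustrative Python sketch of the building-block idea: an anchor phrase (the “fact” context), typed entities, and skippable filler words combined into a single extraction pattern. The pattern and example sentence are invented, and this is not SmartLogic's syntax.

```python
import re

# A toy "fact" pattern: an anchor phrase, an optional run of skip words,
# then typed entities (here a monetary amount and a year).
FACT_PATTERN = re.compile(
    r"grant of\s+"                  # anchor: the "Fact" context
    r"(?:\w+\s+){0,3}?"             # "Skips": up to three filler words
    r"£(?P<amount>[\d,]+)"          # "Entity": a number
    r".{0,40}?"
    r"(?P<year>\b(?:19|20)\d{2}\b)" # "Entity": a date (year)
)

def extract_facts(text: str):
    """Return structured (amount, year) facts found in free text."""
    return [
        {"amount": m.group("amount"), "year": m.group("year")}
        for m in FACT_PATTERN.finditer(text)
    ]

print(extract_facts("The trust awarded a grant of approximately £250,000 in March 2015."))
```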

Concluding Panel

Although techniques such as automated classification, machine learning, and AI are progressing all the time, they still work best when combined with a well-structured knowledge base. Creating that knowledge base relies on human intelligence, especially for the familiar problems of disambiguation and synonym collection, particularly where content authors have a different approach or level of domain expertise from the end users of the search systems. The panel agreed that both for the creation of thesauruses, taxonomies, and ontologies and for their deployment in tagging, semi-automated approaches remain necessary, so there is still much for human taxonomists, ontologists, and information architects to do in order to make knowledge organisation work.

Image: Lincoln Cathedral. Photo by Zaphad1

Aggregations and basic categories

Estimated reading time 2–3 minutes

I recently enjoyed reading about the work Safari are currently doing to create a controlled vocabulary and topic aggregation pages to underpin navigation and discovery of their content.

Iterate, again

I very much liked the mix of manual and automated techniques the team used to maximise the value captured from existing resources, while using machine processing to help and support the human editorial curation work. Lightweight iterative approaches have become standard in some areas of design, but achieving high-quality information structures also usually requires several stages of revision and refinement. It is not always possible to predict what will happen in attempts to index or repurpose existing content, nor how users will respond to different information structures, so the ability to iterate, correct, re-index, correct, adjust indexing methods, re-index, correct… is vital. Small samples of content are often not sufficient to reveal all the potential issues and challenges, so it is always worth being prepared for surprises once you scale up.

Basics, as always

The Safari team identified the huge intellectual value locked into the existing human-created indexes, and it is great to see them being able to extract some of that value and then augment it using automated techniques. I was very interested to read about how the level of granularity in the individual indexes was too fine for overall aggregation. The team realised that there were “missing subtopics” – key topics that tended to be the subjects of entire books. These “missing subtopics” were found at the level of book titles, and it struck me that this vital level of conceptualization aligns directly with Eleanor Rosch’s work on basic categories and prototype theory. It is not surprising that the concepts that are “basic categories” to the likely readership would be found at book-title level, rather than index level.

This is further illustrated by the fact that the very broad high level topics such as “business” did not work well either. These needed not to be “clustered up”, but broken down and refined to the level of the “basic categories” that people naturally think of first.

So, the Safari team’s work is a very clear illustration of not only how to combine manual and automated techniques but also how to find the “basic categories” that match users’ natural level of thinking about the subject area.


Semantic Search – Call for Papers for Special Issue on Semantic Search for Aslib Journal

Estimated reading time 4–6 minutes

This special issue aims to explore the possibilities and limitations of Semantic Search. We are particularly interested in papers that place carefully conducted studies within the wider framework of current Semantic Search research and the broader context of Linked Open Data.

Research into Semantic Search and its applications has been gaining momentum over the last few years, with an increasing number of studies on general principles, proofs of concept, and prototypical applications. The market for Semantic Search applications, its role within the general development of (internet) technologies, and its impact on different areas of private and public life have all attracted attention. At the same time, many publicly funded projects in the field of cultural heritage have been initiated. Researchers in many disciplines have been making progress in establishing both theories and methods for Semantic Search. However, there is still a lack of comparison across individual studies, as well as a need for standardisation regarding how Semantic Search is distinguished from other search solutions, agreed-upon definitions, and technologies and interfaces.

Semantic Search research is often based on large and rich data sets and a combination of techniques, ranging from statistical bag-of-words approaches and natural language processing enriched by subtle use of metadata, through classificatory approaches, right up to ontological reasoning. Over the last 10 years, many of the initial technical and conceptual obstacles in the field of Semantic Search have been overcome. After the initial euphoria for Semantic Search, which resulted in a technically driven supply of search solutions, an appraisal of the more and less successful approaches is needed. Amongst other things, the limitations of open-world solutions working on only apparently comprehensive Linked Open Data sets, compared with small domain-specific solutions, need to be determined.
One ongoing challenge for Semantic Search solutions is their usability and user acceptance, as only highly usable walk-up-and-use approaches stand a chance in the field of general search.

For this special issue, we invite articles which address the opportunities and challenges of Semantic Search from theoretical and practical, conceptual and empirical perspectives.

Topics of interest include but are not restricted to:

  • The history of semantic search – how the latest techniques and technologies have come out of developments over the last 5, 10, 20, 100, 2000… years
  • Technical approaches to semantic search: linguistic/NLP, probabilistic, artificial intelligence, conceptual/ontological …
  • Current trends in Semantic Search
  • Best practice – how far along the road from ‘early adopters’ to ‘mainstream users’ has semantic search gone so far?
  • Semantic Search and cultural heritage
  • Usability and user experience of Semantic Search
  • Visualisation and Semantic Search
  • Quality criteria for Semantic Search
  • The impact of norms and standards (such as ISO 25964 “Thesauri for information retrieval”) on the potential of Semantic Search
  • How are semantic technologies fostering a need for cross-industry collaboration and standardisation?
  • How are Semantic Search techniques and technologies being used in practice?
  • Practical problems in brokering consensus and agreement – defining concepts, terms and classes, etc.
  • Curation and management of ontologies
  • Differences between web-scale, enterprise scale, and collection-specific scale techniques
  • Evaluation of Semantic Search solutions
  • Comparison of data collection approaches
  • User behaviour and the evolution of norms and conventions
  • Information behaviour and information literacy
  • User surveys
  • Usage scenarios and case studies

Submissions

Papers should clearly connect their studies to the wider body of Semantic Search scholarship, and spell out the implications of their findings for future research. In general, only research-based submissions including case studies and best practice will be considered. Viewpoints, literature reviews or general reviews are generally not acceptable.

Papers should be 4,000 to 6,000 words in length (including references). Citations and references should be in our journal style.

Please see the author guidelines at http://www.emeraldinsight.com/products/journals/author_guidelines.htm?id=ap for more details and submission instructions.
Submissions to Aslib Proceedings are made using ScholarOne Manuscripts, the online submission and peer review system. Registration and access is available at http://mc.manuscriptcentral.com/ap.

Important Dates

Paper submission: 15.12.2013
Notice of review results: 15.02.2014
Revisions due: 31.03.2014
Publication: Aslib Proceedings, issue 5, 2014.

About the Journal

Aslib Proceedings (ISSN: 0001-253X) is a peer-reviewed high-quality journal covering international research and practice in library and information science, and information management. The journal is the major publication for ASLIB – the Association for Information Management in the United Kingdom – a membership association for people who manage information and knowledge in organisations and the information industry.
Information about the journal can be found at
http://www.emeraldinsight.com/products/journals/journals.htm?id=ap

Contact the guest editors

Prof. Dr. Ulrike Spree
– Hamburg University of Applied Sciences –
Faculty of Design, Media and Information
Department of Information
Finkenau 35
20081 Hamburg
Phone: +49/40/42875/3607
Email: ulrike.spree@haw-hamburg.de

Fran Alexander
Information Architect, BCA Research (2013- )
Taxonomy Manager, BBC Information and Archives (2009-13)
Email: fran@vocabcontrol.com
Twitter: @frangle

Libraries, Media, and the Semantic Web meetup at the BBC

Estimated reading time 3–4 minutes

In a bit of a blog cleanup, I discovered this post languishing unpublished. The event took place earlier this year, but the videos of the presentations are still well worth watching. It was an excellent session with short but highly informative talks by some of the smartest people currently working in the semantic web arena. The videos of the event are available on YouTube.

Historypin

Jon Voss of Historypin was a true “information altruist”, describing libraries as a “radical idea”. The concept that people should be able to get information for free at the point of access, paid for by general taxation, has huge political implications. (Many of our libraries were funded by Victorian philanthropists who realised that an educated workforce was a more productive workforce, something that appears to have been largely forgotten today.) Historypin is seeking to build a new library, based on personal collections of content and metadata – a “memory-sharing” project. Jon eloquently explained how the Semantic Web reflects the principles of the first librarians in that it seeks ways to encourage people to open up and share knowledge as widely as possible.

MIMAS

Adrian Stevenson of MIMAS described various projects including Archives Hub, an excellent project helping archives, and in particular small archives that don’t have much funding, to share content and catalogues.

rNews

Evan Sandhaus of the New York Times explained the IPTC’s rNews – a news markup standard that should help search engines and search analytics tools to index news content more effectively.

schema.org

Dan Brickley’s “compare and contrast” of Universal Decimal Classification with schema.org was wonderful, and he reminded technologists that it is very easy to forget that librarians and classification theorists were attempting to solve search problems long before the invention of computers. He showed an example of “search log analysis” from 1912 – queries sent to the Belgian international bibliographic service, an early “semantic question answering service”. The “search terms” were fascinating and not so very different from the sort of things you would expect people to be asking today. He also gave an excellent overview of Lonclass, the BBC Archive’s largest classification scheme, which is based on UDC.

BBC Olympics online

Silver Oliver described how BBC Future Media is pioneering semantic technologies and using the Olympic Games to showcase this work on a huge and fast-paced scale. By using semantic techniques, dynamic rich websites can be built and kept up to the minute, even once results start to pour in.

World Service audio archives

Yves Raimond talked about a BBC Research & Development project to automatically index the World Service audio archives. The World Service, having historically been a separate organisation from the core BBC, has not traditionally been part of the main BBC Archive, and most of its content has little or no useful metadata. Nevertheless, the content itself is highly valuable, so anything that can be done to preserve it and make it accessible is a benefit. The audio files were processed with speech-to-text software, and automated indexing was then applied to generate suggested tags. The accuracy rate is about 70%, so human help is needed to sort the good tags from the bad (and occasionally offensive!) ones, but this is still a lot easier than tagging everything from scratch.
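As a purely illustrative sketch of that semi-automated step (the tag names, programme identifiers, and 0.85 threshold are all invented, not the BBC R&D pipeline), machine-suggested tags might be triaged by confidence, with only the uncertain ones routed to a human reviewer:

```python
# Machine-suggested tags with confidence scores; low-confidence suggestions
# are queued for a person to accept or reject rather than published directly.
SUGGESTED_TAGS = [
    {"programme": "ws-1978-04-03", "tag": "Apollo programme", "confidence": 0.91},
    {"programme": "ws-1978-04-03", "tag": "Apollo Creed", "confidence": 0.42},
]

ACCEPT_THRESHOLD = 0.85  # assumed cut-off; in practice this would be tuned

def triage(suggestions):
    """Split suggestions into auto-accepted tags and a human review queue."""
    auto_accepted, needs_review = [], []
    for s in suggestions:
        (auto_accepted if s["confidence"] >= ACCEPT_THRESHOLD else needs_review).append(s)
    return auto_accepted, needs_review

accepted, review_queue = triage(SUGGESTED_TAGS)
```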

The Shape of Knowledge – review of ISKOUK event

Estimated reading time 1–2 minutes

On Tuesday I attended a very interesting event about information visualization and I have written a review for the ISKO UK blog.

I was particularly fascinated by the ideas suggested by Martin Dodge of mapping areas that are not “space” and what this means for the definition of a “map”. So, the idea of following the “path” of a device such as a phone through the electromagnetic spectrum brings a geographical metaphor into a non-tangible “world”. Conversely, is the software and code that devices such as robots use to navigate the world a new form of “map”? Previously, I have thought of code as “instructions” and “graphs” but have always thought of the “graph” as a representation of coded instructions, visualized for the benefit of humans, rather than the machines. However, now that machines are responding more directly to visual cues, perhaps the gap between their “maps” and our “maps” is vanishing.

SLA Conference in Chicago

Estimated reading time 3–5 minutes

Last month I had a wonderful time at the SLA (Special Libraries Association) conference in Chicago. I had never previously been to an SLA conference, even though there is a lively SLA Europe division. SLA is very keen to be seen as “not just for librarians” and the conference certainly spanned a vast range of information professions. The Taxonomy Division is thriving and there seem to be far more American than British taxonomists, which, although not surprising, was a pleasure as I don’t often find myself as one of a crowd! The conference has a plethora of receptions and social events, including the “legendary” IT division dance party.

There were well over 100 presentation sessions, as well as divisional meetings, panel discussions, and networking events that ranged from business breakfasts to tours of Chicago’s architectural sights. There was plenty of scope to avoid or embrace the wide range of issues and areas under discussion and I focused on taxonomies, Linked Data, image metadata, and then took a diversion into business research and propaganda.

I also thoroughly enjoyed the vendor demonstrations, especially the editorially curated, spam-free search engine Blekko, the legal information vendors FastCase and Law360, and the EOS library management systems.

My next posts will cover a few of the sessions I attended in more detail. Here’s the first:

Adding Value to Content through Linked Data

Joseph Busch of Taxonomy Strategies offered an overview of the world of Linked Data. The majority of Linked Data available in the “Linked Data Cloud” is US government data, with Life Sciences data in second place, which reflects the communities that are willing and able to make their data freely and publicly available. It is important to keep in mind the distinction between concept schemes – Dublin Core, FOAF, SKOS, which provide structures but no meanings – and semantic schemes – taxonomies, controlled vocabularies, ontologies, which provide meanings. Meanings are created through context and relationships, and many people assume that equivalence is simple and association is complex. However, establishing whether something is the “same” as something else is often far more difficult than simply asserting that two things are related to each other.
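As a small illustration of that distinction (my own sketch, not from the talk), the Python rdflib library can express both kinds of claim in SKOS: preferred and alternative labels and a loose “related” association are easy to assert, whereas an exactMatch to an external concept is the kind of equivalence that needs careful checking first. The vocabulary namespace below is invented.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")  # hypothetical vocabulary namespace
g = Graph()
g.bind("skos", SKOS)

cathedral = EX["lincoln-cathedral"]
g.add((cathedral, RDF.type, SKOS.Concept))
g.add((cathedral, SKOS.prefLabel, Literal("Cathedral Church of St Mary of Lincoln", lang="en")))
g.add((cathedral, SKOS.altLabel, Literal("Lincoln Cathedral", lang="en")))

# Association is the easy claim: the two concepts are related in some way.
g.add((cathedral, SKOS.related, EX["lincoln-castle"]))

# Equivalence is the hard claim: this concept is the *same* as an external one,
# so it should only be asserted after careful checking.
from rdflib import URIRef
g.add((cathedral, SKOS.exactMatch, URIRef("http://dbpedia.org/resource/Lincoln_Cathedral")))

print(g.serialize(format="turtle"))
```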

Many people also fail to use the full potential of their knowledge organization work. Vocabularies are tools that can be used to help solve problems by breaking down complex issues into key components, giving people ways of discussing ideas, and challenging perceptions.

The presentation by Joel Richard, web developer at the Smithsonian Libraries, focused on their botanic semantic project – digitizing and indexing Taxonomic Literature II. (I assume they have discussed taxonomies of taxonomy at some point!) This is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. The International Association for Plant Taxonomy (IAPT) granted the Smithsonian permission to release the work on the web under an open licence.

The books were scanned using OCR, which achieved 99.97% accuracy. That sounds impressive, but across tens of millions of characters it still amounts to roughly 5,000-12,000 errors – far too many for serious researchers. Errors in general text were less of a concern than errors in citations and other structured information, where, for example, mistaking an 8 for a 3 could be very misleading. After some cleanup work, the team identified terms such as names and dates that could be parsed and tagged, and selected sets of pre-existing identifiers and vocabularies. They are continuing to look for ontologies that may be suitable for their data set. Other issues to think about are software and storage: they are using Drupal rather than a triplestore, but are concerned about scalability, so are trying to avoid creating billions of triples to manage.

Joel also outlined some of the benefits of using Linked Data, gave some examples of successful projects, and provided links to further resources.

Building bridges: Linking diverse classification schemes as part of a technology change project

Estimated reading time less than 1 minute

My paper about my work on the linking and migration of legacy classification schemes, taxonomies, and controlled vocabularies has been published in Business Information Review.

Building, visualising and deploying taxonomies and ontologies; the reality – Content Intelligence Forum event

Estimated reading time 1–2 minutes

I have been trying to get to the Content Intelligence Forum meetups for some time, as they always seem to offer excellent speakers on key topics that don’t get the attention they deserve, so I was delighted to be able to attend Stephen D’Arcy’s talk on taxonomies and ontologies a little while ago.

Stephen has many years of experience designing semantic information systems for large organisations, ranging from health care providers, to banks, to media companies. His career illustrates the transferability and wide demand for information skills.

His 8-point checklist for a taxonomy project was extremely helpful – Define, Audit, Tools, Plan, Build, Deploy, Governance, Documentation – as were his tips for managing stakeholders, IT departments in particular. He warned against the pitfalls of not including taxonomy management early enough in search systems design, and the problems that you can be left with if you do not have a flexible and dynamic way of managing your taxonomy and ontology structures. He also included a lot of examples that illustrated the fun aspects of ontologies when used to create interesting pathways through entertainment content in particular.

The conversation after the talk was very engaging and I enjoyed finding out about common problems that information professionals face, including how best to define terms, how to encourage clear thinking, and how to communicate good research techniques.