This is another write-up from the Henry Stewart DAM London conference.

Identity and identification

Robin Wilson discussed the issue of content identifiers, which are vitally important for digital rights management yet tend to be overlooked. He argued that although people engage passionately in debates about titles and the language used in labels and classification systems, they overlook the need to achieve consensus on basic identification.

(I was quite surprised, as I have always thought that people would argue passionately about what something should be called and how using the wrong terminology affects usability, but that they would settle on machine-readable IDs quite happily. Perhaps it is the neutrality of such codes that makes the politics intractable. If you have invested huge amounts of money in a database that demands certain codes, you will argue that everyone else should use those codes, to save you the costs of translation or of acquiring a compatible system, and no appeals to usability, or brokerage via editorial policy, can be made. It simply becomes a matter of whoever shouts loudest spending the least money in the short term.)

Robin argued that the only way to create an efficient digital marketplace is to have a trusted authority oversee a system of digital identifiers that are tightly bound within the digital asset, so they cannot easily be stripped out, even when an asset is split, shared, or copied. The authority needs to be trusted by both consumers and creators/publishers for its political neutrality, stability, and so on.

(I could understand how this system would make it easier for people who are willing to pay for content to see what rights they need to buy and whom they should pay, but I couldn’t see how the system could help content owners identify plagiarism without an active search mechanism. Presumably a digital watermark would persist throughout copies of an asset, provided that it wasn’t being deliberately stripped, but if the user simply decided not to pay, I don’t see how the system would help identify rights breaches. In conversation, Robin mentioned Turnitin, whose plagiarism-detection service has become more lucrative than its original work on content analysis; it still requires, however, an active process instigated by the content owner to search for unauthorised use of their content. This is fine for the major publishers of the world, who can afford to pay for such services, but is less appealing to individuals, whether professional freelancers or amateur content creators, who would need a cheap and easy solution that would alert them to breaches of copyright without their having to spend time searching.)

The identifiers themselves need to be independent of any specific technology. At the moment, DAM systems are often proprietary, so identifiers and metadata cannot flow easily from one system to another. Some systems even strip away any metadata associated with a file on import or export.

Robin described five types of identifier currently being used or developed:

  • Uniform Resource Name (URN)
  • Handle System
  • Digital Object Identifier (DOI)
  • Persistent URL (PURL)
  • ARK (Archival Resource Key).

He outlined three essential qualities for identifiers – that they be unique, globally registered, and locally resolved.
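The “unique, globally registered, locally resolved” pattern is easiest to see in URN syntax (urn:&lt;NID&gt;:&lt;NSS&gt;, per RFC 8141): the namespace identifier is registered once, globally, while resolution of a name within that namespace is delegated to whatever local service the namespace operates. The sketch below illustrates the pattern; the registry contents and resolver URLs are illustrative assumptions, not real endpoints (except doi.org).

```python
def parse_urn(urn: str) -> tuple[str, str]:
    """Split a URN of the form urn:<NID>:<NSS> into (namespace, name)."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a valid URN: {urn!r}")
    # Namespace IDs are case-insensitive; the name itself is passed through.
    return parts[1].lower(), parts[2]

# Illustrative "global registry": namespace ID -> local resolver prefix.
# (Made-up endpoints for the sketch; "doi" is not a registered URN
# namespace, it is included only to show delegation to the Handle System.)
RESOLVERS = {
    "isbn": "https://example-isbn-resolver.org/lookup/",
    "doi": "https://doi.org/",
}

def resolve(urn: str) -> str:
    """Map a globally unique name to a locally resolved location."""
    nid, nss = parse_urn(urn)
    if nid not in RESOLVERS:
        raise LookupError(f"no resolver registered for namespace {nid!r}")
    return RESOLVERS[nid] + nss
```

The point of the split is that uniqueness is guaranteed once at registration time, while each community keeps control of how (and where) its own names resolve.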

So why don’t we share?

Robin argued that it is easier for DAM vendors to build “safe” systems that lock all content within an enterprise environment; only those with a public-service or archival remit tend to be collaborative and open. DAM vendors resist a federated approach online, preferring a one-to-one or directly intermediated transaction model. Federated identifier-management services exist, but vendors and customers don’t trust them. The problem is mainly social, not technological.

One of the problems is agreeing how to share the costs of such a service: infrastructure, registration and validation, governance and development of the system, administration, and outreach and marketing.

(Efforts to standardise may well benefit the big players more than the small players, so there is a strong argument for them bearing the initial costs and offering support for smaller players to join. Once enough people opt in, the system gains critical mass: joining becomes easier, and its costs become less of an unquantifiable risk – you can benefit from the experiences of others. The semantic web is currently attempting to acquire this “critical mass”. As marketers realise the potential of semantic web technology to make money, no doubt we will see an upsurge in interest. Facebook’s “like” button may well be heralding the advent of the ad-driven semantic web, which will probably drive uptake far faster than the worthy efforts of academics to improve the world by sharing research data!)