Identifiers for asset management

I met Rob Wilson of RCWS consulting at the DAM London conference last summer, where he talked about digital identifiers. He has had a long career working on information architecture for a range of organisations and industry standards bodies, including the ultimately doomed e-GMS project, an attempt to unify government metadata standards for sharing and interoperability.

Identifiers are a hot topic at the moment, and so I was pleased Rob was willing to talk to me and some of my colleagues about some of the problems and attempts at solving them in the wider industry. Rob described the need for stable and persistent IDs for asset management and that this is not a new problem but one that has been worked on by various groups for many years. He distinguished between ID management and metadata management, pointing out that metadata may change but the ID of an asset can be kept stable. In just the same way that managing metadata as a distinct element of content is important, so IDs need to be managed as distinct from other metadata.

Rob is an independent consultant with no commercial affiliation with the EIDR content asset identifier system, but suggested it is in many ways a robust model. The EIDR system embeds IDs within asset files, using a combination of steganography, strong encryption, watermarking, and “triangulation” of combined aspects to create a single ID. The idea is that the ID is so deeply ingrained and fragmented within the structure of the file that it cannot easily be removed and can be recovered from a small fraction of the file – for example, if someone takes a clip from a video, the “parent” ID is discoverable from the clip. Rob thought the EIDR approach was better than digital rights management (DRM) methods, which rely on trying to prevent distribution and on “locking” content after a certain amount of time or use – an approach that gives everyone who holds such content an incentive to break the DRM system. If an individual can still access their illegally held content despite the ID, they have little incentive to try to remove the ID.
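
The recoverability property can be illustrated with a toy sketch. This is emphatically not EIDR's actual technique (which involves steganography and watermarking rather than plain-text tags); the marker format, asset ID, and interval below are all invented. The point is only the structural idea: if the ID is repeated throughout the file, any reasonably sized fragment still contains a complete copy.

```python
# Toy illustration (not EIDR's real scheme): repeat a tagged copy of the
# asset ID throughout the payload, so any fragment longer than one
# interval still contains at least one full copy of the ID.

MARKER = b"\x00ID:"  # hypothetical delimiter marking an embedded ID

def embed_id(payload: bytes, asset_id: str, interval: int = 64) -> bytes:
    """Interleave a tagged copy of asset_id every `interval` payload bytes."""
    tag = MARKER + asset_id.encode() + b"\x00"
    out = bytearray()
    for i in range(0, len(payload), interval):
        out += tag
        out += payload[i:i + interval]
    return bytes(out)

def recover_id(fragment: bytes):
    """Recover the embedded ID from any fragment holding one complete tag."""
    start = fragment.find(MARKER)
    if start == -1:
        return None
    end = fragment.find(b"\x00", start + len(MARKER))
    if end == -1:
        return None  # tag was cut off at the fragment boundary
    return fragment[start + len(MARKER):end].decode()

video = b"x" * 1000                      # stand-in for an asset's bytes
tagged = embed_id(video, "10.5240/ABCD-1234")
clip = tagged[300:500]                   # someone takes a short "clip"
assert recover_id(clip) == "10.5240/ABCD-1234"
```

A real scheme would hide and encrypt the tag rather than storing it in clear text, which is exactly why Rob described the embedded ID as hard to strip out.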

The system does not try to prevent theft at source, but helps to prove when copyright has been breached, because the ID – the ownership – remains within the file. It is intended as a tool to deter systematic copyright theft by large organisations, by making legal cases easy to win. Large organisations typically have the funds, or insurance, to cover large claims; individuals do not, so pursuing them is unlikely to be cost-effective.

The EIDR system also has an ID resolver that manages rights and supply authentication (as well as ownership) so that when content is accessed, the system checks that the appropriate rights and licences have been obtained before delivery.
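
The check-before-delivery flow can be sketched in a few lines. The registry contents, licence model, and function names here are all invented for illustration; EIDR's real resolver and rights data are far richer than a lookup table.

```python
# Hypothetical sketch of a resolver that verifies rights before delivery.
# The registry below is a stand-in for a real rights/authentication service.

RIGHTS_REGISTRY = {
    "10.5240/ABCD-1234": {
        "owner": "Studio A",
        "licensees": {"acme-tv", "beta-vod"},
    },
}

def resolve_and_deliver(asset_id: str, requester: str) -> str:
    """Deliver content only if the requester holds a licence for the asset."""
    record = RIGHTS_REGISTRY.get(asset_id)
    if record is None:
        raise KeyError(f"unknown identifier: {asset_id}")
    if requester not in record["licensees"]:
        raise PermissionError(f"{requester} holds no licence for {asset_id}")
    return f"delivering {asset_id} to {requester}"  # stand-in for delivery

print(resolve_and_deliver("10.5240/ABCD-1234", "acme-tv"))
```

The key design point is that resolution and rights checking happen in one step, so there is no path to the content that bypasses the licence check.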

Rob also outlined the elements of information architecture that need to be considered for unified organisational information management – establish standards, develop models, devise policies, select tools, maintain governance, etc. He emphasised that IDs are not a “silver bullet” to solve all issues, but that if a few problematic key use cases are known, they can be investigated to see if robust federated ID management architecture would help.

Content Identifiers for Digital Rights Persistence

This is another write-up from the Henry Stewart DAM London conference.

Identity and identification

Robin Wilson discussed the issue of content identifiers, which are vitally important for digital rights management yet tend to be overlooked. He argued that although people become deeply engaged in debates about titles and the language used in labels and classification systems, they overlook the need to achieve consensus on basic identification.

(I was quite surprised, as I have always thought that people would argue passionately about what something should be called and how using the wrong terminology affects usability, but that they would settle on machine-readable IDs quite happily. Perhaps it is the neutrality of such codes that makes the politics intractable. If you have invested huge amounts of money in a database that demands certain codes, you will argue that those codes are used by everyone else to save you the costs of translation or acquiring a compatible system, and there are no appeals to usability, or brokerage via editorial policy, that can be made. It simply becomes a matter of whoever shouts the loudest gets to spend the least money in the short term.)

Robin argued that the only way to create an efficient digital marketplace is to have a trusted authority oversee a system of digital identifiers that are tightly bound within the digital asset, so they cannot easily be stripped out even when an asset is divided, split, shared, and copied. The authority needs to be trusted by consumers and creators/publishers in terms of political neutrality, stability, etc.

(I could understand how this system would make it easier for people who are willing to pay for content to see what rights they need to buy and who they should pay, but I couldn’t see how the system could help content owners identify plagiarism without an active search mechanism. Presumably a digital watermark would persist throughout copies of an asset, provided that it wasn’t being deliberately stripped, but if the user simply decided not to pay, I don’t see how the system would help identify rights breaches. Robin mentioned in conversation Turnitin’s plagiarism management, which has become more lucrative than their original work on content analysis, but it requires an active process instigated by the content owner to search for unauthorised use of their content. This is fine for the major publishers of the world, who can afford to pay for such services, but is less appealing to individuals, whether professional freelances or amateur content creators, who would need a cheap and easy solution that would alert them to breaches of copyright without their having to spend time searching.)

The identifiers themselves need to be independent of any specific technology. At the moment, DAM systems are often proprietary and therefore identifiers and metadata cannot easily flow from one system to another. Some systems even strip away any metadata associated with a file on import and export.

Robin described five types of identifier currently being used or developed:

  • Uniform Resource Name (URN)
  • Handle System
  • Digital Object Identifier (DOI)
  • Persistent URL (PURL)
  • Archival Resource Key (ARK).

He outlined three essential qualities for identifiers – that they be unique, globally registered, and locally resolved.
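
The "globally registered, locally resolved" split can be sketched as a two-step lookup, loosely modelled on the Handle System / DOI pattern: a global registry knows only which naming authority owns a prefix, and that authority's local resolver maps the suffix to a current location. The prefix, suffix, and URL below are invented examples.

```python
# Sketch of two-step resolution: global registry -> local resolver.
# A real global registry is a distributed service, not a dict; the local
# resolver is operated by the naming authority that owns the prefix.

GLOBAL_REGISTRY = {
    # prefix -> that authority's local resolver (modelled here as a dict)
    "10.5240": {"ABCD-1234": "https://assets.example.org/v/ABCD-1234"},
}

def resolve(identifier: str) -> str:
    """Resolve a prefix/suffix identifier, e.g. '10.5240/ABCD-1234'."""
    prefix, suffix = identifier.split("/", 1)
    local_resolver = GLOBAL_REGISTRY[prefix]   # step 1: global lookup
    return local_resolver[suffix]              # step 2: local lookup

assert resolve("10.5240/ABCD-1234") == "https://assets.example.org/v/ABCD-1234"
```

This split is what makes the three qualities compatible: global registration of the prefix guarantees uniqueness, while local resolution lets each authority move its content without touching the global registry.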

So why don’t we share?

Robin argued that it is easier for DAM vendors to build “safe” systems that lock all content within an enterprise environment; only those with a public service or archival remit tend to be collaborative and open. DAM vendors resist a federated approach online and prefer a one-to-one or directly intermediated transaction model. Federated identifier management services exist, but vendors and customers don’t trust them. The problem is mainly social, not technological.

One of the problems is agreeing to share the costs of services, such as infrastructure, registration and validation, governance and development of the system, administration, and outreach and marketing.

(Efforts to standardise may well benefit the big players more than the small players and so there is a strong argument for them bearing the initial costs and offering support for smaller players to join. Once enough people opt in, the system gains critical mass and it becomes both easier to join and costs of joining become less of an unquantifiable risk – you can benefit from the experiences of others. The semantic web is currently attempting to acquire this “critical mass”. As marketers realise the potential of semantic web technology to make money, no doubt we will see an upsurge in interest. Facebook’s “like” button may well be heralding the advent of the ad-driven semantic web, which will probably drive uptake far faster than the worthy efforts of academics to improve the world by sharing research data!)