VOCABCONTROL

Monthly Archives: July 2012

culture | KO | libraries and museums | semantic web

New York Public Library and metadata

31st July, 2012  Fran

I spent a wonderful afternoon at the New York Public Library on July 20th, thanks to Phil Sutton, reference librarian, who was kind enough to talk to me about his work and introduce me to several of his colleagues in the NYPL Labs, website, and local history teams.

As the Library holds such vast and diverse collections, it is not surprising that the metadata work of the Labs team is varied and wide-ranging. One project involves rationalising and mapping metadata across collections that use different standards, another involves creating metadata for content strategy and website navigation, while more experimental work includes using Linked Data techniques to open up and cross-reference data sets.

What’s on the Menu? is using crowdsourced help to transcribe the Library’s collection of restaurant menus. So far, volunteers have transcribed 998,899 dishes from 14,872 menus, and the team is investigating ways of linking the data to enable researchers to make interesting connections. At present the data is in a fairly raw form, but it is available through an API.

The Labs team are also working on the Library’s numerous directories, with an emphasis on helping genealogists, starting with census data from 1940 in the DirectMe project.

Previous projects have opened up collections of stereographs and maps, as well as content related to musical theatre, theatrical lighting, and the Shelley-Godwin archive.

directories | libraries | linked_data | metadata
Digital Asset Management | KO | search

Photo metadata conference

5th July, 2012  Fran

I was very grateful to Sarah Saunders of Electric Lane for inviting me to speak at the IPTC Photo Metadata Conference at the CEPIC Congress in May.

These are just a few of my personal highlights from a very full conference.

Image content for mobile devices

Dittmar Frohmann, Director of International Product at iStock and Getty Images, the keynote speaker of the day, covered a lot of ground, but I was struck by his recognition of the need for new business models for photo libraries. Like the book publishing and music industries before it, the photo industry is reeling from the shock of the transition to a digital world.

Professional photographers are finding it harder to manage rights and licensing of their images, as digital copies are now so cheap and easy to produce and distribute around the world, and at the same time images taken on ubiquitous mobile phones have become fashionable. “Citizen photographers”, including those taking out-of-focus, badly lit mobile phone photos, are producing huge numbers of images that often do not meet traditional professional standards. However, such images are seen as “authentic” and “intimate” and have become popular with consumers in an age of austerity where slick, aspirational hyper-reality and glamorous models (Photoshop handsome?) are increasingly failing to chime with ordinary people.

This means that “un-professional” images are actively being sought by advertising agencies. Photographic styles go in and out of fashion, but never before has it been so easy for “amateurs” to produce high resolution images. At the same time, image libraries find themselves faced with a deluge of digital files and have to manage these files to ensure they don’t inadvertently breach rights agreements, while trying to add value to their services.

For image libraries, rights management and search/retrieval have become the two hottest topics, as these are the key areas where economies of scale can offer improvements over “DIY” online sales and marketing. Libraries are effectively aggregators, and therefore service providers – gathering independent collections and individual photographers in one place can provide a one-stop shop for purchasers. If this is combined with fast and easy rights and re-use clearing services, along with distribution, then the libraries can still provide a useful and profitable service to both the producers of content (the photographers) and the consumers.

(I was surprised that very little was said about an editorial role for image collections – another area where value can be added is collection curation and branding, so that you know that the best place to get UK landscape shots is such-and-such a collection, etc. However, this is much harder to maintain, manage, and promote.)

Image metadata

I gave an overview of the history of metadata for knowledge organisation, with an emphasis on aspects that are peculiar to image libraries. For example, still images do not come with text attached, so natural language processing and concept extraction techniques that can drive document and text-based search systems can only be a second step for image libraries, once some text has been generated to associate with stills.

I was very pleased that a couple of the key themes that I introduced in my talk were picked up and elaborated on by other presenters.

Linked Data and crowdsourcing

Mary Forster from Getty Images went into detail about Linked Data and how Getty uses it to enhance its services and image management, by indexing images with Linked Data concept URIs. She explained the differences between text matching and concept linking: text matching is far noisier and less precise, while using concepts enables flexible management of metadata structures so that the creation of complex associations can be automated.
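
The difference between text matching and concept linking can be illustrated with a minimal sketch. This is not Getty's system: the concept URIs, captions, and function names below are all invented for illustration, and a real Linked Data setup would resolve URIs against a published vocabulary.

```python
# Hypothetical concept URIs (example.org is a placeholder domain).
ANIMAL = "http://example.org/concept/jaguar-animal"
CAR = "http://example.org/concept/jaguar-car"

# Invented sample records: each image has a free-text caption and a set of
# concept URIs assigned by an indexer.
IMAGES = [
    {"id": "img1", "caption": "A jaguar resting in the rainforest", "concepts": {ANIMAL}},
    {"id": "img2", "caption": "Vintage Jaguar parked outside a hotel", "concepts": {CAR}},
    {"id": "img3", "caption": "Panthera onca drinking at a river", "concepts": {ANIMAL}},
]


def text_match(query, images):
    """Return ids of images whose caption contains the query string."""
    q = query.lower()
    return [img["id"] for img in images if q in img["caption"].lower()]


def concept_match(uri, images):
    """Return ids of images indexed with the given concept URI."""
    return [img["id"] for img in images if uri in img["concepts"]]


# Text matching is noisy (it returns the car) and incomplete (it misses the
# image captioned with the Latin name).
print(text_match("jaguar", IMAGES))

# Concept matching returns exactly the images indexed with the animal concept.
print(concept_match(ANIMAL, IMAGES))
```

In this toy data the string search for "jaguar" picks up the car and misses the Latin-named animal, while the concept lookup finds both animal images and nothing else.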

Andrew Ellis from the My Paintings project with the Public Catalogue Foundation talked about how they had successfully managed crowdsourcing by putting in place a number of sophisticated ways of managing the capture of the metadata. For example, rather than only offering unconstrained free tagging, taggers were invited to select tags from a dictionary list, in order to disambiguate concepts. They were also invited to select from a number of pre-set facets driven by controlled vocabularies – image type, style, etc. This made it easy to integrate the free tagging within an existing navigational scheme.
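
As a rough sketch of the idea (not the project's actual implementation), constrained tagging can be modelled as a dictionary lookup that offers disambiguated senses, plus a check of facet values against a controlled vocabulary. The dictionary entries, facet values, and function names here are assumptions made up for the example; only the facet names "image type" and "style" come from the talk.

```python
# Hypothetical dictionary: a typed term maps to the disambiguated senses
# the tagger is asked to choose between.
DICTIONARY = {
    "bank": ["bank (river edge)", "bank (financial institution)"],
    "dog": ["dog (animal)"],
}

# Hypothetical controlled vocabularies for two pre-set facets.
FACETS = {
    "image type": {"portrait", "landscape", "still life"},
    "style": {"impressionist", "abstract", "realist"},
}


def suggest_senses(term):
    """Return the dictionary senses offered for a typed term (empty if unknown)."""
    return DICTIONARY.get(term.strip().lower(), [])


def is_valid_facet_value(facet, value):
    """Check a tagger's choice against the controlled vocabulary for a facet."""
    if facet not in FACETS:
        raise KeyError(f"unknown facet: {facet}")
    return value.strip().lower() in FACETS[facet]


print(suggest_senses("Bank"))
print(is_valid_facet_value("style", "Abstract"))
print(is_valid_facet_value("style", "blurry"))
```

Because every accepted tag is either a dictionary sense or a controlled facet value, the results can be slotted straight into an existing navigation scheme.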

Content-based image retrieval

Mathieu from Xerox then talked about content-based image retrieval. Xerox have been working on sophisticated image analysis techniques designed to find images that have similar qualities to other images. They have a series of algorithms that analyse image “texture” and create a “digital fingerprint” of an image. Other images with very similar fingerprints tend to look similar. This means that you can train the system with sets of example images, and it can then identify similar images in the collection. This can be used as an image autoclassification tool, as you can set up your training sets to be useful categories (famous landmarks, pop stars, tigers, etc.) and then sort your images into these categories. Xerox trained their system on 706 categories using 1.5 million images.
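
The general train-then-classify idea can be sketched as follows. This is emphatically not Xerox's algorithm: the "fingerprints" are tiny hand-made vectors (real ones would come from image analysis, which is omitted), and the nearest-centroid rule with cosine similarity is a simple stand-in technique chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(examples):
    """Build one centroid fingerprint per category from example fingerprints."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def classify(fingerprint, model):
    """Assign the category whose centroid is most similar to the fingerprint."""
    return max(model, key=lambda label: cosine(fingerprint, model[label]))


# Invented training sets: two categories, two example fingerprints each.
TRAINING = {
    "landmark": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "tiger": [[0.1, 0.9, 0.3], [0.0, 0.8, 0.4]],
}

model = train(TRAINING)
print(classify([0.85, 0.15, 0.05], model))
print(classify([0.1, 0.9, 0.2], model))
```

The quality of the result depends entirely on the training sets, which is why categories with clear example images work so much better than abstract ones.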

The system works very well with distinct and easily recognisable images – iconic images like the Sydney Opera House for example – and on large collections where there are clear and obvious “hits” and “misses”. It doesn’t work well with concepts such as politics or history, as it is hard to come up with key images for the training set, nor with moods – inspirational, happy, tranquil, etc. However, for large collections with no metadata, it offers a good way of adding structured metadata to make a collection navigable. Another interesting use is to identify duplicate images, and in the same way you could use it to assess the contents of a collection to find gaps (“we have hundreds of images of Tower Bridge, but none of the Golden Gate Bridge”, etc.).
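
Once images have been sorted into categories, finding gaps is a simple counting exercise. The sketch below assumes the classifier has already produced one label per image; the labels, the wanted list, and the function names are invented for illustration.

```python
from collections import Counter


def coverage_report(assigned_labels, wanted_categories):
    """Count how many images were assigned to each category we care about."""
    counts = Counter(assigned_labels)
    return {category: counts.get(category, 0) for category in wanted_categories}


def find_gaps(report, minimum=1):
    """Return the categories with fewer than `minimum` images, sorted by name."""
    return sorted(category for category, n in report.items() if n < minimum)


# Invented classifier output: one label per image in the collection.
labels = ["Tower Bridge", "Tower Bridge", "Tower Bridge", "Sydney Opera House"]
wanted = ["Tower Bridge", "Golden Gate Bridge", "Sydney Opera House"]

report = coverage_report(labels, wanted)
print(report)
print(find_gaps(report))
```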

Perhaps it even has a potential use for TV producers editing rushes on a shoot – “we already have hundreds of shots of the sunset over the mountains, but hardly any close-ups of skiers”, for example.

I guess one day there will be a market for “controlled imageries” – training sets of example images to use as a basis for such autoclassification software.

You can try it here.

Rights, IPO, orphan works

Nancy Wolff and Antoinette Graves of the IPO talked about rights and the law. Nancy stated that the need to be found is becoming more critical. Orphan works legislation advocates in the US want to de-risk usage so that images can be used even when it is not clear who they belong to, or when the owner is known but cannot be found.

Nancy noted that proposals for rights registries are being enthusiastically supported by Google, but also that whoever owns such registries will not only make a lot of money but will also control access to and usage of content.

Antoinette pointed out that in the UK at present there is no diligent search procedure that will allow for the use of an “orphan work”. This makes it very hard for publishers to be sure that they will not be prosecuted. There is a notable difference between “old” orphan works in museums, etc. and “new” orphans caused by metadata stripping.

Future of image search and rights management

In the afternoon I attended an interesting breakout session on the future of search, with a large and impressive panel. Rights management was cited as a huge issue to resolve, with a call for slick, seamless, user-friendly payment systems to enable people to buy images and re-use them legally, without friction and effort. Technology was seen as the answer to an essentially technology-created problem. Free distribution over the internet meant that people had a sense of entitlement – a sense that content ought to be free, confusing free content with freedom of information.

Managing digital rights is not the same as imposing “lockout” DRM systems. There is a need to devise licensing methods that are based on understanding machine-to-machine communication, rights description metadata, etc. No-one wants to invest in content creation any more, largely because the protection of rights is so difficult, making content creation a very risky business. If this trend is to be reversed, technological solutions to the problems of rights clearances must be found.

Predictions for the future were that crowdsourcing would become increasingly important. Interestingly, crowdsourcing relies on the notion of people working for nothing, and I couldn’t help noticing the contrast: professional photographers were trying to stop “amateurs” destroying their living by providing images without expecting payment, yet were perfectly happy for people to add metadata without being paid for their work.

The need to get money into the system somewhere in order to enable anyone to get paid was emphasised, and I suppose when an industry is facing diminishing returns, everybody involved in the supply chain puts pressure on everyone else to cut their costs or work for nothing.

I can’t help thinking that the deluge of images from all sources is going to mean that findability – and hence metadata – will become even more significant as more and more images chase fewer and fewer users willing to pay for them.

CBIR | images | metadata | photos | rights | tagging