This is the first of a series of summaries of the Henry Stewart DAM London conference on June 30, chaired by David Lipsey. The panels (one of which included me) were a pleasing mix of very practical information and more theoretical discussion.

Classic DAM vendor “overstatements”

Theresa Regli, who does a great job as a “professional sceptic”, stressed the need for a calm and considered approach to procurement, with testing being the most important stage. You wouldn’t buy a car without taking it for a test drive, yet people buy software without finding out whether it can handle their content. Nobody’s assets and business processes are exactly the same, and just because a system suited somebody else perfectly doesn’t mean it is right for you. Vendors will say that they can do anything, but that’s their job, so don’t take their word for it.

Don’t be distracted by the coolest of the cool new features or other bells and whistles. Cool costs – but may not make – money for your business. If the cool features don’t actually improve your specific business processes, they won’t benefit you, and vendors have become increasingly adept at marketing the same old features in new ways, so it is very important to dig beneath the surface to find out how they are doing what they claim. Surprisingly little has changed technologically in the DAM vendor landscape over the last five years, so a wonderful new system that “automatically indexes images” may in fact just be the familiar territory of analysing the textual metadata associated with them.

Speech to text

One area that has moved on is the technology to convert speech to text. This means that you can, to an extent, subtitle a film automatically (which isn’t quite the same as a system that can “watch a movie and understand what’s going on scene by scene”). It also gives you a chunk of textual metadata you can search and analyse: “understanding” what’s going on relies on sentiment analysis, looking up words in thesauruses, so if the dialogue mentions guns, shooting, and bullets a lot, for example, the software could suggest it is a gunfight scene. However, accuracy rates are patchy and the systems require training, which can be labour intensive, so you need to make sure the training costs and time required are included in budgets and schedules. The systems work best if you can get everything read by someone like Patrick Stewart, who has very clear and even enunciation; anyone with an unusual accent or who mumbles is far more difficult to process. As usual, the software is easiest to train if you are working within a specific context, so you can focus on relevant words and accents rather than anything anyone anywhere in the world might happen to say.
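
To make the “gunfight scene” idea concrete, here is a minimal sketch of that kind of keyword-based suggestion, assuming the speech-to-text step has already produced a plain transcript. The scene labels, keyword lists, and function names are my own illustrative choices, not anything demonstrated at the conference.

```python
# A sketch of keyword-based scene suggestion over a transcript.
# The scene labels and keyword lists below are illustrative only.
import re
from collections import Counter

SCENE_KEYWORDS = {
    "gunfight": {"gun", "guns", "shooting", "shot", "bullets"},
    "wedding": {"bride", "groom", "vows", "ceremony"},
}

def suggest_scene(transcript: str) -> str | None:
    """Return the scene label whose keywords appear most often, if any."""
    words = Counter(re.findall(r"[a-z']+", transcript.lower()))
    scores = {label: sum(words[w] for w in kws)
              for label, kws in SCENE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(suggest_scene("He grabbed the gun and started shooting, bullets everywhere"))
# -> gunfight
```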

A clever use of the technology comes from the car industry, which uses it to save time analysing focus group interviews. Interviewers were asked to “audio index” their interviews by saying a key “trigger” word whenever somebody in the focus group said something interesting. The technology was set to clip out a section of video a few seconds before and after the trigger word, so the interviewers could then automatically generate “edited” versions of the interviews, saving a lot of time. I can see this being a great tool for anyone processing ethnographic data or conducting UX or similar testing based on interviews.
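
As a rough illustration of the trigger-word idea, here is a sketch that turns the timestamps at which a speech-to-text engine spotted the trigger word into clip ranges a few seconds either side, merging any clips that overlap. The padding values and the input format are assumptions for the example, not details from the talk.

```python
# A sketch of the "audio index" clipping step: given the times (in seconds)
# at which the speech-to-text engine detected the trigger word, build clip
# ranges a few seconds either side and merge any that overlap.
# The padding values are assumptions, not figures from the talk.

def build_clips(trigger_times, before=10.0, after=20.0):
    ranges = [(max(0.0, t - before), t + after) for t in sorted(trigger_times)]
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:  # overlaps the previous clip
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# e.g. the interviewer said the trigger word at 65s, 70s and 300s
print(build_clips([65.0, 70.0, 300.0]))
# -> [(55.0, 90.0), (290.0, 320.0)]
```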

Zooming in on the detail

Another feature Theresa demonstrated was a high-definition zooming tool that lets you see very fine detail in your digital images – lovely for museums and art galleries, but costly in terms of storage space and bandwidth. I could see it working well as an in-gallery interactive guide to certain collections; it wouldn’t be so good if you were trying to access it externally over a dodgy wifi or bandwidth-limited connection.

(The British Library’s Magnificent Maps exhibition – which I saw on a London IA visit – had an interesting interactive zoom feature that worked entirely differently but was very popular. It used a “magnifying glass” – actually a device with LED transmitters that send an infrared signal to a webcam – to trigger a zoom response through a special display interface.)

Procurement process tips

The other panel members talked through various DAM system procurement processes, from a huge global project for Cambridge University Press that began with a list of 452 vendors, through to a very detailed process for adidas with a smaller initial list but a large number of criteria to be fulfilled. It was pleasing that the panel agreed that cultural fit can be as important as any technical specification. A state-of-the-art or very large vendor who just doesn’t get your world is very unlikely to provide you with a good solution, whereas a mid-range vendor who really understands your particular context is much more likely to find or develop something that matches your business processes.

Although the use of personas (popular in the UX world) in procurement is quite unusual, Theresa suggested that user stories could be more effective than requirements spreadsheets. Vendors are likely to tick all the boxes in a spreadsheet without getting to grips with the business processes behind them. It is also hard to express complex interactions as sets of requirements, whereas telling a story can make clear what the system as a whole should provide, e.g. Sue has to research images for marketing campaigns, make sure that editors based in offices around the globe can see and approve them, ensure that designers are able to access them remotely, and then have them output in a variety of formats for publication both in print and online.

It is also worth checking any arrangements with outsourced suppliers: sometimes vendors will provide case studies of a successful implementation but fail to mention that they have never worked with your supplier before.

I noted the emphasis panellists placed on making sure taxonomies and vocabularies are user-friendly and effective in order to get the best out of any DAM system.

Manage your metadata

Sarah Saunders of Electric Lane discussed the importance of controlled vocabularies and managed metadata for image search and management. Speech-to-text software can’t help with stills collections, or when part of your collection is video without accompanying audio (e.g. a rushes collection – the “spare” footage that wasn’t used in a broadcast and which often has no associated dialogue or voiceover script). She described advances in visual sorting software that use a combination of textual metadata and content-based image retrieval (CBIR) to refine search results.

Although CBIR is still in its infancy, it can be very helpful when run over a small image set pre-selected by text searching. CBIR can identify basic features such as the colour used most in an image, which is not much help if you run it over a large image collection with no other metadata (“give me all the mainly red pictures” will bring up everything from fire engines to strawberries – fun if all you want is inspiration, not so good if you have something more specific in mind). However, if you have a set of images of the Eiffel Tower, for example, it could distinguish between close-ups and shots with lots of blue sky. If you like the blue sky ones, you can click on one and ask for “more like this” to be offered other mainly blue-sky images.
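
For the curious, here is a very rough sketch of the two CBIR ideas mentioned above: a crude “mainly red” test and a histogram-based “more like this” ranking. It assumes Pillow and NumPy are available, and real systems use far richer features than a coarse colour histogram; the point is simply to show why the technique works best on a small, text-pre-filtered set.

```python
# A sketch of two CBIR ideas: a crude "mainly red" test and a histogram-based
# "more like this" ranking. Assumes Pillow and NumPy; real CBIR systems use
# much richer features than a coarse colour histogram.
import numpy as np
from PIL import Image

def _pixels(path):
    """Load an image as a small, fixed-size array of RGB values."""
    return np.asarray(Image.open(path).convert("RGB").resize((64, 64)), dtype=float)

def mostly_red(path, threshold=0.4):
    """Very rough test: does red dominate the average pixel?"""
    r, g, b = _pixels(path).reshape(-1, 3).mean(axis=0)
    return r / (r + g + b) > threshold

def colour_histogram(path, bins=4):
    """Coarse RGB histogram, normalised so images of any size are comparable."""
    hist, _ = np.histogramdd(_pixels(path).reshape(-1, 3),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.flatten() / hist.sum()

def more_like_this(query_path, candidate_paths):
    """Rank a small, text-pre-filtered candidate set by histogram similarity."""
    query = colour_histogram(query_path)
    return sorted(candidate_paths,
                  key=lambda p: np.abs(colour_histogram(p) - query).sum())
```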

The second panel will be the subject of my next post.