Keeping your Taxonomy Fresh and Relevant – SLA Chicago

Matt Johnson from EMC gave an extremely clear and useful presentation, and I gave an overview of the taxonomy migration and revision project I have been working on for the past couple of years.

Matt and I were delighted to have such a big and lively audience for our session, especially as it was at 8 am! Thank you to everyone who joined us, to SLA’s Taxonomy division for organizing the session, to the session sponsor Gale Cengage Learning, and to Larry Lempert for moderating.

Skeptical Knowledge-Seeking: Business Research in the Age of ‘Truthiness’ – SLA Chicago

Although I don’t work in business research at the moment, subjectivity/objectivity is one of my pet topics, so I enjoyed hearing about how “truthiness” is being affected by online publishing and social media.

[“Truthiness” is a term invented by US comedian Stephen Colbert and used in his political satire to refer to politicians who seek to persuade us that something must be true because it “feels right” rather than because of the weight of evidence or rational argument to support it.]

Beware the echo chamber

Cynthia Lesky of Threshold Information talked about the seductiveness of the “echo chamber” effect in persuading people to think that a report must be true because it is being circulated widely and cited repeatedly. The Internet has exacerbated this effect because automated online content aggregators will regurgitate content without any editorial control, so there is no differentiation between accurate and inaccurate reports. It is also very easy for PR “spin” and propaganda to be replicated via aggregators and social media sites very quickly and with little fact checking and scant opportunity for counter-arguments to be put forward.

She offered some very useful tips to avoid being duped. Firstly, the researcher should work out not what is important to them or what is the most significant point being made in an article, but what is most important to the client who has commissioned the research. This enables the researcher to target fact-checking efforts most effectively. So, for example, in a piece about the opening of a factory to sell a new product, depending on the nature of the clients’ business, some will care about the effects on the market for that product, some will care about the effect on property prices in the area near the factory, and others will care most about employment opportunities.

Understanding the ways statistics can be presented is also useful. Cynthia offered an example of a survey in which 20% of people felt that their age had been a problem for them in gaining promotion. The survey was reported in one publication as evidence of a terrible blight of ageism in the workplace, and by another publication as evidence that only a minority of older people felt that they had been affected by age discrimination while 15% of respondents saw their age as a positive advantage. Publications will do this to exploit “confirmation bias” amongst their readers. People enjoy reading something that confirms views that they already hold, so reflecting back to readers what they already believe is an easy way of pleasing an audience.

Informed intuition

The researcher should use their “informed intuition” as a “defence against spin and error”. Researchers should also not shy away from telling their clients about problems with the research, gaps, and areas where further work ought to be undertaken. By showing the client the difficulties inherent in the work, researchers do not make themselves look unprofessional; rather, they demonstrate to the client the value of their skills and why it is worth paying for trained and experienced researchers.

In other words, if you use your own sense of “truthiness” wisely and treat it carefully, it can work to your advantage rather than leading you up the garden path.

SLA Conference in Chicago

Last month I had a wonderful time at the SLA (Special Libraries Association) conference in Chicago. I had never previously been to an SLA conference, even though there is a lively SLA Europe division. SLA is very keen to be seen as “not just for librarians” and the conference certainly spanned a vast range of information professions. The Taxonomy Division is thriving and there seem to be far more American than British taxonomists, which, although not surprising, was a pleasure as I don’t often find myself as one of a crowd! The conference has a plethora of receptions and social events, including the “legendary” IT division dance party.

There were well over 100 presentation sessions, as well as divisional meetings, panel discussions, and networking events that ranged from business breakfasts to tours of Chicago’s architectural sights. There was plenty of scope to avoid or embrace the wide range of issues and areas under discussion and I focused on taxonomies, Linked Data, image metadata, and then took a diversion into business research and propaganda.

I also thoroughly enjoyed the vendor demonstrations, especially the editorially curated and spam-free search engine Blekko, the legal information vendors FastCase and Law360, and EOS library management systems.

My next posts will cover a few of the sessions I attended in more detail. Here’s the first:

Adding Value to Content through Linked Data

Joseph Busch of Taxonomy Strategies offered an overview of the world of Linked Data. The majority of Linked Data available in the “Linked Data Cloud” is US government data, with Life Sciences data in second place, which reflects the communities that are willing and able to make their data freely and publicly available. It is important to keep in mind the distinction between concept schemes – Dublin Core, FOAF, SKOS, which provide structures but no meanings – and semantic schemes – taxonomies, controlled vocabularies, ontologies, which provide meanings. Meanings are created through context and relationships, and many people assume that equivalence is simple and association is complex. However, establishing whether something is the “same” as something else is often far more difficult than simply asserting that two things are related to each other.
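One way to see why equivalence is the weightier claim is to look at how SKOS treats the two kinds of link. The sketch below is my own illustration, not something shown in the session: the concept identifiers are hypothetical, and only the property names `skos:exactMatch` and `skos:related` come from SKOS itself. SKOS defines `skos:exactMatch` as symmetric and transitive, so every equivalence you assert chains onward to further concepts, whereas `skos:related` is symmetric but not transitive, so an association commits you to nothing beyond the pair you named.

```python
# Minimal sketch: hypothetical concepts, real SKOS property names.
EXACT_MATCH = "skos:exactMatch"
RELATED = "skos:related"

# (subject, property, object) statements; the vocabularies are invented.
triples = {
    ("vocabA:Maize", EXACT_MATCH, "vocabB:Corn"),
    ("vocabB:Corn", EXACT_MATCH, "vocabC:Zea_mays"),
    ("vocabA:Maize", RELATED, "vocabA:Popcorn"),
    ("vocabA:Popcorn", RELATED, "vocabA:Cinema"),
}


def equivalents(concept, statements):
    """Follow skos:exactMatch in both directions until nothing new is found.

    exactMatch is symmetric and transitive, so the result is the whole
    chain of concepts that the assertions declare to be "the same".
    """
    seen = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subj, prop, obj in statements:
            if prop != EXACT_MATCH:
                continue
            for a, b in ((subj, obj), (obj, subj)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    seen.discard(concept)
    return seen


def related(concept, statements):
    """Return direct skos:related neighbours only.

    related is symmetric but not transitive, so no chaining is done.
    """
    found = set()
    for subj, prop, obj in statements:
        if prop == RELATED:
            if subj == concept:
                found.add(obj)
            if obj == concept:
                found.add(subj)
    return found


print(sorted(equivalents("vocabA:Maize", triples)))
print(sorted(related("vocabA:Maize", triples)))
```

Here the two equivalence statements force "Maize" to be treated as the same thing as "Zea_mays", which may or may not be what each vocabulary's editor intended, while the association with "Popcorn" does not drag "Cinema" along with it.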

Many people also fail to use the full potential of their knowledge organization work. Vocabularies are tools that can be used to help solve problems by breaking down complex issues into key components, giving people ways of discussing ideas, and challenging perceptions.

The presentation by Joel Richard, web developer at the Smithsonian Libraries, focused on their botanic semantic project – digitizing and indexing Taxonomic Literature II. (I assume they have discussed taxonomies of taxonomy at some point!) This is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. The International Association for Plant Taxonomy (IAPT) granted permission to the Smithsonian to release the work on the web under an open licence.

The books were scanned using OCR, which produced 99.97% accuracy. That sounds impressive, but it actually means 5,000-12,000 errors, which is far too many for serious researchers. Errors in general text were less of a concern than errors in citations and other structured information, where, for example, mistaking an 8 for a 3 could be very misleading. After some cleanup work, the team next identified terms such as names and dates that could be parsed and tagged, and selected sets of pre-existing identifiers and vocabularies. They are continuing to look for ontologies that may be suitable for their data set. Other issues to think about are software and storage. They are using Drupal rather than a triplestore, but are concerned about scalability, so are trying to avoid creating billions of triples to manage.
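As a rough check on the OCR arithmetic, here is a small sketch of how an accuracy percentage translates into an error count. It assumes accuracy is measured per character, which the talk did not specify, and the character totals it works with are back-calculated from the quoted figures rather than taken from the project.

```python
def expected_errors(total_chars: int, accuracy: float) -> int:
    """Estimate how many characters are wrong at a given accuracy rate."""
    return round(total_chars * (1.0 - accuracy))


def implied_chars(errors: int, accuracy: float) -> int:
    """Work backwards from an error count to the amount of text scanned."""
    return round(errors / (1.0 - accuracy))


ACCURACY = 0.9997  # the 99.97% figure quoted in the session

# Assumption: per-character accuracy. Under that assumption the quoted
# range of 5,000-12,000 errors corresponds to these text sizes.
low = implied_chars(5_000, ACCURACY)
high = implied_chars(12_000, ACCURACY)
print(f"Implied text size: about {low:,} to {high:,} characters")

# Going the other way: a 40-million-character text at 99.97% accuracy.
print(expected_errors(40_000_000, ACCURACY), "expected errors")
```

On these assumptions the quoted error range implies somewhere between roughly 17 and 40 million characters of text, which shows how an error rate of only 0.03% still adds up across a fifteen-volume work.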

Joel also outlined some of the benefits of using Linked Data, gave some examples of successful projects, and provided links to further resources.