Tag Archives: technology

On being the only girl in the room

Estimated reading time 3–5 minutes

Perhaps it is because I am settling into a new culture, or perhaps it is because my new time zone has altered the nature of what I see in my Twitter feed, but there seems to have been a spate of articles lately about sexism faced by women working in technology, which makes me very sad. This was on my mind when I received from a former colleague a copy of a report we had co-authored. As I read the list of names, I was struck by how wonderful a group of guys they were, how intelligent, creative, and technically knowledgeable, and what a pleasure it had been to be the only girl in the room. Those guys were utterly supportive, thoughtful, generous of spirit, and full of interest in and encouragement of my contributions.

I am from an editorial background and I don’t really write code, but never once in that group did I experience any kind of tech snobbery. Whenever there was something that I didn’t know about, or unfamiliar acronyms or jargon, someone would provide a clear explanation, without ever being patronising, appearing bored or impatient, or making any assumptions about what anyone “ought” to know. I was never made to feel I had asked a stupid question, said something foolish, or that I did not belong. At the same time, these men were always keen and interested to hear my perspectives, and to learn from my experiences. The group dynamic was one of free and open exchange of ideas and of working collaboratively to find solutions to problems. All contributions were valued and everything was considered jointly and equally authored.

I didn’t remain the only girl in the room. I was learning so much in the meetings that I invited my (now former) colleague to join us, bringing a new set of expertise and skills that were welcomed. I had not a moment of concern about inviting a younger and even less technical female colleague into the group, because I knew she would be made welcome and would have a fantastic opportunity to learn from some brilliant minds.

Of course I have encountered much sexism in my career, but it is not necessary and it is not inevitable. I hate the thought of young women being put off technology as a career because of fears of sexism and discrimination. I know this happens a lot – it happened to me, although I found my way into tech eventually. I do not know whether there is “more” sexism in technology – a charge some of the posts I have read have levelled – than there is anywhere else, but I do know that there is sexism in all industries, so you might as well ignore it as a factor and choose a career based on aspects like intellectual stimulation or good career prospects. Technology certainly offers those. I personally have encountered sexism in so-called “female friendly” industries such as publishing and teaching, and I am quite sure it is suffered by nurses, waitresses, actresses, pop singers…. Since I have been working in technology, I have often been the only girl in the room but almost always that room has been a fascinating, welcoming, and inspiring place to be.

This is not primarily written for the specific individuals nor for all the other fantastic guys in tech I have met or worked with (there are so many I can’t possibly name them all), although I hope they enjoy it. This post is intended to promote positive male role models and examples of decent male behaviour for boys and young men to follow, and as a mythbuster for anyone who thinks sexism and geekiness are somehow intrinsically linked.

It is also written for women, as a reminder that although we must speak out against sexist and otherwise toxic behaviour when we encounter it, approval and affirmation are very powerful motivators of change, so we also help by shouting about and celebrating when we find fabulous guys in tech and in life who are getting it right.

Google is not perfect

Estimated reading time 6–10 minutes

Perhaps I am starting to suffer from “déformation professionnelle”, but I am constantly surprised by how often I am still asked “Why do we need classification now we have free text search and Google?”. This post is designed to answer the question. If you are an info pro, it won’t tell you anything you don’t already know, but as always I’d appreciate suggestions and additions.

The question seems to me a bit like asking “Why do we need scalpels now we have invented scissors?”. Scissors are a brilliant invention and they do many wonderful things – just like Google – they make all sorts of cutting quick and easy, but there are also many situations when they are not the right tool for the job. I don’t want a surgeon cutting me open with scissors except in a real emergency.

Google is excellent when searching text for something specific and known – pdf of a tube map of London, “Ode to Autumn by John Keats”; documents that contain the phrase “small furry creatures from alpha centauri”. However, you may get poor results if you don’t spell all the words correctly (or they have not been spelled correctly in your source material) or you get the form of the words wrong (“The Tales of the Arabian Nights”; “The Tales of the Arabian Knights”; “1001 Arabian Nights”; “A Thousand and One Arabian Nights”; etc.). So in order to get good results, you already need to know quite a lot about what you are looking for.

Of course most people chuck in the first couple of words that occur to them and hope for the best. This works fine if you have plenty of time to wade through lots of irrelevant results, think up lots of alternative words if the first ones you tried didn’t work, are prepared to chase around to get to where you are trying to go (sometimes misspellings are linked to correct spellings), and are not particularly fussy about the source (if you just want a rough idea of what the main exports of Ecuador are to settle a pub bet, rather than the most up-to-date analysis to help you to decide whether or not to invest a large sum in a trading company). The sheer volume of information in Google means that almost every search throws up far more results than the casual searcher will need. They may not be the best results, but they’ll usually do.

It gets messier when the words you are searching on refer to a number of different things (do you mean Titanic the ship, the film, the song, etc.; “budget” and “Spain” as in the Spanish economy, not budget holidays in Spain). This sort of search can produce thousands, if not millions, of irrelevant results, so classification that can provide disambiguation – sorting Spanish holiday pages from Spanish economy pages – has real value in terms of saved time. This is why enterprise search solutions – where employees’ wasted time is an expense to the company – offer classification as a fundamental aspect of the service. It is also why dictionaries and encyclopedias make clear the difference between Mercury the metal, the Roman god, the planet, etc., and between depression in economics, meteorology, geography, psychiatry, etc., and why Wikipedia’s disambiguation pages are so useful.

Imperfect prior knowledge
Google is not very helpful when you don’t know the exact title or an exact phrase in a document (was it Birmingham City Council’s guide to recycling, Birmingham Council guide to waste and recycling, West Midlands waste management policy…?) and practically no help at all when you only have circumstantial information relating to a subject area (what’s that story where they are captured by aliens and only get let out when they build a cage and catch a little animal in it to prove they are intelligent too? are there any laws about importing pet parrots from France? what was that sad music I heard on the radio last night?).

It is a laborious process of elimination to try different sets of search terms in Google, but a classification narrows the scope of your search, making it more likely you will find what you need (short stories>science fiction immediately means you are not searching the whole of literature; a set of documents under the heading EU>laws>animals>pets means you don’t have to wade through all EU agricultural law; radio>date of broadcast>soundtracks means you are not trawling through all the recorded music available on the Internet).
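For readers who like to see the idea in code, here is a minimal Python sketch of how a classification path narrows a search before any text matching happens. Everything in it is an assumption for illustration: the `Document` record, the three sample documents, and the function names are invented, and a real system would do far more than substring matching.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    path: tuple  # hypothetical classification path, broadest term first
    text: str


# Invented sample records, for illustration only.
DOCS = [
    Document("Pet travel rules", ("EU", "laws", "animals", "pets"),
             "Rules on importing parrots and other pet birds."),
    Document("Dairy quotas", ("EU", "laws", "agriculture"),
             "Quotas for milk production, with a passing footnote about parrots on farms."),
    Document("Parrot care tips", ("hobbies", "pets"),
             "How to keep parrots happy and healthy."),
]


def free_text_search(docs, term):
    """Return every document whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [d for d in docs if term in d.text.lower()]


def classified_search(docs, path, term):
    """Keep only documents filed under the given path, then search their text."""
    path = tuple(path)
    scoped = [d for d in docs if d.path[:len(path)] == path]
    return free_text_search(scoped, term)
```

With these sample records, `free_text_search(DOCS, "parrots")` matches all three documents, while `classified_search(DOCS, ("EU", "laws", "animals", "pets"), "parrots")` returns only the pet travel rules.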

If you are researching an unfamiliar topic you probably don’t know the sort of words that are likely to have been used, so classifications are invaluable in showing you what other things are related to that topic, whether or not they use the only words or phrases you have previously encountered. Educational products have always used classification to aid knowledge discovery.

The words contained within the text may not give a full sense of what that text is about. If you are looking for a poem to read at a wedding, the best poems may never use the word “wedding” or “marriage” or even “love”. You’d be more likely to find a suitable poem using a classification poems>weddings. Synonym and thesaurus functions offer associated results as well as direct searching. Ontologies cluster vocabularies and taxonomies to create concept-based classifications.
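The wedding poem case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poems, their subject labels, and the tiny thesaurus are all invented, and the functions are hypothetical helpers rather than any real search product's API.

```python
# Invented sample poems with subject labels assigned by a (hypothetical) indexer.
POEMS = [
    {"title": "Poem A", "subjects": {"weddings"},
     "text": "Two roads joined beneath the morning sun."},
    {"title": "Poem B", "subjects": {"funerals"},
     "text": "The wedding guests had long since gone."},
    {"title": "Poem C", "subjects": {"weddings"},
     "text": "Our nuptials were held in spring."},
]

# A toy thesaurus mapping a term to its synonyms (assumed, not exhaustive).
THESAURUS = {"wedding": {"marriage", "nuptials"}}


def expand(term):
    """Return the term together with any synonyms the thesaurus lists for it."""
    return {term} | THESAURUS.get(term, set())


def keyword_search(poems, term):
    """Match poems whose text contains the term or one of its synonyms."""
    terms = expand(term)
    return [p["title"] for p in poems
            if any(t in p["text"].lower() for t in terms)]


def subject_search(poems, subject):
    """Match poems by their classification label, whatever words they use."""
    return [p["title"] for p in poems if subject in p["subjects"]]
```

Here the keyword search for "wedding" finds Poem B (a funeral poem that happens to use the word) and, thanks to the synonym, Poem C, but misses Poem A. The subject search for "weddings" finds Poems A and C, including the one that never mentions weddings at all.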

Free text search on its own cannot provide the richness of suggestions that a classified system can offer. As far as I know, Google relies on source material to provide useful synonyms. (Incidentally I’ve found it remarkably tricky to find good references to how Google works via searching on Google…)

Complex queries
Google is also not helpful at answering complex queries (what is the fourth largest city in the EU by population? how many countries have majority Muslim populations?) that require combinations of sources. This is a gap spotted by “answer engines” such as True Knowledge and Wolfram Alpha, but both their systems depend on highly crafted classifications (taxonomies and ontologies). Google Squared is Google’s own version.

Google is not a management system. Because of the vagaries described above, you can’t use Google to tell you how many documents you hold about a particular subject, or which document is the most authoritative or up to date, unless you have been very careful to add consistent metadata to each one. Even then, Google might miss the most up-to-date document because its PageRank is mainly based on popularity, and popularity takes time to cultivate, especially in niche areas. This is why digital asset management systems have metadata functions that provide controlled and filtered searching.
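As a rough illustration of what consistent metadata makes possible, the Python sketch below counts holdings on a subject and picks the most recently updated one. The field names (`subject`, `updated`), the sample records, and the functions are all assumptions made up for this example, not a description of any particular asset management system.

```python
from datetime import date

# Invented asset records with consistently applied (hypothetical) metadata fields.
ASSETS = [
    {"title": "Recycling guide 2008", "subject": "recycling",
     "updated": date(2008, 3, 1)},
    {"title": "Recycling guide 2009", "subject": "recycling",
     "updated": date(2009, 6, 15)},
    {"title": "Parking policy", "subject": "transport",
     "updated": date(2009, 1, 10)},
]


def count_by_subject(assets, subject):
    """Count how many assets carry the given subject label."""
    return sum(1 for a in assets if a["subject"] == subject)


def latest(assets, subject):
    """Return the most recently updated asset on a subject, or None if there is none."""
    matches = [a for a in assets if a["subject"] == subject]
    return max(matches, key=lambda a: a["updated"], default=None)
```

Both answers depend entirely on the metadata being filled in consistently: with these records, `count_by_subject(ASSETS, "recycling")` is 2 and `latest(ASSETS, "recycling")` is the 2009 guide, regardless of which document is more popular.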

Sound and vision
Google is still a bit patchy in still image, video, and audio search. Technologies are improving all the time, but we still have to be patient. Most still rely on text attached to images or captured from audio tracks, so all the problems already mentioned with free text searching apply. Companies such as imense are using an interesting range of options in generating keywords to tag images, but still use taxonomies for specialist terminology.

In short, Google is great when you know what you are looking for, when it’s not that important, and when you have plenty of time. In other words, for casual leisure searching. For any search that requires discovery and exploration, certainty, completeness, and precision, and when you want the right results quickly, you need classification.

The future of classification will be one of increasing automation, but that means the indexer or cataloguer’s job becomes more sophisticated and complex. Indexers of the future will be constructing rules for ontology and taxonomy building, training systems for specialised domains, and investigating errors in the automated systems. This may mark a change in the nature of traditional jobs, but it certainly does not mean the end of classification. Taxonomies have been around for millennia; they aren’t likely to disappear overnight.

The very fact that Google engineers are busily working on content analysis, language processing, and other new methods in order to increase the amount of classification Google can apply to its results (e.g. How can we improve our understanding of low level representations of images that goes beyond bag of words modeling?) shows that even the master of the free text search recognises more can be done.

Taxonomies and the semantic web

< 1 minute

The Taxonomy Tango discusses a multi-taxonomy management method and tool invented by Mobile Content Networks. “People are just getting comfortable with their own taxonomies and now they are realizing the world is full of taxonomies”, MCN CTO Phyllis Reuther is quoted as saying.

“MCN Query Broker and Taxonomy Engine enables MobileSearch.net to make real-time queries to any number of relevant content sources, return results from those sources, and then group, sort, and rank results according to advanced algorithms and partner rules”, according to the MCN website.

It would be interesting to know more about the rules that power this. MCN provide some pretty diagrams, which say things like “it classifies” and “it sorts”, but show only question marks where the “magic” happens!