Category Archives: search

ISKO UK | Google Ups its Stakes


ISKO UK’s KOnnect blog notes that at least Google is taking metadata seriously.

Chatting about Wolfram Alpha the other day, someone pointed out to me that specialist knowledge for a general audience is actually a very niche area, and that this is the source of the hype: you need to persuade your VC funders you are revolutionary when, in fact, you have a very tricky business model. Serious researchers will already be using specialised systems, and most people want to look up things like train times rather than the atomic weights of elements, so your market is people like students and journalists, who have an intermediate level of interest. Perhaps there are enough of them in the world to generate plenty of advertising revenue, but it seems like a tough call.

I hope the funders are happy with the old reference publishing model – lots of investment up front, in the hope not that the finished product will generate huge initial profits, but that it will have a long, steady life. Wolfram Alpha employed 150 people in essentially traditional content-creation roles, and it will be interesting to see how they make their money back. Google doesn’t have to pay for its own content or metadata creation!

BBC NEWS | Wolfram Alpha ‘as important as Google’


BBC NEWS | Technology | Web tool ‘as important as Google’. Here’s a new search tool that will – apparently – be “like interacting with an expert, it will understand what you’re talking about, do the computation, and then present you with the results”. Dr Wolfram says: “Wolfram Alpha is like plugging into a vast electronic brain…It computes answers – it doesn’t merely look them up in a big database.”

It is clearly a very sophisticated search engine – I imagine it has a bit of natural language processing with some “mashup” algorithms – and all such developments are very exciting. I am sure it will be very, very useful in relevant contexts and will have lots of very productive applications. It just seems to me to be ironic that the experts who devote themselves to promoting knowledge and understanding are so bad at picking words to describe in a sensible way what they have achieved. Is it marketing departments gone mad? Are they all misquoted by mischievous journalists? I hope if I spoke about this to Dr Wolfram he would understand what I’m talking about…

UPDATE: There’s a New Scientist preview of Wolfram Alpha, which explains a bit more about how it works. As far as I can work out, they have built a big database and are promoting its “authoritativeness” – so back to the “quality information has been mediated by experts” model.

ANOTHER UPDATE: First impressions from BBC technology: Wolfram Alpha first impressions.

Karen Blakeman’s Blog » Blog Archive » Wolfram Alpha is out – hmmm…

Human-Machine Symbiosis for Data Interpretation


I went to the ISKO event on Thursday. The speaker, Dave Snowden of Cognitive Edge was very entertaining. He has already blogged about the lecture himself.

He pointed out that humans are great at pattern recognition (“intuition is compressed experience”) and great satisficers (computers are great at optimising), and that humans never read or remember the same word in quite the same way (has anyone told Autonomy this?). I suppose this is the accretion of personal context and experience affecting your own understanding of the word. I remember as a child forming very strong associations with names of people I liked or disliked – if I disliked the person, I thought the name itself was horrible. This is clearly a dangerous process (and one I hope I have grown out of!), but it is presumably part of the way people end up with all sorts of irrational prejudices, and it also explains why “reclaiming” words like “queer” eventually works: if you keep imposing new contexts on a word, those contexts will come to dominate.

This bears on taxonomy work, as it explains the intensity people feel about how things should be named – and why they won’t all agree. It must also be connected to why language evolves (and how outdated taxonomies start to cause rather than solve problems – like Wittgenstein’s gods becoming devils).

Snowden also talked about the importance of recognising the weak signal, and has developed a research method based on analysing narratives, using a “light touch” categorisation (to preserve fuzzy boundaries) and allowing people to categorise their own stories. He then plots the points collected from the stories to show the “cultural landscape”. If this is done repeatedly, the “landscapes” can be compared to see if anything is changing. He stressed that his methodology required the selection of the right level of detail in the narratives collected, disintermediation (letting people speak in their own words and categorise in their own way within the constraints), and distributed cognition.
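The landscape comparison can be pictured with a toy sketch. This is purely my own illustration of the idea – binning self-assigned story positions into a coarse grid and comparing two collection rounds – and has nothing to do with Cognitive Edge’s actual software; the axes, data, and bin count are all invented.

```python
from collections import Counter

def landscape(points, bins=4):
    """Bin self-assigned (x, y) story positions (each in 0.0-1.0)
    into a coarse grid -- a crude 'cultural landscape'."""
    grid = Counter()
    for x, y in points:
        # min() keeps a value of exactly 1.0 in the top bin
        cell = (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))
        grid[cell] += 1
    return grid

def shift(before, after):
    """Per-cell change between two collection rounds -- positive
    where stories are accumulating, negative where they are draining."""
    cells = set(before) | set(after)
    return {c: after[c] - before[c] for c in cells}

round_1 = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9)]
round_2 = [(0.1, 0.2), (0.85, 0.9), (0.9, 0.85)]
print(shift(landscape(round_1), landscape(round_2)))
```

The point of the coarse grid is exactly the “light touch” Snowden describes: fuzzy boundaries survive because nothing forces a story into a single precise category.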

I particularly liked his point that when people self-index and self-title they tend to use words that don’t occur in the text, which is a serious problem for semantic analysis algorithms (although I would comment that third party human indexers/editors will use words not in the text too – “aboutness” is a big problem!). He was also very concerned that computer scientists are not taught to see computers as tools for supporting symbiosis with humans, but as black box systems that should operate autonomously. I completely agree – as is probably quite obvious from many of my previous blog posts – get the computers to do the heavy lifting to free up the humans to sort out the anomalies, make the intuitive leaps, and be creative.
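The gap between self-assigned labels and document vocabulary is easy to measure. Here is a minimal sketch of the idea – the tokenisation, sample story, and labels are my own invented illustration, not anyone’s actual system:

```python
import re

def vocabulary(text):
    """Lowercased word set of a document -- a bag-of-words view."""
    return set(re.findall(r"[a-z]+", text.lower()))

def missing_labels(text, labels):
    """Labels a person assigned that never occur in the text itself --
    exactly the terms a purely text-derived index cannot supply."""
    vocab = vocabulary(text)
    return [label for label in labels if label.lower() not in vocab]

story = "The helpdesk took three days to reset my password."
labels = ["bureaucracy", "frustration", "password"]
print(missing_labels(story, labels))  # ['bureaucracy', 'frustration']
```

Anything the function returns is invisible to an algorithm that only extracts terms from the text – which is the “aboutness” problem in miniature.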

UPDATE: Here’s an excellent post on this talk from Open Intelligence.

More on semantics


Here’s a straightforward mini-review of the state of semantic search from the Truevert search engine, dividing semantic search techniques into four groups, with no mention of digital essences or other mysticism.

Meaning – solved


Ever pondered what exactly something meant? Not sure if you’d missed some subtext subtlety or failed to grasp a nuance of technical term usage? Wonder no more. The merger of Autonomy and Interwoven will explain it all. According to Mike Lynch, founder and chief executive of Autonomy, in an interview published in Information World Review in March, the merger “will let Autonomy put its technology inside Interwoven products and make them capable of understanding meaning….meaning-based computing extracts the digital essence of information and understands the meaning of content and interactions.”

I really need to get my hands on the new product so it can explain to me what “the digital essence of information” means!

Now keyword search is dead…


I can’t help thinking the information world has become very morbid. There was Green Chameleon’s Dead KM Walking debate, CMS Watch’s Taxonomies are dead punt, and now keyword search is dead, according to the Enterprise Search Center (via Taxonomy Watch).

Stephen Arnold says “Established system vendors and newcomers promise silver bullets that will kill the werewolves plaguing enterprise search. Taxonomies resonate in some vendors’ marketing spiels. Others focus on natural language processing… ” This makes taxonomies sound like some newfangled techie trick, rather than the traditional sorting-out we’re all used to. He then states that users expect “a search system to … Offer a web page that gives users specific suggestions and options with hotlinks to topics, categories, and key subjects … provide the user with point-and-click options … Allow the user to drill down or jump across topics.” Are those not taxonomies for navigation?
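The behaviour Arnold describes – hotlinked categories, drill-down, jumping across topics – is exactly what a navigation taxonomy supplies. A minimal sketch (the topic tree here is invented for illustration):

```python
# A tiny navigation taxonomy: parent topic -> narrower child topics.
TAXONOMY = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics"],
    "Biology": ["Genetics"],
}

def drill_down(topic):
    """Narrower topics to offer as point-and-click options."""
    return TAXONOMY.get(topic, [])

def path_to(topic, root="Science"):
    """Breadcrumb trail from the root to a topic (depth-first search)."""
    if topic == root:
        return [root]
    for child in drill_down(root):
        trail = path_to(topic, child)
        if trail:
            return [root] + trail
    return []

print(drill_down("Science"))  # ['Physics', 'Biology']
print(path_to("Genetics"))    # ['Science', 'Biology', 'Genetics']
```

Every hotlink, breadcrumb, and drill-down menu in the user experience Arnold sketches is just a traversal of a structure like this one.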

Truevert: What is semantic about semantic search?


Truevert: What is semantic about semantic search? is an easy introduction to the thinking behind the Truevert semantic search engine. I was heartened by the references to Wittgenstein and the attention Truevert have paid to the work of linguists and philosophers. So much commercial search seems to have been driven by computer scientists with little interest in philosophy – or, if they had any, they kept quiet about it (any counter-examples out there?)! Perhaps philosophers have not been so good at promoting themselves either. Perhaps the Chomskyan attempt to divide linguistics into “hard scientific” linguistics and “fuzzy” disciplines like sociolinguistics has not helped.

As a believer in interdisciplinary and collaborative approaches, I have always wondered why we seem to be so bad at building these bridges; information science has always struck me as a natural crossing point. Of course, there has been a lot of collaboration, but my impression is that academia has been rather better at this than the commercial world, with organisations like ISKO UK working hard to forge links. Herbert Roitblat at Truevert is obviously proud of their philosophical and linguistic awareness and, more interestingly, thinks it worth broadcasting in a promotional blog post.

National Centre for Text Mining


The National Centre for Text Mining is “the first publicly-funded text mining centre in the world”. It is an initiative of Manchester and Liverpool universities, working with the University of California at Berkeley and the University of Tokyo. They appear to be working mainly on biology texts at the moment, but I enjoyed the explanations of their techniques and processes, despite their technicality. There are links to events and seminars aimed at the scientific community, but some would probably interest more general semantic web enthusiasts too.