I enjoyed this article in New Scientist about using statistical analysis on the Voynich manuscript to try to work out whether it is a meaningful but secret code or just gibberish.
Ultimately, I remain puzzled as to what the statistics actually tell us. They identify patterns, but meaning is more than simply patterns. However, the fact that certain sets of symbols in the Voynich text appear to cluster in sections with common illustrations suggests it is code. The counter-argument that you could deliberately fake such clustering by mechanical means is intriguing. Without far larger samples, and an understanding of random clusterings, I have no idea whether this sort of faking would produce the same patterns as natural language. I am sure clusters must appear all over the place, without bearing any meaning whatsoever.
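As a rough illustration of that last point, here is a minimal Python sketch that builds a purely random "text", cuts it into sections, and looks for the symbol that is most over-represented in any one section. It is not the analysis described in the article. The function names (`section_counts`, `strongest_cluster`), the 20-letter alphabet, the text length, the number of sections and the seed are all my own assumptions, chosen only to show that some apparent clustering turns up even when there is no meaning at all.

```python
import random
from collections import Counter


def section_counts(text, n_sections):
    """Split text into n_sections equal chunks and count the symbols in each."""
    size = len(text) // n_sections
    return [Counter(text[i * size:(i + 1) * size]) for i in range(n_sections)]


def strongest_cluster(text, n_sections):
    """Return (symbol, section index, ratio) for the most over-represented symbol.

    ratio = count in that section / average count per section for that symbol.
    """
    counts = section_counts(text, n_sections)
    size = len(text) // n_sections
    totals = Counter(text[:size * n_sections])  # ignore any leftover tail
    best = None
    for sym, total in totals.items():
        expected = total / n_sections
        for idx, section in enumerate(counts):
            ratio = section[sym] / expected
            if best is None or ratio > best[2]:
                best = (sym, idx, ratio)
    return best


# Hypothetical "manuscript": 2000 symbols drawn uniformly at random.
rng = random.Random(0)  # arbitrary seed, only so the run is repeatable
alphabet = "abcdefghijklmnopqrst"
text = "".join(rng.choice(alphabet) for _ in range(2000))

symbol, section, ratio = strongest_cluster(text, 10)
print(f"Symbol {symbol!r} appears {ratio:.2f} times as often as its average in section {section}")
```

The ratio can never fall below 1, and for random text of this size it will typically be well above 1, which is the point: a "cluster" on its own says very little. Judging whether the Voynich clusters are stronger than chance, or than a mechanical faking method would produce, would need a proper comparison against many such random and natural-language samples.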
I also thought it was interesting that one of the arguments in favour of gibberish was that there were no mistakes. It strikes me there could be many reasons for the lack of proofing and correction, and I would want to know more about the rate of correction in similar works before I could assess that argument. I know that standardisation of spelling came relatively late; presumably, before then, far more “mistakes” would have been tolerated.
Nevertheless, it is a fascinating mystery, and one that perhaps will be resolved not by analysis but by the chance discovery of the key (if it exists!). If it is gibberish, perhaps we will never know. Either way, I am sure it would have amused the author to know that their work would still be a controversial topic hundreds of years after it was written!
Truevert: What is semantic about semantic search? is an easy introduction to the thinking behind the Truevert semantic search engine. I was heartened by the references to Wittgenstein and the attention Truevert have paid to the work of linguists and philosophers. So much commercial search seems to have been driven by computer scientists with little interest in philosophy, or, if they were interested, they kept quiet about it (any counterexamples out there?)! Perhaps philosophers have not been so good at promoting themselves either. Perhaps the Chomskyan attempt to divide linguistics itself into “hard scientific” linguistics and “fuzzy” linguistic disciplines like sociolinguistics has not helped.
As a believer in interdisciplinary and collaborative approaches, I have always wondered why we seem to be so bad at building these bridges, and information science has always struck me as a natural crossing point. Of course, there has been a lot of collaboration, but my impression is that academia has been rather better at this than the commercial world, with organisations like ISKO UK working hard to forge links. Herbert Roitblat at Truevert is obviously proud of their philosophical and linguistic awareness and, more interestingly, thinks it is worth broadcasting in a promotional blog post.
Women, Fire, and Dangerous Things: what categories reveal about the mind by George Lakoff is a hefty tome and a core text in cognitive science. It is 587 pages long, so there are a lot of ideas in there and I am not going to do it justice in this little blog post! Basically, Lakoff starts by bringing together aspects of the work of philosophers such as Ludwig Wittgenstein and J.L. Austin, anthropologists, and psychologists – primarily Eleanor Rosch – to show how the notion of meaning being rooted in context, rather than in some external objective ideal, has risen to prominence since the middle of the last century.
Most important for taxonomists is the work of Rosch, whose experiments on the way people form and understand categories show that categories do not always conform to the “classical” or “folk theory” of categorisation. Since Aristotle, people have assumed that categories are made by noticing “real” properties of things and grouping things by matching those properties. Rosch showed that people actually form categories in various ways, sometimes by grouping matching properties, but sometimes by taking a “central example” and matching similar things that may not actually share any particular properties (e.g. a desk chair is a more typical kind of chair than a bean bag chair, and the two things don’t really have much in common except that we can see they are both sorts of chair). Other ways to form categories include metaphorical association (e.g. communication as liquid in channels) and metonymy, where a part of something is taken to represent the whole thing (e.g. hands meaning workers).
The categories we choose are also rooted in our nature as physical beings – our colour categories are dependent on the structure of the eye, for example. We also tend to operate most naturally at an “intermediate” level of specificity – the level of the ordinary everyday objects we interact with – books, chairs, dogs, cats, etc. – rather than the more abstract level – furniture, animals, etc. – or the more specific – paperback novels, deckchairs, Dalmatians, Felix the cat. Children seem to learn these mid-level terms first, and my instinct is that, as taxonomists, we typically find the middle levels of granularity the most troublesome.
Lakoff uses such experimental evidence to argue against objectivism and in favour of “experiential realism” (or “experientialism”) – the view that our conceptual systems, including the way we form categories, come from our physical bodies and the social and physical environment we find ourselves experiencing. Truth, categories, and knowledge are not “out there” for us to perceive, but are generated from within our subjective experience. (This means that there is no “right” taxonomy for anything – there are only taxonomies that work in particular contexts.)
There’s more detail in this summary and in Donna Maurer’s presentation on the book.
It also has its detractors – this is one critique that I am still working my way through.
Language and Social Identity is a collection of fascinating sociolinguistic papers. Dealing with gender and ethnicity, the researchers seek to show how stereotypes often arise from simple linguistic misunderstandings. For example, one paper argues that speakers of Indian English tend to use pronouns, conjunctions, and intonation very differently to speakers of UK English. UK speakers typically fail to pick up on the Indian English speakers’ cues and assume that what they are saying is confused or incoherent. Conversely, Indian English speakers think the UK English speakers must be either daft or extremely patronising because of their apparent failure to understand very simple logic. Another paper claims that men and women typically use utterances like “mm hmm” to mean different things. Women mean simply “I’m listening”, whereas men mean emphatically “I agree”. Men then think that women keep changing their minds and women think men just aren’t listening!
The most relevant paper from a taxonomic point of view was one on the highly charged political nature of language use in Montreal. The need to cut across language differences and negotiate norms of communication when diverse groups feel they have something to lose through compromise mirrors the inter-departmental language mediation that usually needs to happen in taxonomy projects.
Lots of gems in Sociolinguistics: the study of speakers’ choices by Florian Coulmas (2005; Cambridge University Press). It is a serious introduction to the field aimed at students, with discussion points and references at the end of each chapter and plenty of pointers to further study. I am interested in how language choice affects taxonomy, in labels and names, and what we perceive to be “natural” or “obvious” categories. Linguistics is a huge area of study, and even ignoring everything other than sociolinguistics still leaves an awful lot to take on board, so this clear and straightforward text was very helpful as a starting point. Coulmas says the “principal task of sociolinguistics is to uncover, describe, and interpret the socially motivated restrictions on linguistic choices”, so I believe its findings must have some relevance to taxonomists, in that our main tool is language. I also think there are interesting parallels between what happens when governments try to define and impose language policies on people and when information managers try to impose “corporate language policies”. If such policies are welcomed and supported by the users they can bring great benefits, but they can be disastrous if imposed dictatorially or when one group suffers at the expense of another.
A linguistic mapping experiment. This article from The Journal of Cognition and Culture describes how Olga Stepanova and John D. Coley devised two linguistic experiments to show that Russian and English terms for jealousy and envy are not equivalent. In English, “jealous” covers both (broadly) being jealous of a relationship between other people and being “envious” of a quality or possession belonging to another person, but Russians have two terms that are not interchangeable. English speakers were far more likely than Russian speakers to rank descriptions of “jealousy” situations and “envy” situations as similar. Interestingly, Russians who had learned English were less likely to note a clear distinction between the two terms than the monolingual Russians, suggesting that learning English had introduced some conceptual “blurring”.
Conceptual mapping strikes me as a subtle but important issue for taxonomists. It is obvious that mapping in multilingual environments can be problematic, but presumably the conceptual “blurring” that bilingual people experience can also happen within information domains in a single language. In other words, just knowing that other people use a term to mean something different opens up broader categorisation possibilities. Trivially, if you don’t know something has an alternative meaning you will only indicate one place for it in a taxonomy, but conversely knowing the alternative adds a layer of complication to work through. It’s an issue that seems obvious from practical work, but I am always reassured to see experiments supporting apparent common sense.