Having spent years working as an editor fussing over consistency of style and orthography, I shouldn’t have been as surprised as I was to find my tags on even this little blog site, written solely by me, had already become a mess. It didn’t take too long to tidy them up, but there are only a handful of articles here so far.

I worked with some extremely clever people in my first “proper” job back in the 90s, and we used to have a “90%” rule regarding algorithm-based language processing (we mostly processed very well-structured text). However brilliant your program, you’d always have 10% of nonsense left over at the end that you needed to sort out by hand – mainly due to the vagaries of natural language and general human inconsistency. I’m no expert on natural language processing, but I get the impression that a lot of people still think 90% is really rather good. Certainly auto-classification software seems to run at a much lower success rate, even after manual training. It strikes me that there’s a parallel between folksonomies and this sort of software. Both process a lot of information cheaply, and so make processing possible on a scale that just couldn’t be done before, but you still need someone to tidy up around the edges if you want top quality.

I think the future of folksonomies depends on how this tidying-up process develops. There are various things happening to improve quality – like auto-complete predictive text. Google’s tag game is another approach, and ravelry.com use gentle human “shepherding” of taggers, personally suggesting tags and orthography (thanks to Elizabeth for pointing this one out to me).
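
To make the auto-complete idea a bit more concrete, here is a minimal sketch in Python of how a tagging box might nudge people towards tags that already exist. It is only an illustration under my own assumptions, not how Google or ravelry.com actually do it: the function names `normalise` and `suggest_tags`, the 0.8 similarity cutoff and the sample tags are all invented for the example.

```python
from difflib import get_close_matches


def normalise(tag):
    """Lower-case a tag and drop spaces, hyphens and other punctuation."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def suggest_tags(typed, existing_tags, limit=3):
    """Suggest existing tags that a newly typed tag might duplicate.

    Hypothetical helper: the matching rules and the cutoff are assumptions.
    """
    by_key = {normalise(t): t for t in existing_tags}
    key = normalise(typed)

    # Same tag, different orthography (e.g. "Auto-Complete" vs "autocomplete").
    if key in by_key:
        return [by_key[key]]

    # Prefix matches behave like simple auto-complete.
    prefix = [t for k, t in by_key.items() if k.startswith(key)]

    # Close matches catch typos and near-spellings.
    close = [by_key[k]
             for k in get_close_matches(key, list(by_key), n=limit, cutoff=0.8)]

    # Merge the two lists, keeping order and dropping duplicates.
    suggestions = []
    for tag in prefix + close:
        if tag not in suggestions:
            suggestions.append(tag)
    return suggestions[:limit]
```

Given the existing tags `["folksonomies", "tagging", "autocomplete"]`, typing `Auto-Complete` or the misspelling `folksonmies` would bring up the tag that is already there, and typing `tag` would offer `tagging`. Anything the sketch can't match is exactly the residue that still needs a human to tidy up.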

I would really like to get hold of some percentages. If 75% is a decent showing for off-the-peg auto-categorisation/classification software, and we could get up to 90% with bespoke algorithms processing structured text, what percentages can you expect from a folksonomic approach?