Tuesday, 6 May 2008

XML in the service of crime

How's that for a headline? I hope it's caught your interest. Strictly speaking, the headline should be "XML in the service of reporting crime", or better still, "XML in the service of making historical accounts of crime available and searchable online", but that doesn't have the same ring to it!

So what on earth am I talking about? Well, two hundred and thirty-nine years' worth of reports of proceedings at London's Central Criminal Court, familiarly known as the Old Bailey, have just been made available online in a relaunch of the Old Bailey Online web site.

The "Proceedings", covering the period from 1674 to 1913, started off as privately published journalism and gradually developed into quasi-official records. They were discontinued quite abruptly when the law was changed to make it obligatory to have an official court reporter make a verbatim record of every trial.

According to Professor Robert Shoemaker of Sheffield University, the records are a "treasure trove of social, legal and family history... Now everyone from schoolchildren and amateur historians to scholars working in a range of academic disciplines can have easy access to this wealth of information."

I find historical documents like these quite fascinating, particularly when they refer to my home town. I am a very loyal Londoner, and I know the City quite well. I am also a bit of an amateur genealogist, and I was eager to search the records to see if any of my forebears were ever "transported for life" or worse. Searching through the records is what brings us neatly back to XML.

The Old Bailey Online site includes considerable information about the project itself, including details of how the original documents were digitised, and how the texts were then marked up using XML tags so that information could be categorised. I am very pleased that this information has been included. Many people are quite dismissive of markup languages, and believe that the availability of full-text search has made markup obsolete. This project is a fascinating example of why structured markup is useful and important.
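To make the point a little more concrete, here is a minimal sketch in Python of the difference between full-text search and a search over structured markup. I should stress that the tags and attributes below (`trial`, `offence`, `punishment`, `category` and so on) are my own simplified invention for illustration, and are not the Old Bailey Online project's actual markup scheme. The idea is simply that a plain word search for "transported" also finds a trial where the word merely crops up in testimony, whereas a query on a tagged punishment finds only the trial that actually ended in transportation.

```python
import xml.etree.ElementTree as ET

# A hypothetical, heavily simplified fragment of marked-up trial reports.
# Element and attribute names are invented for this example; they are
# NOT the real Old Bailey Online schema.
RECORDS = """
<proceedings>
  <trial id="t1" year="1785">
    <defendant surname="Smith">John Smith</defendant>
    <offence category="theft"/>
    <punishment category="transportation"/>
    <text>John Smith was indicted for stealing a silver watch. Transported for seven years.</text>
  </trial>
  <trial id="t2" year="1786">
    <defendant surname="Brown">Mary Brown</defendant>
    <offence category="theft"/>
    <punishment category="whipping"/>
    <text>Mary Brown was indicted for stealing linen. A witness said she feared being transported. Whipped.</text>
  </trial>
</proceedings>
"""

root = ET.fromstring(RECORDS.strip())


def full_text_search(root, word):
    """Return ids of trials whose report text contains `word` (case-insensitive)."""
    return [
        trial.get("id")
        for trial in root.iter("trial")
        if word.lower() in trial.findtext("text", "").lower()
    ]


def structured_search(root, punishment):
    """Return ids of trials whose tagged punishment category equals `punishment`."""
    hits = []
    for trial in root.iter("trial"):
        tag = trial.find("punishment")
        if tag is not None and tag.get("category") == punishment:
            hits.append(trial.get("id"))
    return hits


if __name__ == "__main__":
    print(full_text_search(root, "transported"))       # ['t1', 't2']
    print(structured_search(root, "transportation"))   # ['t1']
```

The full-text search returns both trials, because Mary Brown's report happens to mention transportation in passing; the structured search returns only John Smith's, because only his record carries a transportation sentence in its markup. Scale that up to a couple of centuries of reports and it is easy to see why categorising the information in tags is worth the effort.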