21 March 2018

TNA tests handwriting recognition in PROB 11 will collection

A post on the blog of The National Archives (UK), "Machines reading the archive: handwritten text recognition software", reports encouraging results from a pilot project assessing the feasibility of using the handwritten text recognition platform Transkribus on TNA's collection of PROB 11 wills.
Transkribus requires an original collection in a fairly uniform hand and a good sample of human-transcribed material to train the recognition process. Training on roughly 37,000 words produced a transcription with a word error rate of 28% and a character error rate of 14%.
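For readers unfamiliar with these measures, here is a minimal sketch of how word and character error rates are conventionally calculated: the edit distance (insertions, deletions and substitutions) between the machine output and a human transcription, divided by the length of the human transcription. The TNA post does not say exactly how its figures were computed, so the function names and the sample line below are illustrative assumptions, not TNA's or Transkribus's own code.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, 1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, 1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference transcription is empty")
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcription is empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical sample line, not taken from the TNA pilot.
human = "In the name of God Amen"
machine = "In the nane of God Amen"
print(f"WER: {word_error_rate(human, machine):.1%}")
print(f"CER: {character_error_rate(human, machine):.1%}")
```

A single wrong letter spoils the whole word it sits in, which is why a word error rate is normally higher than the character error rate for the same text.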
Although no statistics are given, the error rate for proper names (capitalized) appears to be much greater. If a name occurs several times in a document, the chance of finding it at least once is improved. This will be especially useful where the name of interest belongs not to the deceased, already indexed for PROB 11, but to an executor, beneficiary or witness.
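The benefit of repetition can be put into rough numbers. The sketch below assumes each occurrence of a name is misread independently with the same probability, which is a simplification, and the 50% name error rate is purely hypothetical since the post gives no figure for names.

```python
def chance_of_at_least_one_hit(name_error_rate, occurrences):
    """Probability that at least one occurrence of a name is read correctly,
    assuming independent errors with the same rate for each occurrence."""
    if not 0 <= name_error_rate <= 1:
        raise ValueError("name_error_rate must be between 0 and 1")
    if occurrences < 1:
        raise ValueError("occurrences must be at least 1")
    return 1 - name_error_rate ** occurrences


# Hypothetical error rate for proper names; not a figure from the TNA post.
assumed_name_error_rate = 0.5
for count in (1, 2, 3, 5):
    chance = chance_of_at_least_one_hit(assumed_name_error_rate, count)
    print(f"{count} occurrence(s): {chance:.1%} chance of at least one correct reading")
```

In practice errors are unlikely to be fully independent, as an unusual surname may be misread the same way each time, so the real gain from repetition is probably smaller than this suggests.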
