Ever wanted to take the hassle out of re-typing a short quote, found on Google Books? Free OCR is a simple online OCR application that might help.

To test it, I gave it a very unpromising bit of text captured from Google Books using a standard screen-capture utility — slightly skewed, slightly fuzzy, in a non-standard typeface I’m willing to bet no-one has on their system, captured as a JPG at a mere 72 dpi, and just 500 pixels wide…

ocr-test

A few seconds after uploading, it gave me this…

ADVERTISEMENT.
Tms publication of the Works of Jomv KNOx, it is
supposed, will extend to F’ive Volumes. It was thought
advisable to commence the series with his History of
the Reformation in Scotland, as the work of greatest
importance. The next voliune will thus contain the
Third and Fourth Books, which continue the History to
the year 1564; at which period his historical labeurs
maybeconsideredtoterminate. ButtheFi&hBook,
forming a sequel to the History, and published under
his name in 1644, will also be included. His Letters
and Miscellaneous Writings will be arranged in the
subsequent volumes, as nearly as possible in chronolo-
gical order; each portion being introduced by a separate
notice, respecting the manuscript or printed copies from
which they have been taken.
It may perhaps be expected that a Life of the Author
should have been prefixed to this volume. The Life of
Knox., by Ds. M‘Cms, is however a work so universally
, known, and of so much historical value, as to supersede
l any attempt that might be made for a detailed bio-

Not perfect, but not bad for such a poor-quality capture. Stand-alone OCR software usually demands a much better quality source.

The popular screenshot software HyperSnap v6 promises to do the same with its TextSnap feature, but for some unknown reason this feature just doesn’t work with Google Books or the captured image above. I suspect it can only handle text that uses system fonts.

So until we get a neat free OCR Firefox addon (which is a direction I would urge the makers of Free OCR to go in) then screenshot – save image – upload image to Free OCR is a viable and speedy workflow for OCR-ing fair-use quotes found on Google Book Search or other places that only offer plain page-scans.

Oh, and don’t bother doing this for books that are already in the public domain — since last month Google provides the full-text of these for download, and also serves it up via Google Book Search Mobile.

   ** Update: If you have Microsoft Office 2007 or higher, then I find that the included Microsoft OneNote works just as well for OCR on low-res images such as the one above. It also works well on most PDFs that don’t allow copy/paste. See the comments to this post for details.


The Time Machine: a sequel
Buy my new sequel to The Time Machine, for the Amazon Kindle!