Paperwork, a new addition to my “A quick guide to desktop search software” post…
Paperwork is free open-source software that helps a scholar get to grips with their PDF pile, without hooking into some online service that wants to gouge your data. It OCRs all your PDFs and other documents, and then searches across them quickly.
You need a big chunk of spare disk space, it seems. If you have 25GB of stuffed-full folders, Paperwork will want to copy all of them over to its own C:\Users\YOURNAME\papers\ folder to OCR them. That makes sense, I guess, since you keep a copy of the original non-OCR’d file, but it comes at the cost of significant disk space for duplicated files.
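If you want to check beforehand whether the duplicated copy will fit, a rough sketch in Python might look like the following. This is my own illustration, not anything that ships with Paperwork. The function names and the example folders are hypothetical, and it assumes the copy needs about as much room as the originals (the OCR output may add more).

```python
import os
import shutil
from pathlib import Path


def folder_size_bytes(folder):
    """Return the total size, in bytes, of all files under `folder`."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                total += (Path(root) / name).stat().st_size
            except OSError:
                # Skip files that vanish or cannot be read.
                pass
    return total


def has_room_for_copy(source_folder, target_folder):
    """Compare the size of `source_folder` with free space at `target_folder`.

    Returns (bytes_needed, bytes_free, fits). Assumption: the copy takes
    roughly the same space as the originals.
    """
    needed = folder_size_bytes(source_folder)
    free = shutil.disk_usage(target_folder).free
    return needed, free, free >= needed


if __name__ == "__main__":
    # Hypothetical locations: change these to your own PDF pile and drive.
    source = Path.home() / "Documents"
    target = Path.home()
    needed, free, fits = has_room_for_copy(source, target)
    print(f"Needed: {needed / 1e9:.1f} GB, free: {free / 1e9:.1f} GB, fits: {fits}")
```

Point `source` at the folders you plan to import and `target` at the drive that holds your user profile, and the last line tells you whether there is room for the duplicate set.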
It comes with optional OCR interpreters for the world’s current languages, but so far as I can see it won’t do German ‘black letter’ (for which you need this).
Under “Settings” there is a “Send anonymous usage statistics” check-box, but this is turned off by default.
It looks good, but suffers from a non-standard Windows user interface, which doesn’t appeal. One could, in theory, use it purely as the software that watches your “Papers” folder and auto-OCRs any new PDF placed there (a job for which there seems to be no other free non-cloud competitor with a GUI). Then you’d point dtSearch at C:\Users\YOURNAME\papers\ for indexing, and use the powerful dtSearch interface for your actual searches.