The New York Public Library (NYPL) and a data partner are combing through copyright registration and renewal records, which in the U.S. exist as a zillion scanned index cards with tricky column formatting and no OCR. The NYPL’s “early findings” currently estimate that, prior to 1964 in the U.S., only…

25 to 35 percent of books were renewed, while the rest were not.

Sounds good, if it leads to the legal liberation of somewhere between 65% and 75% of old U.S. books, for re-use and re-purposing. The NYPL’s project started with a 10,000-card pilot programme and XML output…

NYPL partnered with the technology firm Data Conversion Laboratory (DCL) [who] started by adding OCR to all the digital copyright registration files, then using algorithms to automatically structure and sort the data.

NYPL plans to make their XML open source for other libraries

And I’d imagine that many members of the public and non-university scholars would also find uses for it.
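
As a rough illustration of the sort of use I have in mind, here is a minimal Python sketch that reads a small XML sample and lists the pre-1964 entries with no renewal attached. The element and attribute names (`registrations`, `entry`, `regnum`, `year`, `renewal`) and the sample records are all my own invention for the example. The actual NYPL/DCL schema isn't described in the reports, so treat this as a guess at the shape of the task rather than a working client for their data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the tag and attribute names below are invented
# for illustration and are NOT taken from the real NYPL/DCL XML output.
SAMPLE = """
<registrations>
  <entry regnum="A00001" year="1950"><title>Example Book One</title><renewal id="R00001"/></entry>
  <entry regnum="A00002" year="1950"><title>Example Book Two</title></entry>
  <entry regnum="A00003" year="1951"><title>Example Book Three</title></entry>
  <entry regnum="A00004" year="1965"><title>Example Book Four</title></entry>
</registrations>
"""


def unrenewed_before(xml_text, cutoff_year=1964):
    """Return (candidates, unrenewed) entries registered before cutoff_year.

    An entry counts as 'unrenewed' here simply because it has no <renewal>
    child in this made-up layout; a real check would need the real schema.
    """
    root = ET.fromstring(xml_text.strip())
    candidates = [e for e in root.iter("entry") if int(e.get("year")) < cutoff_year]
    unrenewed = [e for e in candidates if e.find("renewal") is None]
    return candidates, unrenewed


candidates, unrenewed = unrenewed_before(SAMPLE)
share = len(unrenewed) / len(candidates)
titles = [e.findtext("title") for e in unrenewed]

print(f"{len(unrenewed)} of {len(candidates)} pre-1964 entries have no renewal ({share:.0%})")
for title in titles:
    print(" -", title)
```

With real data, the same loop could feed a reading list or a digitisation queue, though the renewal lookup would presumably need to be matched on registration numbers rather than on a conveniently nested element.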