How to extract hardcoded subtitles from an old video

VideoSubFinder is Windows freeware that auto-detects and extracts hard-coded subtitles from videos. It saves the results as a series of screen grabs that contain only the subtitle lettering, at a large size and thus ready for OCR. It appears to be the best option for occasional use by media archivists, and also by publishers and editors who want to extract subtitles to text.

I’ve tested it, and it worked nicely ‘out of the box’ on an old 17-minute video. It does not appear to have any dependencies other than the Microsoft Visual C++ Redistributable for Visual Studio 2017, which most Windows users will already have installed. Its output does, however, require FineReader or similar software for OCR processing (see below).

The use-case here is: you have an old interview where the audio is degraded and/or the speaker has heavily accented English, or where the subtitles are translations. This means you can’t just upload it to YouTube and have closed captions automatically generated in a twinkling by eager Googlebots. But you do have good hardcoded English subtitles on the video frames, which someone spent time creating, perhaps decades ago.

Using the software is tricky despite the simple interface, as there’s no Help. Here is the workflow I noted down…

1. Open your video.

2. Scrub the video’s timeline to the desired starting frame. Then on the top menu: Edit | set Beginning Time.

3. Drag down the little sliders (they look like black fly-specks and are easily overlooked) seen in the corners of the video, so as to precisely frame the area where the subtitle line appears.

4. In the lower panel, switch to the OCR tab and press “Create cleared TXT images”. Subtitles should be extracted from the video frames as ‘lettering only’. This will take a while, though less time than actually playing the video. Now might be a good time for a coffee break.

5. Once this process has completed, open the software’s TXTImages folder. Inside is a series of large .JPEG images containing the extracted text as cleaned image captures, all ready to be OCR’d.

So far as I can tell there’s no built-in OCR engine in VideoSubFinder, nor any way to plug one in. So now you switch to OCR software such as FineReader.

6. In FineReader, sort the files into the correct order, then open all of them (Ctrl+A) from the ..\TXTImages folder. There is no need to resize the images, as FineReader can handle humongous file sizes, unlike the full Adobe Acrobat. Processing should be straightforward and fast; just let it finish. Then save the results out to a single .TXT file and edit.
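If the sort order in the FineReader Open dialog looks wrong, a small script can list the TXTImages files in the order they probably ought to be in. The Python sketch below is a minimal illustration only. It assumes that the VideoSubFinder filenames begin with the subtitle's start time written as groups of digits, so check a few of your own files first, because that naming scheme is an assumption here. It uses a "natural" sort, so a number such as 9 sorts before 10 rather than after it. The names natural_key, sorted_images and IMAGE_EXTS are hypothetical and were made up for this example.

```python
import re
import sys
from pathlib import Path

# Image extensions to look for (an assumption; adjust to what your folder holds).
IMAGE_EXTS = {".jpeg", ".jpg", ".png", ".bmp"}

def natural_key(name):
    """Split a filename into text and number chunks so that '9' sorts before '10'.

    Assumption: the timecode at the start of each filename is made of digit
    groups, so comparing those groups as numbers gives chronological order.
    """
    return [int(chunk) if chunk.isdecimal() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]

def sorted_images(folder):
    """Return the image filenames found in `folder`, naturally sorted."""
    names = [p.name for p in Path(folder).iterdir()
             if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    return sorted(names, key=natural_key)

if __name__ == "__main__":
    # Usage: python sort_txtimages.py "path\to\TXTImages"
    for position, name in enumerate(sorted_images(sys.argv[1]), start=1):
        print(f"{position:04d}  {name}")
```

Running it prints a numbered list of filenames, which you can compare against the order FineReader shows before processing. If the timecodes in the names are already zero-padded to a fixed width, a plain alphabetical sort should give the same order.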

Apparently, for making new .SRT subtitles, one can also use this FineReader output file with the “Create Sub From TXT Results” button in VideoSubFinder, and the result should be a timecoded set of subtitles. But for an archivist or editor extracting the text of an interview, this step is not needed.
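For anyone unfamiliar with .SRT files, a timecoded subtitle file is simply a list of numbered cues. Each cue has a start and end time in HH:MM:SS,mmm form, followed by the subtitle text, and cues are separated by blank lines. The Python sketch below only illustrates that layout. It is not how VideoSubFinder builds its file, and the function names and example cues are hypothetical.

```python
def srt_timestamp(ms):
    """Format a time given in milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def build_srt(cues):
    """Build SRT text from (start_ms, end_ms, text) tuples, numbering cues from 1."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Joining with "\n" leaves one blank line between consecutive cues.
    return "\n".join(blocks)

if __name__ == "__main__":
    # Hypothetical cues, for illustration only.
    example = [(1_500, 4_000, "First subtitle line"),
               (4_500, 7_250, "Second subtitle line")]
    print(build_srt(example))
```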

If you’re going to need to do this sort of thing often and you have a generous boss, then Microsoft Video Indexer is likely to be your friend.

UserScript ‘Google search in several columns’ – temporary fix

This is an update to my January 2020 tutorial post ‘Google Search in three columns: how to do it in 2020’. It’s needed because the key UserScript ‘Google search in several columns’ has stopped working, due to changes in Google’s page code. Even with this script installed, Google Search now reverts to a long scrolling page of links, a format highly unsuited to searchers who use a widescreen desktop monitor.

For the time being, the fix is to keep running this script, but also to run these two UserStyles alongside it:

* Stylus UserStyle ‘Google – show search result in two columns’, with the style hacked to show “3” columns.

* Stylus UserStyle ‘Google Search in columns’, with “3” columns set on install.

On a widescreen monitor, a manual fix at the top of the Stylus UserStyle ‘Google Search in columns’ also helps with overlap between results. Change this:

/* columns */
.big .mw,
.s {
max-width: unset !important;
}

to this, so that only the max-width value changes:

/* columns */
.big .mw,
.s {
max-width: 80% !important;
}

The result is an imperfect but reasonably acceptable three-column display for Google Search and Google Books results…

‘Google – show search result in two columns’ will need to be temporarily turned off for Google News results.

Note that I have the UserScript ‘Google search in several columns’ set not to run on Google Books, having added a couple of lines to the script. See my linked post for instructions on how to add that blocking.

See my full ‘Google Search in three columns: how to do it in 2020’ tutorial for details of how to block other page elements, such as huge ‘video suggestions’ blocks and cover thumbnails for Google Books results.

At the Opera

I’ve only just noticed that version 67.x of the Opera Web browser has added the security feature DNS-over-HTTPS. It’s found down at the bottom of: Settings | Advanced | Security, under ‘System’.

It’s not enabled by default, as it now is in the Firefox browser (for U.S. users only, last time I heard). In Opera you can use Google DNS or Cloudflare, or plug in one of your own. I’m in the UK and it seems to work fine with Google DNS, and doesn’t appear to be limited to U.S. users.

It doesn’t, however, let you visit those annoying U.S. local and regional newspapers that shut out all non-U.S. traffic. To get past such blocks you’ll still need to turn on a reliable free VPN and pretend to be in the USA. Luckily, Opera has one of those built in, too.

Vogue Italia for free

Vogue Italia magazine “has opened its digital archive of every issue from 1964 to the present”, free and public…

From March the 17th, readers can also access the Vogue Italia archive completely free of charge [after an email sign-up]. Vogue Archive is a digital fashion archive, inaugurated in 2013 to mark the fiftieth anniversary of Vogue Italia. A valuable repository which encompasses the entire history of the magazine [from 1964]. Features, photography, articles, advertising campaigns and much more besides. All meticulously cataloged and easy to consult thanks to the most advanced search technology.

A great resource for everyone from fashion historians to magazine designers looking for layout inspiration. Note that it’s always been the least self-censored and most arty version of Vogue, and as such will not be ‘safe for work’ viewing in some workplaces. Also, it doesn’t appear that the sister Vogue titles published in Italian are included, just the main Vogue Italia.

Added to JURN

Bulletin of the Institute of Classical Studies (currently free, seemingly back to 1954 — possibly only free for a limited period?)

MUSE (Museum of Art and Archaeology, University of Missouri).

Byzantine Review, The

Teiresias Supplements Online

Neurobiology of Language (MIT)

Research in Generative Grammar (not yet fully indexed by Google)

Brazilian Journal of Natural Sciences

Manter: Journal of Parasite Biodiversity

Global leaders ask publishers to make “all COVID-19 research … immediately available to the public”

Issued yesterday from President Trump’s office, but so far unreported in the virus news I’ve seen…

“The U.S. Coronavirus Task Force leader, Dr. Kelvin Droegemeier, and government science leaders including science ministers and chief science advisors from Australia, Brazil, Canada, Germany, India, Italy, Japan, the Republic of Korea, New Zealand, Singapore, and the United Kingdom are asking publishers to make all COVID-19-related research and data immediately available to the public. … Science leaders requested that existing and new articles be made available in machine-readable format to allow full text and data mining with rights accorded for research re-use and secondary analysis.”

UK sales-tax to be removed from digital academic journals

Announced today in the UK’s Spring Budget speech in Parliament, and good news for authors and publishers…

“From 1st December 2020 [UK] ebooks, newspapers, magazines or academic journals will have no VAT to pay.”

VAT is the UK’s main sales tax, and printed publications are already exempt from it. At present it’s uncertain whether digital audiobooks will also be exempt.