News from JURN

~ search tool for open access content

Monthly Archives: March 2020

How to extract hardcoded subtitles from an old video

29 Sunday Mar 2020

Posted by David Haden in JURN tips and tricks

VideoSubFinder is Windows freeware to auto-detect and extract hard-coded subtitles from videos, saving the results to a series of screen grabs — containing only the subtitle lettering at large size and thus ready for OCR. VideoSubFinder appears to be the best option for occasional use by media archivists, and also publishers and editors who want to extract to text.

I’ve tested it and it works nicely ‘out of the box’ on an old 17 minute video. It does not appear to have dependencies other than the Microsoft Visual C++ Redistributable for Visual Studio 2017, which most Windows users will already have installed. Its output does however require Finereader or similar for OCR processing (see below).

The use-case here is: you have an old interview where the audio is degraded and/or the speaker has heavily accented English, or where the subtitles are translations, which means you can’t just upload it to YouTube and have closed captions automatically generated in a twinkling by eager Googlebots. But you do have good hardcoded English subtitles on the video frames, which someone spent time creating, perhaps decades ago.

Using the software is tricky, despite the simple interface, as there’s no Help. My noted workflow is as follows…

1. Open your video.

2. Scrub the video’s timeline to the desired starting frame. Then on the top menu: Edit | set Beginning Time.

3. Drag down the little sliders (they look like black fly-specks and are easily overlooked) seen in the corners of the video, so as to precisely frame the area where the subtitle line appears.

4. In the lower panel, switch to the OCR tab and press “Create cleared TXT images”. Subtitles should be extracted from the video frames as ‘lettering only’. This should take a while, but less time than actually playing the video. Now might be a good time for a coffee break.

5. Once this process has completed, you then open up the software’s TXTImages folder…

..\VideoSubFinder_4.30_x64\Release_x64\TXTImages

And inside there are a series of large .JPEG images containing the extracted text as large cleaned image-captures, all ready to be OCRd.

So far as I can tell there’s no built in OCR engine with VideoSubFinder, nor any way to plug one in. So now you switch to OCR software such as Finereader.

6. In Finereader, sort the files correctly and then open all the files (Ctrl+A) found in the ..\TXTImages folder. There is no need to resize as Finereader can handle humongous file sizes, unlike the full Adobe Acrobat. Processing should be straightforward and fast, just let it finish. Then save the results out to a single .TXT file and edit.
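The “sort the files correctly” part of step 6 can be checked before the files go into the OCR software. The following is only a minimal sketch: it assumes the image file names contain numbers that should sort numerically (so that 10 comes after 2), and the folder name and the `natural_key` and `sorted_images` helpers are my own illustrative names, not part of VideoSubFinder or Finereader.

```python
import re
from pathlib import Path


def natural_key(name: str):
    """Split a file name into text and number chunks for numeric-aware sorting.

    re.split with a capturing group always alternates text, digits, text...,
    so odd positions hold the digit runs and can be compared as integers.
    """
    chunks = re.split(r"(\d+)", name)
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


def sorted_images(folder: str, suffixes=(".jpeg", ".jpg")):
    """Return the image paths found in 'folder', in natural order."""
    paths = [p for p in Path(folder).iterdir() if p.suffix.lower() in suffixes]
    return sorted(paths, key=lambda p: natural_key(p.name))


# Small in-memory demonstration with made-up file names.
demo_names = ["frame_10.jpeg", "frame_2.jpeg", "frame_1.jpeg"]
demo_sorted = sorted(demo_names, key=natural_key)
print(demo_sorted)  # ['frame_1.jpeg', 'frame_2.jpeg', 'frame_10.jpeg']
```

To list a real folder, one would call something like `sorted_images(r"..\TXTImages")` with the path adjusted to suit, and compare the order with what the OCR software shows.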

Apparently, for making new .SRT subtitles, one can then also use this Finereader output file with the “Create Sub From TXT Results” button in VideoSubFinder, and the result should be a timecoded set of subtitles. But for the purposes of an archivist or editor extracting a text interview, this step is not needed.

If you’re going to need to do this sort of thing often and you have a generous boss, then Microsoft Video Indexer is likely to be your friend.

UserScript ‘Google search in several columns’ – temporary fix

26 Thursday Mar 2020

Posted by David Haden in JURN tips and tricks, JURN's Google watch

This is an update to my January 2020 Google Search in three columns: how to do it in 2020 tutorial post. It’s needed because the key UserScript Google search in several columns has stopped working, due to changes in the Google page code. Even with this script installed, Google Search reverts to a long scrolling page of links, a format highly unsuited to searchers who use a widescreen desktop monitor.

For the time being, the fix is to keep on running this script, but also run these two at the same time…

* Stylus UserStyle Google – show search result in two columns, with the script hacked to show “3” columns.

* Stylus UserStyle Google Search in columns with “3” columns set on install.

On a widescreen monitor, a manual edit at the top of the Stylus UserStyle ‘Google Search in columns’ also helps to reduce the overlap between results. Change…


/* columns */

.big .mw,
.s {
max-width: unset !important;

to…


/* columns */

.big .mw,
.s {
max-width: 80% !important;

The result gives an imperfect but reasonably acceptable three-column display for Google Search and Books results…

‘Google – show search result in two columns’ will need to be temporarily turned off for Google News results.

Note that I have the UserScript Google search in several columns set not to work on Google Books, having added a couple of lines to the script. See my linked post for instructions on how to add that blocking.

See my full Google Search in three columns: how to do it in 2020 tutorial for details of how to block other page elements, such as huge ‘video suggestions’ blocks and cover thumbnails for Google Books results.

At the Opera

22 Sunday Mar 2020

Posted by David Haden in Spotted in the news

I’ve only just noticed that the Web browser Opera version 67.x has enabled the security feature DNS-over-HTTPS. It’s found down at the bottom of: Settings | Advanced | Security, under ‘System’.

It’s not enabled by default, as it now is in the Firefox browser (for U.S. users only, last time I heard). In Opera you can use Google DNS or Cloudflare, or plug in one of your own. I’m in the UK and it seems to work fine with Google DNS, and doesn’t appear to be limited to U.S. users.

It doesn’t however enable you to visit those annoying U.S. local and regional newspapers that shut out all non-U.S. traffic, and to get past such blocks you’ll still need to turn on a reliable free VPN and pretend to be in the USA. Luckily, Opera has one of those built in, too.

Journal of British Studies, 1961-2015

21 Saturday Mar 2020

Posted by David Haden in Spotted in the news

Archive.org is starting to load entire runs of scholarly journals, mostly under their Public Library-like “Borrow” arrangement. New today is Journal of British Studies, 1961-2015, journal of the North American Conference on British Studies.

Vogue Italia for free

20 Friday Mar 2020

Posted by David Haden in Spotted in the news

Vogue Italia magazine “has opened its digital archive of every issue from 1964 to the present”, free and public…

From March the 17th, readers can also access the Vogue Italia archive completely free of charge [after an email sign-up]. Vogue Archive is a digital fashion archive, inaugurated in 2013 to mark the fiftieth anniversary of Vogue Italia. A valuable repository which encompasses the entire history of the magazine [from 1964]. Features, photography, articles, advertising campaigns and much more besides. All meticulously cataloged and easy to consult thanks to the most advanced search technology.

A great resource for everyone from fashion historians to magazine designers looking for layout inspiration. Note that it’s always been the least self-censored and most arty version of Vogue, and as such will not be ‘safe for work’ viewing in some workplaces. Also, it doesn’t appear that the sister Vogue titles published in Italian are included, just the main Vogue Italia.

Added to JURN

14 Saturday Mar 2020

Posted by David Haden in New titles added to JURN

Bulletin of the Institute of Classical Studies (currently free, seemingly back to 1954 — possibly only free for a limited period?)

MUSE (Museum of Art and Archaeology, University of Missouri).

Byzantine Review, The

Teiresias Supplements Online

Neurobiology of Language (MIT)

Research in Generative Grammar (not yet fully indexed by Google)

Brazilian Journal of Natural Sciences

Manter : Journal of Parasite Biodiversity

Global leaders ask publishers to make “all COVID-19 research … immediately available to the public”

14 Saturday Mar 2020

Posted by David Haden in How to improve academic search, Spotted in the news

Issued yesterday from President Trump’s office, but so far unreported in the virus news I’ve seen…

“The U.S. Coronavirus Task Force leader, Dr. Kelvin Droegemeier, and government science leaders including science ministers and chief science advisors from Australia, Brazil, Canada, Germany, India, Italy, Japan, the Republic of Korea, New Zealand, Singapore, and the United Kingdom are asking publishers to make all COVID-19-related research and data immediately available to the public. … Science leaders requested that existing and new articles be made available in machine-readable format to allow full text and data mining with rights accorded for research re-use and secondary analysis.”

UK sales-tax to be removed from digital academic journals

11 Wednesday Mar 2020

Posted by David Haden in Spotted in the news

Announced today in the UK’s Spring Budget speech in Parliament, good news for authors and publishers…

“From 1st December 2020 [UK] ebooks, newspapers, magazines or academic journals will have no VAT to pay.”

VAT is the UK’s main sales tax, and printed publications are already exempt from the tax. At present it’s uncertain if digital audiobooks will also be exempt.

Gab Trends

05 Thursday Mar 2020

Posted by David Haden in Spotted in the news

Gab Trends is a new topical / news search-engine from Gab, probably best described as the ‘free-speech Twitter’. Trends doesn’t currently appear to require a sign-up. Commenting on news stories does however require signing up to Gab’s sister-project Dissenter. Comment-counts presumably then show up on the Gab Trends search results, but not the comments themselves, which are quarantined on Dissenter. That’s probably just as well, since this is the free-speech Gab and it veers strongly toward the right-wing of politics. Though at present there doesn’t seem to be much speech of any kind going on there.

If testing Gab Trends you’ll probably want to block all images on results. In uBlock Origin that’s…

##*.column-image

Once that’s done, a broad test for keyword alarmism shows about what you’d expect…

Conservative news sites are prominent. The UK’s Daily Mail and The Telegraph newspapers, the USA’s Fox News, and I think ZeroHedge is some sort of libertarian/Bitcoin news site? Sites such as InfoWars and RT (‘official’ Russian news) will fail to pass the sniff-test for many.

A search for virus + UK showed a similar spectrum of results, but with the BBC, Reuters and Yahoo ranking highly. No weird conspiracy-theory pundits in the top results, so far as I could tell.

Without a search, Gab Trends effectively becomes an algorithmic newspaper, giving you the top items as they currently stand in Trends. In that form it veers very strongly toward the tabloid ‘crime and grime’ type of linkbait, and is not much use.

Search results for a keyword can be easily had as an RSS feed, seemingly without sign-up. So you might get something useful out of it in RSS, if you’re prepared to drill down for half an hour. Most RSS feed software can auto-delete a post if its URL contains a certain keyphrase, so you could probably remove unwanted sources that way.
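For those whose feed reader lacks such a filter, the same idea can be roughed out in a few lines. This is a sketch only, assuming a plain RSS 2.0 feed in which each item has a link element. The sample feed, the domain names and the `drop_blocked_items` function are all made up for illustration, and nothing here is specific to Gab Trends.

```python
import xml.etree.ElementTree as ET


def drop_blocked_items(rss_xml: str, blocked: list[str]) -> str:
    """Return the feed with every <item> removed whose <link> contains a blocked phrase."""
    root = ET.fromstring(rss_xml)
    for channel in root.iter("channel"):
        # Copy the list first, since items are removed while looping.
        for item in list(channel.findall("item")):
            link = (item.findtext("link") or "").lower()
            if any(phrase.lower() in link for phrase in blocked):
                channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Demo</title>
<item><title>A</title><link>https://example.org/story-a</link></item>
<item><title>B</title><link>https://blocked.example.com/story-b</link></item>
</channel></rss>"""

cleaned_feed = drop_blocked_items(SAMPLE_FEED, ["blocked.example.com"])
print(cleaned_feed)
```

Matching on the link rather than the title is a deliberate choice here, since it mirrors the ‘URL contains a keyphrase’ filter that feed readers offer.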

There doesn’t appear to be a list of the news sources, and on my limited tests they feel quite limited at present. One would have expected to find robust conservative magazines like The Spectator, The Federalist, The Critic, Quillette, rather than questionable stuff like the InfoWars and RT sites, but I guess that’s perhaps because the focus is on ‘breaking news’ rather than on commentary. Yet the National Review is in there, which is the U.S. equivalent of The Spectator.

Overall, it’s possibly useful if you want an RSS feed to keep track of what restaurant Milo has been thrown out of this week. But at present it seems a worse choice for tracking news than Google News + a site-blocking script able to remove the news sources you don’t care to read.

Best desktop PDF reader for magazines at 2020

05 Thursday Mar 2020

Posted by David Haden in JURN tips and tricks

Several years ago I surveyed PDF reader software for desktops, with an eye to: 1) speed of opening, and 2) being a “magazine reader”.

There were only two free ad-free winners, The Windows Reader desktop app and Sumatra PDF. Sumatra won because, unlike MS Reader, you can turn off the “gutter” line for two-page magazine spreads. Being able to do that is a vital feature, for viewing magazines that run pictures across double-page spreads.

I took another look at the range of PDF readers, just now. Surprisingly, no-one has yet produced a dedicated elegant free “PDF magazine reader” for desktops, with a big idiot-proof one-click button for: “two-page spreads + cover-page, no gutter line”. Sumatra PDF is still the closest, with its Book view (Cover + Facing pages) which is found under Settings | Options | Book View. But the gutter line still has to be removed by fiddling in Advanced Settings, to manually change: PageSpacing = 4 4 to PageSpacing = 0 0
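If you have several installs to change, the PageSpacing edit could also be scripted rather than done by hand. The following is a rough sketch, assuming the Advanced Settings are held in a plain text file with a line of the form `PageSpacing = 4 4`. The `remove_gutter` function and the sample text are my own illustration, and the function sets the value to 0 0 whatever the existing numbers are. Back up the real settings file before trying anything like this on it.

```python
import re


def remove_gutter(settings_text: str) -> str:
    """Rewrite any 'PageSpacing = <x> <y>' line as 'PageSpacing = 0 0'.

    Leading indentation and the line ending are left untouched.
    """
    pattern = r"(?m)^([ \t]*PageSpacing[ \t]*=[ \t]*)[^\r\n]*"
    return re.sub(pattern, r"\g<1>0 0", settings_text)


# Illustrative fragment only, not a complete settings file.
sample_settings = "SomeOtherSetting = 1\n\tPageSpacing = 4 4\nAnotherSetting = true\n"
patched_settings = remove_gutter(sample_settings)
print(patched_settings)
```

With a real file one would read its text, pass it through `remove_gutter`, and write the result back while the reader is closed, so that the program does not overwrite the change on exit.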

I tried a few other free desktop readers, to see if anything had changed and there were any new contenders. I ended up trying the following…

* PDF-XChange Viewer. Painfully slow to render pages, uninstalled. Apparently the whole of CERN is forced to use this, for security. Secure it may be, but fast it is not.

* PDF Architect. The interface looks slick, like MS Office. It’s still available free, but has been superseded by a more advanced paid version. It was a 13Mb download, then it needed to go online to get a “Startup module”. This download stuck at 1% and never completed. Killing the downloader process revealed it was 32-bit anyway, something that was also confirmed by further research. There doesn’t appear to be a standalone version.

* MuPDF, open source… but it’s what Sumatra is built on. Basically it’s Sumatra but without the advanced controls.

* Evince is also open source. Curiously it doesn’t feature on lists of the best Adobe Reader alternatives, or at Major Geeks (now the best freeware directory). Possibly this is because Evince is said to ignore DRM in PDFs, and/or because people think it’s not for Windows. Yet there is Evince for Windows. It’s somewhat fast, but sadly it has a fat ugly gutter on double-page spreads which can’t be removed. Nor can it handle PDFs from Microsoft Publisher, not being able to display semi-transparency correctly. Uninstalled.

* I assume that Adobe Reader is still bloated and also a major security risk.

Thus, as far as I can tell, the only real free / ad-free / nag-free, fast, 64-bit and reasonably secure option for magazines at the start of 2020 is still Sumatra PDF. One can of course send a PDF magazine to your tablet or megavision TV for leisurely sofa-and-chocs browsing, but desktop-based professionals often need a quicker desktop solution for flicking through PDF magazines.

There is a portable version that can run with its own settings file. This can be useful if you are an editor who needs a second installation with a tiny gutter line — to double-check for slight gutter-overlaps in the output PDF.
