Read aloud on Archive.org

New on Archive.org is a nicely revamped scrub/zoom bar in the book reader, now looking much smarter. It also has an additional icon that launches a read-aloud voice. Reading starts from the top of the current page, and the block of text currently being read is visually highlighted. The voice does a fairly good job, about as good as TTS gets at present without some dedicated voice-AI chip on your PC motherboard. There currently seems to be no option to choose your own TTS voice, though you can slow it down or speed it up.

WebSatchel

Fancy creating your own personal Wayback Machine (Archive.org)? WebSatchel is a free add-on for Chrome-based Web browsers. When you bookmark a page in your Web browser, “it creates a full copy of the webpage” on the WebSatchel Cloud service.

Users only get 1GB of storage for free, though. A test-save to PDF of the fairly regular front page of a robust blog weighed in at 2.5MB. So, at a guess, the free 1GB might let you save around 350 pages. Pages are indexed in full text (“every word on a page”) and made keyword-searchable for the user, which makes the service perhaps useful for a fairly small, time-limited research project. I don’t know how easy it might be to “delete and start over” on a new project, or whether your 1GB can only be filled up once.
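As a rough check on that guess, the arithmetic is easy to sketch in Python. It assumes every saved page is about the size of the 2.5MB test page, treats 1GB as either 1,000MB or 1,024MB, and lets you reserve a hypothetical fraction of the quota for indexing overhead; WebSatchel’s actual storage accounting isn’t known.

```python
import math


def pages_that_fit(quota_mb: float, page_mb: float,
                   reserved_fraction: float = 0.0) -> int:
    """Estimate how many saved pages fit in a storage quota.

    reserved_fraction is a hypothetical allowance (0.0 up to, but not
    including, 1.0) for indexes and other overhead; the real figure is unknown.
    """
    if page_mb <= 0:
        raise ValueError("page size must be positive")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    usable_mb = quota_mb * (1.0 - reserved_fraction)
    return math.floor(usable_mb / page_mb)


print(pages_that_fit(1000, 2.5))         # decimal gigabyte: 400
print(pages_that_fit(1024, 2.5))         # binary gigabyte: 409
print(pages_that_fit(1000, 2.5, 0.125))  # hypothetical 12.5% reserve: 350
```

On those assumptions the ceiling is roughly 400 to 410 pages, so a guess of about 350 leaves some headroom; the 12.5% reserve is simply one hypothetical figure that lands on 350. Heavier, image-rich pages would cut the count further.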

An additional search option, for searching a set of stable sites during a small, time-limited project, would be to create a Google Custom Search Engine (CSE). A CSE would give you a wider and perhaps more serendipitous “catch”, and would apply Google’s leading relevancy ranking to your search. However, so far as I’m aware, there is no one-click, bookmark-style way to add a new site to a Google CSE. Perhaps there should be.

Experimental Musical Instruments

Now on Archive.org: the complete run of the journal Experimental Musical Instruments, 1985-99. The April 1989 issue has what appears to be the only article ever written on making clay bells capable of good musical tones. You might have thought that, by now, there would be an entire cottage industry among acoustic researchers, musicologists and craft makers devoted to making and understanding clay bells. But a search of Google Scholar, JURN, Google Books and Amazon reveals absolutely nothing, and a search for “ceramic bells” likewise. There’s a groundbreaking thesis topic there, if anyone wants it.

Some musings arising from a search for PDF translation services

A new $25 Translator plugin integrates Google Translate into Adobe InDesign, Adobe’s desktop publishing (DTP) software. A Google Translate API key is needed, though.

Another recent third-party option for editing PDFs in InDesign is Translate from Id-Extras, which appears to be promisingly low-cost. Rather surprisingly, my related searches suggest that there is no such thing as an official Adobe PDF translator plugin. You might have thought Adobe would have been onto that years ago, and made a small fortune for their shareholders from it. Nor has Microsoft slotted Bing Translator into Microsoft Publisher.

Spotting these made me wonder what similar tools were available free for LibreOffice, the free office suite. I find that the free PageTranslate extension is “working in 2020”. Extensions are not installed manually in LibreOffice, but via Tools | Extension Manager. Once installed, this one shows up under Tools | Page Translate…

It supports inline ‘translate and replace’ from English into German, Spanish and French. It does this well, hooking into Google Translate and allowing both full-document translation and translation of “selected and highlighted” text. No API key or other log-on is needed for Google Translate, though you can switch it over to other services that do require keys, logins and suchlike. Its provider settings are found under Options | Language…

Obviously, then, if one could rig up a reliable way to convert a PDF to Word and then translate in place (‘inline’), that would be a useful thing to have on a desktop PC. This is especially true for those with slow Internet uplinks, for whom sending an 80MB PDF up to the Cloud for translation might take an hour. But I’ve yet to find reliable freeware for the Windows desktop that offers “PDF to Word, and retain layout 100%”. LibreOffice’s Draw component claims to import PDFs, but while it may be adequate for the layout of a plain academic journal, it makes an utter hash of magazine layouts. This is the sort of layout I’m talking about…

You can see how fiddly it would be to copy-paste each block of text individually into Google Translate, and how easy it would then be to lose track of which bit came from which part of the page. Ideally, some as-yet-unmade software would identify each block of text and its co-ordinates on the page, copy the text from each block (by OCR if needed), auto-translate it, then erase the block and fill it with its translated text.
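That wished-for workflow can be sketched as a small planning step. This is a hypothetical Python outline, not any existing product: block detection, OCR and the actual PDF redrawing are assumed to happen elsewhere, and the `translate` parameter stands in for whatever service (Google Translate or otherwise) one might plug in. A toy dictionary replaces a real translator so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height on the page


@dataclass(frozen=True)
class TextBlock:
    """A text block found on a page (hypothetical), with its bounding box."""
    box: Box
    text: str  # assumed already extracted, by OCR if needed


def plan_translation(blocks: Iterable[TextBlock],
                     translate: Callable[[str], str]) -> List[tuple]:
    """Turn text blocks into an ordered list of page edits.

    For each non-empty block: erase its box, then draw the translated
    text back into the same box, so every translation stays tied to the
    spot on the page it came from.
    """
    edits: List[tuple] = []
    for block in blocks:
        if not block.text.strip():
            continue  # nothing to translate in a blank block
        edits.append(("erase", block.box))
        edits.append(("draw", block.box, translate(block.text)))
    return edits


# Usage with a toy dictionary standing in for a real translation service.
page = [
    TextBlock((36, 40, 300, 24), "Le jardin"),
    TextBlock((36, 80, 300, 200), "Les roses fleurissent en juin."),
    TextBlock((36, 300, 300, 20), "   "),
]
toy = {"Le jardin": "The garden",
       "Les roses fleurissent en juin.": "Roses bloom in June."}
for edit in plan_translation(page, lambda s: toy.get(s, s)):
    print(edit)
```

Keeping each translation paired with its original box is what avoids the “which bit came from where” problem; a real tool would probably also need to adjust font size when a translation runs longer than the original.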

So, PDF to Word… the best genuine conversion freeware I’ve found and tested so far is Nemo PDF to Word 4.0. It is a good try, but it does not capture the layouts and font styling 100% on my test PDFs. Maybe 80%, and the remaining messiness may be largely due to font substitution. That is a problem on my side, not on Nemo’s: my PCs simply lack the snazzy fonts that the magazine designers used in their PDFs.

There are, of course, Cloud services and three or four bits of paid software that claim to auto-translate a PDF while retaining 100% layout fidelity, but they all appear to be Cloud-based and limited unless you pay. Curiously, none of the ones I’ve looked at offers a few before-and-after “sample conversion” PDFs by which to judge their wares. Names include SYSTRAN PDF Translator ($279), Babylon Pro (subscription), Multilizer ($40?) and a couple of others. Multilizer does have what is effectively a demo, though. These are at the consumer and small-business level, and they are not to be likened to the fiendishly complex professional translation suites such as memoQ and Trados Studio. The latter are designed for translation professionals who have accounts with high-end machine-translation services to assist in their laborious daily work.

One interesting bit of desktop Windows freeware I found was Lingoes, but judging by my tests it no longer works when it comes to calling in Google Translate. Google tightened up access a few years back, and that appears to have left several such software makers high and dry. I’d be interested to know if there are still ways to get Lingoes working in 2020, as it otherwise seems to be a free alternative to the paid Babylon Pro. Perhaps an API key is now needed, even for Google Translate?

Finally, I also see that Foxit PDF has just introduced a free “Translate PDFs into other languages” service for those signed up to its (also free) Foxit Cloud. No screenshots are included in the blog post, though, so I’m guessing the translation “appears” in a sidebar rather than replacing the original text inline in the way that Project Naptha does it.

The free and still-working Project Naptha is exemplary in showing how inline “OCR, translate, erase to white space, paste in translation” should be done. But it can only do English to other languages. Give it a block of text in French, German or Italian and it’s kaput. If someone out there wants to be a major philanthropist to the world, getting Project Naptha able to work with text other than English would be a fine project to fund. The secret to that appears to be getting the free Tesseract OCR engine to work with text other than English.
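For what it’s worth, the desktop Tesseract engine does accept a language code on its command line via `-l`, provided the matching trained-data file (e.g. `fra.traineddata`) is installed, and several codes can be combined with `+`. Whether Project Naptha’s in-browser build can be pointed at such data is another matter, not shown here. Below is a minimal Python sketch that builds, and optionally runs, such a command; it assumes a local `tesseract` install on the PATH, and the image file name is hypothetical.

```python
import shutil
import subprocess
from typing import List, Sequence


def tesseract_command(image_path: str, languages: Sequence[str]) -> List[str]:
    """Build a Tesseract CLI call that prints OCR text to stdout.

    Multiple language codes are joined with '+', e.g. 'fra+deu'.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr(image_path: str, languages: Sequence[str]) -> str:
    """Run Tesseract if it is installed; raise a clear error if not."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on the PATH")
    result = subprocess.run(tesseract_command(image_path, languages),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # 'scan_fr.png' is a hypothetical example file; only the command is shown.
    print(tesseract_command("scan_fr.png", ["fra", "ita"]))
```

The language data files have to be installed separately for each language; without them, Tesseract reports an error rather than falling back silently.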

Stickdemia.edu

It appears that, as of Summer 2020, only inbound links from Google Scholar can trigger a public PDF download from Academia.edu. Other download attempts by readers who are not logged in to the service get a “404”. Readers may wish to update any link-lists accordingly.

Sumatra PDF upgrades its core ‘mupdf’ engine

It’s been a long while since Sumatra PDF had an update. The previous version, 3.1.2, came out in 2016, and the new 3.2 arrived in March 2020. Among other things, the new Sumatra PDF has moved to a newer and faster version of its core engine…

“upgraded core PDF parsing rendering to latest version of mupdf. Faster, less bugs.”

One of its slight drawbacks has been its slowness to render some PDFs on opening; some users will be familiar with the message shown while waiting for the render. An upgrade of the core ‘mupdf’ engine therefore seems worth the slight ‘faff’ of configuring Sumatra PDF once again for magazine reading. Configuring is required, since such settings are not persistent across version installs. Here’s how to make the required changes…

1. Under Menu | Settings | Options, set “Book View” and “Fit Page”. While you’re there, you may want to uncheck the option to check for updates…

2. Restart the software by loading a magazine PDF. Check the View menu to make sure you are now in “Book View”.

3. The last step is to get rid of the gutter (i.e. the gap between pages in double-page spreads). A gap is not desirable on magazines about art, nature, gardens and so on, as these will have double-page spreads that assume widescreen viewing rather than single-page tablet viewing. The gutter can easily be removed by going to Menu | Settings | Advanced Options…

There, in the FixedPageUI section, change PageSpacing from 4 4 to 0 0…

Save, restart the software, and you’re done.
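If you’d rather make all of these changes in one go by editing the settings file directly, here is a minimal Python sketch. It assumes the key names DefaultDisplayMode, DefaultZoom, CheckForUpdates and PageSpacing (inside FixedPageUI) as they appear in Sumatra’s advanced-settings file; check them against your own SumatraPDF-settings.txt first. The script name in the usage line is hypothetical, and it keeps a .bak copy of the original file.

```python
"""Apply magazine-reading tweaks to a SumatraPDF-settings.txt file.

Assumption: key names follow Sumatra's advanced-settings format;
verify them against your own file before running.
"""
import re
import shutil
import sys
from pathlib import Path
from typing import Optional

# (key, new value, enclosing section or None for top level)
TWEAKS = [
    ("DefaultDisplayMode", "book view", None),
    ("DefaultZoom", "fit page", None),
    ("CheckForUpdates", "false", None),
    ("PageSpacing", "0 0", "FixedPageUI"),  # removes the gutter
]


def set_setting(text: str, key: str, value: str,
                section: Optional[str] = None) -> str:
    """Replace `key = ...` inside `section` (None means top level)."""
    pattern = re.compile(r"^(\s*)" + re.escape(key) + r"\s*=.*?(\r?\n)?$")
    lines = text.splitlines(keepends=True)
    open_sections = []  # names of the [ ... ] blocks we are currently inside
    changed = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.endswith("["):
            open_sections.append(stripped[:-1].strip())
            continue
        if stripped == "]":
            if open_sections:
                open_sections.pop()
            continue
        current = open_sections[-1] if open_sections else None
        match = pattern.match(line)
        if current == section and match:
            lines[i] = f"{match.group(1)}{key} = {value}{match.group(2) or ''}"
            changed = True
    if not changed:
        raise KeyError(f"{key} not found in section {section!r}")
    return "".join(lines)


def apply_tweaks(text: str) -> str:
    """Apply every entry in TWEAKS to the settings text."""
    for key, value, section in TWEAKS:
        text = set_setting(text, key, value, section)
    return text


def main(settings_path: str) -> None:
    # Close Sumatra first: it may rewrite this file when it exits.
    path = Path(settings_path)
    backup = path.with_name(path.name + ".bak")
    shutil.copyfile(path, backup)
    new_text = apply_tweaks(path.read_text(encoding="utf-8-sig"))
    path.write_text(new_text, encoding="utf-8")
    print(f"Updated {path} (backup at {backup})")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python sumatra_tweaks.py PATH/TO/SumatraPDF-settings.txt")
    else:
        main(sys.argv[1])
```

Matching PageSpacing only inside FixedPageUI leaves any other section’s spacing (such as the comic-book one) untouched, which keeps the change limited to PDF viewing.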

I see the software also has a manga reading mode, but I have not tested that yet. Manga is the Japanese name for comics, which are traditionally read from right to left, so presumably the setting displays pages in that order.

Partial fixes for Google News changes

The Google News page layout has updated, here in the UK. Here’s the latest on how to tame it…


1. Hide thumbnails and icons:

Add these lines to the foot of your uBlock Origin “My filters” list, then save and reload. They should hide the thumbnails and ID icons on Google News…

! Always autohide Google News thumbnails and ID icons - but retain source name
google.*##*.sYpfDb
google.*##*.QyR1Ze


2. Fix the colours and font sizes:

The colours and font sizes of the headline, source name and date are controlled via CSS thus…

/*** Fixes Google News headline colour and font size ***/
.nDgy9d.JheGif {
  color: #3d69ac !important;
  font-size: 15px !important;
}

/*** Fixes Google News source-name colour and font size ***/
.WF4CUc.XTjFC {
  color: #4c7d48 !important;
  font-size: 13px !important;
}

/*** Highlights date on Google News result ***/
.WG9SHc {
  color: #e3732a !important;
  font-size: 11px !important;
}

This can be added to the bottom of anything you have controlling the CSS for Google, e.g. a UserStyle in the Stylus browser add-on.


3. Block search suggestions as you type your search query:

! Block Search Suggestions on Google News
google.*##li.gsfs.sbsb_c