News from JURN

~ search tool for open access content

Monthly Archives: April 2019

Tutorial: simple web-scraping with freeware

30 Tuesday Apr 2019

Posted by David Haden in JURN tips and tricks, Regex

≈ 1 Comment

This tutorial sets out a workflow for fairly light and basic web-scraping of text with live links, using freeware.

You wish to copy blocks of text from the Web to your Windows clipboard, while also retaining hyperlinks as they occurred in the original text. This is an example of what currently happens…

On the Web page (where ‘text to be copied’ is a live hyperlink):

    This is the text to be copied.

What usually gets copied to the clipboard (the link is lost):

    This is the text to be copied.

Instead, it would be useful to have this copied to the clipboard (the URL here is a hypothetical example, standing in for the original markup):

    This is the <a href="https://example.com/page.html">text to be copied</a>.

Why is this needed?

Possibly your target text is a large set of linked search-results, tables-of-contents from journals, or similar dynamic content which contains HTML-coded Web links among the plain text. Your browser’s ‘View Source’ option for the Web page shows you only HTML code that is essentially impenetrable spaghetti — while this code can be copied to the clipboard, it is effectively unusable.

Some possible tools to do this:

I liked the idea and introductory videos of the WebHarvy ($99) Web browser. Basically this is a Chrome browser, but completely geared up for easy data extraction from data-driven Web pages. It also assumes that the average SEO worker needs things kept relatively simple and fast. It’s desktop software with (apparently) no cloud or subscription shackle, but it is somewhat expensive if used only for the small and rare tasks of the sort bloggers and Web cataloguers might want to do. Possibly it would be even more expensive if you needed to regularly buy blocks of proxies to use with it.

At the other end of the spectrum is the free little Copycat browser addon, but I just could not get it to work reliably in Opera, Vivaldi or Chrome. Sometimes it works, sometimes it only gets a few links, sometimes it fails completely. But if all you occasionally need is to capture text with five or six links in it, then you might want to take a look. Copycat has the very useful ability to force absolute URL paths in the copied links.

I could find no Windows-native ‘clipboard extender’ that can do this, although Word can paste live ‘blue underlined’ links from the clipboard — so it should be technically possible to code a hypothetical ‘LinkPad’ that does the same, but then converts to plain text with HTML-coded links. (A rough sketch of the core idea is below.)
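To show that the raw material is there, here is a minimal Python sketch of what such a hypothetical ‘LinkPad’ might do at its core, assuming the third-party pywin32 package. Windows keeps a parallel rich copy of clipboard text under the registered ‘HTML Format’, whose header records byte offsets to the copied fragment…

    # A rough sketch only: read the HTML flavour of the Windows clipboard.
    # Assumes the third-party pywin32 package (pip install pywin32).
    import re
    import win32clipboard

    def clipboard_html():
        fmt = win32clipboard.RegisterClipboardFormat("HTML Format")
        win32clipboard.OpenClipboard()
        try:
            raw = win32clipboard.GetClipboardData(fmt)
        finally:
            win32clipboard.CloseClipboard()
        if isinstance(raw, str):
            raw = raw.encode("utf-8", errors="replace")
        # The CF_HTML header holds byte offsets, e.g. 'StartFragment:0000000174'.
        start = int(re.search(rb"StartFragment:(\d+)", raw).group(1))
        end = int(re.search(rb"EndFragment:(\d+)", raw).group(1))
        return raw[start:end].decode("utf-8", errors="replace")

    print(clipboard_html())  # the copied text, with <a href="..."> links intact

A real ‘LinkPad’ would also need to handle an empty clipboard gracefully, and then convert the returned fragment into plain text with the links kept as HTML code.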

My selected free tool:

I eventually found something similar to Copycat, but one which works. It’s Yoo’s free Copy Markup Markdown. This is a 2019 Chrome browser addon (it also works in Opera, and presumably in other Chrome-based browsers). I find it can reliably and instantly capture 100 search results to the clipboard, in either HTML or Markdown, with URLs in place. You may want to tick “allow access to search engine results” if you plan to run it on the major engines. Update: it can also just copy the entire page to the clipboard in Markdown, no selection needed! It doesn’t, however, copy the ‘Page Source’ HTML — only the displayed DOM version of the HTML powering the page. These can be very different things.


Cleaning the results:

Unlike the Copycat addon, it seems the ‘Copy Markup Markdown’ addon can’t force absolute URL paths. Thus, the first thing to check on your clipboard is the link format. If it’s ../data/entry0001.html then you need to add back the absolute Web address. Any text editor such as Notepad or Notepad++ can do this. In practice, this problem should only happen on a few sites (a scripted fix is sketched below).
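If you do hit such relative links, here is a minimal Python sketch of a scripted fix, assuming the copied Markdown has been saved as links.md and that you know the site’s base address (both names here are hypothetical):

    # Rough sketch: rewrite relative Markdown link targets as absolute URLs.
    import re
    from urllib.parse import urljoin

    BASE = "https://example.org/journal/"  # hypothetical base address of the site

    with open("links.md", encoding="utf-8") as f:
        text = f.read()

    # Matches Markdown targets such as (../data/entry0001.html) or (/data/x.html)
    fixed = re.sub(r"\(((?:\.\./|/)[^)\s]*)\)",
                   lambda m: "(" + urljoin(BASE, m.group(1)) + ")",
                   text)

    with open("links-fixed.md", "w", encoding="utf-8") as f:
        f.write(fixed)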

You then need to filter the clipboard text, to retain only the lines you want. For instance…

Each data block looks like:

    Unwanted header text.
    This is [the hyperlinked] article title.
    Author name.
    [The hyperlinked] journal title, and date.
    Some extra unwanted text.
    Snippet.
    Oooh, look… social-media buttons! [Link] [Link] [Link] [Link]
    Even more unwanted text!

You want this snipped and cleaned to:

    Author name.
    [The hyperlinked] article title.
    [The hyperlinked] journal title, and date.

Notepad++ can do this cleaning, with a set of very complex ‘regex’ queries. But I just couldn’t get even a single one of these to work in any way, either in Replace, Search or Mark mode, with various Search Modes either enabled or disabled. The only one that worked was a really simple one — .*spam.* — which, when used in Replace | Replace All, removed all lines containing the knockout keyword. Possibly this simple ‘regex’ could be extended to include more than one keyword.
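For instance, with the ‘Regular expression’ Search Mode enabled, an alternation on the same pattern, such as .*(spam|wibble|buttons).*, should knock out lines containing any one of several keywords in a single Replace All pass (untested here beyond the single-keyword form).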

The fallback, for mere mortals who are not Regex Gods, is a Notepad++ plugin and a script. This takes the opposite approach — marking only the lines you want to copy out, rather than deleting lines. The script is Scott Sumner’s new BookmarkHitLineWithLinesBeforeAndAfter.py PythonScript (my backup screenshot). This script hides certain useful but complex ‘regex’ commands, presenting them as a simple user-friendly panel.

This script does work… but not in the current version of Notepad++. Unfortunately the new Notepad++ developers have recently changed the plugin folder structure around, for no great reason that I can see, and in a way that breaks a whole host of former plugins, or makes attempted installs of these confusing and frustrating when they fail to show up. The easiest thing to do is bypass all that tedious confusion and those fiddly workarounds, and simply install the old 32-bit Notepad++ v5.9 alongside your shiny new 64-bit version. On install of the older 32-bit version, be sure to check ‘Do not use the Appdata folder’ for plugins. Then install the Notepad++ Python Script 1.0.6 32-bit plugin (which works with 5.9, tested), so that you can run scripts in v5.9. Then install Scott Sumner’s BookmarkHitLineWithLinesBeforeAndAfter.py script, placing it in C:\Program Files (x86)\Notepad++\plugins\PythonScript\scripts.

OK, that technical workaround diversion was all very tedious… but now that you have a working and useful version of Notepad++ installed and set up, line filtering in Notepad++ is a simple process.

First, in Notepad++…

    Search | Find | ‘Mark’ tab | Tick ‘Bookmark line’.

This adds a temporary placeholder mark alongside lines in the list that contain keyword X…

In the case of clipboard text with HTML links, you might want to bookmark lines of text containing ‘href’. Or lines containing ‘Journal:’ or ‘Author:’. Marking these lines can be done cumulatively, until you have all your needed lines bookmarked, ready to be auto-extracted into a new list.

Ah… but what if you also need to bookmark lines above and below the hyperlinks? Lines which are unique and have nothing to ‘grab onto’ in terms of keywords? Such as an article’s author name, which has no author: marker? You have almost certainly captured such lines in the copy process, and thus the easiest way to mark these is with Scott Sumner’s PythonScript (linked above). This has the advantage that you can also specify a set number of lines above/below the search hit that also need to be marked. Once installed, Scott’s script is found under Scripts | Python Scripts | Scripts, and works very simply, like any other dialogue box. Using it we can mark one line above href, and two lines below… (For those working outside Notepad++, a rough script equivalent is sketched below.)
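The same ‘hit line plus context lines’ idea is easy to script standalone. A minimal Python sketch, assuming the captured text has been saved as capture.txt (a hypothetical filename), keeping one line above each href hit and two below:

    # Rough sketch: keep each line containing a keyword, plus N context lines
    # above and below it, in the same spirit as the Notepad++ script.
    KEYWORD = "href"
    ABOVE, BELOW = 1, 2

    with open("capture.txt", encoding="utf-8") as f:
        lines = f.read().splitlines()

    keep = set()
    for i, line in enumerate(lines):
        if KEYWORD in line:
            for j in range(max(0, i - ABOVE), min(len(lines), i + BELOW + 1)):
                keep.add(j)

    for i in sorted(keep):
        print(lines[i])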

Once you have all your desired lines bookmarked, which should only take a minute, you can then extract these lines natively in Notepad++ via…

    Search | Bookmark | ‘Copy Bookmarked lines’ (to the Clipboard).

This whole process can potentially be encapsulated in a macro, if you’re going to be doing it a lot. Perhaps not necessarily with Notepad++’s own macros, which have problems with recording plugins, but perhaps with JitBit or a similar automator. The above has the great advantage that you don’t have to enter or see any regex commands. It all sounds fiendishly complicated, but once everything’s installed and running it’s a relatively simple and fast process.


Re-order and delete lines in data-blocks, in a list?

Scott Sumner’s script can’t skip a line and then mark a slightly later line. Thus the general capture process has likely grabbed some extra lines within the blocks, which you now want to delete. But there may be no keyword in them to grab onto. For instance…

    [The hyperlinked] article title
    Random author name
    Gibbery wibble
    Random journal title, random date

The Gibbery wibble line in each data block needs to be deleted, and yet each instance of Gibbery wibble has different wording. In this case you need either: the freeware List Numberer (quick) to add extra data to enable you to then delete only certain lines; or my recent tutorial on how to use Excel to delete every nth line in a list of data-blocks (slower). The advantage of using Excel is that you can also use this method to re-sort lines within blocks in a long list, for instance to:

    Random author name
    [The hyperlinked] article title
    Random journal title, random date


Alternatives?:

Microsoft Word can, of course, happily retain embedded Web links when copy-pasting from the Web (hyperlinks are underlined in blue, and still clickable, a process familiar to many). But who wants to wrestle with that behemoth and then save to and comb through Microsoft’s bloated HTML output, just to copy a block of text while retaining its embedded links?

Notepad++ will allow you to ‘paste special’ | ‘paste HTML content’, it’s true. But even one simple link gets wrapped in 25 lines of billowing code, and there appears to be no way to tame this. Doing the same with a set of search engine results just gives you a solid wall of impenetrable gibberish.

There are also various ‘HTML table to CSV / Excel’ browser addons, but they require the data to be in an old-school table form on the Web page. Which search and similar dynamic results may not be.

There are plenty of plain ‘link grabber’ addons (Linkclump is probably the best, though slightly tricky to configure for the first time), but all they can grab is the link(s) and title. Not the link plus the surrounding plain-text lines of content.

There were a couple of ‘xpath based’ extractors (extract page parts based on HTML classes and tags), but in practice I found it’s almost impossible to grab and align page elements within highly complex code. Even with the help of ‘picker’ assistants. I also found an addon that would run regex on pages, Regex Scraper. But for repeating data it’s probably easier to take it to per-line Markdown then run a regex macro on it in Notepad++.

The free ‘data scraper’ SEO addons all look very dodgy to me, and I didn’t trust a single one of them (there are about ten likely-looking ones for Chrome), even when they didn’t try to grab a huge amount of access rights. I also prefer a solution that will still go on working on a desktop PC long after such companies vanish. Using a simple browser addon, Notepad++ and Excel fits that bill. If I had the cash and the regular need, I would look at the $99 WebHarvy (there’s a 14-day free trial). The only problem there seems to be that it would need to be run with proxies, whereas the above solution doesn’t, as the content is simply grabbed via copy-paste.

How to remove every nth line in a list

30 Tuesday Apr 2019

Posted by David Haden in JURN tips and tricks, Regex

≈ 1 Comment

How to remove every nth line in a list. The list is made up of repeating four-line blocks of text.

The situation:

You have a long and mostly cleaned text list that looks like this…

Random article title
Random author name
Gibbery wibble
Random journal title

Random article title
Random author name
Gibbery wibble
Random journal title

Random article title
Random author name
Gibbery wibble
Random journal title

… and you of course wish to delete all the unwanted Gibbery wibble lines. All the Gibbery wibble text is different. Indeed, there’s no keyword or repeating element in each four-line data block for a search-replace operation to grab onto. The only repeating element is the blank line that separates each four-line data block.

So far as I can see, after very extensive searching, there’s as yet no way to deal with this in Notepad++, even with plugins and Python scripts.
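(An aside: while Notepad++’s own PythonScript plugin doesn’t seem to help here, a standalone Python script at the desktop can do the deletion in a few lines. A minimal sketch, assuming the list has been saved as list.txt with the blank separator lines already removed, and that it is line 3 of every 4-line block that must go:)

    # Rough sketch: remove line 3 of every 4-line block in a cleaned list.
    BLOCK = 4   # lines per data block
    DROP = 3    # 1-based position within each block to delete

    with open("list.txt", encoding="utf-8") as f:
        lines = f.read().splitlines()

    kept = [ln for i, ln in enumerate(lines) if i % BLOCK != DROP - 1]

    with open("list-cleaned.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")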

The slower solution:

The more flexible but longer solution is Excel. However, the latest version of Notepad++ (not the older, 32-bit version) will let you quickly take the first and vital step. You first delete the blank lines with…

Edit | Line Operations | Remove Empty Line

It’s far easier to delete blanks in a long list in Notepad++, rather than wrestling with complex ten-step workflows in Excel, just to do such a simple thing.

Then you copy-paste the list into a new Excel sheet. You then add these two Excel macros and run the first. Both run fine in Excel 2007. The first splits the column into chunks of 4 (if you have three lines per block, change all the 4s in the macro to 3s; if six lines, change them to 6s, and so on). Each chunk is placed into a new column on the same sheet.

You can then delete the offending Gibbery wibble row, which will run uniformly across the spreadsheet. In this example, it all runs across row 3.

The second macro is then run and this recombines all the columns back into a long list, and places the recombined list onto a new sheet.

The free ASAP Utilities for Excel can then ‘chunk’ this list back into blocks of four, enabling you to add a blank line between each block. Optionally, you can add the HTML tag for a horizontal rule.

The same core method can be used to re-sort the lines in each block, or to add numbering to each line as: | 1. 2. 3. 4. | 1. 2. 3. 4. | These operations are something Notepad++ can’t yet do.

The quicker solution:

If you need a quicker option, and don’t need to re-sort the lines in each data block in Excel, then try the Windows freeware List Numberer. As you can see below, once you’ve used this utility to run a simple operation, then a regex search back in Notepad++ (.*line3.* — used in Replace | Replace All) will clear all the unwanted lines.
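To illustrate the idea, after List Numberer has run, each block should carry a repeating per-line label, presumably something like this (my reconstruction, not necessarily List Numberer’s exact output)…

    line1 Random article title
    line2 Random author name
    line3 Gibbery wibble
    line4 Random journal title

…which gives the .*line3.* regex something uniform to grab onto.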

“The windy city is mighty purty / But they ain’t got what we got…”

28 Sunday Apr 2019

Posted by David Haden in Ooops!

≈ Leave a comment

The Knowledge UChicago repository for the University of Chicago appears to have recently changed its primary URL path for records.

Was: /handle/

Now appears to be: /record/

All the old /handle/ URLs now give ‘404’ with no redirects, though they are still present in Google Search for now.

Added to JURN

28 Sunday Apr 2019

Posted by David Haden in New titles added to JURN

≈ Leave a comment

Verdi Forum : the journal for the American Institute for Verdi Studies

Heroism Science

Global Business Languages

Journal of Problem Solving, The

People and Animals


Journal of Aviation Technology & Engineering, The

Reports of the NASA Specialized Center for Advanced Life Support (ALS-NSCORT)

How to save a multipage Web-book, or full set of journal articles, to a single PDF file.

27 Saturday Apr 2019

Posted by David Haden in JURN tips and tricks, Regex

≈ Leave a comment

How to save a multipage Web-book or full set of journal articles to a single PDF file.

Situation: You sometimes encounter full books online, split up into perhaps 300 or more separate HTML Web pages, each containing a bit of text from the book. You wish to re-combine this chopped-up book into a single offline PDF or ebook file, with the bits assembled in the correct order. You might want to do the same with a large journal issue. You need some Windows freeware to solve this, and don’t wish to use cloud/upload services.

Solution: A free Chrome browser plugin, and a Windows freeware utility.

Test book: Minsky’s The Society of Mind from Aurellem, with nearly 300 HTML pages. These are all linked from the main front page, which shows a linked table-of-contents.

Clicking heck!


Step 1. Install Browsec’s Link Klipper – Extract all links browser addon (for Chrome-based browsers, inc. Opera) or similar. Run it on your target Web page. Open the resulting plain-text list of extracted URL links, re-order these as needed, and then copy the list of the links you want to the Windows clipboard.

One problem you may encounter here is that the filenames may be obfuscated, as perhaps jj8er4477-j.html rather than Chapter-1.html. But it seems that Link Klipper collects the URLs in their in-HTML sequence, and thus presents them in a list in the same order. Linkclump is a good alternative browser addon, for those who need precise and manual control of the URL capture, though it is probably a bit fiddly for the first-time user to get working in that manner.

Note that Link Klipper is meant for the SEO crowd, so it can also do Regex and can save to .CSV for sophisticated link-sorting with Microsoft Office Excel.

Step 2. The genuine Windows freeware Weeny Free HTML to PDF Converter 2.0 can then accept Link Klipper’s simple URL list. Just paste it in…

Weeny is very simple to get running and will then go fetch and save each URL in order, outputting a clean PDF for each (as if it had been saved from a good Web browser). There’s no option to select repeating parts of each page to omit, it saves all-or-nothing. It can’t process embedded videos and similar interactive/multimedia elements.

During the saving process Weeny may appear to freeze, showing ‘Not Responding’, if fed hundreds of HTML pages. However, an inspection of the output folder will show that PDFs continue to be converted and dropped into it one-by-one. Thus, even if Weeny seems to choke and crash on 300+ files, it hasn’t done so. Just let it run until it completes.

If the Link Klipper URL list was in the correct sequence, then a sort ‘By Date’ of the resulting PDF files should place the book parts in their correct order, even if the filenames were obfuscated.

We could have downloaded the pages as HTML, but in practice it’s not viable to then join them up. Inevitably, there’s some broken HTML tag somewhere in the combined file, and that causes problems in the text which start to cascade down. PDF is the more robust format.

Step 3. OK, so that’s fairly quickly and easily done. But, oh joy… you’ll now have nearly 300 PDF files, all very nice-looking… but separate! Weeny is sweet software, but not very powerful and thus it doesn’t also join the PDFs together.
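(For those happy to run a short script instead of hunting for freeware, the joining step can also be done with Python and the third-party pypdf package. A minimal sketch, assuming the Weeny output sits in a hypothetical weeny-output folder and that a simple name-sort gives the right page order:)

    # Rough sketch: join all PDFs in a folder into a single file, in name order.
    from pathlib import Path
    from pypdf import PdfWriter  # pip install pypdf

    writer = PdfWriter()
    for pdf in sorted(Path("weeny-output").glob("*.pdf")):
        writer.append(str(pdf))   # appends every page of each PDF, in turn

    with open("book.pdf", "wb") as f:
        writer.write(f)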

If you have the full paid Adobe Acrobat (not the Reader) then you can combine these PDFs very easily (or ‘bind’ them, in Adobe-speak). Acrobat also offers the great benefit of file re-ordering by dragging.

You’re done, and the whole process should have taken ten minutes at most. If the font is not ideal for lengthy reading, the free Calibre can convert the PDF to .RTF or .DOCX for Word, HTML, and various eBook formats.


“But I don’t have the full Adobe Acrobat”:

For those who need freeware for this last step of combining the PDFs, you need to find one that offers ‘re-ordering by dragging’ similar to Adobe Acrobat. Such freeware is not at all easy to find. Most such Windows utilities are old and use very clunky up/down buttons for re-ordering. That’s not so useful if you have file number 298 that needs to be moved up to become file 1 — you’re only going to want to do that by dragging, not by clicking a button 297 times. Why might you need to re-order? Because, with a big book, you almost certainly got the file order a little wrong when glancing down and editing the initial URL list in Link Klipper.

Eventually, I found the right sort of free software to do the job. DocuFreezer 3.0 is free for non-commercial use, only adding a non-obtrusive watermark “Created by free version of Docufreezer”. It’s robust and good-looking 2019 software, and it needs Microsoft .NET Framework 4.0 or higher to run (which many Windows users already have).

DocuFreezer can re-sort the imported PDF files ‘By Date’, or by some slightly fiddly dragging (a feature which seems unique among such freeware). It can even OCR the resulting PDF. You just need to remember to tell it to combine and save as a single PDF, and to do the OCR…

It’s reasonably fast, if you don’t OCR. Removing the watermark, by getting the paid Commercial version, costs $50. Even so, Docufreezer’s free version is no problem if all you want is a personal offline PDF of an ebook for reading — the watermark is quite discreetly placed on the side edge of each page, in plain black lettering…

You can also see here that the embedded video, from the original HTML page, was elegantly worked around by Weeny while retaining the page’s images and font styling.


To .CBZ format:

Theoretically one could use this process to then get .JPG files, to compile offline versions of webcomics like Stand Still. Stay Silent., and other primarily visual sequential content. If you have the full Adobe Acrobat then it’s easy to save out the PDFs as big page-image .JPGs in sequence, bundle them into a .ZIP, rename the .ZIP to .CBZ file… and you’re done.

Though you may then encounter a problem in the layout. Unlike mostly-text books, webcomics and other visuals may not fit well on a single portrait-oriented PDF page, without running over. In other words, if you need to scroll down the Web page to see the whole image, then your final PDF page-flow may not be ideal.

In practice, most PDF-to-JPG freeware utilities are not viable in this workflow. I found only a few, and they either contain third-party ‘toolbars’, just don’t install on modern Windows, or produce watermarked JPGs. They would also need to offer file re-sorting by dragging, robust batch processing, and a file mask to rename the .JPG files sequentially — because it’s important for a CBZ to have its page filenaming properly sequential (0001.jpg, 0002.jpg). I’d welcome hearing of such freeware, but I don’t think it currently exists.

The better option then is simply to read the material online. Or if you really need it offline, then use a free open source website ripper such as HTTrack Website Copier to make a mirror of the website and set to only save the .JPGs to your PC. This assumes that the website doesn’t have traffic surge control or anti-ripper measures in place. But you should really be supporting the comic maker and buying their paid Kindle ebook editions.


“Ooh, does the workflow work on open access journal TOCs?”:

Yes, indeed it does. Not all open/free journals also offer a single-PDF version of each issue (containing all its articles), especially those in a more magazine-like, trade journal, or blog-like format. In such a case, one can run the above quick workflow on the issue’s TOC page, thus quickly providing yourself with a per-issue, portable, offline single-PDF of your favourite journal, for reading in the garden or at the beach. You can then run it through the free Calibre to get it into various ebook formats such as .MOBI (Kindle ereader) and .ePUB.

For a journal issue where PDF links are already present beside the HTML article links, but there are a great many PDFs, the browser addon Linkclump is your best option to grab them all.

Clicking heck, that’s a lot of PDFs in an issue! And there’s no single-volume PDF.

You can set up Linkclump to select / open / download all the PDF links (this works even with repository and OJS redirects which use /cgi/), to grab the PDFs for joining with DocuFreezer or some other free desktop PDF joiner. This method is a lot easier than fiddling around with a bulk-downloader browser addon, and picking out the PDFs from a long jumbled list of files. Or you can have Linkclump grab a list of the HTML article URLs, for processing into a PDF book with the above Klipper – Weeny – DocuFreezer workflow.


“My super-mega-combo PDF is too big”:

If your resulting PDF is too large to Send to Kindle (Amazon has a 50Mb per-file transfer limit, and many people also have very slow uplinks), then there are a couple of PDF shrinkers worth having, from the freeware but rather clunky Free PDF Compressor to the slick and easy $20 PDF Compressor V3 (I like and use the latter a lot).

The world’s remote coral reefs, mapped

26 Friday Apr 2019

Posted by David Haden in Spotted in the news

≈ Leave a comment

Newly published, a “High-resolution habitat and bathymetry maps for 65,000 sq. km of Earth’s remotest coral reefs”. It’s a new world-map of such coral reefs, in an interactive map where the data appears to be Attribution open access…

“… the Khaled bin Sultan Living Oceans Foundation embarked on a 10-yr survey of a broad selection of Earth’s remotest reef sites — the Global Reef Expedition, [producing] meter-resolution seafloor habitat and bathymetry maps developed from DigitalGlobe satellite imagery and calibrated by field observations.”

“We are particularly grateful to our long-standing partnership with Dr. Sam Purkis’ remote sensing lab at the NOVA Southeastern University Oceanographic Center. From the satellite acquisition process, to ground-truthing field work, to creating the habitat maps and bathymetry products, Dr. Purkis’ lab is world-class. Additionally, this magnificent web application was created by an outstanding project management team from Geographic Information Services, Inc (GISi). GIS, Inc. was an absolute pleasure to work with on this exciting project.”

Here’s my example zoom of the interactive map, down into the Red Sea…

The red dots in the top screenshot are the reefs, and the red dots in the last screenshot indicate the project’s video locations.

Added to JURN

25 Thursday Apr 2019

Posted by David Haden in New titles added to JURN

≈ Leave a comment

Critical Multilingualism Studies

Journal of Universal Language

Arv : Nordic Yearbook of Folklore. It’s now supposed to go OA six months after publication (“Open access: Articles printed in ARV will also be available six months after their publication”). But it looks to me like no-one told the publisher’s webmaster, or delivered the post-2016 PDFs. The PDFs are there up to 2016, but not the links. So here are the direct links to the PDFs, so as to keep them alive on Google (and thus JURN)…

2018 – appears to be missing.

2017 – appears to be missing.

Arv : Nordic Yearbook of Folklore, 2016

Arv : Nordic Yearbook of Folklore, 2015

Arv : Nordic Yearbook of Folklore, 2014

Arv : Nordic Yearbook of Folklore, 2013 [and a lone Mirror]


Plant Phenomics

Some perspective…

24 Wednesday Apr 2019

Posted by David Haden in Spotted in the news

≈ Leave a comment

“Perspectives on the open access discovery landscape”…

“An open question in the area of OA discovery is what proportion of the total academic literature is available in an open version. In this case, Crossref would be a solid starting point…”

“Under the assumption that a reliable and accurate database is available matching DOIs/titles/other queries to OA URLs, the computational effort to connect the former to the latter is relatively low.”

“In order to improve OA discovery, the scholarly communications community will have to focus on metadata. We expect that improvements in metadata on the side of institutional repositories and preprint servers would be the most effective to support OA discovery tools…”

How to resize a huge image without opening it

23 Tuesday Apr 2019

Posted by David Haden in JURN tips and tricks

≈ Leave a comment

This mini-tutorial may be useful for those working in museums and archives, or doing print work with big maps, who encounter image files so huge they can’t be opened.

Problem: You have been sent a huge image file, and you need to shrink and resize it. It’s very nice that the sender only sent an 80Mb .JPG rather than their 8Gb original map sheet. But even their 28,000-pixel .JPG can’t be opened on your puny PC. Even Photoshop and IrfanView balk at loading/viewing it. You can’t use the free GigaPXTools (for re-sizing gigapixel images without opening them) because… it prefers .TIFs and can’t work with .JPG files.

Solution: the free IrfanView can shrink it, if used in batch mode.

Method: First close down everything on your PC that’s not essential and might be eating your PC’s memory. Make sure you have the latest 64-bit IrfanView.

1. Then place your target image, on its own, in a new folder called ‘batch’.

2. Open any small image in IrfanView, to launch the software.

3. Press ‘B’ to open Batch conversion. In “Look in:” navigate to your target folder. You may have to do this a few times until your target image appears as a preview thumbnail. Do not click on the preview thumbnail (as this will almost certainly crash IrfanView)! This step will at least show you that the file is viable as an image, and not corrupted.

Now you can see why we needed the image on its own in its own folder — IrfanView has enough to do with just showing a preview of this file, let alone previewing any other images in the same folder at the same time. If even this step proves too much for IrfanView, try switching to file-name view only, and then re-starting the process.

4. Then tick “Use Advanced Options” and set “Resize” to 3000 pixels, or whatever reduced size you want. Click OK.

5. Set your target output folder. Without clicking on the Preview thumbnail, click the “Add All” button to add the target file to the list of files to be processed.

6. Click “Start Batch”. Then you go off and have a coffee, as it’s likely to take 20 minutes. A Windows message that the software is “Not responding” is normal. Eventually there will be a message from IrfanView that the batch process has completed.
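(Python users can pull off much the same trick with the Pillow imaging library: its draft mode asks the JPEG decoder to decode at a reduced scale, rather than at full size. A minimal sketch, with hypothetical filenames:)

    # Rough sketch: shrink a huge JPEG without fully decoding it first.
    from PIL import Image  # pip install Pillow

    Image.MAX_IMAGE_PIXELS = None  # lift Pillow's 'decompression bomb' guard

    im = Image.open("huge-map.jpg")
    im.draft("RGB", (3000, 3000))  # JPEG decoder scales down while decoding
    im.thumbnail((3000, 3000))     # then an exact in-memory resize
    im.save("huge-map-small.jpg", quality=90)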

Lightkey – the free version tested

21 Sunday Apr 2019

Posted by David Haden in JURN tips and tricks, Spotted in the news

≈ Leave a comment

Lightkey – Free Basic Edition. This offers the LightkeyPad text editor with smooth predictive (autocomplete) text, and the editor learns rapidly as you type. The paid version works with Microsoft Office 2010 (and higher) and the Google Chrome Web browser (only a few online apps, and not WordPress or an offline text editor working inside a browser).

But the free version of Lightkey seems fine, albeit after a download and install that seemed to take aeons. There’s also a direct download link for the .exe here, for those who want offline installs. Assume you’ll be spending a while on getting it downloaded, and then installed and up and running. But once you finally get out the other side of that slough, and the profile-building, LightkeyPad turns out to be a pleasingly simple text editor with fluid predictive auto-typing and some light-touch spelling/grammar correction.

(Note: If you installed a previous version, then uninstalled, a later re-install may hang. Running the Lightkey uninstaller as Admin should cure this).

Initially I thought it was not very British, as it wanted to offer Stanford for “Stoke-on-Trent”, but it can do the county name “Staffordshire” out of the box. And type “Stoke-on-Trent” a few times and it even gets the hang of that. As such, it’s not necessary to manually set up a personal configuration file of words. It seems the software will learn those as it goes along.

It can even cope with “Lovecraft” after you type his name a few times. “Cthulhu”, too. Four times seems to be the usual number of times you type a new word before the software “knows” it. It can’t correct “If can’t” to “It can’t”. Nor can it autoclose HTML tags, leaving you to add the URL and title in the middle.

Whether you find this freeware useful will rather depend on what sort of typist you are. Do you look down at the keyboard as you type with two fingers, or look at the screen while ‘typing blind’? That will also partly depend on when you type, as typing in low light is not so easy either way (unless perhaps you have a snazzy gamer keyboard where the letters glow in the dark).

The user presses the Tab key on the keyboard to confirm a suggested word, and doing this rapidly becomes easy and reflexive.

There’s no post-typing spell-check or grammar checker to run over the entire finished text; such checking happens only as you type. There is a word-count for the finished text, which is handy, but a final pass of spellchecking then needs to be done after pasting the text into Word or WordPress. I’m guessing there may be a one-click “send to Word” button in the Pro version of LightkeyPad.

There’s a ‘dark mode’ done in nice midnight-blues, that’s easily accessed via a big button. The icons on the top bar are neat and pleasing.

The software saves to .TXT or .RTF format.

What would improve it?

* The native text editor is painfully simple. It lacks even a simple search-replace, but I guess that would mess with the usefulness of the typing and word analysis.

* A “Select all” menu item is also curiously missing.

* It does not save its configuration. On launch it doesn’t remember the window height/width, or the chosen font from the previous session. (It’s possible to work around the font problem by saving a “template.RTF” with one’s chosen font set up, and then associating .RTF with Lightkey. This only works to fix the font problem, not the window size.) Lightkey also insists on saving to My Documents every time, instead of to the last save location. Update: these basic show-stopper problems still hadn’t been fixed in LightkeyPad in 2022. Very annoying!

* Loading time could be a touch quicker, as it’s not as instant as Notepad. But if you launch it at Windows Startup then it’s already open and is as quick to spring into action as Notepad is.

* I disliked its Taskbar ‘hidden icons’ panel icon, which — being slightly slanted — looks ugly and jaggy (because it’s not being anti-aliased). On closer inspection, however, this turns out to be the LightKey Control Center .exe, launched silently at Startup, and not LightkeyPad itself. As such, simply Ctrl-Alt-Del to get to the Windows Startup services tab, and from there disable the LightKey Control Center permanently. The LightkeyPad editor doesn’t appear to need it in order to work.

What about privacy? It does create a profile based on your “recent documents”, but there is an optional scan of these at the install point, and…

“the user’s documents and emails, along with their typing data, will NOT be sent or collected by Lightkey’s servers in any way, as they are the user’s private property.”

But some sort of unique cumulative hash from the “typing data” may be being sent, and as such you’ll want to install offline and then uncheck this item ASAP…

It also learns from your typing, so that unique word/keystroke record is presumably being stored somewhere on your PC. That’s potentially a valuable ‘fingerprint’, from which things like business secrets might be somehow reconstructed. In which case a firm won’t want such data slipping out of the user’s PC and being sent off to Whereizitagin. I’m not suggesting any malfeasance here on the part of the makers, but just that you need to be sure that such a local file — if it exists — is truly secure.

Overall Lightkey is an interesting development in genuine freeware, and even out of the box it’s not as annoying as you might think an auto-complete text editor would be. I loathe always-wrong Web search auto-complete as much as the next user, but this software does auto-complete and substitutions quite nicely, and gets in the way as little as possible. As such, I think I have a new first-draft composition Notepad replacement here, and will try it as a replacement for a few weeks.


Desktop alternatives?

* The free Notepad++ can do some Auto Completion natively, albeit via a method that’s not changed in over a decade and in a fiddly-looking way that’s aimed at coders. It can even do autoclose, with the help of several coder-focussed plugins. There’s also the Presage plugin for Notepad++, though that was last updated in 2015. There is what looks like a Presage Windows standalone, but it actually seems like it just runs a service at Windows startup that’s required by the Notepad++ plugin.

* If you really need search-replace in this sort of free software, PredictEd 1.1 is open-source freeware from 2018, similar to Lightkey. But it’s very basic in design and far more clunky in its method.

* Predictive Tab Key Auto Complete for Chrome browsers. Again, free, and the only somewhat-recent one on the Chrome store. Not updated since 2015, and it’s about as annoying as you’d expect when composing a WordPress blog post. Very very fast, but not accurate and doesn’t appear to learn in any way I could discern. It would be great to see something like this that could be constrained only to a list of user-defined words and phrases (see Auto Text Expander below), but this doesn’t appear to be it.

* Windows? Well, apparently Windows 10 has a “Show text suggestions as I type” setting, but I have no idea if it’s more than ‘Microsoft Clippy reborn’.

* In Word, one can set up custom autocorrects: select Word Options | ‘Proofing’ | ‘AutoCorrect Options’ | ‘Replace Text as you type’. #h can be set to autocorrect to href, for instance. The free LibreOffice apparently offers AutoCorrect that works with words longer than eight letters, and can be similarly customised. But who wants to launch either of these lumbering behemoths, just to do what should be done in Notepad?

* Paid? The $130 Typing Assistant 8.x works with any Windows software and Web browsers, does what it says, and is 64-bit and actively developed. Which is presumably why it’s so expensive. The old 32-bit abandonware Smart Type Assistant for Windows is similar, and is now free.


There are also paid ‘abbreviation managers/expanders’ like PhraseExpress and Breevy and FastFox, but they are: i) expensive and mostly way too complicated for such a simple task; and ii) rely on you being able to remember the abbreviation, or to remember to place a # in front of the start of a word, or which letter to stop at to trigger the expansion without overtyping. ShortKeys Lite is a rather ancient and very clunky free choice here, and there are a couple of equally ancient Notepad++ plugins whose makers used descriptions such as ‘snippet’ and ‘substitution’.

Similar Web browser addons are ProKeys, Auto Text Expander and Text Blaze (beta). Also the right-click Paste Email. The latter can paste any snippet from a right-click, not just an email address.

All these are useful for adding your signature, email addresses and boilerplate text, but in practice are not so useful for composing ‘in the flow’.
