
News from JURN

~ search tool for open access content


Monthly Archives: December 2016

The tyranny of “relevance” sorting

Friday 30 Dec 2016

Posted by David Haden in My general observations


The tyranny of “relevance” sorting is rather wearing. Why is “relevance” the unchangeable default for so many kinds of search results? Such results are very rarely “relevant” (Google Search aside), and more often than not I’m looking for a “by date” ordering: I’ve been to the site before, and now I just want to see what’s new. If there’s one innovation I’d like to see in 2017, it’s a robust browser add-on that can be taught to identify a site’s relevance/date toggle and then automatically switch it to “by date”.
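As a rough illustration of the “taught per site” idea, here is a minimal Python sketch of the rule logic only. It assumes the site keeps its sort order in a URL query parameter; the site names and parameter names in `SORT_RULES` are hypothetical, and a real add-on would be a JavaScript browser extension that might instead have to click the page’s sort control.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical per-site rules: (query parameter holding the sort order,
# value meaning "by date"). Real sites use different names, so each one
# would have to be "taught" to the add-on.
SORT_RULES = {
    "example.org": ("sort", "date"),
    "journals.example.net": ("order", "newest"),
}


def prefer_date_sort(url: str) -> str:
    """Return the URL with its sort parameter set to "by date",
    or unchanged if there is no rule for the site."""
    parts = urlsplit(url)
    rule = SORT_RULES.get(parts.hostname or "")
    if rule is None:
        return url
    param, date_value = rule
    # Keep the other query parameters and replace the sort one
    # (the sort parameter ends up last).
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, date_value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `prefer_date_sort("https://example.org/search?q=folklore&sort=relevance")` gives `https://example.org/search?q=folklore&sort=date`, while URLs from sites with no rule pass through untouched.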

Excel example sheet: Sort a list to retain only Names and remove the all-lowercase words

Tuesday 27 Dec 2016

Posted by David Haden in JURN tips and tricks


Here’s a working Microsoft Excel 2007 .xlsx file (11 KB) with a simple formula that splits a word list according to the case of each word’s starting letter. For instance, suppose you have a list that runs…

Frodo
Merry
Pippin
riders
Gandalf
Sam
ponies
Strider
mushrooms

You want to remove all the words that do not start with a capital letter, since they are unlikely to be personal names, place-names, species names and so on. Excel can’t do this ‘out of the box’, at least not with the various Sort buttons available in Excel 2007. Nor can add-ins such as ASAP Utilities. The spreadsheet produces a list with the all-lowercase words pushed down to the bottom, thus…

Frodo
Merry
Pippin
Gandalf
Sam
Strider
ponies
riders
mushrooms

It won’t work properly if you also have words in your list with a capital letter after the first letter, such as “naZgul”. Those words will be flagged as if they start with a capital letter. Numbers, on the other hand, are fine.

sort_lowercase_excel
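The spreadsheet’s formula isn’t reproduced here, but the same idea can be sketched in Python. This sketch assumes the test is “does the word contain no capital letters at all?”, which would explain the “naZgul” caveat; in Excel a helper column along the lines of `=EXACT(A1,LOWER(A1))` performs that check, though that is not necessarily the formula used in the file. A stricter first-letter test is included as an alternative.

```python
def is_all_lowercase(word: str) -> bool:
    """True when lower-casing the word changes nothing, e.g. 'riders'.

    As with the spreadsheet caveat, 'naZgul' is NOT treated as all-lowercase.
    """
    return word == word.lower()


def starts_with_capital(word: str) -> bool:
    """Stricter alternative that looks only at the first character."""
    return word[:1].isupper()


def push_lowercase_down(words, is_name=lambda w: not is_all_lowercase(w)):
    """Stable partition: likely names first, then everything else.

    Both groups keep their original order.
    """
    names = [w for w in words if is_name(w)]
    others = [w for w in words if not is_name(w)]
    return names + others


words = ["Frodo", "Merry", "Pippin", "riders", "Gandalf",
         "Sam", "ponies", "Strider", "mushrooms"]
print(push_lowercase_down(words))
```

Here the lowercase block keeps its original order (riders, ponies, mushrooms), so it may differ slightly from the spreadsheet’s output. Passing `is_name=starts_with_capital` sends “naZgul” to the bottom with the other lowercase-initial words.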

A survey of automated book index making software

Tuesday 27 Dec 2016

Posted by David Haden in JURN tips and tricks


Updated: 13th July 2020.

Want to home-brew a classic “back of the book” index from a Word file, ideally using freeware? Here are all the current software options I could find:


* TExtract can handle a wide variety of input files and seems to be favoured by professional book indexers. Prices run from $79 (use for a single title) to $595 (buy outright). It seems likely to take a while to learn.


* WordEmbed. £80. An MS Word macro that helps automate the process whereby your pre-made book index is slotted in as an intrinsic ‘living/linked’ part of the MS Word document. It seems to be well regarded as a helping hand, but it does not create the index in the first place. It’s unlikely to be used by amateurs, but you might mention it to your hired low-cost ebook freelancer: they might be interested in learning how to use it and thus adding to their skills-base.


* PDF Index Generator. $69.95, with a free demo limited to the first ten pages of the book. It creates a basic automatic index, which you then trim back and supplement as needed.

Version 2.4 added a new feature, described as: “new query template has been added to allow indexing capitalized phrases”. It works this way: get to “Step 2” in the initial PDF import | “Include words” | Click on pencil icon | “Add Query” | Choose “Capitalised Phrases” from the dropdown | this then forms Query 1 | Make sure Query 1 is ticked, and “Index these words only” | OK.

You now have a far more useful starting point for a first-pass index, with all your place-names and personal names already picked out…

There’s also a filter to switch names over to “surnames, forenames” order. You can stack filters and/or run multiple indexes and then merge them (see the video tutorial at the 3-minute mark), and thus work in stages.

You’d then un-tick the irrelevancies and cut out the mis-steps, and then go through your book manually and add to the index various concepts and ideas which readers might want to look up. That wouldn’t be the end of making a polished index, but it’d be a big chunk of the grunt-work done.
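PDF Index Generator’s own query engine isn’t documented here, so the following is only a rough Python approximation of that workflow, assuming you already have each page’s text. A regex picks out runs of capitalised words with the pages they appear on; a naive helper flips “Forename Surname” to “Surname, Forename” (it assumes the last word is the surname); and a third helper merges indexes from separate passes. The regex and the surname rule are assumptions, not the program’s actual method.

```python
import re
from collections import defaultdict

# A run of one or more capitalised words, e.g. "Gandalf" or "Minas Tirith".
# Rough heuristic: sentence-initial words ("At", "The") will also match.
CAP_PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def capitalised_phrases(pages):
    """Map each capitalised phrase to the sorted page numbers it occurs on.

    `pages` is a list of page texts; page numbering starts at 1.
    """
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for match in CAP_PHRASE.finditer(text):
            index[match.group()].add(page_no)
    return {term: sorted(nums) for term, nums in sorted(index.items())}


def surname_first(name: str) -> str:
    """'Forename Surname' -> 'Surname, Forename' (naive: last word is the surname)."""
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def merge_indexes(*indexes):
    """Combine several term -> pages indexes, e.g. from separate passes."""
    merged = defaultdict(set)
    for index in indexes:
        for term, nums in index.items():
            merged[term].update(nums)
    return {term: sorted(nums) for term, nums in sorted(merged.items())}
```

Run on the two pages “Frodo Baggins left the Shire.” and “At Bree, Frodo met Strider.”, the phrase finder also returns “At Bree”: exactly the kind of mis-step you’d then cut out by hand.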

A note on Java:

However, useful as such automation is, note that PDF Index Generator requires you to install Java to run it, and having Java installed on your PC these days is a major and ongoing security risk…

Network World reported that in 2014 U.S. Homeland Security… “recommended users uninstall Java completely” throughout the USA. In 2014 PC Magazine advised: “Users should either uninstall Java, disable it entirely in the browser, or take other steps to protect themselves from attacks against Java.” In 2015 InfoWorld magazine wrote that “in 2015, it’s really, really tempting [for a network admin] to simply uninstall Java from user machines.” In 2017 even JavaWorld wrote, of yet more new and critical vulnerabilities, that… “Users should uninstall Java from their systems”.

Still… one might safely install Java on an old laptop with sufficient memory and run the software there, quarantined from your main PC. Or, for a one-time use on your main PC, you might: i) download the standalone Java installer; ii) disconnect from the Internet; iii) install Java and then PDF Index Generator; iv) do your indexing output and refining work; v) completely uninstall Java, and then reconnect to the Internet. Only with the standalone (full, about 58 MB) Java installer and the Internet disconnected does the installer NOT collect and send your system fingerprint to a remote location at Oracle, makers of Java. After installing, you should also look through the Java Security settings and disable things like Web browser integration (most Web browser makers block all Java plugins by default, but it’s best to check).

Update, July 2020: As of PDF Index Generator 2.9…

The Windows edition of the program now comes with Java embedded inside it, so you don’t have to worry about installing the right Java edition to run the program.


* Index Generator is un-crippled freeware for PDFs. It’s more basic than PDF Index Generator (above), lacking things like Phrase Query filters, but is quite capable and easy to use. I found that it doesn’t require an install of Java to launch or work. It’s available for Windows, Mac and Linux (the latter two do seem to require Java?). The very major drawback is that it currently appears to lack any Query ability to select only capitalised items such as Names and Place Names, and seems to actually case-shift every word in its pick-list to lower-case! Still, it’s in active development, and we may well see it catching up with PDF Index Generator over time.


* For a simple table of word | language | times used, the free Calibre ebook management and conversion software can also give you a quick list of all the words in an ebook. Calibre’s word table can then be exported to .csv and sorted in MS Excel. To access it from inside Calibre: load your ebook and convert it to ePub (the report only works with the ePub format) | click the tiny top-right “more” arrows | drop down the extra hidden toolbar | Edit Book | Tools | Reports | Words | Save…

The original Word file’s capitalisation is retained in the resulting Calibre list. On loading the list into Excel and sorting for capitalised words, one can thus quickly create a rough checklist of important names, for reference when selecting words in the likes of Index Generator (which regrettably appears to have no such ‘show capitalised name words only’ function).
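Outside Calibre, the word | times used part of that table, plus the capitalised-words checklist, can be approximated in a few lines of Python. This sketch assumes you already have the book as plain text; it omits Calibre’s language column, and its definition of a “word” (the `WORD` regex) is an assumption, not Calibre’s.

```python
import csv
import io
import re
from collections import Counter

# Runs of letters, optionally joined by an apostrophe or hyphen
# ("Sam's", "place-names"). Digits and underscores are excluded.
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def word_table(text: str) -> Counter:
    """Count every word, keeping its original capitalisation."""
    return Counter(WORD.findall(text))


def capitalised_checklist(counts: Counter):
    """Capitalised words only, most frequent first (ties in alphabetical order)."""
    caps = [(w, n) for w, n in counts.items() if w[:1].isupper()]
    return sorted(caps, key=lambda item: (-item[1], item[0]))


def to_csv(rows) -> str:
    """Render (word, count) rows as CSV text, ready to open in Excel."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["word", "times used"])
    writer.writerows(rows)
    return buffer.getvalue()
```

As with the Excel route, sentence-initial words such as “The” will land in the checklist, so it stays a rough reference list rather than a finished index.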


* Indiscripts’ IndexMatic 2 plugin for Adobe InDesign (which is Adobe’s flagship DTP software).


Possibly someone will eventually whip up a script to automatically check if a word or phrase in an index has a corresponding Wikipedia or Infogalactic page, thus offering another way to filter a word-list down to the more important items.
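A minimal Python sketch of such a script might look like this. It assumes that an HTTP 200 response from the usual article URL (spaces turned into underscores) is signal enough; redirects and disambiguation pages will count as “found”, Infogalactic is not covered, and the official MediaWiki API would be a more robust and polite route for large lists. The User-Agent string is a hypothetical placeholder.

```python
import urllib.error
import urllib.parse
import urllib.request


def wikipedia_url(term: str, lang: str = "en") -> str:
    """Build the usual article URL: spaces become underscores, then percent-encode."""
    title = urllib.parse.quote(term.strip().replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/{title}"


def has_wikipedia_page(term: str, timeout: float = 10.0) -> bool:
    """True if the article URL answers HTTP 200 (redirects are followed).

    Network errors other than HTTP error codes are left to the caller.
    """
    request = urllib.request.Request(
        wikipedia_url(term),
        method="HEAD",
        headers={"User-Agent": "index-checker-sketch/0.1"},  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:  # e.g. 404 for a missing article
        return False


def filter_terms(terms, exists=has_wikipedia_page):
    """Keep only the index terms for which `exists` returns True."""
    return [t for t in terms if exists(t)]
```

Because `exists` is a parameter, the filtering step can be tried out offline with a stand-in function before pointing it at the live site.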

Google goes deeper

Tuesday 27 Dec 2016

Posted by David Haden in Spotted in the news


It seems that JURN’s search results have become even more precise over the last year, if a new report by Searchmetrics is to be believed…

“the study found the URLs for pages that feature in the top 20 search results are about 15% longer on average than in 2015. Searchmetrics said this is likely because Google is better able to identify and display the precise pages that answer the search intention, and these pages are more likely to have longer URLs because they possibly lie buried deeper within websites.”

Added to JURN

Thursday 22 Dec 2016

Posted by David Haden in New titles added to JURN


Journal of Burmese Scholarship

Exhibition (journal of the U.S. National Association for Museum Exhibition, with a two-issue partial paywall)

Conservar Patrimonio (Portuguese art conservation journal, partly in English)

Fixed the indexing of the scielo.org aggregation sites, to reduce their clutter in search results. Specifically, several of the Scielo sites recently introduced an ‘export’ page for each and every citation. These ‘export’ pages are now blocked from JURN’s results.
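The post doesn’t say how the blocking is configured, so as a generic illustration only, here is a Python sketch of wildcard URL exclusion. The pattern is hypothetical; the real Scielo ‘export’ URLs may be shaped differently.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion pattern: any Scielo URL with "export" in its path.
# In fnmatch patterns, "*" also matches "/" characters.
EXCLUDE = [
    "*scielo*/*export*",
]


def keep_result(url: str, patterns=EXCLUDE) -> bool:
    """False if the URL matches any exclusion pattern (case-sensitive)."""
    return not any(fnmatchcase(url, p) for p in patterns)
```

A pattern this broad would also drop any genuine article whose URL happened to contain “export”, which is why real exclusions are usually pinned to a specific path segment.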

Metadata 2020

Saturday 17 Dec 2016

Posted by David Haden in Spotted in the news


The launch of Metadata 2020 is reported to have slipped to early 2017. They’re apparently hoping that the big publishers will release all their metadata for open public use, and will flag their open access articles with uniform publicly-discoverable tags. Good luck with that one.

Added in 2016

Friday 16 Dec 2016

Posted by David Haden in JURN metrics, New titles added to JURN


For those interested in end-of-year OA tallies, I can report that this blog recorded a total of 340 journals added to JURN in 2016. Nearly all those titles publish in English on topics in the humanities or the natural world. Adding the worthy foreign-language journal URLs also added in 2016, the total of OA journals added to JURN might be around 500. That makes 2016 a somewhat slower year than 2015, when 450 new English-language titles were added.

Added to JURN

Thursday 15 Dec 2016

Posted by David Haden in New titles added to JURN


Beyond Philology

Journal of Global Theatre History

Fundamina : A Journal of Legal History

JURN’s annual linkrot check completed

Thursday 15 Dec 2016

Posted by David Haden in My general observations


JURN’s annual full link-check + repair is now complete. The indexed URLs are normally checked in August/September, so this year’s check ran a few months late, mostly because it took a few months, on and off. Each URL’s presence on Google Search is checked at the indexed path, e.g. http://www.site.com/journal/articles/pdfs/.. etc, and not just at http://www.site.com/ etc.

This checking is in addition to the weekly linkbot-enabled checking of the homepage URLs in the Directory.
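A weekly homepage check of this kind is easy to script. Here is a minimal Python sketch, not JURN’s actual linkbot: it fetches each URL and reports anything that doesn’t end in a 2xx response, with 0 meaning no answer at all. The status function is a parameter so it can be swapped for a stand-in when testing; the User-Agent string is a hypothetical placeholder. Feeding it deep indexed paths as well as homepages would catch breakages a homepage-only check misses, though it does not attempt the Google Search presence check described above.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 15.0) -> int:
    """Final HTTP status for a URL (redirects followed), or 0 if unreachable."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "linkcheck-sketch/0.1"})  # hypothetical UA
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        return err.code
    except OSError:  # DNS failure, refused connection, timeout, etc.
        return 0


def broken_links(urls: Iterable[str],
                 status: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Map each URL that did not answer 2xx to the status it gave."""
    results = {url: status(url) for url in urls}
    return {url: code for url, code in results.items()
            if not 200 <= code < 300}
```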

DOAJ study in New Library World

Thursday 15 Dec 2016

Posted by David Haden in Spotted in the news


I see there’s a new 2016 study of the DOAJ in New Library World (Vol. 117, 11/12, pages 746-755). The researchers found that in the DOAJ…

“roughly 20-25% of the [journal homepage] URLs redirected to another URL” but that “only 2.11% of 9,073 journals [proved] to be inaccessible”

… once the redirects were followed.

Two automated tests of all 9,073 titles were run one month apart, pinging each journal’s homepage (using home-brewed Excel wizardry rather than dedicated linkbot software). The researchers followed this up with a manual check of the URLs of all the still-inaccessible journals.
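The study’s two-pass design can be sketched in Python. This is not the researchers’ Excel method, only an illustration: a journal goes forward to manual checking only if it failed in both automated runs. As a sanity check on the quoted figure, 2.11% of 9,073 is 0.0211 × 9,073 ≈ 191.4, so it corresponds to roughly 191 journals, and 191/9,073 does round back to 2.11%.

```python
def still_inaccessible(first_run, second_run):
    """Journals that failed in BOTH automated runs: candidates for a manual check."""
    return sorted(set(first_run) & set(second_run))


def share(count: int, total: int) -> float:
    """Percentage of `total`, rounded to two decimal places."""
    return round(100 * count / total, 2)
```

Intersecting the two runs filters out journals that were only briefly offline during one of them, which is presumably why the tests were spaced a month apart.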

The research seems to have been quite thorough, although I’d observe that a homepage URL is far less likely to break than the deeper direct article URLs on the DOAJ’s table-of-contents pages. Article page and PDF URLs can easily break, for instance when a journal moves from WordPress to OJS or vice versa. A similar test might usefully be run on a sample of DOAJ article URLs, although I must say that I haven’t noticed any problem on the DOAJ in that respect.

I see that Bentham Open (aka Bentham Science Publishers, not directly indexed in JURN) provided 67 of the inaccessible titles. For some reason these are still in the DOAJ after the recent purge, but my quick tests of the DOAJ’s Bentham URLs found all of those tested to be unresponsive. That was last night; I tested them again today and they were still unresponsive. So I’m not too worried about them popping up in JURN results (via the DOAJ indexing), and I presume that the DOAJ will remove them fairly soon for 404-ing.

Subscribe: RSS News Feed.
I'm on Patreon!

JURN:

  • JURN : directory of ejournals
  • JURN : main search-engine
  • JURN : openEco directory
  • JURN : repository search

Related sites:

  • 4 Humanities
  • Academic Freedom Alliance
  • Accuracy in Academia
  • Alliance Defending Freedom
  • ALPSP
  • alt.academy
  • AMIR
  • Anterotesis
  • Arcadia project
  • Art Historicum (German)
  • AWOL
  • Beall's List (updated at 2018)
  • Beall’s List (old)
  • Beyond Search
  • Bibliographic wilderness
  • Booktwo
  • Campus Reform
  • Charleston Advisor
  • Coalition for Networked Information
  • Communia (public domain watchdog)
  • Cost of Knowledge
  • Council of Editors of Learned Journals
  • Dan Cohen
  • Digital Koans
  • Digital Shift
  • Dissernet (Russian anti-plagiarism)
  • DOAJ
  • Don't Block TOR
  • eFoundations
  • EIFL
  • Electronic Frontier Foundation
  • ELO
  • Embargo Watch
  • ePublishing Trust for Development
  • Facebook: Arab Open Access
  • Facebook: Italian Open Access
  • Facebook: Open Access India
  • Film Studies for Free
  • FIRE
  • Flaky Academic Conferences
  • Found History
  • Foundation for Individual Rights in Education
  • Free Speech Union (UK)
  • Google Algorithm
  • Heterodox Academy
  • Iconclass
  • IFLA Serials blog
  • ImpactStory
  • infoDocket
  • InTech Blog
  • Jinfo (formerly Free Pint)
  • Kindle blog
  • L'edition Electronique (French)
  • La Criee : periodiques (French)
  • Leader Statement Database on Free Speech
  • National Association of Scholars
  • National Coalition of Independent Scholars
  • Neil Beagrie
  • OA Lookup : Policies
  • OA Working Group
  • OASPA
  • Online Searcher
  • Open Access Bibliography
  • Open Access Week
  • Open and Shut?
  • Open Electronic Publishing
  • Open Folklore
  • Open Knowledge Maps
  • Open Library of Humanities
  • Periodiques en ligne (French)
  • Peter Murray Rust
  • PKP / OJS
  • Project Gutenberg
  • Publishing Archaeology
  • RBA Blog
  • Reclaim the Net
  • Research Information
  • Research Remix
  • Right to Research
  • River Valley TV
  • ROARS (Italian)
  • Scholarly Electronic Publishing
  • Scholarship Matters
  • Searchblox
  • Searcher
  • Serials Cataloger
  • Serials Review
  • Society of Young Publishers
  • Speech First
  • TaxoDiary (taxonomies news)
  • Taxpayer Access
  • Tentaclii
  • The Scholarly Kitchen
  • Thoughts from Carl Grant
  • Web Scale Discovery
  • Zotero blog

Some of the libraries linking to JURN

  • Boston College Libraries
  • Brooklyn Public Library, NY
  • Duke University
  • Kobe University, Japan
  • Rhode Island College
  • San Jose State University
  • UConn Stamford
  • University of California
  • University of Cambridge (Casimir Lewy Library)
  • University of Cambridge (main)
  • University of Canberra
  • University of Toronto
  • Washington University
  • West Virginia University

Spare BitCoins? Please send donations to JURN via: 17e2KGuyzjzEEE7BsoYTwMo3MtUod6DrjP

