The tyranny of “relevance” sorting is rather wearing. Why is “relevance” the unchangeable default for so many kinds of search result? It’s so rarely actually relevant (Google Search aside), and more often than not I want a “by date” ordering: I’ve been to the site before, and now I just want to see what’s new. If there’s one innovation I’d like to see in 2017, it’s a robust browser add-on which can be taught to identify a site’s relevance/date toggle and then auto-switch to “by date”.
Here’s a working Microsoft Excel 2007 .xlsx file (11kb) that has a simple formula to split a word list according to the case of each word’s starting letter. For instance, you have a list that runs…
You want to remove all the words that do not start with a capital letter, since those are unlikely to be personal names, place-names, species names and so on. Excel can’t do this ‘out of the box’, at least not with the various Sort buttons available in Excel 2007. Nor can plugins like ASAP Utilities. This spreadsheet results in a list with the all-lowercase words pushed down to the bottom of the sorted list, thus…
It won’t work properly if you also have words in your list with a capital letter after the first letter, such as “naZgul”. Those words will be flagged as if they start with a capital letter. Numbers, on the other hand, are fine.
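The spreadsheet’s formula isn’t reproduced here, but its described behaviour can be sketched in a few lines of Python — a rough equivalent, not the author’s actual formula. A word counts as “capitalised” if it contains any capital letter, which reproduces both quirks above: “naZgul” stays in the upper group, and numbers are fine:

```python
def case_sort(words):
    """Push all-lowercase words to the bottom of a word list.

    str.islower() is True only for words that contain letters and no
    capitals, so "naZgul" and "42" both stay in the upper group --
    matching the spreadsheet's described behaviour. Python's sort is
    stable, so original order is kept within each group.
    """
    return sorted(words, key=str.islower)

print(case_sort(["aardvark", "Nazgul", "naZgul", "zebra", "Mordor", "42"]))
# → ['Nazgul', 'naZgul', 'Mordor', '42', 'aardvark', 'zebra']
```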
Want to home-brew a classic “back of the book” index from a Word file, ideally using freeware? Here are all the current software options I could find:
* TExtract can handle a wide variety of input files and seems to be favoured by pro book indexers. From $79 (use for a single title) to $595 (buy outright). Seems likely to take a while to learn.
* WordEmbed v3.11. £80. An MS Word macro that helps automate the process of slotting your pre-made book index into the Word document as an intrinsic ‘living/linked’ part of it. It seems to be well regarded as a helping hand, but it is not an automated maker of the index in the first place. Not likely to be used by amateurs, but you might mention it to a hired low-cost ebook freelancer, who might be interested in learning how to use it and thus adding to their skills base.
* PDF Index Generator. $69.95, with a free demo limited to the first ten pages of the book. Create a basic automatic index, and then trim back and supplement it as needed. Note that it requires Java to run, and having Java installed on your PC these days is a major security risk.
* Index Generator 5.5 is un-crippled freeware for PDFs. It’s more basic than PDF Index Generator (above) but is quite capable and easy to use. I found that it doesn’t require Java to launch or work. For Windows, Mac and Linux. Could make it a lot easier to hand off the indexing to a low-cost ebook freelancer, and get something worthwhile back. Could be used in conjunction with the free Calibre (see below).
* For a simple table of word | language | times used, the free Calibre ebook management and conversion software can also give you a quick listing of all the words in an ebook. Calibre’s word table can then be exported to .csv and thus sorted in MS Excel. To access it from inside Calibre: load your ebook and convert it to ePub (the report only works with the ePub format) | click the tiny top-right “more” arrows to drop down the extra hidden toolbar | Edit Book | Tools | Reports | Words | Save…
The Word file’s word capitalisation is retained in the resulting Calibre list. Loaded into Excel and sorted for capitalised words, this quickly yields a rough checklist of important name items, for reference when selecting words with the likes of Index Generator (which regrettably has no ‘show capitalised name words only’ function).
It seems that JURN’s search results have become even more precise over the last year, if a new report by Searchmetrics is to be believed…
“the study found the URLs for pages that feature in the top 20 search results are about 15% longer on average than in 2015. Searchmetrics said this is likely because Google is better able to identify and display the precise pages that answer the search intention, and these pages are more likely to have longer URLs because they possibly lie buried deeper within websites.”
Exhibition (journal of the U.S. National Association for Museum Exhibition, with a two issue partial paywall)
Conservar Patrimonio (Portuguese art conservation journal, partly in English)
Fixed indexing of the scielo.org aggregation sites, to make them less verbose in search results. Specifically, several of the Scielo sites recently introduced an ‘export’ page for each and every citation. These ‘export’ pages are now blocked from JURN’s results.
The launch of Metadata 2020 is reported to have slipped to early 2017. They’re apparently hoping that the big publishers will release all their metadata for open public use, and will flag their open access articles with uniform publicly-discoverable tags. Good luck with that one.
For those interested in end-of-year OA tallies, I can report that this blog recorded a total of 340 journals added to JURN in 2016. Nearly all those titles publish in English on topics in the humanities or the natural world. If the 340 were combined with the worthy foreign-language journal URLs also added in 2016, the total OA journals added to JURN might be around 500. Which makes it a somewhat slower year than 2015, in which 450 new English-language titles were added.
JURN’s annual full link-check and repair is now complete. The checking of the indexed URLs is normally done in August/September, so this year it has run a few months late, mostly because it took a few months, on and off. URL presence on Google Search is checked down to the indexed path (http://www.site.com/journal/articles/pdfs/.. and the like), not just to http://www.site.com/ itself.
This checking is in addition to the weekly linkbot-enabled checking of the homepage URLs in the Directory.
I see there’s a new 2016 study of the DOAJ in New Library World (Vol. 117, 11/12, pages 746-755). The researchers found that in the DOAJ…
“roughly 20-25% of the [journal homepage] URLs redirected to another URL” but that “only 2.11% of 9,073 journals [proved] to be inaccessible”
… once the redirects were followed.
Two automated tests were run (using home-brewed Excel wizardry, rather than dedicated linkbot software) on all 9,073 titles, one month apart, pinging each journal’s homepage. The researchers then followed up with a manual check on the URLs of all the still-inaccessible journals.
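For comparison, the same sort of homepage check can be sketched with Python’s standard library — a rough outline of the idea, not the study’s actual method, and the function names and buckets below are my own:

```python
import urllib.error
import urllib.request

def classify(original_url, final_url, ok):
    """Bucket a homepage check the way the study describes:
    accessible, accessible-only-after-redirect, or inaccessible."""
    if not ok:
        return "inaccessible"
    if final_url.rstrip("/") != original_url.rstrip("/"):
        return "redirected"
    return "accessible"

def check_homepage(url, timeout=15):
    """Fetch a journal homepage, following redirects as a browser would."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # resp.geturl() is the URL after any redirects were followed
            return classify(url, resp.geturl(), resp.getcode() < 400)
    except (urllib.error.URLError, OSError):
        return "inaccessible"
```

Running `check_homepage` over a list of 9,073 homepage URLs, a month apart, and hand-checking anything still marked “inaccessible” would reproduce the shape of the study.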
The research seems to have been quite thorough, although I’d observe that a homepage URL is far less likely to be broken than the deeper direct article URLs on the DOAJ’s table-of-contents pages. Article-page and PDF URLs are easily broken, for instance by a journal moving from WordPress to OJS or vice versa. A similar test might usefully be run on a sample of DOAJ article URLs, although I must say I haven’t noticed any such problem on the DOAJ.
I see that Bentham Open (aka Bentham Science Publishers, not directly indexed in JURN) provided 67 of the inaccessible titles. For some reason they are still in the DOAJ after the recent purge, but my quick tests on the DOAJ’s Bentham URLs found every one tried to be unresponsive, both last night and again today. So I’m not too worried about them popping up in JURN results (via the DOAJ indexing), and I presume the DOAJ will soon have them out for 404-ing.
The huge hi-res maps service of the Perry-Castaneda Library Map Collection is now inviting small donations. Their service is free and open, cleanly organised and exposed to the public via search engines. Note that there are several forms to wade through to donate, and it looks like it may be ‘credit cards only’. I think they may do better if they also put a simple and swift PayPal button on the front page.
How to get a free and approximate audio transcription via YouTube’s automated transcription:
1. Use the free Audacity or other desktop audio software to split your .mp3 into segments of less than 15 minutes each (I assume that’s still YouTube’s limit; adjust to whatever upload limit applies in future).
2. Upload each .mp3 to YouTube as a “Public” video via TunesToTube, a free service that pairs your .mp3 with a single still picture and uploads the resulting video to YouTube.
3. Then go to YouTube and find your Channel, click the Settings cog on the uploaded video, and turn on “Automatic Subtitling”.
4. Wait a minute or so for the subtitles to be made. Then go to DownSub.com to download and save the video’s subtitles as an .srt standard subtitles file.
5. Get the Open Source Subtitle Edit 3.5 desktop software. Load the .srt file. In Subtitle Edit: File -> Export -> Plain Text.
6. Load the resulting text into Word, then edit and correct it. It’s accurate enough for a ‘speech radio’ type podcast, though it will arrive with little punctuation and will need some polishing.
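If you’d rather skip Subtitle Edit, steps 4 and 5 above collapse into a few lines of Python — a sketch of the .srt-to-plain-text conversion (function name is my own):

```python
def srt_to_text(srt):
    """Strip an .srt subtitle file down to its spoken text, dropping
    the cue numbers and '00:00:01,000 --> 00:00:04,000' timing lines."""
    kept = []
    for block in srt.strip().split("\n\n"):  # cues are blank-line separated
        for line in block.splitlines():
            if line.strip().isdigit() or "-->" in line:
                continue
            if line.strip():
                kept.append(line.strip())
    return " ".join(kept)

sample = """1
00:00:01,000 --> 00:00:04,000
hello and welcome to the show

2
00:00:04,500 --> 00:00:08,000
today we talk about indexing"""

print(srt_to_text(sample))
# → hello and welcome to the show today we talk about indexing
```

Paste the output straight into Word for step 6.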
You can of course get willing hands around the Web to transcribe, but you have to pay them (it’s surprisingly affordable) and there’s usually at least a 12 hour turnaround time. The above method would help you to meet a much tighter deadline.
The African Open Science Platform has just launched as a pilot…
“The Africa-wide initiative will promote the development and coordination of data policies, data training and data infrastructure. The pilot phase, launched today, is supported by the South African Department of Science and Technology (DST), funded by the National Research Foundation (NRF), directed by CODATA, the Committee on Data of the International Council for Science (ICSU) and implemented by the Academy of Science of South Africa (ASSAf).”
The pilot aims to:
* coordinate initiatives already underway.
* encourage shared investment in infrastructure.
* circulate good ideas and practice.
* develop the capacities of individuals and institutions.
* promote key applications of relevance to Africa.
* be a conduit to international open data.
Free and Open Access to Biodiversity Data: 800+ per-publisher index pages, with their dataset titles listed and linked in a TOC-like manner, have been added to JURN.
Evil megacorp Elsevier has launched a new Elsevier Datasearch for finding datasets. As you might expect, a very large proportion of the links lead to pages which ask the visitor to ‘pay Elsevier $39.95 to access this’.
The Publishing Research Consortium has a new report, “Early Career Researchers: the harbingers of change?”…
“there are no recent investigations into the extent to which their behaviours may prove transformational. This qualitative study of ECRs from seven countries, a first report of a longitudinal study, tracks communication and publication behaviour, and attitudes to peer review, collaboration, sharing, open access, social media and emerging impact mechanisms.”
Mostly that seems to be “change” as in, “We still have a physical library? How quaint and delightfully old-fashioned”.
Leaving Twitter for something better? New today, a handy concise article, “How to Own & Display Your Twitter Archive on Your Website in Under 10 Minutes”.