Open Marginalis, medieval marginalia in open access.
A new guide “to defeating tracking traps that could identify document leakers”, such as academic journal articles from behind paywalls…
“Harding’s popular lecture detailed the watermarking and metadata techniques used to identify works and listed tools that can identify and circumvent both mechanisms.”
CACTUS Tourism Journal (Tourism, not cacti)
“SFX Miscellaneous Free Ejournals Target: Usage Survey Among the SFX Community”, Serials Review (2015), 41(2), pp. 58-68.
SFX is an Open URL link resolver product for university libraries, focussed on the output of traditional publishers — of which 16-20% is apparently so dodgy in terms of quality that it breaks the system. Yet, rather amazingly, it appears that much of this 16-20% is still allowed to get to the point-of-use.
The article briefly surveys recent findings on how SFX copes with open access articles, and then the rest of the paper gives the results of a survey of librarians who integrate a specific ‘free’ section of SFX with their library discovery tools. It appears that scholars looking for open free full-text via SFX can expect way over 20% dead link errors on URLs…
… one category [of failure] (incorrect parse params) alone leads to 20% false positives (dead links) for MFE [the largest ‘free’ target in SFX]. Besides incorrect parse params, there are numerous other reasons for the occurrence of false positives (dead links), such as resolver translation error, inaccurate embargo data, provider target URL translation error, incomplete provider content, wrong coverage dates, indexed-only titles mistakenly considered as fulltext titles, and other reasons listed in the literature review section.”
So that might mean… perhaps 40% of links to open access full-text are dead? The article doesn’t hazard a guess.
The DOAJ ‘targets’ are apparently not much better…
“It’s an irony that I find discovery services generally have much poorer coverage of Open Access than Google Scholar. … Most discovery services have indexed DOAJ (Directory of Open Access Journals), but many libraries experience such a bad linking experience they just turn it off” (Aaron Tay, July 2015).
I’m pleased to say that JURN should have close to zero dead links on standalone journals, due to the way it is set up. JURN may lead to a few fleeting “server maintenance” / “timeout” errors here and there, but if the journal’s base URL for articles moves then its articles effectively get auto-removed from JURN’s results. But they get found again within a year at most, through an effective two-pronged method.
AWOL has a fascinating post today. It’s on the attempts to identify which AWOL linked resources have already been ingested into major long-term Web archives, and which haven’t. As part of that experiment Charles and his helpmate Ryan have offered their readers a nice big cleaned A-Z list of the “52,020 unique URLs” linked from AWOL, which is very good of them. I might clip these URLs back and de-duplicate, and then do a side-by-side sheet with JURN’s own indexing URLs and thus see what’s missing from JURN. Very little in terms of post-1945 journal articles, I suspect, though there may be some I’ve missed.
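The clip-back, de-duplicate and compare step might be sketched roughly as follows. This is only an illustration: the two short URL lists are invented placeholders, and "clipping back" is assumed here to mean reducing each URL to its host plus first path segment, a rule that may well not suit every journal site.

```python
from urllib.parse import urlsplit


def clip(url):
    """Reduce a URL to host plus first path segment (an assumed 'clip back' rule)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parts.path.split("/") if s]
    first = segments[0] if segments else ""
    return f"{host}/{first}" if first else host


def missing_from_index(candidate_urls, indexed_urls):
    """Return clipped candidate URLs that have no counterpart in the index."""
    indexed = {clip(u) for u in indexed_urls}
    candidates = {clip(u) for u in candidate_urls}  # the set also de-duplicates
    return sorted(candidates - indexed)


# Placeholder data, not real AWOL or JURN entries.
awol_urls = [
    "http://www.example.org/journal-a/issue1/article1.pdf",
    "http://example.org/journal-a/issue2/",
    "http://example.net/journal-b/index.html",
]
jurn_urls = ["http://example.org/journal-a/"]

print(missing_from_index(awol_urls, jurn_urls))
```

Anything the function returns would still need checking by eye, since a journal can move host or sit several folders deep on a university server.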
Of course a JURN Search already runs across the AWOL pages, as well as a great many of the post-war full-text originals (via Google). But if I were an Ancient History scholar I might now be tempted to get together with others to crowdfund a mass download of AWOL’s full-text, so that I could search across the full-text locally and minutely, without having to rely on Google etc. I reckon the entire set of AWOL full-text would fit on a 1.5Tb external drive and would cost around $10,000 to harvest by hand/eye. Why would that be needed? I’m assuming that many long-term Web archives are ‘dark’ or that license complications mean no single archive can ingest the entirety of what AWOL points to.
My calculations for the $10k figure start with the fact that a little over 10,000 of AWOL’s 52,020 URLs are straight-to-PDF links, and so very easily downloaded by a harvesting bot. Assuming an average of 5Mb per PDF, that means a little over 50Gb of disk storage space for those PDFs.
If one then assumes that perhaps 10,000 of the URLs are not going to articles (rather to such things as sites that show scans of original source manuscripts and old books that display in zoomable and frame-nested forms etc, huge datasets, that are difficult to extract and archive), then that might leave 32,000 URLs that are most likely to be links to either journal TOC pages or individual articles.
Let’s assume that each of the 32,000 TOC page URLs leads to an average of 16 articles and reviews (though some 2,000 may be home-page links sitting above links to issue TOCs). So 32,000 x 16 = 512,000 articles of some kind, in PDF or HTML, on average weighing 1.5Mb each. So that’s 768Gb for those articles. In that case one might easily store all the AWOL-discovered full-text on an $80 1.5Tb external disk, and have space to spare for the desktop indexing software’s own index, which would be fairly big. That is a product that I might find very useful, if I were an Ancient History student, specialist, or independent scholar without access to university databases.
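The storage estimate can be rechecked with a few lines of arithmetic. The inputs are the rounded figures from the paragraphs above (10,000 direct PDF links rather than "a little over 10,000"), and decimal units (1,000 Mb per Gb) are an assumption of this sketch.

```python
MB_PER_GB = 1000  # assumed decimal units

# Rounded inputs taken from the estimate above.
direct_pdfs = 10_000
avg_pdf_mb = 5
toc_urls = 32_000
articles_per_toc = 16
avg_article_mb = 1.5
disk_gb = 1500  # a 1.5Tb external drive

direct_gb = direct_pdfs * avg_pdf_mb / MB_PER_GB
articles = toc_urls * articles_per_toc
article_gb = articles * avg_article_mb / MB_PER_GB
total_gb = direct_gb + article_gb

print(f"Direct PDFs: {direct_gb:.0f} Gb")
print(f"Articles via TOC pages: {articles:,} items, {article_gb:.0f} Gb")
print(f"Total: {total_gb:.0f} Gb, leaving {disk_gb - total_gb:.0f} Gb free")
```

Changing the average file sizes is the quickest way to see how sensitive the total is to those two guesses.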
But how to harvest those 512,000 articles? The brute force way would be to parcel up the 32,000 URLs into parcels of 150 each. That’s about 214 parcels x 150 URLs. If one were paying 20 cents per URL to Indian freelancers, to go in and spend up to 3 minutes grabbing whatever articles are hanging off each of those 150 page URLs, plus the page, then that would cost $30 per parcel. Let’s say $40, with a quality bonus. Let’s say it takes four hours to do the 150 URLs and not miss anything. So that’s $10 U.S. an hour, which is pretty good for an Indian freelancer with broadband; I don’t think anyone would be being exploited on that deal. So the whole 32,000 URL set would cost about $8,560 to harvest by hand and eye, which seems well within the range of a small crowdfunding campaign.
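The parcel arithmetic is easy to recheck. A minimal sketch, taking the parcel size, the rounded $40 payment and the four-hour estimate as given, and assuming that a final part-filled parcel is paid at the full rate:

```python
import math

toc_urls = 32_000
urls_per_parcel = 150
pay_per_parcel = 40    # dollars, including the quality bonus
hours_per_parcel = 4   # the estimate given above

# Round up, so that the last part-filled parcel is still counted.
parcels = math.ceil(toc_urls / urls_per_parcel)
total_cost = parcels * pay_per_parcel
hourly_rate = pay_per_parcel / hours_per_parcel

print(f"{parcels} parcels, ${total_cost:,} in total, ${hourly_rate:.2f} per hour")
```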
Of course, it might be that the articles could be wholly or partly harvested by bot. But I suspect that a simple “page + anything it links to” harvest would bring in a lot of chaff alongside the articles, given the very varied and non-standard nature of what AWOL links to. Perhaps that wouldn’t matter in practice, when keyword searching across the entire harvest. Or one might be able to use a more intelligent bot, one using Google Scholar-like article-detection algorithms.
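To make the "chaff" problem concrete, here is a rough sketch of a link filter that a simple bot might apply to each harvested page. The rule used (keep links ending in .pdf or with /article/ in the path) is a crude assumption for illustration only, and is certainly not how Google Scholar detects articles. The sample page and its URLs are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def looks_like_article(url):
    """Crude, assumed heuristic: PDFs and '/article/' paths only."""
    path = urlsplit(url).path.lower()
    return path.endswith(".pdf") or "/article/" in path


def article_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.links if looks_like_article(u)]


# Invented sample page, standing in for a journal TOC page.
sample_page = """
<a href="/about.html">About</a>
<a href="issue1/paper1.pdf">Paper 1</a>
<a href="/index.php/jrnl/article/view/12">Paper 2</a>
<a href="http://ads.example.com/banner">Advert</a>
"""

for link in article_links(sample_page, "http://example.org/journal/"):
    print(link)
```

A filter this simple would miss HTML articles at unusual paths, which is exactly where the hand-and-eye approach earns its money.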
AdBlock Plus’s Element Hiding Helper has updated. It no longer resides on the right-click mouse menu. You need to enable the top menu bar button for AdBlock (View|Toolbars|Customise), then it launches from a drop-down from that icon.
The new method of selecting a block to hide takes a minute of getting used to. If you can comprehend nested HTML code at a glance then it’s not necessarily easier than before, since it’s now trickier to identify the master container DIV for the whole block you want to hide. However, other users will probably find it a bit easier and more visual to use.
Element Hiding Helper is useful for customising “noisy” websites such as newspaper front pages, which blast you with celebrity news sidebars, scrolling tickers, sports sections and other regular items you never read.
Checked and repaired the linkrot on the 400+ URLs in the preliminary directory of ecology/nature related titles indexed by JURN. Revised the corresponding indexing URLs in the main JURN database, if needed.
Google indexes images from PDF files. Fairly limited at present, possibly because the pictures all seem to be drawn from a small set of 500 PDFs stored at http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/. But I’d guess that, as Google’s machine-learning algorithms get tuned up on this, we may start to see the service expanded to extract and serve images from wild PDFs. I wonder if there will be a Creative Commons filter for images from open access research PDFs? I also wonder if this may enhance the size of the image pool accessible via JURN’s new Image Search feature?
[ Hat-tip: ResearchBuzz ]
Xenu’s Link Sleuth is a free desktop Windows linkbot (Web link checker). It’s quite possibly the only one now available, at least for those who prefer not to punt their URLs into a Web-based service. Xenu’s Link Sleuth has been out of development since September 2010, but is still more up to date than the old Linkbot Pro 5.x (which needs to be run in Windows XP compatibility mode on newer versions of Windows). Xenu LS looks very similar to Linkbot and it works in much the same manner on Windows 8.x. It has about the same speed, maybe a little quicker — but that may be because it is more impatient on waiting for timeouts.
Xenu LS can “treat redirections as errors”, often very useful for detecting moved pages, but this feature may need to be enabled in the Advanced panel. It’s not as useful as Linkbot in this regard, because it doesn’t show you the new and old URLs side-by-side. Just the original URL and an “object permanently moved” or “object temporarily moved” flag. This makes it harder to detect if an OJS journal installation, for instance, is just passing a visitor over to the “current issue” page of a journal or is sending visitors to a more anomalous URL.
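For the few URLs where the destination matters, the old and new addresses can be put side by side with a short script. This is a sketch using only Python's standard library, not a description of how Xenu or Linkbot work internally. The function names are my own, some servers refuse HEAD requests, and the example URL is invented.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_LABELS = {
    301: "permanently moved",
    302: "temporarily moved",
    307: "temporarily moved",
    308: "permanently moved",
}


def fetch_status(url, timeout=10):
    """Send one HEAD request, without following redirects.

    Returns the status code and the Location header (or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(url, status, location):
    """Show the original and redirected URLs side by side."""
    label = REDIRECT_LABELS.get(status)
    if label is None or not location:
        return f"{url} -> status {status}"
    return f"{url} -> {urljoin(url, location)} ({label})"


# Offline illustration with an invented response; for a live check use
# describe(url, *fetch_status(url)) instead.
print(describe("http://example.org/journal", 301,
               "/index.php/journal/issue/current"))
```

With the two URLs on one line it becomes easy to see whether a journal is simply handing visitors on to its “current issue” page or sending them somewhere more anomalous.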
Sadly some large sites, such as Hathi, block being visited by a Xenu installation. Presumably this is because they see it as a species of URL harvester. The way that Xenu identifies itself to servers cannot be spoofed, unfortunately. Thankfully the number of such URLs seems to be very small (Hathi, CIA, uMich.edu, for me). If, however, your own starting server is blocked in that manner, then the simple workaround is to locate the URL list page on your hard-drive and check from your local copy.
I’m pleased to announce that the JURN blog now has a new page offering various ways to support JURN. The new page will host a (hopefully growing) range of digital download items that will help support JURN in terms of my time, the cost of hosting and general digital shoe-leather. I doubt I’ll sell more than a dozen or so downloads a year, but even that would help support JURN — and a few more sales beyond that might give me funds for marketing or to attend a UK open access conference.
So, first up is my $23 Microsoft Publisher magazine template. This is a substantial 28-page royalty-free template for the popular Microsoft Office Publisher 2013 (and higher) software. I designed it in the style of a quality “small town” quarterly magazine, one aimed specifically at those wanting to sustain and revive a small American town or neighbourhood. But it can also be easily adapted to suit your own special interest, business sector, university or location. Just drop in your own pictures, and paste in new texts.
You can help support JURN by posting this news to Facebook or Twitter, or by suggesting the template to anyone who uses MS Publisher.
The images below show some sample page spreads from my new template…
(Cover photography by Colin Garrow, all other pictures are Wikipedia or CC0)
Is your 2013 free FeedDemon Pro 4.5 becoming annoying, in terms of its occasional ‘15-second freeze’ problem on Windows? It’s nice software but is no longer under development. So I’ve taken another look around for alternative desktop RSS readers that are under active development, seeking something a touch faster but with the same or better features.
QuiteRSS offers search within your feeds, though only at the level of a per-folder search…
Also font size and font choice, across all display panes. So there is now one acceptable actively developed desktop Windows alternative to FeedDemon Pro. Which is good to know.
The only other — albeit unacceptable — actively developed desktop option is RSSOwl 2.2.1. Sadly it requires you to have Java installed to run it, which no one with any brain would install these days. Java is still an ongoing and massive security risk in Windows. Though you may have a workplace that forces your PC to have the Java environment installed, in which case you’ll find that RSSOwl does offer a particularly nice feature — to create new internal keyword-based feeds that selectively draw content from all your other feeds. So you could set up a series of wide-spectrum feeds, ignore them, but draw on them to create a new single key-phrase delimited feed. QuiteRSS isn’t yet that sophisticated.
JURN’s “A short guide to free academic search” guidance page has been link-checked and repaired.
Repozitar is a unified search tool for Czech open repositories. By default their keyword search only returns records which offer full-text. A nice touch, and it makes one wonder why the English-speaking world’s repository search tools seem to have such trouble offering this simple useful feature.
Repozitar is associated with a searchable nationwide registry of Czech theses, seemingly part of a Masaryk University project to help detect plagiarism in theses and student papers. English abstracts appear to be common in the very detailed record pages.
Retraction Watch has been given a $400,000 grant from the John D. and Catherine T. MacArthur Foundation, “to create a comprehensive database of retractions, allowing us to hire our first staff writer”.
Depending on the form it takes this could potentially be indexed by JURN? It would have to be one retraction, one page, and have the OA status indicated in the URL path — http://www.database.fuz/articles/oa/article725.html
I’m pleased to say that JURN’s annual summer repair-and-update process is now complete. All that remains is for me to roll out some donation-ware goodies in the next week or so, and add 40 or so URLs for additional geoscience journals over the coming weeks.
A new blog post from Aaron Tay, “5 things Google Scholar does better than your library discovery service”, looking at the huge market advantages enjoyed by Google Scholar. The main points in summary:
* Intake and update: Google intakes, refreshes and updates very quickly.
* Automated detection: The Google bot spots and indexes academic articles wherever those are located.
* Relevancy ranking: It’s certainly not perfect, but is vastly better than anyone else’s.
* Clear and fast: Simple interface, a few useful widgets and filters. Additional features are accessed only via typed-in search modifiers or the well-hidden “Advanced” form.
* Cross-platform: Scholar can be tweaked to become a seamless gateway into paid subscription services.
I would also add…
* De-duplication in results. Not always perfect, not always even seen by the end user, but pretty intelligent.
Here’s another quick group test of academic search tools that index open access or otherwise free academic papers. It follows JURN’s recent large number of additions of ecology related sources. The test search is on the popular topic of “mountain gorillas”, with a tourism keyword that is intended to skew results toward papers and chapters useful for understanding the inter-relationship of gorillas with tourism. Not a very sophisticated search, but the sort of thing that an age 16-18 college student or undergraduate might input.
Search: “mountain gorillas” tourism
JURN group test: “mountain gorillas” tourism
July 2015. Searching for free full-text academic articles, theses, reports or book chapters in English. I clicked through on possible results and evaluated.
|Search tool||Useful results||Notes|
|Journal Click||?||Now requires registration / payment to use, and the search box has been removed. Thus it was not tested. It performed very poorly in previous tests.|
|DOAJ||0||Used ‘Article’ search. Zero from one result.|
|JournalTOCS||0||Zero from one result.|
|Paperity||0||Checked first 25 results. Closest possibility seemed to be the general short survey article “Exploring Sustainable Tourism in Nigeria for Developmental Growth”, but on investigation the text had no mention of gorillas.|
|Journal Seek||0||Zero results.|
|PQDT Open||0||Zero from five results.|
|Ingenta Connect||0||Zero from three results.|
|CORE||0||Filtered search by English language, full-text only. Looked at first three pages of results. Results were a disparate jumble of general tourism items, though CORE did manage to bring the political anthropology dissertation “Lines in the sand: An anthropological discourse on wildlife tourism” to the top, but this was only tangentially relevant.|
|Microsoft Academic||1||1 from eight results. “Measuring the demand for nature-based tourism in Africa”, a UK economics experiment asking potential tourists about their likely choices around a hypothetical visit to see the mountain gorillas in Rwanda.|
|OATD||1||1 from two results. 2014 PhD thesis, asking if tourism reduces poverty-related forest mis-use by local people, in the Volcanoes National Park in Rwanda, a key mountain gorillas tourism destination.|
|OAlib||1||OAlib gave a jumble of general results for tourism in mountains, but had nothing specific on the first page for either Africa or gorillas. Second page had the 2011 article “Extreme Conservation Leads to Recovery of the Virunga Mountain Gorillas” at PLOS One, among another jumble of irrelevant results.|
|Google Search||1||Used a Web browser not signed in to Google, forced Google.com results (not .uk). Newspapers (Guardian, Daily Mail, CNN, FT etc) and magazine (National Geographic) articles, amid charity and tourist holiday booking sites. Got one good result, the World Bank’s report “The success of tourism in Rwanda – Gorillas and more”, as result No.15. Checked the first thirty results. A short interview by the Breakthrough Institute, “Extreme Conservation of Gorillas”, was judged too journalistic and tangential to be a result.|
|OpenAIRE||1||The one likely candidate, 2001’s “Ecological and economic impacts of gorilla-based tourism in Dzanga-Sangha, Central African Republic”, proved to have no full text available. But trying a different search access point into OpenAIRE surfaced one useful item, “Habituation, ecotourism and research for conservation of western gorillas in Central African Republic”.|
|Mendeley||2|| Searched ‘Articles’ only, then filtered for Open Access articles only. After the first ten results, results dissipated into general/unrelated tourism items. One useful result provided some deep historical background to the current tourism: “Memories of Walter Baumgartel (1902-1997): pioneering promoter of the mountain gorillas of Uganda”. Another was more about the general conservation measures, but useful, “Extreme conservation leads to recovery of the Virunga mountain gorillas”.|
|Digital Commons Network (BePress)||2||I switched out of the Arts and Humanities section for this search. I had 17 results, two of them strong, with another three being very broad critical studies of aspects of eco-tourism aesthetics.|
|FreeFullPDF||5||From 26 results. Three tourism items (“Measuring the demand for nature-based tourism in Africa” in which gorillas was used as test topic; “The success of tourism in Rwanda – Gorillas and more”; “Development AND gorillas? Assessing fifteen years of integrated conservation and development in south-western Uganda”), and “Memories of Walter Baumgartel (1902-1997)”. Plus two partially relevant items on general conservation (“Extreme Conservation Leads to Recovery of the Virunga Mountain Gorillas”; and “Sustainable Conservation of Bwindi Impenetrable National Park and community welfare improvement”).|
|BASE||5||I chose the facet to “boost open access documents”. 24 results, with many duplicates. Some possible results turned out to lack full-text. One promising article, “Benefits to the poor from gorilla tourism in Rwanda”, proved to be paywalled at $76(!).|
|Google Scholar||6||Checked first 40 results. Results tended to focus strongly on gorilla disease, diet, mating and population dynamics. But among these were three full-text open papers on ape tourism and disease transfer to/from them, which had not been surfaced in the test before (“Habituating the great apes: the disease risks”; “Ape tourism and human diseases: how close should we get”; “Anthropozoonotic … infections in habitats of free-ranging human-habituated gorillas, Uganda”). Plus another three, including a pirate copy of “Who is on the gorilla’s payroll? Claims on tourist revenue from a Ugandan National Park”, and the World Bank report “The success of tourism in Rwanda: Gorillas and more”, plus the ubiquitous PLOS One article “Extreme conservation leads to recovery of the Virunga mountain gorillas”. Many of the full-text links offered at Scholar came via researchgate.net.|
|OPENDoar||10||Examined first 40 results. The World Bank report “The success of tourism in Rwanda: Gorillas and more” was at No.4, followed by the ubiquitous PLOS One article “Extreme conservation leads to recovery of the Virunga mountain gorillas”. Some duplicates. One prospective item (“Evaluating the prospects of benefit sharing schemes in protecting mountain gorillas in Central Africa”) led to a $38 paywall whereas JURN found it free, while others (“The role of tourism in post-conflict peacebuilding in Rwanda”) led to records that had no full-text. Most useful was the indexing of the German-run on-the-ground Gorilla Journal, offering articles such as community opinion research among local people, “Gorilla Habituation and Ecotourism – a Social Perspective” (June 2014); “Western Gorilla Tourism: Lessons Learned from Dzanga-Sangha” (Dec 2006); and “Ten Years of Gorilla Tourism in Mgahinga” (June 2004). However, these three article titles were not highlighted in search and were instead deeply embedded in single issue PDFs of Gorilla Journal. (I regret that Gorilla Journal is not yet indexed in JURN, but it will be added soon).|
|JURN||15||Looked at first 40 results, the link titles of which are given below. There were a number of duplicates in the first four pages. A key finding is that JURN is now large enough to easily provide strong results through to result No.100. So, given a well-formed search, people who are habituated to just look at the first ten results in Google should explore the full set of 100 results in JURN.|
1. * “The success of tourism in Rwanda – Gorillas and more”.
2. “Extreme Conservation of Gorillas”.
3. * “Evaluating the Prospects of Benefit Sharing Schemes in Protecting Mountain Gorillas in Central Africa”.
4. “Human Metapneumovirus Infection in Wild Mountain Gorillas, Rwanda”.
5. “The Success of Tourism in Rwanda: Gorillas and More” (duplicate of No.1).
6. “Conserving critically endangered central African Mountain Gorillas from poaching threats”.
7. * APE TOURISM AND HUMAN DISEASES: How Close Should We Get?
8. “Dian Fossey’s Controversial “Active Conservation” Proves Useful in Increasing Mountain Gorilla Awareness”.
9. * Best Practice Guidelines for Great Ape Tourism (78 page book from the IUCN)
10. “Diversity of Microsporidia, Cryptosporidium and Giardia in Mountain Gorillas”.
11. “(Gorilla beringei beringei) in Bwindi Impenetrable” (mis-titled in results link, actually has main title “Landscape predictors of current and future distribution of mountain gorillas”)
12. * “Economics of Gorilla Tourism in Uganda”.
13. * “Extreme Conservation Leads to Recovery of the Virunga Mountain Gorillas”.
14. “Genetic census reveals increased but uneven growth of a critically endangered mountain gorilla population”.
15. “Murdered: the Virunga Gorillas” (National Geographic article from 2008, on pressures from militias, refugees and charcoal burners).
16. “Mountain Gorillas: Three Decades of Research at Karisoke”.
17. “Cambridge Books Online” (Free book chapter from Cambridge University Press, “Long-term research and conservation of the Virunga mountain gorillas”, from the book Science and Conservation in African Forests).
18. “The Success of Tourism in Rwanda: Gorillas and More” (another duplicate of No.1).
19. “Impacts of tourism and recreation in Africa” (Encyclopedia of Earth, short introductory article by the U.N.).
20. * “Gorilla-based Tourism: a Realistic Source of Community Income in Cameroon? Case study of the villages of Goungoulou and Karagoua”.
21. “Gentle Gorillas, Turbulent Times” (National Geographic article from 1995).
22. “Mountain Gorilla PHVA Final Report 1997”.
23. “Consequences of Non-Intervention for Infectious Disease in African Great Apes”.
24. * “VIRUNGA MASSIF SUSTAINABLE TOURISM DEVELOPMENT PLAN”. (2005. A useful baseline for understanding what was expected of the gorilla tourism in Rwanda).
25. * “Chimpanzee Tourism in Mahale Mountains National Park, Tanzania”. (Not gorillas, but included because possibly useful for comparison).
26. * “THE RWANDAN GORILLA PROJECT” (Detailed charity prospectus proposal to UK investors, for a gorilla tourism venture. Another useful baseline for understanding what was expected of the gorilla tourism in Rwanda, from the investor point of view).
27. * “Development AND gorillas? Assessing fifteen years of integrated conservation and development in south-western Uganda”.
28. “Population dynamics of the Bwindi mountain gorillas”.
29. * “Evaluating the prospects of benefit sharing schemes in protecting mountain gorillas in Central Africa”. (Free full-text at JURN, but behind a $38 paywall at OPENDoar — see the OPENDoar entry given above).
30. “Dian Fossey’s Controversial “Active Conservation” Proves Useful in Increasing Mountain Gorilla Awareness” (Duplicate of No.8).
31. * “THE ECONOMIC VALUE OF THE MOUNTAIN GORILLA PROTECTED FORESTS (The Virungas and Bwindi Impenetrable National Park). Final Report”. (Has 12 pages of rigorous examination of the value of gorilla tourism).
32. “Evaluating the prospects of benefit sharing schemes in protecting mountain gorillas in Central Africa”. (Duplicate).
33. * “From vision to narrative: A trial of information-based gorilla tourism in the Moukalaba-Doudou National Park, Gabon”.
34. “From vision to narrative: A trial of information-based gorilla tourism in the Moukalaba-Doudou National Park, Gabon”. (Duplicate of No.33).
35. Diversity of Microsporidia, Cryptosporidium and Giardia in Mountain … (Duplicate of No.10)
36. * “Gorilla Tourism: Uganda uses tourism to recover from decades of violent conflict”.
37. “Plumptre et al 2003 Current status of gorillas” (Cambridge University free book chapter, “The current status of gorillas and threats to their existence at the beginning of a new millennium”)
38. “Community-based forest enterprise development for improved livelihoods and biodiversity conservation: a case study from Bwindi World Heritage site, Uganda” (Short, and rather too tangential, but useful in showing the gorilla tourism in the context of other micro-livelihoods such as honey, oyster mushrooms, handicrafts, growing passion fruits and Irish potatoes).
39. “Bwindi Impenetrable National Park, Uganda” (Encyclopedia of Earth, short introductory article by the U.N.).
40. “20 Years of IGCP: Lessons Learned in Mountain Gorilla Conservation”.
Results stayed on-topic for mountain gorillas and/or related tourism right through to result No.100, with another 10 or so results that would have been very useful — but which were not counted for the purposes of this test.
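Most of the duplicates flagged in the list above differ only in capitalisation and punctuation, so a simple normalised-title comparison would catch them. The sketch below is my own illustration of that idea, not a description of how Google or JURN handle duplicates. The titles are taken from the results list.

```python
import re


def title_key(title):
    """Lower-case a title and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def drop_duplicates(titles):
    """Keep the first occurrence of each normalised title, in order."""
    seen = set()
    unique = []
    for title in titles:
        key = title_key(title)
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


results = [
    "The success of tourism in Rwanda – Gorillas and more",
    "Extreme Conservation of Gorillas",
    "The Success of Tourism in Rwanda: Gorillas and More",
    "Evaluating the Prospects of Benefit Sharing Schemes in Protecting "
    "Mountain Gorillas in Central Africa",
    "Evaluating the prospects of benefit sharing schemes in protecting "
    "mountain gorillas in Central Africa",
]

for title in drop_duplicates(results):
    print(title)
```

Truncated link titles, such as the one at No.35, would slip past a check like this and would need a looser prefix match.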
I’m pleased to say that JURN is now nearer to becoming a useful search tool for open access journal articles in the geosciences. JURN already had moderately good coverage of this science, but after discovering the American Geosciences Institute’s handy directory of Open Access Journals in geosciences I have been able to index a further 30 geoscience journals. I’ve also started to add a list of around 40 further journals that were previously missing from JURN — most of the remaining missing journals are from small nations such as Portugal, Finland, Belgium, Hungary, etc, and these will be added over the next month or so. In using AGI to compile a list of the missing geoscience journals I’ve taken care to consider only journals from reputable publishers (the AGI’s directory appears to have an open policy of listing all applicants).