New from MIT Press and under CC, Shadow Libraries: Access to Educational Materials in Global Higher Education (PDF). Also available in paperback via Amazon etc. Surveys the evolution of the trend that has today become Sci-Hub, Libgen.io etc.
“An Evidence-Based Review of Academic Web Search Engines, 2014-2016”… “This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years”.
Useful. Interesting snippets from this excellent new summary survey:
* “Weiss noted, ‘no critical studies seem to exist on the effect that Google Books might have on the contemporary reference experience’ (Weiss 2016, 293). […] Research is badly needed about the coverage and utility of both Google Books and Microsoft Academic.”
Seriously? None, not one single study from 2005-2015? For one of the most important innovations in books since Gutenberg? Wow. That’s one hell of a grudge you’re holding there, librarians.
* “In September 2016, Hug et al. […] noted Microsoft Academic has “grown massively from 83 million publication records in 2015 to 140 million in 2016” […] As of February 2017 its index contains 120 million citations.”
Great news, which means I’ll have to take another look at that. I’m overdue for doing another big ‘group test’ of OA coverage in public search-engines, so this news may spur that. Of course, “citations” are not full-text, but 120m is impressive.
* “Bonato noted Google Scholar retrieved different results with Advanced and Basic searches”
So that’s another thing to take into account if I do another group-test this summer.
* A “glaring lack of research related to the [search] coverage of arts and humanities scholarship” [and specifically] “Little is known about coverage of arts and humanities by Google Scholar.” [and it is evident that arts and humanities scholars’] “preferences and behavior […] cannot be inferred from the vast literature focused on the sciences.”
* “research concerning the use of academic web search engines by undergraduates, community college students, high school students, and other groups would be welcome.”
* “Scholar results have been said to contain “clutter””.
This is the closest the paper comes to mentioning all the predatory journals and similar dubious items, which get dragged into Scholar by automated collection bots.
* “During interviews of 20 historians by Martin and Quan-Haase (2016) concerning serendipity, five mentioned Google Books and Google Scholar as important for recreating serendipity of the physical library online.”
Yes, serendipity is vital. Search-based research is really more of a loosely chain-linked set of serendipity loops, interspersed with deep-dives to get tiny confirming nuggets of fact (e.g.: was Borges correct when he suggested that The Time Machine’s famous central motif of ‘the future-flower’ was almost certainly not influenced by a striking passage in Coleridge’s notebooks? Yes he was, and he presumably checked by a private letter of enquiry to some learned bibliophile in London. But he was characteristically recondite on this point in the essay, and thus can only be proved correct if you do the 30-minute deep-dive into the primary sources needed to get the exact month-of-publication dates in 1895).
* “arts and humanities scholars […] commonly expressed the belief that having a complete list of research activities online improves public awareness [with] the enormous potential for this tool’s use.”
Might be more useful to have a rolling listing of what’s not being done, but which needs to be done. Sort of like a speculative Kickstarter, only you’d gather people rather than cash.
* “Gardner (2016) showed […] people working in the humanities and religion and theology prefer to use Google”. “Humanities scholar use of Google over Google Scholar was also found by Kemman et al. (2013); Google, Google Images, Google Scholar, and YouTube were used more than JSTOR or other library databases”
* “Namei and Young’s comparison of Summon, Google Scholar, and Google using 299 known-item queries. They found Google Scholar and Summon returned relevant results 74% of the time; Google returned relevant results 91% of the time.”
* “In Yang’s (2016) study of Texas Tech’s DSpace IR [the university repository], Google was the only search engine that indexed, discovered, or linked to PDF files supplemented with metadata; Google Scholar did not discover or provide links to the IR’s PDF files, and was less successful at discovering metadata.”
I’m guessing this possibly illustrates the value of separating a university’s big dumpy Digital Collections from the nimble research repository, by putting them on different domains? Texas Tech’s DSpace has them both cheek-by-jowl, and adds a Law repository for good measure.
* “IR platform and metadata schema dramatically affect discovery, with some IRs nearly invisible (Weideman 2015; Chen 2014; Orduña-Malea and López-Cózar 2015; Yang 2016) and others somewhat findable by Google Scholar (Lee et al. 2015; Obrien et al. 2016).”
* “Another area needing investigation is the visibility of links to free full text in Google Scholar.” [and more generally] “retrieval of full text, which is another area ripe for more research studies, especially in light of the impressive quantity of full text that can be retrieved without user authentication.” […] “When will academic users find a good-enough selection of full-text articles that they no longer need the expanded full text paid for by their institutions?”
There are also good formulations of four future-research questions specific to the arts and humanities (pages 27-28).
A survey of the state-of-play in providing ‘plain English’ summaries of journal articles. With a spreadsheet list of the titles which currently offer such summaries.
Newly announced for the UK…
“Today Jisc announced that OCLC, the global library cooperative, has been awarded the contract to develop a new national bibliographic knowledgebase (NBK).”
Judging by the initial press-release, the focus seems likely to rest first on cohering UK academia’s metadata management for digital book collections. This will in time…
“enable shared bibliographic metadata to flow into … global search engines”
Hopefully that means Google Search, as well as Google Scholar (which are two separate systems and databases).
“Availability of digital object identifiers in publications archived by PubMed”, 3rd January 2017. For…
“the period 1966–2015 (50 years). Of the 496,665 articles studied over this period, 201,055 have DOIs (40.48%).”
So just under 60% are without DOIs, and that’s for the biomedical literature in PubMed, albeit when including thirty years of pre-1995 coverage (before the mass Internet). More recently, for 2015, the study found that 13.5% of new content was still without a DOI.
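The percentages can be checked directly from the two counts quoted from the study. This is only a quick arithmetic sketch using those two figures; the variable names are my own.

```python
# Counts as quoted above from the PubMed DOI study (1966-2015 sample).
total_articles = 496_665
articles_with_doi = 201_055

# Share of sampled articles that carry a DOI, and the remainder that do not.
share_with_doi = articles_with_doi / total_articles * 100
share_without_doi = 100 - share_with_doi

print(f"With a DOI:    {share_with_doi:.2f}%")
print(f"Without a DOI: {share_without_doi:.2f}%")
```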
The DOI-free figures for the humanities will be far higher, according to “Availability of digital object identifiers (DOIs) in Web of Science and Scopus”, February 2016…
“Many journals related to the Natural Sciences and Medicine with considerable impact have no DOI. Arts & Humanities WoS [Web of Science] categories have the highest percentage of documents without DOI.” … “exceeding 50% only since 2013. The observed values for Books and Proceedings are even lower despite the importance of these document types …”
As for DOI availability within articles in repositories, IRUS-UK provides a “DOI Summary” field giving “the numbers and percentages that have DOIs available” in UK repositories, although the access to their datasets is controlled. IRUS-UK has no summary infographics that I could find, relevant to DOI availability. But it would be interesting to determine what proportion of UK repository free/open journal articles have DOIs.
A new Medium article, from the head of Ingenta Connect, “Is the Open Access discoverability problem solvable? And whose problem is it?”. It’s a cursory look at the problem, but even then it’s interesting for what it doesn’t say…
* For “institutional librarians” the author seems to imply that their future role is only to be in one-to-one “mentoring and facilitation” of researchers. No mention of anything else, like the big publishers working with librarians to craft and adopt universal OA-status tagging code for discoverability.
* For “scholarly authors” he only suggests academics might become marketeers for their own papers. Frankly, this seems like a waste of their valuable time. Given the salaries that full-time research academics get, they can afford to hire a virtual assistant. To promote four or five papers a year outside of one’s own disciplinary niche, simply go to UpWork (or similar) and hire your personal marketeer at $180 a paper (to get someone of quality, for a day-and-a-half of work). One could probably find a way to write the $900 bill off against tax each year. Of course that assumes one is publishing something worth reading, rather than academic shovel-ware intended to tick boxes inside one’s own institution.
* For the big “publishers” the article vaguely suggests they need to embrace openness. Though perhaps only in order to capture it for their own purposes, via a… “drawing-together of all the dispersed OA content silos into one place”. Well, for their own limited set of OA content, the big publishers can solve that on Monday morning if they really want it. They just have to allow the seemingly-stalled Paperity to import the OA-only article feeds of Elsevier, Brill, De Gruyter, Wiley and others, so that Paperity has full coverage of all OA articles from the big publishers.
A new OA tool from the French, doai.io. If you’ve found a live DOI Web link that can only take you to a paywall article, then replacing http://dx.doi.org/ with http://doai.io/ will get a URL that tries to find a free version via BASE.
BASE is only middling for finding open access articles. It currently has 3.1m OA journal articles in English, with those being overwhelmingly in science, technology and medicine…
… but it’s reported that doai.io now also looks for the article posted on ResearchGate.
The doai.io coding was completed back in November and it’s only just gone public, so it’s early days. They don’t yet have a Web browser add-on that will automate the fallback from dx.doi.org to doai.io. One has to wonder if the same add-on, which would presumably be open sourced, would be quickly forked to also serve Sci-Hub (which at present only has a Chrome add-on, and no Firefox add-on).
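Until an add-on appears, the URL swap itself is easy to script. Here is a minimal sketch that assumes only what is described above, namely that the dx.doi.org host is replaced with doai.io and the rest of the link is left alone. The function name and the example DOI are hypothetical, and this is not doai.io’s own code.

```python
from urllib.parse import urlsplit, urlunsplit

DOI_RESOLVER_HOST = "dx.doi.org"  # the resolver host named in the post
DOAI_HOST = "doai.io"             # the open-access fallback resolver


def to_doai(url: str) -> str:
    """Return the doai.io equivalent of a dx.doi.org link.

    Any URL that does not point at dx.doi.org is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != DOI_RESOLVER_HOST:
        return url
    # Keep scheme, path (the DOI itself), query and fragment; swap only the host.
    return urlunsplit(
        (parts.scheme, DOAI_HOST, parts.path, parts.query, parts.fragment)
    )


# Hypothetical DOI, used purely for illustration.
print(to_doai("http://dx.doi.org/10.1234/example"))
```

Swapping only the host means the DOI itself passes through untouched, so a failed lookup can always fall back to the original link.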
An interesting discussion on UI and navigation for academic content discovery, from the recent “The Researcher to Reader” conference (London, Feb 2016): “Making Sense of the Flood: ways to curate content and adapt search to deliver serendipity in discovery” and the following Q&A.
Byron Russell, manager of Ingentaconnect, wants to search only for freely re-usable Open Access articles, but finds that ‘the Google moment’ for such a search hasn’t arrived yet…
Run a Google search on “Mendelian dominance open access” and the first two hits are for one publisher – the OMICS Group.
Judging from my own attempt to recreate his search in Google Search, what he actually searched for was: Mendelian dominance open access (without the quote marks). It is difficult to see how such a loose search would find something worth having. But even if he’d then gone on to say… ‘so, we need to teach students how to search Google properly…’, his article’s point would have been much the same. Even using sophisticated Google search methods, one still gets mired amid a swamp of PowerPoints, K-12 lesson plans, student quizzes, wikis, high-ranking predatory journal articles and other junk.
JURN does a fairly good job with…
Mendel "dominance" "Commons Attribution" -noncommercial
Leaving Mendel without quote marks in that way catches Mendel | Mendel’s | Mendelian, since Google automatically expands the name.
The target CC content, as currently found on OA journals via JURN, seems to reside almost entirely in PLOS, PubMed, Springer and a few others.
But there’s more in the hybrid journals. So one can also approximate a main Google Search across the large publishers, Elsevier for instance, via something like…
site:www.sciencedirect.com/science/article/ "Commons Attribution" -noncommercial -"non-commercial"
For Oxford Journals it’s slightly different…
inurl:oxfordjournals.org "Commons Attribution" -"non-commercial"
(Google will probably flash up an annoying “captcha” to make sure you’re not a robot, at that point, if you’ve worked the examples down to this point).
And so on… one could just work through the larger publishers that way. For Springer most of the work has already been done by Paperity, although Paperity still lacks coverage of a couple of OA Springer titles.
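Working through the publishers by hand gets tedious, so here is a rough sketch of how the per-publisher searches above could be kept in one place and turned into ready-to-paste Google links. The two query strings are the ones given above, written with straight quote marks. The optional topic prefix, the function names and the dictionary layout are my own assumptions. It only builds URLs for opening in a browser and does not try to fetch results automatically, which would only invite the captcha.

```python
from urllib.parse import quote_plus

# The publisher-scoped licence searches from the post, with straight quotes.
PUBLISHER_QUERIES = {
    "Elsevier": 'site:www.sciencedirect.com/science/article/ '
                '"Commons Attribution" -noncommercial -"non-commercial"',
    "Oxford Journals": 'inurl:oxfordjournals.org '
                       '"Commons Attribution" -"non-commercial"',
}


def with_topic(publisher: str, topic: str = "") -> str:
    """Prefix a publisher query with optional topic terms (an assumption:
    the post's publisher examples carry no topic terms)."""
    return f"{topic} {PUBLISHER_QUERIES[publisher]}".strip()


def search_url(query: str) -> str:
    """Build a Google Search link to paste into a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for name in PUBLISHER_QUERIES:
    print(name)
    print(search_url(with_topic(name, 'Mendel "dominance"')))
```

Adding another large publisher is then just one more dictionary entry, once a working site: or inurl: pattern for its article pages has been found by hand.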
It’s certainly not ideal, as Russell suggests. On the other hand, one might ask why someone needs to find just the CC-BY content on a topic. Perhaps it’s actually quite useful that a big publisher would find it difficult to automatically siphon all known CC-BY articles and books into its own giant repository, slap on some search, mining, overlay journal and themed book-compiling tools, and then sell access to it.
Hypothes.is lets visitors annotate your Web pages, via a pop-out sidebar filled with a Twitter-like stream of visitor comments/links.
It’s the perennial idea of re-inventing the classic footer comments box as a uniform annotation layer, something that has been tried many times over the past 20 years. Google ran such a tool for three years before closing it down. Such services tend to end up as dank wastelands filled with Viagra ads, troll spoor and link-rot.
But this time might be different. There’s a couple of somewhat workable-looking early W3C standards (more are on the way), new options for moderation and closed group working, and an impressive range of publishers and universities are now planning to discuss how social annotation might proceed for scholarship…
“Our goal is that within three years, annotation can be deployed across much of scholarship.”
The ‘can’, not ‘will’, is probably because the big publishers like Elsevier et al are noticeably absent from the list of Hypothes.is academic supporters. I can’t see them liking the idea that an open commenting system is being laid over/into their content. The sidebar’s content seems to be outside the control of the page owner, so I could theoretically pitch up at an Elsevier $66 article paywall and say “there’s a free PDF of this article over at Site XYZ…”
So how does it work, at present? Imagine that someone took a Web page’s comments section from the bottom of the page, and instead put it into a standalone and uniform sidebar. Someone adding a comment also has the option to highlight a bit of text on the page, automatically hyperlinking their comment to it. Other visitors see the comments and the highlighted text. Obviously various Twitter-ish and Wiki-ish features could be added, but that’s the basic functionality.
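That basic functionality can be pictured as a very small data model. The sketch below is purely illustrative and is not Hypothes.is’s actual data model or API: every class, field and method name is hypothetical. It just shows a comment tied to a page URL, optionally anchored to a highlighted passage, and a sidebar that collects the comments for one page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    page_url: str                      # the page the comment is attached to
    author: str
    comment: str
    quoted_text: Optional[str] = None  # the highlighted passage, if any


@dataclass
class Sidebar:
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, annotation: Annotation) -> None:
        self.annotations.append(annotation)

    def for_page(self, page_url: str) -> List[Annotation]:
        """The comment stream other visitors see for one page."""
        return [a for a in self.annotations if a.page_url == page_url]

    def highlights(self, page_url: str, page_text: str) -> List[Tuple[int, int]]:
        """Character ranges to highlight: where each quoted passage
        first occurs in the page text. Quotes that no longer match
        the page are silently skipped."""
        spans = []
        for a in self.for_page(page_url):
            if a.quoted_text:
                start = page_text.find(a.quoted_text)
                if start != -1:
                    spans.append((start, start + len(a.quoted_text)))
        return spans
```

Anchoring by quoted text rather than by position is one plausible design choice here, since a highlight can then survive small edits elsewhere on the page, but it also shows why content that changes or moves is a problem for any such layer.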
A pop-out sidebar means that Hypothes.is can work with PDFs, and the Hypothes.is roadmap suggests that annotation of data / images / videos / ePubs could be on the way soon. So it seems Hypothes.is needs fixed browser-displayed content, located on a URL that’s never going to break, which makes it a natural fit with things like PDFs in repositories and digital libraries. But even in that relatively limited arena, who will do all the hand annotation, moderation, linkrot checking and repair needed to keep such a service usable across a billion or more pages and documents? I somehow doubt that overworked and underpaid repository staff will be skipping through the library stacks with joy, at being told they must also become the herders of social media cats and the tamers of trolls.