New at Liber Quarterly, “How reliable and useful is Cabell’s Blacklist? A data-driven analysis”.
An interesting idea: a meta-pedia browser add-on, to consult all public encyclopedias at the same time, presenting the results as an elegant full-screen dashboard with a strip of side-links to Google Books, Archive.org, Scholar, JURN etc. Ideally with configurable sources…
* Current Britannica
* 1911 Britannica
* Specialist public encyclopedias, if they exist for the topic, e.g. Philosophy, Catholic, Science-fiction etc.
I’m assuming this would need to be a browser add-on, as a cloud service that did this would face lawsuits and frame-busting scripts. The closest I can find is 2019’s free ResearchKit, which shows Wikipedia and the current Britannica side-by-side, above bot-driven auto-summaries of their text. It’s not exactly elegant to look at, but it works.
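For the ‘configurable sources’ part, here is a minimal sketch of what a source registry might look like. The URL patterns are illustrative assumptions rather than tested endpoints, and a real add-on would of course be written in browser JavaScript; Python is used here only to keep the sketch short:

```python
# A sketch of a configurable source registry for a 'meta-pedia' lookup.
# URL patterns are assumptions, not verified endpoints.
from urllib.parse import quote

SOURCES = {
    "Wikipedia":        "https://en.wikipedia.org/wiki/{topic}",
    "Britannica":       "https://www.britannica.com/search?query={topic}",
    "1911 Britannica":  "https://en.wikisource.org/wiki/1911_Encyclopaedia_Britannica/{topic}",
    # Specialist sources could be switched on per topic area:
    "Philosophy (SEP)": "https://plato.stanford.edu/search/searcher.py?query={topic}",
}

def panel_urls(topic: str) -> dict:
    """Return one lookup URL per configured encyclopedia, for side-by-side panels."""
    return {name: url.format(topic=quote(topic)) for name, url in SOURCES.items()}

print(panel_urls("Abyssinia"))
```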
Obviously some fuzzy-lookup might be needed to align search topics, though the individual encyclopedias strive to do that on their pages via navigation strips and links.
But rather than jumping straight to a presumed page, perhaps each encyclopedia panel might first show sub-panels with a half-dozen ‘possible’ hits, colour-shaded by order of likely relevance to the search. If such a browser add-on were in widespread use, the data gathered from this mass human-driven topic-selection and alignment might be rather useful. Over time it could be judiciously used to augment existing ‘knowledge navigation trees’ that can cope at a meta-level with shifting topic titles (e.g. Aethiopia > Abyssinia > Horn of Africa > Eastern Africa > Ethiopia).
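As a sketch of how those ranked ‘possible hits’ sub-panels might work, here is a crude fuzzy-match over a list of candidate article titles, using only Python’s standard library. The title list is a stand-in; a real add-on would draw titles from each encyclopedia’s own search or navigation pages:

```python
# Rank candidate article titles by string similarity to the search topic;
# the scores could then drive the colour-shading of the sub-panels.
from difflib import SequenceMatcher

def ranked_candidates(query: str, titles: list[str], n: int = 6) -> list[tuple[str, float]]:
    scored = [(t, SequenceMatcher(None, query.lower(), t.lower()).ratio()) for t in titles]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

titles = ["Ethiopia", "Abyssinia", "Aethiopia", "Horn of Africa", "Eastern Africa", "Eritrea"]
for title, score in ranked_candidates("Aethiopia", titles):
    print(f"{score:.2f}  {title}")
```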
Another way to do it might be for the add-on to ‘read’ such existing navigation on the encyclopedia pages, distil it into keywords, and then use those to ‘prime’ the sidebar links to Google Books, Archive.org, Scholar, JURN etc.
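A sketch of that ‘read and distil’ step, with stated assumptions: that the encyclopedia marks up its navigation strip in a &lt;nav&gt; element, and that a plain Scholar query string is acceptable. Both would need checking against real pages:

```python
# Pull the link texts out of a page's <nav> strip and fold them into a
# primed sidebar search link. The <nav> markup is an assumption about how
# a given encyclopedia structures its navigation.
from html.parser import HTMLParser
from urllib.parse import quote_plus

class NavKeywords(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_nav = False
        self.keywords = []
    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.in_nav = True
    def handle_endtag(self, tag):
        if tag == "nav":
            self.in_nav = False
    def handle_data(self, data):
        if self.in_nav and data.strip():
            self.keywords.append(data.strip())

parser = NavKeywords()
parser.feed('<nav><a href="/abyssinia">Abyssinia</a> <a href="/horn">Horn of Africa</a></nav>')
query = quote_plus(" ".join(parser.keywords))
print(f"https://scholar.google.com/scholar?q={query}")
```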
Martin Paul Eve has a new post on Zotero and auto-downloading open access books…
all I really wanted was to be able to embed an ISBN and a citation_pdf_url and have Zotero do the lookup and save the file. However, out of the box there is no easy way to do this.
His test book is quite interesting, his own new Close Reading with Computers: Textual Scholarship, Computational Formalism, and David Mitchell’s Cloud Atlas (April 2020), which applies textual computing to the science-fiction-philosophy novel Cloud Atlas.
I neither know nor use the current version of Zotero, so I’m unsure what advantages it confers. I assume Eve intended to find a way to automatically harvest all CC BY-SA books in PDF, and to build a local collection for automated analysis.
But I see his book is already on the OA book-aggregator catalogue OAPEN. Theoretically then, since OAPEN is comprehensive and timely, one could have a harvester look at all the pages hanging off library.oapen.org/handle/ and save out only those pages with the required permissive CC “Rights” label on them. These pages each have a uniform PDF link in their HTML, in the form library.oapen.org/bitstream/, and these could easily be extracted to a list. One would end up with a set of PDF links for a linkbot, ready to download to a local folder for computational analysis. I presume that’s what Eve intended to have Zotero do.
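A rough sketch of such a harvester pass, for one record page. The markup details of OAPEN’s record pages (how the “Rights” label and the bitstream link appear in the HTML) are assumptions here and would need checking against the live site; requests and beautifulsoup4 are assumed installed:

```python
# Fetch an OAPEN record page, keep it only if it carries the wanted CC
# licence in its "Rights" label, and pull out the /bitstream/ PDF link.
import re
import requests
from bs4 import BeautifulSoup

WANTED_LICENCE = "creativecommons.org/licenses/by-sa"

def pdf_link_if_permissive(record_url: str) -> str | None:
    html = requests.get(record_url, timeout=30).text
    if WANTED_LICENCE not in html:   # crude check for the "Rights" label
        return None
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=re.compile(r"/bitstream/.*\.pdf")):
        return requests.compat.urljoin(record_url, a["href"])
    return None

# e.g. feed it each record page hanging off library.oapen.org/handle/
# and collect the non-None results into a list for the linkbot.
```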
One would need to reference the OAPEN record page first, in the way I’ve suggested, since the PDF itself can carry different, non-uniform or even contradictory licence information. For instance, in its interior Eve’s book is labelled both “©” … “No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or in any information storage or retrieval system without the prior written permission of Stanford University Press.” and also “Creative Commons Attribution-ShareAlike 4.0”.
How many items on OAPEN have a creativecommons.org/licenses/by-sa/ “Rights” label at present, as Martin’s book does? A Google site: search suggests around 650 titles. Half an hour of my filtering the OAPEN CSV suggests the total is actually just over 3,000 titles under some form of permissive CC licence that permits commercial use. That’s still a manageable harvest at present. But as the supply of OA books and monographs grows rapidly, the likely result of various OA mandates in the near future, it might be a useful time-saver for text-miners and digital humanists if OAPEN were to maintain a single torrent of all the PDFs, inside which a half-dozen folders would neatly organise the books by CC licence type. Such a one-click solution might save a lot of faffing around with digging into and filtering their XML and CSV feeds, wrangling with harvester scripts and timeouts, or trying to wrestle with third-party services such as Zotero. A torrent could also save OAPEN’s bandwidth.
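For anyone wanting to repeat that half-hour CSV filter, something like the following is all it takes. The “Rights” column name and the licence-path strings are assumptions about the OAPEN CSV export, and would need adjusting to its actual headers:

```python
# Count OAPEN titles whose CC licence permits commercial use.
import csv

PERMISSIVE = ("licenses/by/", "licenses/by-sa/")  # CC BY and CC BY-SA allow commercial use

with open("oapen.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

permissive = [r for r in rows
              if any(p in (r.get("Rights") or "") for p in PERMISSIVE)]
print(f"{len(permissive)} of {len(rows)} titles under a commercial-use CC licence")
```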
Issued yesterday from President Trump’s office, but so far unreported in the virus news I’ve seen…
“The U.S. Coronavirus Task Force leader, Dr. Kelvin Droegemeier, and government science leaders including science ministers and chief science advisors from Australia, Brazil, Canada, Germany, India, Italy, Japan, the Republic of Korea, New Zealand, Singapore, and the United Kingdom are asking publishers to make all COVID-19-related research and data immediately available to the public. … Science leaders requested that existing and new articles be made available in machine-readable format to allow full text and data mining with rights accorded for research re-use and secondary analysis.”
Mindshare UK’s The Future of Search (full report, free in public PDF) for those who use smartphones in the UK…
we tracked people’s search behaviour using ethnography, face-to-face workshops and neuroscience experiments surveying 1,800 UK smartphone users.
Access to academic libraries: an indicator of openness? (March 2019)…
academic library policies can place restrictions on public access to [such] libraries. […] This paper reports on a preliminary study [and finds that] physical entry and access to print and electronic resources in academic libraries is contracting. […] Most affected is the general, unaffiliated public.
initial sample for the study was fourteen medium to large research universities in Australia, Brazil, China, Hong Kong, Mexico, Singapore, South Africa, Taiwan, the United Kingdom and the United States.
Qresp, an open source tool for the automated collection, bundling and distribution of all supporting data and data-sets for a journal paper. Apparently it also auto-adds the required metadata and public discovery enhancements.
The new OOIR List currently tracks 849 journals, these drawn from Web of Science’s SSCI journals in the social sciences. 119 of the titles on the OOIR List are flagged as Open Access, though a good number of these are greyed-out and not tracked (because they don’t bother to also submit to CrossRef).
Evidently Web of Science covers only 119 such OA titles, which means its OA coverage in this area has hardly budged since 2015, when Web of Science was showing just 116 OA titles in the social sciences.
Within that very limited range, what OOIR is trying to do with its titles seems interesting, by providing an aggregated ‘latest’ / ‘trending’ / ‘active journals’ dashboard. It’s neatly presented, and there are also per-journal metrics over on the Statistics tab.
Apparently the service is focussed on recent papers, and “OOIR does not link to papers published before Nov 2018”. A previous RSS-feed based version, for politics and diplomacy, was titled Observatory of International Relations (OIR). But this has now been shut in favour of OOIR.
I guess the question now is, would it be possible to build something bigger and similar and slightly shinier, that could provide a public tracking-dashboard for all such material of use to those interested in timely new research on politics, diplomacy and related matters? Zak Kallenborn has some ideas on that in his recent article “Academic Paywalls Harm National Security”.
provides persuasive evidence that specific enhancements to technical aspects of a repository can result in significant improvements to repository visibility […] traffic to Strathprints from Google and Google Scholar was found to increase by 63% and 99% respectively.
Another new prodding of Google Scholar, this time from the latest First Monday “Testing Google Scholar bibliographic data: Estimating error rates for Google Scholar citation parsing”…
While data quality is good for journal articles and conference proceedings, books and edited collections are often wrongly described or have incomplete data. We identify a particular problem with material from online repositories [where there appears to be] considerable inhomogeneity in the implementation of data standards [and] a mismatch between repository software and the harvesting protocols employed by Google Scholar.
One of Scholar’s other problems is that it includes Google Books results. While 30% of the time its Google Books inclusions can be useful, there is no way to exclude Books results. One might want to exclude them because Scholar still can’t seem to tell a proper book from a robot-produced shovelware ebook that assembles public-domain content. Scholar has no ‘edition authority’ to state that the Joshi-edited and annotated Penguin Classics edition of H.P. Lovecraft’s “Dexter Ward” is the gold standard, with a text fully corrected of the many textual errors, omissions and editing mistakes of previous decades, unlike the public-domain shovelware ebooks that flood Amazon and (often) Google Books.
A basic undergraduate-level search for Lovecraft “Dexter Ward”, for instance, demonstrates the problem on the first page. Joshi is nowhere to be seen, and the searcher is instead hammered by links to shovelware ebooks (or worse), often with citation counts that suggest they are legitimate.