A Canadian commercial start-up is offering its new oaFindr service, with free or low-cost trials for university libraries. oaFindr is said to be able to explore a library’s existing journal subscriptions and to identify just the open access articles within the hybrid journals. According to the press release, oaFindr…

“… enable[s] academic institutions to analyze their journal subscriptions and provide[s] them with a reliable, precise search and discovery tool to retrieve all open access articles. This solution will also help them comply with governmental open access mandates, and support them in rapidly increasing the diffusion of their institutions’ scholarly production in a manner that is much less labour-intensive”

The idea appears to be that the discovered OA articles are then harvested and passed to the company’s related oaFoldr service, with oaFoldr providing a conduit into their hosted repository for the OA articles. Nice if it works and gets adopted and, if public, it would provide a welcome new mega-repository for Google and JURN to index. Alternatively, I suppose that the oaFoldr may just be a private folder for cataloguers, in which the articles reside before being placed into the university’s own repository. More likely to be the latter, since otherwise one commercial company could potentially get to corral the world’s OA article output in its own repository, and would then be in a position to sell it back to universities via an enhanced search and mining/metrics service.

Regrettably, as Bernard Rentier observes, mass extraction and archiving of thousands of OA articles per month from commercial databases may not be welcomed by the big publishers…

“Elsevier has designed a way to prevent researchers from mass-downloading articles from its website where they are so-called open access…”

So how would universities harvest efficiently? Bear in mind that commercial licenses may also prevent a university from taking the proprietary hybrid journal metadata from the likes of Elsevier, Springer, Oxford, etc., along with their OA fulltext PDFs. So I guess it’s much more likely that each institution will play safe and harvest only PDF articles by their own researchers, thus giving a much lower harvesting volume that might not trigger download blocking. And that they’ll find ways not to take any metadata generated around the OA article by publisher databases.

I wonder if some large institutions may have to harvest articles via spoofing multiple ‘student’ accounts? Or is oaFindr itself pre-harvesting OA PDFs from hybrid journals and then vending them to institutions along with metadata? Probably not, or the big publishers would likely be throwing lawsuits at the company. oaFindr seems more likely to be a sort of super-Paperity, but covering all hybrid titles from the big publishers plus all the DOAJ titles at the article level. I’m guessing a lot here, of course, but if such a service works then it would be rather cool. Though probably lacking in things like Google-strength semantics and relevance ranking.

So let’s assume that the university libraries are the ones that do the work of harvesting OA PDFs for their repositories. OA mandates and the consequent exponential growth of OA articles may still lead to the hitting of a ‘mass downloading’ roadblock in the near future, even at a university which restricts itself to its own outputs and/or harvests fulltext via multiple accounts. Big publishers might even change their database small-print, so as to forbid ‘type targeted’ mass harvesting leading to local storage of articles.

I guess one solution would then be to rely only on having repository records + Web links to the fulltext (fulltext hosted back on the journal’s website). Though that assumes that links don’t break. Which they do, and at a horrendous rate.

In the end I suspect it may just be easier for a university to go after its research staff with pitchforks, and literally force them to upload their OA papers to the university repository. If your new paper isn’t in the repository after 28 days, then your next month’s salary gets docked 20% and your department can’t apply for any new funding or external partnerships in the next six months. That sort of thing.

Update, Nov 2017: oaFindr is now called 1Findr.