Odysci Academic Search is aimed at allowing…

“technical professionals and companies to find and use the relevant technical information” in “computer science, electrical engineering and math-related areas”

Their blog entries tail off and stop in 2011, so it’s been around for a while, but the developers have a new paper which describes the technical infrastructure and gives the algorithms. I was interested to learn that…

“This framework is able to import, de-duplicate and persist 200K papers in the database (and all their entities) in 16 hours on an i7-based workstation with 32GB of RAM.”

My broad test search for…

   “energy conservation” organizations

… gave me 44 results, three of which had fulltext links (about 7%). That suggests that when Odysci imports records, there might be PDF links on fewer than 10% of them?

On that basis (roughly 10% of 200K records) I would guesstimate an ability to ingest, strip and process perhaps 20,000 fulltext PDF papers every 16 hours, give or take. So in terms of making a standalone JURN: give me six such PCs and the bulk of the humanities journal indexing might be done in… six months? Keep in mind that processing power is increasing: the Core i7 CPU line was introduced in 2008, so if Odysci’s i7 is a circa-2008 CPU then more modern processors will do the job faster. Superfast broadband would also speed up the actual PDF downloading.
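
For anyone who wants to check the sums, here is a rough Python sketch of that back-of-envelope estimate. The function names are made up for illustration, and several things are my assumptions rather than anything Odysci states: that the machines run around the clock, that six months means 180 days, and that the quoted throughput for importing records carries over unchanged to processing fulltext PDFs (which may well be optimistic).

```python
def fulltext_share(hits: int, fulltext_hits: int) -> float:
    """Fraction of search results that carried a fulltext link."""
    return fulltext_hits / hits


def pdfs_processed(papers_per_run: int, share: float, hours_per_run: float,
                   machines: int, days: int) -> int:
    """Rough count of fulltext PDFs handled, assuming 24-hour operation."""
    runs_per_machine = days * 24 / hours_per_run
    return round(papers_per_run * share * runs_per_machine * machines)


# Observed in the test search: 3 fulltext links among 44 results (about 7%).
observed = fulltext_share(44, 3)

# Rounded up to 10%, as in the guesstimate: six PCs for 180 days.
total = pdfs_processed(200_000, 0.10, 16, machines=6, days=180)

print(f"observed fulltext share: {observed:.1%}")
print(f"PDFs in six months on six PCs: {total:,}")
```

On those assumptions the total comes out at around 32 million papers. Whether that covers "the bulk" of humanities journal articles depends on how many there are, which I have not tried to estimate here.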

The same search in JURN tended to foreground papers on the role of human behaviours/attitudes and public policy in organizational energy conservation, rather than the technical aspects of electrical implementation. That suggests that, despite the recent science additions, JURN will tend to veer toward ‘the human element’ of topics. I also ran the test in Google Scholar, which showed the same tendency, though with a heavier emphasis on articles from Psychology.