Following on from my previous post… a search for “open access” was discouraging. There are about twenty “living-dead” Custom Search Engines from 2006, but no large ones updated since then (so far as I could tell from a quick visit).

Pouring out all this open access content is all very well, but where’s the competition and development in open access search?

And where are the simple common standards for flagging open content for search-engine discovery and sorting, for that matter? Judging by the structure and look of most academic repositories, internet search-engines are the last things on their minds.

Now of course I’m viewing things from the outside, as an independent curator and social entrepreneur, not a librarian or OA evangelist. But it seems to me that burying your PhD thesis deep in a repository cattle-car (seemingly with only a few keywords, an ugly template and an impenetrable URL for company) isn’t serving it or its author very well, especially when it comes to the metadata and tagging that lead to full-text search discovery. As the authors of “Experiences in Deploying Metadata Analysis Tools for Institutional Repositories” recently wrote in Cataloging & Classification Quarterly (No. 3/4, 2009)…

“Current institutional repository software provides few tools to help metadata librarians understand and analyse their collections.”

Which doesn’t bode well for search-engines aiming to hook into and sort the same metadata. That sort of admission might have been acceptable in 1999, but it’s damning to hear from librarians in 2009. And another paper in the same issue concludes that there is…

“a pressing need for the building of a common data model that is interoperable across digital repositories”.

Now I wouldn’t know a Dublin Core from a Dublin Pint, but how difficult would it have been to build a search-engine friendly tag that allows a repository to tell the world “this is a root free-to-all full-text file” and “you’re not going to get any full-text for this title”? Or to allow the “one-click” filtering out of science and medical-related OA material across search results from a thousand repositories?

This could be done at the URL level. For example by using a standard universal URL structure that could be read by machines and humans alike. For a journal it might run something like:

Where preindustrial_water_mills are the first three words of the article title.

Without even accessing the document, a human can now glance at the URL in search results and read off:

   Journal name (Technology History)
   Issue number (Number 4)
   It’s from a journal
   It’s free full-text
   The year published (2009)
   The author surname (Adams)
   The first three words of the article title (“preindustrial water mills”)
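The example URL itself has not survived in this copy, so its exact layout is unknown. As a minimal sketch of the idea, assume a hypothetical layout of /journal/<journal>/<year>/no<issue>/free-fulltext/<author>/<first_three_title_words> on a placeholder host (example.org). The Python below builds such a URL and reads the fields back out, which is all a machine would need to do to match what a human does when glancing at it. The field names, host and path order are illustrative assumptions, not an existing standard.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical host; the path layout below is an illustration, not a standard.
BASE = "https://example.org"


@dataclass
class ArticleRef:
    kind: str         # "journal" or "repository"
    source: str       # journal or repository name, slugged
    year: int         # year published
    issue: int        # issue number
    access: str       # "free-fulltext" or "no-fulltext"
    author: str       # author surname, slugged
    title_words: str  # first three words of the title, joined by "_"


def slug(text: str) -> str:
    """Lower-case, drop punctuation, join the remaining words with underscores."""
    return "_".join(re.findall(r"[a-z0-9]+", text.lower()))


def title_key(title: str, n: int = 3) -> str:
    """Keep only the first n words of a title, slugged."""
    return "_".join(re.findall(r"[a-z0-9]+", title.lower())[:n])


def build_url(ref: ArticleRef) -> str:
    """Assemble the fields into a human- and machine-readable path."""
    parts = [ref.kind, ref.source, str(ref.year), f"no{ref.issue}",
             ref.access, ref.author, ref.title_words]
    return BASE + "/" + "/".join(parts)


def parse_url(url: str) -> ArticleRef:
    """Read the fields back out of a URL that follows the layout above."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 7:
        raise ValueError(f"unexpected path layout: {url}")
    kind, source, year, issue, access, author, title_words = parts
    return ArticleRef(kind, source, int(year), int(issue.removeprefix("no")),
                      access, author, title_words)


ref = ArticleRef("journal", slug("Technology History"), 2009, 4,
                 "free-fulltext", slug("Adams"), title_key("Preindustrial Water Mills"))
url = build_url(ref)
print(url)
print(parse_url(url).access == "free-fulltext")
```

Keeping the access flag (“free-fulltext” or “no-fulltext”) as its own path segment is the part that matters: it lets a crawler, or a person scanning search results, separate free full-text items from metadata-only records without opening either.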

For a repository it could look something like:

And with a uniform standard for URL structures, university IT techies would not be allowed to fiddle with the directory structure and thus break the URL. All full-text files in U.S. repositories could then be searched simply by indexing one line:


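If U.S. repositories all shared one such layout, that “one line” could be a single wildcard pattern. A minimal sketch, assuming a hypothetical layout of /repository/<repository>/<year>/free-fulltext/... on hosts ending in .edu (used here only as a rough stand-in for U.S. universities): Python’s fnmatch filters a list of URLs against that one pattern. The hosts, paths and pattern are made up for illustration; real search engines each have their own pattern syntax.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical "one line": a host ending in .edu, a /repository/ path, free full text only.
# fnmatch's * also matches "/", so this is a deliberately loose match.
FULLTEXT_PATTERN = "*.edu/repository/*/free-fulltext/*"


def host_and_path(url: str) -> str:
    """Drop the scheme so the pattern is matched against host + path only."""
    parts = urlsplit(url)
    return parts.netloc + parts.path


def free_fulltext_only(urls: list[str]) -> list[str]:
    """Keep only the URLs that match the single full-text pattern."""
    return [u for u in urls if fnmatchcase(host_and_path(u), FULLTEXT_PATTERN)]


# Made-up URLs for illustration.
candidates = [
    "https://repo.example.edu/repository/example_repo/2009/free-fulltext/adams/preindustrial_water_mills",
    "https://repo.example.edu/repository/example_repo/2009/no-fulltext/smith/some_other_title",
    "https://example.org/journal/technology_history/2009/no4/free-fulltext/adams/preindustrial_water_mills",
]

for url in free_fulltext_only(candidates):
    print(url)
```

Only the first URL survives: the second is flagged as having no full text, and the third is a journal URL rather than a .edu repository one.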
Anyway, rant over. I did find a large Google CSE for Economics. Not much use for the arts and humanities, you might think, and last updated in 2006, but thanks to its sheer size (23,613 sites from apparently reputable sources) searches for…

“creative economy” keyword

“creative industries” keyword

“art market” keyword

… all seem to show it still has some use as a discovery tool.