Someone asked what enters the JURN index when a title is indexed but offers only a limited amount of free full-text, or just a few “free-sample” articles. Does the rest of the journal’s online material (link-less tables-of-contents, abstracts with no full-text links, etc.) also enter JURN? The answer is: no, not usually. It’s usually possible to filter at the URL level so that only the free content enters JURN. For example, by indexing only URLs such as:

http://www.journal.com/journal/sample/*.pdf

http://www.journal.edu/journalABC/documents/*.pdf

A real-world example is:

http://www.egyptpro.sci.waseda.ac.jp/pdf*/*/*.pdf

Here “*” is the Google CSE wildcard. Of course, if some dimwit IT techie then decides to juggle the directory structure, the pattern will no longer match anything and the journal will drop out of JURN. But that’s a risk any directory or search-engine takes.
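To make the filtering idea concrete, here is a minimal Python sketch of how wildcard URL patterns of this kind separate free PDFs from everything else. It is only an illustration, not how Google CSE works internally: it assumes that “*” stands for any run of characters (slashes included), and the candidate URLs and the function names are made up for the example.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a URL pattern containing '*' wildcards into a compiled regex.

    Assumption: '*' matches any run of characters, slashes included.
    This approximates the wildcard for illustration only.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(literal_parts) + "$")

def is_indexed(url: str, patterns: list[str]) -> bool:
    """Return True if the URL matches at least one include pattern."""
    return any(pattern_to_regex(p).match(url) for p in patterns)

# The two example patterns from the post.
PATTERNS = [
    "http://www.journal.com/journal/sample/*.pdf",
    "http://www.journal.edu/journalABC/documents/*.pdf",
]

# Hypothetical URLs, invented purely to exercise the patterns.
candidates = [
    "http://www.journal.com/journal/sample/vol1-article3.pdf",
    "http://www.journal.com/journal/contents/vol1.html",
    "http://www.journal.edu/journalABC/documents/2009/essay.pdf",
    "http://www.journal.edu/journalABC/abstracts/essay.html",
]

indexed = [url for url in candidates if is_indexed(url, PATTERNS)]
for url in indexed:
    print(url)
```

Only the two PDF URLs pass the filter; the table-of-contents and abstract pages never match, which is the whole point of indexing by path rather than by domain.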

Sometimes a few PDFs to do with society or journal administration matters can be called into search along with the articles, if all the PDFs sit indiscriminately in a single URL path. A search for:

site:http://www.scholarly-society-journal.info/ filetype:pdf

… will usually show if there are too many of these. Google tends to bunch that sort of material at the top of site: search results. Usually there are only a dozen or so.
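If the list of PDF URLs from such a search is copied out by hand, a rough count of the administrative ones can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the keyword list is a guess at what administrative filenames might contain, and the URLs are invented.

```python
from urllib.parse import urlsplit

# Hypothetical hints; real society sites will use their own file naming.
ADMIN_HINTS = ("minutes", "membership", "bylaws", "subscription", "guidelines")

def looks_administrative(url: str) -> bool:
    """Guess from the filename alone whether a PDF is society paperwork."""
    filename = urlsplit(url).path.rsplit("/", 1)[-1].lower()
    return filename.endswith(".pdf") and any(hint in filename for hint in ADMIN_HINTS)

# Invented URLs standing in for a hand-copied list of search results.
pdf_urls = [
    "http://www.scholarly-society-journal.info/files/agm-minutes-2008.pdf",
    "http://www.scholarly-society-journal.info/files/membership-form.pdf",
    "http://www.scholarly-society-journal.info/files/vol12-smith.pdf",
]

admin = [url for url in pdf_urls if looks_administrative(url)]
print(f"{len(admin)} of {len(pdf_urls)} PDFs look administrative")
```

A filename check like this is crude and will miss plenty, so it is only a quick way to judge whether the stray PDFs are few enough to live with.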

It’s different with the few ejournals that cheekily use standard ‘open access’ publishing software, but which actually keep recent articles locked away behind a one-year or even three-year rolling paywall. The software is not intelligent enough to place the abstract pages of paywalled articles on a different and distinctive URL path, and then to automatically transfer and redirect these when the article becomes free. But indexing only the .pdf path in such cases will usually call only full-text articles into JURN.