Google is now supporting the filetype:epub search modifier, for finding ebooks in the popular epub format. Google has fairly limited coverage of such files so far, at a reported 54,000 hits. Searches for Iliad and Wonderland only show a few epub editions of each.
Google’s Custom Search blog summarises design-oriented changes to the Google Custom Search service. Along with themes for formatting search results, one of the most interesting is auto-reformatting when Google detects you’re on a mobile device.
Google is apparently set for a massive jolt soon — of Caffeine.
Otto of Kyrgyzstan observes on his blog today that…
“While Google Books is much worse in Russian than in English it turns out that this is not the case with Google Scholar. To the contrary, Google Scholar is infinitely better in Russian than in English. It has a lot of full text journal articles. The number of scholarly journal articles available in their entirety on Google Scholar in Russian is much, much greater than on Google Scholar in English”
What’s odd here is that there is no Russian Google Scholar, according to both the Russian version of Google and my own search of the English Google. Anyone know different? I thought it might be a useful tool for discovering Russian journals that sometimes publish articles in English.
There seem to be some changes going on at Google Scholar, under the hood. First the RefWorks functionality recently vanished, and now all “Web search” links have vanished. Scholar used to place “Web search” under many links, offering a one-click method of searching the Web for the title/author in the hope of finding the full-text or a commentary. Now we have to manually copy & paste.
I found another recent book on Google Scholar, Google Scholar & More: New Google Applications & Tools For Libraries (Routledge, Oct 2008). It originally sold for a whopping $150.00, but Amazon has 26 used copies from $37. And, oddly, there seems to be not a single review freely available online, not even on the Amazon U.K. or U.S. pages for the book. So I’m not sure what all that says about the book’s usefulness, but I thought I’d mention it here for those who may be interested that it can now be had cheap on Amazon.
Google Images just introduced the ability to search for images tagged with usage rights…

Ever wanted to take the hassle out of re-typing a short quote, found on Google Books? Free OCR is a simple online OCR application that might help.
To test it, I gave it a very unpromising bit of text captured from Google Books using a standard screen-capture utility — slightly skewed, slightly fuzzy, in a non-standard typeface I’m willing to bet no-one has on their system, captured as a JPG at a mere 72 dpi, and just 500 pixels wide…

A few seconds after uploading, it gave me this…
ADVERTISEMENT.
Tms publication of the Works of Jomv KNOx, it is
supposed, will extend to F’ive Volumes. It was thought
advisable to commence the series with his History of
the Reformation in Scotland, as the work of greatest
importance. The next voliune will thus contain the
Third and Fourth Books, which continue the History to
the year 1564; at which period his historical labeurs
maybeconsideredtoterminate. ButtheFi&hBook,
forming a sequel to the History, and published under
his name in 1644, will also be included. His Letters
and Miscellaneous Writings will be arranged in the
subsequent volumes, as nearly as possible in chronolo-
gical order; each portion being introduced by a separate
notice, respecting the manuscript or printed copies from
which they have been taken.
It may perhaps be expected that a Life of the Author
should have been prefixed to this volume. The Life of
Knox., by Ds. M‘Cms, is however a work so universally
, known, and of so much historical value, as to supersede
l any attempt that might be made for a detailed bio-
Not perfect, but not bad for such a poor-quality capture. Stand-alone OCR software usually demands a huge TIF file, at 200 dpi or above.
The popular screenshot software HyperSnap v6 promises to do the same with its TextSnap feature, but for some unknown reason this feature just doesn’t work with Google Books or the captured image above. I suspect it can only handle text that uses system fonts.
So until we get a neat free OCR Firefox addon (which is a direction I would urge the makers of Free OCR to go in) then screenshot – save image – upload image to Free OCR is a viable and speedy workflow for OCR-ing fair-use quotes found on Google Book Search or other places that only offer plain page-scans.
Oh, and don’t bother doing this for books that are already in the public domain — since last month Google provides the full-text of these for download, and also serves it up via Google Book Search Mobile.
** Update: If you have Microsoft Office 2007 or higher, then I find that the included Microsoft OneNote works just as well for OCR on low-res images such as the one above. See the comments to this post for details.
Here’s a useful tip for those who want better precision while wading through the Google Blog search “blog bog”. The search modifier intitle: works with Google Blog Search.
Google News has just introduced a new feature to find articles written by someone, rather than about someone…
“If you spot an article by a specific journalist, you can click their name to bring up other articles they’ve written.”
With this and the Google News RSS feed, it’s now possible to set up a simple news feed for new articles from your fave journalists. Possibly in the elegant Firefox addon Feedly. Don’t forget to click “sort by date” before you grab the feed.
And since you can plug RSS feeds into pages, you could now set up a public Daily Something page, cut out the churnalist press-releases and just have a select band of top specialist journalists effectively writing for you. This is really going to annoy the newspaper publishers.
And you can also type a simple search modifier into the Google News search-box, e.g.:
author:"Matthew Parris"
And this can be combined with the source: modifier…
source:Washington_Times
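If you want to build such queries in a script rather than by hand, here is a minimal Python sketch. The function name `news_query` and its parameters are my own invention for illustration, and I am only assuming that the modifiers combine with a plain space between them, as they do when typed into the search-box. No particular Google News URL is assumed; the last line just shows the query URL-encoded as a `q` parameter.

```python
from urllib.parse import urlencode


def news_query(author=None, source=None, terms=""):
    """Compose a query string from the author: and source: modifiers.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    parts = []
    if author:
        # Straight double quotes keep a multi-word name together as a phrase.
        parts.append(f'author:"{author}"')
    if source:
        # The source name is written with underscores instead of spaces.
        parts.append("source:" + source.replace(" ", "_"))
    if terms:
        parts.append(terms)
    return " ".join(parts)


query = news_query(author="Matthew Parris", source="Washington Times")
print(query)                    # the text you would type into the search-box
print(urlencode({"q": query}))  # the same query, URL-encoded
```

Note the straight quotes: blog software tends to turn them into curly quotes, which are best retyped before pasting a query into a search-box.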
Here’s a useful tip: Google’s intitle: search modifier only works if the search-results title/link uses the phrase. Google is not reading the article title from your metadata, but instead reading it from the links on a larger ‘upstream’ set of search results pages. For instance, searching for intitle:"The Searchers" Ford will not pick up…
“Home on the Range: Space, Nation, and Mobility in John Ford’s The Searchers”
The Japanese Journal of American Studies, No. 13 (2002)
…because the article appears in search results as…
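[Screenshot: the Google search result for this article, with the link text ending in three dots before “The Searchers”]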
As you can see, “The Searchers” has dropped off the end of the link to be replaced with three dots. So using intitle: doesn’t find it.
Article titles should be around 50 characters or less (inc. spaces), to fit comfortably on a Google link. Or a 500-pixel width blog column, for that matter.
Google Scholar is more forgiving, only hitting the same problem at around 100 characters. But JURN works like the main Google, and so users should be aware of the difference.
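For those who like to check such things in bulk, the rule of thumb above can be sketched in a few lines of Python. This is only a rough model: I am assuming the link text is simply cut at a fixed character count (about 50 for the main Google and JURN, about 100 for Google Scholar), which is my approximation and not a documented Google behaviour. The function names are hypothetical.

```python
def visible_title(title, limit=50):
    """Return the part of a title assumed to survive on the results link.

    Assumption: the link text is cut at a fixed character count.
    """
    return title[:limit]


def intitle_would_match(title, phrase, limit=50):
    """Guess whether intitle:"phrase" could match the truncated link text."""
    return phrase.lower() in visible_title(title, limit).lower()


title = ("Home on the Range: Space, Nation, and Mobility "
         "in John Ford's The Searchers")

print(len(title))                                          # 75 characters
print(intitle_would_match(title, "The Searchers", 50))     # main Google / JURN
print(intitle_would_match(title, "The Searchers", 100))    # Google Scholar
```

Under these assumptions the 75-character title loses “The Searchers” at a 50-character cut but keeps it at 100, which matches what I saw in the real search results.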
Following my own group-test, it’s interesting to see that Peter at Gale Reference Review has just published a detailed May 2009 review of three major academic search-engines. He takes a skeptical look at Web of Science (WoS), Scopus and Google Scholar. The article is rather long, but here are some interesting quotes…
“Google Scholar [...] reports implausibly high citedness counts for most items, which becomes quite obvious when tracing the purportedly citing papers”
“I looked at the widely touted figures in the promotional materials [ of WoS and Scopus and found ] they should not be taken for granted. Many of these are incorrect and exaggerated. Their compilation has been fast and loose, sometimes making them fiction rather than fact.”
“The coverage of arts & humanities [ in Scopus ] is extremely poor (representing barely 1% of the database) [ and by comparison ] Web of Science has about [...] 10 times as many for arts & humanities.” [ and even if Scopus gets a boost, as proposed, it would still only have ] “about 1/6th of what Web of Science has for these disciplines”
“It is one thing that Scopus has no cited references in records for papers published before 1996, but it adds insult to injury that the pre-1996 papers are ignored. This results in absurdly low h-index for many of the senior teaching and research faculty members and independent researchers who published papers well before 1996 which have been widely cited in the past 25-35 years [...] Lazy administrators and bureaucrats stop here and ignore [ worthy people ] for some lifetime award”
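To see why ignoring pre-1996 papers hurts senior researchers so much, here is a small Python sketch of the standard h-index calculation (the largest h such that h papers each have at least h citations). The publication record is entirely made up for illustration, and treating the 1996 cut-off as a simple filter on publication year is my own simplification of what the review describes.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical record of (publication year, citation count) pairs.
papers = [(1980, 120), (1985, 90), (1990, 60), (1994, 45),
          (1998, 10), (2003, 4), (2006, 2)]

full = h_index([cites for _, cites in papers])
# Assumed model of the cut-off: drop every paper published before 1996.
truncated = h_index([cites for year, cites in papers if year >= 1996])

print(full, truncated)
```

For this invented researcher the h-index drops from 5 to 2 once the older, heavily cited papers are left out.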
Some fab new additions to Google Book Search:
* A drop-down menu to navigate directly to a chapter.
* A YouTube-like “embed this book” code snippet.
* Sort search results by “relevance”, as well as page order.
* Expanded Book Overview page, with reviews and more keywords.
There are a few more additions, only applying to public-domain books.
Interestingly the new contents listing doesn’t seem to wholly rely on a table-of-contents, since Google apparently has a new “structure extraction technology” which is being added to the mix.
Hannah Noll’s paper for her M.S. in Library Science degree, Where Google Scholar Stands on Art: An Evaluation of Content Coverage in Online Databases (PDF link, 300kb)…
“This [ 2008 ] study evaluates the content coverage of Google Scholar and three commercial databases (Arts & Humanities Citation Index, Bibliography of the History of Art, and Art Full Text/Art Index Retrospective) on the subject of art history. Each database is tested using a bibliography method and evaluated based on Peter Jacso’s scope criteria for online databases. Of the 472 articles tested [ * ] , Google Scholar indexed the smallest number of citations (35%), outshone by the Arts & Humanities Citation Index which covered 73% of the test set. This content evaluation also examines specific aspects of coverage to find that in comparison to the other databases, Google Scholar provides consistent coverage over the time range tested (1975-2008) and considerable access to article abstracts (56%). Google Scholar failed, however, to fully index the most frequently cited art periodical in the test set, Artforum International. Finally, Google Scholar’s total citation count is inflated by a significant percentage (23%) of articles which include duplicate, triplicate or multiple versions of the same record.”
* tested with a set of “article citations authored by a pre-selected set of art historians” via 12 names “culled from the Dictionary of Art Historians”, according to the paper. Authors had to be British or American, and born after 1925.
It’s interesting that Noll rejects keyword searches as a test measure…
“Searching by a compiled list of subject terms did not seem appropriate for testing Google Scholar. Google Scholar lacks a system of controlled vocabulary and search results reflect in many cases a full-text search of the document, whereas traditional databases only search the title and abstract keywords of a record.”
… yet Noll might have easily used intitle:"title of the article" with Google Scholar, to find specific articles. The intitle: search modifier is not mentioned in the paper. Instead Noll used a wider author search, then trawled the results for the target titles, but admits of this method of using Google Scholar that…
“some articles may have been impossible to find by using the author search.”
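A title-by-title test of that kind would be easy to prepare from a bibliography. The Python sketch below turns a list of article titles into exact-title queries. The function name is hypothetical, the sample title is a placeholder and not from Noll's test set, and stripping embedded double quotes is just my guess at a sensible way to keep the phrase intact.

```python
def exact_title_queries(titles):
    """Return one intitle:"..." query per article title.

    Hypothetical helper for illustration only.
    """
    queries = []
    for title in titles:
        # Drop embedded double quotes and collapse runs of whitespace,
        # so the whole title stays inside one quoted phrase.
        cleaned = " ".join(title.replace('"', " ").split())
        queries.append(f'intitle:"{cleaned}"')
    return queries


# Placeholder title, not taken from the paper.
for query in exact_title_queries(['A "Hypothetical"  Article Title']):
    print(query)
```

Very long titles would still run into the roughly 100-character limit noted above for Google Scholar, so a tester might want to query on the first few words only.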
Google has just deployed a new Custom Search Element to Google CSE owners. This allows your users to do things like paste a JURN search engine box into their blog, and have it return results for their readers without having to leave the page.
Sadly, a hosted WordPress blog (like this one) gets all paranoid about security and strips out the code tags — and thus I can’t give you a demo here. WordPress.com really should whitelist all javascript that runs from www.google.com/. But should you have a self-hosted blog, it will work well — and the snippet of code you need to copy and paste is here.
It gives results like those seen below. One nice thing I’d like to see added to the GCSE would be the ability to preset the results by keyword. Thus at the end of a blog post about, say, Pygmalion and Galatea, I could paste in a JURN search-engine box atop a set of pre-run results for pygmalion galatea…

… but I guess that would never happen, because then it would be used by blog-spammers to build fake blogs.
The main Google search results now offer a new set of advanced search tools, via a drop-down left-hand sidebar…

The standard search-modifiers work with these new search types.
Most useful is the ability to sort results by the date at which they were located by the Google bot. No RSS feeds for this yet, but Feed my search offers something similar.
Second most useful is the ability to easily limit your search by time, searching only material from the last day, week or year. This was previously available via the Advanced search, but was fiddly. Now it’s a one-click option, integrated with the search results.
Other useful options allow you to view a custom Google Image search, with the images pulled from your search results, while retaining the search results alongside the images.
‘Wonder Wheel’ is a simple Flash-based ‘topic prompt’, which updates in real-time as you search. I can imagine that this might be useful for students.

The Timeline is quite impressive — but also potentially dangerous, if students take it at face value and don’t realise that it’s constructed ‘on the fly’ by a bot.

The ability to search through bona fide discussion forums might also be useful for those seeking to track the buzz about their products. This can be combined with an option to search only reviews. Reviews can now be marked up by page authors using open formats, to help search-engines.
