There’s now a unified directory for Google’s open source projects, opensource.google.com.
I’m very pleased to see that the search engine DuckDuckGo now offers Bing-like feed:keyword searches and seems to do so rather well. Unlike Bing, DuckDuckGo even offers a “Past Week” option on such searches, though it’s not so useful. The results are “we crawled this in the last week, but it hasn’t updated since 2012”, rather than “wow, the feed updated with juicy new content in the last week”.
Searches are however aware of the feed’s content, as well as the simple fact that it is a feed. Since feed posts are dated, this means that you can approximate a ‘recent’ search with:
feed:keyword March 2017
feed:deadline conference history university March 2017
Very useful for those who need to find timely new content, drawn only from sources highly likely to be dedicated to pumping out such content. Although on a simple search you will still get tangled in feeds that don’t restrict themselves to ‘last 20 posts’, and instead pour in years and years of posts. Using an additional -2016 seems to knock out such over-long feeds, at the cost of omitting some feeds that may be useful. feed: also accepts a ‘nuke-from-orbit’ command such as -2010~2016.
You can also do feed:“keyword” to prevent annoying word-juggling (e.g. search for stoke, see results for stock) or to add phrases.
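For anyone who wants to script such searches, here is a minimal Python sketch that assembles the query strings shown above. The function names and parameters (build_feed_query, duckduckgo_url, exclude_years, exact) are my own inventions for illustration, and the ?q= URL pattern is assumed from ordinary DuckDuckGo search links rather than from any documented API.

```python
from urllib.parse import urlencode


def build_feed_query(keywords, month=None, year=None, exclude_years=(), exact=False):
    """Assemble a feed: search string in the style described above.

    Hypothetical helper: the names and parameters are illustrative only.
    """
    first, *rest = keywords
    # Quoting the first keyword is meant to suppress word-juggling (stoke vs stock).
    head = f'feed:"{first}"' if exact else f"feed:{first}"
    parts = [head, *rest]
    # A month and year in the query approximate a 'recent' search,
    # because feed posts carry their dates in the text.
    if month:
        parts.append(month)
    if year:
        parts.append(str(year))
    # Each -YEAR term tries to knock out over-long feeds that mention older years.
    parts.extend(f"-{y}" for y in exclude_years)
    return " ".join(parts)


def duckduckgo_url(query):
    """Turn a query string into a search URL (assumed ?q= URL pattern)."""
    return "https://duckduckgo.com/?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_feed_query(
        ["deadline", "conference", "history", "university"],
        month="March",
        year=2017,
        exclude_years=[2016],
    )
    print(q)
    print(duckduckgo_url(q))
```

Swapping the month and year each time you run it gives a rough ‘what’s new this month’ search, with the same caveats about over-long feeds as noted above.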
Firefox users may find that the feeds don’t display prettily as a browser page, when clicking on the results from such a DuckDuckGo search. This may be because you changed your Firefox RSS preview (‘Live Bookmark’) setting some time in the past, perhaps because it was apparently somewhat insecure to preview RSS feeds inside Firefox until a security fix in version 51 (the current version being 52). Security-minded users may thus have passed RSS feed subscription handling straight to a dedicated desktop reader, such as the excellent free FeedDemon. To undo such a change go to: Tools | Options | Applications | Web Feed | and switch back to ‘Preview in Firefox’.
You’ll then get an in-browser page-like preview of the RSS feed, whatever format it comes in (it appears Firefox can tell an .xml feed from an “.xml document”). The Firefox preview page will still offer you an option to send the feed to your main feed reader.
It looks like Google is set to make some back-end changes to how its Custom Search Engines work, to better suit mobile search users…
“The search space is evolving rapidly and we want to make sure that CSE continues to evolve to meet the needs of your users, whether they are visiting from desktop or mobile devices.”
This means a need for a few API changes, from April 2017. In future Linked CSEs will have to be done via the CSE Control Panel (I thought they were already, in part), as will Dynamic Link Extraction. None of these API changes affect JURN. Hopefully the Linked CSE changes may even make it easier for me to set up new search side-projects for JURN.
The fine search-engine DuckDuckGo is getting sort-by-date filters and website sub-section links very shortly.
Also, Google is now back to honouring site: searches in full. Over the last month or so, a site: search (with no additional keyword or phrase) only ever returned one lone link. Now the full set of links is showing up again, as they used to.
And Yandex has started enforcing word substitutions, when it ‘thinks’ a word is spelled incorrectly. This change makes Yandex useless for academic search, because there’s no way around it. For instance…
“Google wins long US court battle” over Google Books…
Google’s massive book-scanning project has cleared what may be its final legal hurdle, with the US Supreme Court denying an appeal that contended it violates copyright law. The top US court on Monday denied without comment a petition from the Authors Guild to hear the appeal of a 2013 federal court ruling seen as a landmark copyright decision for the digital era. […] Google said in a statement after Monday’s decision, “We are grateful that the court has agreed to uphold the decision of the Second Circuit [appeals court] which concluded that Google Books is transformative and consistent with copyright law.”
Thanks to The Register for pointing out a search-engine that’s new to me, the Russian Lukol.com. Lukol claims to be a wholly anonymous search engine. So how is Lukol different from the non-tracking and privacy measures offered by DuckDuckGo? Lukol claims that…
“When we obtain enhanced results from Google, we tunnel your search query through our proxy servers, without exposing your search data.”
I’ve given it a quick test, and it seems to work fine and supports filetype:pdf. Basically it seems to be Google Search + URL-matched pictures + news down the side. I’m thinking Lukol might be useful for academics who want to search the Google Search index, in-depth and in very complex ways, without triggering anti-robot countermeasures from Google’s bots?
The Evolution of the Web is a very elegant interactive timeline from Google, charting browsing software and hardware storage capacity. It would be interesting to see a ‘social impact’ / ‘economic impact’ / ‘cultural impact’ version of this.
JURN search results seem to have changed a bit recently. Specifically, those that are obtained with the popular intitle: search modifier. It seems Google is now running intitle: against the actual document title, rather than against the text that forms the hyperlink.
For instance, search via JURN for…
… and some of the ’10 blue links’ titles returned will lack the word ‘turtles’ in them. I don’t remember that happening before. Did Google break? It seems not — loading up such links shows that the fulltext does indeed have ‘turtles’ in the article title (the title that heads the actual document).
This means that users of JURN should not overlook intitle: results links that seem to lack their desired keyword or phrase.
My guess is that Google Search’s document title identification and extraction is improving behind the scenes, but that the results server is told not to waste good computational time, and so is free not to plug each and every article title into the results links. Maybe the Googleplex figures that anyone smart enough to use intitle: will pretty soon figure out that all the search results need to be considered, whether or not the desired keyword appears in a blue link.
I also noticed that Google may even be truncating the oh-so-hip preambles that are common on academic article titles in the arts and humanities. For instance, the results link that in 2009 appeared in JURN worded as…
“Home on the Range: Space, Nation, and Mobility in John Ford’s The …”
… now appears in JURN simply as…
“Space, Nation, and Mobility in John Ford’s The Searchers”
I haven’t tested this very extensively, but if I’m correct it may be more evidence of Google getting better at article title identification and manipulation, and/or at weighing a long article title against the article’s abstract. Something like…
SPLIT document title on “:”
MATCH both sides of “:” against the article abstract
IF the words that occur before a “:” DO NOT MATCH words in the abstract
THEN truncate the document title before “:”
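To make that guess concrete, here is a runnable Python sketch of the same heuristic. It is only my speculation turned into code, not anything known about Google’s actual method. The function names, the small stopword list and the sample abstract are all invented for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not from any real system).
STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to"}


def content_words(text):
    """Lower-case the text and return its words, minus the stopwords."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS


def truncate_title(title, abstract):
    """Drop a title's preamble when none of its words occur in the abstract.

    Hypothetical helper following the guessed steps above: split on the
    first ":", match both sides against the abstract, and keep only the
    part after the ":" if the preamble fails to match but the rest matches.
    """
    if ":" not in title:
        return title
    preamble, remainder = title.split(":", 1)
    abstract_words = content_words(abstract)
    preamble_matches = bool(content_words(preamble) & abstract_words)
    remainder_matches = bool(content_words(remainder) & abstract_words)
    if not preamble_matches and remainder_matches:
        return remainder.strip()
    return title


if __name__ == "__main__":
    # The abstract here is made up purely to exercise the function.
    title = "Home on the Range: Space, Nation, and Mobility in John Ford's The Searchers"
    abstract = "This article examines space, nation and mobility in John Ford's western The Searchers."
    print(truncate_title(title, abstract))
```

With this made-up abstract the hip preamble is dropped, because neither ‘home’ nor ‘range’ occurs in it. If the abstract did mention either word, the full title would be kept.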
The snippet below a JURN results link is also starting to be a citation of sorts, in certain circumstances…
Author surname in capitals, even. Very nice, even if it is taken from the document’s own formatting rather than from some new gee-whizz improvement in Google. Journal editors, and those slapping generic cover pages on repository PDFs, might do well to check out that particular PDF article’s front page and seek to duplicate its simplicity. Since it obviously plays so well with Google.