Sad to say, but the three Google Search harvesting utilities from Sobolsoft no longer work on Windows 7 with Internet Explorer 9. The utilities are: Google Save Search Results; Google Extract Data & Text; and Excel Import Multiple Google Search Results.
I’m guessing that to use these today one would need to blow the dust off an old Windows Vista PC with something like IE6 or IE7 installed, although the problem might be due to newer versions of Visual Basic runtimes or similar. The utilities don’t run well on Windows XP (I tried one, on an old laptop) because the GUI layouts are truncated in it, vital ‘save’ buttons are unreachable, and the software can’t be re-sized.
Among possible fallback options, none of the Google URL Harvester scripts for Greasemonkey work now. The clunky Outwit Hub still can’t seem to get past Google Search’s URL obfuscation and other clutter, making it fairly useless for the task. SEO software like URLHarvester and Scrapebox doesn’t seem to care about link titles or extracts, just raw URLs and PageRank.
Still working is the basic per-page method of using Multilinks (Firefox) or Linkclump (Chrome), and then my combinatory Excel spreadsheet.
** Update: found a new, free way to do it, that also harvests snippets.
True, the Web is an ever-changing universe, and Google in particular often alters its code. I sometimes have to modify my scrapers, but I do seem to have more luck than you with OutWit Hub: it finds the links pretty consistently. I’ve become something of an addict, in fact. I use the links or list tabs when I am in a hurry; the guess function is not bad, but it is less reliable. In links, you can hide local URLs, and what is left is a complete list of the result links. When I want to be sure of what I extract from SERPs or any other kind of source (including AJAX), I make a scraper and it works like a charm.
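For anyone who wants to see the "hide local URLs" idea outside OutWit Hub, here is a minimal Python sketch. It is not how OutWit Hub does it internally; the function name `external_links` and the sample links are made up for illustration, and it simply assumes that "local" means "same host as the results page, or a relative link".

```python
from urllib.parse import urlparse

def external_links(links, page_url):
    """Keep only links whose host differs from the host of the page they came from."""
    page_host = urlparse(page_url).netloc.lower()
    kept = []
    for link in links:
        host = urlparse(link).netloc.lower()
        # Relative links have no host, so they are treated as local and dropped.
        if host and host != page_host:
            kept.append(link)
    return kept

# Hypothetical links as they might be collected from one results page.
links = [
    "/preferences",
    "https://www.google.com/advanced_search",
    "http://example.com/article",
    "http://example.org/report.pdf",
]
results = external_links(links, "https://www.google.com/search?q=test")
print(results)
```

On this sample, only the two example.com and example.org links survive, which is roughly what is left in the links tab once local links are hidden.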
Thanks AlJo. I’ll take another look at it. Is there a good 2011/12 step-by-step tutorial you know of, for using OutWit Hub with Google Search, and a set of scraper examples that work with the current Google Search? It just seems way over-complicated and unpredictable, when Sobolsoft could harvest Google Search in a much simpler, quicker and more intuitive manner.
Yesterday I found a free basic Google Search URL ripper from 2007, that still works fine, albeit with no in-built delay. But it doesn’t grab the link title text.
There is a series of tutorials in the latest version (2.1). I haven’t looked at them all, but some explain how to make scrapers.
The links extractor is super simple to get the links in one click and it works pretty well if you uncheck local links.
A scraper is more powerful. I don’t know how to attach a file. But you can type this in the scraper editor.
- Line 1
description: URL
marker before: /]*href=”/
marker after: ”
- Line 2
description: abstract
marker before:
marker after: /]*>/
That’s pretty generic and it works for now, until the next change. I’ve used regular expressions, so it looks a bit complex, but it’s more concise. It could be done without them.
I’m not sure if the light version works with AJAX pages. I know the pro version does and you must set it to use the dynamic source code or you will only be able to get the results of the first page.
AJ
Oops, looks like the code didn’t make it through…
It was supposed to read:
- Line 1
description: URL
marker before: /<h3 class="r"><a[^>]*href="/
marker after: "
- Line 2
description: abstract
marker before: <h3 class="r">
marker after: /<br[^>]*><\/span>/
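To show what the "marker before" / "marker after" pair does, here is a rough Python sketch of the same idea using the URL line from the scraper above. It is only an illustration, not OutWit Hub's own code: the helper `extract_between` and the sample HTML are hypothetical, and the markup Google actually serves will differ and keep changing.

```python
import re

def extract_between(source, before, after):
    """Return every piece of text found between a 'before' and an 'after' marker.

    Both markers are regular expressions and should not contain capture groups.
    """
    pattern = re.compile(before + r"(.*?)" + after, re.DOTALL)
    return pattern.findall(source)

# Hypothetical fragment imitating the result markup the markers assume.
sample = (
    '<h3 class="r"><a href="http://example.com/one" class="l">First result</a></h3>'
    '<h3 class="r"><a class="l" href="http://example.org/two">Second result</a></h3>'
)

# Same markers as "Line 1" of the scraper: everything after href=" up to the next quote.
urls = extract_between(sample, r'<h3 class="r"><a[^>]*href="', r'"')
print(urls)
```

The `[^>]*` part is what lets the marker skip over any other attributes inside the link tag, which is why the second sample link is still found even though `class` comes before `href`.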