(Update: January 2011. This tutorial superseded by a new and better one)
1) Have Google run your search in advanced mode, selecting “100 results on a page”. If you prefer Bing, choose Preferences / Results, and select “50 on a page”.
2) Run the search. Once you have your big page o’ results, just leave the page alone and save it locally — doing things like right-clicking on the links will trigger Google’s “url wrapping” behaviour on the clicked link, which you don’t want. So just save the page (In Firefox: File / Save Page As…), renaming it from search.html to something-more-memorable.html
3) Now open up your saved results page in your favourite web page editor, which will probably add some handy colour-coding to tags so you can see what you’re doing. But you can also just open it up in Notepad, if that’s all you have available. Right click on the file, and “Open with…”.
A hat-tip to richarduie for the original script. I just hacked it a bit, so as to output the results in handy comma-delimited form.
5) Now locate the start of the BODY of your web page, and paste in this code after the body tag…
Save and exit.
6) Now load up your modified page in your web browser (I’m using Firefox). You’ll see a new button marked “Extract all links and anchor titles as a CSV list”…
Press it, and you’ll get a comma-delimited list of all the links on the page, alongside all the anchor text (aka “link titles”), in this standard format…
Highlight and copy the whole list, and then paste it into a new Notepad document. Save it as a .csv file rather than a .txt file. You can do this by manually changing the file extension when saving a file from Notepad.
7) Now you have a normal .csv file that will open up in MS Excel, with all the database columns correctly and automatically filled (if you don’t own MS Office, the free Open Office Calc should work as an alternative). In Excel, highlight the third column (by clicking so as to highlight its top bar) , then choose “Sort and Filter” and then “A-Z”…
You’ll then be asked if you want “Expand the selection”. Agree to expansion (important!), and the column with the anchor text in it will be sorted by A-Z. Expansion means that all the columns stay in sync, when one is re-sorted like this.
Now you can select and delete all the crufty links in the page that came from Google’s “Cached”, “Similar”, “Translate this page” links, etc. These links will all have the same name, so by listing A-Z we’ve made them easy to delete in one fell swoop.
8) You’re done, other than spending a few minutes ferreting out some more unwanted results. Feel free to paste in more such results from Bing, de-duplicate, etc.
If you wanted to re-create a web page of links from the data, delete the first column of numbers, and then save. Open up your saved .csv in Notepad. Now you can do some very simple search and replace operations, to change the list back into HTML…
(Note: you can also use the excellent £20 Sobelsoft Excel Add Data, Text & Characters To All Cells add-in for complex search & replace operations in Excel)
Ideally there would be free Firefox Greasemonkey scripts, simple freeware utilities, etc, that could do all of this automatically. But, believe me, I’ve looked and there aren’t. Shareware Windows URL extractors are ten-a-penny (don’t waste good money on them, use the free URL Extractor), but not one of them also extracts the anchor text and saves the output as .csv.
Yes, I do know there’s the free Firefox addon Outwit Hub, which via its Data / Lists … option can capture URLs and anchors — but it jumbles everything in the link together, anchor text, snippet, Google gunk, etc, and so the link text requires major cleaning and editing for every link. Even with the hit-and-miss home-brew scraping filters, it’s not a reliable solution.