Archive.org offers books as unprocessed hi-res .jp2 image scan files, but once unpacked these don’t preview as thumbnails natively in Windows. Have you ever found yourself playing “spot the page” as a result, trying to find a picture among hundreds of identical file icons? The quick solution to that problem is the freeware IrfanView and its JP2 (JPEG 2000) plugin pack.
The recent changes to the Google CSE services appear to have introduced another glitch. The problem happens when adding new URL entries to your Google CSE. For instance, you can no longer add http://www.nnns.org.uk/sites/nnns.org.uk/files/ and reliably select “Include all pages whose address contains this URL”. Oh yes, the Dashboard will let you save it that way… but then go back and open the URL up again. You’ll see that the CSE dashboard has refused to accept the setting you gave the URL, and has instead defaulted it to: “Include just this specific page or URL pattern I have entered”.
The problem with this is that you didn’t explicitly enter http://www.nnns.org.uk/sites/nnns.org.uk/files/* with the trailing * wildcard, and it’s that wildcard which makes the “Include just this specific page or URL pattern I have entered” setting functional for a whole directory. Without it, the http://www.nnns.org.uk/sites/nnns.org.uk/files/ URL is null and void under that setting, and may as well not have been added to your CSE.
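To illustrate with the example URL, here’s how the two forms of entry now differ in practice:

http://www.nnns.org.uk/sites/nnns.org.uk/files/
(under “Include just this specific page or URL pattern I have entered”, this matches only that one exact address)

http://www.nnns.org.uk/sites/nnns.org.uk/files/*
(the trailing * makes the pattern match every page under /files/, which is roughly what “Include all pages whose address contains this URL” used to give you)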
This has only just started happening, and the “Include all pages whose address contains this URL” setting remains sticky on entries made prior to about 24 hours ago. That makes me think it’s probably a temporary glitch, inadvertently introduced during yesterday’s switch from three options to two options for the settings on individual URLs.
If you’re working on a CSE over the weekend / Bank Holiday (UK), you should be aware of this problem, as it probably won’t be fixed by Google until early next week. You’ll probably want to keep a .txt file of all the URLs you add that need the /* workaround, because you may need to manually change them back once the problem gets fixed.
Facebook Photos & Images Size Guide – 2017. A size of 1640px wide by 624px tall is suggested as optimal for headers.
The Internet Archive (and its Wayback Machine) has recently announced that it has stopped honouring robots.txt files for some sites. A robots.txt is a simple text file which tells visiting crawler and harvester bots that the site’s owners don’t want their content accessed, copied and (potentially irrevocably) made public somewhere else without their permission.
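For example, a minimal robots.txt placed at a site’s root, asking all well-behaved bots to stay away from the whole site, is just two lines:

# Addressed to every visiting bot that honours robots.txt
User-agent: *
# Ask them to stay out of everything below the site root
Disallow: /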
The Internet Archive is currently… “ignoring [robots.txt warnings at] U.S. government and military web sites”, and states that in future… “We are now looking to do this more broadly.”
This would seem to have a number of implications for repositories and journals, especially in terms of things like retractions, ‘heavy harvesting’ of large numbers of large files, and the practical implementation of the emerging legal concept of ‘the right to be forgotten’.
To anticipate this impending policy change at the Internet Archive and to block its crawlers, you reportedly need to set up a way to “limit access by IP addresses” from the IA, and/or configure your site to block visiting clients named “ia_archiver”.
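For those on Apache hosting who can edit their site’s root .htaccess, a minimal sketch of the user-agent approach might look like this (it assumes mod_rewrite is available on the server; blocking by IP address would additionally need the IA’s current address ranges, which I haven’t listed here):

<IfModule mod_rewrite.c>
RewriteEngine On
# Refuse (403 Forbidden) any visiting client which announces itself as "ia_archiver"
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC]
RewriteRule .* - [F,L]
</IfModule>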
If you can’t do that — at first glance it looks a lot more complex than simply uploading a plain robots.txt file — then note that they say they will… “respond to removal requests sent to email@example.com”. The latter option may be of special interest to hosted wordpress.com blogs and similar sites, which have no means of blocking the IA’s crawlers.
If you use torrents and your uTorrent client has suddenly become overrun with slow-loading banner adverts and nags, I can recommend the free and open source qBittorrent. It’s almost the same as uTorrent in terms of the interface, and it takes about half an hour to swap over if you’ve been seeding a half-dozen or so torrents. Note qBittorrent’s ability to schedule alternative upload/download speed limits by time of day, so as to automatically increase them across all torrents at times when you’ll be away from your PC.
Keep in mind that the uTorrent uninstaller leaves behind a big backup cache of all your downloaded .torrent files in C:\Users\USERNAME\AppData\Roaming\uTorrent
Trello Inspiration, a fine survey of all the different ways in which one can use the excellent and free Trello service. They missed out magazine and journal production, though.
For your Web page, here’s strong anti-framejacking and anti-clickjacking code, which has been tested and currently busts nasty frame-jackers such as In.is (aka Linkis). As such these snippets may be useful for journals and other academic services, to prevent legitimate content from being hijacked and surrounded by frames advertising ‘essay-writing services’, predatory publisher services, or worse.
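The in-page snippet, along the lines of the Stanford-recommended defence, goes in the page’s <head>. It first hides the page body with CSS, then un-hides it only if the page finds itself at the top of the browser window; if the page has been framed, it instead breaks out of the frame:

<style id="antiClickjack">body{display:none !important;}</style>
<script type="text/javascript">
if (self === top) {
    // Not framed: remove the hiding style, so the page displays normally.
    var antiClickjack = document.getElementById("antiClickjack");
    antiClickjack.parentNode.removeChild(antiClickjack);
} else {
    // Framed: force the top-level window to our own URL, busting the frame.
    top.location = self.location;
}
</script>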
Source: Stanford Security Lab via a recent blog post by Zipline Interactive, where there’s also additional defensive code to add to your website’s root .htaccess file (if you have FTP access and your host will allow upload of a changed .htaccess)…
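# Sent as an HTTP response header with every page served; it tells browsers not to render the site inside a frame on another origin: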
Header set X-Frame-Options SAMEORIGIN
The .htaccess code is an ‘as well as’, serving as a second and deeper line of defence; it is not required for the first snippet to work in your Web page. Most modern Web browsers understand the self-explanatory SAMEORIGIN instruction when they hear it from a website, and will refuse to display that site’s pages inside a frame on any other site.
Those with a hosted WordPress blog or journal may also want to consider the Frame Buster plugin. So far as I know there’s nothing similar for Open Journal Systems (OJS) or Omeka or similar academic content plug-and-play systems. But perhaps there should be, if they don’t already have such counter-measures baked in?
Search Console – Submit URL to Google. Handy to bookmark if you find or own a site that is missing from Google, and where there’s no robots.txt at the site root to block the indexing bots.