How to resize pages in a squished PDF

Sometimes you get a PDF where the page is “squished”, as seen here…

Bad, some dunderhead saved the pages with slightly wrong proportions and didn’t notice.

Good, as it should be.

It can also happen when ebook files are being bulk-converted to .PDF files. It’s often especially noticeable where there is artwork with faces. The slightly “squished” or “stretched” result is locked into the PDF file and is difficult to change. It’s no use trying PDF tools that only scale a page proportionally, or simply crop the page, or re-print from U.S. Letter size to UK A4 size etc., because you only need to change each page along one dimension, not along both.

There are three or four online tools for fixing this in a PDF, though that’s not much help if you have a 200MB PDF and a very slow upload speed, or are offline. Or have 50 such files to process. Or if your business has a mission-sensitive document you’d rather not send to Whereizitagin. The full paid Adobe Acrobat can also do the repair, though in a clunky way, from Adobe Acrobat DC (2015, not to be confused with Adobe Reader) onward, via fiddling around with Preflight and following a convoluted recipe.

Are there any fast Windows desktop options? I found and tested three working possibilities, one free.

1. The free and trusty Irfanview can open and page through PDFs (with the free Ghostscript and the free plugins pack installed). Irfanview can even resize the first page in an unconstrained way, so you can work out what your re-size dimensions need to be. Sadly it can’t then flow this resizing over to all subsequent pages. Instead it can at least automatically save out all the pages as .PNGs or .JPGs, then you’d open their output folder and batch resize them with Irfanview. Then you’d re-compile them back to a .PDF file, or zip them into a Comic Book .CBZ file.

2. Apex PDF Page Resizer did the job easily and perfectly, although it’s expensive at $20 via FastSpring. Over-priced, for a one-trick-pony that won’t be used too often. There’s a 30-day trial with only a light watermark.

3. Advanced PDF Tools at $38. Twice the price it should be, but it does the job after a quick bit of fiddling with the settings. As you can see here, you scale the Page Content by a percentage and then pad in points (pt) to accommodate the added width or height. It’s a bit more hit-and-miss than Apex.

As you can see, you’re getting many more features than Apex PDF Page Resizer. But the very fast output speed and exactly the same file-size in output suggests it is working in much the same way as Apex, probably via a .NET Windows GUI that gives a pipe into several key Ghostscript switches.
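The scale-then-pad settings in option 3 can be worked out in advance rather than by trial and error. Below is a minimal Python sketch of that arithmetic, assuming you know the current page size in points and the width-to-height ratio the page ought to have. The function name and the example page sizes are hypothetical, and this is not how either tool calculates things internally.

```python
def stretch_settings(width_pt, height_pt, target_ratio):
    """Work out a one-axis stretch that gives the page the wanted proportions.

    target_ratio is the desired width divided by height.
    Returns (axis, scale_percent, pad_pt): which axis to stretch, the
    percentage to scale the page content by along that axis, and the
    points of padding to add to the page along that axis.
    """
    current_ratio = width_pt / height_pt
    if current_ratio < target_ratio:
        # Page is too narrow ("squished" sideways): widen it.
        new_width = height_pt * target_ratio
        scale = 100 * new_width / width_pt
        return ("width", round(scale, 2), round(new_width - width_pt, 2))
    # Page is too wide (or already correct): lengthen it instead.
    new_height = width_pt / target_ratio
    scale = 100 * new_height / height_pt
    return ("height", round(scale, 2), round(new_height - height_pt, 2))


# Hypothetical example: a 540 x 792 pt page that should have
# U.S. Letter proportions (612 x 792 pt).
print(stretch_settings(540, 792, 612 / 792))
```

The sketch always stretches the short axis rather than shrinking the long one, so the padding value is never negative.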

In both, the settings are then run across all pages, and a new repaired .PDF is swiftly saved out. It strikes me that such a relatively slight change could be one way of detecting a leaker in an organisation. Give each person a .PDF copy with very slightly widened or lengthened pages, such that each imperceptibly changed .PDF is unique to one person.

I looked hard but could not find anything with a GUI for Windows that hooked into Ghostscript’s resizing and scaling switches in the same way as the above two, but for free. pdfScale: Bash Script to Scale and Resize PDFs using Ghostscript came closest (see the scripts at the end) and may interest some.
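For anyone who would rather script it on Windows, here is a rough Python sketch that builds the kind of Ghostscript command line that pdfScale relies on: a fixed new page size plus a BeginPage scale. Some details are my assumptions rather than tested facts: the executable name (gswin64c is the usual 64-bit Windows console build), the whole-number page sizes, and the idea that the pages have no rotation or offset origin. The function name and file names are hypothetical.

```python
def build_gs_command(src, dst, width_pt, height_pt, scale_x, scale_y,
                     gs_exe="gswin64c"):
    """Build (but do not run) a Ghostscript command for a one-axis stretch.

    width_pt / height_pt are the NEW page dimensions in whole points.
    scale_x / scale_y are the content scale factors (1 means unchanged).
    gs_exe is an assumption; on Linux or macOS it is normally just "gs".
    """
    # PostScript snippet that scales the content at the start of every page.
    page_setup = f"<</BeginPage{{{scale_x} {scale_y} scale}}>> setpagedevice"
    return [
        gs_exe,
        "-o", dst,                              # output file (also implies batch mode)
        "-sDEVICE=pdfwrite",                    # write a PDF
        f"-dDEVICEWIDTHPOINTS={int(width_pt)}",
        f"-dDEVICEHEIGHTPOINTS={int(height_pt)}",
        "-dFIXEDMEDIA",                         # keep the page size given above
        "-c", page_setup,
        "-f", src,
    ]


# Hypothetical example: widen a 540 x 792 pt page to 612 x 792 pt.
cmd = build_gs_command("squished.pdf", "fixed.pdf", 612, 792, 612 / 540, 1)
print(" ".join(cmd))
```

To actually run it you would pass the list to subprocess.run(cmd, check=True). Try it on a copy of one file first, as results may vary between Ghostscript versions.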

If you just want to crop pages to a user-defined rectangle, including instances where you have several columns on the same page, the free Briss is well recommended.


(If you have a related problem, a PDF that shows the curved pages of a book as photographed from above with a hand-held camera, see my recent How to auto-correct curved book pages post)

How to delete search-box auto-suggests in the Opera browser

Well, here’s a handy trick for users of the Opera Web browser, and possibly of any other Chrome-based browser. Do you have a lingering and slightly annoying search-box autosuggest, which occurs on non-search websites? Such as on one of your WordPress blogs…

If so, then it’s no use searching in Opera’s Settings | Advanced | Privacy | Autofill. Only things like home mailing addresses and passwords live back there.

What you do is move your mouse cursor down to highlight (but not click) the offending suggestion, when it occurs in normal use. Then it’s hands-off your mouse, to press SHIFT and then DEL (delete) simultaneously on your keyboard. This removes the offending suggestion.

Get an RSS feed for any YouTube channel

YouTube has removed the ‘Export RSS feeds list’ option from your Subscribed Channels List. It used to be that it was at the foot of the page. The link to this feature is now nowhere to be found.

For the time being the RSS feeds are still there and working, however. A standard subscription URL is in the form of…

https://www.youtube.com/channel/UCralF3lNmSNYFaFtul5apuw

… and a handy bit of UserScript reveals the current YouTube RSS feed URL is in the form of…

http://www.youtube.com/feeds/videos.xml?channel_id=UCralF3lNmSNYFaFtul5apuw

Therefore, you go to your Subscribed Channels List page, and there use LinkClump (or similar) to copy out just the channel URLs and then in Notepad…

Search: /channel/

Replace: /feeds/videos.xml?channel_id=

You then have a list of RSS feeds for your subscriptions.
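If you’d rather not do the search-and-replace by hand, the same swap can be sketched in a few lines of Python. This assumes you have already copied out the channel URLs, one per item. The function name is my own, and the one URL shown is the example from above.

```python
def channel_to_feed(url):
    """Turn a YouTube channel URL into its RSS feed URL by swapping the path."""
    return url.replace("/channel/", "/feeds/videos.xml?channel_id=", 1)


# The list would normally come from your copied-out subscription links.
channel_urls = [
    "https://www.youtube.com/channel/UCralF3lNmSNYFaFtul5apuw",
]

# Skip any stray links that are not channel URLs.
feeds = [channel_to_feed(u.strip()) for u in channel_urls if "/channel/" in u]

for feed in feeds:
    print(feed)
```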

Fixing Paperity

Paperity appears to have been heavily demoted in search results by Google Search. Since circa 2015 Paperity has usefully listed and linked OA articles in hybrid journals. For five years, indexing Paperity thus enabled JURN to offer coverage of the OA articles in 50+ hybrid journals in the arts and humanities, mostly at the publisher Springer. JURN users also benefited from by-catch of hybrid journal OA articles in other subjects.

Per-article pages at Paperity now appear to be being automatically de-duplicated and discarded by Google Search, as expressed in JURN, in favour of the same article as known to sources such as Semantic Scholar and EuroPubMed (both also in JURN) and other aggregators. As a fallback I’m now indexing just the per-journal pages at Paperity (i.e. their linked lists of OA articles in a journal), and Google Search seems to treat these as an absolute backstop. This means that they will show up in JURN’s results, but often only as the very last result in a short set of results. This is quite useful behaviour, as it doesn’t distract users up at the top of the results. Formerly, JURN indexed the per-article pages at Paperity, but these were no longer appearing in results. Hence the need for change.

I’m also now indexing a couple of the relevant Springer journals directly in JURN. Indexing of article pages at *.springeropen.com also serves as a further backstop. Please contact me if SpringerOpen indexing doesn’t work well for your Springer OA journal, and you would like it directly indexed in JURN.

Also, note that JURN excludes URL paths with /figures/ from SpringerOpen and Springer journal links. These are pages containing the graphics, graphs etc from the article. While useful in their own right, and as such grabbed by Google Search, they are best approached in academic search via their main article page.

Venezuelan Scielo – URL change

It appears that the Venezuelan Scielo at www. scielo.org.ve/ is now just ve. scielo.org/. DuckDuckGo still indexes the old URL, which it seems no longer works. I get a pass-through to the Wayback Machine at Archive.org on hitting such dead URLs, though that’s the result of my browser plugin.

Google Search is indexing the new URL only, which works. When Google Search changes, it’s usually a sign that the old URL really is kaput.

It’s probably best to keep an eye on the other Scielo aggregators, to see if they make the same change and thus break older URL paths and Web links. They don’t appear to have made such a change, so far.

My guess would be that the .ve change could be a result of buying one of those ‘vanity’ fixes which remove the www. in a URL. After some hard-sell from a salesman, such fixes usually turn out to be very expensive to maintain, and after a time the URL often defaults back to normal.

In the meantime JURN is also directly indexing Venezuela’s journals at produccioncientificaluz.org and saber.ucv.ve/ojs/ and erevistas.saber.ula.ve. The nation is starving, but their journals are still online and reachable for now. The latter two appear to have different sets of journals, despite being from the same University of the Andes.

Yahoo Groups to close

The Yahoo empire continues to crash and burn. The old Yahoo Groups will shut down completely on 15th December 2020. If you had a Group there with content that’s still useful, now’s the time to back it up and upload the .ZIP file to Archive.org in perpetuity. Although I imagine that Archive.org itself may already be ‘on the job’ in that respect.

Working Excel spreadsheet: Take a list of home-page URLs, harvest the HTML, extract a snippet of data from each

I’m pleased to present a free ‘ISSN harvester’ for Excel 2007 or higher.

What you need: You have a long list of home-page URLs, one per line. You want a small snippet of data captured from each HTML page. The target data is not in any kind of repeating HTML table or tag, and could be anywhere on each page.

Usage: A long list of home-page URLs is pasted into the first column. The sheet then checks each URL in turn, and also extracts their HTML source into an adjacent cell. A formula in the end column then looks at the captured HTML and extracts the first instance of “ISSN” and any 70 following characters. Where no result is found, the formula leaves a general label as a placeholder.

Download: ISSN-and-data-checker-working.xlsm

Works in Windows with Excel 2007 or higher. May require the user to have Internet Explorer installed. Tested and working fine on an 800+ URL list. Each URL just captures the loaded page, not the entire website.

It should be adaptable to capture any snippet of data, just vary and replace the formula. Theoretically, you could also add extra columns to capture other data from the same HTML, such as “i s s n” or “eISSN”.
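For those who think more easily in code than in Excel formulas, the extraction step amounts to something like the Python sketch below. It is not a translation of the spreadsheet’s actual formula. I’m assuming here that the match is case-sensitive and that the snippet includes the marker itself, and the function name and placeholder label are hypothetical.

```python
def extract_snippet(html, marker="ISSN", following=70,
                    placeholder="No ISSN found"):
    """Return the first occurrence of marker plus the characters after it.

    If the marker is not in the HTML, return a placeholder label instead.
    The placeholder text is an assumption, not the spreadsheet's wording.
    """
    pos = html.find(marker)  # case-sensitive search for the first instance
    if pos == -1:
        return placeholder
    return html[pos:pos + len(marker) + following]


# Hypothetical page source, standing in for the HTML captured in a cell.
page = "<p>Our journal. ISSN 1234-5678 (online). Published quarterly.</p>"
print(extract_snippet(page))
print(extract_snippet("<p>Nothing relevant here.</p>"))
```

Changing the marker argument to “eISSN”, or adding a second call with a different marker, is the equivalent of adding an extra column to the sheet.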


Credit: This is derived and expanded from the free “Bulk URL status checker in Excel sheet”, which checked a list of home-page URLs for 404s, and also rather usefully extracted each page of HTML to a cell while it was about it. I would have had no idea how to set up that ‘HTML per cell’ bit, without his working example. That spreadsheet was kindly shared on the TechTweaks blog by ‘Conscience’ in April 2017. Here it has been adapted by myself to also extract data.

Added to JURN

Journal of L.M. Montgomery Studies (the life and work of the famous author of Anne of Green Gables)

Technical Bulletins of the Williamstown + Atlanta Art Conservation Center

Archiv Orientalni, 1929-2010. Later known as ArOr: Quarterly Journal of African and Asian Studies. Partly indexed by JURN via the URL kramerius.lib.cas.cz/periodical/ — which is not ideal, in terms of either target or Google Search’s current indexing, but is the best that can be done at present.

WIPO Magazine (World Intellectual Property Organisation, JURN was already indexing the WIPO Journal).

Quantitative Science Studies (MIT, “theoretical and empirical research on science and the scientific workforce”)

Philosophy of Medicine


Luminaria (interdisciplinary natural biology, Brazil, latest issues are partly in English)

+

Three more UK repositories, and the per-article record pages of IA Scholar at fatcat.wiki/release/ (the latter fairly poorly indexed by Google Search, at present).

Initial harvest of ISSNs is now integrated into the openECO directory. The process of adding ISSNs is not yet complete.

Working Excel spreadsheet: Align two lists without fuzzy lookup

Here’s my possibly-useful working Excel list-sorter, made for Excel 2007 and higher.

Situation: You have a long list of items in column A. You’ve copied out this list to run it through a process elsewhere, perhaps in some arcane Windows freeware that is the only thing that can do a particular job for free. This process has added a snippet of wanted new data at the end of each item. Hurrah!

But… possibly the process also discarded some lines, when no new data was found. Or perhaps a ‘helpful’ intern has later added a few lines here and there to the new list. Your new processed list is thus rather awkwardly jumbled up. You can no longer easily align your valuable new data snippets against the old list.

Use: Paste your jumbled and expanded list in Column E, and Column C will automatically sort and auto-align it alongside the original list. No ‘fuzzy lookup’ engine is required.

Download: match_and_sort_without_fuzzy_lookup.xlsx
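The general idea can also be sketched outside Excel. The Python below is only my guess at an equivalent matching rule, not the spreadsheet’s actual formulas: it assumes each processed line still begins with the original item, with the new data tacked on the end. The function name and sample lists are hypothetical.

```python
def align(original, processed):
    """Line up processed lines against the original list, in original order.

    A processed line is treated as a match when it starts with the
    original item. Originals with no match get an empty string, and
    unmatched extra lines in the processed list are simply ignored.
    """
    remaining = list(processed)  # copy, so matched lines can be used only once
    aligned = []
    for item in original:
        match = next((line for line in remaining if line.startswith(item)), "")
        if match:
            remaining.remove(match)
        aligned.append(match)
    return aligned


original = ["alpha", "beta", "gamma"]
processed = ["gamma 333", "a line the intern added", "alpha 111"]

for old, new in zip(original, align(original, processed)):
    print(f"{old!r:10} {new!r}")
```

One caveat with a plain prefix match: if one item is the start of another (say “beta” and “beta2”), the shorter one may grab the wrong line, so some lists would need a delimiter check as well.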