A possible unwanted side-effect of making PhD theses open access in public repositories, even when they are not actually under Creative Commons… image libraries want hefty image reproduction fees:

“consider that your average art history PhD will have dozens, if not perhaps hundreds, of images, then soon even an unpublished PhD can become prohibitively expensive. You want to discuss mid-18th Century portraiture, and show perhaps 50 images? That’ll be £750. You want to turn that PhD into a book? £3,050 please, before you’ve even thought of printing costs. Want to put on a Hogarth exhibition, with a decent catalogue? £8,600. Ouch. And Tate [in the UK] are on the cheaper end of the scale.”

And that’s before many image libraries realise that the PhD might be made public as a PDF, and thus that their digital pictures could be extracted at print-res (pro version of Adobe Acrobat, go: Tools | Document Processing | Export All Images) and then whisked into the public domain by cackling anarchists on Wikipedia.

But the image given in the article as an example seems to have already had something similar happen to it. It’s the Tate’s copy of “The Painter and his Pug” (£162, please… the Tate having already taken PhD PDFs in repositories into account, and gouged accordingly). The picture’s now on Wikimedia and gleefully marked as public domain.

Still, that picture is by Hogarth. If you’re writing on someone more obscure or more modern, or don’t have the time or search skills to go burrowing into Hathi and Archive.org, then I can see how the gouging ‘repository-increased’ fees could make it difficult.

And difficult not only for the hapless writer. But also for librarians. Once the PhD is in a repository and is the institution’s responsibility, one suspects that some especially vicious picture libraries may even decide to make a bundle of cash by finding ‘personal use’ images in PhDs and demanding institutional prices for their use. In which case, in future, might we see PhD PDFs with most of the pictures blanked out, due to a mismatch between the assumed ‘personal use, on the library-shelf only’ licence for the pictures (for instance, Google’s 10m-picture LIFE magazine archive) and the subsequent public and institutional status of the document once it hits the repository? If so, who is going to go through and censor? One suspects it’ll be too much trouble for librarians to do that by hand, and too much trouble to figure out what stays and what goes (I assume 100% reliable machine-readable rights tagging is a non-starter, due to the human author in the loop). In which case the university’s risk-averse lawyers would just recommend that some bot should automatically detect and delete all the pictures or, as with the Digital Library of India books that I’ve seen recently, that their contrast be increased so far that the pictures become almost illegible.

One way an author might get around that is to also provide a search link with keywords and phrases embedded in the URL. Thus my URL, when clicked, searches multiple image search-engines for “The Painter and his Pug” etc with a size of more than 2MB. Of course, readers can do that for themselves, but it would be a nice future-proofing courtesy. Or what about ‘intelligent PDFs’ that do that for you, fetching and embedding the required image on-the-fly from wherever it can be best found? An AI might help with that, and perhaps the link might contain an AI-friendly formula for what the required image should look like (big red splotch here, eyes there, etc) to ensure that the correct one is fetched.
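
For anyone wanting to generate such links in bulk, here is a minimal sketch in Python of the keyword-in-the-URL idea. It is only an illustration under stated assumptions: the two search endpoints and the `q` and `min_size` parameter names are hypothetical placeholders, since each real image search-engine has its own URL scheme, and the artist keyword is just an example of an extra search term.

```python
from urllib.parse import urlencode

# Hypothetical endpoints (assumption): swap in the real search URLs you use.
SEARCH_ENDPOINTS = [
    "https://images.example.org/search",
    "https://pictures.example.net/find",
]


def build_search_links(title, extra_keywords="", min_bytes=2_000_000):
    """Return one search URL per endpoint, with the keywords embedded in the query."""
    # Quote the title so engines that support it treat it as an exact phrase.
    phrase = f'"{title}" {extra_keywords}'.strip()
    params = {
        "q": phrase,                 # assumed name of the keyword parameter
        "min_size": str(min_bytes),  # assumed name of a minimum file-size filter
    }
    query = urlencode(params)  # percent-encodes quotes, turns spaces into '+'
    return [f"{endpoint}?{query}" for endpoint in SEARCH_ENDPOINTS]


if __name__ == "__main__":
    for link in build_search_links("The Painter and his Pug", "Hogarth"):
        print(link)
```

The author would paste the resulting links into the thesis next to each blanked or missing picture, so that a reader lands on a ready-made search instead of an empty box.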