A new category for posts on this blog, Regex. I’ve gone back and retrospectively tagged old posts with it.
New category for posts: ‘Regex’
27 Wednesday Apr 2022
Posted in My general observations, Regex
23 Saturday Apr 2022
Posted in My general observations
A WordPress plugin is needed that lets small publishers and copyright-owners easily and cheaply offer a Google Books-like experience. This would allow public searching across a set of uploaded PDFs, but the actual PDFs would not be made public.
The only thing being served from search would be Google Books-like snippets of text and page. Something more or less like this in terms of the elements… snippet, issue title, cover thumbnail, page thumbnail with snippet location highlighted…
A typical use-case would be, for instance, a set of 50 hobbyist magazine back-issues. Cheese Making and Makers magazine, 1990-2002, that sort of thing. Old, but still valuable in terms of the wealth of information. The rights owner is not a huge mega-publisher, and may indeed have inherited the rights on the death of a family member. All they want to do is ‘scan and sell’, as simply and cheaply as possible and without recurring costs other than the web-space.
Searchers, having satisfied themselves via search snippets and a TOCs preview that their discovered magazine or journal is of obvious use to their specific needs, could then buy a bundle of the back-issues.
There must be a great many niche, trade and ‘vernacular’ sets of back-issues out there, that might be winkled out into public availability if such a simple secure tool were available.
23 Saturday Apr 2022
Posted in My general observations
Hurrah! After numerous support emails and hassle, my new jurn.link domain has now been ICANN verified within the 15-day verification period. It should now stay online for the foreseeable future.
21 Thursday Apr 2022
Posted in JURN tips and tricks
I’m currently having a great deal of trouble trying to ‘verify’ my new domain name for JURN. All this needs is a simple email and a link, but the email is just not getting through to me. So, a ‘heads up’ to readers and JURN users that it might vanish again at the weekend. As I wrestle with my web host’s support, here’s a guide I’ve written to help others in the same predicament. It’s fairly complete, re: what I’ve learned so far, but I’ll update it a bit if there’s success with an SMS verification of the new domain.
What to do when the domain-name verification email never arrives:
When you buy many types of domain name for webspace, you can use it for 14 or 15 days but then it dies… if you haven’t received and clicked on a link in the vital “domain verification email”.
Ok, so no verification email whatsoever appears to be being sent. In which case work through the following…
1. Look in the “spam” and “deleted” folders in your email software. The verification email may be there.
2. Re-send the verification email, which is usually done from your domain control-panel at your Web space host. Wait 20 minutes and see if it’s arrived. After three or four failed tries, it’s likely nothing will ever come through. Something unknown is blocking or diverting the email.
3. Access your email account via browser webmail rather than your desktop email software. You may find that the webmail has an additional “spam” folder that you knew nothing about, and which does not show up in the desktop software. I found such a folder. The missing emails may be in there, though sadly they were not there for me. Some people do report success by trying webmail access, and they do find the email in this additional spam folder. Webmail access will also prove that the problem is not your broadband router or Windows hosts file doing some kind of local domain blocking. While you’re in webmail, also check the other folders in your webmail system.
4. Nothing there? Back in the desktop email software, access security settings and whitelist the domain that sends the verification email. This may take some Google-ing to find. For the large European provider Hostinger, for example, this is @hostingerdomains.com and thus hostingerdomains.com. Whitelisting is unlikely to work at this stage, but may be worth a try. Also check security and other settings on your email software. Then re-send the verification email, again via the button on your webspace hosting control-panel.
5. Still nothing coming through? Since you saw nothing in webmail, the problem is unlikely to be the desktop anti-virus messing with your emails. But it may be worth double-checking for aggressive anti-phishing measures from anti-virus and anti-malware. While you’re looking at such unlikely possibilities, also check your phone SMS just to be sure the domain registrar is not trying to contact you that way.
6. Back in your desktop email software, ensure it has no ‘rules’ that might delete or divert the vital email. (At this stage I even wondered if @hostingerdomains.com was being blocked by the UK’s btinternet.com at the DNS level because it was Russian, but no… WHOIS says it’s registered in Cyprus).
7. In your website provider’s control panel DO NOT then try to change the address that the email is sent to. A successful change will usually require two emails, one to the new and one to the old address. Both will need to be received and confirmed, to effect the change. You see the problem here, I think! The old address cannot receive the vital email, which is being sent from the very same domain as… the email message you’re not getting.
8. Ok, now contact your hosting’s support, ideally via their “domains” contact point if they have one. Work through the usual script (“Look in your spam folder”, “I’ll send you a test email” “ok, our test mail was sent ok and didn’t bounce” etc), until you can ask to be referred upward to someone else.
9. Some webspace hosts can then have their technical team manually edit the WHOIS data, so as to just replace the bad contact email with a new good one. If this can be done, give them a few hours and then reload/refresh your control panel in your Web browser. If the new details don’t then show up, you may need to clear the last 4 weeks of browser Cache and probably also the same time-period for Cookies. Then reload again. If the page now shows the details are changed and you’re Verified, then… problem solved. If the details are changed but you still need to verify, then try again at the new email address.
If the change of details is not possible then ask for an SMS verification to be sent to the mobile phone-number you added to your domain registrant details. Which I hope you did. ICANN apparently allows its licensed domain-selling… “registrars to send a unique link via SMS instead”. If an SMS verification can’t happen… then you may just have to abandon the domain name and put a new and similar one on your webspace. This time you’ll of course try another email address, as part of the new domain’s contact-details.
10. Alternatively, if the SMS verification is not permitted or also fails, then you might just claim the 30-day money-back offer. Start over somewhere else more amenable to keeping your domain online for years and years.
Ideally, when first setting up a domain, there would be a button to: “Test the email address you entered, by sending it a simple ‘welcome’ email from the domain verification mail service we use”. Only if that email was a success would you go ahead and buy the domain/space with your entered registrant details. You would know that the vital 15-day domain-verification email would actually reach you. In my case I luckily know that an office365 email address will accept an @hostingerdomains.com email with no problems.
Useful links:
* OpenSRS’s “The ICANN community is failing its customers”, which explains the problem very well.
* ICANN’s official “Do You Have a Domain Name? Here’s What You Need to Know”.
19 Tuesday Apr 2022
Posted in Spotted in the news
A new unofficial index of Boxoffice magazine issues on Yumpu (it’s an Issuu-like flipbook magazine-hosting service)…
these magazines are a superb resource for theater historians, but to say that they are disorganized would be an understatement. For now, I’m concentrating on the years containing the most drive-in theater news, 1948-1965. If you’ve got a list for other years, please let me know.
15 Friday Apr 2022
Posted in JURN tips and tricks
At last, a way to turn off those infernal keyboard shortcuts in the online MS Outlook 365. I only have to use it occasionally, but when I do the shortcuts are easily accidentally triggered while typing and they can cause absolute chaos.
1. Visit your usual https://outlook.office365.com/mail/ and allow everything to load up. In the top-right corner there’s a cog-wheel icon. Click on it.
2. This will open a side-panel down the right of the screen. Again, office365 can be sluggish, so give the panel time to fully load.
3. Down the bottom of the panel is “View all Outlook settings”. Click on this.
4. Assuming your pop-up blocker is not getting in the way, you should see this full settings control-panel appear…
… and then you go: General | Accessibility | and click inside the “Turn off keyboard shortcuts” radio-button, so that it’s the button that now shows the blue pin. Save, and exit the config panel.
I’ve briefly tried several times to find out how to do this. But all I recall learning from the Microsoft help boards and similar is that it’s supposed to be impossible, and when the enquirer gets irate at such stupidity the thread is then swiftly closed. Well, it’s not impossible.
14 Thursday Apr 2022
Posted in JURN tips and tricks
The Google Search logo ‘doodle of the day’ has re-appeared in the corner of the actual search results page. Here’s the new div needed to block it…
.YQ4gaf
It’s written thus in your uBlock Origin element block-list (found at uBlock | Dashboard | My Filters)…
Spacing/layout of the page is retained.
The other three filters you can see here are old, and presumably no longer work but I’m keeping them just in case. If you use a national version of Google Search, change .com to whatever yours is.
Update: It changed again. Is now just .logo as seen here…
12 Tuesday Apr 2022
Posted in JURN tips and tricks, Regex
When placing an image in a blog post, on a server installation of WordPress you often get something like this code…
The main image is only linked to, and an ersatz auto-generated thumbnail is what’s shown on the page. One may wish, for various reasons, to change an .XML archive of a WordPress blog so as to only have the filename. If one could remove the suffix seen here in yellow…
… then the blog post will only need to call the original image. The HTML code will take care of the resizing on the blog post.
Such snipping is useful if you only archived the .XML and the original images. You may no longer have or never harvested the multifarious thumbnails that were auto-generated by your server installation of WordPress.
Such filename snipping can be done, and with a simple regex formula, in Notepad++. But first… check what images you have in your local archive for the blog, since it’s just possible that the harvesting collected the thumbnails and not the originals.
Definitely have the originals, with no suffixes? Ok, then let’s proceed. Thankfully all the added filename suffixes have certain repeating elements, even if they have differing pixel dimensions. Thus they can be handled by a regex.
In Notepad++ the following working and tested regex will search and delete the thumbnail extensions, even if they all have different pixel dimensions:
FIND: \|*-([0-9]+)x([0-9]+)\.jpg\|*
REPLACE: .jpg
In plain English: find everything between any - and a following .jpg and if it has the general form of numberxnumber then delete it along with the - and .jpg. Replace the deleted string with .jpg and then repeat the process down through the whole .XML file.
Possibly regex gurus will shriek and swoon at my formulation seen above, but… it works for me.
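For anyone who would rather script the clean-up than run it in Notepad++, the same find-and-replace can be sketched in Python. This is only a minimal sketch under assumptions: the function name is made up for illustration, and the pattern is a simplified form of the one above rather than a copy of it. The \|* parts match zero or more literal pipe characters, so they are left out here, and the capture brackets are dropped because the replacement never uses them.

```python
import re

# Simplified form of the Notepad++ pattern (an assumption, not the original):
# a hyphen, a width, the letter x, a height, then the .jpg extension.
THUMB_SUFFIX = re.compile(r"-[0-9]+x[0-9]+\.jpg")


def strip_thumbnail_suffixes(xml_text):
    """Return xml_text with every -WIDTHxHEIGHT.jpg suffix reduced to .jpg."""
    return THUMB_SUFFIX.sub(".jpg", xml_text)
```

As with the Notepad++ version, only .jpg files are handled, and an original image that genuinely ends in something like -800x600.jpg would also be renamed in the text. It is worth checking the local image archive for such names before running it on a whole export.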
If you want this regex to run automatically on any post you make on WordPress, as you blog, then the Regex Replace extension for the Chrome browser will do the job. It automatically removes the suffix after you “Update” (i.e. save) the post, not when you first insert the image. Thus you’ll need to press “Update” twice on your blog post.
Tested and working. You probably also want to tell Regex Replace to run only on your target blog or website.
You don’t usually need this on free WordPress.com blogs, because they do things differently when placing an image on a blog post.
If you have a fixed width on your blog posts, you can also prevent WordPress from generating unwanted thumbnails thus…
… here I have gone to Media | Settings in the Dashboard, and set 0 to prevent all except “medium” from generating. This is set to the width of the blog post, so I get the HTML sizing code, but the regex ensures all my posts both display and link to the source image. Obviously in this use-case you’ll try not to inflict file sizes above 500kb on your readers.
11 Monday Apr 2022
Posted in JURN tips and tricks, Regex
Here’s how to search-replace all image paths in a .XML WordPress blog export, using the freeware Notepad++.
The idea here is to point them all to a folder of /oldimages/ with your cache of old blog images in it. Otherwise, if there’s no live blog for the new WordPress install to go and fetch them from, they won’t show on the blog posts. So you ideally want all your images pointing to mysite/blog/oldimages and that’s where you upload all your archived blog images.
The \d+ bit matches one or more digits, so it stands for ‘any date number’ e.g. /01/02/. It’s a sort of ‘wildcard for dates’. This does not work inside of WordPress, for instance when you have a .XML already imported and a regex plugin to call on. It appears to be Notepad++ specific.
This method assumes you haven’t been letting WordPress rename your image files by size, e.g. you upload fluffybunny.jpg and have WordPress insert it as a shrunken fluffybunny-500x350.jpg along with a link to the main and properly named image file. In which case deeper surgery is required.
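The same kind of path rewrite can also be sketched in Python, whose regex engine understands \d+ as well. Treat this as a minimal sketch under stated assumptions: both web addresses are hypothetical placeholders, the function name is made up for illustration, and the pattern assumes the common WordPress layout of wp-content/uploads/ followed by two date folders.

```python
import re

# Hypothetical old and new locations: substitute your own addresses.
# Assumes images live under wp-content/uploads/<digits>/<digits>/.
OLD_UPLOADS = re.compile(
    r"https?://oldblog\.example\.com/wp-content/uploads/\d+/\d+/"
)
NEW_FOLDER = "https://mysite.example.com/blog/oldimages/"


def repoint_image_paths(xml_text):
    """Point every dated uploads path at the single /oldimages/ folder."""
    return OLD_UPLOADS.sub(NEW_FOLDER, xml_text)
```

Because every dated folder collapses into one, two images with the same filename but different dates would end up with the same address. A quick check for duplicate filenames in the archive is a sensible precaution.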
With a free WordPress blog you may not have this problem. For instance, the above image is placed in the post in shrunken form. But this is done by a bit of code that gets appended to the file path…
… and the image itself is not duplicated, shrunken and renamed. Even when, as a 585 pixel wide image, it only appears in the blog post at 529 pixels.
Also useful here is the old Windows freeware WXRsplitter which will split a single WordPress .XML archive into smaller but valid chunks. Often, a large single .XML (known to WordPress as a WXR file) will not upload once it gets beyond about 18MB or so. The freeware still works fine. Once the .XML is chunked, you just upload and import each piece in the numbered sequence.
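For those not on Windows, the general idea of chunking can be sketched in Python. This is not how WXRsplitter itself works, which I have not inspected. It is a rough sketch that assumes each post sits in a plain <item>…</item> block, and that everything before the first item and after the last one can simply be repeated in every chunk. The function name and the size limit are made up for illustration.

```python
import re


def split_wxr(xml_text, max_bytes):
    """Split a WXR export into smaller documents that each repeat the
    original header and footer around a subset of the <item> blocks."""
    first = xml_text.find("<item>")
    last = xml_text.rfind("</item>")
    if first == -1 or last == -1:
        return [xml_text]  # no posts found, nothing to split
    end = last + len("</item>")
    header, footer = xml_text[:first], xml_text[end:]
    # Each match is one post plus any whitespace that follows it.
    items = re.findall(r"<item>.*?</item>\s*", xml_text[first:end], flags=re.DOTALL)

    chunks, current, size = [], [], 0
    for item in items:
        item_size = len(item.encode("utf-8"))
        # Start a new chunk once adding this post would pass the limit.
        if current and size + item_size > max_bytes:
            chunks.append(header + "".join(current) + footer)
            current, size = [], 0
        current.append(item)
        size += item_size
    if current:
        chunks.append(header + "".join(current) + footer)
    return chunks
```

The limit counts only the posts, not the repeated header and footer, so it is wise to set it comfortably below the real upload ceiling. A post whose text happens to contain a literal </item> would confuse this simple approach.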
10 Sunday Apr 2022
Posted in My general observations
JURN is now back online after a short hiatus, at the Web domain www.jurn.link, and my blog has also had all instances of the old .org links fixed.
Please update your bookmarks and links. It’s just a matter of changing .org to .link which should be a simple matter for most.
My JURN directory of open scholarly ejournals in the arts & humanities and GRAFT (full-text search of the world’s repositories) are also back online.
Everything should be working correctly, though it’s possible that one or two especially recalcitrant DNS servers in Whereizitagin may not yet have picked up the new domain-name.