News from JURN

~ search tool for open access content

Monthly Archives: March 2022

Freeware to convert a WordPress blog .XML export to Word

27 Sunday Mar 2022

Posted by David Haden in JURN tips and tricks, Regex

Hurrah. Desktop software has been found that will robustly convert all of a WordPress blog .XML export to Word. Specifically, this is for a free blog hosted at WordPress.com, which doesn’t permit or offer any fancy ebook conversion plugins. The solution is good old Windows freeware, as usual, though the various freeware directories know nothing of it and it was only found after hours of digging.

The XML to Doc software is the snappily titled wpxslgui…

wpxslgui is a Windows application which converts an XML File generated by the WordPress Export function into an HTML or Word HTML document.

It’s Windows freeware in v.1.04 (June 2020) from Devio IT Services, being the worthy and generous Herbert Oppolzer of Austria. Tested and working here. Very simple usage, with a Windows GUI.

It also includes the option to… “Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)”. Note, though, that the “Word HTML” output saves as a .DOC file.

However, if you simply re-name the .DOC to .HTML it then works fine as a Web page in a Web browser, and thus calls in the images. I’m assuming here that your browser is allowed to go online, but Word is not.

The overall aim here is to get the blog to a clean ebook format for Kindle, removing as much gunky code and blog-cruft as possible. Archivists may also be interested here. As such there are some initial changes you may want to make…

1). First you may want to delete the superfluous 24-hour timestamp by editing the WordPress .XML output itself. What you’re targeting looks like this…

Since timestamps have a unique pattern, a regex can deal with them. To delete every timestamp in Notepad++, search in regex mode for the following and replace it with nothing…

 \d{2}:\d{2}:\d{2}

…and note the single space at the start of the regex.

Running this removes all timestamps but leaves the date intact. Possibly a regex could also re-work the main date to something nicer (e.g. 12th July 2015), but it would likely be a very complex regex.
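For anyone happier in a script than in Notepad++, here is a minimal Python sketch of the same idea. It reuses the timestamp regex above (leading space included), and then has a go at the nicer date. It assumes the dates left behind are in year-month-day form such as 2015-07-12. That is my assumption about the export, and the sample line and function names are purely illustrative. A regex with a small callback function keeps the date re-working fairly manageable. Try it on a copy of the .XML first, since I have not checked how the converter reacts to a re-worded date.

```python
import re
from datetime import date

# Same pattern as the Notepad++ regex, including the single leading space.
TIMESTAMP = re.compile(r" \d{2}:\d{2}:\d{2}")

# Assumption: remaining dates are written year-month-day, e.g. 2015-07-12.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal(day: int) -> str:
    """Return the day number with its English suffix (1st, 2nd, 12th, 23rd)."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def strip_timestamps(text: str) -> str:
    """Delete every ' HH:MM:SS' timestamp, leaving the date in place."""
    return TIMESTAMP.sub("", text)


def prettify_dates(text: str) -> str:
    """Rewrite 2015-07-12 style dates as '12th July 2015'."""
    def rewrite(match: re.Match) -> str:
        year, month, day = (int(part) for part in match.groups())
        try:
            parsed = date(year, month, day)
        except ValueError:
            # Looks like a date but is not a real one, so leave it untouched.
            return match.group(0)
        return f"{ordinal(parsed.day)} {MONTHS[parsed.month - 1]} {parsed.year}"
    return ISO_DATE.sub(rewrite, text)


# Hypothetical sample line, for illustration only.
sample = "<wp:post_date>2015-07-12 14:03:22</wp:post_date>"
print(prettify_dates(strip_timestamps(sample)))
# prints: <wp:post_date>12th July 2015</wp:post_date>
```

Note that the timestamp pattern is deliberately blunt. It will also remove anything in the body text of a post that happens to look like a space followed by 12:34:56.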

2). After conversion, you’ll find that wpxslgui has put the post titles in italics. The CSS stylesheet is embedded at the top of the output HTML, and so changing this is just a matter of tweaking font-style:italic on H2. Bold might be better.

3). While you’re in the CSS you may want to have external links be something other than blue and purple. In which case edit the colours of a:link and a:visited in the header CSS.
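The two CSS tweaks in steps 2 and 3 are easy to do by hand, but if you convert often they could be scripted. What follows is only a rough Python sketch. The sample stylesheet is made up for illustration and is not copied from real wpxslgui output, the `edit_rule` helper is my own hypothetical name, and the colour values are arbitrary. It assumes each rule is written in the plain `selector { property: value }` form.

```python
import re


def edit_rule(css: str, selector: str, prop: str, new_decl: str) -> str:
    """In the first `selector { ... }` rule, swap the `prop: value` declaration for new_decl."""
    rule = re.compile(r"(" + re.escape(selector) + r"\s*\{)([^}]*)(\})", re.IGNORECASE)
    # The lookbehind stops 'color' from also matching inside 'background-color'.
    decl = re.compile(r"(?<![-\w])" + re.escape(prop) + r"\s*:\s*[^;}]*[^;}\s]", re.IGNORECASE)

    def fix(match: re.Match) -> str:
        return match.group(1) + decl.sub(new_decl, match.group(2)) + match.group(3)

    return rule.sub(fix, css, count=1)


# Made-up stylesheet fragment, standing in for the CSS at the top of the output HTML.
css = (
    "h2 { font-style:italic; margin:0 }\n"
    "a:link { color: blue }\n"
    "a:visited { color: purple }"
)

css = edit_rule(css, "h2", "font-style", "font-weight:bold")   # step 2: bold titles
css = edit_rule(css, "a:link", "color", "color:#444444")       # step 3: link colours
css = edit_rule(css, "a:visited", "color", "color:#666666")
print(css)
```

Because the helper only touches the one named rule and the one named property, the rest of the embedded stylesheet is left exactly as the converter wrote it.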

4). wpxslgui adds numbering of posts, in front of each post title. The code spans multiple lines and looks like this…

Awkward, but not impossible for a regex to fix. This Search-Replace regex for Notepad++ will replace all of these with a tilde or whatever other elegant typographer’s HTML mark you might want.

5). Once you have output in semi-cleaned HTML, Ctrl + A should “Select all…” in the browser, and you can then copy and paste into whatever you’re using. If you were keen on YouTube embeds you will then need to manually go through and delete the WordPress code for these. The WordPress YouTube ‘slug’ insert is presented raw in the post, since without a WordPress installation the code can’t call the video. The same will likely be true for any other fancy embeds of maps, charts, podcasts etc.

Finally, note also that wpxslgui only deals with posts, not pages.

Back to Google Search

12 Saturday Mar 2022

Posted by David Haden in JURN tips and tricks, Spotted in the news

I’ve given up on DuckDuckGo as primary search-engine, now that it’s just Bing in sheep’s clothing. They were an indeterminate blend of Bing and Yandex, but now they’re reported to have thrown out the Russian Yandex. So they’re now just another Bing clone, including trackers. You can see it in the results. And who wants to use Bing for search? Ugh. Admittedly Bing has its good bits, these being Bing Images and Bing News. But the main search is not one of them.

Google Search thus now seems the best and only option, although you’re going to have to put up with and be able to ‘visually parse’ a lot more spam. I’m assuming you have the spam-iness locked down a bit, by using add-ons like Google Hit Hider. And that you see and can read the full URL path under each result (making it easier to detect pages pretending to be another site) rather than a ‘breadcrumbs’ URL.

So, Google Search it is, for now. Time for some light surgery on the Google Search interface, then.

1. I’m pleased to see that the UserScript Reorder Google Categories still works on Google. This allows you to re-order the items Images | News | Books | Maps | and have the ones you want up front. Then cruft like Shopping | Flights can be removed with the uBlock Element Picker.

2. The UserScript Alternative Search Engines 2 will then let you layer a set of links into Google, such that the current Google Search keyword “or phrase” can be passed over to other engines with one click. Works fine, and I’m pleased to see that JURN can be one of these links. Result…

To get this just edit the Alternative Search Engines 2 script code thus…

This hack now replaces the defunct UserScript I made a few years ago, to add such a JURN link to Google Search.

If you don’t care for Bing Images (see above code), Brave Images is basically Bing Images with a nicer and less fussy UI.

The alternate services are searched in a new window, so your current Google search session is not voided.

You can’t pass the current search keyword over to Google Books or Google News or Maps this way, because the script can only add it to the end of each URL. If you could add it in the middle, then the user could do away with Google’s own bar of sub-links entirely. But Books and News have URLs that look like this…

As you can see, the keyword needs to be inserted in the middle. It’s probably possible, and maybe I’ll further hack the code to do that if I find time. Anyway, it all works well so far, with the dual link-bars that are both capable of capturing the current Google Search terms and passing them through.
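To make the end-versus-middle problem concrete, here is a small Python sketch of how a link-builder might handle both cases. This is not the code of the Alternative Search Engines 2 script (which is a JavaScript UserScript), only an illustration of the idea. The `{q}` placeholder convention, the `build_search_url` name and the example.org addresses are all hypothetical stand-ins, not the real Google URLs.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "{q}"  # hypothetical marker for where the keyword should go


def build_search_url(template: str, keyword: str) -> str:
    """Insert the URL-encoded keyword at the placeholder, or append it if there is none."""
    encoded = quote_plus(keyword)
    if PLACEHOLDER in template:
        # Keyword belongs in the middle of the URL.
        return template.replace(PLACEHOLDER, encoded)
    # Fallback: the simple behaviour of adding the keyword to the end.
    return template + encoded


# Keyword added at the end of the URL.
print(build_search_url("https://example.org/search?q=", "open access"))
# prints: https://example.org/search?q=open+access

# Keyword inserted in the middle, with a further parameter after it.
print(build_search_url("https://example.org/search?q={q}&section=books", '"open access" journals'))
# prints: https://example.org/search?q=%22open+access%22+journals&section=books
```

Keeping the append behaviour as the fallback means any existing end-of-URL links would carry on working, and only the awkward middle-of-URL ones would need a placeholder.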

Added to JURN

10 Thursday Mar 2022

Posted by David Haden in New titles added to JURN

Added to JURN:

Journal of Italian Philosophy

Argumenta (Journal of the Italian Society for Analytic Philosophy)

Studies on National Movements

Korean Heritage

Year in C-SPAN, The (media communications)

Ludic Language Pedagogy (use of games in teaching language and linguistics)

Ephemeris Hungarologica and associated e-book library

Tahiti (Finnish art history, partly in English)

Parnassus: Classical Journal of the College of the Holy Cross

New England Classical Journal

European Conservative, The (book reviews only)

Journal of Business Strategies (Sam Houston State University)

Explorers Journal, The (journal of the Explorers Club)

Expeditions with MCUP (U.S. Marine Corps University Press)

Marine Corps History (formerly Fortitudine)

Afrika Focus

Tribal Law Journal

Journal of African Languages and Literatures

Journal of Global Catholicism

Zeramim : Journal of Applied Jewish Thought

Tyndale Bulletin (Biblical Studies, was lost and is now back in again)

Journal for the History of Environment and Society

Journal for Digital Legal History

Oslo Law Review

Online Journal of Space Communication, The

Tamarind Papers, The (fine art lithography)

Proverbium (the study of proverbs)

Berita : The Newsletter of the Malaysia, Singapore, Brunei Studies Group (Ohio University)

Archives & Manuscripts (Australian archivists, 1955-2011)
