FoRESEE: Future Search Engines 2014, a one day workshop in Germany, 22nd September 2014.
David Prosser at Jisc blogs on the need for action on discoverability…
… 40% of researchers kicked off their project with a trawl through the Internet for material, while only 2% preferred to make a visit to a physical library space. [yet] nearly half of all items within digitised collections are not discoverable via major search engines by their name or title [and, even worse] digitised collections become harder and harder to find over time, for a variety of complex reasons.
“Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind.”
I’d say that 27m is probably a large underestimate, given that the two engines used for the study (Google Scholar and Microsoft Academic Search) are proven to be poor at indexing open repositories and open access journals. Given a few hours of work I could probably winkle out from JURN a list of 100 “big” URLs, which together would put JURN at 25m (primarily in English) — before even starting to tally all the other URLs.
Beall has a new list, Hijacked journals: counterfeit websites that mimic or clone legitimate journals.
Open source, open access comics? The great bard of Northampton is on the job, with a little help from NESTA’s Digital R&D Fund…
“Alan Moore said in a statement: … we are assembling teams of the most cutting-edge creators in the industry and then allowing them input into the technical processes in order to create a new capacity for telling comic book stories. It will then be made freely available to all of the exciting emergent talent that is no doubt out there, just waiting to be given access to the technical toolkit that will enable them to create the comics of the future.”
Google, being evil: ceases all RSS feeds from YouTube.
A Thomson ISI / Web of Science study is reported in Nature, dated 26th May 2014, as “Do Open Access journals have impact?”. They concluded that…
“Open Access journals [a selection of 190 titles, "core scientific publications"] can have similar impact to other journals, and prospective authors should not fear publishing in these journals merely because of their access model.”
RSS feed search, by keyword. Tip: paste in the URL, then cut it back to just the main word in the URL. It will usually find the RSS. Or just use site:yoursite.com and it will find all feeds from that site. Incredibly useful.
Why having the data can sometimes be handy: the Financial Times has fisked the Piketty data on Europe…
“The FT [Financial Times] found mistakes and unexplained entries in his spreadsheets, similar to those which last year undermined the work on public debt and growth of Carmen Reinhart and Kenneth Rogoff. … For example, once the FT cleaned up and simplified the data, the European numbers do not show any tendency towards rising wealth inequality after 1970. An independent specialist in measuring inequality shared the FT’s concerns.” – Financial Times.
The DOAJ now has a handy list of journals they’ve removed since the start of 2014. When you load the spreadsheet, switch through to the “Removed” tab to see them.
I wasn’t previously aware that if a journal hasn’t published in the last 12 months, it will be totally removed from the DOAJ, archive TOCs and all. JURN, on the other hand, is happy to index your journal so long as the archives are still online and open.
Says Microsoft: it’s crap, but… ‘hey, there’s a new version coming soon’. No, they’re not talking about the Windows 8.1 KB2919355 debacle and Windows 9, but rather about MS Academic Search…
Asked about the collapse [in the current version], a spokesperson for Microsoft Research declined to address the problem directly, writing in an e-mail:
“Microsoft Academic Search [has] a next-generation version of MAS, which focuses on enhancing the user experience and evolving it from a research project to an integrated offering within Microsoft’s services portfolio. During this transition, Microsoft has maintained the features, functionality, and the ability for third parties to enter new and updated content into the existing search engine, but the majority of our focus has now shifted to this new initiative.”
The Metropolitan Museum of Art has nearly 400,000 images online, almost all now in ‘just about’ print-res, and…
“that the Museum believes to be in the public domain and free of other known restrictions; these images are now available for scholarly use in any media.”
Above: “A Fury Riding on a Monster”, by Cornelis Saftleven, mid 17th century.
A sample download gave me a 72dpi picture at 3000px on the longest side, which (with a bit of Photoshop work) would just about hold up at A4 size and be able to fill a full page of a print magazine.
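As a rough check on that "just about A4" claim, here is a minimal sketch of the print-size arithmetic. It assumes the common 300dpi rule of thumb for magazine print (my assumption, not something the Museum states), and it ignores the 72dpi tag in the file, since only the pixel count affects how large the image can be printed.

```python
# Print-size arithmetic for a 3000px image (a sketch, not a pre-press rule).
MM_PER_INCH = 25.4
A4_LONG_SIDE_MM = 297   # ISO 216 A4 is 210 x 297 mm
TARGET_PRINT_DPI = 300  # assumption: common rule of thumb for magazine print


def effective_dpi(pixels: int, length_mm: float) -> float:
    """Pixels per inch when `pixels` are spread across `length_mm` of paper."""
    return pixels / (length_mm / MM_PER_INCH)


def print_length_mm(pixels: int, dpi: float) -> float:
    """How long a side prints, in mm, at a given dpi."""
    return pixels / dpi * MM_PER_INCH


long_side_px = 3000
dpi_at_a4 = effective_dpi(long_side_px, A4_LONG_SIDE_MM)
length_at_target = print_length_mm(long_side_px, TARGET_PRINT_DPI)

print(f"Stretched to A4's long side: about {dpi_at_a4:.0f} dpi")
print(f"At {TARGET_PRINT_DPI} dpi the long side prints at {length_at_target:.0f} mm")
```

So the picture falls a little short of 300dpi across a full A4 page, which is where the bit of Photoshop work comes in.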
Oaddo is an early alpha of a cool new search tool. Imagine that Wikipedia and Pinterest combined to give autocomplete a usability makeover, with Trello acting as the makeup girl. The aim is to help you do deep ‘research search’ when you don’t really know what you’re searching for.
It has an interesting way of allowing your search terms to interact with clustered semantic tags, for drilling down to the best search result. Sort of like a Google autocomplete / autosuggest that’s slowed way down and is largely under your control, and is curated by humans — and as a consequence is not dumb.
Oaddo has a nice clean interface too, which is neatly poised between power and simplicity. The developer Tim Borny has obviously been looking at Trello and Pinterest for inspiration. At the moment, though, discarding a search modifier tag takes two clicks, instead of a fun one-click “fling it to the discard tray” movement.
The other innovation is that it aims to have a democratic user-driven model. That aspect might take Oaddo a long way, provided there’s a critical mass of people, and provided a mechanism can be found to rein in the inevitable SEO spivs, ideological censors, and WikiPolice types.
* Users will ‘vote’ on content, curate content and the database of related terms.
* The community will drive the addition of new features.
So, very interesting. Amid the sea of recent search launches, this is actually one to watch. Here’s Tim Borny’s full explanation…
CultureCase, a new UK overlay service that provides a short plain-English summary of selected academic research on the impacts and effects of the arts and arts policy. There are OA links where possible, but most of the outbound links are to research that’s behind a paywall — which shows why these summaries may be especially useful for bootstrapping arts organisations which need to “make the case” for culture to sceptical bureaucrats. Though, in my experience, one does ideally need access to the original papers and reports since much arts advocacy research tends to rest on shaky foundations. Once you track back the estimates and ‘received wisdom’ factoids to their sources, the case being made can start to totter. This is especially true when people are making numbers claims about the boost to cultural employment or regional tourism income.
CultureCase currently has no links to OA journals on their links page, so I’ve sent them the following list…
Irish Journal of Arts Management and Cultural Policy
Asia Pacific Journal of Arts and Cultural Management
Working Paper Series, The Princeton University Center for Arts and Cultural Policy Studies
Current Opinion in Creativity, Innovation and Entrepreneurship
Nordic Journal of Cultural Policy
Arts Professional (UK, now free)
A Google site: search shows that CultureCase has only 27 OA articles at present. It would be useful if there were a http://www.culturecase.org/research-category/open-access/ tag which would collect all the open article records onto a single page.
The other problem is that they are linking to JSTOR and calling it ‘open access’ — but most people outside academia don’t have access to JSTOR, or only have very partial access.
Odysci Academic Search is aimed at allowing…
“technical professionals and companies to find and use the relevant technical information” in “computer science, electrical engineering and math-related areas”
Their blog entries tail off and stop in 2011, so it’s been around for a while, but the developers have a new paper which describes the technical infrastructure and gives the algorithms. I was interested to learn that…
“This framework is able to import, de-duplicate and persist 200K papers in the database (and all their entities) in 16 hours on an i7-based workstation with 32GB of RAM.”
My broad test search for…
“energy conservation” organizations
… gave me 44 results which included three fulltext links. That suggests that when Odysci imports records, there might be PDF links on less than 10% of those records?
On that basis I would guesstimate an ability to ingest, strip and process perhaps 20,000 fulltext PDF papers every 16 hours, give or take? So in terms of making a standalone JURN, give me six such PCs and the bulk of the humanities journal indexing might be done in… six months? Keep in mind that processing power is increasing (the Core i7 CPU line was introduced in 2008). If Odysci’s i7 is a circa-2008 CPU then more modern processors will do the job faster, and superfast broadband would speed up the actual PDF downloading.
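The guesstimate above can be written out as a back-of-envelope sketch. The 200K papers per 16 hours comes from the Odysci paper, and the 10% fulltext share is the upper bound suggested by my one test search. Everything else is an assumption made for illustration: machines running around the clock, "six months" taken as 182 days, and no allowance for download time or failures.

```python
# Back-of-envelope throughput estimate (a sketch under stated assumptions).
PAPERS_PER_RUN = 200_000     # from the Odysci paper: 200K papers per run
HOURS_PER_RUN = 16           # from the Odysci paper: one run takes 16 hours
FULLTEXT_SHARE_PERCENT = 10  # assumption: upper bound from "less than 10%"

# Share actually observed in the single test search: 3 fulltext links in 44 results.
sample_share = 3 / 44

# Fulltext papers handled per 16-hour run on one machine.
fulltext_per_run = PAPERS_PER_RUN * FULLTEXT_SHARE_PERCENT // 100


def fulltext_total(machines: int, days: int, hours_per_day: int = 24) -> int:
    """Fulltext papers processed by `machines` PCs over `days` days.

    Assumes every machine runs non-stop at the single-workstation rate.
    """
    runs_per_machine = days * hours_per_day / HOURS_PER_RUN
    return round(machines * runs_per_machine * fulltext_per_run)


print(f"Observed fulltext share: {sample_share:.1%}")
print(f"Fulltext papers per run: {fulltext_per_run:,}")
print(f"Six PCs over 182 days:   {fulltext_total(6, 182):,}")
```

This is a ceiling, not a forecast: it says nothing about how many humanities articles there are to index, and real-world downloading would be slower than the database import that Odysci measured.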
The same search in JURN tended to foreground papers on the role of human behaviours/attitudes and public policy in organizational energy conservation — rather than the technical aspects of electrical implementation. That suggests that — despite the recent science additions — JURN will tend to veer toward ‘the human element’ of topics. I also ran the test in Google Scholar, which proved to have the same veer, though with a heavier emphasis on articles from Psychology.
Free certified Coursera MOOC in Metadata: Organizing and Discovering Information. Starts 14th July 2014, for 8 weeks.
The theme for this year’s International Open Access Week (20th–26th Oct 2014) will be “Generation Open” with a focus on students and ‘early career’ researchers.
Off the top of my head, a few ideas for student activities:
* Bring together a small team to produce a one-off WordPress-based “overlay journal” or ebook. This would aim to elegantly showcase selected fulltext items in your university repository. Also scan just one old public-domain scholarly article that has never been seen online before, and add it to the mix. The issue/book might be themed around research on the history and natural history of your region or city — likely to spark local media coverage and thus to raise awareness among local independent/retired scholars. Once complete, invite local writers and artists to post responses to the chosen articles. Promote the completed issue/book as a resource for teaching of advanced comprehension and writing: have selected lecturers give student assignments to ‘translate’ the articles into 250-word ‘plain English’ summaries for general readers.
* Reach out to any Library / Librarianship related student groups on Facebook etc. Make sure you’re not pushing against an unlocked door, or duplicating work that’s already being done.
* Foreground and promote Open Access in a wider and rather cooler context than the introductory lecture on first-year study skills. (You know the one: 150 first-year students perspiring in a stuffy lecture hall in late summer, in front of which a librarian with an over-stuffed PowerPoint is trying to rectify six years of bad habits in 60 minutes.) For instance, instead try adding OA to local events on the practicalities of Creative Commons and the remix culture, co-organised with your local creative industries network.
* Reach out to university alumni, via writing an article in the alumni magazine or mailing. Stress the abrupt loss of access to research, on finishing a course. If the editors seem enthusiastic, suggest they carry a regular feature to signpost the best of “what’s new in the repository this quarter”.
* Write an article for your university’s local businesses / local partners engagement newsletter. If these publications carry funding news then they can be surprisingly closely scrutinised by key local players.
* Offer to spend a day “dust busting” the library website, via a full link-check / update / OA expansion of all their subject guides and open access pages.
Slidee, a new search engine for PowerPoint presentations. Very underpopulated at present, but it may improve. It’s better than Google in one respect: Google Search currently refuses to show any results when two filetype: modifiers are used together to cover both types of PowerPoint file, as in this one…
metadata “open access” filetype:pptx filetype:ppt
Bing doesn’t balk at double filetype: modifiers, but then it just seems to discard/ignore any .pptx results (Google sees 10 million of those). Odd, considering Bing is from Microsoft.
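One possible workaround is to run a separate query for each file type and merge the results yourself. The sketch below only builds the query strings and de-duplicates lists of result URLs. The function names are hypothetical, it calls no search engine API, and the OR form is untested here, so whether a given engine honours it is an assumption.

```python
# Building one query per filetype, instead of doubling up the modifier.
def per_filetype_queries(terms: str, filetypes: list[str]) -> list[str]:
    """Return one query string per file type."""
    return [f"{terms} filetype:{ft}" for ft in filetypes]


def or_query(terms: str, filetypes: list[str]) -> str:
    """Single query joining the filetype modifiers with OR (untested assumption)."""
    alternatives = " OR ".join(f"filetype:{ft}" for ft in filetypes)
    return f"{terms} ({alternatives})"


def merge_results(*result_lists: list[str]) -> list[str]:
    """Merge lists of result URLs, dropping duplicates but keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


terms = 'metadata "open access"'
for query in per_filetype_queries(terms, ["pptx", "ppt"]):
    print(query)
print(or_query(terms, ["pptx", "ppt"]))
```

Each printed line can be pasted into the search box by hand, and the two result lists then combined with merge_results.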
Not quite here yet, but with a smart cover and some advance articles for free: Rift Valley Review from the Rift Valley Institute (covers various nations in Central East Africa). NewJour lists Rift Valley Review as Open Access.