Space in Images (20,000).
Space in Videos (3,200).
Social Media Research Toolkit is an up-to-date, academically-oriented list of free tools for bulk-collecting and investigating various forms of social media.
TagWalk: The Fashion Search Engine is a very nice high-end ‘trends search’ experience for the fashion industry. It kind of feels like you’re using Anna Wintour’s personal search-engine, but it’s public. Extensively tagged by hand, by a small team of ‘industry insider’ curators. It’s fast too. Humdrum academic archive discovery searches could learn a thing or two here, in terms of slick navigation and speed.
The only thing I’d change would be the ability to switch through from womenswear to menswear, so that when you’re looking at a Spring/Summer 2017 “tag” in the womenswear results, you can instantly flick over to the same “tag” in menswear for visual comparison. I’d also add “hat” and “cap” to the tagging, since they’re currently missing, though that may be because designers are not currently sending hats down the catwalk. At the back is a “moodboards” section, which I didn’t explore, but I presume it lets you pull together your own custom moodboards of designs.
The Metropolitan Museum of Art has just placed online 200,000 CC0 images of the art and other items in its collection. So far the site is holding up nicely in terms of managing the traffic.
The test files I had from it were large, at around 4000px, and were also crisply photographed. The site’s search works cleanly and a single click downloads the largest file size as a .jpg. The only thing I’d change is to allow a Web browser’s Back button to work with the site’s search results.
Newly announced for the UK…
“Today Jisc announced that OCLC, the global library cooperative, has been awarded the contract to develop a new national bibliographic knowledgebase (NBK).”
Judging by the initial press-release, the focus seems likely to rest first on cohering UK academia’s metadata management for digital book collections. This will in time…
“enable shared bibliographic metadata to flow into … global search engines”
Hopefully that means Google Search, as well as Google Scholar (which are two separate systems and databases).
Kaspersky Lab is developing the FFForget tool, aiming to create a local backup of your Facebook. Hopefully it will also archive the Facebook Groups you moderate, while allowing local search of their full-text.
“The service is planned to go live in 2017 based on user’s interest. Subscribe now for free and become one of its first users!”
From the CORE blog…
“CORE is thrilled to announce that it currently provides 5 million open access full-text papers.”
A very worthy achievement, given the surprising difficulty of auto-harvesting full-text from university repositories. I’m happy to say the CORE full-text has long shown up in JURN’s search results.
Lecture Search at findlectures.com is a new search tool, or at least it’s new to me. It appears to have launched in the summer of 2016, which is probably why I missed its launch. In the UK it would have been drowned out by the news of our glorious Brexit.
Lecture Search aims to find ‘intelligent talk’ files such as conference and academic lectures, and it does what it claims. A few early observations:
* Seems to be running from a hand-curated URL list. There’s evidence in the results that the last indexing run may have been in early 2016?
* Includes YouTube and Vimeo as sources but seems to have a filter on them, presumably via indexing only selected channels.
* Searchers should use NOT keyword rather than -keyword to knock out search words from results.
* Nice range of limiting facets, in the sidebar.
* One annoying pop-up nag-box, but it was easily killed with AdBlock Plus’s “select an element to hide…”.
* Relevance ranking is definitely not Google-licious, as is the case for all such Summon-like services. For instance: search for “cave art”, and get “The Complete Poetry of Cesar Vallejo” as the first result. That page’s text happens to mention that Vallejo once did some research on “cave art”, but then presumably the prestige of the result’s loc.gov URL lifts the result up to No.1.
* Not indexing the BBC’s hundreds of In Our Time .mp3 podcasts, which seems a pity.
Precis writing skills among recent American graduates: apparently disappearing faster than UC Berkeley’s federal funding…
“We had close to 500 applicants. Inasmuch as the task was to help us communicate information related to the work we do, we gave each of the candidates one of the reports we published last year and asked them to produce a one-page summary. All were college graduates. Only one could produce a satisfactory summary. … Our own research tells us that a large fraction of community college professors do not assign writing to their students because their students cannot write and the professors do not consider themselves to be writing teachers. It is no wonder that employers like us find it so hard to find candidates with serviceable writing skills.”
Admittedly precis and outline writing is a skill that’s only barely acquired after a good deal of practice, and then not by all in a class. It may help if a student has developed the knack of point-summarising by regularly taking hand-written outline lecture notes. Even then ‘getting it’ might require half a semester, rather than just a couple of hours of lessons. It’s a skill that’s likely to be especially difficult for a student who isn’t an avid advanced reader, ideally a reader of factual argumentative content that requires one to constantly unpick arguments on-the-fly.
In the news this week, Priscilla Chan and Mark Zuckerberg (Facebook) have purchased the academic search engine Meta, and are set to… “offer Meta’s tools free to all researchers” at some point in the future. Very nice of them.
Currently meta.com’s search is shuttered to the public, but the site is inviting sign-ups. Meta.com is not a name that’s been on the tip of my tongue, or covered here. I don’t recall if public access to it was ever available, but possibly not. Apparently the pre-Zuckerberg Meta was one of a clutch of startups trying to apply AI to a limited set of the academic literature — often in the relatively tame-but-lucrative biomedical field. I had a glancing post here on the apparently-similar Iris AI 2.0 back in November. At the search-tool level Iris AI seems to propose much the same search capabilities as Meta — but via a demo of 30m+ records harvested from repositories by CORE. In contrast the pre-Zuckerberg Meta.com covered PubMed, according to a November 2015 press-release, combining that with metadata input from “dozens of publishers”. Another November 2015 press release rather ambitiously claimed that Meta.com enabled a user to…
“navigate the entirety of scientific information (25 million papers with 4,000 new ones published daily)”.
After the Zuckerberg-boosted relaunch the stated aim is to expand the functionality via third-party access…
“we will enable developers to build on it or integrate it into third party platforms and services … will embrace the ideas and efforts of researchers in the diverse fields that Meta intersects with – including machine learning, network science, ontologies, science metrics, and data visualization”.
Hopefully that opening up will also include open public access to the most juicy commercial bits of Meta.com, like the ‘early awareness’ Horizon Scanning module. This claimed to be able to descry a predictive map of future research agendas and trends…
“will enable academics and industries to maintain early awareness of emergent scientific and technical advances at a speed, scale and comprehensiveness far beyond human capacity, and years in advance”
Assuming that works as intended (I haven’t encountered any gushing reviews), I’m still not sure I’d want to absolutely rely on a predictive tool that only saw a fraction of the picture. A mere “25 million papers” seems a little lightweight, set against a claim to index “the entirety of scientific information”. On the other hand, if it covers all of the output in one’s tight little niche, and has semantic links out into a spread of related and similarly delimited fields, then it could be quite useful for some people.
Publishers have until 10th February 2017 to submit suggested humanities book titles to Knowledge Unlatched. Selected books are made Open Access in perpetuity, albeit usually minus the cover art/design as part of the Creative Commons PDF. Losses are defrayed by a consortium of libraries.
106 Knowledge Unlatched titles currently show up in OAPEN and thus in JURN, although 343 titles were unlatched for 2016, which means that a lot more are coming soon.
The Victoria & Albert Museum “Persistent Identifiers for the Humanities (workshop report)”, 20th January 2017…
“… the British Library and the DataCite organisation (as part of the THOR project) organised a workshop before Christmas on this issue of ‘Persistent Identifier Services for the Humanities’.
It was apparent from the discussions in the workshop that the implementation of this infrastructure in the humanities is still very much in its infancy in all institutions. Some of the basic concepts inherited from scientific research do not seem to map directly across. For example, do humanities researchers consider their source material ‘data’? Or should we even be referring to ‘data’ as a ‘dataset’? It is not immediately obvious what the distinction between the two terms is. Is an individual museum object a dataset or is a set of museum objects a dataset in the same way as a set of data points in scientific research can be?
A separate point of discussion is how to distinguish between the physical object, its digitised version, its associated catalogue record and different versions of this record, (as knowledge is accumulated/revised) as this is not currently clear in DataCite. Although a similar situation was mentioned in the sciences with ice-core samples, where different digital datasets continue to be published from the same physical ice-core samples.”
“Availability of digital object identifiers in publications archived by PubMed”, 3rd January 2017. For…
“the period 1966–2015 (50 years). Of the 496,665 articles studied over this period, 201,055 have DOIs (40.48%).”
So just under 60% are without DOIs, and that’s for biomedical content in PubMed — albeit when including thirty years of pre-1995 (pre-mass-Internet) coverage. More recently, for 2015, the study found that 13.5% of new content was still without a DOI.
The DOI-free figures for the humanities will be far higher, according to “Availability of digital object identifiers (DOIs) in Web of Science and Scopus”, February 2016…
“Many journals related to the Natural Sciences and Medicine with considerable impact have no DOI. Arts & Humanities WoS [Web of Science] categories have the highest percentage of documents without DOI.” … “exceeding 50% only since 2013. The observed values for Books and Proceedings are even lower despite the importance of these document types …”
As for DOI availability within articles in repositories, IRUS-UK provides a “DOI Summary” field giving “the numbers and percentages that have DOIs available” in UK repositories, although access to their datasets is controlled. IRUS-UK has no summary infographics relevant to DOI availability that I could find. But it would be interesting to determine what proportion of UK repository free/open journal articles have DOIs.
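If one wanted a rough answer for a small sample, one crude approach would be to look each sampled article title up in the Crossref REST API and see whether a DOI comes back. A minimal sketch of the idea (my own illustration, not anything IRUS-UK offers), with the caveat that a Crossref ‘best match’ is not proof it is actually the same article:

```python
# Rough sketch: estimate DOI coverage for a handful of sampled article titles
# by asking the Crossref REST API for the best bibliographic match.
import requests

def find_doi(title):
    """Return the best-match DOI for a title from Crossref, or None."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return items[0]["DOI"] if items else None

# Titles sampled from a repository's metadata would go here.
sample_titles = [
    "Availability of digital object identifiers in publications archived by PubMed",
]

with_doi = sum(1 for t in sample_titles if find_doi(t))
print(f"{with_doi} of {len(sample_titles)} sampled titles matched a DOI in Crossref")
```

A proper check would also want to compare authors and year, since a title-only lookup will happily return a near-miss.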
Now updated and available as a Microsoft Office Excel .xls file (750kb)…
“Surfmarket [has] made a list of more than 7,400 journals in which […] Dutch universities and academic hospitals can publish in open access for free or with a substantial discount.”
570 of the titles fit the arts & humanities category, and these are all published by a small handful of establishment publishers.
It’s not possible to separate out the list’s eco/nature titles, since the “Natuur” category is too broad. At one end it ranges from New Zealand Journal of Marine and Freshwater Research through to Potato Research, and at the other end goes spinning off into chemistry, maths and physics titles like Polymer Bulletin, Probability Theory and Related Fields, and Progress in Nuclear Energy.
I thought the list might be a useful source of some new URLs for JURN. But there doesn’t yet seem to be any way to filter journals by their “hybrid OA” / “wholly OA” status. Random sampling of the 570 humanities titles suggests most are hybrid, and that as yet they only have a few OA articles in them. Though doubtless that will start to change, once mandates start to operate fully.
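For what it’s worth, a crude keyword filter over the spreadsheet would at least pull the likely eco/nature journals out of the over-broad “Natuur” category. A minimal sketch, with the caveat that the filename and the column names “Titel” and “Categorie” are my guesses at the spreadsheet’s layout and would need checking against the real file:

```python
# Rough sketch: keyword-filter the Surfmarket journal list for eco/nature titles.
# Filename and column names are assumed, not checked against the actual .xls file.
import pandas as pd

df = pd.read_excel("surfmarket_oa_journals.xls")  # the old .xls format needs the xlrd package

natuur = df[df["Categorie"].str.contains("Natuur", case=False, na=False)]
eco_pattern = "ecolog|marine|freshwater|forest|wildlife|conservation|botan|zoolog"
eco_titles = natuur[natuur["Titel"].str.contains(eco_pattern, case=False, na=False)]

print(len(eco_titles), "probable eco/nature journals out of", len(natuur), "'Natuur' titles")
```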
“Are Open Access Monographs Discoverable in Library Catalogs?”, Libraries and the Academy, Volume 17, No. 1, January 2017…
“The analysis indicates that only a small percentage of college and university library catalogs in the United States and Canada consistently enable discovery and access for the test sample.”
“The open access aggregators challenge: how well do they identify free full text?”, Medium article-post, 7th January 2017. Looks at BASE and CORE…
“when OAI-PMH (which is the standard way of harvesting open access repositories) [was established], no provision was made to have a standard way or a mandatory field to indicate if the item is free to access.” [But today] “many have in fact more metadata-only records than full-text records.”
[BASE] “is only able to see 75 free records in National University of Singapore’s IR, 654 free records in Nanyang Technology University’s IR, 143 free records in Singapore Management University’s IR. I did not do a check to see if there were false positives in BASE’s identification of full text but [assuming] they are 100% correct, we see only a full text identification ratio of 0.6%, 3.8% and 2.7% respectively!” […] “the results for CORE are as dismal as BASE.”
See also: “From open access metadata to open access content: two principles for increased visibility of open access content”, conference paper presented at Open Repositories 2013, 8th-12th July 2013, Charlottetown, Canada.
“… only 27.6% of research outputs in repositories are linked to content that can be downloaded by automatic means and analysed (e.g. indexed). […] the median repository will only provide machine readable content for 13% of its deposited resources. [but] it is likely that these statistics are in fact rather optimistic …”
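To make the aggregators’ problem concrete: a harvester gets Dublin Core records back over OAI-PMH, but nothing in them is guaranteed to say ‘here is the free full text’, so about the only robust check is to probe the record’s links and see what actually comes back. A rough sketch of that kind of probe (my own illustration, run against a placeholder repository address, and certainly not how BASE or CORE do it at scale):

```python
# Rough sketch: harvest one page of OAI-PMH Dublin Core records and count how many
# expose at least one link that resolves to an actual PDF. The endpoint is a placeholder.
import requests
import xml.etree.ElementTree as ET

OAI_BASE = "https://repository.example.ac.uk/oai"  # placeholder OAI-PMH endpoint
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def records_with_pdf(base_url):
    xml = requests.get(
        base_url,
        params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
        timeout=30,
    ).text
    root = ET.fromstring(xml)
    hits = total = 0
    for record in root.iter(OAI_NS + "record"):
        total += 1
        links = [e.text for e in record.iter(DC_NS + "identifier")
                 if e.text and e.text.startswith("http")]
        # Dublin Core has no mandatory "this is the free full text" field,
        # so probe each link and look at the Content-Type it returns.
        for url in links:
            try:
                head = requests.head(url, allow_redirects=True, timeout=10)
                if "application/pdf" in head.headers.get("Content-Type", ""):
                    hits += 1
                    break
            except requests.RequestException:
                continue
    return hits, total

hits, total = records_with_pdf(OAI_BASE)
print(f"{hits} of {total} harvested records had a link that resolved to a PDF")
```

Even that over-simplifies, since many repositories only hand out the PDF via an intermediate landing page rather than a direct link.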
“Assigning Creative Commons Licenses to Research Metadata: Issues and Cases”, 19th September 2016…
“From a recent analysis, out of a sample of around 2500 publication repository services in OpenDOAR ([those] supporting the OAI-PMH protocol standard), only 9 expose metadata license information: 3 with CC-0, 2 with CC-BY, and 4 which require a permission for commercial use, 3 with CC-0 and 1 with CC-BY.”
Nine. Not nine percent, just… nine. And one can assume that the other 1,100 repositories in OpenDOAR are even less likely to host CC license information for metadata in some form or other.
Beall’s List of Predatory Publishers 2017, just released.
The 2017 Edge Question responses have just been released. Over 200 of the world’s finest minds answer “What scientific term or concept ought to be more widely known?”. As usual the combined single mega-page weighs in at around the length of two novels, on which the likes of Instapaper will choke. So Kindle owners may want the unabridged, unofficial .mobi ebook conversion.
It seems that JURN’s search results have become even more precise over the last year, if a new report by Searchmetrics is to be believed…
“the study found the URLs for pages that feature in the top 20 search results are about 15% longer on average than in 2015. Searchmetrics said this is likely because Google is better able to identify and display the precise pages that answer the search intention, and these pages are more likely to have longer URLs because they possibly lie buried deeper within websites.”