Welcome to the first Altmetrics Research Roundup!
In these regular posts, I’ll be sharing and summarizing the altmetrics research that’s most relevant to practicing bibliometricians’ daily work. In other words, I’ll stay on top of the literature so you don’t have to.
Our first post will cover an important but relatively unknown issue in altmetrics: how altmetrics services can affect altmetrics research. Differences in how services like Altmetric and PlumX collect data can have important effects upon altmetrics research, because most altmetrics studies are based upon data from these two companies.
Full disclosure: I’m writing this post as an Altmetric employee, specifically as one who leads efforts to make our data freely available to researchers who want to study it. My work has given me a unique vantage point onto the data challenges that altmetrics researchers face. Based on my experience and a few independent studies, here I share some common challenges to altmetrics research, and offer important questions to keep in mind when reviewing altmetrics research that might be relevant to your day-to-day work.
Altmetrics and scope
As a bibliometrician, you’re well aware of how differences in scope mean that some citation databases offer seemingly “better” bibliometrics than others. In the sciences, Web of Science is often favoured because of its coverage of the STEM disciplines: it tends to count citations from the journals seen as essential for scientists. For similar reasons, Scopus is often preferred in the humanities and social sciences.
But when one compares apples to apples, looking at citation counts from journals that appear in both databases, one usually finds similar metrics. That’s because the mechanics of how citations are picked up are pretty much the same: each database looks for pointers to other articles (e.g. DOIs, author lists) and increments the cited article’s citation count accordingly.
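In sketch form, the shared mechanics look something like the following. This is a deliberately simplified illustration (real citation databases use far richer matching heuristics than DOI lookup, and all identifiers below are invented), but it shows why two databases applying the same resolution logic to the same journals converge on similar counts:

```python
# Simplified citation matching: resolve each reference to a known record
# by DOI, then tally citations per article. Any reference that doesn't
# resolve to an indexed article is ignored.
from collections import Counter

# Hypothetical index of articles the database covers.
known_articles = {"10.1000/x": "Article X", "10.1000/y": "Article Y"}

# Hypothetical references extracted from citing papers.
references_found = ["10.1000/x", "10.1000/y", "10.1000/x", "10.9999/unknown"]

citation_counts = Counter(doi for doi in references_found
                          if doi in known_articles)

print(citation_counts["10.1000/x"], citation_counts["10.1000/y"])  # 2 1
```

Because both databases resolve references against their index in essentially the same way, the counts for a journal indexed by both will largely agree.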
For altmetrics databases, scope is an entirely different matter. Surprisingly, altmetrics services can look at the same sources but come away with different metrics! That’s down to the mechanics of how mentions from the same source are counted. It turns out that a Mendeley bookmark can be counted in more than one way!
Here’s another example: Facebook metrics can differ greatly for exactly the same article, because one altmetrics provider might collect and tally only full-text mentions of that article scraped from publicly facing profiles, whereas another might collect only the aggregate metrics available via the Facebook API (the number of shares, comments, and likes for the article, including those from private profiles).
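To make that difference concrete, here is a minimal sketch of the two counting strategies applied to the same set of Facebook events for one article. All data and field names are hypothetical and are not any provider’s actual schema:

```python
# Hypothetical Facebook events for one article. Each event records where it
# appeared (public vs. private profile) and whether the full text of the
# post is available for scraping.
events = [
    {"type": "post", "visibility": "public", "has_text": True},
    {"type": "post", "visibility": "private", "has_text": False},
    {"type": "share", "visibility": "private", "has_text": False},
    {"type": "like", "visibility": "public", "has_text": False},
]

def count_scraped_mentions(events):
    """Strategy A: count only full-text mentions scraped from public profiles."""
    return sum(1 for e in events
               if e["visibility"] == "public" and e["has_text"])

def count_api_engagement(events):
    """Strategy B: tally shares, comments, and likes reported in aggregate,
    regardless of profile visibility."""
    return sum(1 for e in events if e["type"] in ("post", "share", "like"))

print(count_scraped_mentions(events))  # 1
print(count_api_engagement(events))    # 4
```

Same article, same source, same underlying activity, yet the two strategies report a fourfold difference.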
In fact, this is exactly what Zahedi, Fenner & Costas (2014) found for a small sample of articles from PLOS ONE. At first glance, PLOS’s article-level metrics (ALM) app, Altmetric, and Mendeley provided confusingly different metrics for the same set of articles.
But as the authors explain, these differences were due to how each application tracks metrics. For example, Mendeley readership for the sample would obviously be highest when querying Mendeley, the source of the data itself. Altmetric’s data collection policies dictate that Mendeley readership isn’t recorded when research hasn’t also been mentioned on other sites that we track. This results in lower Mendeley coverage for Altmetric. PLOS’s coverage differs from both Altmetric and Mendeley, likely due to the unique ID that the PLOS ALM software used to query the Mendeley app.
These differences in metrics for the same articles in the same attention source have an obvious bearing on how altmetrics research based upon these metrics should be interpreted!
Scientometrics researcher Stefanie Haustein digs into the consistency question further in her 2016 article, “Grand challenges in altmetrics: heterogeneity, data quality and dependencies” (OA preprint available here):
Recording tweets continuously and in real time, Altmetric shows the highest coverage of papers on Twitter, while [PLOS ALM software] Lagotto captures only a fraction (Zahedi et al., 2015). The extent to which Altmetric’s record of tweets to scientific papers is accurate or complete is unknown. Replication would only be possible through a direct verification against Twitter’s data, which is precluded by the costliness of access. The replicability of altmetrics is further impeded by the dynamic nature of events. While citations cannot decrease because they cannot be deleted, Mendeley readership counts can change over time (Bar-Ilan, 2014).
In other words, how metrics are retrieved and the stability of those metrics make direct comparison of altmetric data difficult, and replication of the data similarly hard. Those challenges have obvious implications for altmetrics research, most of which is based upon data provided by altmetrics services like Altmetric and Plum Analytics.
How to interpret altmetrics research
Given the challenges listed above, how should we interpret the altmetrics studies we read? Beyond the common markers for reliability (e.g. sample size, data collection parameters, and so on), there are a number of other things to keep in mind when interpreting altmetrics research.
1) How are the metrics in question collected from a source?
- Does the source of the altmetrics data–whether collected by the researchers themselves, or through a third party service like Altmetric Explorer, PlumX, or CrossRef Event Data–only provide metrics based upon publicly accessible, verifiable mentions? Or do they collect any metric that’s available?
- Does the altmetrics service have an exclusive vendor relationship that prevents other services from indexing that metrics source?
- Has the metrics source been manually curated (as is often the case with altmetrics services that include citations to research from blogs or policy documents)?
These questions of scope have a huge bearing upon the data that ends up in altmetrics services, and should be accounted for in any research based upon an altmetrics service’s data. They are also why “coverage” studies like this one should be read and interpreted carefully–often, head-to-head comparisons of altmetrics services’ data aren’t actually comparing the exact same metrics.
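As a sketch of how such comparisons go wrong, consider two hypothetical provider records for the same article (all field names and numbers are invented for illustration; real services use different schemas). Naively summing everything mixes metrics that count different underlying events, while a careful comparison restricts itself to fields both providers define the same way:

```python
# Hypothetical records for one DOI from two altmetrics providers.
provider_a = {"tweets": 40, "facebook_public_posts": 3, "mendeley_readers": 25}
provider_b = {"tweets": 38, "facebook_api_engagement": 120, "mendeley_readers": 31}

def naive_total(record):
    """Sums everything, even though the Facebook fields count different events."""
    return sum(record.values())

def comparable_fields(a, b):
    """Restrict comparison to metrics both providers define identically."""
    shared = sorted(set(a) & set(b))
    return {k: (a[k], b[k]) for k in shared}

print(naive_total(provider_a), naive_total(provider_b))  # 68 189
print(comparable_fields(provider_a, provider_b))
# {'mendeley_readers': (25, 31), 'tweets': (40, 38)}
```

The naive totals suggest one provider vastly “outperforms” the other; the like-for-like fields tell a much more modest story.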
2) What are the coverage dates for the source/provider?
All of the major altmetrics services launched in 2011, meaning that most altmetrics data collected are for mentions made after 2011. This has implications for altmetrics research: a number of potentially confounding factors related to output age (e.g. collaboration patterns) undoubtedly influence the sharing of newer studies, making it difficult to confidently claim that all research–regardless of publication date–is being shared online in similar ways.
There are exceptions, of course. For example, it’s possible for PubPeer users to review a paper from the 1960s, or for policy documents to cite a canonical study from the 19th century. But by and large, most altmetrics activity concerns papers published in 2011 and beyond. It’s also possible for altmetrics services to stop tracking a source. For example, Altmetric stopped tracking research shared on LinkedIn in 2014, due to changes in LinkedIn’s API. This has limited the ability of research done on Altmetric data to interrogate LinkedIn mentions.
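One defensive move when analyzing data across services is to restrict the analysis to the window in which every service actually tracked a given source. A minimal sketch follows; the dates and service names are illustrative only, and real coverage windows must be confirmed with each provider:

```python
# Illustrative coverage windows: provider -> source -> (start, end).
# An end of None means the source is still tracked.
from datetime import date

coverage = {
    "service_x": {"linkedin": (date(2011, 7, 1), date(2014, 11, 30))},
    "service_y": {"linkedin": (date(2012, 1, 1), None)},
}

def shared_window(source, providers, today=date(2017, 1, 1)):
    """Intersect per-provider coverage windows for one source."""
    starts, ends = [], []
    for p in providers:
        start, end = coverage[p][source]
        starts.append(start)
        ends.append(end or today)
    start, end = max(starts), min(ends)
    return (start, end) if start <= end else None

print(shared_window("linkedin", ["service_x", "service_y"]))
# (datetime.date(2012, 1, 1), datetime.date(2014, 11, 30))
```

Mentions outside the intersected window can’t be compared fairly, because at least one service simply wasn’t looking.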
3) Are studies limited only to those publications with a DOI?
Many altmetrics studies base their findings only on publications that have DOIs. This greatly limits our understanding of altmetrics for smaller publishers and humanities research, which tend to publish without persistent identifiers at higher rates than STEM research does.
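The size of this exclusion is easy to measure in your own sample before drawing conclusions. A toy sketch (the records below are invented for illustration):

```python
# Invented sample records: some outputs, especially from smaller humanities
# publishers, lack a DOI entirely.
records = [
    {"title": "STEM article A", "field": "biology", "doi": "10.1234/a"},
    {"title": "STEM article B", "field": "physics", "doi": "10.1234/b"},
    {"title": "HSS monograph chapter", "field": "history", "doi": None},
    {"title": "Small-press essay", "field": "literature", "doi": None},
]

with_doi = [r for r in records if r["doi"]]
excluded = [r for r in records if not r["doi"]]

# Report how much of the sample a DOI-only study silently drops, and from where.
print(f"kept {len(with_doi)}/{len(records)}; "
      f"excluded fields: {sorted({r['field'] for r in excluded})}")
# kept 2/4; excluded fields: ['history', 'literature']
```

Reporting this drop rate alongside the findings makes the DOI-only limitation visible instead of silent.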
In summary, the scope of the altmetrics services upon which a majority of current research is based has important impacts upon what we know about altmetrics. All interpretations of altmetrics research should of course be done with care, but where third-party altmetrics data is being used, one should keep the above considerations in mind.
Have tips for interpreting altmetrics research that haven’t been covered here? Share them in the comments below!
Stacy Konkiel is the Director of Research & Education at Altmetric, a data science company that uncovers the attention that research receives online. Her research interests include incentives systems in academia and informetrics, and Stacy has written and presented widely about altmetrics, Open Science, and library services. She also currently chairs the Innovation committee of Library Pipeline and is building the Metrics Toolkit. Previously, Stacy worked with teams at Impactstory, Indiana University & PLOS. You can follow Stacy on Twitter at @skonkiel.
Disclaimer: Author’s views are her own