Guest Post: Understanding SciVal’s calculation of field-weighted percentile indicators

Marianne Gauffriau (Copenhagen University Library, Denmark) and Yrjö Leino (CSC – IT Center for Science, Finland) explore SciVal’s calculation of field-weighted percentile indicators and the lesser-known implications of this method.

Introduction

This blog post is inspired by a question frequently posed to Marianne in her role as Coordinator of the Bibliometric Service at Copenhagen University Library, Denmark.

Question: “How does SciVal calculate Outputs in Top 10 % Citation Percentile, field-weighted (FW)?”

Answer: “Create a publication set from a specific year, of a specific document type, and from a specific SciVal subject area. Order the publications according to citation counts. Now, select the top 10 % most cited publications. Replicate for all other combinations of year, document type, and SciVal subject area that are relevant for your publications. Calculate the share of your publications within these top 10 % publication sets.”

Unexpected response: “Yes, I understand that, but why is my department’s score so low?”

The response prompted Marianne to reassess her understanding of the indicator. In doing so, Marianne, in collaboration with Yrjö, became aware of a critical nuance in the SciVal calculation that other, otherwise equivalent calculations of field-weighted percentile indicators do not share. The authors feel that this variation is neither well known in the bibliometric community nor reported in the bibliometric research literature.

Note: in this blog post we do not discuss publications with multiple subject areas, ranking ties, and publications shared between percentiles. These issues are discussed in the research literature. See also the blog post about stability of SciVal’s Outputs in Top Citation Percentiles (FW) indicators (Rowlands, 2019).

Traditional field-weighted percentile (top 10 %) indicator calculation

We illustrate the traditional calculation of field-weighted percentile indicators with a simple example below.

Dataset

  • Two subject areas, X and Y
  • 10 publications with a total of 65 citations in each subject area
  • Subject area X has an almost even distribution of citations
  • Subject area Y has a very skewed distribution of citations

Calculation procedure

  1. Rank the publications in Subject area X according to their citation counts
  2. Select top 10 % of the publications
  3. Repeat steps 1 and 2 for Subject area Y

The results of steps 1-3 can be visualized as below:

Table 1: Ranking of publications by citation count and selection of Top 10% for two Subject areas.

Publications within each subject area are first ordered according to their citation counts, and then 10 % of publications in each subject area are identified as belonging to the top class. Thus, each subject area should have 10 % of the area’s publications in the top 10 %. In our example, with just 10 publications in each subject area, 10 % is equal to one publication in each subject area, or a total of two publications (circled in Table 1).
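The traditional procedure can be sketched in a few lines of Python. The citation counts are hypothetical: they are not the values from Table 1, only assumed numbers that match the stated properties of the dataset (10 publications and 65 citations per subject area, X almost even, Y very skewed). The function name is ours, and ties are ignored, as in the rest of this post.

```python
# Hypothetical citation counts (assumed, not the values from Table 1):
# 10 publications and 65 citations in each subject area.
CITATIONS = {
    "X": [9, 8, 7, 7, 7, 6, 6, 6, 5, 4],    # almost even distribution
    "Y": [30, 20, 5, 3, 2, 2, 1, 1, 1, 0],  # very skewed distribution
}


def top_share_per_field(citations, share=0.10):
    """Traditional approach: rank and cut within each subject area separately."""
    top = {}
    for field, counts in citations.items():
        ranked = sorted(counts, reverse=True)  # step 1: rank by citation count
        n_top = round(len(ranked) * share)     # step 2: 10 % of 10 publications = 1
        top[field] = ranked[:n_top]
    return top


traditional_top = top_share_per_field(CITATIONS)
print(traditional_top)  # {'X': [9], 'Y': [30]}
```

Whatever the shape of the two distributions, each subject area contributes exactly one publication to the top 10 %.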

The shape of the citation distributions in the subject areas does not affect the calculation. However, as shown below, the citation distributions do greatly impact SciVal’s scores of Outputs in Top 10 % Citation Percentile (FW).

SciVal’s Outputs in Top Citation Percentiles, field-weighted (FW), indicator calculation

SciVal’s Research Metrics Guidebook describes their indicator Outputs in Top Citation Percentiles as:

“The entire Scopus database, or “World”, is the data universe used to generate this metric:

  • The citation counts that represent the thresholds of the 1%, 5%, 10% and 25% most-cited papers in Scopus per Publication Year are calculated. Sometimes the same number of citations received by the publication at, say, the 10% boundary has been received by more than 10% of publications; in this case, all of the publications that have received this number of citations are counted within the top 10% of the Scopus data universe, even though that represents more than 10% by volume.
  • SciVal uses these citation thresholds to calculate the number of an entity’s publications that fall within each percentile range.

[…]

When field weighting Outputs in Top Citation Percentiles, the document citation ratio is used instead of citations to compute values for each percentile.” (Elsevier Research Intelligence, 2019, p. 48)

The description indicates that SciVal uses citation ratios instead of raw citation counts and merges subject areas (“the entire Scopus database”) before extracting the top 10 % of publications. These two operations are not mathematically equivalent to the traditional procedure for calculating field-weighted percentile indicators (see the example above and the bibliometric research literature, for example, Bornmann & Williams, 2020; Waltman & van Eck, 2019). Nor have they been commented on in the bibliometric research literature or by users (Rowlands, 2020). The discrepancy has the potential to cause misinterpretation of scores obtained with SciVal’s Outputs in Top Citation Percentiles (FW).

SciVal’s calculation (top 10 %)

We reuse the dataset with Subject areas X and Y.

  1. Calculate the average number of citations per publication in Subject area X
  2. Calculate the citation ratio for each publication in Subject area X, i.e. the publication’s number of citations divided by the average number of citations in the subject area
  3. Repeat steps 1 and 2 for Subject area Y
  4. Merge Subject areas X and Y
  5. Rank the publications according to their citation ratios
  6. Select top 10 % of the publications

Steps 1-3 produce the following results:

Table 2: Calculation of citation count and ratio for publications in two Subject areas.

Steps 4-5, merge the subject areas and rank all publications according to their citation ratios:

Table 3: Publication sets merged and ranked by Citation Ratio

Step 6, select the publications in top 10 % of our results. In this example case, they both belong to subject area Y:

Figure 1: SciVal method of ranked citation ratios across subject areas

The dataset has 20 publications, and the two publications with the highest citation ratios (circled in Figure 1) would be identified by Outputs in Top 10 % Citation Percentile (FW). Both publications are from Subject area Y, meaning that Subject area Y has 20 % of its publications in the top 10 % and Subject area X has 0 %.
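The six steps above can also be sketched in Python. This is a simplified illustration of the procedure as described in this post, not SciVal’s actual implementation. The citation counts are hypothetical (assumed values with 10 publications and 65 citations per subject area, not those shown in Tables 2 and 3), and the function names are ours.

```python
# Hypothetical citation counts (assumed, not the values from Tables 2 and 3).
CITATIONS = {
    "X": [9, 8, 7, 7, 7, 6, 6, 6, 5, 4],    # almost even distribution
    "Y": [30, 20, 5, 3, 2, 2, 1, 1, 1, 0],  # very skewed distribution
}


def citation_ratios(citations):
    """Steps 1-3: divide each citation count by its subject area's average."""
    rows = []
    for field, counts in citations.items():
        average = sum(counts) / len(counts)  # 6.5 in both subject areas
        rows.extend((field, c, c / average) for c in counts)
    return rows


def merged_top(citations, share=0.10):
    """Steps 4-6: merge the subject areas, rank by ratio, keep the top share."""
    rows = sorted(citation_ratios(citations), key=lambda row: row[2], reverse=True)
    n_top = round(len(rows) * share)  # 10 % of 20 publications = 2
    return rows[:n_top]


top = merged_top(CITATIONS)
for field in CITATIONS:
    in_top = sum(1 for row in top if row[0] == field)
    print(field, f"{in_top / len(CITATIONS[field]):.0%}")
# X 0%
# Y 20%
```

Even though both subject areas have the same average of 6.5 citations per publication, the merged ranking picks both top publications from the skewed subject area Y.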

In the traditional calculation, the subject areas would not be merged as they are in the SciVal method displayed in Figure 1. Each subject area would have 10 % of its publications in the top 10 %, i.e. one publication from each subject area, or a total of two publications.

In bibliometrics, we are accustomed to field-weighted indicators scaled so that the “World average”, i.e., the average over the whole dataset, equals 1.0, or with percentile indicators, the given percentage, for example, 10 %. Moreover, this should hold for each subject area separately, not just on a merged, overall level. This scaling makes the interpretation of the scores straightforward. Thus, we tend to assume that if a given research group has 12 % of its publications counted among the top 10 % percentile, the group’s performance in its chosen subject area is above the world average.

However, when mixing subject areas with different citation distributions and using SciVal’s calculation, as illustrated in Figure 1, this interpretation no longer holds. In a large dataset, e.g. at country level, with all subject areas included, the differences between SciVal’s and traditional field-weighted percentile indicator scores are mostly insignificant. However, the smaller the focal unit and the narrower the spectrum of research areas, e.g. that of even one large research department, the more significant the difference can be.

In conclusion, SciVal’s Outputs in Top 10 % Citation Percentile (FW) does not measure scientific impact alone; it is also affected by the chosen fields of specialization.

Working with Outputs in Top Citation Percentiles, field-weighted (FW), in SciVal

Our example above, where subject area X scores 0 % and subject area Y scores 20 % in Outputs in Top 10 % Citation Percentile (FW), is not just a result of the small size of the dataset and extreme citation distributions. Similar results apply for full-scale empirical datasets due to subject areas with different citation distributions.

Take a subject area with high citation activity and a small share of non-cited papers, for example, Behavioral Neuroscience in SciVal. At the opposite end are subject areas with low citation averages due to a very large share of publications with low or zero citations, for example, Visual Arts and Performing Arts in SciVal.

With SciVal’s Outputs in Top Citation Percentiles (FW) indicators, publications are ordered according to their citation ratios, not citation counts. Because of the steep slope of the citation distribution in the latter type of subject area (for example, Visual Arts and Performing Arts), the most highly cited publications from these subject areas get higher citation ratios, and thus have an advantage in Outputs in Top 1 % Citation Percentile (FW). On the other hand, because the majority of publications after the highest-cited percentiles in these areas have very low raw citation counts, they will be underrepresented in Outputs in Top 25 % Citation Percentile (FW).
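The same effect can be reproduced with a toy calculation. The sketch below uses hypothetical citation counts (assumed values, 10 publications and 65 citations per subject area, with X almost even and Y very skewed) and compares a narrow and a broad cut-off on the merged, ratio-ranked set. Because the toy dataset has only 20 publications, the cut-offs are 10 % and 50 % rather than 1 % and 25 %; the 50 % cut-off is chosen so that no tie falls on the boundary. The function name is ours.

```python
# Hypothetical citation counts (assumed values, not empirical SciVal data).
CITATIONS = {
    "X": [9, 8, 7, 7, 7, 6, 6, 6, 5, 4],    # almost even distribution
    "Y": [30, 20, 5, 3, 2, 2, 1, 1, 1, 0],  # very skewed distribution
}


def share_in_merged_top(citations, share):
    """Share of each subject area's publications in the merged top `share`."""
    rows = []
    for field, counts in citations.items():
        average = sum(counts) / len(counts)
        rows.extend((c / average, field) for c in counts)  # citation ratios
    rows.sort(key=lambda row: row[0], reverse=True)        # merged ranking
    top = rows[: round(len(rows) * share)]
    return {
        field: sum(1 for _, f in top if f == field) / len(counts)
        for field, counts in citations.items()
    }


narrow = share_in_merged_top(CITATIONS, 0.10)
broad = share_in_merged_top(CITATIONS, 0.50)
print(narrow)  # {'X': 0.0, 'Y': 0.2}
print(broad)   # {'X': 0.8, 'Y': 0.2}
```

In this toy case the skewed subject area Y is overrepresented at the narrow cut-off (20 % instead of 10 %) and underrepresented at the broad one (20 % instead of 50 %), while the evenly distributed subject area X shows the opposite pattern.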

The table below shows scores for articles from 2016. When looking at one SciVal subject area, one publication type, and one publication year, we expect the scores to be close to 1 % and 25 % for the two indicators, respectively. This is not the case. Behavioral Neuroscience’s scores are far below 1 % and above 25 %, respectively. The opposite is the case for Visual Arts and Performing Arts.

Table 4: SciVal Outputs in Top 1% and 25% Citation Percentile (FW) scores for Behavioral Neuroscience and Visual Arts and Performing Arts subject areas in 2016

As we have shown above, SciVal’s calculation method should be seriously considered if the Outputs in Top Citation Percentiles (FW) are applied to datasets of different subject areas, i.e. different citation distributions. The indicators should be flagged as not necessarily making the fields comparable and being different to the percentile indicators described in the bibliometric research literature. We feel that SciVal’s Outputs in Top Citation Percentiles (FW) may not be in agreement with how users interpret field-weighted percentile indicators in general.

Conclusion

Marianne got a new question about the SciVal indicator: “My department’s score is rather low. Can we do something to increase it in the next evaluation?”

We do not argue that there are right or wrong indicators, only indicators with different properties. Indicators affect research agendas, as demonstrated by the original question posed to Marianne and discussed in the bibliometrics community (Price, 2020). We stress the importance of external expert assessment of indicators in tools like SciVal and others. Furthermore, commercial or freely available products should reference the bibliometric research literature when appropriate in their support documentation.

Acknowledgements

We wish to thank Ludo Waltman for his expert advice on percentile indicators.

References

Bornmann, L., & Williams, R. (2020). An evaluation of percentile measures of citation impact, and a proposal for making them better. Scientometrics, 124(2), 1457–1478. https://doi.org/10.1007/s11192-020-03512-7

Elsevier Research Intelligence. (2019). Research Metrics Guidebook (p. 68). https://www.elsevier.com/__data/assets/pdf_file/0020/53327/ELSV-13013-Elsevier-Research-Metrics-Book-r12-WEB.pdf

Price, R. (2020). Truth behind the numbers? The Bibliomagician. https://thebibliomagician.wordpress.com/2020/06/09/the-truth-behind-the-numbers/

Rowlands, I. (2020). Using SciVal Responsibly: A guide to interpretation and good practice. https://repository.lboro.ac.uk/articles/Using_SciVal_responsibly_a_guide_to_interpretation_and_good_practice/11812044

Rowlands, I. (2019). Six weeks is a long time in bibliometrics: Stability and Field-Weighted Citation Percentile. Bibliomagician. https://thebibliomagician.wordpress.com/2019/11/21/six-weeks-is-a-long-time-in-bibliometrics-stability-and-field-weighted-citation-percentile/

Waltman, L., & van Eck, N. J. (2019). Field Normalization of Scientometric Indicators. In W. Glänzel, H. F. Moed, U. Schmoch, & M. Thelwall (Eds.), Springer Handbook of Science and Technology Indicators (pp. 281–300). Springer International Publishing. https://doi.org/10.1007/978-3-030-02511-3_11


Marianne Gauffriau is Coordinator of the Bibliometric Service at Copenhagen University Library, Denmark. She is an advocate for open science, especially responsible metrics. Marianne is treasurer of Danish Association for Research Managers and Administrators (DARMA) and co-founder of the Danish Research Indicator Network (FIN).

https://orcid.org/0000-0001-7639-7719

Yrjö Leino has been working with bibliometric data sets at CSC – IT Center for Science (Finnish national supercomputing center) for the last decade.

Before that, he worked at CSC as a specialist in numerical mathematics and computational methods.

https://orcid.org/0000-0001-8114-9517


Unless it states otherwise, the content of the Bibliomagician is licensed under a Creative Commons Attribution 4.0 International License.
