Laura Himanen and Lizzie Gadd introduce a step-by-step process developed by the INORMS Research Evaluation Working Group for evaluating responsibly.
We’ve got a shed load of principles now for responsible research metrics. We have DORA, the Leiden Manifesto and the Metric Tide. We also have the many bespoke sets of principles being developed by individual organisations. And they’re great. They provide a framework for evaluating responsibly which makes evaluators think, and think again, about their approaches. However, what they don’t provide is a how-to guide. They’re a bit like the Highway Code, which governs the way we approach the roads but doesn’t provide a map to our chosen destination. And this is where practitioners can struggle. Often we’ll see questions on discussion lists from librarians and research managers who’ve finally got their responsible metrics principles approved, and are left asking, “now what?”.
Of course one part of the struggle is actually rolling out the policy internally, and Lizzie has spoken quite a bit about this. However, the other challenge is how you actually go about using metrics responsibly once you have a policy in place. As a conference attendee once said: responsible metrics principles can come across as terrifyingly negative. Don’t do this! Beware of that! What practitioners really need is some solid step-by-step guidance as to how to test their existing evaluation procedures or go about establishing new ones. This is something the INORMS Research Evaluation Working Group has been considering in its work to produce a Responsible Research Evaluation briefing for senior managers. Its discussions have resulted in a model we’ve called ‘SCOPE’ that we hope will help those seeking a process for doing research evaluation responsibly. The five stages are outlined here along with some of the key considerations at each.
- START with what you value
- CONTEXT considerations
- OPTIONS for measuring
- PROBE deeply
- EVALUATE your evaluation
Start with what you value
This is such a critically important first step, and one so rarely considered. Too often, instead of measuring what we value as institutions, we either measure what others value, or measure using only the data that we have readily available. A discussion list member recently asked for advice as their senior leaders had asked them to suggest a single research performance metric as a KPI. However, the leaders hadn’t specified what it was about their research performance they valued and wanted to measure. Were they interested in research income? Publication volume? Research impact? What? Without this information, it’s impossible to select a single research performance metric – even if having just one metric were a sensible thing to do.
Of course, many institutions outsource their values to university rankings or national research assessment exercises, only measuring their own researchers in accordance with the way they themselves are evaluated. This is understandable to a point. We can’t ignore the external evaluation environment. And sometimes, when such external assessments ‘measure what matters’ such as open access to outputs, they can be powerful drivers of positive behaviour. But it is a dangerous long-term strategy to only base your internal evaluation on external drivers. In fact, this could be a slippery slope towards losing your institutional autonomy and character. Someone on Twitter once wrote, “We are evaluated, therefore we are.” To which, another responded, “We are evaluated, therefore we are all the same.”
The other challenge in this space is the all-too-common habit of measuring only according to the data we have. This is called the Streetlight Effect: looking for answers under the easily available source of light, rather than where they are more likely to be found. The increasing availability of bibliographic data and data visualisation tools is making citation-based research evaluation ever more common. Even those who should know better are reaching for answers by churning bibliometrics and working backwards, rather than establishing a well-formed research question and working forwards.
Context considerations
Once you are clear about what it is you value about research, you need to consider the context in which you’re measuring. Too often we hear arguments about which metrics are good and which are bad, but of course the truth is that whether a metric is good or bad depends entirely on what you are measuring (your values) and why and whom you are measuring (your context). Lizzie recently put together a matrix which plots various measurement purposes on the x-axis and various entity sizes on the y-axis. The reasons for measuring range from high-level ‘science of science’ analyses, through measuring to ‘show off’, monitor, compare and incentivise, to measuring to reward. Entities, of course, can range from nations down to individuals, and the disciplinary focus of each entity is another complicating contextual factor. The impact, and therefore risk, of measuring larger entities to understand research trends is very different to the impact of measuring individuals for recruitment or promotion. As such, you should choose your indicators carefully based on these parameters.
We would also issue a word of caution around the increasing trend towards measuring to incentivise behaviour. We know that this is often an effective strategy, in line with Campbell’s Law: the more an indicator is used to drive behaviour, the more it will shape – and distort – the activity it is meant to monitor. In short, what you measure is what you get. However, if we increasingly rely on extrinsic forms of motivation, such as measurement, to incentivise behaviours, we risk infantilising our academics and making the desired behaviour a compliance issue.
Options for measuring
The third stage is to explore all of your options for evaluating your values within the given context. We should remember that these might be quantitative or qualitative. As a general rule of thumb, we suggest the use of quantitative indicators for quantitative things (publications, money, citations, students) and qualitative indicators for qualitative things (excellence, quality, value, impact). And we should be very cautious about using quantitative indicators for qualitative things: citations do not equal quality; a university’s ranking position does not equal excellence. It’s really important to remember that most of what we want to measure, we can’t actually measure. Not precisely. It’s an art, not a science: a matter of judgement.
This sounds like research methods 101. However, it’s surprising how often academic rigour is abandoned when it comes to management practices. There can be a tendency for those who’ve reached the top of one discipline to assume a level of competence in all others. This is not always the case. Research evaluation is a specialism in its own right, and it would be sensible either to develop or to employ such specialists when embarking on evaluation exercises.
The other consideration at this stage is to engage with those you are seeking to evaluate. Do your evaluations with people, not to people. CWTS Leiden have developed a fantastic academic assessment approach called Evaluative Inquiry which seeks to partner with those under evaluation to design and interpret the assessment. But whether you follow a formal co-design method or not, getting the input of those whose work you are evaluating will always result in a better evaluation approach, more meaningful findings and, ultimately, better outcomes.
Probe deeply
Once you’ve developed your evaluation approach in line with your values, context and options, the fourth stage is to probe deeply. We think there are five key questions to be asked at this point:
- Who does this discriminate against?
We know that there are many groups that can fall foul of poor evaluation practices: early-career researchers, women, those in non-journal-based disciplines, and researchers who publish in languages other than English. Asking this question can help tease out where an alternative approach might be taken to ensure everyone has a fair chance of success.
- How might this be gamed?
We know that where there’s a prize there’s a game. Stress testing your evaluation approach can reveal where those games might be played, and help you design them out.
- What might the unintended consequences be?
At the recent launch of the Research on Research Institute, Dawn Duhaney presented a useful slide highlighting how a system might be deliberately or accidentally used or misused. Whilst gaming might fall into the bottom-right quadrant (‘Abuse cases’), there are two other quadrants on the left, ‘Stress cases’ and ‘Misuse cases’, which reflect the unintended consequences of introducing a new approach. We need to think about the short-, medium- and long-term, local and systemic effects of the evaluation approach we plan to introduce. And we need to think them through every time we evaluate, even if we’ve taken the same approach before.
- Does the cost of measuring outweigh the benefit?
There are opportunity costs to research evaluation. Every penny we spend evaluating our research is a penny we can’t spend on something else, e.g., actually doing research. So it is important that we always ask whether the resource we are spending on evaluating research is netting that same amount of benefit. If the answer is yes, that’s fine; if not, we might need to rethink.
- Does measuring research actually make the research any better?
This is related to the last point, but is an important question to tease out in its own right: are we actually improving our research (or our research environment) by measuring it? There is an expression, ‘you don’t fatten a pig by weighing it’, which highlights the oft-forgotten truth that no matter how much you measure your research, measuring won’t make it any better. That opportunity has passed. And whilst measurement might motivate researchers to adopt better strategies (e.g., publishing open access), the long-term effects of over-measurement are almost always negative.
Evaluate your evaluation
This last stage is an important review stage. We should review not only the outcomes of our evaluation, and how they might loop back round into stage one (what have we learned? Has this affected our research questions?), but the evaluation approach itself. The range of data sources and indicators available to practitioners is constantly changing, and institutional missions and strategies are also subject to change. Just because an evaluation approach worked previously does not mean it will work forever. Building in a regular review of our evaluation approaches, and, critically, doing so with those whose work we are seeking to evaluate, is an essential part of the evaluation process.
So that’s SCOPE! This model, as with all our work, is open to feedback to ensure it best serves the international communities we are a part of. Please join the conversation on the INORMS Research Evaluation Working Group discussion list, or contact the authors directly with your thoughts. We’d love to hear from you.
Laura Himanen is a Research Specialist working at the Research and Innovation Services at Tampere University, Finland. To date, she’s been the project manager for three research assessment exercises, she chairs a national expert group in Culture for Open Scholarship and leads a working group on responsible metrics, with the aim of defining and promoting the responsible use of research metrics in Finland.
Elizabeth Gadd is the Research Policy Manager (Publications) at Loughborough University. She is the chair of the Lis-Bibliometrics Forum and co-Champions the ARMA Research Evaluation Special Interest Group. She also chairs the INORMS International Research Evaluation Working Group.
Unless it states otherwise, the content of The Bibliomagician is licensed under a Creative Commons Attribution 4.0 International License.
23 Replies to “Introducing SCOPE – a process for evaluating responsibly”
Interesting post! However, I have two concerns:
(1) This post seems to be based on the implicit idea that evaluation involves measurement. I don’t think this needs to be the case. I would argue that evaluation involves collecting relevant information and presenting and summarizing this information in useful ways. Making measurements is one particular way in which this can be done, but there are many other ways as well (e.g., creating visualizations, presenting lists of relevant information, writing narratives, etc.). Hence, I would argue that it is better to start from a broader perspective on evaluation.
(2) “As a general rule of thumb, we suggest the use of quantitative indicators for quantitative things: publications, money, citations and students, and qualitative indicators for qualitative things: excellence, quality, value, impact.”: I don’t think this rule of thumb will work. Ultimately, things that really matter are always of a qualitative nature, I believe. Quantitative things just serve as convenient placeholders for the qualitative things that we (should) really care about. Also, we don’t need indicators for quantitative things, since we can simply count them. Let’s acknowledge that the things we really care about are of a qualitative nature and usually cannot even be defined in a precise way. On a case-by-case basis, we should then ask ourselves whether quantitative representations are helpful in getting a better understanding of the qualitative things that really matter. If the answer is negative, we should consider other ways of collecting, presenting, and summarizing information relevant for understanding the things we care about.
Thanks Ludo! Interesting observations.
With regards to your first point, I wouldn’t say this process was limited to measurement applications, however, I accept that many of our examples focus on measurement, simply because this is where we see most of the problems with research evaluation in our settings. We will bear this in mind for future iterations.
With regards to your second point, I think I would disagree that the things we value are always qualitative. If you are seeking to establish your research group as a centre of excellence, developing a critical mass (volume of publications and income) may be valuable to you. You can ‘count’ these things using a range of metrics/indicators (raw counts, count per FTE, field-weighted indicators) but you would need to make a choice of indicator. Saying we don’t need indicators for quantitative things, since we can simply count them, doesn’t quite cover the decision-making involved? But perhaps you are making a semantic point: that the word ‘indicator’ should only be used when a number acts as a proxy for something qualitative? We accept that quantitative measures can sometimes be helpful indicators of qualitative things (hence calling this a ‘general’ rule of thumb) but doing so is highly problematic when the indicator becomes the ONLY proxy for that quality.
I think the second point is indeed largely semantic. Regarding the first point, you could consider replacing ‘OPTIONS for measuring’ with ‘OPTIONS for indicators’. Measurement is usually understood as a quantitative activity, while indicators could be either quantitative or qualitative. Therefore, I think this better covers what you intend to say.
The OPTIONS stage is trying to get evaluators to think beyond indicators to all forms of assessment that might be appropriate given your values and the context, peer review included. So perhaps “OPTIONS for evaluating” would be better, given your first point about measurement?
Yes, I agree, Lizzie. ‘OPTIONS for evaluating’ may be even better.
We just had a discussion about this at my workplace (library) and we thought it would be useful to see some case studies where this model is applied. Perhaps a future blog post. Thanks for considering!