Lizzie Gadd asks whether more could be done to build confidence in the REF peer review process.
Peer review is still held up as the gold standard of research evaluation, and, for most, infinitely preferable to metrics. As such, the REF prides itself on being “a process of expert peer review”, as its guiding principle reads. But the status of peer review has taken rather a battering in recent years. There are concerns about the large number of retractions of peer-reviewed journal papers, regular reports of conscious and unconscious bias amongst peer reviewers, and worries about the increase in soundness-only peer review. And if you talk to any UK-based academic you’ll hear deep suspicions about whether REF peer review really offers the gold standard they might expect.
I’ve spent the last eight months of my life combing through REF guidance, and more recently, drafting Loughborough University’s REF Code of Practice. In so doing, three questions have formed in my mind about REF peer review and whether more could be done to build confidence in this process.
Is it equitable?
There are three principles underpinning the REF: equity, equality and transparency. All very laudable. And universities have been given very strict instructions about ensuring their processes for selecting outputs adhere to these criteria, and documenting them in their Code of Practice. Not only this, but universities have to commit to running Equality Impact Assessments (EIAs) on their processes for selecting outputs to double-check that they are in fact fair, and don’t treat any particular group unequally. Again, absolutely sensible. However, when it comes to the scoring of those outputs by REF panel members, there is currently no parallel commitment from the REF to run any EIAs.
Considering that REF peer review is not double-blind, and that a number of studies show that peer review favours men, big names and famous institutions, this is an uncomfortable omission. Even more so when we recall Alan Dix’s post-REF 2014 analysis comparing peer review scores with citations in the Computing panel, in which he found significant biases against women and against those in applied rather than pure disciplines. It is somewhat reassuring that all panel members are undergoing unconscious bias training (REF guidance s.45b), but there is no detail as to the length, timing or content of that training – detail that all universities have been requested to provide in their Codes of Practice.
Now I know the REF will have a much harder job running EIAs than universities due to the lack of available data on protected characteristics. However, there is a wide range of existing methodologies for at least a gender-based analysis which don’t require a separate dataset, and a discipline-based one should be straightforward. As we’re supplying ECR data to the REF, this analysis should be possible too with a bit of data carpentry – indeed something similar was done post-REF 2014. Perhaps for REF 2027 a mechanism for building in other forms of EIA should also be thought through at an earlier stage? Even better, perhaps in future we could make use of all the green open access manuscripts available in Institutional Repositories and make the process double-blind. This might be a challenge at the current volumes of outputs submitted, but if some form of random sample approach were taken, it could work, and it should go a very long way towards building confidence in the REF peer review process.
Is it transparent?
Another of the REF’s underlying principles is transparency. It’s been a bit of a shock to me as a REF Virgin just how transparent institutions are required to be in their Codes of Practice. Who’s on that committee? How were they chosen? How were the people who chose them chosen? How often do you meet? Why do you meet? How do you get something on the agenda? I feel like our Code of Practice is the driest document on earth. But it’s transparent. Boy is it transparent. So it would seem logical that the REF holds itself to the same standards of transparency. And I think this is generally true. The REF is magnificently consultative. You’re consulted on the nature of consultation, the consultation itself and the outcomes of the consultation. And we saw this with the Metric Tide report. It ran to two volumes leaving absolutely no stone unturned on the question of whether metrics should play a part in research assessment. But when it comes to questions around the use of peer review in research assessment, it feels like there is far less scrutiny. We were told that metrics failed because they couldn’t replicate peer review, but how successful is peer review? Where is the ‘Peer Review Tide’ report?
A paper by Traag and Waltman of the CWTS Leiden recently called on the REF to perform an uncertainty analysis of initial output peer review scores to help model a comparison between metric-based and peer-reviewed output assessments. And it strikes me that some data around the levels of agreement between reviewers could make a useful contribution to the debate around the nature of peer review – but this should just be the starting point. There are questions around what we can expect from peer review. What is it supposed to do? What role does agreement or disagreement play in this context? Are there times when metrics can play a useful role (for instance in organising agreement)? How is expert judgement organised? And how are disciplinary differences taken into account?
Now I understand that there may be concerns about how these findings are interpreted. Exposing the inner workings of peer review might significantly undermine any remaining trust in the process. And what then? Call me naïve, but my hope is that greater transparency around peer review would build trust rather than destroy it. Some of the concerns around peer review seem to be based on too high an expectation as to what it can realistically deliver. Perhaps it has fallen victim to its own “gold standard” marketing? REF peer review, as with journal peer review, is a negotiation between experts. Lower levels of initial agreement might lead to greater debate and ultimately a better decision. Higher levels of agreement may reveal the conscious or unconscious use of proxies (journal reputation, citations, fame) for decision-making – so-called “thinking with indicators”. And in some disciplines this may not always be a bad thing. But this is all interesting stuff. And better it were known, understood and learned from, than hidden away and kept clouded in suspicion. After all, making the unknown known is what the scholarly enterprise is all about.
Is it actually peer review?
My final thought tries to get to the heart of the academy’s suspicions around REF peer review. This has been on my mind ever since delivering a training session where new-to-the-UK academics literally laughed out loud at my proclamation that the REF was a peer review process. I thought academics liked peer review? And given the choice between that and – heaven forbid – metrics, they would weep with gratitude at the scholarly critique? I don’t think I had this wrong, actually. Academics do prefer peer review in the main. What they are doubtful about is whether what the REF refers to as peer review is what they would describe as peer review.
For most academics, peer review is, to use the EUA’s definition, “the process of experts making a qualitative judgement of research quality”. The two key terms here are “experts” and “qualitative”. Now we know that the REF recruits experts to its panels – no doubt about it. But they are not experts in everything. And unlike journal peer review, where one always hopes one’s paper is being reviewed by an (anonymous) expert, the fact that the membership of REF sub-panels is well-known means that many academics can be quite categorical in their declarations that, for them, the REF is not expert peer review. The world of knowledge is simply too broad, and the REF’s coffers simply too limited, to ensure that it is a truly expert peer review process.
Now one might argue that for the activities required of REF peer reviewers, they are quite expert enough; after all, they are not being called upon to advise the authors as to how they might improve their paper, only whether it meets some pre-established criteria for a quantity of stars. And this is where we hit up against the second problem with REF peer review: it does not result in a qualitative referee report that the authors can pick through, take issue with, and negotiate. It is a quantitative process. And this is quite definitely not what an academic wants from peer review. If the whole thing is going to end up as a series of numbers, why not cut out the intermediary and use metrics? Which is exactly what one new-to-the-UK academic recently argued with me at the aforementioned training session. No-one wants their work to be reduced to a number. And REF peer review does that.
So what’s the answer to all this? Well, I think this is where we get down to the fundamental purpose of the REF. Does it exist simply to judge where we’ve been? Or is its job to also ensure that we’re heading in the right direction? Because if it’s the latter, we need to make much better use of all those expert peers in not only reading outputs, but in providing qualitative feedback to universities across the whole of their research endeavour and advising as to how they might better meet their own objectives. If it’s simply a matter of providing star ratings as a basis on which to dole out cash, we might wish to think about a rebrand. Calling REF output evaluation an “expert peer review” process when it defies many academics’ views of expert peer review probably isn’t helping. “Peer scoring” is probably a more accurate term.
It’s not my intention to rain on REF’s parade. The peer review of outputs on a national scale is a nobly ambitious undertaking, and it’s notable that no other country even attempts it. And all the folks I’ve ever spoken to at Research England are genuinely open to discussion and debate around what good research evaluation might look like. (Indeed, Research England are looking at this right now and I hope these thoughts might feed into that process.) But I do think there is a piece of work to be done around peer review: around making better attempts at mitigating unconscious bias; around understanding what peer review as a negotiated decision-making process actually means; and around sense-checking whether scoring historical outputs is the best use of peer reviewers’ expertise or whether, if we are really committed to “research excellence”, their time might be better spent guiding universities to a more effective use of their future.
Since writing this blog, I’m delighted to say that the Association of Research Managers and Administrators (ARMA) have submitted a formal request to the Chair of the REF Equality & Diversity Panel (EDAP), Professor Dianne Berry, for EIAs on the scoring of outputs. Professor Berry has agreed to put this on the agenda for their next meeting on 24 May. I will post any updates on this blog post as and when I have them.
I am grateful to Professor Sarah de Rijcke, Professor Steve Rothberg, Professor James Wilsdon, and members of the ARMA Research Evaluation and REF Special Interest Groups for conversations that informed the writing of this piece. Any nonsense is strictly mine, of course.
Elizabeth Gadd is the Research Policy Manager (Publications) at Loughborough University. She is the chair of the Lis-Bibliometrics Forum and co-champions the ARMA Research Evaluation Special Interest Group. She also chairs the INORMS International Research Evaluation Working Group.
Unless it states otherwise, the content of the Bibliomagician is licensed under a Creative Commons Attribution 4.0 International License.