Getting rid of the GRE

An investigation by Science has found that, today, just 3% of “PhD programs in eight disciplines at 50 top-ranked US universities” require applicants’ GRE scores, “compared with 84% four years ago”. This is good news about a test whose purpose I could never understand: first as a student who had to take it to apply to journalism programmes, then as a journalist who couldn’t unsee the barriers the test imposed on students from poorer countries with locally tailored learning systems and, yes, not fantastic English. (Before the test’s format was changed in 2011, taking it required memorising long lists of obscure English words, an exercise devoid of purpose because takers would never remember most of those words.) Obviously many institutes still require prospective students to take the GRE, but the fact that many others are alive to questions about the utility of standardised tests and the barriers they impose on students from different socioeconomic backgrounds is heartening. The Science article also briefly explored what proponents of the GRE have to say, and I’m sure you’ll see (below) as I did that the reasons are flimsy – either because that is the actual strength of the arguments on offer or, more likely it seems to me, because Science hasn’t sampled all the available arguments in favour. This said, the reason offered by a senior member of the company that devises and administers the GRE is instructive.

“I think it’s a mistake to remove GRE altogether,” says Sang Eun Woo, a professor of psychology at Purdue University. Woo is quick to acknowledge the GRE isn’t perfect and doesn’t think test scores should be used to rank and disqualify prospective students – an approach many programs have used in the past. But she and some others think the GRE can be a useful element for holistic reviews, considered alongside qualitative elements such as recommendation letters, personal statements, and CVs. “We’re not saying that the test is the only thing that graduate programs should care about,” she says. “This is more about, why not keep the information in there because more information is better than less information, right?”

Removing test scores from consideration could also hurt students, argues Alberto Acereda, associate vice president of global higher education at the Educational Testing Service, the company that runs the GRE. “Many students from underprivileged backgrounds so often don’t have the advantage of attending prestigious programs or taking on unpaid internships, so using their GRE scores serves [as a] way to supplement their application, making them more competitive compared to their peers.”

Both arguments come across as reasonable – but both are undermined by the result of an exercise that the department of Earth and atmospheric sciences at Cornell University conducted in 2020: a group evaluated prospective students’ applications for MS and PhD programmes while keeping the GRE scores hidden. When the scores were revealed, the evaluations weren’t “materially affected”. Obviously the department’s findings are not generalisable – but they indicate the GRE’s redundancy, with the added benefit that evaluators wouldn’t have to consider the effect of the test’s exorbitant fee (around Rs 8,000 in 2014 and $160 internationally, up to $220 today) on the pool of applicants, nor the other pitfalls of using the GRE to ‘rank’ students’ suitability for a PhD programme. Some others quoted in the Science article vouched for “rubric-based holistic reviews”. The meaning of “rubric” isn’t clear from the article itself, but the term as a whole seems to mean considering students on a variety of fronts, one of which is their performance on the GRE. This also seems reasonable, but it’s not clear what the GRE brings to the table. One 2019 study found that GRE scores couldn’t usefully predict PhD outcomes in the biomedical sciences. In this context, including the GRE – even as an option – in the application process could deter some students from applying and/or disadvantage them in admissions, both because of the test’s requirements (including the fee) and, as a counterexample to Acereda’s reasoning, because their scores may not faithfully reflect their ability to complete a biomedical research degree.
But in another context – admissions to the Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences (GSBS) – researchers reported in 2019 that the GRE might be useful to “extract meaning from quantitative metrics” when employed as part of a “multitiered holistic” admissions process, but that by itself it could disproportionately triage out Black, Native and Hispanic applicants. Taken together: more information is not necessarily better than less information, especially when there are other barriers to acquiring the ‘more’ bits.

Finally, while evaluators might enjoy the marginal utility of redundancy as a way to ‘confirm’ their decisions, the test is a significant additional source of stress, and a drain on time, for all test-takers. This is in addition to a seemingly inescapable diversity-performance tradeoff, which strikes beyond the limited question of whether one standardised test is a valid predictor of students’ future performance, at the heart of what the purpose of a higher-education course is. That is, should institutes consider diversity at the expense of students’ performance? The answer depends on how each institute is structured, what its goal is and what it measures to that end. One focused on its members publishing papers in ‘high IF’ journals, securing high-value research grants, developing high h-indices and maintaining the institute’s own glamorous reputation is likely to see a ‘downside’ to increasing diversity. An institute focused on engendering curiosity, adherence to critical thinking and research methods, and developing blue-sky ideas is likely not to. But while the latter sounds great (strictly in the interests of science), it may be impractical from the point of view of helping tackle society’s problems and of fostering accountability in the scientific enterprise at large. The ideal institute lies somewhere between these extremes: its admission process will need to take on a little more work – work that the GRE currently abstracts away into a single score – in exchange for the liberty to decouple from university rankings, impact factors, ‘prestige’ and other such preoccupations.

The scientist as inadvertent loser

Twice this week, I had occasion to write about how science is an immutably human enterprise, and therefore some of its loftier ideals are aspirational at best, and about how transparency is one of the chief USPs of preprint repositories and post-publication peer-review. As if on cue, I stumbled upon a strange case of extreme scientific malpractice that bears out both points of view.

In an article published January 30, three editors of the Journal of Theoretical Biology (JTB) reported that one of their handling editors had engaged in the following acts:

  1. “At the first stage of the submission process, the Handling Editor on multiple occasions handled papers for which there was a potential conflict of interest. This conflict consisted of the Handling Editor handling papers of close colleagues at the Handling Editor’s own institute, which is contrary to journal policies.”
  2. “At the second stage of the submission process when reviewers are chosen, the Handling Editor on multiple occasions selected reviewers who, through our investigation, we discovered was the Handling Editor working under a pseudonym…”
  3. Many forms of reviewer coercion
  4. “In many cases, the Handling Editor was added as a co-author at the final stage of the review process, which again is contrary to journal policies.”

On the back of these acts of manipulation, this individual – whom the editors chose not to name, for unknown reasons, but whom one of them all but identified on Twitter as one Kuo-Chen Chou (an identification backed up by an independent user) – proudly trumpets his ‘achievements’ on his website.

The same webpage also declares that Chou “has published over 730 peer-reviewed scientific papers” and that “his papers have been cited more than 71,041 times”.

Without transparencyᵃ and without the right incentives, the scientific process – which I use loosely to denote all activities and decisions associated with synthesising, assimilating and organising scientific knowledge – becomes just as conducive to misconduct and unscrupulousness as any other enterprise, if only because it allows people with even a little more power to exploit others’ relative powerlessness.

a. Ironically, the JTB article lies behind a paywall.

In fact, Chou had also been found guilty of similar practices while working with a different journal, Bioinformatics, and an article its editors published last year is cited prominently in the article by JTB’s editors.

Even if the JTB and Bioinformatics cases are exceptional in that their editors failed to weed out gross misconduct shortly after its first occurrence – they’re not: there are many such cases, though they are still likely to be in the minority (an assumption on my part) – a completely transparent review process eliminates such possibilities and, more importantly, naturally renders the process trustlessᵇ. That is, you shouldn’t have to trust a reviewer to do right by your paper; the system itself should be designed such that there is no opportunity for a reviewer to do wrong.

b. As in trustlessness, not untrustworthiness.

Second, it seems Chou accrued over 71,000 citations because the number of citations has become a proxy for research excellence irrespective of whether the underlying research is actually excellent – a product of the unavoidable growth of a system in which evaluators replaced a complex combination of factors with a single number. As a result, Chou and others like him have been able to ‘hack’ the system, so to speak, and distort the scientific literature (which you might picture as the stacks of journals in a library, representing troves of scientific knowledge).

But as long as the science is fine, no harm done, right? Wrong.

If you visualised the various authors of research papers as points, and the lines connecting them to each other as citations, an inordinate number of lines would converge on the point representing Chou – and they would be wrong, led there not by Chou’s prowess as a scientist but by his abilities as a credit-thief and extortionist.
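To make the picture concrete, here’s a minimal sketch (with entirely hypothetical authors and citation data of my own invention, not from the article): authors as nodes, a directed edge from one author to another meaning the first cites the second. The in-degree of a node then measures how many lines “converge” on that author – which is exactly the number a serial credit-thief inflates.

```python
from collections import Counter

# Hypothetical citation edges: (citing author, cited author).
# "Q" stands in for a Chou-like figure on whom the lines converge.
citations = [
    ("P", "Q"), ("R", "Q"), ("S", "Q"), ("T", "Q"),
    ("Q", "R"), ("S", "T"),
]

# In-degree: how many times each author is cited.
in_degree = Counter(cited for _, cited in citations)

print(in_degree.most_common(1))  # [('Q', 4)] -- the graph's hub
```

The point of the sketch is that the metric itself can’t tell whether those four incoming edges reflect genuine influence or coerced co-authorships.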

This graphing exercise isn’t simply a form of visual communication. Imagine your life as a scientist as a series of opportunities, where each opportunity is contested by multiple people and the people in charge of deciding who ‘wins’ at each stage aren’t well-trained, well-compensated or well-supported (or some combination of these). If X ‘loses’ at one of the early stages and Y ‘wins’, Y has a commensurately greater chance of winning a subsequent contest, and X a lower one. Such contests often determine the level of funding, access to suitable guidance and even networking possibilities. So over multiple rounds – the evaluators at each step having more reasons to be impressed by Y’s CV because, say, it has more citations, and fewer reasons to be impressed with X’s – X ends up with more reasons to exit science and switch careers.
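This compounding dynamic can be sketched in a toy simulation (my own construction, purely illustrative, with made-up parameters): each contest Y wins slightly raises Y’s odds in the next one, and each loss lowers them, so an early lead tends to snowball.

```python
import random

def career(rounds=20, edge=0.05, seed=1):
    """Count Y's wins over a series of contests in which each
    win nudges the odds of the next win in Y's favour."""
    random.seed(seed)
    p_y = 0.5      # Y and X start as equals
    y_wins = 0
    for _ in range(rounds):
        if random.random() < p_y:
            y_wins += 1
            p_y = min(0.95, p_y + edge)  # a win compounds Y's advantage
        else:
            p_y = max(0.05, p_y - edge)  # a loss erodes it
    return y_wins

print(career())  # rarely near rounds/2: careers drift to one extreme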

Additionally, because of the resources Y has had opportunities to amass, they’re in a better position to conduct even more research, ascend to even more influential positions and – if they’re so inclined – accrue even more citations through means both straightforward and dubious. To me, such prejudicial biasing resembles the evolution of a Lorenz attractor: the initial conditions might appear identical to some approximation, but for a single trivial choice, one scientist ends up disproportionately more successful than another.
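The Lorenz analogy is easy to demonstrate numerically. The sketch below (standard Lorenz equations with the classic parameters σ = 10, ρ = 28, β = 8/3; the integrator choices are mine) follows two trajectories whose starting points differ by one part in a million and shows the gap between them growing by many orders of magnitude:

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = lorenz(state)
    k2 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = lorenz(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + (dt / 6.0) * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separation(p, q):
    """Euclidean distance between two states."""
    return sum((u - v) ** 2 for u, v in zip(p, q)) ** 0.5

# Two nearly identical initial conditions: "a single trivial choice".
a, b = (1.0, 1.0, 1.0), (1.0 + 1e-6, 1.0, 1.0)
initial = separation(a, b)

for _ in range(2000):  # integrate to t = 20 with dt = 0.01
    a, b = rk4_step(a, 0.01), rk4_step(b, 0.01)

print(initial, separation(a, b))  # the tiny gap grows enormously
```

By t = 20 the trajectories are as far apart as the attractor itself allows, despite having started a millionth of a unit apart – the sensitive dependence the featured image illustrates.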

The remedy, of course, comprises many things, including better ways to evaluate and reward research – and two of those, in turn, have to be eliminating the use of single numbers to denote human abilities and making the journey of a manuscript from the lab to the wild as free as possible of opaque, and therefore potentially arbitrary, decision-making.

Featured image: A still from an animation showing the divergence of nearby trajectories on a Lorenz system. Caption and credit: MicoFilós/Wikimedia Commons, CC BY-SA 3.0.