Poor journalism is making it harder for preprints

There have been quite a few statements by scientists on Twitter who, in pointing to some preprint’s untenable claims, also fault the manuscript’s identity as a preprint. This is not fair, as I’ve argued many times before. A big part of the problem here is bad journalism. Bad preprints are a problem not because their substance is bad but because people who aren’t qualified to understand why it is bad read them and internalise their conclusions at face value.

There are dozens of new preprints uploaded to arXiv, medRxiv and bioRxiv every week making controversial arguments and/or arriving at far-fetched conclusions, often patronising towards the efforts of the subject’s better exponents. Most of them (at least going by what I know of preprints on arXiv) are debated and laid to rest by scientists familiar with the topics at hand. No non-expert is hitting up arXiv or bioRxiv every morning looking for preprints to go crazy on. The ones that become controversial enough to catch the attention of non-experts have, nine times out of ten, been amplified to that effect by a journalist who didn’t suitably qualify the preprint’s claims and simply published them. Suddenly, scores (or more) of non-experts have acquired what they think is refined knowledge, and public opinion thereafter goes against the scientific grain.

Acknowledging that this chain of events is a problem on many levels, which particular link would you say runs deepest?

Some say it’s the preprint mode of publishing, and when asked for an alternative, demand that the use of preprint servers be discouraged. But this wouldn’t solve the problem. Preprint papers are a relatively new development while ‘bad science’ has been published for a long time. More importantly, preprint papers improve public access to science, and preprints that contain good science do this even better.

To make sweeping statements against the preprint publishing enterprise because some preprints are bad is not fair, especially to non-expert enthusiasts (like journalists, bloggers, students) in developing countries, who typically can’t afford the subscription fees to access paywalled, peer-reviewed papers. (Open-access publishing is a solution too, but it doesn’t seem to feature in the present pseudo-debate, nor does it address important issues that beset both open-access and paywalled papers.)

What’s more, if we admit that bad journalism is the real problem, as it is, we achieve two things: we prevent ‘bad science’ from reaching the larger population and we retain access to ‘good science’.

Now, to the finer issue of health- and medicine-related preprints: Yes, acting on the conclusions of a preprint – such as ingesting an untested drug or paying too much attention to an irrelevant symptom – during a health crisis in a country with too few hospitals and doctors can prove deadlier than usual. But how on Earth could a person have found that preprint, read it well enough to understand what it was saying, and acted on its conclusions? (Put this way, a bad journalist could be even more to blame, for enabling access to a bad study by translating its claims into simpler language.)

Next, a study published in The Lancet claimed – and thus allowed others to claim by reference – that most conversations about the novel coronavirus have been driven by preprint papers. (An article in Ars Technica on May 6 carried this provocative headline, for example: ‘Unvetted science is fuelling COVID-19 misinformation’.) However, the study was based on only 11 papers. In addition, those who invoke this study in support of arguments directed against preprints often fail to mention the following paragraph, drawn from the same paper:

… despite the advantages of speedy information delivery, the lack of peer review can also translate into issues of credibility and misinformation, both intentional and unintentional. This particular drawback has been highlighted during the ongoing outbreak, especially after the high-profile withdrawal of a virology study from the preprint server bioRxiv, which erroneously claimed that COVID-19 contained HIV "insertions". The very fact that this study was withdrawn showcases the power of open peer-review during emergencies; the withdrawal itself appears to have been prompted by outcry from dozens of scientists from around the globe who had access to the study because it was placed on a public server. Much of this outcry was documented on Twitter and on longer-form popular science blogs, signalling that such fora would serve as rich additional data sources for future work on the impact of preprints on public discourse. However, instances such as this one described showcase the need for caution when acting upon the science put forth by any one preprint.

The authors, Maimuna Majumder and Kenneth Mandl, have captured the real problem: lots of preprints are being uploaded every week and quite a few are rotten. But irrespective of how many do or don’t drive public conversations (especially on social media), it’s disingenuous to assume this risk by itself justifies cutting off access.

Instead, as the scientists write, exercise caution. Rather than spoiling a good thing, figure out a way to improve the reporting habits of errant journalists. And remember that nothing stops an irresponsible journalist from sensationalising the level-headed conclusions of a peer-reviewed paper either: all it takes is quoting a grossly exaggerated university press release and not consulting an independent expert. Even pitting preprints against peer-reviewed papers only advances a false balance, weighing preprints’ access advantage against peer review’s gatekeeping advantage (and even the latter is on shaky ground).

The scientist as inadvertent loser

Twice this week I’ve had occasion to write about how science is an immutably human enterprise and therefore some of its loftier ideals are aspirational at best, and about how transparency is one of the chief USPs of preprint repositories and post-publication peer-review. As if on cue, I stumbled upon a strange case of extreme scientific malpractice that bears out both points of view.

In an article published January 30, three editors of the Journal of Theoretical Biology (JTB) reported that one of their handling editors had engaged in the following acts:

  1. “At the first stage of the submission process, the Handling Editor on multiple occasions handled papers for which there was a potential conflict of interest. This conflict consisted of the Handling Editor handling papers of close colleagues at the Handling Editor’s own institute, which is contrary to journal policies.”
  2. “At the second stage of the submission process when reviewers are chosen, the Handling Editor on multiple occasions selected reviewers who, through our investigation, we discovered was the Handling Editor working under a pseudonym…”
  3. Many forms of reviewer coercion
  4. “In many cases, the Handling Editor was added as a co-author at the final stage of the review process, which again is contrary to journal policies.”

On the back of these acts of manipulation, this individual – whom the editors chose not to name, for reasons unknown, but whom one of them all but identified on Twitter as Kuo-Chen Chou (an identification backed up by an independent user) – proudly trumpets his ‘achievements’ on his website.

The same webpage also declares that Chou “has published over 730 peer-reviewed scientific papers” and that “his papers have been cited more than 71,041 times”.

Without transparency[a] and without the right incentives, the scientific process – which I use loosely to denote all activities and decisions associated with synthesising, assimilating and organising scientific knowledge – becomes just as conducive to misconduct and unscrupulousness as any other enterprise if only because it allows people with even a little more power to exploit others’ relative powerlessness.

a. Ironically, the JTB article lies behind a paywall.

In fact, Chou had also been found guilty of similar practices when working with a different journal, Bioinformatics, and an article its editors published last year is cited prominently in the JTB editors’ article.

Even if the JTB and Bioinformatics cases were exceptional for their editors having failed to weed out gross misconduct shortly after its first occurrence – they aren’t; but although there are many such cases, they are still likely to be in the minority (an assumption on my part) – a completely transparent review process would eliminate such possibilities and, more importantly, naturally render the process trustless[b]. That is, you shouldn’t have to trust a reviewer to do right by your paper; the system itself should be designed such that there is no opportunity for a reviewer to do wrong.

b. As in trustlessness, not untrustworthiness.

Second, it seems Chou accrued over 71,000 citations because the number of citations has become a proxy for research excellence, irrespective of whether the underlying research is actually excellent – the product of a system’s unavoidable growth once evaluators replaced a complex combination of factors with a single number. As a result, Chou and others like him have been able to ‘hack’ the system, so to speak, and distort the scientific literature (which you might picture as the stacks of journals in a library, representing troves of scientific knowledge).

But as long as the science is fine, no harm done, right? Wrong.

If you visualised the various authors of research papers as points and the citations connecting them as lines, an inordinate number of lines would converge on the point representing Chou – and they would be misleading, drawn there not by Chou’s prowess as a scientist but by his abilities as a credit-thief and extortionist.

This graphing exercise isn’t simply a form of visual communication. Imagine your life as a scientist as a series of opportunities, where each opportunity is contested by multiple people and the people in charge of deciding who ‘wins’ at each stage aren’t necessarily well-trained, well-compensated or well-supported. If X ‘loses’ at one of the early stages and Y ‘wins’, Y has a commensurately greater chance of winning a subsequent contest, and X a lower one. Such contests often determine the level of funding, access to suitable guidance and even networking possibilities. So over multiple rounds, with the evaluators at each step having more reasons to be impressed by Y’s CV (say, because it carries more citations) and fewer reasons to be impressed by X’s, X ends up with more reasons to exit science and switch careers.
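To see how small early differences can compound, here is a minimal toy sketch in Python. It is an illustration under stated assumptions, not a model of any real career: the helper name `career`, Y’s single-win head start, the evaluator bias and the number of rounds are all invented for the purpose.

```python
import random

def career(rounds=10, bias=0.1, rng=random):
    """Toy model of cumulative advantage: X and Y compete in successive
    contests. Y begins with a single early 'win' (say, a contested grant),
    and each later evaluator leans towards the stronger record. Every
    number here is an illustrative assumption, not a measurement."""
    record = {"X": 0, "Y": 1}
    for _ in range(rounds):
        p_y = 0.5 + bias * (record["Y"] - record["X"])  # evaluator's lean
        p_y = min(max(p_y, 0.05), 0.95)                 # keep it sensible
        record["Y" if rng.random() < p_y else "X"] += 1
    return record

random.seed(42)
trials = [career() for _ in range(10_000)]
share = sum(r["Y"] > r["X"] for r in trials) / len(trials)
print(f"Y ends ahead in {share:.0%} of simulated careers")
```

Even with a modest bias, Y’s single early win tends to snowball over the rounds; that compounding is the point the graphing exercise above is meant to convey.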

Additionally, because of the resources Y has had the opportunity to amass, they’re in a better position to conduct even more research, ascend to even more influential positions and – if they’re so inclined – accrue even more citations through means both straightforward and dubious. To me, such prejudicial biasing resembles the evolution of a Lorenz attractor: two careers can start from initial conditions that appear the same to some approximation, yet, but for a single trivial choice, one scientist ends up disproportionately more successful than the other.
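For readers who would rather see the analogy than take it on faith, here is a quick numerical sketch of the standard Lorenz system (σ = 10, ρ = 28, β = 8/3), assuming NumPy is available. The helper `lorenz_step`, the crude Euler integration and the step counts are my own choices, so treat it as qualitative only.

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz system with the standard parameters."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-6, 0.0, 0.0])   # a 'trivially' different start

for step in range(1, 3001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 1000 == 0:
        print(f"t = {step * 0.01:4.0f}, separation = {np.linalg.norm(a - b):.2e}")
```

The two trajectories track each other at first and then drift apart by orders of magnitude, which is roughly what the featured animation referenced below visualises.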

The answer, of course, comprises many things, including better ways to evaluate and reward research. Two of them have to be eliminating the use of single numbers to denote human abilities, and making the journey of a manuscript from the lab to the wild as free as possible of opaque, and therefore potentially arbitrary, decision-making.

Featured image: A still from an animation showing the divergence of nearby trajectories on a Lorenz system. Caption and credit: MicoFilós/Wikimedia Commons, CC BY-SA 3.0.