Citations and media coverage

According to a press release accompanying a just-published study in PLOS ONE:

Highly cited papers also tend to receive more media attention, although the cause of the association is unclear.

One reason I can think of is a confounding factor that serves as the hidden cause of both phenomena. Discoverability matters just as much as the quality of a paper, and the conventional journals implicated in sustaining notions like ‘prestige’ (Nature, Science, Cell, The Lancet, etc.) have been known to prefer sensational positive results. And among researchers who still value publishing in these journals, these papers are noticed more, creating a ‘buzz’ that a reporter can pick up on.

Second, sensational results also lend themselves easily to sensational stories in the press, which has long been addicted to the same ‘positivity bias’ that the scientific literature harboured for many decades. In effect, highly cited papers are simply highly visible, and highly visibilised, papers – to other scientists and to journalists alike.
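(For the statistically minded, here’s a toy simulation of the confounding argument – entirely hypothetical, not drawn from the PLOS ONE study. A hidden ‘sensationalism’ variable drives both citations and media mentions, producing a strong correlation even though neither causes the other. All coefficients and variable names are illustrative assumptions.)

```python
# Toy illustration of confounding: a hidden common cause ('sensationalism',
# as favoured by prestige journals) boosts both citations and media
# mentions, so the two correlate even with no direct causal link.
import numpy as np

rng = np.random.default_rng(42)
n_papers = 10_000

sensationalism = rng.normal(size=n_papers)           # hidden confounder
citations = 2.0 * sensationalism + rng.normal(size=n_papers)
media_mentions = 1.5 * sensationalism + rng.normal(size=n_papers)

# The raw correlation looks strong...
print(np.corrcoef(citations, media_mentions)[0, 1])  # ~0.67

# ...but vanishes once the confounder is accounted for, by removing
# its (known, in this toy setup) contribution from both variables.
resid_citations = citations - 2.0 * sensationalism
resid_mentions = media_mentions - 1.5 * sensationalism
print(np.corrcoef(resid_citations, resid_mentions)[0, 1])  # ~0.0
```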

The press release continues:

The authors add: “Results from this study confirm the idea that media attention given to scientific research is strongly related to scientific citations for that same research. These results can inform scientists who are considering using popular media to increase awareness concerning their work, both within and outside the scientific community.”

I’m not sure what this comment means (I haven’t gone through the paper, and it’s possible the authors discuss this in more detail), but there is already evidence that studies for which preprints are available receive more citations than those published behind a paywall. So perhaps scientists expecting more media coverage of their work should simply make their research more accessible. (It’s a testament to how entrenched the methods of ‘conventional’ publishers have become – including concepts like ‘reader pays’ and the journal impact factor, accentuated by notions like ‘prestige’ – that this common-sensical solution is not yet common practice.)

On the flip side, journalists also need to be weaned off ‘top’ journals – I receive a significantly higher number of pitches offering to cover papers published in Nature journals – and retrained to spot interesting results published in less well-known journals, as well as, on a slightly separate note, to situate the results of one study in a larger context instead of hyper-focusing on one context-limited set of results.

The work seems interesting; perhaps one of you would like to comb through it.

The costs of correction

I was slightly disappointed to read a report in the New York Times this morning. Entitled ‘Two Huge COVID-19 Studies Are Retracted After Scientists Sound Alarms’, it discussed the implications of the recent retraction of two large COVID-19 studies by the leading medical journals that published them, the New England Journal of Medicine and The Lancet. My sentiment stemmed from the following paragraph and some after it:

I don’t know that just these two retractions raise troubling questions, as if those questions weren’t already being asked well before these incidents. The suggestion that the lack of peer-review – or peer-review itself, at least in its current form (opaque, unpaid) – could be to blame is more frustrating, as is the article’s focus on the quality of the databases used in the two studies instead of the overarching issue. Perhaps this is yet another manifestation of the NYT’s crisis under Trump?

One of the benefits of the preprint publishing system is that peer-review is substituted with ‘open review’. And one of the purposes of preprints is that the authors of a study can collect feedback and suggestions before publishing in a peer-reviewed journal, instead of accruing a significant cost post-publication in the form of corrections or retractions, both of which continue to carry a considerable amount of stigma. As such, the preprint mode ensures a more complete, more thoroughly reviewed manuscript enters the peer-review system, instead of vesting the entire burden of fact-checking and reviewing a paper in a small group of experts whose names and suggestions most journals don’t reveal, and who are generally unpaid for their time and effort.

Even so, the state of scientific research is fine. It would simply be even better if we reduced the costs associated with correcting the scientific record instead of heaping more penalties on that one moment, as the conventional system of publishing does. ‘Conventional’ – which in this sphere seems to be another word for ‘closed-off’ – journals also have an incentive to refuse to publish corrections or perform retractions because they’ve built themselves up on claims of being discerning, thorough and reliable. So retractions are a black mark on their record. Elisabeth Bik has often noted how long journals take to even acknowledge entirely legitimate complaints about papers they’ve published, presumably for this reason.

There really shouldn’t be any debate on which system is better – but sadly there is.

Poor journalism is making things harder for preprints

There have been quite a few statements by various scientists on Twitter who, in pointing to some preprint’s untenable claims, also point to the manuscript’s identity as a preprint. This is not fair, as I’ve argued many times before. A big part of the problem here is bad journalism. Bad preprints are a problem not because their substance is bad but because people who are not qualified to understand why it is bad read them and internalise their conclusions at face value.

There are dozens of new preprints uploaded onto arXiv, medRxiv and bioRxiv every week making controversial arguments and/or arriving at far-fetched conclusions, often patronising to the efforts of the subject’s better exponents. Most of them (at least going by what I know of preprints on arXiv) are debated and laid to rest by scientists familiar with the topics at hand. No non-expert is hitting up arXiv or bioRxiv every morning looking for preprints to go crazy on. The ones that become controversial enough to catch the attention of non-experts have, nine times out of ten, been amplified to that effect by a journalist who didn’t suitably qualify the preprint’s claims and simply published them. Suddenly, scores (or more) of non-experts have acquired what they think is refined knowledge, and public opinion thereafter goes against the scientific grain.

Acknowledging that this collection of events is a problem on many levels, which particular event would you say is the deeper one?

Some say it’s the preprint mode of publishing, and when asked for an alternative, demand that the use of preprint servers be discouraged. But this wouldn’t solve the problem. Preprint papers are a relatively new development while ‘bad science’ has been published for a long time. More importantly, preprint papers improve public access to science, and preprints that contain good science do this even better.

To make sweeping statements against the preprint publishing enterprise because some preprints are bad is not fair, especially to non-expert enthusiasts (like journalists, bloggers and students) in developing countries, who typically can’t afford the subscription fees to access paywalled, peer-reviewed papers. (Open-access publishing is a solution too but it doesn’t seem to feature in the present pseudo-debate, nor does it address important issues that beset it as well as paywalled papers.)

What’s more, if we admit that bad journalism is the problem, as it really is, we achieve two things: we prevent ‘bad science’ from reaching the larger population and we retain access to ‘good science’.

Now, to the finer issue of health- and medicine-related preprints: yes, acting on the conclusions of a preprint – such as ingesting an untested drug or paying too much attention to an irrelevant symptom – during a health crisis in a country with insufficient hospitals and doctors can prove deadlier than usual. But how on Earth could a person have found that preprint, read it well enough to understand what it was saying, and acted on its conclusions? (Put this way, a bad journalist could be even more to blame for enabling access to a bad study by translating its claims into simpler language.)

Next, a study published in The Lancet claimed – and thus allowed others to claim by reference – that most conversations about the novel coronavirus have been driven by preprint papers. (An article in Ars Technica on May 6 carried this provocative headline, for example: ‘Unvetted science is fuelling COVID-19 misinformation’.) However, the study was based on only 11 papers. In addition, those who invoke this study in support of arguments directed against preprints often fail to mention the following paragraph, drawn from the same paper:

… despite the advantages of speedy information delivery, the lack of peer review can also translate into issues of credibility and misinformation, both intentional and unintentional. This particular drawback has been highlighted during the ongoing outbreak, especially after the high-profile withdrawal of a virology study from the preprint server bioRxiv, which erroneously claimed that COVID-19 contained HIV “insertions”. The very fact that this study was withdrawn showcases the power of open peer-review during emergencies; the withdrawal itself appears to have been prompted by outcry from dozens of scientists from around the globe who had access to the study because it was placed on a public server. Much of this outcry was documented on Twitter and on longer-form popular science blogs, signalling that such fora would serve as rich additional data sources for future work on the impact of preprints on public discourse. However, instances such as this one described showcase the need for caution when acting upon the science put forth by any one preprint.

The authors, Maimuna Majumder and Kenneth Mandl, have captured the real problem. Lots of preprints are being uploaded every week, and quite a few are rotten. Irrespective of how many do or don’t drive public conversations (especially on social media), it’s disingenuous to assume this risk by itself suffices to cut off access.

Instead, as the scientists write, exercise caution. Instead of spoiling a good thing, figure out a way to improve the reporting habits of errant journalists. And remember that nothing stops an irresponsible journalist from sensationalising the level-headed conclusions of a peer-reviewed paper either: all it takes is to quote from a grossly exaggerated university press release and to not consult an independent expert. Even pitting preprints against peer-reviewed papers only advances a false balance, comparing preprints’ access advantage to peer-review’s gatekeeping advantage (and even the latter is on shaky ground).

Distracting from the peer-review problem

From an article entitled ‘The risks of swiftly spreading coronavirus research‘ published by Reuters:

A Reuters analysis found that at least 153 studies – including epidemiological papers, genetic analyses and clinical reports – examining every aspect of the disease, now called COVID-19 – have been posted or published since the start of the outbreak. These involved 675 researchers from around the globe. …

Richard Horton, editor-in-chief of The Lancet group of science and medical journals, says he’s instituted “surge capacity” staffing to sift through a flood of 30 to 40 submissions of scientific research a day to his group alone.

… much of [this work] is raw. With most fresh science being posted online without being peer-reviewed, some of the material lacks scientific rigour, experts say, and some has already been exposed as flawed, or plain wrong, and has been withdrawn.

“The public will not benefit from early findings if they are flawed or hyped,” said Tom Sheldon, a science communications specialist at Britain’s non-profit Science Media Centre. …

Preprints allow their authors to contribute to the scientific debate and can foster collaboration, but they can also bring researchers almost instant, international media and public attention.

“Some of the material that’s been put out – on pre-print servers for example – clearly has been… unhelpful,” said The Lancet’s Horton.

“Whether it’s fake news or misinformation or rumour-mongering, it’s certainly contributed to fear and panic.” …

Magdalena Skipper, editor-in-chief of Nature, said her group of journals, like The Lancet’s, was working hard to “select and filter” submitted manuscripts. “We will never compromise the rigour of our peer review, and papers will only be accepted once … they have been thoroughly assessed,” she said.

When Horton or Sheldon says some of the preprints have been “unhelpful” and that they cause panic among the people – which people do they mean? No non-expert is hitting up bioRxiv looking for COVID-19 papers. They mean some lazy journalists and some irresponsible scientists are spreading misinformation – and frankly, those habits represent a more pressing problem to solve than pointing fingers at preprints.

The Reuters analysis also says nothing about how well preprint repositories, as well as scientists on social media platforms, are conducting open peer-review, instead cherry-picking reasons to compose a lopsided argument against greater transparency in the knowledge economy. Indeed, crisis situations like the COVID-19 outbreak often seem to become ground zero for questioning the need for preprints, but no one seems to want to discuss “peer-reviewed” disasters like the one recently publicised by Elisabeth Bik. To quote from The Wire (emphasis added):

[Elisabeth] Bik, @SmutClyde, @mortenoxe and @TigerBB8 (all Twitter handles of unidentified persons), report – as written by Bik in a blog post – that “the Western blot bands in all 400+ papers are all very regularly spaced and have a smooth appearance in the shape of a dumbbell or tadpole, without any of the usual smudges or stains. All bands are placed on similar looking backgrounds, suggesting they were copy-pasted from other sources or computer generated.”

Bik also notes that most of the papers, though not all, were published in only six journals: Artificial Cells, Nanomedicine, and Biotechnology; Journal of Cellular Biochemistry; Biomedicine & Pharmacotherapy; Experimental and Molecular Pathology; Journal of Cellular Physiology; and Cellular Physiology and Biochemistry – all maintained by reputed publishers and, importantly, all of them peer-reviewed.

Another controversy, another round of blaming preprints

On February 1, Anand Ranganathan, the molecular biologist more popular as a columnist for Swarajya, amplified a new preprint from scientists at IIT Delhi that (purportedly) claimed the genome of the Wuhan coronavirus (2019-nCoV) contains some sequences also found in the human immunodeficiency virus but not in any other coronaviruses. Ranganathan also chose to magnify the preprint’s claim that the sequences’ presence was “non-fortuitous”.

To be fair, the IIT Delhi group did not properly qualify what it meant by this term, but that wouldn’t exculpate Ranganathan and others who followed him: first for amplifying, in alarmist language, a claim that did not deserve such treatment, and then, once he discovered his mistake, for wondering out loud whether such “non-peer reviewed studies” about “fast-moving, in-public-eye domains” should be published before scientific journals have subjected them to peer-review.

https://twitter.com/ARanganathan72/status/1223444298034630656
https://twitter.com/ARanganathan72/status/1223446546328326144
https://twitter.com/ARanganathan72/status/1223463647143505920

The more conservative scientist is likely to find ample room here to revive the claim that preprints only promote shoddy journalism, and that preprints should be removed from the biomedical literature entirely. This is bullshit.

The ‘print’ in ‘preprint’ refers to the act of a traditional journal printing a paper for publication after peer-review. A paper is designated a ‘preprint’ if it hasn’t yet undergone peer-review, whether or not it has been submitted to a scientific journal for consideration. To quote from an article championing the use of preprints during a medical emergency, written by three of the six cofounders of medRxiv, the preprint repository for the biomedical literature:

The advantages of preprints are that scientists can post them rapidly and receive feedback from their peers quickly, sometimes almost instantaneously. They also keep other scientists informed about what their colleagues are doing and build on that work. Preprints are archived in a way that they can be referenced and will always be available online. As the science evolves, newer versions of the paper can be posted, with older historical versions remaining available, including any associated comments made on them.

In this regard, Ranganathan’s ringing of the alarm bells (with language like “oh my god”) the first time he tweeted the link to the preprint, without sufficiently evaluating the attendant science, was his decision, and not prompted by the paper’s status as a preprint. Second, the bioRxiv repository where the IIT Delhi document showed up has a comments section, and it was brimming with discussion within minutes of the paper being uploaded. More broadly, preprint repositories are equipped to accommodate peer-review. Had anyone looked at the comments before tweeting, they wouldn’t have had reason to jump the gun.

Third, and most important: peer-review is not fool-proof. Instead, it is a legacy method employed by scientific journals to separate legitimate from illegitimate research and, more recently, higher-quality from lower-quality research (using ‘quality’ from the journals’ oft-twisted points of view, not as an objective standard of any kind).

This framing supports three important takeaways from this little scandal.

A. Much like preprint repositories, peer-reviewed journals also regularly publish rubbish. (Axiomatically, just as conventional journals regularly publish the outcomes of good science, so do preprint repositories; in the case of 2019-nCoV alone, bioRxiv, medRxiv and SSRN together published at least 30 legitimate and noteworthy research articles.) It is just that conventional scientific journals conduct the peer-review before publication and preprint repositories (and research-discussion platforms like PubPeer) after. In fact, conducting the review after publication allows it to be a continuous process, able to respond to new information, rather than a one-time event that culminates with the act of printing the paper.

But notably, preprint repositories can recreate journals’ ability to closely control the review process, and to ensure only experts’ comments are in the fray, by enrolling a team of voluntary curators. The arXiv preprint server has been successfully using such a team to weed out manuscripts advancing pseudoscientific claims. As such, it makes more sense to familiarise people with the preprint and post-publication review paradigm than to take advantage of their confusion and call for preprints to be eliminated altogether.

B. Those who support the idea that preprints are dangerous, and argue that peer-review is a better way to protect against unsupported claims, are by proxy advocating for the persistence of a knowledge hegemony. Peer-review is opaque, sustained by unpaid and overworked labour, and performs the same function that an open discussion often does at larger scale and with greater transparency. Indeed, the transparency is the most important difference: since peer-review has traditionally been the demesne of journals, supporting peer-review is tantamount to designating journals as the sole and unquestionable arbiters of what knowledge enters the public domain and what doesn’t.

(Here’s one example of how such gatekeeping can have tragic consequences for society.)

C. Given these safeguards and perspectives, and as I have written before, bad journalists and bad commentary will be bad irrespective of the window through which an idea presents itself in the public domain. There is a way to cover different types of stories, and the decision to abdicate one’s responsibility to think carefully about the implications of what one is writing can never have a causal relationship with the subject matter. The Times of India and the Daily Mail will continue to publicise every new paper discussing whatever coffee, chocolate and/or wine does to the heart, and The Hindu and The Wire Science will publicise research published as preprints because we know how to be careful and which risks to protect ourselves against.

By extension, ‘reputable’ scientific journals that use pre-publication peer-review will continue to publish many papers that will someday be retracted.

An ongoing scandal concerning the spider biologist Jonathan Pruitt offers a useful parable: journals don’t always publish bad science due to wilful negligence or poor peer-review alone, but such failures still do well to highlight the latter’s shortcomings. A string of papers based on work that Pruitt led was found to contain implausible data in support of some significant conclusions. Dan Bolnick, the editor of The American Naturalist, which became the first journal to retract papers by Pruitt that it had published, wrote on his blog on January 30:

I want to emphasise that regardless of the root cause of the data problems (error or intent), these people are victims who have been harmed by trusting data that they themselves did not generate. Having spent days sifting through these data files I can also attest to the fact that the suspect patterns are often non-obvious, so we should not be blaming these victims for failing to see something that requires significant effort to uncover by examining the data in ways that are not standard for any of this. … The associate editor [who Bolnick tasked with checking more of Pruitt’s papers] went as far back as digging into some of Pruitt’s PhD work, when he was a student with Susan Riechert at the University of Tennessee Knoxville. Similar problems were identified in those data… Seeking an explanation, I [emailed and then called] his PhD mentor, Susan Riechert, to discuss the biology of the spiders, his data collection habits, and his integrity. She was shocked, and disturbed, and surprised. That someone who knew him so well for many years could be unaware of this problem (and its extent), highlights for me how reasonable it is that the rest of us could be caught unaware.

Why should we expect peer-review – or any kind of review, for that matter – to be better? The only thing we can do is be honest, transparent and reflexive.

To see faces where there are none

This week in “neither university press offices nor prestigious journals know what they’re doing”: a professor emeritus at Ohio University claimed he had evidence of life on Mars, and his institution’s media office crafted a press release to publicise his ‘findings’ without thinking twice; meanwhile, a paper that Nature Medicine published in 2002, cited 900+ times since, has been found to contain multiple instances of image manipulation.

I’d thought the professor’s case would remain obscure because it’s evidently crackpot, but this morning, articles from Space.com and Universe Today showed up on my Twitter feed setting the record straight: the insects the OU entomologist had found in pictures of Mars taken by the Curiosity rover were just artefacts of his (insectile) pareidolia. Some people have called this science journalism in action, but I’d say it’s somewhat offensive to check if science journalism still works by gauging its ability, and initiative, to counter conspiracy theories – the lowest of low-hanging fruit.

The press release, which has since been taken down. Credit: EurekAlert and Wayback Machine

The juicier item on our plate is the Nature Medicine paper, the problems in which research integrity super-sleuth Elisabeth Bik publicised on November 21, and which has a science journalism connection as well.

Remember the anti-preprints article Nature News published in July 2018? Its author, Tom Sheldon, a senior press manager at the Science Media Centre, London, argued that preprints “promoted confusion” and that journalists who couldn’t bank on peer-reviewed work ended up “misleading millions”. In other words, it would be better if we got rid of preprints and journalists deferred only to the authority of peer-reviewed papers curated and published by journals, like Nature. Yet here we are today, with a peer-reviewed manuscript published in Nature Medicine whose checking process couldn’t pick up on repetitive imagery. Is this just another form of pareidolia, to see a sensational result – knowing prestigious journals’ fondness for such results – where there was actually none?

(And before you say this is just one paper, read this analysis: “… data from several lines of evidence suggest that the methodological quality of scientific experiments does not increase with increasing rank of the journal. On the contrary, an accumulating body of evidence suggests the inverse: methodological quality and, consequently, reliability of published research works in several fields may be decreasing with increasing journal rank.” Or this extended critique of peer-review on Vox.)

This isn’t an argument against the usefulness of, or even the need for, peer-review, which remains both useful and necessary. It’s an argument against ludicrous claims that peer-review is infallible, advanced in support of the even more ludicrous argument that preprints should be eliminated to enable good journalism.

The case for preprints

Daniel Mansur, the principal investigator of a lab at the Universidade Federal de Santa Catarina that studies how cells respond to viruses, had this to say in an interview with eLife about why preprints are useful:

Let’s say the paper that we put in a preprint is competing with someone and we actually have the same story, the same set of data. In a journal, the editors might ask both groups for exactly the same sets of extra experiments. But then, the other group that’s competing with me works at Stanford or somewhere like that. They’ll order everything they need to do the experiments, and the next day three postdocs will be working on the project. If there’s something that I don’t have in the lab, I have to wait six months before starting the extra experiments. At least with a preprint the work might not be complete, but people will know what we did.

Preprints level the playing field by eliminating one’s “ability to publish” in high-IF journals as a meaningful measure of the quality of one’s work.

While this makes it easier for scientists to compete with their better-funded peers, my indefatigable cynicism suggests there must be someone out there who’s unhappy about this. Two kinds of people come immediately to mind: journal publishers and some scientists at highfalutin universities like Stanford.

Titles like Nature, Cell, the New England Journal of Medicine and Science, and especially those published by the Elsevier group, have ridden the impact factor (IF) wave to great profit over many decades. In fact, the IF continues to be the dominant mode of evaluating research quality because it’s easy and not time-consuming, so – given how the IF is defined – these journals continue to be important for being important. They also provide a valuable service – double-blind peer review, which Mansur thinks is the only thing preprints currently lack. But other than that (and with post-publication peer-review being a largely suitable substitute), their time of obscene profits is surely running out.
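(For reference, since the argument leans on how the IF is defined: a journal’s two-year impact factor for year Y is simply the citations received in Y by items it published in Y-1 and Y-2, divided by the number of ‘citable items’ it published in those two years. Here’s a minimal sketch – the function name and the numbers are made up for illustration:)

```python
# Two-year journal impact factor: citations received in year Y to items
# published in years Y-1 and Y-2, divided by the number of 'citable
# items' (articles and reviews) published in those same two years.
def impact_factor(citations_in_year: int, citable_items: int) -> float:
    return citations_in_year / citable_items

# e.g. a journal whose 2018-2019 output drew 3,000 citations in 2020
# across 500 citable items would report an IF of 6.0 for 2020.
# (Illustrative numbers, not real data.)
print(impact_factor(3000, 500))  # 6.0
```

Because a high IF attracts more submissions and more citations, the metric is self-reinforcing – hence ‘important for being important’.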

The pro-preprint trend in scientific publishing is also bound to have jolted some scientists whose work received a leg-up by virtue of their membership in elite faculty groups. As Mansur says, a scientist from Stanford or a similar institution can no longer claim primacy, or uniqueness, by default. As a result, preprints definitely improve the forecast for good scientists working at less-regarded institutions – but an equally important consideration is whether preprints also diminish the lure of fancy universities. These institutions have one less thing to offer now, or at least will in the future.