The problem with rooting for science

The idea that trusting in science involves a lot of faith, instead of reason, is lost on most people. More often than not, as a science journalist, I encounter faith through extreme examples – such as the Bloch sphere (used to represent the state of a qubit) or wave functions (‘mathematical objects’ used to understand the evolution of certain simple quantum systems). These and other similar concepts require years of training in physics and mathematics to understand. At the same time, science writers are often confronted with the challenge of making these concepts sensible to an audience that seldom has this training.

More importantly, how are science writers to understand them? They don’t. Instead, they implicitly trust the scientists they’re talking to to make sense. If I know that a black hole curves spacetime to such an extent that pairs of virtual particles created near its surface are torn apart – one particle entering the black hole never to exit and the other sent off into space – it’s not because I’m familiar with the work of Stephen Hawking. It’s because I read his books, read some blogs and scientific papers, spoke to physicists, and decided to trust them all. Every science journalist, in fact, has a set of sources they’re likely to trust over others. I even place my faith in some people over others, based on factors like personal character, past record, transparency, reflexivity, etc., so that I take what they produce with only the smallest pinch of salt and build on their findings to develop my own. And this way, I’m already creating an interface between science and society – by matching scientific knowledge with the socially developed markers of reliability.

I choose to trust those people, processes and institutions that display these markers. I call this an act of faith for two reasons: 1) it’s an empirical method, so to speak – there is no proof in theory that such ‘matching’ will always work; and 2) I believe it’s instructive to think of this relationship as being mediated by faith, if only to highlight how starkly it is opposed to reason. Most of us understand science through faith, not reason. Even scientists who are experts on one thing take the word of scientists on completely different things, instead of trying to study those things themselves (see ad verecundiam fallacy).

Sometimes, such faith is (mostly) harmless, such as in the ‘extreme’ cases of the Bloch sphere and the wave function. It is both inexact and incomplete to think that quantum superposition means an object is in two states at once. The human brain hasn’t evolved to grasp superposition exactly; this is why physicists use the language of mathematics to make sense of this strange existential phenomenon. The problem – i.e. the inexactitude and the incompleteness – arises when a communicator translates the mathematics to a metaphor. Equally importantly, physicists are describing whereas the rest of us are thinking. There is a crucial difference between these activities that illustrates, among other things, the fundamental incompatibility between scientific research and science communication that communicators must first surmount.

As physicists over the past three or four centuries have relied increasingly on mathematics rather than the word to describe the world, physics, like mathematics itself, has made a “retreat from the word,” as literary scholar George Steiner put it. In a 1961 Kenyon Review article, Steiner wrote, “It is, on the whole, true to say that until the seventeenth century the predominant bias and content of the natural sciences were descriptive.” Mathematics used to be “anchored to the material conditions of experience,” and so was largely susceptible to being expressed in ordinary language. But this changed with the advances of modern mathematicians such as Descartes, Newton, and Leibniz, whose work in geometry, algebra, and calculus helped to distance mathematical notation from ordinary language, such that the history of how mathematics is expressed has become “one of progressive untranslatability.” It is easier to translate between Chinese and English — both express human experience, the vast majority of which is shared — than it is to translate advanced mathematics into a spoken language, because the world that mathematics expresses is theoretical and for the most part not available to our lived experience.

Samuel Matlack, ‘Quantum Poetics’, The New Atlantis, 2017

However, the faith becomes more harmful the further we move away from the ‘extreme’ examples – of things we’re unlikely to stumble on in our daily lives – and towards more commonplace ideas, such as ‘how vaccines work’ or ‘why GM foods are not inherently bad’. The harm emerges when we think we know something but are in denial about how it is that we know that thing. Many of us think it’s reason; most of the time it’s faith. Remember when, in Friends, Monica Geller and Chandler Bing ask David the Scientist Guy how airplanes fly, and David says it has to do with Bernoulli’s principle and Newton’s third law? Monica then turns to Chandler with a knowing look and says, “See?!” To which Chandler says, “Yeah, that’s the same as ‘it has something to do with wind’!”

The harm is to root for science, to endorse the scientific enterprise and vest our faith in its fruits, without really understanding how these fruits are produced. Such understanding is important for two reasons.

First, we trust scientists instead of presuming to know, or actually knowing, that we can vouch for their work. It would be vacuous to claim science is superior in any way to another enterprise that demands our faith when science itself also receives our faith. Perhaps more fundamentally, we like to believe that science is trustworthy because it is evidence-based and it is tested – but the COVID-19 pandemic should have clarified, if it hasn’t already, the continuous (as opposed to discrete) nature of scientific evidence, especially if we also acknowledge that scientific progress is almost always incremental. Evidence can be singular and thus clear – like a new avian species, graphene layers superconducting electrons or tuned lasers cooling down atoms – or it can be necessary but insufficient, and therefore on a slippery slope – such as repeated genetic components in viral RNA, a cigar-shaped asteroid or water shortage in the time of climate change.

Physicists working with giant machines to spot new particles and reactions – all of which are detected indirectly, through their imprints on other well-understood phenomena – have two important thresholds for the reliability of their findings: if the chance of the observed signal of X (say, “a particle of energy 100 GeV”) arising as a statistical fluke is 0.27%, it’s good enough to count as evidence; if that chance is 0.00006%, then it’s a discovery (i.e., “we have found the particle”). But at what point can we be sure that we’ve indeed found the particle we were looking for if that chance will never reach 0%? One way, for physicists specifically, is to combine the experiment’s results with what they expect to happen according to theory; if the two match, it’s okay to think that even a less reliable result will likely be borne out. Another possibility (in the line of Karl Popper’s philosophy) is that a result expected to be true, and subsequently found to be true, is true until we have evidence to the contrary. But as suitable as this answer may be, it still doesn’t neatly fit the binary ‘yes’/’no’ we’re used to, and which we often expect from scientific endeavours as well (see experience v. reality).
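
Those two percentages are the particle physicists’ 3σ and 5σ conventions written as probabilities. A minimal sketch of the conversion – assuming the two-sided Gaussian tail, which is what the quoted numbers correspond to, and using scipy, which is my choice rather than anything the text prescribes:

```python
# Sketch: converting the 3-sigma ("evidence") and 5-sigma ("discovery")
# conventions into the percentages quoted above, assuming two-sided
# Gaussian tail probabilities.
from scipy.stats import norm

for sigma, label in [(3, "evidence"), (5, "discovery")]:
    # Probability of a fluctuation at least this large if there is no new particle
    p_value = 2 * norm.sf(sigma)
    print(f"{sigma} sigma ({label}): p = {p_value:.2e} = {100 * p_value:.5f}%")

# Prints (approximately):
# 3 sigma (evidence): p = 2.70e-03 = 0.26998%
# 5 sigma (discovery): p = 5.73e-07 = 0.00006%
```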

(Minor detour: While rational solutions are ideally refutable, faith-based solutions are not. Instead, the simplest way to reject their validity is to use extra-scientific methods, and more broadly deny them power. For example, if two people were offering me drugs to suppress the pain of a headache, I would trust the one who has a state-sanctioned license to practice medicine and is likely to lose that license, even temporarily, if his prescription is found to have been mistaken – that is, by treating the doctor as a subject of democratic power. By the same token, if I know that Crocin helps manage headaches, it’s because, first, I trusted the doctor who prescribed it and, second, Crocin has helped me multiple times before, so empirical experience is on my side.)

Second, if we don’t know how science works, we become vulnerable to believing pseudoscience to be science as long as the two share some superficial characteristics, like, say, the presence and frequency of jargon or a claim’s originator being affiliated with a ‘top’ institute. The authors of a scientific paper to be published in a forthcoming edition of the Journal of Experimental Social Psychology write:

We identify two critical determinants of vulnerability to pseudoscience. First, participants who trust science are more likely to believe and disseminate false claims that contain scientific references than false claims that do not. Second, reminding participants of the value of critical evaluation reduces belief in false claims, whereas reminders of the value of trusting science do not.

(Caveats: 1. We could apply the point of this post to this study itself; 2. I haven’t checked the study’s methods and results with an independent expert, and I’m also mindful that this is psychology research and that its conclusions should be taken with a pinch of salt until independent scientists have successfully replicated them.)

Later from the same paper:

Our four experiments and meta-analysis demonstrated that people, and in particular people with higher trust in science (Experiments 1-3), are vulnerable to misinformation that contains pseudoscientific content. Among participants who reported high trust in science, the mere presence of scientific labels in the article facilitated belief in the misinformation and increased the probability of dissemination. Thus, this research highlights that trust in science ironically increases vulnerability to pseudoscience, a finding that conflicts with campaigns that promote broad trust in science as an antidote to misinformation but does not conflict with efforts to install trust in conclusions about the specific science about COVID-19 or climate change.

In terms of the process, the findings of Experiments 1-3 may reflect a form of heuristic processing. Complex topics such as the origins of a virus or potential harms of GMOs to human health include information that is difficult for a lay audience to comprehend, and requires acquiring background knowledge when reading news. For most participants, seeing scientists as the source of the information may act as an expertise cue in some conditions, although source cues are well known to also be processed systematically. However, when participants have higher levels of methodological literacy, they may be more able to bring relevant knowledge to bear and scrutinise the misinformation. The consistent negative association between methodological literacy and both belief and dissemination across Experiments 1-3 suggests that one antidote to the influence of pseudoscience is methodological literacy. The meta-analysis supports this.

So rooting for science per se is not just not enough, it could be harmful vis-à-vis the public support for science itself. For example (and without taking names), in response to right-wing propaganda related to India’s COVID-19 epidemic, quite a few videos produced by YouTube ‘stars’ have advanced dubious claims. They’re not dubious at first glance, not least because they purport to counter pseudoscientific claims with scientific knowledge, but they are – either because they insist on a measure of certainty in the results that neither exists nor is achievable, or because they make pseudoscientific claims of their own, just wrapped up in technical lingo so they’re more palatable to those supporting science over critical thinking. Some of these YouTubers, and in fact writers, podcasters, etc., are even blissfully unaware of how wrong they often are. (At least one of them was also reluctant to edit a ‘finished’ video to make it less sensational despite repeated requests.)

Now, where do these ideas leave (other) science communicators? In attempting to bridge a nearly unbridgeable gap, are we doomed to swing only between most and least unsuccessful? I personally think that this problem, such as it is, is comparable to Zeno’s arrow paradox. To use Wikipedia’s words:

He states that in any one (duration-less) instant of time, the arrow is neither moving to where it is, nor to where it is not. It cannot move to where it is not, because no time elapses for it to move there; it cannot move to where it is, because it is already there. In other words, at every instant of time there is no motion occurring. If everything is motionless at every instant, and time is entirely composed of instants, then motion is impossible.

To ‘break’ the paradox, we need to identify and discard one or more primitive assumptions. In the arrow paradox, for example, one could argue that time is not composed of a stream of “duration-less” instants, that each instant – no matter how small – encompasses a vanishingly short but not nonexistent passage of time. With popular science communication (in the limited context of translating something that is untranslatable sans inexactitude and/or incompleteness), I’d contend the following:

  • Awareness: ‘Knowing’ and ‘knowing of’ are significantly different and, I hope, self-explanatory too. Example: I’m not fluent in the physics of cryogenic engines but I’m aware that they’re desirable because liquid hydrogen offers the highest specific impulse of all commonly used rocket fuels.
  • Context: As I’ve written before, a unit of scientific knowledge that exists in relation to other units of scientific knowledge is a different object from the same unit of scientific knowledge existing in relation to society.
  • Abstraction: 1. the perfect can be the enemy of the good, and imperfect knowledge of an object – especially a complicated compound one – can still be useful; 2. when multiple components come together to form a larger entity, the entity can exhibit some emergent properties that one can’t derive entirely from the properties of the individual components. Example: one doesn’t have to understand semiconductor physics to understand what a computer does.

An introduction to physics that contains no equations is like an introduction to French that contains no French words, but tries instead to capture the essence of the language by discussing it in English. Of course, popular writers on physics must abide by that constraint because they are writing for mathematical illiterates, like me, who wouldn’t be able to understand the equations. (Sometimes I browse math articles in Wikipedia simply to immerse myself in their majestic incomprehensibility, like visiting a foreign planet.)

Such books don’t teach physical truths; what they teach is that physical truth is knowable in principle, because physicists know it. Ironically, this means that a layperson in science is in basically the same position as a layperson in religion.

Adam Kirsch, ‘The Ontology of Pop Physics’, Tablet Magazine, 2020

But by offering these reasons, I don’t intend to over-qualify science communication – i.e. claim that, given enough time and/or other resources, a suitably skilled science communicator will be able to produce a non-mathematical description of, say, quantum superposition that is comprehensible, exact and complete. Instead, it may be useful for communicators to acknowledge that there is an immutable gap between common English and mathematics (the language of modern science), beyond which scientific expertise is unavoidable – in much the same way communicators must insist that the farther the expert strays into the realm of communication, the closer they’re bound to get to a boundary beyond which they must defer to the communicator.

Priggish NEJM editorial on data-sharing misses the point it almost made

Twitter outraged like only Twitter could on January 22 over a strange editorial that appeared in the prestigious New England Journal of Medicine, calling for medical researchers to not make their research data public. The call comes at a time when the scientific publishing zeitgeist is slowly but surely shifting toward journals requiring, sometimes mandating, the authors of studies to make their data freely available so that their work can be validated by other researchers.

Through the editorial, written by Dan Longo and Jeffrey Drazen, both doctors and the latter the chief editor, NEJM also cautions medical researchers to be on the lookout for ‘research parasites’, a coinage that the journal says is befitting “of people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited”. As @omgItsEnRIz tweeted, do the authors even science?

https://twitter.com/martibartfast/status/690503478813261824

The choice of words is more incriminating than the overall tone of the text, which also tries to express the more legitimate concern of replicators not getting along with the original performers. However, by saying that the ‘parasites’ may “use the data to try to disprove what the original investigators had posited”, NEJM has crawled into an unwise hole of infallibility of its own making.

In October 2015, a paper published in the Journal of Experimental Psychology pointed out why replication studies are probably more necessary than ever. The misguided publish-or-perish impetus of scientific research, together with publishing in high impact-factor journals being lazily used as a proxy for ‘good research’ by many institutions, has led researchers to hack their results – i.e. massage them (say, by cherry-picking) so that the study ends up reporting sensational results when, really, duller ones exist.

The JEP paper had a funnel plot to demonstrate this. Quoting from the Neuroskeptic blog, which highlighted the plot when the paper was published, “This is a funnel plot, a two-dimensional scatter plot in which each point represents one previously published study. The graph plots the effect size reported by each study against the standard error of the effect size – essentially, the precision of the results, which is mostly determined by the sample size.” Note: the y-axis is running top-down.

[Figure: the funnel plot from the JEP paper, plotting effect size against standard error]

The paper concerned itself with 43 previously published studies discussing how people’s choices appeared to change when they were gently reminded about sex.

As Neuroskeptic goes on to explain, there are three giveaways in this plot. One is obvious – the distribution of the replication studies is markedly separated from that of the original studies. Second: the least precise original studies – those with the smallest samples – reported the largest effects, while the replications worked with larger samples. Third: the original studies all seemed to “hug” the outer edge of the grey triangles, which marks the conventional threshold of statistical significance. The uniform ‘hugging’ is an indication that all those original studies were likely guilty of cherry-picking from their data so as to end up with results that were just about significant, an act called ‘p-hacking’.
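
To make that geometry concrete, here is a rough, self-contained sketch of a funnel plot with the p = 0.05 significance contour – the edge of the ‘grey triangles’ – drawn in. The data are simulated for illustration (numpy and matplotlib are my choices; none of this comes from the actual paper):

```python
# Rough illustration with simulated data (not the studies from the JEP paper):
# a funnel plot of effect size vs. standard error, with the p = 0.05
# significance boundary that p-hacked results tend to "hug".
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Simulated "original" studies: small samples (large standard errors),
# effects sitting just beyond the significance boundary.
se_orig = rng.uniform(0.15, 0.45, 30)                  # standard errors
d_orig = 1.96 * se_orig + rng.uniform(0.0, 0.1, 30)    # just significant

# Simulated replications: larger samples, true effect near zero.
se_rep = rng.uniform(0.05, 0.15, 15)
d_rep = rng.normal(0.0, se_rep)

se_grid = np.linspace(0.01, 0.5, 100)
plt.plot(1.96 * se_grid, se_grid, "k--", label="p = 0.05 boundary")
plt.plot(-1.96 * se_grid, se_grid, "k--")
plt.scatter(d_orig, se_orig, label="original studies")
plt.scatter(d_rep, se_rep, marker="s", label="replications")
plt.gca().invert_yaxis()   # y-axis runs top-down, as in the paper's plot
plt.xlabel("Effect size")
plt.ylabel("Standard error")
plt.legend()
plt.show()
```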

A line of research can appear to progress rapidly but without replication studies it’s difficult to establish if the progress is meaningful for science – a notion famously highlighted by John Ioannidis, a professor of medicine and statistics at Stanford University, in his two landmark papers in 2005 and 2014. Björn Brembs, a professor of neurogenetics at the Universität Regensburg, Bavaria, also pointed out how the top journals’ insistence on sensational results could result in a concentration of unreliability. Together with a conspicuous dearth of systematically conducted replication studies, this ironically implies that the least reliable results are often taken the most seriously thanks to the journals they appear in.

The most accessible sign of this is a plot of journals’ retraction index against their impact factor. The term ‘retraction index’ was coined in the same paper in which the plot first appeared; it stands for “the number of retractions in the time interval from 2001 to 2010, multiplied by 1,000, and divided by the number of published articles with abstracts”.
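
Spelled out as a formula (my rendering of that definition, not notation from the paper itself):

```latex
\text{retraction index} = 1000 \times
\frac{\text{retractions published in 2001--2010}}
     {\text{articles with abstracts published in 2001--2010}}
```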

Impact factor of journals plotted against the retraction index. The highest IF journals – Nature, Cell and Science – are farther along the trend line than they should be. Source: doi: 10.1128/IAI.05661-11

Look where NEJM is. Enough said.

The journal’s first such appeal appeared in 1997, arguing against pre-print copies of medical research papers becoming available and easily accessible – à la the arXiv server for physics. The authors, again two doctors, wrote, “medicine is not physics: the wide circulation of unedited preprints in physics is unlikely to have an immediate effect on the public’s well-being even if the material is biased or false. In medicine, such a practice could have unintended consequences that we all would regret.” Though a reasonable PoV, the overall tone appeared to stand against the principles of open science.

More importantly, both editorials, separated by almost two decades, make one reasonable argument that sadly appears to make sense to the journal only in the context of a wider set of arguments, many of them contemptible. For example, Drazen seems to understand the importance of data being available for studies to be validated but has differing views on different kinds of data. Two days before his editorial was published, another appeared in the same journal, co-authored by 16 medical researchers – Drazen one of them – this time calling for anonymised patient data from clinical trials to be made available to other researchers because it would “increase confidence and trust in the conclusions drawn from clinical trials. It will enable the independent confirmation of results, an essential tenet of the scientific process.”

(At the same time, the editorial also says, “Those using data collected by others should seek collaboration with those who collected the data.”)

For another example, NEJM labours under the impression that the data generated by medical experiments will never be perfectly communicable to other researchers who were not involved in generating it. One reason it provides is that discrepancies in the data between the original group and a new group could arise because of subtle choices made by the former in the selection of parameters to evaluate. However, the solution doesn’t lie in keeping the data opaque altogether.

A better way to conduct replication studies

An instructive example played out in May 2014, when the journal Social Psychology published a special issue dedicated to replication studies. The issue contained both successful and failed attempts at replicating some previously published results, and the whole process was designed to eliminate biases as much as possible. For example, the journal’s editors Brian Nosek and Daniel Lakens didn’t curate replication studies but instead registered the studies before they were performed so that their outcomes would be published irrespective of whether they turned out positive or negative. For another, all the replications used the same experimental and statistical techniques as in the original study.

One scientist who came out feeling wronged by the special issue was Simone Schnall, the director of the Embodied Cognition and Emotion Laboratory at Cambridge University. The results of a paper co-authored by Schnall in 2008 had failed to be replicated, but she believed there had been a mistake in the replication that, when corrected, would corroborate her group’s findings. However, her statements were quickly and widely interpreted to mean she was being a “sore loser”. In one blog, her 2008 findings were called an “epic fail” (though the words were later struck out).

This was soon followed by a rebuttal from Schnall, followed by a counter from the replicators, and then by Schnall writing two blog posts (here and here). Over time, the core issue became how replication studies were conducted – who performed the peer review, the level of independence the replicators had, the level of access the original group had, and how journals could be divorced from having a choice about which replication studies to publish. But relevant to the NEJM context, the important thing was the level of transparency maintained by Schnall & co. as well as the replicators, which lent honesty and legitimacy to the debate.

The Social Psychology issue was able to take the conversation forward, getting authors to talk about the psychology of research reporting. There have been few other such instances – of incidents exploring the proper mechanisms of replication studies – so if the NEJM editorial had stopped at calling for better organised collaborations between a study’s original performers and its replicators, it would’ve been great. As Longo and Drazen concluded, “How would data sharing work best? We think it should happen symbiotically … Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested.”

https://twitter.com/significantcont/status/690507462848450560

The mistake lies in thinking anything else would be parasitic. And the attitude affects not just other scientists but some science communicators as well. Any journalist or blogger who has been reporting on a particular beat for a while stands to become a ‘temporary expert’ on the technical contents of that beat. And with exploratory/analytical tools like R – which is easier than you think to pick up – the communicator could dig deeper into the data, teasing out issues more relevant to their readers than what the accompanying paper thinks is the highlight. Sure, NEJM remains apprehensive about how medical results could be misinterpreted to terrible consequence. But the solution there would be for the communicators to be more professional and disciplined, not for the journal to be more opaque.

The Wire
January 24, 2016

Psych of Science: Hello World

Hello, world. 🙂 I’m filing this post under a new category on Is Nerd called Psych of Science. A dull name but it’ll do. This category will host my personal reflections on the science in the stories I’ve written or read and, more importantly, of the people in those stories.

I decided to create this category after the Social Psychology replications incident. While it was not a seminal episode, reading and understanding the kind of issues faced by authors of the original paper and the replicators really got me thinking about the psychology of science. It wasn’t an eye-opening incident but I was surprised by how interested I was in how the conversation was going to play out.

Admittedly, I’m a lousy people person, and that especially comes across in my writing. I’ve always been interested in understanding how things work, not how people work. This is a discrepancy I hope will be fixed during my stint at NYU, which I’m slated to attend this fall (2014). In the meantime, and afterwards if I get the time, I’ll leave my reflections here, and you’re welcome to add to them, too.

Replication studies, ceiling effects, and the psychology of science

On May 25, I found Erika Salomon’s tweet:

The story started when the journal Social Psychology decided to publish successful and failed replication attempts instead of conventional papers and their conclusions for a Replications Special Issue (Volume 45, Number 3 / 2014). It accepted proposals from scientists stating which studies they wanted to try to replicate, and registered the accepted ones. This way, the journal’s editors Brian Nosek and Daniel Lakens could ensure that a study was published no matter the outcome – successful or not.

All the replication studies were direct replication studies, which means they used the same experimental procedure and statistical methods to analyze the data. And before the replication attempt began, the original data, procedure and analysis methods were scrutinized, and the data was shared with the replicating group. Moreover, an author of the original paper was invited to review the respective proposal and have a say in whether it could be accepted. So much for the pre-study phase.

Finally, the replication studies were performed, and had their results published.


The consequences of failing to replicate a study

Now comes the problem: What if the second group failed to replicate the findings of the first group? There are different ways of looking at this from here on out. The first person such a negative outcome affects is the original study’s author, whose reputation is at stake. Given the gravity of the situation, is the original author allowed to ask for a replication of the replication?

Second, during the replication study itself (and given the eventual negative outcome), how much of a role is the original author allowed to play when performing the experiment, analyzing the results and interpreting them? This could swing both ways. If the original author is allowed to be fully involved during the analysis process, there will be a conflict of interest. If the original author is not allowed to participate in the analysis, the replicating group could get biased toward a negative outcome for various reasons.

Simone Schnall, a psychology researcher from Cambridge, writes on the SPSP blog (linked to in the tweet above) that, as an author of a paper whose results have been unsuccessfully replicated and reported in the Special Issue, she feels “like a criminal suspect who has no right to a defense and there is no way to win: The accusations that come with a “failed” replication can do great damage to my reputation, but if I challenge the findings I come across as a “sore loser.””

People on both sides of this issue recognize the importance of replication studies; there’s no debate there. But the presence of these issues calls for rules on how replication studies are designed, reviewed and published – backed by a just-as-firm support structure – or they all run the risk of becoming personalized. Forget who replicates the replicators; it could just as well become who bullies the bullies. And in the absence of such rules, replication studies are being actively disincentivized. Simone Schnall acceded to a request to replicate her study, but the fallout could set a bad example.

During her commentary, Schnall links to a short essay by Princeton University psychologist Daniel Kahneman titled ‘A New Etiquette for Replication‘. In the piece, Kahneman writes, “… tension is inevitable when the replicator does not believe the original findings and intends to show that a reported effect does not exist. The relationship between replicator and author is then, at best, politely adversarial. The relationship is also radically asymmetric: the replicator is in the offense, the author plays defense.”

In this blog post by one of the replicators, the phrase “epic fail” is an example of how things could be personalized. Note: the author of the post has struck out the words and apologized.

Various stakeholders have suggested ways to eliminate these issues by asking replicators to keep things specific. For one, replicators should address the questions and answers raised in the original study, not the author and her/his credentials. Another way is to regularly publish reports of replication results, as part of the scientific literature, instead of devoting a special issue to them.

This is one concern that Schnall raises in her answers (in response to question #13): “I doubt anybody would have widely shared the news had the replication been considered “successful.”” So there’s a need to address a bias here: are journals likelier to publish replication studies that fail to replicate previous results? Erasing this bias requires publishers to actively incentivize replication studies regardless of their outcomes.

A paper published in Perspectives on Psychological Science in 2012 paints a slightly different picture. It looks at the number of replication studies published in the field and pegs the replication rate at 1.07%. Despite the low rate, one of the paper’s conclusions was that, among all published replication studies, most reported successful, not unsuccessful, replications. It also notes that, among all replication studies published since 2000, the fraction reporting successful outcomes stands at 69.4% and the fraction reporting unsuccessful outcomes at 11.8%.

[Chart from the 2012 Perspectives on Psychological Science paper]

At the same time, Nosek and Lakens concede in this editorial that, “In the present scientific culture, novel and positive results are considered more publishable than replications and negative results.”


The ceiling effect

Schnall does raise many questions about the replication, including alleging the presence of a ceiling effect. As she describes it (in response to question #8):

“Imagine two people are speaking into a microphone and you can clearly understand and distinguish their voices. Now you crank up the volume to the maximum. All you hear is this high-pitched sound (“eeeeee”) and you can no longer tell whether the two people are saying the same thing or something different. Thus, in the presence of such a ceiling effect it would seem that both speakers were saying the same thing, namely “eeeeee”.

The same thing applies to the ceiling effect in the replication studies. Once a majority of the participants are giving extreme scores, all differences between two conditions are abolished. Thus, a ceiling effect means that all predicted differences will be wiped out: It will look like there is no difference between the two people (or the two experimental conditions).”

She states this as an important reason to get the replicators’ results replicated.
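
A toy simulation in Python (entirely my own numbers – nothing from either paper) shows the mechanism she’s describing: a genuine one-point difference between two conditions nearly vanishes once most responses pile up at the top of the scale.

```python
# Toy simulation (not from either paper): a real difference between two
# conditions shrinks drastically once responses hit the top of the scale.
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Underlying judgments sit near or above the top of a 1-9 response scale,
# and condition B is genuinely one point higher than condition A.
a = rng.normal(9.5, 1.0, n)
b = rng.normal(10.5, 1.0, n)
print("Unbounded means:       A = %.2f, B = %.2f" % (a.mean(), b.mean()))

# Force responses onto the 1-9 scale, whose ceiling most participants hit.
a_capped = np.clip(a, 1, 9)
b_capped = np.clip(b, 1, 9)
print("Means with a ceiling:  A = %.2f, B = %.2f" % (a_capped.mean(), b_capped.mean()))
print("Share at the ceiling:  A = %.0f%%, B = %.0f%%"
      % (100 * (a_capped == 9).mean(), 100 * (b_capped == 9).mean()))
# The ~1-point gap collapses to a small fraction of itself after capping,
# even though the underlying difference is real.
```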


My opinions

// That Schnall thinks the presence of a ceiling effect is a reason to have the replicators’ results replicated implies that there could be a problem with the method used to evaluate the authors’ hypothesis. Both the original and the replication studies used the same method, and the emergence of an effect in one of them but not the other implies the “fault”, if that, could lie with the replicator – for improperly performing the experiment – or with the original author – for choosing an inadequate set-up to verify the hypothesis. Therefore, one thing that Schnall felt strongly about, the scrutiny of her methods, should also have been formally outlined, i.e. a replication study is not just about the replication of results but about the replication of methods as well.

// Because both papers have passed scrutiny and have been judged worthy of publication, it makes sense to treat them as individual studies in their own right instead of one being a follow-up to the other (even though technically that’s what they are), and to consider both together instead of selecting one over the other – especially in terms of the method. This sort of debate gives room for Simone Schnall to publish an official commentary in response to the replication effort and make the process inclusive. In some sense, I think this is also the sort of debate that Ivan Oransky and Adam Marcus think scientific publishing should engender.

// Daniel Lakens explains in a comment on the SPSP blog that there was peer-review of the introduction, method, and analysis plan by the original authors and not an independent group of experts. This was termed “pre-data peer review”: a review of the methods and not the numbers. It is unclear to what extent this was sufficient, because it’s only with a scrutiny of the numbers that any ceiling effect becomes apparent. While post-publication peer-review can check for this, it’s not formalized (at least in this case) and does little to mitigate Schnall’s situation.

// Schnall’s paper was peer-reviewed. The replicators’ paper was peer-reviewed by Schnall et al. Even if both passed the same level of scrutiny, they didn’t pass the same type of it. On this basis, there might be reason for Schnall to be involved with the replication study. Ideally, however, it would have been better if the replication had been better formulated, with normal peer-review, in order to preclude the need for Schnall’s involvement. Apart from the conflict of interest that could arise, a replication study needs to be fully independent to make it credible, just like the peer-review process is trusted to be credible because it is independent. So while it is commendable that Schnall shared all the details of her study, it should have been made possible for her participation to end there.

// While I’ve disagreed with Kahneman over the previous point, I do agree with point #3 in his essay that describes the new etiquette: “The replicator is not obliged to accept the author’s suggestions [about the replicators’ M.O.], but is required to provide a full description of the final plan. The reasons for rejecting any of the author’s suggestions must be explained in detail.” [Emphasis mine]

I’m still learning about this fascinating topic, so if I’ve made mistakes in interpretations, please point them out.


Featured image: shutterstock/(c)Sunny Forest