J Am Med Inform Assoc. 2001 Mar–Apr; 8(2): 189–191.
PMCID: PMC134558
Publication Bias in Medical Informatics
Charles P. Friedman, PhD and Jeremy C. Wyatt, MD
Affiliations of the authors: Center for Biomedical Informatics, Pittsburgh, Pennsylvania (CPF); University College London, London, UK (JCW).
Correspondence and reprint requests: Charles P. Friedman, PhD, Director and Chief, Center for Biomedical Informatics, University of Pittsburgh, Forbes Tower, Suite 8084, 200 Lothrop Street, Pittsburgh, PA 15213; e-mail: cpf@cbmi.upmc.edu.
Received November 13, 2000; Accepted November 20, 2000.
 
This issue of JAMIA includes two articles submitted in response to a special call, issued on February 29, 2000, for papers reporting “null, negative, or disappointing results.” The motivation for this call was a working hypothesis that medical informatics, like other fields, is afflicted with the academic malady known as publication bias.
Publication bias exists when well-executed studies with null results (no significant differences between groups), negative results (favoring the control or placebo arm of the study), or disappointing results (effects that may be positive but of little practical significance) do not find their way into the archival literature. Publication bias therefore skews the archival literature toward work with positive findings. Negative studies have been shown to be 2.6 times less likely than positive studies to reach publication, which creates the potential to distort conclusions drawn from systematic reviews and meta-analyses.1 This phenomenon may have multiple causes—manuscripts may not be recommended for publication by reviewers and editors who see negative or null results as scientifically unimportant, and investigators may not write up such results because they are seen as unlikely to be published—or perhaps embarrassing if they are. Potential embarrassment may be a potent deterrent to informatics researchers, because many investigators also hold administrative positions that might be compromised if these individuals' own studies revealed negative or ambiguous effects of expensive technology that they themselves built or purchased.
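To make this distortion concrete, the brief simulation sketched below follows 200 hypothetical small trials of an intervention with a modest true effect and assumes, purely for illustration, that studies lacking a significant positive result are 2.6 times less likely to be published. All other parameters are invented and do not describe any actual body of literature.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical scenario: 200 small two-arm trials of an intervention with a
# modest true standardized effect of 0.2. Every number here is invented for
# illustration except the 2.6-fold publication ratio cited in the text.
n_studies, n_per_arm, true_effect = 200, 30, 0.2

effects, published = [], []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    d = treated.mean() - control.mean()            # observed effect (sd fixed at 1)
    _, p_value = stats.ttest_ind(treated, control)
    # Assumed publication model: a significant positive result reaches print
    # with probability 0.8; a null, negative, or disappointing result is
    # 2.6 times less likely to do so.
    prob_publish = 0.8 if (p_value < 0.05 and d > 0) else 0.8 / 2.6
    effects.append(d)
    published.append(rng.random() < prob_publish)

effects, published = np.array(effects), np.array(published)
print(f"True effect:                        {true_effect:+.3f}")
print(f"Mean effect over all studies:       {effects.mean():+.3f}")
print(f"Mean effect over published studies: {effects[published].mean():+.3f}")

Because the published subset over-represents significant positive findings, its pooled estimate overstates the true effect; a reviewer who reads only the archival literature would judge the intervention more beneficial than it is.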
In response to the call for manuscripts, we received six letters of intent and four completed manuscripts. After review and revision, two manuscripts addressing complementary topics were accepted and appear in this issue. The first, by Patterson and Harasym,2 examines the impact of an educational program for medical students; the second, by Rocha and colleagues,3 explores the impact of a clinical decision support system on clinicians.
As the process of soliciting, reviewing, and working with authors to revise these manuscripts unfolded, we began to understand that the seemingly simple concept of a negative study was anything but simple. It is, in fact, a concept laden with subtlety, nuance, and cultural overtones. Using the two published manuscripts as examples, we will examine some of these complexities and the important issues they raise for our field.
This exploration begins with our collective value orientation. When we in informatics embark on a study not yet undertaken, or begin reading a newly published study, our values direct us to believe that “the system will work.” We hold this belief because we see the work we do as a fundamental good. In other words, we in informatics are really part scientists, part innovators. As scientists, we try to approach our work dispassionately, believing that just as much can be learned from our results when they are negative as when they are positive. As innovators, we are ideologues who believe that, if we do everything right, our interventions should yield benefits. A well-designed system that is installed impeccably and studied flawlessly is one that, by definition, will yield a beneficial effect. To the extent that we are driven by the values of the innovator, well-executed negative studies should not exist. If a study yields null or negative results, the authors must have done something wrong in designing their information resource, in implementing it, or in conducting the study itself.
Further complicating this landscape is the prerogative of the researcher to select the hypotheses or research questions that become the foci of his or her study. It is never possible to address all questions of interest, so this selection process is key to what a study will reveal. When the study results are positive, it is possible that other research questions not explored would have generated negative results; and when results are negative, other questions not explored might have generated positive results. Moreover, once the research questions are selected for a study, researchers can choose among a wide range of methods for addressing these questions. This suggests another source of bias, apart from the review process, that will skew what is reported in the literature. This occurs as researchers, guided implicitly by their values, ask questions and select methods that will cast in a more positive light the systems they study. This practice, to the extent it occurs, has nothing whatsoever to do with scientific fraud or misconduct. It addresses prerogatives that have always been assigned to investigators and how, well within the bounds of accepted conduct, these prerogatives are exercised.
Such issues of values and prerogative give more intricate shape to the notion of a “negative study” in informatics. Our initial notion was that negative results could occur only when a well-designed system was deployed impeccably and then examined comprehensively, with results suggesting no beneficial effect. But, as discussed above, this pristine notion of a negative study runs counter to our field's ideology and how investigators, driven by this ideology, may approach their work. No manuscript meeting this strict definition is likely ever to be written or submitted. So for purposes of this issue, we adopted over time a more relaxed definition of a negative study—specifically, a study of an interesting intervention that is well enough conducted to suggest why, for reasons related to system design or implementation or study method, the intervention did not yield the expected beneficial effect. A good negative study is thus a study that contains something significant from which we all can learn. Both “negative reports” included in this issue meet this relaxed definition.
The Patterson and Harasym work describes an intervention that is certainly interesting. Their effort exposes medical students to the clinical information systems they will be using routinely in their future practice, while attempting to embed educational experiences into this exposure. We agree with the authors that there have been few, if any, such efforts documented in the archival literature. These authors did an excellent job of integrating their intervention into the students' workflow. There is ample evidence to conclude that these information resources were used by the students, yet no effects on “learning,” as the authors measured learning, were observed.
How to measure the outcome of interest is one of the many decisions that are a matter of author prerogative. In selecting the standard test used in the surgery clerkship, the authors made, in our view, an interesting but perhaps less than optimal choice. It is a conservative choice in the sense that, were statistically significant differences observed, their importance would be unquestioned. But it may not have been the best choice, because the test may not have been sensitive to the specific effects the authors' intervention engendered; a customized test tailored to the outcomes of the intervention might have served better. Were differences observed using a more focused and specific measure, they would have spawned a debate about the significance of those results, but at least there would have been differences to discuss. Interestingly, Patterson and Harasym's conservative choice of outcome measure runs counter to our speculation that informatics researchers will select measures tending to cast their interventions in a positive light. These authors did the opposite.
The paper by Rocha and colleagues also satisfies the criterion of having intrinsic interest through the central role envisioned for clinical decision support systems in reducing medical errors and unneeded variations in practice. The authors conclude their paper with multiple reasons why theirs is one of a small number of studies of clinical decision support to yield null results. The study documents, in fact, that their decision support system was not fully integrated into patient care in the clinical environments studied. It relates this experience in a way that helps us understand how difficult the integration process can be, and the care that must be taken to ensure sufficient integration to realize the effects the authors were seeking. Of almost equal interest in this paper is the authors' noble attempt to impose a rigorous experimental method on a dynamic clinical setting, which resulted in a set of matched cases perhaps too small and too narrowly focused on specific clinical problems to detect any effects that may have existed. Like the work of Patterson and Harasym, the paper by Rocha and colleagues is a negative study from which several important lessons can be learned.
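The sample-size point can be illustrated with a rough power calculation. The sketch below uses a normal approximation for comparing two proportions; the proportions and group sizes are hypothetical and are not drawn from the Rocha study, and serve only to show how quickly power erodes when the number of matched cases is small.

from math import sqrt
from scipy.stats import norm

def two_proportion_power(p1: float, p2: float, n_per_group: int,
                         alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two independent
    proportions with equal group sizes (normal approximation)."""
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf((abs(p2 - p1) - z_crit * se_null) / se_alt)

# Hypothetical numbers, not taken from the Rocha study: a decision support
# system expected to raise the rate of appropriate clinician response
# from 40% to 50%.
for n in (30, 100, 400):
    print(f"{n:4d} matched cases per arm -> "
          f"power ~ {two_proportion_power(0.40, 0.50, n):.2f}")

Under these assumed numbers, a few dozen matched cases per arm give well under a one-in-three chance of detecting a ten-percentage-point improvement, so a null result by itself says little about whether the effect exists.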
In conclusion, and with reference to the entire exercise that led to the publication of these two negative studies, we should examine the evidence this experience provides regarding the existence of a significant publication bias in medical informatics. When we issued the call for manuscripts, it was tempting to fantasize that years of pent-up demand would trigger a downpour of papers from authors who had been sitting on interesting, but negative, studies, all this time despairing of a place to publish them. The four manuscripts received were, quantitatively, more of a drizzle than the fantasized torrent. It is possible that our welcoming of negative studies and promises of appropriate review were received cynically, making researchers reluctant to write up data that would report no differences or resubmit work that had perhaps been rejected in the past. Nonetheless, the response to our solicitation seems to lessen the chance that there is a substantial hidden literature depicting a large number of instances where “the system just didn't work” or “no differences were observed.”
Beyond that, we enter the realm of speculation as to the extent of publication bias in our field. The motivations to publish and the factors that determine whether a completed work results in a manuscript are themselves complex. Concern about personal or corporate embarrassment may indeed play a major role. As medical system deployments become more extensive and expensive, the consequences attaching to lack of success become Brobdingnagian.
Moreover, publication bias does not result exclusively from failure to publish negative studies. In a complementary way, many efforts with positive results do not find their way into the literature because of distraction, overwork, lack of motivation or rewards for publication, or perhaps the desire not to reveal too much about a resource that could be commercializable. These factors also speak to our field's core values. They should be the focus of continuing discussions that go well beyond the issue of negative studies that was our original motivation.
We hope that this exercise and the resulting publication of two “negative studies” will, over time, work to reduce whatever level of publication bias exists in medical informatics. We hope we have raised the level of concern about this important scientific issue, and perhaps deepened understanding of what this elusive term means. Authors who have performed careful work should not be hesitant to report this work, even if the results were negative, null, or “disappointing.” We encourage these authors to analyze carefully the reasons these results were obtained, and to be articulate about these factors in their submitted manuscripts.
References
1. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337:867–72.
2. Patterson R, Harasym P. Educational instruction on a hospital information system for medical students during their surgical rotations. J Am Med Inform Assoc. 2001;8:111–6.
3. Rocha BHSC, Christenson JC, Evans RS, Gardner RM. Clinicians' response to computerized detection of infections. J Am Med Inform Assoc. 2001;8:117–25.