BMJ. 1999 October 16; 319(7216): 1070.
PMCID: PMC1116858
Influence of data display formats on decisions to stop clinical trials
Paper is misleading, like a sheep dressed in a wolf’s clothing
James M Walker, senior clinical information architect
Penn State College of Medicine, PO Box 850 (H-136), Hershey, PA 17033, USA Email: jmwalker@psghs.edu
 
Editor—The abstract of Elting et al’s paper gave the impression that icon displays resulted in significantly more correct decisions than did tables (P=0.03).1 In fact, the P value of 0.03 applies only to the comparison between icon displays and bar graphs or pie charts. The comparison between icon displays and tables was not significant (P=0.17; p 1529).

The study showed no significant difference between icon displays and tables for time to make the decision (P=0.81) or for the quality of the decision. In view of this, the abstract and discussion are deceptive.

Authors’ reply

Linda S Elting, Charles G Martin, Scott B Cantor, Edward B Rubenstein (Department of Medical Specialties, University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Box 40, Houston, TX 77030-4095, USA. Email: lelting@notes.mdacc.tmc.edu).

Editor—The statement in the abstract regarding the superiority of the icon display is correct, as is the P value ascribed to the comparison. This reflects the overall comparison among the four displays, using Cochran’s Q test of the repeated measures of correct decisions. This is reported in the abstract and the results section because it was the planned analysis of our primary hypothesis. Exploratory, pairwise analyses were also reported. Coincidentally, the P values for the McNemar tests of the difference between icon displays and the bar charts or pie graphs were also 0.03; the P value for the pairwise comparison between the icon and table displays was 0.17.
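[A sketch of the planned analysis the authors describe: Cochran’s Q test on repeated binary (correct/incorrect) decisions across the four display formats. The tiny data matrix below is invented purely to show the arithmetic; it is not the study’s data.]

```python
# Cochran's Q test for repeated binary outcomes, using only the
# standard library. Rows are raters (physicians), columns are the
# k display formats; each cell is 1 for a correct decision, 0 otherwise.
# The data below are invented for illustration -- not the study's data.

def cochran_q(data):
    """Return Cochran's Q statistic (df = k - 1) for equal-length 0/1 rows."""
    k = len(data[0])                      # number of conditions (displays)
    col_totals = [sum(row[j] for row in data) for j in range(k)]
    row_totals = [sum(row) for row in data]
    grand = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand * grand)
    denominator = k * grand - sum(r * r for r in row_totals)
    return numerator / denominator

# Four raters x four hypothetical displays (icon, table, bar, pie).
decisions = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]

q = cochran_q(decisions)
# Compare q against the chi-squared critical value with df = k - 1 = 3
# (7.815 at alpha = 0.05) to judge significance.
print(round(q, 3))  # -> 6.857
```

[With q below the critical value, this toy matrix would not reject the null hypothesis; the study’s 136 observations gave the overall P=0.03 reported above.]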

The interpretation of the overall, repeated measures test of the primary hypothesis is straightforward. The accuracy rate with icons was superior to that with the other display methods, and the observed difference was unlikely to have occurred by chance. As is commonly the case with exploratory analyses, however, interpretation of the pairwise comparisons is an exercise in explaining the results of underpowered tests. The study was not powered to test these hypotheses, and the P value of 0.17 for the difference between the icon displays (82%) and table displays (68%) reflects the small sample size. (The overall test included 136 observations, and the pairwise tests included only 68.)
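[The pairwise comparison can be made concrete with an exact McNemar test, which depends only on the discordant pairs (observations correct with one display but not the other). The discordant counts below are hypothetical; the paper reports only the accuracy rates (82% v 68% among 68 observations), so this sketch shows the mechanics of the test, not a reconstruction of the published P=0.17.]

```python
# Exact (binomial) McNemar test using only the standard library.
# b = observations correct with icons but not tables; c = the reverse.
# These counts are hypothetical -- chosen only so b - c matches a
# 10-decision gap like 82% v 68% of 68; the study's discordant
# counts were not published.
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the discordant pair counts."""
    n = b + c
    m = max(b, c)
    # Under the null hypothesis the n discordant pairs split 50:50,
    # so the tail probability is binomial with p = 0.5.
    tail = sum(comb(n, k) for k in range(m, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = mcnemar_exact(26, 16)   # hypothetical discordant counts
print(round(p, 3))
```

[The same accuracy gap spread over more discordant pairs, or a larger sample, shrinks the P value, which is the authors’ point about power.]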

One could argue that a difference of 82% versus 68% and a P value <0.20 in a small sample warrant further study and that in a larger sample this difference would be significant. As the P value suggests, however, one could argue equally strongly that there is a 17% probability that the observed difference occurred by chance alone.

Unfortunately, no matter how one explains the results of underpowered tests, in statistics, as in sheep herding, it’s the size of the flock that counts. The prudent reader will cry wolf only in cases justified by adequate sample sizes. Accordingly, we remind readers to interpret the pairwise, exploratory analyses in our study with caution because of their low power. Although underpowered for hypothesis testing, the pairwise comparisons are useful for hypothesis generation. As we have stated, accuracy rates with table displays were intermediate. Future studies should examine this issue.

References
1. Elting LS, Martin CG, Cantor SB, Rubenstein EB. Influence of data display formats on physician investigators’ decisions to stop clinical trials: prospective trial with repeated measures. BMJ 1999;318:1527-1531. (5 June.)