Speakers
Gregory E. Simon, MD, MPH
Kaiser Permanente Washington Health Research Institute
Susan Huang, MD, MPH
University of California Irvine
Elizabeth Turner, PhD
Duke University
Keywords
P-Values; Significance; Statistical Analysis; Pragmatic Trials; Decision-Makers
Key Points
- P-values are part of the statistical process of hypothesis and significance testing. They quantify the degree of “surprise” in a finding. By convention, the result is treated as dichotomous: a P-value of less than 0.05 is considered statistically significant, while a P-value greater than or equal to 0.05 is not.
- 0.05 is a useful but somewhat arbitrary cutoff. It was probably first described in Statistical Methods for Medical Workers by R. A. Fisher: “It is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.” According to an anecdote shared by Fisher’s daughter, he identified the cutoff as “convincing enough” based on an informal experiment with a colleague.
- Using a single threshold to determine significance can be problematic in real-world settings. Healthcare decision-makers are seeking solutions to multi-dimensional problems, and they care about subgroups. Dr. Huang illustrated this point with an overview of the ABATE Infection trial and her team’s subsequent collaboration with decision-makers.
- ABATE Infection was a pragmatic, cluster-randomized trial assessing universal decolonization in non-ICUs. While decolonization wasn’t effective for all non-ICU patients, a post-hoc analysis found that the intervention was highly effective in patients with medical devices. This finding was practically significant and was included in national guidance around decolonization.
- In a cost-effectiveness analysis of universal, targeted, or no decolonization for patients with medical devices, the ABATE team found that the optimal strategy depended on site circumstances, i.e., prevalence of device use, adherence to targeted decolonization, and financial penalties for bloodstream infection.
- For years, experts have questioned the reliance on P-values. On the other hand, there are concerns that rejecting “H1 – H0” could prove to be a slippery slope to data dredging and “post-hoc chicanery.”
- The dogma of the P-value may be more applicable to a clinical trial setting than to a pragmatic setting. Establishing the standard of care requires a high level of certainty. Scientific rigor demands rules and a threshold that isn’t affected by cost.
- In hospitals, clinical decisions are rarely based on certainty; safe interventions that are low-cost and have a possible benefit are given more consideration. Decision-makers should understand the probability of benefit at a given P-value; circumstances may warrant adoption.
- In pragmatic trials, valuable information may include the intervention effect size, the effect for various outcomes and on various subgroups, and information pertinent to implementation: fidelity, reach, cost, etc.
- Decision-making is complex and multidimensional. What is important may depend on context, audience, or other situational factors. While P-values can be useful in decision-making, they aren’t the only piece of the puzzle.
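To make the point about a single threshold concrete, here is a minimal sketch, assuming a normal approximation and an effect estimate reported with its standard error. The function name `summarize` and the two example results are hypothetical and are not taken from ABATE Infection or any other trial; they only show how two nearly identical findings can land on opposite sides of 0.05 while their effect sizes and confidence intervals tell almost the same story.

```python
from statistics import NormalDist


def summarize(estimate, std_error, alpha=0.05):
    """Two-sided P-value and confidence interval for an estimate.

    Assumes the estimate is approximately normally distributed
    (for example, a log relative risk from a large trial).
    """
    std_normal = NormalDist()
    z = estimate / std_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return {"p_value": p_value, "ci": ci, "significant": p_value < alpha}


# Hypothetical results: same effect estimate, slightly different precision.
result_a = summarize(estimate=-0.20, std_error=0.100)
result_b = summarize(estimate=-0.20, std_error=0.105)

for name, r in (("A", result_a), ("B", result_b)):
    low, high = r["ci"]
    print(f"{name}: P={r['p_value']:.3f}, 95% CI=({low:.3f}, {high:.3f}), "
          f"significant={r['significant']}")
```

Under these assumed inputs, result A has a P-value of about 0.046 and result B about 0.057, so only A is labeled significant even though the two intervals overlap almost entirely. Reporting the estimate and interval alongside the P-value keeps that information in front of the decision-maker.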
Discussion Themes
Changing the reliance on P-values would require a multi-pronged, multi-dimensional approach; sponsors, journals, and other stakeholders each uphold the use of P-values for various reasons. Perhaps the best way to start integrating this perspective shift into the clinical trials ecosystem is to hold the line, routinely seeking and providing information about a variety of outcomes and confidence levels.
If we hold that the underlying but unknown truth is fixed, then our process for arriving at conclusions regarding a treatment’s effectiveness (or whether the treatment has a favorable benefit-risk profile) inherently has important operating characteristics, such as the Type I error rate. If we move away from P-values, we will need to define a design approach that considers these operating characteristics.
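One way to examine an operating characteristic such as the Type I error rate is by simulation. The sketch below is illustrative only: it assumes a simple two-arm trial with normally distributed outcomes, a known standard deviation of 1, and a two-sided z-test, none of which comes from the session. The function name `type_one_error_rate` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def type_one_error_rate(n_per_arm=50, n_trials=2000, alpha=0.05, seed=0):
    """Estimate how often a two-sided z-test rejects when the null is true.

    Both arms are drawn from the same distribution (mean 0, SD 1), so
    every rejection is a false positive.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a difference in means with known SD of 1 per arm.
    std_error = (2 / n_per_arm) ** 0.5
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        z = (fmean(treated) - fmean(control)) / std_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


rate = type_one_error_rate()
print(f"Simulated Type I error rate: {rate:.3f}")
```

With these assumptions the simulated rate should fall close to the nominal 0.05. The same pattern could be used to check any alternative decision rule: replace the rejection condition with the proposed rule and see how often it declares a benefit when none exists.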
Maybe it’s more practical to think about homing in on a standard of care as an iterative process, in the way that human learning is an iterative process: to state that we know something to some degree of certainty, then modify, refine, and get closer to defining these truths.