Grand Rounds September 26, 2025: Significance in ePCTs: P Values vs Decision-Maker Perspectives (Gregory E. Simon, MD, MPH; Susan Huang, MD, MPH; Elizabeth Turner, PhD)

Speakers

Gregory E. Simon, MD, MPH
Kaiser Permanente Washington Health Research Institute

Susan Huang, MD, MPH
University of California Irvine

Elizabeth Turner, PhD
Duke University

Keywords

P-Values; Significance; Statistical Analysis; Pragmatic Trials; Decision-Makers

Key Points

  • P-values are part of the statistical process of hypothesis and significance testing. They quantify the degree of “surprise” in a finding. In practice the result is treated as dichotomous: a P-value of less than 0.05 is considered statistically significant, while a P-value greater than or equal to 0.05 is not.
  • 0.05 is a useful but somewhat arbitrary cutoff. It was probably first described in Statistical Methods for Research Workers by R. A. Fisher: “It is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.” According to an anecdote shared by Fisher’s daughter, he identified the cutoff as “convincing enough” based on an informal experiment with a colleague.
  • Using a single threshold to determine significance can be problematic in real-world settings. Healthcare decision-makers are seeking solutions to multidimensional problems, and they care about subgroups. Dr. Huang illustrated this point with an overview of the ABATE Infection trial and her team’s subsequent collaboration with decision-makers.
  • ABATE Infection was a pragmatic, cluster-randomized trial assessing universal decolonization in non-ICUs. While decolonization wasn’t effective for all non-ICU patients, a post-hoc analysis found that the intervention was highly effective in patients with medical devices. This finding was practically significant and was included in national guidance around decolonization.
  • In a cost-effectiveness analysis of universal, targeted, or no decolonization for patients with medical devices, the ABATE team found that the optimal outcome was dependent on site circumstances, i.e. prevalence of device use, adherence to targeted decolonization, and financial penalties for bloodstream infection.
  • For years, experts have questioned the reliance on P-values. On the other hand, there are concerns that rejecting “H1 – H0” could prove to be a slippery slope to data dredging and “post-hoc chicanery.”
  • The dogma of the P-value may be more applicable to a clinical trial setting than to a pragmatic setting. Establishing the standard of care requires a high level of certainty. Scientific rigor demands rules and a threshold that isn’t affected by cost.
  • In hospitals, clinical decisions are rarely based on certainty; safe interventions that are low-cost and have a possible benefit are given more consideration. Decision-makers should understand the probability of benefit at a given P-value; circumstances may warrant adoption.
  • In pragmatic trials, valuable information may include the intervention effect size, the effect for various outcomes and on various subgroups, and information pertinent to implementation: fidelity, reach, cost, etc.
  • Decision-making is complex and multidimensional. What is important may depend on context, audience, or other situational factors. While P-values can be useful in decision-making, they aren’t the only piece of the puzzle.
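
To illustrate the points above, the following minimal Python sketch reports an effect size and a confidence interval alongside the P-value for a comparison of two proportions. The counts are hypothetical and are not data from ABATE Infection or any other trial. The function name `risk_difference_summary` is an illustrative assumption, and the normal-approximation calculation ignores clustering, so it would not be an appropriate analysis for a cluster-randomized trial.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_summary(events_a, n_a, events_b, n_b, level=0.95):
    """Return (risk difference a - b, Wald confidence interval, two-sided P-value).

    Uses a simple normal approximation for two independent proportions.
    Clustering is NOT accounted for; this is an illustration only.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b

    # Unpooled standard error, used for the confidence interval.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Pooled standard error, used for the test of "no difference".
    pooled = (events_a + events_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, p_value


# Hypothetical counts: 40 infections among 2000 intervention patients
# versus 55 among 2000 usual-care patients.
diff, ci, p_value = risk_difference_summary(40, 2000, 55, 2000)
print(f"risk difference: {diff:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"two-sided P-value: {p_value:.3f}")
```

With these hypothetical counts the P-value is above 0.05, yet most of the confidence interval lies on the side of benefit. A decision-maker weighing a safe, low-cost intervention may read that differently than a reader who looks only at the threshold.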

Discussion Themes

Changing the reliance on P-values would require a multi-pronged, multi-dimensional approach; sponsors, journals, and other stakeholders each uphold the use of P-values for various reasons. Perhaps the best way to start integrating this perspective shift into the clinical trials ecosystem is to hold the line, routinely seeking and providing information about a variety of outcomes and confidence levels.

If we hold that the underlying but unknown truth is fixed, then our process for arriving at conclusions regarding a treatment’s effectiveness (or whether the treatment has a favorable benefit-risk profile) inherently has important operating characteristics, such as the Type I error rate. If we move away from P-values, we will need to define a design approach that considers these operating characteristics.
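
The idea of an operating characteristic can be made concrete with a small simulation. The Python sketch below repeatedly simulates a two-arm trial in which the treatment truly has no effect and counts how often a two-sided test at a given alpha declares significance. All settings (number of simulated trials, arm size, unit variance, the function name `simulate_type_i_error`) are illustrative assumptions, not a design used by any NIH Collaboratory Trial.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_type_i_error(n_trials=2000, n_per_arm=100, alpha=0.05, seed=1):
    """Fraction of simulated null trials that a two-sided z-test calls significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        # Both arms are drawn from the same distribution,
        # so the null hypothesis is true by construction.
        arm_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        # Variance is known to be 1, so the standard error of the
        # difference in means is sqrt(2 / n_per_arm).
        z = (mean(arm_a) - mean(arm_b)) / sqrt(2.0 / n_per_arm)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


print(f"estimated Type I error rate: {simulate_type_i_error():.3f}")
```

If the decision rule behaves as intended, the estimated rate should land close to the chosen alpha. The same kind of simulation could, in principle, be used to check the operating characteristics of a decision rule that does not rely on a P-value threshold.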

Maybe it’s more practical to think about homing in on a standard of care as an iterative process, in the way that human learning is an iterative process: we state that we know something to some degree of certainty, then modify, refine, and get closer to defining these truths.

September 24, 2025: P Values vs Decision-Maker Perspectives, in This Week’s Rethinking Clinical Trials Grand Rounds

In a special session of Rethinking Clinical Trials Grand Rounds on September 26, longtime leaders from the NIH Pragmatic Trials Collaboratory will present “Significance in Pragmatic Clinical Trials: P Values vs Decision-Maker Perspectives.”

The Grand Rounds session will be held on Friday, September 26, 2025, at 1:00 pm eastern.

Greg Simon is a senior investigator at the Kaiser Permanente Washington Health Research Institute, a member of the NIH Collaboratory leadership team, and a cochair of the NIH Collaboratory’s Health Care Systems Interactions Core. Susan Huang is a Chancellor’s Professor in the Division of Infectious Diseases at the UC Irvine School of Medicine and the medical director of epidemiology and infection prevention at UCI Health. She was the principal investigator for ABATE Infection, an NIH Collaboratory Trial. Liz Turner is an associate professor of biostatistics and bioinformatics and global health at Duke University and a cochair of the NIH Collaboratory’s Biostatistics and Study Design Core.

Join the online meeting.

July 10, 2025: Researchers Consider the P Value’s Usefulness in Healthcare Systems Research

The P value is a statistic frequently used in biomedical research for the presentation of study findings. It is commonly used to make a dichotomous decision about whether a finding is “statistically significant” based on a predetermined level, typically P < .05.

Although the peer-reviewed journals in which researchers aspire to publish their work are anchored to P values, the information used to drive decisions in healthcare is not. At the NIH Pragmatic Trials Collaboratory’s 2025 Annual Steering Committee Meeting, a panel led by Greg Simon, leader of the Health Care Systems Interactions Core, discussed P values versus decision-maker perspectives.

Communities, partners, and healthcare systems leaders make decisions based on many, multidimensional factors.

“We care about health outcomes, but we also care about cost and the satisfaction of members, patients, and employees. Any attempt to roll those up into one statistic is really problematic,” Simon said.

Key Takeaways

  • Where possible, measure and report on what is meaningful to partners, including effect sizes, confidence intervals, cost, and patient and employee satisfaction.
  • Recognize that P values are a useful metric, but they are only one piece of a larger toolbox.
  • Understand that what is important depends on context, the audience, and local and national priorities.

The panelists included Corita Grudzen, co–principal investigator for the PRIM-ER trial; Rich Platt, colead of the NIH Collaboratory’s Distributed Research Network; and Liz Turner, colead of the Biostatistics and Study Design Core.

This summer, we are sharing highlights from the 2025 Annual Steering Committee Meeting. Access the complete collection of meeting materials.

October 15, 2024: Case Study Describes a Reassessment of Sample Size in an Ongoing Cluster Randomized Trial

A new case study from the NIH Pragmatic Trials Collaboratory highlights an interim reassessment of sample size during an ongoing cluster randomized trial. The case study was published this week in the Living Textbook of Pragmatic Clinical Trials.

Researchers in cluster randomized trials must account for potential correlation among participants within the same cluster in the design and analysis of their trial by estimating the intraclass correlation when calculating the target sample size. Often they use preliminary data from the planned enrollment sites to estimate the correlation. However, when preliminary data are unavailable at the time of study design, they may use interim data collected during the trial itself to reassess the trial’s sample size.
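
As a rough illustration of why the intraclass correlation (ICC) matters for sample size, the Python sketch below applies the standard design effect for equal cluster sizes, 1 + (m − 1) × ICC, where m is the number of participants per cluster. The numbers and function names are hypothetical, chosen only for illustration; they are not the values or the method used in FM-TIPS.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Standard design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_individual, cluster_size, icc):
    """Total participants needed once clustering is taken into account.

    n_individual is the sample size an individually randomized trial
    would need. The product is rounded to 9 decimals before taking the
    ceiling to guard against floating-point noise.
    """
    return ceil(round(n_individual * design_effect(cluster_size, icc), 9))


# Hypothetical planning values: 400 participants would suffice without
# clustering, clusters enroll 20 participants each, and the ICC assumed
# at the design stage is 0.02.
planned = inflated_sample_size(400, 20, 0.02)

# Hypothetical interim estimate of the ICC that is larger than assumed.
reassessed = inflated_sample_size(400, 20, 0.05)

print(f"planned target: {planned}")
print(f"reassessed target: {reassessed}")
```

Even a modest change in the estimated ICC can shift the target noticeably, which is one reason an interim reassessment may be worth planning for when preliminary data are unavailable.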

The contributors of the case study focus on FM-TIPS, an NIH Collaboratory Trial, to describe an approach to conducting an interim reassessment of sample size in an ongoing trial. Read the full case study.

FM-TIPS is examining whether the addition of transcutaneous electrical nerve stimulation to routine physical therapy improves movement-evoked pain compared with physical therapy alone among patients with fibromyalgia. The trial is supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases through the NIH HEAL Initiative. Learn more about FM-TIPS.

The contributors of the case study include members of the FM-TIPS study team and leaders of the NIH Collaboratory’s Biostatistics and Study Design Core. David-Erick Lafontant is a statistician, Bridget Zimmerman is a clinical professor of biostatistics, and Emine Bayman is an associate professor of biostatistics—all at the University of Iowa. Megan McCabe is an assistant professor of biostatistics at the University of Alabama at Birmingham. Patrick Heagerty is a professor of biostatistics at the University of Washington. Liz Turner is an associate professor of biostatistics and bioinformatics at Duke University.

August 19, 2024: Ethics Core and Biostatistics Core Guide Newest Trials Through Planning Phase

Leaders of 2 of the NIH Pragmatic Trials Collaboratory’s long-standing Core Working Groups recently shared updates from their work with the newest cohort of NIH Collaboratory Trials. We spoke with them during the NIH Collaboratory’s 2024 Annual Steering Committee Meeting in May.

Over the last year, the Ethics and Regulatory Core engaged in a formal onboarding process with the program’s 9 newest pragmatic trials, consulting with the investigators about their trial planning and implementation. Cochairs Stephanie Morain and Pearl O’Rourke summarized several of the ongoing and emerging challenges.

“One of the challenges we’re continuing to see is understanding what are the appropriate duties that institutions and investigators have in the context of a [pragmatic clinical trial],” said Morain. “One concrete area is in data and safety monitoring. What kinds of issues need to be monitored as adverse events? How do we think about them as being related to the trial vs relating to the background care?” she added.

Onboarding documentation from the Ethics and Regulatory Core’s consultations with the NIH Collaboratory Trials is available on our Data and Resource Sharing page.

We also spoke with Liz Turner and Patrick Heagerty, cochairs of the Biostatistics and Study Design Core. They have spent the past year advising the NIH Collaboratory Trial investigators on key study design challenges.

“Many of these studies have individually randomized patients but then they’re studying implementation pathways when they implement through a specific person that puts them in groups—these are individually randomized group treatment trials,” said Heagerty. “Several of the studies didn’t see that, and so we helped them see it and we helped them work through how to adapt their analysis and modify their sample size work to ensure the trial was properly sized,” he explained.

In addition to consultations with the NIH Collaboratory Trials, the Biostatistics and Study Design Core continues to develop and innovate pragmatic trials methodology.

Learn more about the Core Working Groups.

August 16, 2022: Biostatistics Core Develops Tools and Strategies for Common Research Challenges

In an interview at the NIH Pragmatic Trials Collaboratory’s annual Steering Committee meeting and 10th anniversary celebration, we asked Dr. Liz Turner and Dr. Patrick Heagerty to reflect on the role of the Biostatistics and Study Design Core Working Group in helping the NIH Collaboratory Trial teams design their trials and analyze the data, and to discuss their focus for the Core’s future contributions to pragmatic clinical trials.

Based on your experience working with the NIH Collaboratory Trials, what are some of the common challenges of the Core?

Given the pragmatic nature of the NIH Collaboratory Trials, most use a design that involves some kind of clustering of outcomes. This could be a cluster randomized design or an individually randomized group treatment trial. As a consequence, nearly all projects face the challenge of how to account for clustering in both the design and analysis of the trial.

For the NIH Collaboratory Trials that use a cluster randomized design, one of the most common challenges is deciding between a stepped-wedge design and a standard parallel-arm design. The Core’s recommendation is clear: only use a stepped-wedge design if you have to! Likewise, only use a cluster randomized design if you have to and, if possible, use an individually randomized design. Nevertheless, a cluster randomized design is often the design of choice to address a pragmatic research question, and a stepped-wedge cluster randomized design may be the only way to perform a randomized evaluation of an intervention (for example, when all centers wish to receive the intervention in order to agree to participate in the trial).

From an analysis perspective, common challenges involve how to handle missing outcome data and how to handle longitudinal (that is, repeated) measures data. For both design and analysis, as you can imagine, the COVID-19 pandemic has posed huge challenges, including how to handle the disruption of an ongoing stepped-wedge trial (as in the GGC4H NIH Collaboratory Trial). In short, clustering of outcomes is the biggest theme (and challenge) across the NIH Collaboratory Trials.

What strategies have NIH Collaboratory Trials used to overcome these barriers?

A common strategy used by the NIH Collaboratory Trials to overcome these barriers has been to leverage what we call the “Core group process.” This dynamic process is driven by the NIH Collaboratory Trials and supported by the Core, together with NIH Collaboratory leadership. The process is centered around the monthly Core meeting to which all NIH Collaboratory Trial teams are invited and that involves all Core members. These meetings provide dedicated time for each study team to provide project updates and elicit feedback from the Core and the other NIH Collaboratory Trial teams. In particular, all the study teams are invited to present at least once during the UG3 planning phase and on multiple occasions during the UH3 implementation phase. Core members are also available for ad hoc, smaller group meetings, as requested. What this process allows is for the NIH Collaboratory Trials to present challenges and for us to jointly identify solutions.

How are the NIH Collaboratory Trials’ experiences with the Core helping the field of pragmatic research?

Through the challenges and ideas that have been brought to the Core, the NIH Collaboratory Trials have pushed the field of pragmatic research. In particular, through the Core group process, they have pushed the Core to solve methodological challenges and provide tools to tackle the design issues that arise in the changing research landscape.

A key example of the Core’s methodological work was inspired by the STOP CRC NIH Collaboratory Trial and is related to the design and analysis choices faced in the unique context of embedded pragmatic trials. This example addresses a common challenge in embedded pragmatic trials, namely how to handle varying cluster sizes, something that arises in so many of the NIH Collaboratory Trials. The research, recently published in Contemporary Clinical Trials, highlights that a seemingly natural analysis in this context may produce a biased inference about intervention effectiveness, which is clearly problematic.
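
The source does not describe which analysis was found to be biased, so the following Python sketch is only a general illustration of one way varying cluster sizes can matter: when cluster-level effects differ by cluster size, an average that weights every participant equally and an average that weights every cluster equally answer different questions. The cluster sizes and effects are hypothetical, and the function names are illustrative assumptions rather than the published method.

```python
def participant_average(cluster_sizes, cluster_effects):
    """Average effect with every participant weighted equally."""
    total = sum(cluster_sizes)
    weighted = sum(n * e for n, e in zip(cluster_sizes, cluster_effects))
    return weighted / total


def cluster_average(cluster_effects):
    """Average effect with every cluster weighted equally."""
    return sum(cluster_effects) / len(cluster_effects)


# Hypothetical clinics: two small clinics with larger effects and
# one large clinic with a smaller effect (absolute risk reductions).
sizes = [50, 100, 850]
effects = [0.10, 0.08, 0.02]

print(f"participant-average effect: {participant_average(sizes, effects):.3f}")
print(f"cluster-average effect: {cluster_average(effects):.3f}")
```

When the two summaries differ this much, the choice of target quantity and of analysis weights deserves an explicit decision at the design stage.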

The second example is the Core’s recently published Statistical Analysis Plan Checklist for Addressing COVID-19 Impacts. Development of this tool was inspired by the many challenges faced by the NIH Collaboratory Trials as a result of the COVID-19 pandemic, such as delayed recruitment (as in the BackInAction NIH Collaboratory Trial) and adjustments to how interventions were delivered (as in the ACP PEACE NIH Collaboratory Trial).

What do you think the Core can contribute over the next decade?

The Core has a lot to contribute over the next decade. A key goal is to ensure we are building and diversifying the next generation of statisticians who are experts in pragmatic trials and who can engage deeply in the design and analysis of pragmatic trials embedded in healthcare systems.

To achieve this, we need to continue to bring trainees into the Core, as we have done over the past 6 years, through funded graduate research assistant positions. By doing this, we should be able to not only build the next generation of pragmatic trial experts but also build scholarship in pragmatic trial methodology by identifying methodological gaps that need to be filled so that the NIH Collaboratory Trials study teams, and pragmatic trialists in the broader research community, have the best methods available to them.

The opportunity to participate in a cross-institution working group such as ours is surprisingly rare. As a consequence, we are in a unique position not only to build the next generation of experts but also to strengthen our own collective expertise and knowledge by learning from each other’s perspectives.

December 16, 2021: NIH Collaboratory Publishes COVID-19 Checklist for Statistical Analysis Plans in Pragmatic Trials

A new tool from the NIH Collaboratory assists investigators in identifying impacts of the COVID-19 public health emergency on ongoing pragmatic clinical trials. The Statistical Analysis Plan Checklist for Addressing COVID-19 Impacts summarizes impacts on trial conduct that study teams should document, measure, analyze, and report.

The new checklist was developed by the NIH Collaboratory’s Biostatistics and Study Design Core Working Group. Since the beginning of the COVID-19 pandemic, many of the NIH Collaboratory Trials have had to postpone recruitment, alter methods of participant engagement, and modify tools for research assessment and intervention delivery.

The leaders of the Biostatistics Core, Dr. Patrick Heagerty and Dr. Liz Turner, spoke in a recent interview about the impacts of the pandemic on the NIH Collaboratory Trials. Early next year, the Coordinating Center will report the results of a survey of the study teams about their experiences with these impacts.

Download the Statistical Analysis Plan Checklist for Addressing COVID-19 Impacts.

August 19, 2021: Biostatistics Core Helps Projects ‘Roll With the Punches’ of the Pandemic

Leaders of the NIH Collaboratory’s Biostatistics and Study Design Core Working Group spoke in a recent interview about the impacts of the COVID-19 pandemic on the NIH Collaboratory Trials, including the 2 newest projects, BeatPain Utah and GRACE.

“BeatPain Utah and GRACE are fascinating studies, as all our NIH Collaboratory Trials are, and are giving us lots of food for thought at the Biostatistics Core,” said Dr. Liz Turner, associate professor of biostatistics and bioinformatics at Duke University and a cochair of the Core. View the full video.

The 2 studies “have been pretty well positioned to roll with some of the distancing required or the lack of in-person visits,” said Dr. Patrick Heagerty, professor of biostatistics at the University of Washington and the other cochair of the Core. “The BeatPain project had a remote delivery from the beginning, so I think the impact of COVID was not as dramatic as it’s been for other projects. But GRACE, where acupuncture is part of it, they have to figure out what are the elements of the research protocol they can do remotely but still need to get folks in person to do that acupuncture,” Heagerty said.

“There really have been some considerable challenges for several of the other NIH Collaboratory Trials,” said Turner. “Good examples of these challenges are those faced by 2 stepped-wedge cluster randomized trials, ACP PEACE and PRIM-ER. …They had to really restructure the design and respond very quickly to what was happening in practice out in the field. Interestingly, on the flip side, the disruptions last spring in 2020 did provide opportunities to address other research questions and perhaps generate other interesting evidence,” Turner said.

(Learn more about the ACP PEACE study’s COVID-19 supplement: “Can a Primary Care Telehealth Intervention Change the Paradigm for Advance Care Planning?”)

Heagerty and Turner also described ongoing projects of the Core to support pragmatic research, including guidance on longitudinal analysis in randomized trials, considerations for studies with multiple outcomes, and handling of studies with variable cluster sizes. Learn more about the Biostatistics and Study Design Core.

 
