Grand Rounds April 5, 2024: A New Look at P Values for Randomized Clinical Trials (Erik van Zwet, PhD)

Speaker

Erik van Zwet, PhD
Department of Biomedical Data Sciences
Leiden University Medical Center, the Netherlands

Keywords

P Values, Randomized Clinical Trials, Biostatistics

Key Points

  • The “essence” of a clinical trial is a set of 3 numbers: β, b, and s. β is the unobserved, “true” effect of the treatment; b is a normally distributed, unbiased estimate of β with standard error s. It is helpful to think of the estimate b as the true effect β plus a normally distributed error: b = β + N(0, s).
  • There are 2 more quantities to consider: the z-statistic, z = b/s, and the signal-to-noise ratio, SNR = β/s. Usually in a clinical trial we want to test the null hypothesis that the treatment has zero effect. If the absolute value of the z-statistic is greater than 1.96, then the p-value is less than 0.05. The power depends only on the SNR; for example, if SNR = 2.8 then the power is 80% (see the power sketch after this list).
  • Researchers have been studying data from the Cochrane Database of Systematic Reviews (CDSR), which include about 23,000 z-statistics for the primary efficacy outcomes of RCTs in the database.
  • From the CDSR, we can estimate the distribution of the z-statistic and also, surprisingly, of the signal-to-noise ratio. This is possible because the relationship between the two is so simple: z = SNR + N(0, 1). First estimate the distribution of z directly from the observed z-statistics, then derive the distribution of the SNR by “removing” (deconvolving) the standard normal error component.
  • Researchers can use the estimated distributions of the z-statistics and the SNRs to build a “synthetic” version of the CDSR with the same statistical properties as the real CDSR. With the synthetic database, researchers can gain insights into the real CDSR (see the deconvolution sketch after this list).
  • The first thing to look at is power. RCTs are designed to have 80% or 90% power for testing that the true effect is actually 0 against an alternative that is considered to be of clinical interest, plausible, or both. But the SNR is larger than 2.8 (the value needed for 80% power) in only 12% of the CDSR trials.
  • Many trials have low power against the true effect. This has 2 consequences: if p is greater than 0.05, you might be discarding a useful treatment because you did not collect enough information to show that it works; if p is less than 0.05, you got very lucky, the estimate likely overstates the true effect, and replication attempts will likely fail. A potential solution is to use a shrinkage estimator (see the shrinkage sketch after this list).
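
The power figure mentioned above can be checked with a short calculation. The sketch below is only an illustration of the normal model summarized in the key points (b = β + N(0, s), z = b/s, SNR = β/s), not code from the talk; under that model the power of a two-sided 5%-level test depends only on the SNR.

```python
# Illustrative sketch of the normal model from the key points (not the speaker's code).
from scipy.stats import norm

def power_from_snr(snr, alpha=0.05):
    """Probability that |z| exceeds the critical value when z ~ N(SNR, 1),
    i.e., the power of a two-sided test of 'true effect = 0' at level alpha."""
    zcrit = norm.ppf(1 - alpha / 2)              # 1.96 when alpha = 0.05
    return norm.sf(zcrit - snr) + norm.cdf(-zcrit - snr)

print(round(power_from_snr(2.8), 2))  # ~0.80, matching the "SNR = 2.8 gives 80% power" example
print(round(power_from_snr(0.0), 2))  # 0.05: under the null, power equals the significance level
```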
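
The “removing the error” step and the synthetic database can also be sketched. The code below is a hedged illustration of the deconvolution idea only: it assumes, purely for illustration, that the observed z-statistics are modeled as a mixture of normals (this modeling choice and the helper names are my assumptions, not necessarily the authors' exact procedure). Because z = SNR + N(0, 1) with independent noise, the SNR distribution then follows by subtracting 1 from each fitted component's variance, and synthetic trials are generated by drawing SNRs and adding fresh standard normal noise.

```python
# Hedged sketch: normal-mixture deconvolution of z-statistics and a synthetic database.
# The mixture model and function names are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_snr_mixture(z_values, n_components=3):
    """Fit a normal mixture to observed z-statistics and convert it to an SNR mixture."""
    z = np.asarray(z_values, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=n_components, random_state=0).fit(z)
    weights = gm.weights_
    means = gm.means_.ravel()
    # "Remove" the standard normal error: var(SNR) = var(z) - 1 for each component.
    snr_vars = np.clip(gm.covariances_.ravel() - 1.0, 1e-6, None)
    return weights, means, snr_vars

def sample_synthetic_trials(weights, means, snr_vars, n_trials, seed=0):
    """Draw SNRs from the fitted mixture, then add N(0, 1) noise to get synthetic z's."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(weights), size=n_trials, p=weights)
    snr = rng.normal(means[comp], np.sqrt(snr_vars[comp]))
    z = snr + rng.normal(0.0, 1.0, size=n_trials)  # synthetic "observed" z-statistics
    return snr, z
```

Because both the true SNR and the observed z are available for every synthetic trial, quantities that are invisible in the real data, such as the power against the true effect or the amount of overestimation among significant results, can be read off directly.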
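
Finally, the shrinkage idea can be illustrated with a toy calculation. The sketch below uses a single normal prior for the SNR with an arbitrary illustrative spread (my simplification; the talk relies on a distribution estimated from the CDSR, not this prior) to show how a shrunken estimate pulls a barely significant result back toward zero.

```python
# Toy shrinkage estimate under an assumed N(0, 1.3**2) prior for the SNR (illustrative only).
def shrink_estimate(b, s, prior_mean_snr=0.0, prior_sd_snr=1.3):
    """Posterior mean of the true effect given estimate b with standard error s,
    when SNR = beta/s has a normal prior."""
    z = b / s
    w = prior_sd_snr**2 / (prior_sd_snr**2 + 1.0)  # weight on the data vs. the prior
    snr_post = w * z + (1.0 - w) * prior_mean_snr
    return s * snr_post

# Example: a barely significant result (b = 2.0, s = 1.0, so z = 2.0) is shrunk
# toward zero, reflecting the likely overestimation described in the key points.
print(shrink_estimate(b=2.0, s=1.0))  # about 1.26, noticeably smaller than the raw 2.0
```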

Learn More

Read more in NEJM Evidence.

Discussion Themes

Overestimation was found across medical disciplines. There are a lot of small, low-power trials; these should still be published and included in meta-analyses. There is a lot of financial pressure to design a trial that is not too large, not too small, and has a good chance of success.

Do you have any thoughts on how this applies to non-inferiority trials? Did you exclude those from these analyses? We did not exclude them. There is a small minority of trials that we know were non-inferiority trials; maybe they would be sized differently.

Tags

#pctGR, @Collaboratory1