Experimental Designs and Randomization Schemes
Section 5
Stepped-Wedge Designs
In cluster randomized trials (CRTs), the simplest approach to randomization is parallel randomization, in which each cluster is randomly assigned to either the intervention condition or the control condition and remains in that condition throughout the trial, with no crossover. An alternative approach is stepped-wedge randomization (Hussey and Hughes 2007). In a stepped-wedge design, all clusters start the trial in the control condition and are randomized into several groups, or waves, that determine when each cluster begins the intervention. Groups of clusters cross over to the intervention condition on a staggered schedule, and all groups receive the intervention before the end of the trial (see Figure).
Figure. Stepped-Wedge Cluster Randomization
Abbreviations: UC, usual care; Int, intervention. Adapted with permission from Cook et al (2016).
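The staggered schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the cited work: the function name `stepped_wedge_schedule`, the hypothetical clinic labels, and the layout (equal-sized waves, one baseline period in which every cluster is in usual care, and one additional period per wave) are all assumptions.

```python
import random


def stepped_wedge_schedule(cluster_ids, n_waves, seed=None):
    """Randomly assign clusters to waves; return {cluster: [0/1 per period]}.

    Assumed layout: n_waves + 1 periods, every cluster in usual care (0) in
    period 0, and wave w (0-based) on intervention (1) from period w + 1 onward.
    """
    if n_waves < 1 or len(cluster_ids) % n_waves != 0:
        raise ValueError("this sketch assumes equal-sized waves")
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)                      # the randomization step
    per_wave = len(shuffled) // n_waves
    n_periods = n_waves + 1
    schedule = {}
    for position, cluster in enumerate(shuffled):
        crossover = position // per_wave + 1   # first period on intervention
        schedule[cluster] = [1 if t >= crossover else 0 for t in range(n_periods)]
    return schedule


if __name__ == "__main__":
    clinics = [f"clinic_{k}" for k in range(1, 11)]   # hypothetical cluster labels
    sched = stepped_wedge_schedule(clinics, n_waves=5, seed=2024)
    # Print clusters in wave order, using the figure's UC/Int labels.
    for cluster, row in sorted(sched.items(), key=lambda kv: kv[1].index(1)):
        print(f"{cluster:>9}", " ".join("Int" if x else "UC " for x in row))
```

Shuffling the cluster list and then slicing it into consecutive blocks is one simple way to randomize clusters to waves; real trials often add stratification or constraints on top of this.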
Advantages of Stepped-Wedge Designs
A common justification for stepped-wedge designs is feasibility; in some studies, it is impractical or impossible to roll out the intervention to all participants at the same time (Federico et al 2022). Stepped-wedge randomization overcomes this problem by introducing the intervention to groups of clusters at different times. Moreover, because all sites in a stepped-wedge trial will eventually receive the intervention, these designs may be more appealing to the broader community and thus lead to greater study participation, especially in trials of interventions that seem particularly promising (Hussey and Hughes 2007).
In addition, stepped-wedge CRTs can accommodate more between-cluster heterogeneity than CRTs with parallel randomization, because the treatment effect can be partially estimated from before-and-after comparisons in which each cluster serves as its own control. As a result, stepped-wedge designs often require fewer clusters than parallel designs to achieve the same statistical power, particularly when between-cluster variability is large.
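To make the efficiency argument concrete, the sketch below evaluates the closed-form variance of the treatment effect estimator from Hussey and Hughes (2007). That formula assumes a cross-sectional design, fixed period effects, a single random cluster intercept, and a constant treatment effect. The sketch compares a stepped-wedge layout with a parallel layout that uses the same clusters and the same number of periods. The helper names, the small 4-cluster example, and the normal-approximation power function are illustrative assumptions, not the software cited in this section.

```python
from statistics import NormalDist


def stepped_wedge(n_clusters, n_waves):
    # Equal-sized waves; wave w (0-based) crosses over at period w + 1.
    per_wave = n_clusters // n_waves
    return [[1 if t > i // per_wave else 0 for t in range(n_waves + 1)]
            for i in range(n_clusters)]


def parallel(n_clusters, n_periods):
    # First half of the clusters on intervention in every period, second half never.
    return [[1 if i < n_clusters // 2 else 0] * n_periods for i in range(n_clusters)]


def hussey_hughes_variance(X, sigma2, tau2):
    """Variance of the treatment effect estimator (Hussey and Hughes 2007).

    X: one row per cluster of 0/1 intervention indicators by period.
    sigma2: variance of a cluster-period mean (individual variance / cluster-period size).
    tau2: between-cluster variance of the random cluster intercept.
    """
    I, T = len(X), len(X[0])
    U = sum(sum(row) for row in X)                              # treated cluster-periods
    W = sum(sum(row[t] for row in X) ** 2 for t in range(T))    # squared period totals
    V = sum(sum(row) ** 2 for row in X)                         # squared cluster totals
    num = I * sigma2 * (sigma2 + T * tau2)
    den = (I * U - W) * sigma2 + (U ** 2 + I * T * U - T * W - I * V) * tau2
    return num / den


def power(theta, variance, alpha=0.05):
    # Two-sided Wald test, normal approximation; the opposite tail is ignored.
    z = NormalDist()
    return z.cdf(abs(theta) / variance ** 0.5 - z.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    sw, par = stepped_wedge(4, 4), parallel(4, 5)   # both use 4 clusters x 5 periods
    for tau2 in (0.0, 0.1, 1.0):
        v_sw = hussey_hughes_variance(sw, sigma2=1.0, tau2=tau2)
        v_par = hussey_hughes_variance(par, sigma2=1.0, tau2=tau2)
        print(f"tau2={tau2}: SW var={v_sw:.3f} (power {power(1.0, v_sw):.2f}), "
              f"parallel var={v_par:.3f} (power {power(1.0, v_par):.2f})")
```

Under these assumptions, with 4 clusters and 5 periods the stepped-wedge variance is twice the parallel variance when there is no between-cluster variability (tau2 = 0), but half of it when tau2 equals the variance of a cluster-period mean. This illustrates why the relative advantage of the stepped-wedge design depends on between-cluster heterogeneity.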
Stepped-wedge designs may also enable researchers to record changes to the site or healthcare system that happen over time and could affect the study, and to incorporate those changes into the analysis (Cook et al 2016). This ability to account for secular trends can also allow clinical implementation and research evaluation to proceed simultaneously in a pragmatic trial, which would not be possible if clinical implementation occurred at a common calendar time across all clusters.
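Because intervention periods in a stepped-wedge trial fall disproportionately late in calendar time, a secular trend is partly confounded with treatment unless the analysis adjusts for time. The noise-free toy example below assumes a hypothetical linear trend of 0.5 per period, a true effect of 1.0, and cluster effects set to zero. It contrasts a naive pooled comparison with a simple within-period comparison that stands in for the period effects used in model-based analyses. The function names and data are illustrative assumptions.

```python
def stepped_wedge(n_clusters, n_waves):
    # Equal-sized waves; wave w (0-based) crosses over at period w + 1.
    per_wave = n_clusters // n_waves
    return [[1 if t > i // per_wave else 0 for t in range(n_waves + 1)]
            for i in range(n_clusters)]


def simulate_means(X, theta, trend):
    # Hypothetical noise-free cluster-period means: linear secular trend plus
    # treatment effect; cluster effects are set to zero for clarity.
    return [[trend * t + theta * x for t, x in enumerate(row)] for row in X]


def naive_difference(X, Y):
    # Pools all intervention and all control cluster-periods, ignoring calendar time.
    treated = [y for xr, yr in zip(X, Y) for x, y in zip(xr, yr) if x == 1]
    control = [y for xr, yr in zip(X, Y) for x, y in zip(xr, yr) if x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def period_adjusted_difference(X, Y):
    # Compares intervention and control clusters within each period, then averages
    # over periods containing both conditions (a simple stratified estimator,
    # not a full mixed model).
    diffs = []
    for t in range(len(X[0])):
        trt = [yr[t] for xr, yr in zip(X, Y) if xr[t] == 1]
        ctl = [yr[t] for xr, yr in zip(X, Y) if xr[t] == 0]
        if trt and ctl:
            diffs.append(sum(trt) / len(trt) - sum(ctl) / len(ctl))
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    X = stepped_wedge(4, 4)
    Y = simulate_means(X, theta=1.0, trend=0.5)
    print("naive:", naive_difference(X, Y))
    print("within-period:", period_adjusted_difference(X, Y))
```

Here the naive comparison returns 2.0, double the true effect, because the average intervention cluster-period occurs two periods later than the average control cluster-period; the within-period comparison recovers 1.0.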
Ethical Considerations
Past justifications for stepped-wedge designs have appealed to an ethical imperative to treat in life-threatening crises, such as epidemics or pandemics. During the Ebola outbreak, it seemed wrong to deny a potentially lifesaving vaccine (even an unproven one) to anyone in the population, as a randomized placebo-controlled trial would require. Critics pointed out, however, that even without a placebo, participants enrolled while their cluster is still in the control condition of a stepped-wedge CRT do not receive the intervention under investigation. A temporal delay in who receives the intervention can be functionally equivalent to receiving a placebo.
A further justification for stepped-wedge designs arises in pragmatic trials of interventions that are presumed (often with little evidence) to be beneficial. Quality improvement measures are often introduced in hospitals with limited evidence and evaluated only retrospectively, if at all. Stepped-wedge CRTs are appealing in this setting because the trial and the implementation happen simultaneously. If resource constraints make it impossible to roll out a change to everyone at once, a stepped-wedge CRT becomes a superior rollout plan: it generates more rigorous evidence on the efficacy of the proposed change without delaying rollout. This perspective makes sense in the absence of equipoise, when there is already sufficient evidence that the new intervention is better and the stepped-wedge trial simply adds further support. But if the evidence in favor of the intervention is weak and there is equipoise between the new intervention and the status quo, this approach is just as likely to result in universal implementation of a worse intervention. It is therefore a weak justification for forgoing the research needed to establish that the intervention is, in fact, superior.
Another challenge for stepped-wedge designs is informed consent. In the United States, regulations governing consent for research participation were designed with individual-level randomization in mind. Yet, in many stepped-wedge CRTs, randomization occurs at the provider or clinic level rather than the patient level, which complicates identification of individual research participants. Some stepped-wedge CRTs even have both cluster-level participants (such as providers) and individual-level participants.
Federal regulations offer alternatives to the traditional informed consent process: waivers or alterations of consent, and waivers of documentation of consent. As of 2022, all of the NIH Pragmatic Trials Collaboratory Trials that used a stepped-wedge design were granted a waiver of consent (Federico et al 2022). Studies using waivers or alterations of consent must meet certain regulatory requirements, and many such studies use various forms of notification or opt-out provisions, which can have implications for study design and analysis, recruitment, and generalizability. (See the Waivers and Alterations section of the Living Textbook’s Consent, Waiver of Consent, and Notification chapter.)
Analytical Challenges
Stepped-wedge designs combine the essential characteristics of cluster randomized designs and longitudinal designs. Proper design and valid analysis of stepped-wedge CRTs must therefore consider both (1) the within-period correlation among observations from different individuals in the same cluster and time period (a traditional intraclass correlation coefficient) and (2) the cross-time correlation between observations taken on the same cluster in different time periods. Several statistical formulations account for cross-time correlation, including a common cluster autocorrelation coefficient (CAC) formulation (Hooper et al 2016; Li et al 2021) and more traditional longitudinal correlation models, such as random intercepts and random time slopes, or potentially random treatment effects (Hughes, Granston, and Heagerty 2015; Voldal et al 2022). A valid prespecified analysis must account for multiple sources of heterogeneity, either through an appropriately specified mixed model or through a generalized estimating equations (GEE) approach (Liang and Zeger 1993) when enough clusters are randomized to support valid robust standard error estimation.
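The two correlation components can be written down directly. The sketch below builds the correlation matrix for one cluster under a block-exchangeable structure of the kind used in CAC formulations (Hooper et al 2016; Li et al 2021): correlation icc between different individuals in the same period and icc * cac between individuals in different periods. This structure is equivalent to a mixed model with a random cluster intercept plus a random cluster-by-period intercept. It assumes a cross-sectional design (new individuals in each period) and unit total variance; the function names and example values are hypothetical.

```python
def cluster_correlation(n_periods, m, icc, cac):
    """Block-exchangeable correlation matrix for one cluster (cross-sectional design).

    Observations are ordered by period, m individuals per period:
    1 on the diagonal, icc within a period, icc * cac across periods.
    """
    n = n_periods * m
    R = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                R[a][b] = 1.0
            elif a // m == b // m:   # same period, different individuals
                R[a][b] = icc
            else:                    # different periods
                R[a][b] = icc * cac
    return R


def cluster_period_mean_cov(m, icc, cac):
    """Variance of a cluster-period mean and covariance between two periods' means.

    Assumes unit total variance for an individual observation.
    """
    var_mean = (1 + (m - 1) * icc) / m
    cov_means = icc * cac
    return var_mean, cov_means


if __name__ == "__main__":
    for row in cluster_correlation(n_periods=2, m=2, icc=0.05, cac=0.8):
        print(row)
    print(cluster_period_mean_cov(m=50, icc=0.05, cac=0.8))
```

Setting cac = 1 recovers the constant-correlation (random-intercept-only) model assumed by Hussey and Hughes (2007); values below 1 let the correlation between periods be weaker than the correlation within a period. The values 0.05 and 0.8 are arbitrary illustrations.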
Several statistical tools are available for planning stepped-wedge designs, and they address both within-period and across-period correlations. See Hemming et al (2020) and Voldal et al (2020) for flexible software implementations of sample size and power calculations.
A critical aspect of the analysis of stepped-wedge CRTs is how the treatment effect is formulated, including whether time on treatment should be specified as a factor that modifies the treatment effect. For example, a cluster randomized to an early wave will receive the intervention for multiple subsequent time periods, so investigators must consider whether the impact of the intervention is constant, delayed, or wanes over time. Time on treatment is discussed by Hughes, Granston, and Heagerty (2015) and detailed by Li et al (2021). Furthermore, Kenny et al (2022) have shown that naïve models assuming a common or constant treatment effect yield biased estimates of the overall or average treatment effect when the true treatment effect varies over time.
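One way to let the effect depend on time on treatment is to replace the single intervention indicator with exposure-time indicators, one for each number of periods since crossover, in the spirit of the time-on-treatment formulations cited above (for example, Kenny et al 2022). The sketch below only constructs those indicators and averages a set of exposure-time-specific effects; it does not fit a model. The function names and the example effect values (a hypothetical delayed effect) are assumptions.

```python
def stepped_wedge(n_clusters, n_waves):
    # Equal-sized waves; wave w (0-based) crosses over at period w + 1.
    per_wave = n_clusters // n_waves
    return [[1 if t > i // per_wave else 0 for t in range(n_waves + 1)]
            for i in range(n_clusters)]


def exposure_time(X):
    """Number of periods each cluster has been on intervention (0 while in control)."""
    E = []
    for row in X:
        count, out = 0, []
        for x in row:
            count = count + 1 if x else 0
            out.append(count)
        E.append(out)
    return E


def exposure_indicators(E):
    """One 0/1 indicator per exposure time s = 1..S for every cluster-period."""
    S = max(max(row) for row in E)
    return [[[1 if e == s else 0 for s in range(1, S + 1)] for e in row] for row in E]


def time_averaged_effect(deltas):
    """Simple average of exposure-time-specific effects delta(1), ..., delta(S)."""
    return sum(deltas) / len(deltas)


if __name__ == "__main__":
    X = stepped_wedge(4, 4)
    for row in exposure_time(X):
        print(row)
    # Hypothetical delayed effect: none in the first period on treatment, full by the third.
    print(time_averaged_effect([0.0, 0.5, 1.0, 1.0]))
```

A model with a single intervention indicator instead forces one effect onto every exposure time, which is the situation in which Kenny et al (2022) report bias when the true effect varies.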
Case Examples
LIRE
The Pragmatic Trial of Lumbar Image Reporting With Epidemiology (LIRE), an NIH Collaboratory Trial, tested an intervention that inserted epidemiologic benchmarks into reports from lumbar spine imaging. The goal of LIRE was to determine whether this simple informational intervention reduced subsequent testing and treatments that may not provide benefit to patients. LIRE was designed as a stepped-wedge CRT (Jarvik et al 2020). Clinics in 4 large healthcare systems were randomized to initiate the intervention as part of 1 of 5 "waves" corresponding to prespecified calendar dates.
TSOS
A Policy-Relevant US Trauma Care System Pragmatic Trial for PTSD and Comorbidity (Trauma Survivors Outcomes and Support [TSOS]) was a stepped-wedge, cluster randomized pragmatic clinical trial testing the delivery of a stepped collaborative care intervention vs usual care for injured patients with posttraumatic stress disorder (PTSD) symptoms and comorbid conditions. This NIH Collaboratory Trial was conducted at 25 level I trauma centers in the United States.
Several events during the TSOS trial led the research team to rethink the original stepped-wedge design. One was a regulatory pause in the final period, when recruitment was occurring only in the intervention arm. Because of the staggered rollout schedule of the stepped-wedge design, this pause significantly disrupted the trial's implementation, and the researchers speculated that parallel randomization might have been a better choice.
Enrollment drift also occurred in the TSOS trial. Across the sites, PTSD symptoms were significantly greater among patients in the intervention arm, who were recruited later than patients in the control arm. Differential site variability between the arms over time was a key factor influencing this enrollment drift, and parallel randomization might have avoided the complication.
One advantage of stepped-wedge designs is that each site in a study contributes to both the control group and the intervention group. However, differential heterogeneity over time is a challenge for both parallel and stepped-wedge designs. The lesson learned at the end of the TSOS trial was that, regardless of whether the research team chooses stepped-wedge cluster randomization or parallel cluster randomization, site variability considerations favor individual-level randomization when possible.
Conclusions
The choice of a stepped-wedge CRT design should rest on the design's superiority over other trial designs for answering the research question. In some cases, a stepped-wedge design is the only practical way to generate the appropriate data. Nevertheless, given the practical, ethical, and analytical challenges, stepped-wedge designs should be avoided when another rigorous study design is feasible and will answer the study question.
SECTIONS
- Introduction
- Statistical Design Considerations
- Cluster Randomized Trials
- Alternative Cluster Randomized Designs
- Stepped-Wedge Designs
- Choosing Between Cluster and Individual Randomization
- Covariate-Constrained Randomization
- Pair Matching and Stratification With Cluster Designs
- Concealment and Blinding
- Designing to Avoid Identification Bias
- Additional Resources
Resources
What Are the Arguments For and Against the Stepped-Wedge Design?
One-minute training module from the NIH Pragmatic Trials Collaboratory's video library. Dr. Liz Turner discusses arguments for and against using the stepped-wedge design when choosing the right cluster randomized trial in pragmatic research.
Advanced Methods for Primary Care Research: The Stepped Wedge Design
Presentation from the Agency for Healthcare Research and Quality providing a technical overview of applications of the stepped-wedge design in clinical research
The Stepped Wedge Cluster Randomized Trial: Friend or Foe?
NIH Pragmatic Trials Collaboratory PCT Grand Rounds; December 9, 2022
Current Issues in the Design and Analysis of Stepped-Wedge Trials
NIH Pragmatic Trials Collaboratory PCT Grand Rounds; December 1, 2023
REFERENCES
Cook AJ, Delong E, Murray DM, Vollmer WM, Heagerty PJ. 2016. Statistical lessons learned for designing cluster randomized pragmatic clinical trials from the NIH Health Care Systems Collaboratory Biostatistics and Design Core. Clin Trials. 13:504-512. doi: 10.1177/1740774516646578. PMID: 27179253.
Federico CA, Heagerty PJ, Lantos J, et al. 2022. Ethical and epistemic issues in the design and conduct of pragmatic stepped-wedge cluster randomized clinical trials. Contemp Clin Trials. 115:106703. doi: 10.1016/j.cct.2022.106703. PMID: 35176501.
Hemming K, Kasza J, Hooper R, Forbes A, Taljaard M. 2020. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. Int J Epidemiol. 49:979-995. doi: 10.1093/ije/dyz237. PMID: 32087011.
Hooper R, Teerenstra S, de Hoop E, Eldridge S. 2016. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med. 35:4718-4728. doi: 10.1002/sim.7028. PMID: 27350420.
Hughes JP, Granston TS, Heagerty PJ. 2015. Current issues in the design and analysis of stepped wedge trials. Contemp Clin Trials. 45(Pt A):55-60. doi: 10.1016/j.cct.2015.07.006. PMID: 26247569.
Hussey MA, Hughes JP. 2007. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 28:182-191. doi: 10.1016/j.cct.2006.05.007. PMID: 16829207.
Kenny A, Voldal EC, Xia F, Heagerty PJ, Hughes JP. 2022. Analysis of stepped wedge cluster randomized trials in the presence of a time-varying treatment effect. Stat Med. 41:4311-4339. doi: 10.1002/sim.9511. PMID: 35774016.
Li F, Hughes JP, Hemming K, Taljaard M, Melnick ER, Heagerty PJ. 2021. Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: An overview. Stat Methods Med Res. 30:612-639. doi: 10.1177/0962280220932962. PMID: 32631142.
Liang KY, Zeger SL. 1993. Regression analysis for correlated data. Annu Rev Public Health. 14:43-68. doi: 10.1146/annurev.pu.14.050193.000355. PMID: 8323597.
Voldal EC, Xia F, Kenny A, Heagerty PJ, Hughes JP. 2022. Model misspecification in stepped wedge trials: Random effects for time or treatment. Stat Med. 41:1751-1766. doi: 10.1002/sim.9326. PMID: 35137437.