As with individually randomized trials, a number of considerations need to be addressed up front for CRTs to avoid downstream problems. In particular, potential confounding is always an issue. For example, if elderly patients are more likely than younger patients to acquire nosocomial infections, it is important to ensure that neither arm of the trial contains a disproportionate share of elderly patients. In this example, if the clusters are hospital wards, there should be some assurance that the average patient age is balanced between the wards assigned to one arm and those assigned to the other. Sometimes several potential confounders need to be balanced at once.
Pair Matching and Stratification With Cluster Designs
Two popular mechanisms for achieving balance are pair matching and stratification. With pair matching, clusters are paired on their potential confounders, and then within each pair, one cluster is randomized to one arm and the other cluster receives the other arm. For example, considering age and sex as potential confounders, clusters would be matched into pairs such that the average age and the percent female are approximately equal. Likewise, the sizes of the 2 clusters should be similar. Stratification is a generalization of pair matching in that strata are formed based on the potential confounders; within each stratum, a randomization scheme that ensures balance is developed. For example, if there are 11 clusters in one stratum, the randomization would assign 5 clusters to one arm and 6 to the other. However, when there are several confounders, it can be difficult to use stratification or pair matching.
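The within-stratum randomization described above can be sketched in a few lines of Python. This is a minimal illustration, not a validated randomization procedure: the function name `randomize_within_stratum`, the ward identifiers, the arm labels "A" and "B", and the seed are all hypothetical. A matched pair is simply the special case of a stratum with 2 clusters.

```python
import random

def randomize_within_stratum(cluster_ids, rng):
    """Randomly split one stratum's clusters into two arms as evenly as possible."""
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd number of clusters, a coin flip decides which arm gets the extra one.
    if len(shuffled) % 2 == 1 and rng.random() < 0.5:
        half += 1
    return {"A": sorted(shuffled[:half]), "B": sorted(shuffled[half:])}

rng = random.Random(2016)  # fixed seed only so the sketch is reproducible
stratum = [f"ward_{i:02d}" for i in range(1, 12)]  # 11 hypothetical clusters
allocation = randomize_within_stratum(stratum, rng)

# One arm receives 5 clusters and the other 6, as in the example in the text.
arm_sizes = sorted(len(members) for members in allocation.values())
print(arm_sizes)

# A matched pair is a stratum of size 2: each arm receives exactly one cluster.
pair = randomize_within_stratum(["ward_x", "ward_y"], rng)
print(pair)
```

In a real trial the strata would be formed first from the confounders of interest, and this function would be applied to each stratum separately.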
Another method that is increasingly being studied and implemented for CRTs is constrained randomization (Li et al, 2016). Because all of the clusters are identified before randomization, each can be characterized in terms of the levels of several potential confounders. For any possible randomization of this set of clusters, a balance metric (several exist) is applied to “measure” the amount of imbalance that would exist if that particular randomization were applied. It is possible to generate a large number of potential randomization schemes; in fact, with very few clusters, every possible randomization scheme can be tabulated in this way, along with its balance score. By some predefined criterion, such as a certain percentage of all possible randomizations, the set of randomization schemes with the least amount of imbalance is chosen as the “randomization space.” From this “randomization space,” a single randomization scheme is selected at random. Many statistical issues related to this strategy are still being explored.
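The following Python sketch walks through these steps for a small, fully enumerable example. Everything specific in it is an assumption made for illustration: the 8 clusters and their covariate values are invented, the balance metric (an unweighted sum of squared differences in arm means) is only one of the several metrics the text alludes to, and the 10% cutoff is an arbitrary choice of the predefined criterion. In practice the covariates would usually be standardized or weighted before being combined.

```python
import itertools
import random

# Hypothetical cluster-level covariates: (mean age, percent female)
clusters = {
    "c1": (71.0, 48.0), "c2": (64.5, 55.0), "c3": (58.0, 61.0),
    "c4": (69.0, 50.0), "c5": (62.0, 57.0), "c6": (75.5, 44.0),
    "c7": (60.0, 52.0), "c8": (66.0, 59.0),
}

def imbalance(arm_a, arm_b):
    """Sum over covariates of the squared difference in arm means (illustrative metric)."""
    score = 0.0
    for k in range(2):
        mean_a = sum(clusters[c][k] for c in arm_a) / len(arm_a)
        mean_b = sum(clusters[c][k] for c in arm_b) / len(arm_b)
        score += (mean_a - mean_b) ** 2
    return score

# Tabulate every possible 4-versus-4 allocation with its balance score.
names = sorted(clusters)
schemes = []
for arm_a in itertools.combinations(names, len(names) // 2):
    arm_b = tuple(c for c in names if c not in arm_a)
    schemes.append((imbalance(arm_a, arm_b), arm_a, arm_b))
schemes.sort(key=lambda s: s[0])

# Keep the best-balanced 10% of schemes; ties at the threshold are all retained.
cutoff = max(1, len(schemes) // 10)
threshold = schemes[cutoff - 1][0]
randomization_space = [s for s in schemes if s[0] <= threshold]

# Select one scheme at random from the randomization space.
rng = random.Random(35)  # fixed seed only so the sketch is reproducible
score, arm_a, arm_b = rng.choice(randomization_space)
print(len(schemes), len(randomization_space))
print(arm_a, arm_b, round(score, 3))
```

With more clusters, full enumeration becomes impractical, and a large random sample of candidate schemes would take the place of the exhaustive loop.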
Case Example: The LIRE Study
The Pragmatic Trial of Lumbar Image Reporting with Epidemiology (LIRE), an NIH Collaboratory Demonstration Project, is testing an intervention that inserts epidemiologic benchmarks into reports from lumbar spine imaging. LIRE seeks to determine whether this simple informational intervention can reduce subsequent testing and treatments that may not provide any benefit to patients. LIRE is an example of a stepped-wedge CRT (Jarvik et al, 2015), in which clinics at 4 large health systems are randomized to initiate the intervention as part of 1 of 5 “waves” corresponding to prespecified calendar dates.
When a PCT employs cluster randomization, randomization may occur at the level of the hospital or health system. The simplest approach is to start the study at the same time for all the clusters. However, the half of the clusters randomized to the control condition will not receive the intervention until the trial ends. In addition, preparing all clusters to start the intervention at the same time may not be feasible. The stepped-wedge design overcomes these problems by gradually introducing the intervention to groups of clusters over time. Clusters are divided into several groups, usually 4 to 6, and the time that the intervention is “turned on” for each group is randomized. The stepped-wedge design also has the benefit of allowing researchers to record and incorporate into the analyses changes to the hospital or health system that happen over time and have the potential to affect the study (Cook et al, 2016).
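As a rough illustration of how clusters might be randomized to start times, the Python sketch below assigns clusters to waves and builds the resulting exposure schedule. It is not the LIRE randomization procedure: the 20 clinics, the function names `assign_waves` and `exposure_schedule`, the seed, and the assumption of an all-control baseline period followed by one step per wave are all hypothetical.

```python
import random

def assign_waves(cluster_ids, n_waves, rng):
    """Randomly assign clusters to waves, with wave sizes differing by at most one."""
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    waves = {w: [] for w in range(1, n_waves + 1)}
    # Deal the shuffled clusters round-robin across the waves.
    for i, cluster in enumerate(shuffled):
        waves[i % n_waves + 1].append(cluster)
    return waves

def exposure_schedule(waves, n_periods):
    """Per-cluster list of 0 (control) or 1 (intervention) for each period.

    Period 0 is an all-control baseline; wave w switches on at period w.
    """
    return {
        cluster: [1 if period >= w else 0 for period in range(n_periods)]
        for w, members in waves.items()
        for cluster in members
    }

rng = random.Random(2007)  # fixed seed only so the sketch is reproducible
clinics = [f"clinic_{i:02d}" for i in range(1, 21)]  # 20 hypothetical clinics
waves = assign_waves(clinics, n_waves=5, rng=rng)
schedule = exposure_schedule(waves, n_periods=6)

for w, members in waves.items():
    print(w, sorted(members))
```

Each cluster's schedule is a run of zeros followed by a run of ones, which is the staircase pattern that gives the design its name. Every cluster contributes both control and intervention periods, and all clusters have the intervention turned on by the final period.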
Because all sites eventually will receive the intervention, a stepped-wedge trial may be more appealing to the broader community and thus lead to more robust study participation, especially in cases where the intervention seems particularly promising (Hussey and Hughes, 2007).
For additional information about considerations affecting study design decisions, see also Designing With Implementation and Dissemination in Mind.
A guidance document from the Biostatistics and Study Design Core
A presentation from the Agency for Healthcare Research and Quality (AHRQ) that provides a technical overview of applications of the stepped-wedge design in clinical research
Cook AJ, Delong E, Murray DM, Vollmer WM, Heagerty PJ. 2016. Statistical lessons learned for designing cluster randomized pragmatic clinical trials from the NIH Health Care Systems Collaboratory Biostatistics and Design Core. Clin Trials. 13:504-512. doi:10.1177/1740774516646578. PMID: 27179253.
Li F, Lokhnygina Y, Murray DM, Heagerty PJ, DeLong ER. 2016. An evaluation of constrained randomization for the design and analysis of group-randomized trials. Stat Med. 35:1565-1579. doi:10.1002/sim.6813. PMID: 26598212.
Jarvik JG, Comstock BA, James KT, et al. 2015. Lumbar Imaging With Reporting Of Epidemiology (LIRE)--Protocol for a pragmatic cluster randomized trial. Contemp Clin Trials. 45(Pt B):157-163. doi:10.1016/j.cct.2015.10.003. PMID: 26493088.
Hussey MA, Hughes JP. 2007. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 28:182-191. doi:10.1016/j.cct.2006.05.007. PMID: 16829207.