Randomization Methods

Experimental Designs and Randomization Schemes

Section 6


Elizabeth R. DeLong, PhD

For the NIH Health Care Systems Collaboratory Biostatistics and Study Design Core


Contributing Editor

Jonathan McCall, MS

The simplest forms of individually randomized controlled trials (RCTs) feature two study arms (an experimental therapy vs comparator/placebo), with study participants randomly assigned in a 1:1 ratio to receive either the experimental therapy or the comparator. Other designs may incorporate three or more study arms, weighted randomization (i.e., participants are not randomized in a simple 1:1 ratio), stratification, or crossover designs. When crossover designs are feasible, they reduce the necessary sample size by using the participants as their own controls.

As with individually randomized trials, cluster randomized trials (CRTs) raise a number of considerations that need to be addressed upfront in order to avoid downstream problems. In particular, potential confounding is always a concern. For example, if elderly patients are more likely than younger patients to get nosocomial infections, it would be important to ensure that one arm of the trial is not more likely than the other to consist of elderly patients. In this example, if the clusters are hospital wards, there should be some assurance that the average patient ages of the wards assigned to one arm are balanced against those of the wards assigned to the other. Sometimes several potential confounders might play a role.

Pair-Matching and Stratification

Two popular mechanisms for achieving balance are pair-matching and stratification. With pair-matching, clusters are paired according to their potential confounders; then, within each pair, one cluster is randomized to one arm and the other cluster receives the other arm. For example, considering age and sex as potential confounders, clusters would be matched into pairs such that the average age and the percent female are approximately equal. Likewise, the sizes of the two clusters in a pair should be similar. Stratification is a generalization of pair-matching in that strata are formed based on the potential confounders; within each stratum, a randomization scheme that ensures balance is developed. For example, if there are 11 clusters in one stratum, then the randomization would assign 5 clusters to one arm and 6 to the other. However, when there are several confounders, it can be difficult to stratify or pair-match.
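The two mechanisms described above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended implementation: the ward names, the stratum label, the seed, and the function names `pair_matched_assignment` and `stratified_assignment` are all hypothetical, and the pairs and strata are assumed to have been formed already on the basis of the potential confounders.

```python
import random

def pair_matched_assignment(pairs, rng):
    """Within each matched pair, randomly give one cluster each arm."""
    assignment = {}
    for first, second in pairs:
        arms = ["intervention", "control"]
        rng.shuffle(arms)
        assignment[first] = arms[0]
        assignment[second] = arms[1]
    return assignment

def stratified_assignment(strata, rng):
    """Within each stratum, split clusters between arms as evenly as possible."""
    assignment = {}
    for clusters in strata.values():
        shuffled = list(clusters)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        # With an odd count, a coin flip decides which arm gets the extra cluster.
        n_intervention = half + (len(shuffled) % 2) * rng.randint(0, 1)
        for i, cluster in enumerate(shuffled):
            assignment[cluster] = "intervention" if i < n_intervention else "control"
    return assignment

rng = random.Random(2017)  # fixed seed (arbitrary) so the allocation is reproducible

# Hypothetical matched pairs and one hypothetical stratum of 11 clusters.
pairs = [("ward_A", "ward_B"), ("ward_C", "ward_D")]
strata = {"older_wards": [f"ward_{i}" for i in range(1, 12)]}

paired = pair_matched_assignment(pairs, rng)
stratified = stratified_assignment(strata, rng)
```

In the 11-cluster stratum, the split is always 5 and 6; only which arm receives the extra cluster is left to chance.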

Constrained Randomization

Another method that is increasingly being studied and implemented for cluster randomized trials is constrained randomization (Li et al. 2016). Because all of the clusters are identified before randomization, each can be characterized in terms of several potential confounders. For any possible randomization of this set of clusters, a balance metric (several exist) is applied to measure the amount of imbalance that would result if that particular randomization were used. A large number of potential randomizations can be generated and scored in this way; with very few clusters, every possible randomization scheme can be enumerated along with its balance score. By some predefined criterion, such as a certain percentage of all possible randomizations, the subset of randomization schemes with the least imbalance is chosen as the “randomization space.” From this randomization space, a single scheme is selected at random. Many statistical issues are still being explored with respect to this strategy.
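The procedure can be sketched as follows for a small trial in which every allocation can be enumerated. Everything specific here is an assumption made for illustration: the eight clusters and their covariate values are invented, the balance metric (the sum of squared differences in standardized arm means) is only one simple choice among the several that exist, and the 10% cutoff is an arbitrary example of a predefined criterion.

```python
import itertools
import random

# Hypothetical cluster-level covariates: (mean age, percent female).
clusters = {
    "A": (71.0, 48.0), "B": (64.0, 55.0), "C": (58.0, 61.0), "D": (69.0, 50.0),
    "E": (62.0, 57.0), "F": (75.0, 45.0), "G": (60.0, 59.0), "H": (67.0, 52.0),
}
names = sorted(clusters)
n_cov = 2

def standardize(values):
    """Center and scale one covariate so all covariates share a common scale."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

columns = [standardize([clusters[c][j] for c in names]) for j in range(n_cov)]
z = {c: [columns[j][i] for j in range(n_cov)] for i, c in enumerate(names)}

def imbalance(intervention):
    """Sum over covariates of the squared difference in arm means (assumed metric)."""
    control = [c for c in names if c not in intervention]
    score = 0.0
    for j in range(n_cov):
        mean_i = sum(z[c][j] for c in intervention) / len(intervention)
        mean_c = sum(z[c][j] for c in control) / len(control)
        score += (mean_i - mean_c) ** 2
    return score

# Enumerate every equal allocation (4 of 8 clusters to the intervention arm).
candidates = list(itertools.combinations(names, len(names) // 2))
scored = sorted(candidates, key=imbalance)

# Keep the best-balanced 10% as the randomization space, then draw one at random.
cutoff = max(1, len(scored) // 10)
randomization_space = scored[:cutoff]
rng = random.Random(2016)  # fixed seed (arbitrary) for reproducibility
chosen = rng.choice(randomization_space)
```

With eight clusters there are 70 equal allocations, so the randomization space here holds 7 schemes. Tightening the cutoff improves balance but leaves fewer schemes to draw from.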

Stepped-Wedge Designs

Case Example: The LIRE Study

The NIH Collaboratory’s Pragmatic Trial of Lumbar Image Reporting with Epidemiology (LIRE) study is testing an intervention that inserts epidemiologic benchmarks into reports from lumbar spine imaging. The LIRE study will seek to determine whether this simple informational intervention can reduce subsequent testing and treatments that may not provide any benefit to patients. LIRE is an example of a stepped-wedge cluster-randomized design (Jarvik et al. 2015), in which clinics at four large health systems are randomized to initiate the intervention as part of one of five “waves” corresponding to prespecified calendar dates.

When a pragmatic clinical trial (PCT) employs cluster randomization, with randomization occurring at the level of the hospital or health system, the simplest approach means that some hospitals or health systems will not receive the intervention until the trial ends. Additionally, preparing all clusters to start the intervention at the same time may not be feasible. The stepped-wedge design overcomes these problems by gradually introducing the intervention to groups of clusters over time. Clusters are divided into several groups, usually between four and six, and what is randomized is the time at which the intervention is “turned on” for each group. The stepped-wedge design also has the benefit of allowing researchers to record, and incorporate into the analyses, changes to the hospital or health system that happen over time and have the potential to affect the study (Cook et al. 2016).
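The randomization of start times described above can be sketched as follows. This is a generic illustration, not the allocation procedure of any particular trial: the ten clinics, the five waves, the all-control baseline period, the seed, and the function names `stepped_wedge_schedule` and `exposure_matrix` are assumptions made for the example.

```python
import random

def stepped_wedge_schedule(clusters, n_waves, rng):
    """Randomly order clusters, then deal them into waves of near-equal size."""
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    # Wave 1 starts first; the cluster at position i goes to wave (i % n_waves) + 1.
    return {cluster: (i % n_waves) + 1 for i, cluster in enumerate(shuffled)}

def exposure_matrix(schedule, n_waves):
    """One row per cluster, one column per period; 1 means the intervention is on."""
    n_periods = n_waves + 1  # period 0 is an assumed all-control baseline
    return {
        cluster: [1 if period >= wave else 0 for period in range(n_periods)]
        for cluster, wave in schedule.items()
    }

rng = random.Random(2015)  # fixed seed (arbitrary) for reproducibility
clinics = [f"clinic_{i:02d}" for i in range(1, 11)]  # 10 hypothetical clinics
schedule = stepped_wedge_schedule(clinics, n_waves=5, rng=rng)
matrix = exposure_matrix(schedule, n_waves=5)

# Print clinics in order of start wave to show the "wedge" of 0s and 1s.
for clinic in sorted(schedule, key=lambda c: (schedule[c], c)):
    print(clinic, "wave", schedule[clinic], matrix[clinic])
```

Each row switches from 0 to 1 exactly once and never switches back, and by the final period every clinic is receiving the intervention.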

Because all sites eventually will receive the intervention, a stepped-wedge trial may be more appealing to the broader community and thus lead to more robust study participation, especially in cases where the intervention seems particularly promising (Hussey and Hughes 2007).

For additional information about considerations affecting study design decisions, please see also Designing with Implementation and Dissemination in Mind.

Figure: Stepped-Wedge Cluster Randomization Example




Pair-Matching vs Stratification in Cluster-Randomized Trials

A guidance document from the Biostatistics and Study Design Core

Advanced Methods for Primary Care Research: The Stepped Wedge Design

Presentation from the Agency for Healthcare Research and Quality (AHRQ) provides a technical overview of applications of the stepped-wedge design in clinical research



Cook AJ, DeLong E, Murray DM, Vollmer WM, Heagerty PJ. 2016. Statistical lessons learned for designing cluster randomized pragmatic clinical trials from the NIH Health Care Systems Collaboratory Biostatistics and Design Core. Clin Trials. 13:504-512. doi:10.1177/1740774516646578. PMID: 27179253.

Li F, Lokhnygina Y, Murray DM, Heagerty PJ, DeLong ER. 2016. An evaluation of constrained randomization for the design and analysis of group-randomized trials. Stat Med. 35:1565-1579. doi:10.1002/sim.6813. PMID: 26598212.

Jarvik JG, Comstock BA, James KT, et al. 2015. Lumbar Imaging With Reporting Of Epidemiology (LIRE)--Protocol for a pragmatic cluster randomized trial. Contemp Clin Trials. 45(Pt B):157-163. doi:10.1016/j.cct.2015.10.003. PMID: 26493088.

Hussey MA, Hughes JP. 2007. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 28:182-191. doi:10.1016/j.cct.2006.05.007. PMID: 16829207.


DeLong ER. Experimental Designs and Randomization Schemes: Randomization Methods. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory. Available at: http://rethinkingclinicaltrials.org/experimental-designs-randomization-schemes-top/randomization-methods/. Updated August 24, 2017.