Analysis Plan


Section 7

Unanticipated Changes

Conditions change during the course of any clinical trial. Most trials, especially those that span multiple years, must deal with the evolution of clinical practice and other changes in medical care. Pragmatic clinical trials face additional challenges because of their less structured approach and because they often are embedded within the workflows of large healthcare systems.

In this section, we discuss unanticipated changes during study implementation that investigators should consider in advance so that the impact of such changes can be minimized during the planning phase. The section addresses the following topics:

  • Changes in the potential study population
  • Changes in participating healthcare systems
  • Changes in clinical practice, regulations, and standards
  • Trial- or site-imposed data differences
  • Planning for and responding to unanticipated changes

Changes in the Potential Study Population

One potential challenge for any clinical trial is an unanticipated change in the intended study population. Such a change may arise for a variety of reasons, including shifts in insurance coverage, changes in demographic characteristics, and extrinsic factors (such as media reports) that affect the willingness of patients, healthcare providers, or other participants to enroll in the trial. When these changes occur, they can have differential effects on the study arms.

Demographic Characteristics and Insurance Coverage Patterns

The population that is potentially available for study participation may change as a result of underlying demographic shifts. Changes in insurance plans and Medicare or Medicaid coverage may also strongly affect the populations seen at the level of healthcare systems, hospitals, or individual clinics and practices.

Case Example: STOP CRC Trial

The Strategies and Opportunities to Stop Colorectal Cancer (STOP CRC) trial, an NIH Pragmatic Trials Collaboratory Trial, was a cluster randomized trial designed to evaluate strategies for improving rates of colorectal cancer screening. The study, which was conducted in a network of federally qualified health centers in Oregon and Northern California, encountered unanticipated changes in the potential study population when implementation of the Affordable Care Act led to Medicaid expansion and the inclusion of colorectal cancer screening as a quality metric in Oregon’s new pay-for-performance Medicaid program. The changes increased the size of Oregon’s Medicaid-enrolled population that was age-eligible for colorectal cancer screening by 55% (Coronado et al 2015).

Fundamental Data Issues

Evolving technology and changes in diagnostic criteria or coding can modify the eligible population and hence change the definition of the target population. For example, when troponin testing began to be used to diagnose myocardial infarction, some sites adopted it early while others lagged. Unless a study specified exact diagnostic and onset criteria for selecting the study population, eligibility could differ across sites. Likewise, a study with myocardial infarction as the outcome would be subject to the same problem.

Patient/Participant Decisions

Extrinsic factors affecting decisions by patients or providers about whether to participate or continue in a trial may influence the available study population in ways that are difficult to anticipate. For example, preexisting conceptions about the study, media coverage of a particular therapy, and even coverage of clinical research generally (whether positive or negative) may affect willingness to participate in a trial. In addition, positive or negative media coverage of reputational issues may affect willingness to seek treatment at a particular clinic or healthcare system.

Changes in Participating Healthcare Systems

Healthcare System Leadership

Leadership changes in a hospital or healthcare system may occur rapidly and often in an era of consolidation and reorganization. Such changes may affect the level of support available for a study, whether in the start-up, conduct, or follow-up phase, and in some cases can effectively stop an ongoing trial at a given site. Written agreements executed before study start should be a matter of course, specifying the level of participation and guaranteeing that the protocol will be followed for the duration of the study.

Personnel Turnover and Training Issues

Because clinical trials, whether traditional or pragmatic, depend on relatively uniform implementation of a research protocol across all sites, the experience, knowledge, training, and commitment of investigators and site personnel have major implications for the quality of the trial’s implementation and the validity of the data collected. Frequent turnover among investigators and/or support staff can disrupt trial operations, especially if there is a lack of familiarity with the trial or if staff training has been inadequate.

For example, a trial might have the goal of increasing patient satisfaction by implementing a training program, supported by the hospital, for nurses and residents. Changes in leadership of either the hospital or the healthcare system might realign priorities, and the program could be halted or modified in some or all of the sites. Although this issue could affect both intervention and control sites, it could have a differential impact on the study groups. Statistical adjustment for such changes may not be possible. When planning the study, the study team should ensure that a clear memorandum of understanding documents the specific conduct and obligations of all partners.

Case Example: PROVEN Trial

The Pragmatic Trial of Video Education in Nursing Homes (PROVEN), an NIH Collaboratory Demonstration Project, is a cluster randomized study designed to test a video-assisted decision support intervention for advance care planning among patients in nursing homes. Investigators designed a “video status report” that healthcare system staff integrated into the EHR to document when staff offered the intervention to patients. Intervention monitoring reports from these records revealed considerable variation in video offer rates across facilities, indicating that some site staff needed additional training to ensure that patients were offered the intervention appropriately.

Site Withdrawal Due to Burden

If sites, or the healthcare system responsible for the overall agreements, are not fully cognizant of the requirements of trial conduct, some of the conditions of participation could turn out to be onerous. A proactive memorandum of understanding that identifies site responsibilities, defines the eligible population, and clearly specifies the activities to be performed by the site research team should prevent, or at least lower the risk of, site dropout. A pilot study or feasibility assessment can help work out the details.

Structural Changes, Such as Reorganization

As healthcare systems struggle to provide high-quality, cost-effective healthcare, changes in structure are inevitable. It is possible that one or more study sites will close or merge, or the healthcare delivery model will be revamped to address newer performance metrics. To quote Yogi Berra, “It’s tough to make predictions, especially about the future.” However, prior to implementing a study, consideration of the possibilities and discussions regarding the likelihood of changes that might affect the trial should occur.

Changes in Clinical Practice, Regulations, and Standards

Changes in practice, standards, and regulations may create challenges for ongoing studies or render a study in planning or development impracticable. Many newer pragmatic trial designs depend on facilitated access to electronic patient data for cohort identification through federated systems and distributed research networks. Changes in regulations and practices affecting access to these data could have significant effects on pragmatic trials.

For example, an NIH Collaboratory Trial with the goal of reducing the use of pain medication was initiated before awareness of the opioid epidemic had grown. Greater recognition of the problem drove local efforts to address it. Again, anticipating and monitoring such a trend among control sites is important. If the intervention has not been adopted on a widespread basis, it may be possible to compare evolving outcome trends between control sites that adopt intervention-like practices and those that do not.

Case Example: PPACT

The Collaborative Care for Chronic Pain in Primary Care Trial (PPACT), an NIH Collaboratory Trial, was a cluster randomized trial comparing multimodal approaches to pain management to reduce reliance on opioids. The study was initiated before the recognition of an epidemic of opioid misuse and abuse that has since been widely covered in the news media. As a result, healthcare systems implemented some of the same strategies as PPACT during the trial, and the usual care arm became more like the intervention arm. To the extent that these changes were beneficial, the study might not have been able to test the overall effect of the intervention.

Contamination of the Control Group With Elements of the Intervention

A trial with the goal of increasing mental health screening may encounter unanticipated contamination of the control group with elements of the intervention. This challenge may arise, for example, as more and more primary care providers spontaneously decide to use the Patient Health Questionnaire-9 (PHQ-9) during routine clinic visits. If the intervention is a program that promotes use of the PHQ-9 among intervention clinics and the outcome is the proportion of patients who were screened, this spontaneous use could dilute the effect of the intervention. In planning such a trial, baseline rates should be calculated over time, rather than at a single time point, and the trial should include monitoring for a time effect. In any case, it is important to recognize and monitor intervention-like activities using the same instruments and procedures and to incorporate these considerations into the design and analysis.
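
As a rough illustration of monitoring for such a time effect, the sketch below fits a linear trend to hypothetical monthly PHQ-9 screening rates among control clinics. The data and interpretation threshold are invented for illustration; this is not code from any Collaboratory trial:

    # Test for a secular trend in control-group screening rates.
    from scipy.stats import linregress

    months = list(range(12))                      # months since baseline
    rates = [0.21, 0.22, 0.24, 0.23, 0.26, 0.27,  # hypothetical monthly
             0.29, 0.28, 0.31, 0.33, 0.32, 0.35]  # screening proportions

    fit = linregress(months, rates)
    print(f"slope = {fit.slope:.4f} per month, p = {fit.pvalue:.4f}")
    # A positive, statistically significant slope suggests spontaneous
    # uptake of intervention-like screening among control clinics.

A simple trend test like this cannot prove contamination, but a drifting control-group rate is a signal to investigate intervention-like activities at those sites.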

New Healthcare Initiatives That Focus on the Same Problem as the Intervention

This issue is similar to the previous challenge, though in this case it is possible for all control sites to come under the umbrella of the increased efforts. In both cases, such programs vary in their approach and intensity. It may be possible to disentangle common elements from the unique aspects of the ongoing trial and to adjust the analysis accordingly, though sample size may become an issue. In the design stage, an attempt might be made to craft an intervention that is unique and unlikely to be implemented outside the trial. The more complicated and intense the intervention, the less likely it is to be duplicated; however, it also becomes less generalizable and sustainable.

Trial- or Site-Imposed Data Differences

Several unforeseen site-specific situations may arise during the conduct of a trial. For example, a comparison of accrual rates among sites might reveal that check boxes for comorbid conditions had been ordered differently by specialty, so that the conditions most likely to apply appeared first on each checklist. In a pragmatic trial, complete standardization could disrupt workflow and efficiency; in this case, stratification by specialty could have been considered.

A situation that likely would have a more pronounced impact would be disappointing accrual rates. In some cases, changes to operational aspects of the protocol are made for the purpose of enhancing enrollment. Such changes implicitly change the definition of the intervention. Again, careful planning, monitoring, and recording of situations such as these are necessary. The statistical analysis cannot be assumed to overcome evolving differences.

Some study modifications can have major effects on both the conduct and results of a trial. When accrual rates are disappointing, studies might alter the design by expanding the eligibility criteria, thereby redefining the target population. Doing so is likely to dilute the intervention effect and substantially affect statistical power and sample size. The statistical analysis must recognize and account for these changes.
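
To see why dilution matters for sample size, recall the standard two-sample formula n = 2σ²(z_(1−α/2) + z_(1−β))² / Δ² per arm: the required n grows with the inverse square of the detectable effect Δ. The sketch below (a minimal illustration of this textbook formula, with invented numbers) shows that halving the effect roughly quadruples the required enrollment:

    # Per-arm sample size for a two-sample comparison of means:
    # n = 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2
    from math import ceil
    from scipy.stats import norm

    def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        return ceil(2 * (sigma * z / delta) ** 2)

    print(n_per_arm(delta=0.50, sigma=1.0))  # 63 per arm
    print(n_per_arm(delta=0.25, sigma=1.0))  # 252 per arm: halving the
                                             # detectable effect roughly
                                             # quadruples the sample size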

The opposite problem occurs when the overall event rate is lower than anticipated. In this case, studies sometimes enrich the population during the trial with patients at higher risk to obtain a higher baseline risk. Again, this kind of change creates a different target population and complicates the interpretation of the results. The statistical analysis may be able to account for these changes, but the trade-off between event rate and accrual rate is always complicated.

Planning for and Responding to Unanticipated Changes

Planning for circumstances that may affect the study arms should begin at the time of initial study design. Although the specific causes of disruption may not be predictable, contingency planning should account for impacts such as those described in this section. Planning for challenges may include efforts to incorporate robustness and flexibility into the study design.

To ensure that unanticipated changes to study populations do not go undetected during the course of a trial, investigators and staff should include a plan for continuous monitoring of study implementation and progress. Depending on the nature of the study, measures chosen for monitoring may be quantitative, qualitative, or both.


Resources

Building Partnerships to Ensure a Successful Trial
Living Textbook chapter describing stakeholder partnerships throughout the ePCT life cycle. Healthcare system leaders can provide valuable advice regarding how to handle unexpected changes during the conduct of ePCTs.

Statistical Analysis Plan Checklist for Addressing COVID-19 Impacts
Tool from the NIH Pragmatic Trials Collaboratory to assist investigators in identifying impacts of the COVID-19 public health emergency on ongoing pragmatic clinical trials

REFERENCES


Coronado GD, Petrik AF, Bartelmann SE, Coyner LA, Coury J. 2015. Health policy to promote colorectal cancer screening: improving access and aligning federal and state incentives. Clin Res (Alex). 29:50-55. doi:10.14524/CR-14-0044. PMID: 27135047.


Version History

April 30, 2024: Made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added an item to the Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

May 1, 2020: Added a table of contents to the introduction to aid navigation, and made nonsubstantive changes to the Resources sidebar as part of the annual content update (changes made by D. Seils).

February 28, 2020: Added a Resources sidebar with a link to the Building Partnerships to Ensure a Successful Trial chapter (changes made by K. Staman).

July 19, 2019: Revised the section by adding content and reorganizing the text (changes made by D. Seils).

January 16, 2019: Made nonsubstantive changes to the text and formatting as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Analysis Plan


Section 6

Electronic Health Record Data Extraction

Many pragmatic clinical trials, whether designed as cluster randomized trials or as individually randomized trials, rely on data extraction from the participant’s electronic health record (EHR). Although EHR data extraction allows pragmatic trials to be performed more quickly and at less expense than traditional clinical trials that establish redundant parallel data capture systems, it also introduces methodological and logistical challenges, such as those described in the white paper, "Assessing Data Quality for Healthcare Systems Data Used in Clinical Research."

EHR data extraction also poses challenges for statistical analysis. Data gathered from EHRs (which, by definition, are not purposely designed or optimized to support research activities) may have higher rates of missingness and error than data captured with purpose-built systems and subjected to “cleaning” and validation. Missing data, including those caused by the dropout of whole clusters, pose special issues for pragmatic trials. Preliminary data capture and assessment will indicate whether the intended study is feasible, given the availability and quality of the data.
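
As a concrete starting point, a brief completeness check of an EHR extract can surface problem variables before the design is finalized. The sketch below is a minimal example in Python/pandas; the file name, columns, and 20% threshold are hypothetical:

    # Preliminary assessment of EHR extract completeness.
    import pandas as pd

    ehr = pd.read_csv("ehr_extract.csv")  # hypothetical extract file

    # Fraction of missing values for each variable, worst first
    missing = ehr.isna().mean().sort_values(ascending=False)
    print(missing)

    # Flag variables that may be too incomplete to support the analysis
    print(missing[missing > 0.20])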



Version History

April 30, 2024: Made nonsubstantive changes to the text and added items to the Resources sidebar as part of the annual content update (changes made by D. Seils).

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list and made nonsubstantive changes as part of the annual content update (changes made by D. Seils).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 1, 2020: Made nonsubstantive changes to the Resources sidebar as part of the annual content update (changes made by D. Seils).

January 16, 2019: Made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Analysis Plan


Section 4

Accounting for Residual Confounding in the Analysis

Despite incorporating design-based control for confounding—such as stratification, pair matching, or constrained randomization—it is sometimes advisable to also include in the analysis covariates that might still be unbalanced across the arms of the study. Depending on the goals of the study, these covariates might be at the cluster level or even at the individual level. However, the sample size may limit the number of covariates that can be included.

Key Question: Residual Confounding

When the number of clusters is small, permutation tests might be recommended. Again, the study statistician, in collaboration with the investigators, will determine the appropriate design and analysis methods.
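
With a small number of clusters, the permutation (randomization) test can be carried out exactly by enumerating every possible assignment of clusters to arms. The following sketch, with invented cluster-level means, illustrates the idea:

    # Exact permutation test for a two-arm cluster randomized trial.
    from itertools import combinations

    means = [0.42, 0.51, 0.39, 0.47, 0.55, 0.36, 0.44, 0.58]  # 8 clusters
    arm_a = {0, 1, 2, 3}  # clusters actually assigned to arm A

    def diff(assignment):
        a = [m for i, m in enumerate(means) if i in assignment]
        b = [m for i, m in enumerate(means) if i not in assignment]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = diff(arm_a)
    schemes = [set(c) for c in combinations(range(len(means)), len(arm_a))]
    p_value = sum(diff(s) >= observed for s in schemes) / len(schemes)
    print(f"observed difference = {observed:.3f}, p = {p_value:.3f}")

With 8 clusters split 4 and 4, there are only 70 possible assignments, which is why permutation inference, rather than asymptotic methods, is often recommended in this setting.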



Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

May 1, 2020: Added a Resources sidebar with a link to online course material as part of the annual content update (changes made by D. Seils).

January 16, 2019: Added a “key question” image, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Analysis Plan


Section 3

Unequal Cluster Sizes

One challenge that may arise with cluster randomization is that, although clusters are often assumed to be of equivalent size, this assumption frequently does not hold in healthcare settings. Clusters such as physician practices or clinics may be of substantially different sizes, which can affect the statistical power of the study and decisions about sample size (Cook et al 2016). To address these issues, study statisticians need an estimate of the range in potential sample sizes. For example, in a trial that randomly assigns clinics to an intervention that will be applied to patients who are newly diagnosed with diabetes, the statistician will need information about the number of such patients coming into each clinic over the past several months. The statistician will then consider the range in sample sizes when calculating the number of clinics and patients needed for the study.
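
One commonly cited approximation inflates the sample size by a design effect that incorporates both the mean cluster size and its variability, DE = 1 + ((cv^2 + 1) * m_bar − 1) * ICC, where m_bar is the mean cluster size and cv its coefficient of variation. The sketch below applies this approximation to invented clinic sizes and an assumed ICC (a minimal illustration, not a substitute for the study statistician's calculation):

    # Design effect under unequal cluster sizes (approximation).
    from statistics import mean, pstdev

    sizes = [40, 85, 120, 62, 150, 30]  # hypothetical patients per clinic
    icc = 0.02                          # assumed intraclass correlation

    m_bar = mean(sizes)
    cv = pstdev(sizes) / m_bar          # coefficient of variation
    de = 1 + ((cv**2 + 1) * m_bar - 1) * icc
    print(f"design effect = {de:.2f}")
    # Inflate the individually randomized sample size (or the number of
    # clinics) by this factor to preserve the planned power.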


REFERENCES


Cook AJ, Delong E, Murray DM, Vollmer WM, Heagerty PJ. 2016. Statistical lessons learned for designing cluster randomized pragmatic clinical trials from the NIH Health Care Systems Collaboratory Biostatistics and Design Core. Clin Trials. 13:504-512. doi:10.1177/1740774516646578. PMID: 27179253.


Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

May 1, 2020: Made nonsubstantive formatting changes to the text as part of the annual content update (changes made by D. Seils).

January 16, 2019: Made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Analysis Plan


Section 2

Intraclass Correlation

A primary driver of whether a study should randomize at a cluster level is the intraclass correlation coefficient (ICC). (For other considerations in choosing between cluster and individual randomization, see Choosing Between Cluster and Individual Randomization in the Experimental Designs and Randomization Schemes chapter of the Living Textbook.) The ICC is a measure of how similar the outcomes of individuals within a cluster are likely to be, relative to those of other clusters. For example, an intervention designed to improve medication adherence might be implemented in several communities, where individuals within any single community might belong to the same socioeconomic group and behave similarly in terms of ability to pay for and remember to take their medication. Hence, if the primary outcome of the study is a measure of adherence, there is likely to be substantial intraclass correlation.

The ICC is closely linked to the sample size necessary to conduct the trial with adequate statistical power. The ICC ranges from complete correlation (ICC = 1) to no correlation (ICC = 0). In the extreme case of an ICC of 1, all participants in a cluster are likely to have exactly the same outcome; thus, sampling 1 participant from that cluster is as informative as sampling the whole cluster. In other words, each cluster contributes a single data point to the study, and the effective sample size for the study is the number of clusters. On the other hand, if participants in a cluster behave essentially independently of each other and their outcomes are no more related than if they were from different clusters, then the ICC is 0 and the available sample size for the study is the total number of participants. There is a substantial literature on taking account of the ICC when determining sample size requirements for cluster randomized trials. It is critical to have preliminary data that provide some estimate of the likely ICC.
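
The standard design effect for equal cluster sizes, DE = 1 + (m − 1) × ICC, makes this continuum concrete: dividing the total number of participants by DE gives the effective sample size. A minimal sketch with illustrative numbers:

    # Effective sample size under the design effect 1 + (m - 1) * ICC,
    # for n_clusters clusters of equal size m.
    def effective_n(n_clusters, m, icc):
        total = n_clusters * m
        return total / (1 + (m - 1) * icc)

    print(effective_n(20, 100, icc=0.0))   # 2000.0: fully independent
    print(effective_n(20, 100, icc=0.05))  # ~336: the clustering penalty
    print(effective_n(20, 100, icc=1.0))   # 20.0: one data point per cluster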

The level at which to cluster creates a trade-off between potential contamination and bias on the one hand and available sample size on the other. For example, in one NIH Collaboratory Trial, the researchers originally intended to randomize at the provider level. However, because the providers shared clinic staff and facilities, a preliminary evaluation of the ICC demonstrated more than negligible correlation in potential outcomes among providers within the same clinic. The researchers therefore needed to randomize at the clinic level rather than the provider level and to recruit additional clinics to meet their expectations for statistical power. In contrast, another NIH Collaboratory Trial study team intended to randomize at the clinic level but found negligible correlation in outcomes between providers within clinics and was able to randomize providers.

Much depends on the type of intervention, and it is always prudent to obtain a preliminary estimate of the ICC before planning the study.

Accounting for the ICC in the Analysis

When individual outcomes are recorded within a cluster, it is important to understand how the outcomes relate to the primary hypothesis of the study and what exactly is to be tested and/or estimated. If conclusions are to be made at a cluster level, the individual outcomes might be aggregated into a summary measure for each cluster and the analysis performed at the cluster level. This approach eliminates the need to address the ICC directly, but it assumes that either there is little variation in cluster sizes or the size of the cluster is relatively unimportant. The analysis will place equal weight on each cluster, regardless of its size, unless a weighting mechanism is used.
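
A minimal sketch of this cluster-level approach, with invented data and an unweighted t test on cluster summaries:

    # Collapse individual outcomes to one summary per cluster, then
    # compare arms with an ordinary two-sample t test.
    import pandas as pd
    from scipy.stats import ttest_ind

    df = pd.DataFrame({                  # hypothetical individual data
        "cluster": [1, 1, 2, 2, 3, 3, 4, 4],
        "arm":     ["A", "A", "A", "A", "B", "B", "B", "B"],
        "outcome": [1, 0, 1, 1, 0, 0, 1, 0],
    })

    summary = df.groupby(["cluster", "arm"])["outcome"].mean().reset_index()
    a = summary.loc[summary["arm"] == "A", "outcome"]
    b = summary.loc[summary["arm"] == "B", "outcome"]
    print(ttest_ind(a, b))  # each cluster contributes one data point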

For analyses performed at the individual level, random-effects models or generalized estimating equations are typically used. Again, the interpretation of the results hinges on the type of analysis, and it is important for the investigators to discuss their hypotheses clearly with the study statisticians.
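
For the individual-level alternative, one possible implementation uses generalized estimating equations in statsmodels, here assuming a binary outcome, an exchangeable within-cluster correlation structure, and the small DataFrame from the sketch above (a real analysis would involve far more clusters):

    # Individual-level analysis with GEE and robust standard errors.
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    model = smf.gee(
        "outcome ~ arm",
        groups="cluster",                        # cluster identifier
        data=df,
        family=sm.families.Binomial(),           # binary outcome
        cov_struct=sm.cov_struct.Exchangeable(),
    )
    print(model.fit().summary())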


Resources

What Is the Intraclass Correlation Coefficient?
Two-minute training module from the NIH Pragmatic Trials Collaboratory’s video library

The Intraclass Correlation Coefficient (ICC)
Guidance document from the Biostatistics and Study Design Core

Intraclass Correlation Coefficient Cheat Sheet
Introductory description of the ICC

The Intraclass Correlation Coefficient as a Pie Eating Contest
Video tutorial by NIH Collaboratory investigator Dr. Greg Simon

Intraclass Correlation Coefficients for Cluster Randomized Trials With Pain Outcomes
Working document from the NIH-DOD-VA Pain Management Collaboratory Biostatistics/Design Work Group


The Perils and Pitfalls of Complex Clustering in Pragmatic Trials
NIH Pragmatic Trials Collaboratory PCT Grand Rounds; November 3, 2023


Version History

April 30, 2024: Made nonsubstantive changes to the text and added items to the Resources sidebar as part of the annual content update (changes made by D. Seils).

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added an item to the Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

May 1, 2020: Made nonsubstantive changes to the Resources sidebar as part of the annual content update (changes made by D. Seils).

January 16, 2019: Embedded a video on intraclass correlation, added a resource to the Resource box, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Experimental Designs and Randomization Schemes


Section 9

Additional Resources – ARCHIVED

This biostatistical research tool set includes a series of guidance documents developed by the Collaboratory Biostatistics and Study Design Core. These documents, which focus on detailed aspects of statistical design for conducting pragmatic clinical trials, provide a synthesis of current developments, discuss possible future directions, and, where appropriate, make recommendations for application to pragmatic clinical research.

In 2019, the NIH Health Care Systems Research Collaboratory held a comprehensive workshop to explore and discuss statistical issues encountered with embedded pragmatic clinical trials (ePCTs). The Workshop Summary describes panel discussions with the principal investigators and statisticians of NIH Collaboratory Trials and the challenges and solutions encountered during the design and analysis of their trials.

The 4 panel discussions covered the following topics:

  • Measurement and Data: Outcomes, Exposures, and Subgroups Based on EHR Data
  • To Cluster or Not to Cluster?
  • Choosing a Parallel Group or Stepped-Wedge Design
  • Unique Complications

This Workshop Summary also provides lessons learned and recommends tools to help others design and analyze future ePCTs. For more on the design and analysis of pragmatic clinical trials, see the tools provided by the Biostatistics and Study Design Core.

The Biostatistics and Study Design Core wishes to thank David M. Murray, PhD, Director, Office of Disease Prevention, National Institutes of Health, for his invaluable input into the creation of the research tools.



Version History

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

April 21, 2020: Added information about the NIH Collaboratory Workshop on the Design and Analysis of Embedded Pragmatic Trials (changes made by K. Staman).

Published August 25, 2017

Experimental Designs and Randomization Schemes


Section 4

Randomization Methods – ARCHIVED

As with individually randomized trials, a number of considerations need to be addressed up front for CRTs to avoid downstream problems. In particular, potential confounding is always an issue. For example, if elderly patients are more likely than younger patients to acquire nosocomial infections, it would be important to ensure that one of the arms of the trial is not more likely to consist of elderly patients. In this example, if the clusters are hospital wards, there should be some assurance of balance in the average ages of the wards assigned to one arm compared to the other. Sometimes there are several potential confounders that might play a role.

Stepped-Wedge Designs

When a PCT employs cluster randomization, the simplest approach is parallel randomization, in which clusters are randomly assigned to the intervention or control condition throughout the trial with no crossover. In some studies, preparing all of the intervention clusters to start the intervention at the same time may not be feasible. The stepped-wedge CRT design overcomes this problem by gradually introducing the intervention to groups of clusters over time. Clusters are divided into several groups, usually 4 to 6, and all clusters start the trial in the control condition. Groups of clusters cross over to the intervention condition in random order and on a staggered schedule, and all groups receive the intervention before the end of the trial (see figure below). The stepped-wedge design also has the benefit of allowing researchers to record and incorporate into the analyses changes to the hospital or health system that happen over time and have the potential to affect the study (Cook et al 2016).

[Figure: Stepped-Wedge Cluster Randomization Example]

Because all sites eventually will receive the intervention, a stepped-wedge trial may be more appealing to the broader community and thus lead to more robust study participation, especially in cases where the intervention seems particularly promising (Hussey and Hughes 2007).
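
The staggered schedule can be represented as a simple assignment matrix, with one row per group of clusters and one column per time period. The sketch below is illustrative only; in a real trial, cluster groups would be randomly assigned to the rows:

    # Stepped-wedge assignment matrix: 1 = intervention condition.
    import numpy as np

    def stepped_wedge(n_groups, n_periods):
        # All groups start in control; group g crosses over at period g + 1.
        design = np.zeros((n_groups, n_periods), dtype=int)
        for g in range(n_groups):
            design[g, g + 1:] = 1
        return design

    print(stepped_wedge(n_groups=4, n_periods=5))
    # [[0 1 1 1 1]
    #  [0 0 1 1 1]
    #  [0 0 0 1 1]
    #  [0 0 0 0 1]]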

Case Example: The LIRE Study

The Pragmatic Trial of Lumbar Imaging With Reporting of Epidemiology (LIRE), an NIH Collaboratory Trial, is testing an intervention that inserts epidemiologic benchmarks into reports from lumbar spine imaging. LIRE will seek to determine whether this simple informational intervention can reduce subsequent testing and treatments that may not provide any benefit to patients. LIRE is an example of a stepped-wedge CRT (Jarvik et al 2015). Clinics in 4 large healthcare systems are randomized to initiate the intervention as part of 1 of 5 "waves" corresponding to prespecified calendar dates.

Pair Matching and Stratification With Cluster Designs

Two popular mechanisms for achieving balance are pair matching and stratification. With pair matching, clusters are paired in terms of their potential confounders and then within each pair, one cluster is randomized to receive one of the arms and the other cluster receives the opposite arm. For example, considering age and sex as potential confounders, clusters would be matched into pairs such that the average age and the percent female are approximately equal. Likewise, the sizes of the 2 clusters should be similar. Stratification is a generalization of pair matching in that strata are formed based on the potential confounders; within each stratum, a randomization scheme that ensures balance is developed. For example, if there are 11 clusters in one stratum, the randomization would assign 5 clusters to one arm and 6 to the other. However, when there are several confounders, it can be difficult to use stratification or pair matching.
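
A minimal sketch of pair matching on a single cluster-level score follows; the clinic values are hypothetical, and a real study would match on several confounders as well as cluster size:

    # Pair clusters on a summary confounder, then randomize within pairs.
    import random

    clinic_age = {"A": 64.2, "B": 58.1, "C": 63.8,
                  "D": 59.0, "E": 70.5, "F": 69.9}  # mean patient age

    ordered = sorted(clinic_age, key=clinic_age.get)  # sort by the score
    random.seed(2024)                                 # for reproducibility
    for i in range(0, len(ordered), 2):
        pair = [ordered[i], ordered[i + 1]]           # adjacent clinics
        random.shuffle(pair)                          # coin flip in pair
        print(f"pair {i // 2 + 1}: {pair[0]} -> intervention, "
              f"{pair[1]} -> control")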

Constrained Randomization

Another method that is increasingly being studied and implemented for CRTs is constrained randomization (Li et al 2016). Because all of the clusters are identified before randomization, each can be characterized in terms of the levels of several potential confounders. For any possible randomization of this set of clusters, a balance metric (several exist) is applied to “measure” the amount of imbalance that would exist if that particular randomization were applied. It is possible to generate a large number of potential randomization schemes; in fact, with very few clusters, every possible randomization scheme can be tabulated in this way, along with its balance score. By some predefined criterion, such as a certain percentage of all possible randomizations, the set of schemes with the least imbalance is chosen as the “randomization space.” From this “randomization space,” a single randomization scheme is selected. Many statistical issues related to this strategy are still being explored.
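
The sketch below illustrates the mechanics on a toy example: it enumerates every 4-versus-4 allocation of 8 clusters, scores each with a simple balance metric (absolute difference in arm means of one covariate), keeps the best-balanced 10% as the randomization space, and draws one scheme from it. Real applications use several covariates and more refined balance metrics (Li et al 2016):

    # Constrained randomization on a single cluster-level covariate.
    import random
    from itertools import combinations

    ages = [62.0, 58.5, 64.1, 59.9, 70.2, 66.4, 61.3, 57.8]  # 8 clusters

    def imbalance(arm_a):
        a = [ages[i] for i in arm_a]
        b = [ages[i] for i in range(len(ages)) if i not in arm_a]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    schemes = [set(c) for c in combinations(range(len(ages)), 4)]
    schemes.sort(key=imbalance)                    # best balanced first

    space = schemes[: max(1, len(schemes) // 10)]  # constrained space
    random.seed(2024)
    chosen = random.choice(space)
    print(f"intervention clusters: {sorted(chosen)}")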

For additional information about considerations affecting study design decisions, see also Designing With Implementation and Dissemination in Mind.


REFERENCES


Cook AJ, Delong E, Murray DM, Vollmer WM, Heagerty PJ. 2016. Statistical lessons learned for designing cluster randomized pragmatic clinical trials from the NIH Health Care Systems Collaboratory Biostatistics and Design Core. Clin Trials. 13:504-512. doi:10.1177/1740774516646578. PMID: 27179253.

Hussey MA, Hughes JP. 2007. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 28:182-191. doi:10.1016/j.cct.2006.05.007. PMID: 16829207.

Jarvik JG, Comstock BA, James KT, et al. 2015. Lumbar Imaging With Reporting Of Epidemiology (LIRE)--Protocol for a pragmatic cluster randomized trial. Contemp Clin Trials. 45(Pt B):157-163. doi:10.1016/j.cct.2015.10.003. PMID: 26493088.

Li F, Lokhnygina Y, Murray DM, Heagerty PJ, DeLong ER. 2016. An evaluation of constrained randomization for the design and analysis of group-randomized trials. Stat Med. 35:1565-1579. doi:10.1002/sim.6813. PMID: 26598212.


Version History

July 7, 2020: Moved down the LIRE case example (changes made by D. Seils).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Revised and moved up the Stepped-Wedge Designs subsection, added the “Part 3. Analysis Approaches” online tutorial to the Resources sidebar, added Heagerty to the contributors list, and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

January 16, 2019: Updated the introduction and the description of stepped-wedge designs, embedded a video on randomization methods, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Experimental Designs and Randomization Schemes


Section 7

Concealment and Blinding – ARCHIVED

Whenever feasible, randomized trials should incorporate mechanisms for safeguarding the identity of the treatment assignment from investigators, study staff, and participants. In traditional individually randomized trials, the assignment is concealed from the investigators before randomization to protect against treatment selection bias. For example, in an individually randomized trial, a provider may believe that one of the “treatments” would not be good for elderly patients. If the provider is able to predict the next treatment assignment, the provider may hold an elderly patient back from randomization, or may “randomize” such patients only when the other treatment is likely to be assigned.

After randomization, blinding (also called "masking") is used to guard against a placebo effect and/or biased outcome assessment. It is most important for the primary outcome to be measured and recorded objectively, without knowledge of the actual treatment assignment. Trials may be single-blind, in which the participant does not know whether they are receiving the experimental therapy or the comparator; double-blind, in which the physician and staff also do not have access to this information but statistical analysis personnel do; or triple-blind, in which statisticians are also unaware of which participants have been assigned to which treatments.

With CRTs, concealment is not usually an issue because all of the clusters are identified ahead of time and are randomized at the same time. Hence, unlike individually randomized trials in which participants enter the study over time, there is no opportunity to “predict” the assignment and alter behavior accordingly. However, it is important to obtain written assurances before randomization that each cluster will comply with the assigned strategy for the duration of the study.

Blinding, on the other hand, is usually either not possible or not practical in CRTs. Most interventions that call for cluster randomization need to be disclosed to those who are implementing them. However, it is important to maintain as much objectivity as possible in recording the outcome assessments. For example, in the TiME trial (an NIH Collaboratory Trial of hemodialysis session duration), objective measures such as hospitalizations and mortality should not be collected differentially for the 2 arms of the trial. When evaluating quality of life in such a trial, care must be taken that the measures are not elicited by referring to the conditions of the trial, such as starting a questionnaire with "Did you find that your dialysis time influenced how you feel on a daily basis?" Likewise, for a handwashing campaign, the assessment of nosocomial infection should be conducted according to objective criteria applied in the same way in both arms.



Version History

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

January 16, 2019: Made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Experimental Designs and Randomization Schemes


Section 5

Choosing Between Cluster and Individual Randomization – ARCHIVED

Although CRT designs confer certain advantages in conducting PCTs, they are also characterized by significant theoretical limitations and implementation challenges (Torgerson 2001), and careful consideration is needed before settling on a particular approach to randomization. The following assessment questions, adapted from Designing Multi-Center Cluster-Randomized Trials: An Introductory Toolkit (2014), may help clarify whether a CRT represents an appropriate design choice for answering a particular research question:

  • Is the phenomenon of interest something that takes place primarily at the level of the individual patient or study participant (for example, response to an experimental drug or comparator)? If so, a traditional RCT design may be most appropriate. However, if the phenomenon of interest affects individual patients but is primarily taking place at a different level (for example, whether implementation of new physician treatment guidelines is yielding better patient outcomes), a CRT design may be more appropriate.
  • Is the proposed intervention delivered at the level of a group or organization rather than an individual study participant? For example, a study that investigated whether adoption of a new treatment guideline affected the efficiency of service provision across hospitals in a health system would lend itself to a cluster design.
  • If individual participants were randomized, would it be difficult for physicians or other clinical staff to modify their approaches or behaviors in ways that avoid contamination? Similarly, is it likely that participants or study staff might have occasion or opportunity to discuss details of the study (“compare notes”) among themselves? If so, a cluster randomized approach may be preferable for ensuring trial validity.

Finally, it is important to consider that any cluster randomized design will introduce an important statistical effect known as clustering. When several participants (a cluster) are subjected to similar circumstances that may differ from those of other clusters, such as patients in the same ward being treated by the same providers, their outcomes can be correlated. (See also "Intraclass Correlation" under Analysis Plan). This important consideration should be addressed both when randomizing and when calculating the required sample size for a given study.

For additional information about design considerations, see Designing With Implementation and Dissemination in Mind.


REFERENCES


Torgerson DJ. 2001. Contamination in trials: is cluster randomisation the answer? BMJ. 322:355-357. doi:10.1136/bmj.322.7282.355. PMID: 11159665.


Version History

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

May 5, 2020: Added the Resources sidebar as part of the annual content update (changes made by D. Seils).

January 16, 2019: Made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Experimental Designs and Randomization Schemes


Section 3

Cluster Randomized Trials – ARCHIVED

Cluster randomized trials (CRTs) differ from individually randomized RCTs in that the unit of randomization is something other than the individual participant or patient. CRTs are in common use in areas such as education and public health research; they are particularly well suited to testing differences in a method or approach to patient care (as opposed to evaluating the physiological effects of a specific intervention).

Watch the video module: Understanding Clustering in Cluster Randomized Trials

Why Choose Cluster Randomization?

There are several reasons why CRT designs might be preferred to or more suitable than a traditional RCT. First, a CRT might be preferred when the target of the intervention is a collective or system rather than a particular person, such as a patient. For example, while a traditional RCT may be better suited to determining whether a novel therapy works in patients with a given disease or condition, a CRT is better able to evaluate whether a new standard of care, guideline recommendation, or other practice-wide, hospital-wide, or system-wide change is affecting patient outcomes. Second, a CRT might be preferred when there is a significant potential for contamination in the study. Contamination occurs when aspects of an intervention are adopted by members of the group that was randomized to not receive that intervention. (See also "What Is Contamination, and Why Does it Matter?" immediately below).

There are also compelling practical reasons for randomizing clusters rather than individuals (Cook et al 2016). For example, in a trial comparing 12-hour nursing shifts to 8-hour shifts, implementing these protocols on a patient-specific level would be nearly impossible. In this case, randomizing wards or floors would be much more practical and would also accommodate the need to avoid contamination.

What Is Contamination, and Why Does it Matter?

The most compelling reason to randomize at the cluster level rather than at the individual level is the potential for contamination: because participants within a cluster are treated in shared settings by shared staff, elements of an intervention assigned to some individuals would readily spread to others in the same cluster who were randomized not to receive them. Randomizing intact clusters keeps intervention and control participants apart.

When contamination occurs during a clinical trial, it will dilute the observed differences between comparators and can affect the reliability and validity of the study.

Case Examples: Contamination

  • Example 1: Participants who share the same provider in a trial comparing different weight-loss strategies may meet each other in the waiting room and communicate about their respective strategies, or the provider might not be able to switch coaching approaches according to each participant's randomized assignment. Some participants in each group might even adopt elements of both strategies, and neither group would demonstrate the impact of its intended strategy. Randomization at the provider level, with each provider coaching only one of the strategies, would reduce the risk of contamination.
  • Example 2: A trial evaluating a campaign designed to reduce nosocomial infections by encouraging better staff handwashing practices might include posters in each of the rooms. Staff generally cover several rooms on a floor and would be exposed to the posters, which would likely change their behavior if the posters were actually effective. Not only would it be infeasible to randomize at the provider or patient level, doing so would minimize the difference between groups due to the contamination. The campaign might then be declared unsuccessful despite actually having had a positive effect. The solution would be to randomize different areas of the hospital (taking care to consider potential confounding as described in the coming sections) with only half of the areas receiving the posters.

Although avoidance of contamination is one of the most important reasons for using CRT designs, pragmatic concerns can dominate the need for cluster randomization when it is practically impossible to randomize at an individual level.


REFERENCES


Cook AJ, Delong E, Murray DM, Vollmer WM, Heagerty PJ. 2016. Statistical lessons learned for designing cluster randomized pragmatic clinical trials from the NIH Health Care Systems Collaboratory Biostatistics and Design Core. Clin Trials. 13:504-512. doi:10.1177/1740774516646578. PMID: 27179253.


Version History

January 22, 2021: Added embedded video (change made by G. Uhlenbrauck).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Added Heagerty to the contributors list and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

January 16, 2019: Added a resource to the Resources box and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017