Experimental Designs and Randomization Schemes
Designing to Avoid Identification Bias
For the NIH Health Care Systems Collaboratory Biostatistics and Study Design Core
Damon M. Seils, MA
Many pragmatic clinical trials (PCTs) rely on data from electronic health records, including screening measures and diagnosis codes collected as part of routine clinical care, to identify the study population of interest. Cluster randomized trials that use information from electronic health records to determine study population eligibility are prone to selection bias if the study interventions influence who undergoes screening or receives a diagnosis in clinical care. For example, if eligibility is determined by who receives a diagnosis during clinical care, patients identified as eligible in the intervention group may differ from patients identified in the usual care group, possibly with respect to important predictors of the study outcome, such as symptom severity.
Interventions that have the potential for selection bias can influence screening by design (ie, interventions specifically intended to increase screening or diagnosis rates) or as a byproduct (eg, an intervention to train physicians about how to treat a condition, thereby increasing awareness and yielding higher screening or diagnosis rates). This type of selection bias, which has been referred to as identification bias (Eldridge et al, 2009; Eldridge et al, 2016), can be attributed to selecting the analysis sample after randomization of the clusters (see Figure). This phenomenon is analogous to selection bias that can occur in standard randomized clinical trials when the analysis is conditioned on a variable measured after randomization that could be influenced by the intervention assignment, such as in a complete case analysis (Mansournia et al, 2017; Hernán et al, 2004).
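The mechanism described above can be illustrated with a small simulation. The following is a minimal sketch, not an analysis from any of the trials discussed here: the number of clusters, the severity distribution, and the diagnosis thresholds are all hypothetical assumptions. The simulated intervention has no effect on the outcome and only lowers the severity threshold at which patients receive a diagnosis.

```python
import random
from statistics import mean


def simulate_identification_bias(n_clusters=20, patients_per_cluster=500, seed=2024):
    """Simulate a null intervention that only changes who gets diagnosed."""
    rng = random.Random(seed)

    # Randomize clusters (not patients) to the two arms.
    clusters = list(range(n_clusters))
    rng.shuffle(clusters)
    intervention_clusters = set(clusters[: n_clusters // 2])

    # Hypothetical assumption: the intervention lowers the latent severity
    # at which a patient is diagnosed during routine care.
    threshold = {"usual care": 1.0, "intervention": 0.0}

    all_outcomes = {"usual care": [], "intervention": []}
    diagnosed_outcomes = {"usual care": [], "intervention": []}
    for cluster in range(n_clusters):
        arm = "intervention" if cluster in intervention_clusters else "usual care"
        for _ in range(patients_per_cluster):
            severity = rng.gauss(0.0, 1.0)
            # The outcome depends on severity only: no true treatment effect.
            outcome = severity + rng.gauss(0.0, 0.5)
            all_outcomes[arm].append(outcome)
            if severity > threshold[arm]:
                diagnosed_outcomes[arm].append(outcome)

    def arm_difference(outcomes):
        return mean(outcomes["intervention"]) - mean(outcomes["usual care"])

    return {
        "diff_all_patients": arm_difference(all_outcomes),
        "diff_diagnosed_only": arm_difference(diagnosed_outcomes),
    }


result = simulate_identification_bias()
print(result)
```

Under these assumptions, the difference between arms among all patients is close to zero, whereas the difference among diagnosed patients is clearly negative, because the intervention arm identifies less severe patients. The sketch reports point estimates only and does not account for clustering in any inference.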
In this section, we provide 3 examples of cluster randomized PCTs in which identification bias may be a concern. We describe possible approaches to addressing identification bias in these trials and outline considerations for the design and analysis of PCTs when there is the potential for differential selection of study participants depending on the intervention assignment.
Figure. Schematic illustrating situations in which an analysis may be affected by identification bias in (A) a parallel-group design or (B) a stepped-wedge design. Gray areas indicate assignment to the intervention group; white areas correspond to the control group.
* Stepped-wedge trials randomly assign clusters to the sequence of the intervention that will be given.
PROUD Cluster Randomized Trial
The Primary Care Opioid Use Disorders Treatment (PROUD) trial (NCT03407638) is a pragmatic, cluster randomized, parallel-group trial to evaluate a program for increasing medication-based treatment of patients with opioid use disorders in 12 clinics within 6 healthcare systems. A main objective of the study is to evaluate whether the intervention reduces the use of acute care services among patients with opioid use disorders.
The intervention in the PROUD trial is expected to increase diagnoses of opioid use disorders among patients previously seen in the clinics, and to attract new patients not previously seen in the clinics or the healthcare system overall. To capture all patients who may be affected by the intervention, the study team considered including all patients with a diagnosis of opioid use disorders in the primary analysis, including those diagnosed after randomization. However, because patients diagnosed after their clinic has been assigned to the intervention group may differ from patients diagnosed after their clinic has been assigned to the control group (ie, they may be either sicker or healthier), an analysis that includes all patients could lead to bias in estimates of treatment effect. On the other hand, opioid use disorders are underdiagnosed, meaning that only a small proportion of persons affected by opioid use disorders receive a diagnosis documented in the electronic health record. An analysis restricted to patients with a diagnosis before randomization, while avoiding the potential for identification bias, may therefore not reflect the broader population of patients with opioid use disorders who could be affected by the intervention.
To address these trade-offs, the primary analysis in the PROUD trial will include patients with a diagnosis of opioid use disorders before randomization to avoid the potential for selection bias, and secondary analyses will consider patients diagnosed after randomization. (Power calculations for the study indicated there would be sufficient statistical power in the primary analysis of patients diagnosed before randomization, even though opioid use disorders are underdiagnosed.) To address the potential for identification bias, the secondary analyses will adjust for measured variables associated with the outcome and with differences across study arms in who is newly diagnosed after randomization. A sensitivity analysis will explore the potential for selection bias due to unmeasured factors.
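As a rough illustration of how such analysis populations might be separated, the sketch below splits patients by whether their first documented diagnosis precedes randomization. It is not the PROUD study's code: the `Patient` fields, the `split_cohorts` function, and the randomization date are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Patient:
    patient_id: str
    clinic_id: str
    # Date of first diagnosis documented in the EHR; None if never diagnosed.
    first_diagnosis_date: Optional[date]


def split_cohorts(
    patients: List[Patient], randomization_date: date
) -> Tuple[List[Patient], List[Patient]]:
    """Return (prerandomization cohort, postrandomization cohort)."""
    prerandomization, postrandomization = [], []
    for patient in patients:
        if patient.first_diagnosis_date is None:
            continue  # never diagnosed: not identifiable from diagnosis codes
        if patient.first_diagnosis_date < randomization_date:
            prerandomization.append(patient)
        else:
            postrandomization.append(patient)
    return prerandomization, postrandomization


# Hypothetical example data and randomization date.
patients = [
    Patient("p1", "clinic_1", date(2017, 6, 15)),
    Patient("p2", "clinic_1", date(2018, 9, 1)),
    Patient("p3", "clinic_2", None),
]
primary_cohort, post_cohort = split_cohorts(patients, date(2018, 3, 1))
print([p.patient_id for p in primary_cohort], [p.patient_id for p in post_cohort])
```

In this sketch, a primary analysis would use only `primary_cohort`, and a secondary analysis could add `post_cohort` with adjustment for measured covariates.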
SPARC Stepped-Wedge Trial
The Sustaining Patient-Centered Alcohol-Related Care (SPARC) trial (NCT02675777) is a pragmatic, stepped-wedge trial to evaluate a program integrating alcohol-related care into primary care compared with usual primary care in 22 primary care clinics at Kaiser Permanente Washington. One of the main study outcomes is treatment for alcohol use disorders documented in the electronic health record.
Before the trial began, screening rates for unhealthy alcohol use were lower than 20%. Because a key component of the SPARC intervention is to screen all patients in primary care, screening rates are anticipated to greatly increase in the intervention arm, with a target of 80%. As a result of the increased screening, diagnosis rates for alcohol use disorders are also expected to increase. Expansion to near universal screening of patients in primary care will likely lead to differences in characteristics of patients diagnosed with alcohol use disorders in the intervention period compared to the usual care period. Consequently, an analysis of the entire study population of patients with a diagnosis (including both pre- and postrandomization diagnoses) to evaluate changes in treatment rates could be affected by selection bias. On the other hand, an analysis restricted to patients diagnosed before randomization would reflect a small, highly selected population that may not be generalizable to the broader population of persons with alcohol use disorders.
Another risk of restricting the analysis to patients identified before randomization in a stepped-wedge design (Figure) is the potential loss of statistical power resulting from some patients leaving their assigned clinic before the intervention period, which becomes increasingly likely at clinics that implement the intervention later in the study. An alternative approach would be to define a separate "baseline" period for each cluster (eg, a 1-year period before crossing over from control to intervention) and to identify patients for inclusion in analyses using data from this baseline period. Because baseline periods for some clusters would occur after the crossover time for other clusters, such an approach may still be affected by identification bias (eg, if patients from a clinic with a later crossover time receive care in a clinic that has already crossed over to the intervention period).
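The overlap between cluster-specific baseline periods and other clusters' intervention periods can be made concrete with a short calculation. The sketch below assumes a 1-year baseline window ending at each clinic's crossover date; the clinic names and crossover dates are hypothetical and are not the SPARC schedule.

```python
from datetime import date, timedelta


def baseline_overlap_days(crossover_dates, baseline_days=365):
    """For each clinic, count the days of its baseline window during which
    each other clinic had already crossed over to the intervention."""
    overlaps = {}
    for clinic, crossover in crossover_dates.items():
        window_start = crossover - timedelta(days=baseline_days)
        clinic_overlaps = {}
        for other, other_crossover in crossover_dates.items():
            # Another clinic's intervention period runs from its crossover
            # date onward, so it overlaps this baseline window whenever that
            # clinic crossed over earlier.
            if other != clinic and other_crossover < crossover:
                overlap_start = max(window_start, other_crossover)
                clinic_overlaps[other] = (crossover - overlap_start).days
        overlaps[clinic] = clinic_overlaps
    return overlaps


# Hypothetical crossover dates for three clinics.
crossover_dates = {
    "clinic_A": date(2016, 1, 1),
    "clinic_B": date(2016, 5, 1),
    "clinic_C": date(2016, 9, 1),
}
overlaps = baseline_overlap_days(crossover_dates)
print(overlaps)
```

Every clinic except those in the first wave has some overlap, and the overlap grows for clinics that cross over later. This is the window in which patients could be exposed to the intervention at another clinic before being identified at their own.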
To address these competing concerns, the planned SPARC analysis will include the entire population of patients who visit the clinic (before and after randomization) as the denominator for the main study outcomes, because the intervention is not expected to alter this population. Although the effect of the SPARC program on changing rates of treatment for alcohol use disorders is diluted in the entire population (since most patients do not have alcohol use disorders), power calculations indicated sufficient statistical power to detect hypothesized changes in the main study outcomes.
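The dilution that results from using the whole clinic population as the denominator can be shown with simple arithmetic. The sketch below uses hypothetical numbers, not SPARC estimates, and assumes that only patients with the condition can receive treatment and that the proportion of patients with the condition is the same in both periods.

```python
from math import isclose


def whole_population_rates(prevalence, rate_usual_care, rate_intervention):
    """Compare effects among affected patients with effects in the whole
    clinic population, assuming only affected patients are ever treated."""
    diluted_usual = prevalence * rate_usual_care
    diluted_intervention = prevalence * rate_intervention
    return {
        "difference_among_affected": rate_intervention - rate_usual_care,
        "difference_whole_population": diluted_intervention - diluted_usual,
        "rate_ratio": rate_intervention / rate_usual_care,
    }


# Hypothetical inputs: 5% of clinic patients have the condition, and the
# treatment rate among them rises from 10% to 20%.
example = whole_population_rates(
    prevalence=0.05, rate_usual_care=0.10, rate_intervention=0.20
)
print(example)
```

With these inputs, a 10 percentage point difference among affected patients corresponds to a 0.5 percentage point difference in the whole population, while the rate ratio is unchanged. The smaller absolute difference is why power calculations for the whole-population denominator matter.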
STOP CRC Cluster Randomized Trial
The Strategies and Opportunities to Stop Colorectal Cancer in Priority Populations (STOP CRC) trial (NCT01742065), one of the NIH Collaboratory Demonstration Projects, was a cluster randomized, parallel-group trial conducted in 26 Federally Qualified Health Center clinics. The objective of the trial was to test the effectiveness of a health system–level intervention designed to improve colorectal cancer screening rates. The intervention involved giving clinic staff access to an electronic health record registry tool to facilitate sending fecal immunochemical test (FIT) kits to patients who were not up to date with US Preventive Services Task Force guidelines for colorectal cancer screening.
The analytic sample in the STOP CRC trial included all patients who were not up to date with colorectal cancer screening guidelines at the time of randomization, as well as those who became out of date at any point during the subsequent 12 months. It is the inclusion of this latter group that allows for the possibility of selection bias. For example, suppose some patients are more likely than others to complete colorectal cancer screening (ie, some patients are "more predisposed to health screening" and some are "less predisposed to health screening"). If clinic staff in the intervention group become more proactive about colorectal cancer screening and begin distributing FIT kits to patients before they become overdue for screening, and a higher proportion of the "more predisposed" patients complete the screening, then patients who become out of date with screening (and therefore eligible to be included in the primary analysis) would be more likely to be those patients who are less predisposed to health screening. In other words, the pool of patients most likely to return a FIT kit could become depleted in the intervention group before they become eligible for the analysis. This could lead to bias in the estimated intervention effect, in this case making the intervention effect look artificially low.
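The depletion argument above can be checked with a short expected-value calculation. This is a sketch under hypothetical assumptions, not STOP CRC data: the share of "more predisposed" patients, the completion rates, the true effect, and the early-completion probabilities are invented for illustration, and the calculation covers only patients who would become overdue during follow-up.

```python
from math import isclose


def observed_effect_after_depletion(share_more, base_rate, true_effect, early_completion):
    """Expected difference in completion rates among patients who become
    overdue, when early completers in the intervention arm are never eligible.

    base_rate and early_completion are dicts keyed by "more" and "less".
    """
    share = {"more": share_more, "less": 1.0 - share_more}

    # Usual care: no early outreach, so every patient becomes overdue.
    usual_care_rate = sum(share[t] * base_rate[t] for t in share)

    # Intervention: patients who complete screening early never become
    # overdue, so they drop out of the analytic sample.
    remaining = {t: share[t] * (1.0 - early_completion[t]) for t in share}
    total_remaining = sum(remaining.values())
    intervention_rate = sum(
        remaining[t] / total_remaining * (base_rate[t] + true_effect) for t in share
    )
    return intervention_rate - usual_care_rate


true_effect = 0.20  # hypothetical true increase in completion probability
observed = observed_effect_after_depletion(
    share_more=0.5,
    base_rate={"more": 0.5, "less": 0.1},
    true_effect=true_effect,
    early_completion={"more": 0.6, "less": 0.1},
)
print(round(observed, 3), "observed vs", true_effect, "true")
```

With these inputs, the observed difference among patients who become overdue is smaller than the true effect, because the intervention arm's eligible pool is enriched for "less predisposed" patients.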
The study's analysis plan addressed this potential bias in 2 ways. First, the primary analysis adjusted for several patient-level factors that could be associated with the outcome or with completing screening before becoming overdue. This adjustment addresses potential selection bias due to measured factors but not unmeasured factors. Second, the investigators took advantage of the fact that the intervention was rolled out to control clinics in the second year after randomization to conduct a secondary analysis that used a stepped-wedge analysis approach. This analysis evaluated changes in the clinics’ National Quality Forum colorectal cancer screening scores for the year before randomization and for the 2 years after randomization. The denominators for these scores include all patients who qualify for colorectal cancer screening in each year, regardless of whether they were out of compliance, and hence should avoid this source of potential selection bias. A downside of this analysis is that it could dilute the intervention effect by including fully compliant patients who never qualified for the intervention.
Recommendations for Addressing Identification Bias
- Consider defining the study population for the primary analysis based on baseline (ie, prerandomization) data. This could include using the entire clinic population as the denominator, or a relevant subgroup, such as patients with a prior diagnosis for the condition under study. The choice of subgroup will depend on statistical properties (eg, power) and scientific considerations (eg, interpretability of effect estimates).
- If the study population to be included in analyses is defined using postrandomization data, consider whether some choices of study population are less likely to be affected by treatment assignment. One possibility is to use the entire clinic population. For a trial that is not expected to change the population screened, an analysis of screened patients may be a good choice; likewise, for a trial that is not expected to change the population screening positive, an analysis of screened-positive patients may be a good choice.
- If the study population is defined based on postrandomization data, assess whether it is scientifically plausible that identification of the population may be affected by the treatment assignment. For example, if the screen used in both arms is the same and can also be used to assess symptom severity, the severity of symptoms in the eligible population in each of the arms could be compared. If the intervention plausibly changes who is identified as eligible, then using this postrandomization study population in secondary (rather than primary) analyses may be preferred, as the analysis may be subject to selection bias. Adjusting for baseline characteristics among patients identified for inclusion in the study that differ across treatment arms can control for selection bias due to measured factors, but treatment effect estimates may still be biased due to unmeasured factors. Additional sensitivity analysis methods may be applied to investigate how treatment effect estimates vary across plausible values of the magnitude (and direction) of the potential selection bias (National Research Council, 2010).
- Depending on the scientific question and ethical considerations, alternatives to using usual care as a comparator that will induce a similar mechanism for identifying the study population across treatment arms may be explored (eg, comparing screening plus treatment A to screening plus treatment B).
- Following recommendations for reporting of cluster randomized trials, authors should describe the timing of when the study population was identified relative to randomization and report the proportion of patients identified for inclusion in analyses across randomized treatment groups to elucidate the potential for identification bias (Eldridge et al, 2009; Eldridge et al, 2016).
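Two of the checks recommended above, comparing symptom severity among identified patients and reporting the proportion identified in each arm, can be sketched as follows. The record layout, function names, and data are hypothetical. The standardized mean difference shown is one common descriptive summary and is not prescribed by the recommendations.

```python
from math import isclose, sqrt
from statistics import mean, stdev


def identification_summary(records):
    """Summarize, by arm, how many patients were identified as eligible and
    their baseline severity. Needs at least 2 identified patients per arm."""
    summary = {}
    for arm in sorted({r["arm"] for r in records}):
        in_arm = [r for r in records if r["arm"] == arm]
        severities = [r["severity"] for r in in_arm if r["identified"]]
        summary[arm] = {
            "n_patients": len(in_arm),
            "n_identified": len(severities),
            "proportion_identified": len(severities) / len(in_arm),
            "mean_severity": mean(severities),
            "sd_severity": stdev(severities),
        }
    return summary


def standardized_mean_difference(summary, arm_a, arm_b):
    """Difference in mean severity divided by the pooled standard deviation."""
    a, b = summary[arm_a], summary[arm_b]
    pooled_sd = sqrt((a["sd_severity"] ** 2 + b["sd_severity"] ** 2) / 2)
    return (a["mean_severity"] - b["mean_severity"]) / pooled_sd


def make_records(arm, identified_severities, n_not_identified):
    records = [
        {"arm": arm, "identified": True, "severity": s} for s in identified_severities
    ]
    records += [
        {"arm": arm, "identified": False, "severity": None}
        for _ in range(n_not_identified)
    ]
    return records


# Hypothetical data: the intervention arm identifies more, milder patients.
records = make_records("intervention", [2.0, 3.0, 4.0, 3.0, 2.0, 4.0], 4)
records += make_records("usual care", [5.0, 6.0, 7.0], 7)

summary = identification_summary(records)
smd = standardized_mean_difference(summary, "intervention", "usual care")
print(summary)
print(smd)
```

A large imbalance in either the proportion identified or baseline severity, as in this constructed example, would suggest that identification was affected by treatment assignment and that analyses of the identified population should be interpreted with caution.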
The issue of selection bias due to conditioning the analysis on a population that is identified based on postrandomization data has been discussed previously in the context of “improper” subgroup analysis in clinical trials. Prior literature on improper subgroups has focused on developing guidelines for analyzing improper subgroups (Desai et al, 2014); on discussing specific analytic tools for conducting such analyses, such as per-protocol analyses (Little et al, 2009) or outcome-based subgroup analyses (Hirji and Fagerland, 2009); and on comparing results from a single trial across different analytic methods (Pieper et al, 2004). Literature on addressing identification bias in cluster randomized trials has focused primarily on settings where patients are formally recruited for inclusion in the study (Eldridge et al, 2009). However, there has been little guidance on how to optimally design a pragmatic trial in settings where the intervention may affect identification of the primary study population of interest, particularly when the condition of interest is not well recognized in the population at baseline and recognition is increased (perhaps substantially) as a result of treatment assignment.
The 3 examples described above illustrate the potential ways studies may be affected by identification bias, as well as approaches for handling this source of bias. These approaches include defining the primary analysis based on the prerandomization sample to avoid identification bias while considering secondary analyses that incorporate postrandomization data (as in the PROUD trial); defining the study population using postrandomization data that are not expected to be altered by the intervention (as in the SPARC trial); and conducting sensitivity analyses to address the potential for bias (as in the STOP CRC trial). The choice of study population for the primary analysis in any particular study will depend on a variety of considerations, including the study design (eg, parallel-group versus stepped-wedge), scientific knowledge informing potential mechanisms by which identification of participants may be differentially affected across treatment groups, and other trade-offs, such as the ability to capture the complete effect of interventions for underdiagnosed conditions.
Desai M, Pieper KS, Mahaffey K. 2014. Challenges and solutions to pre- and post-randomization subgroup analyses. Curr Cardiol Rep. 16:531. doi: 10.1007/s11886-014-0531-2. PMID: 25135344.
Eldridge S, Kerry S, Torgerson DJ. 2009. Bias in identifying and recruiting participants in cluster randomised trials: What can be done? BMJ. 339:b4006. doi: 10.1136/bmj.b4006. PMID: 19819928.
Eldridge S, Campbell M, Campbell M, Drahota A, Giraudeau B, Higgins JP, Reeves B, Siegfried N. 2016. Revised Cochrane risk of bias tool for randomized trials (RoB 2.0): additional considerations for cluster-randomized trials. https://researchportal.port.ac.uk/portal/en/publications/revised-cochrane-risk-of-bias-tool-for-randomized-trials-rob-20(09bf163b-13fb-4776-833f-ab39984b4429).html
Hernán MA, Hernández-Díaz S, Robins JM. 2004. A structural approach to selection bias. Epidemiology. 15:615-625. PMID: 15308962.
Hirji KF, Fagerland MW. 2009. Outcome based subgroup analysis: a neglected concern. Trials. 10:33. doi: 10.1186/1745-6215-10-33. PMID: 19454041.
Little RJ, Long Q, Lin X. 2009. A comparison of methods for estimating the causal effect of a treatment in randomized clinical trials subject to noncompliance. Biometrics. 65:640-649. doi: 10.1111/j.1541-0420.2008.01066.x. PMID: 18510650.
Mansournia MA, Higgins JP, Sterne JA, Hernán MA. 2017. Biases in randomized trials: a conversation between trialists and epidemiologists. Epidemiology. 28:54-59. doi: 10.1097/EDE.0000000000000564. PMID: 27748683.
National Research Council (US) Panel on Handling Missing Data in Clinical Trials. 2010. Principles and methods of sensitivity analyses. In: National Research Council (US) Panel on Handling Missing Data in Clinical Trials. The Prevention and Treatment of Missing Data in Clinical Trials. Washington, DC: National Academies Press; 83-106.
Pieper KS, Tsiatis AA, Davidian M, et al. 2004. Differential treatment benefit of platelet glycoprotein IIb/IIIa inhibition with percutaneous coronary intervention versus medical therapy for acute coronary syndromes: exploration of methods. Circulation. 109:641-646. doi: 10.1161/01.CIR.0000112570.97220.89. PMID: 14769687.