Case Study: STOP CRC Trial

Analysis Plan

Section 9



William M. Vollmer, PhD

Gloria D. Coronado, PhD

For the NIH Health Care Systems Collaboratory Biostatistics and Study Design Core


Contributing Editor

Damon M. Seils, MA

This case study explores some of the challenges in pragmatic clinical trial (PCT) design and analysis discussed in the previous sections. It uses the Strategies and Opportunities to Stop Colorectal Cancer in Priority Populations (STOP CRC) trial, one of the NIH Collaboratory Demonstration Projects, to illustrate how the study team dealt with pragmatic issues during the planning and conduct of the trial. In particular, the study team modified the planned analysis to better suit the final study design.

Overview of the STOP CRC Trial

Study Design

The STOP CRC trial was a cluster randomized pragmatic clinical trial embedded in 26 Federally Qualified Health Center clinics. The objective of the trial was to test the effectiveness of a health system–level intervention designed to improve colorectal cancer screening rates. The intervention involved access to an electronic health record registry tool and training to facilitate sending fecal immunochemical test (FIT) kits to patients who were not up to date with US Preventive Services Task Force guidelines for colorectal cancer screening.

The participating clinics, each housed administratively in 1 of 8 health centers, served as the unit of randomization. The study team originally planned to use constrained randomization to ensure balance across potentially confounding covariates. However, on the basis of early simulation analyses during the project's developmental phase, the study team opted for conventional stratified randomization. Health center was the stratification variable, and assignments were blocked within each health center to ensure the greatest possible balance both within and across health centers.
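The Python sketch below makes stratified randomization blocked within health center concrete. It is an illustration only. The clinic counts per center are hypothetical, because the text gives only the totals of 26 clinics in 8 health centers, and the function name `stratified_block_randomize` is illustrative. The rule for centers with an odd number of clinics is also an assumption about how balance "within and across health centers" might be achieved, not the trial's documented procedure: the sketch alternates which arm receives the extra clinic so that overall totals also stay balanced.

```python
import random
from collections import defaultdict

ARMS = ("intervention", "control")

def stratified_block_randomize(clinic_to_center, seed=None):
    """Randomize clinics to arms within each health center (stratum)."""
    rng = random.Random(seed)
    by_center = defaultdict(list)
    for clinic, center in sorted(clinic_to_center.items()):
        by_center[center].append(clinic)

    # Centers with an odd number of clinics give one arm an extra clinic.
    # Hand out those extras alternately (in random order) so the overall
    # intervention and control totals differ by at most 1.
    odd_centers = [c for c in sorted(by_center) if len(by_center[c]) % 2 == 1]
    extras = [ARMS[i % 2] for i in range(len(odd_centers))]
    rng.shuffle(extras)
    extra_arm = dict(zip(odd_centers, extras))

    assignment = {}
    for center in sorted(by_center):
        clinics = by_center[center]
        rng.shuffle(clinics)  # random order within the block
        first = extra_arm[center] if center in extra_arm else rng.choice(ARMS)
        second = ARMS[1] if first == ARMS[0] else ARMS[0]
        for i, clinic in enumerate(clinics):
            # Alternate arms, so arm sizes differ by at most 1 within the center.
            assignment[clinic] = first if i % 2 == 0 else second
    return assignment

# Hypothetical stratum sizes summing to 26 clinics in 8 health centers.
CENTER_SIZES = {"A": 5, "B": 4, "C": 4, "D": 3, "E": 3, "F": 3, "G": 2, "H": 2}
CLINIC_TO_CENTER = {f"{c}{k}": c for c, n in CENTER_SIZES.items() for k in range(1, n + 1)}
ASSIGNMENT = stratified_block_randomize(CLINIC_TO_CENTER, seed=2014)
```

Constrained randomization, the team's original plan, would instead enumerate many candidate allocations and draw from those meeting covariate balance criteria. This sketch does not attempt that.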

Study Intervention

The study intervention consisted of 2 parts: access to a registry tool and training to facilitate mailing FIT kits to patients. For the registry, the study team used a tool embedded in the electronic health record to create a registry that was updated daily. The registry displayed lists of patients who were due for each component of the intervention: a mailed introduction letter, a mailed FIT kit, and a mailed reminder letter. For example, an initial list showed all patients in the clinic who, at any given time, were not up to date with colorectal cancer screening and had not recently been ordered a FIT kit. Clinic staff used the registry to trigger mailings of FIT kits to eligible patients. The registry was the core of the intervention, though individual clinics had flexibility in how often they used the registry to deliver mailings. The registry tool was made available to clinics at the time of randomization (February 2014), though an upgrade to the underlying electronic health record platform delayed implementation until June 2014. Control clinics gained access to the registry in August 2015.
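The filter behind the initial registry list can be sketched in a few lines of Python. This is an illustration, not the registry's actual logic. The `Patient` fields and the function name are hypothetical. The 365-day definition of "recently ordered" is also an assumption, because the text does not say how the registry defined a recent FIT order.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class Patient:
    patient_id: str
    up_to_date: bool                # up to date with colorectal cancer screening
    last_fit_order: Optional[date]  # most recent FIT kit order, if any

# Hypothetical definition of "recently ordered a FIT kit".
RECENT_ORDER_WINDOW = timedelta(days=365)

def initial_list(patients, today):
    """IDs of patients who are not up to date and have no recent FIT order."""
    return [
        p.patient_id
        for p in patients
        if not p.up_to_date
        and (p.last_fit_order is None or today - p.last_fit_order > RECENT_ORDER_WINDOW)
    ]
```

Because the registry was refreshed daily, a list like this would be recomputed each day. Patients would drop off as they became up to date or were ordered a kit.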

Patient Accrual and Outcome Measurement

For both the intervention clinics and the control clinics, the study team identified all patients who entered the registry during the 12 months after randomization. The date on which each patient first entered the registry was the start of follow-up for outcome assessment. The intended primary outcome was the return of a completed FIT kit within 12 months after the patient's start date. The follow-up period was redefined as the shorter of 12 months and the time until August 2015, because the study team had determined, for practical reasons, that control clinics should have access to the intervention starting on that date. One reason for the long follow-up period was that the intervention clinics did not send FIT kits as soon as patients became eligible. Rather, they used a variety of processes that allowed them to spread the work over the course of a year to accommodate local staffing needs.
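The redefined follow-up window is the earlier of two dates. The Python sketch below computes it per patient. The text gives only "August 2015" as the cutoff, so using August 1, 2015, is an assumption, and the helper names are illustrative.

```python
import calendar
from datetime import date

# Assumption: the text says only "August 2015"; the 1st of the month is used here.
CONTROL_ACCESS_DATE = date(2015, 8, 1)

def add_months(d, months):
    """Add calendar months, clamping the day to the end of the target month."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def follow_up_end(start):
    """End of follow-up: the earlier of 12 months after registry entry or the cutoff."""
    return min(add_months(start, 12), CONTROL_ACCESS_DATE)
```

Under this rule, patients who entered the registry after about August 2014 have less than 12 months of follow-up. For example, `follow_up_end(date(2014, 11, 20))` returns the cutoff date rather than a date in November 2015.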

Analysis Plan

The original analysis plan for the STOP CRC trial called for a traditional random-effects analysis using logistic regression with adjustment for patient-level factors and controlling for within-clinic clustering. During the developmental phase of the trial, the study team began to consider whether their primary interest was the clinic-level impact of the intervention rather than the patient-level impact. The study team ultimately incorporated a clinic-level analytic approach into the analysis plan for the implementation phase of the trial.

The data from the trial were viewed in the context of a 2-level hierarchical model with patients clustered within clinics. The primary interest was in cross-sectional, clinic-level effects: to what extent did the intervention improve fecal testing completion rates at each clinic among patients who qualified for the intervention?

Because the study involved outreach to all eligible patients at each clinic, and because the study team did not want to weight data from larger and smaller clinics differentially, each clinic's data were initially aggregated into 8 separate screening rates (1 rate for each subgroup defined by age [50 to 64 years vs 65 years or older], sex [female vs male], and race/ethnicity [minority vs nonminority]). The resulting analytic data set thus consisted of 208 observations (26 clinics × 8 observations per clinic). Treating these observations as approximately normally distributed, the study team had planned to use mixed-model analysis of covariance to estimate the screening probabilities as a function of intervention, age, sex, race/ethnicity, and baseline clinic screening rate, with clinic specified as a cluster variable.
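The aggregation step can be sketched in Python. Patient records are collapsed into one screening rate per clinic × age group × sex × race/ethnicity cell. The record layout and function names are hypothetical, and the sketch assumes all patients are aged 50 years or older. It also returns each cell's size, because, as described later, those sizes turned out to vary markedly across clinics.

```python
from collections import defaultdict
from itertools import product

AGE_GROUPS = ("50-64", "65+")
SEXES = ("female", "male")
RACE_ETHNICITY = ("minority", "nonminority")

# 2 x 2 x 2 = 8 subgroup cells per clinic.
N_CELLS_PER_CLINIC = len(list(product(AGE_GROUPS, SEXES, RACE_ETHNICITY)))

def age_group(age):
    """Assumes age >= 50, as for screening-eligible registry patients."""
    return "50-64" if age < 65 else "65+"

def subgroup_rates(patients):
    """patients: iterable of (clinic, age, sex, race_eth, screened), screened in {0, 1}.

    Returns {(clinic, age_group, sex, race_eth): (screening_rate, n_patients)}
    for every cell that has at least 1 patient.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_screened, n_total]
    for clinic, age, sex, race_eth, screened in patients:
        cell = (clinic, age_group(age), sex, race_eth)
        counts[cell][0] += screened
        counts[cell][1] += 1
    return {cell: (s / n, n) for cell, (s, n) in counts.items()}
```

With 26 clinics, fully populated cells give 26 × 8 = 208 rates, matching the analytic data set described above.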

Secondary analyses were to include, in turn, fixed-effect interactions of treatment with age, sex, and race/ethnicity to test the impact of the intervention in subgroups defined by these variables. Following Murray (1998), the latter analyses would also include the random-effect interactions of the covariates with clinic.

Analyses were to be done using the mixed command in Stata, which calculates maximum likelihood estimates assuming normally distributed residual errors and random effects. The study team also assumed an unstructured covariance matrix for the random effects.


After recruitment of the analysis sample, 2 analytic challenges became apparent. First, the number of patients in the various age–sex–race/ethnicity subgroups varied markedly across clinics. This variation raised serious questions about the validity of the normal approximation and of the assumption of homoscedastic variances. Second, preliminary analyses made clear that the intraclass correlation coefficient (ICC) was much higher than anticipated, even after adjustment for baseline clinic-level screening rates. The ICC did not decrease to acceptable levels unless the analysis adjusted for network (ie, health center) in place of the baseline clinic-level screening rate, as shown in Table 1.

Table 1. Intraclass Correlation Coefficients From Various Random-Effects Models

Modelᵃ | ICC
Null model: pure ICC without covariate adjustment | 0.094
Intervention | 0.089
Intervention + network | 0.051
Intervention + baseline NQF score | 0.082
Intervention + network + baseline NQF score | 0.045
Full model: intervention + network + age + male sex | 0.050

Abbreviations: ICC, intraclass correlation coefficient; NQF, National Quality Forum.
ᵃ Only network and NQF score are clinic-level covariates.
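One way to see why an ICC near 0.09 was a concern is to compute the approximate design effect, 1 + (m − 1) × ICC, for a cluster of m patients. The Python sketch below applies it to the ICCs in Table 1 using a hypothetical clinic size of 100 patients. It assumes equal cluster sizes and treats the reported ICCs as correlations on the outcome scale. That is only an approximation if the ICCs were estimated on the latent logistic scale.

```python
ICC_BY_MODEL = {
    "Null model": 0.094,
    "Intervention": 0.089,
    "Intervention + network": 0.051,
    "Intervention + baseline NQF score": 0.082,
    "Intervention + network + baseline NQF score": 0.045,
    "Full model": 0.050,
}

def design_effect(cluster_size, icc):
    """Variance inflation of a cluster-sampled mean relative to simple random sampling."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(cluster_size, icc)

HYPOTHETICAL_CLUSTER_SIZE = 100  # illustrative only; actual clinic sizes varied

for model, icc in ICC_BY_MODEL.items():
    print(f"{model}: design effect = {design_effect(HYPOTHETICAL_CLUSTER_SIZE, icc):.2f}")
```

Under these assumptions, moving from the null model to the full model roughly halves the variance inflation.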


In response to these challenges, the study statistician proposed to replace the original analysis plan described above with a person-based logistic model that weighted individual observations by 1/(clinic size). This approach seemed to be a natural extension of the original proposal that preserved equal weighting of clinic-level effects, allowed for more general modeling of patient-level covariate effects than would otherwise have been possible, and provided a modeling framework (ie, the logistic link function) that better corresponded to the nature of the data.
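The effect of weighting each patient by 1/(clinic size) is easiest to see in the intercept-only case. Here the weighted estimate of the screening probability is a weighted proportion, and every clinic's weights sum to 1. The Python sketch below, with hypothetical function names and data layout, shows that this weighted proportion equals the simple average of the clinic-level proportions. In other words, each clinic counts equally, regardless of size.

```python
from collections import Counter

def weighted_proportion(records):
    """records: list of (clinic, outcome), outcome in {0, 1}.

    Each patient gets weight 1 / (size of their clinic), so every clinic's
    weights sum to 1 and each clinic contributes equally.
    """
    sizes = Counter(clinic for clinic, _ in records)
    numerator = sum(y / sizes[c] for c, y in records)
    denominator = sum(1 / sizes[c] for c, _ in records)
    return numerator / denominator

def mean_of_clinic_proportions(records):
    """Unweighted average of the per-clinic proportions."""
    sizes = Counter(c for c, _ in records)
    successes = Counter()
    for c, y in records:
        successes[c] += y
    proportions = [successes[c] / sizes[c] for c in sizes]
    return sum(proportions) / len(proportions)
```

For a clinic with 4 patients (3 screened) and a clinic with 1 patient (not screened), both functions return 0.375. An unweighted pooled proportion would give 3/5 = 0.6, dominated by the larger clinic.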

Consistent with the focus on marginal effects, the revised analysis plan proposed to use generalized estimating equation (GEE) models that accounted for clinic-level clustering and used a robust covariance estimate. The analysis also included a bias correction for the variance to reflect the small number of clusters. The final model, fit using Stata, took the following form:

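xtset clinic
* Weighted GEE; wtind = 1/(clinic size) for each patient
xtgee resultever intv i.network age male [pweight=wtind], ///
   family(binomial) corr(independent) vce(robust) nmp

where resultever was the binary primary outcome (return of a completed FIT kit); intervention (intv) and male sex (male) were binary indicators of intervention clinics and male sex, respectively; age was a continuous variable; i.network is Stata syntax to treat network as a class variable and create the corresponding dummy indicators for each network; wtind was each patient's weight, 1/(clinic size); and nmp is the bias correction (it uses the divisor N − P rather than N, where P is the number of estimated coefficients).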

This model adjusted for age as a continuous variable and did not include an adjustment for race/ethnicity. (Race/ethnicity was missing for some participants, and the investigators wanted to include all participants in the primary analysis.) Although the nmp option in Stata may not have been the optimal variance bias adjustment (see previous guidance from the NIH Collaboratory Biostatistics and Study Design Core), it was the only option available in Stata for this particular model formulation.

The analysts used an independent (rather than exchangeable) working correlation matrix to avoid overweighting smaller clinics (Patrick Heagerty, personal communication). The analysts also fit several alternative models as sensitivity analyses. These models included GEE models with varying covariate adjustments, an unweighted GEE model (using an exchangeable working correlation matrix), and random-effects logistic models (Table 2), as well as ordinary linear regression models using the 26 observed clinic proportions as data points (Table 3). For the most part, the models led to the same conclusions.
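The reason an exchangeable working correlation would overweight smaller clinics can be shown in a simplified setting: an intercept-only, identity-link GEE with patient weights 1/m_i. The logistic model behaves similarly but not identically. With exchangeable correlation ρ, clinic i's estimating-equation contribution is, up to a common scale factor, (ȳ_i − μ)/(1 + (m_i − 1)ρ). Its clinic mean therefore gets weight 1/(1 + (m_i − 1)ρ), which shrinks as the clinic grows. With ρ = 0 (independence), every clinic gets the same weight. The Python function names and clinic sizes below are hypothetical.

```python
def clinic_weights(cluster_sizes, rho):
    """Normalized weight on each clinic mean in an intercept-only, identity-link
    GEE with patient weights 1/m_i and exchangeable working correlation rho.

    rho = 0 corresponds to the independence working correlation.
    """
    raw = [1.0 / (1.0 + (m - 1) * rho) for m in cluster_sizes]
    total = sum(raw)
    return [r / total for r in raw]

def gee_intercept(clinic_means, cluster_sizes, rho):
    """Solution of that estimating equation: a weighted average of clinic means."""
    w = clinic_weights(cluster_sizes, rho)
    return sum(wi * ybar for wi, ybar in zip(w, clinic_means))
```

For clinics of 20 and 400 patients, `clinic_weights([20, 400], 0.0)` gives equal weights. With ρ = 0.05, roughly 90% of the weight shifts to the small clinic, which defeats the purpose of weighting clinics equally.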

Table 2. Summary of Intervention Effects From the Primary Model and Various Sensitivity Analyses Using the Full Data Setᵃ

Model | ln(OR) | SE | P Value | OR (95% CI) | Absolute Difference (95% CI), %
Primary model
Weighted GEE: intervention + network + age + male sex | 0.3241 | 0.1511 | .05 | 1.38 (1.01 to 1.90) | 3.4 (0.1 to 6.8)
Sensitivity analyses
Weighted GEE: intervention + network | 0.3239 | 0.1522 | .05 | 1.38 (1.00 to 1.91) | 3.4 (0.0 to 6.8)
Weighted GEE: intervention + network + age + male sex + insurance status (excludes some patients) | 0.3160 | 0.1541 | .06 | 1.37 (0.99 to 1.90) | 3.4 (–0.1 to 6.8)
Unweighted GEE: intervention + network + age + male sex | 0.3158 | 0.1464 | .05 | 1.37 (1.01 to 1.87) | 3.3 (0.1 to 6.5)
Unweighted RE: intervention + network + age + male sex | 0.2806 | 0.1699 | .12 | 1.32 (0.92 to 1.89) | 2.9 (–0.8 to 6.6)
Unweighted RE: intervention + network | 0.2794 | 0.1704 | .12 | 1.32 (0.92 to 1.89) | 2.9 (–0.9 to 6.6)

Abbreviations: CI, confidence interval; GEE, generalized estimating equation; ln(OR), natural logarithm of the odds ratio; OR, odds ratio; RE, random-effects model; SE, standard error.
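The OR and 95% CI columns in Table 2 are exponentiated versions of the ln(OR) and SE columns. The Python sketch below shows that conversion for the primary model using a normal (Wald) critical value of 1.96. The point estimate matches the table (1.38). The normal-based interval it produces, about 1.03 to 1.86, is slightly narrower than the reported 1.01 to 1.90. That difference may reflect a small-sample reference distribution or other details of the original computation, which the text does not specify. The function name is illustrative.

```python
import math

def odds_ratio_ci(log_or, se, crit=1.96):
    """Return (OR, lower, upper) by exponentiating a Wald interval for ln(OR)."""
    return (math.exp(log_or),
            math.exp(log_or - crit * se),
            math.exp(log_or + crit * se))

# Primary-model estimates from Table 2: ln(OR) = 0.3241, SE = 0.1511.
OR_PRIMARY, LOWER, UPPER = odds_ratio_ci(0.3241, 0.1511)
```

Passing a larger `crit` (for example, a t quantile for few clusters) widens the interval accordingly.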

Table 3. Intervention Effects Calculated From 26 Clinic Means

Model | Absolute Difference, Intervention–Control (percentage points) | SE | P Value
Intervention | 3.6 | 2.3 | .14
Intervention + network | 3.5 | 2.0 | .10
Intervention + network + age + male sexᵃ | 2.4 | 2.3 | .31

Abbreviation: SE, standard error.
ᵃ In this model, male sex is the clinic's percentage of male patients and age is the clinic's mean age.
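The unadjusted row of Table 3 is an ordinary linear regression of clinic proportions on an intervention indicator. That regression is equivalent to a difference in mean clinic proportions with a pooled-variance standard error. The Python sketch below shows this equivalence for that unadjusted case only. The covariate-adjusted rows would need a full regression. The function name and the example proportions are hypothetical.

```python
import math
from statistics import mean

def clinic_mean_difference(intervention_props, control_props):
    """Difference in mean clinic proportion (intervention minus control) and its SE.

    Equivalent to OLS of clinic proportion on a single intervention indicator.
    """
    n1, n0 = len(intervention_props), len(control_props)
    m1, m0 = mean(intervention_props), mean(control_props)
    ss = (sum((p - m1) ** 2 for p in intervention_props)
          + sum((p - m0) ** 2 for p in control_props))
    s2 = ss / (n1 + n0 - 2)  # pooled residual variance on n - 2 degrees of freedom
    se = math.sqrt(s2 * (1 / n1 + 1 / n0))
    return m1 - m0, se
```

Because each clinic contributes exactly 1 data point, this analysis weights clinics equally by construction. That is the same goal the weighted GEE pursues at the patient level.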

The study team also took advantage of the fact that the intervention was rolled out to control clinics during the second year after randomization to conduct a stepped-wedge analysis. The outcome variable for this analysis was each clinic's NQF score for colorectal cancer screening. The NQF score was calculated for the year before randomization and for each of the 2 years after randomization. Because the NQF measure estimates the proportion of all age-eligible patients who were up to date for screening, it included some patients who were not included in the analyses above. For example, patients who had undergone a colonoscopy during the previous 10 years would have been considered up to date for the full year and hence never appeared in the registry of patients needing screening. Therefore, the NQF measure might be expected to be less sensitive to intervention effects. Nevertheless, it is a highly policy-relevant metric, because it is the measure most likely to drive health plan managers' decisions to adopt the intervention.

Because the intervention clinics experienced sizeable delays between the time of randomization and when they actually rolled out the intervention, the analysis allowed for separate intervention effects in the first and second years after randomization, ostensibly reflecting "startup" effects and "ongoing" effects. Both the intervention clinics and the control clinics would contribute to estimation of the former effects, whereas only the intervention clinics would contribute to the latter effects. This analysis suggested similar startup and ongoing effects of 3.3 to 3.4 percentage points relative to baseline, in line with those reported above, though they were not statistically significant in this analysis.
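The coding of the separate startup and ongoing effects can be sketched as a pair of indicators for each clinic-period observation. Intervention clinics are in startup during the first year after randomization and ongoing during the second. Control clinics, which gained access in August 2015, are in startup during the second year. This reproduces the statement that both arms inform the startup effect but only intervention clinics inform the ongoing effect. The Python function names are hypothetical, and the rest of the model (for example, how clinic and period effects were specified) is not given in the text.

```python
def exposure_indicators(arm, period):
    """Return (startup, ongoing) indicators for one clinic-period observation.

    period: 0 = year before randomization, 1 = first year after, 2 = second year after.
    """
    if arm == "intervention":
        return (int(period == 1), int(period == 2))
    if arm == "control":
        # Control clinics gained access during the second year after randomization.
        return (int(period == 2), 0)
    raise ValueError(f"unknown arm: {arm!r}")

def design_rows(clinic_arms):
    """clinic_arms: {clinic: arm}. Yields (clinic, period, startup, ongoing) rows."""
    for clinic, arm in sorted(clinic_arms.items()):
        for period in (0, 1, 2):
            yield (clinic, period, *exposure_indicators(arm, period))
```

These indicators would then enter a regression of clinic NQF scores alongside clinic and period terms.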


The STOP CRC study team encountered substantial variation in patient-level demographic characteristics across study clinics and a high ICC. The study team overcame these challenges by using a person-based approach to modeling clinic-level intervention effects and by adjusting for health center (rather than specific health center characteristics). Sensitivity analyses suggested that the results would have been much the same had the study team ignored the weighting or used random-effects models rather than GEE models. To assess the robustness of the findings, the study team also conducted a stepped-wedge analysis comparing clinic-level NQF scores between the intervention and usual care groups.





Murray DM. 1998. Design and Analysis of Group Randomized Trials. Oxford University Press, New York.

Version History

Published January 3, 2019


Vollmer WM, Coronado GD. Analysis Plan: Case Study: STOP CRC Trial. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory. Available at: Updated August 5, 2019. DOI: 10.28929/099.