A defining characteristic of cluster randomized trials is the randomization of groups, or clusters, of individuals to study arms and the resulting potential for correlation of outcomes within clusters. This potential correlation must be considered in the design of the trial and in the primary analysis. Thus, in addition to estimating the effect size in a cluster randomized trial, the researchers must estimate the intraclass correlation coefficient (ICC) for a valid calculation of the target sample size (Campbell et al 2004; Donner and Klar 2010; Chow et al 2020).
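To make the ICC's role concrete, the standard calculation for equal cluster sizes can be sketched as follows. This is a generic design-effect formula, not specific to any one trial: the sample size required under individual randomization is inflated by the factor 1 + (m − 1) × ICC for clusters of m individuals.

```python
from statistics import NormalDist

def n_per_arm_cluster(delta, sd, icc, m, alpha=0.05, power=0.80):
    """Individuals needed per arm in a 2-arm cluster randomized trial
    with clusters of m individuals each: the individually randomized
    sample size inflated by the design effect 1 + (m - 1) * icc."""
    z = NormalDist().inv_cdf
    n_individual = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2
    design_effect = 1 + (m - 1) * icc  # variance inflation from clustering
    return n_individual * design_effect
```

With ICC = 0 the design effect is 1 and the formula reduces to the usual 2-sample calculation; even a modest ICC can multiply the required sample size severalfold when clusters are large.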
See the Intraclass Correlation section of this chapter for more on the ICC.
In ideal situations, preliminary data for sample size calculations are available from the planned enrollment sites for the individuals and clusters to be studied, and these data can be analyzed to inform estimates of the ICC. However, in many situations, preliminary data and reliable estimates of the ICC may not be obtainable at the time of study design. Thus, the researchers may wish to use interim outcome data collected during the trial itself to estimate the ICC and to reassess the sample size (Wittes and Brittain 1990).
Approaches
Formal methods for sample size reestimation in cluster randomized trials have been proposed, along with strategies for modifying the study design. The main approaches consider either an initial sample of clusters or an initial sample of individuals from a fixed number of clusters, with the interim analysis estimating the key variance components needed for a recalculation of sample size. Lake and colleagues (2002) focus on the scenario in which cluster sizes are fixed and the key design question is the necessary total number of clusters. In contrast, van Schie and Moerbeek (2014) consider the scenario in which the total number of clusters is fixed but the sample size from each cluster can vary.
In both scenarios, the proposed methods involve analyzing interim data from the trial and generally do not guarantee control of type I error. However, extensive simulations indicate that both strategies lead to minimal inflation of the error rate, allowing researchers to adjust the sample size and achieve approximately the desired statistical power even when the preenrollment sample size assumptions are inaccurate. Finally, in both scenarios, the analyst must select the timing of the interim analysis for sample size adjustment. Recommendations suggest conducting the interim evaluation after 25% to 75% of the originally planned enrollment (either the number of clusters or the number of individuals). Additional research has studied methods for stepped-wedge designs (Grayling et al 2018) and the use of Bayesian methods (Shen et al 2022).
In summary, there are 2 main methods for increasing the effective sample size in a cluster randomized trial: (1) enroll more individuals per cluster when the number of clusters is fixed; or (2) add more clusters. However, adding clusters may not be feasible if additional clusters are unavailable or trial resources are limited.
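The trade-off between the 2 options can be illustrated with the effective sample size, k × m / (1 + (m − 1) × ICC), for k clusters of m individuals each. The numbers below are hypothetical and assume equal cluster sizes:

```python
def effective_n(k, m, icc):
    """Effective sample size of k clusters of m individuals each:
    the equivalent number of independent observations."""
    return k * m / (1 + (m - 1) * icc)

# Hypothetical numbers with ICC = 0.05: doubling cluster size adds far
# less information than doubling the number of clusters.
base   = effective_n(24, 25, 0.05)
deeper = effective_n(24, 50, 0.05)  # more individuals per cluster
wider  = effective_n(48, 25, 0.05)  # more clusters
```

Here `wider` exactly doubles the effective sample size, while `deeper` yields a much smaller gain despite enrolling the same number of additional individuals, which is why adding clusters is preferred when feasible.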
Case Study: FM-TIPS
In this section, we discuss the Fibromyalgia Transcutaneous Electrical Nerve Stimulation (TENS) in Physical Therapy Study (FM-TIPS) as an example of conducting an interim reassessment of sample size in a cluster randomized trial. FM-TIPS, an NIH Collaboratory Trial supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), is a cluster randomized pragmatic trial examining whether the addition of TENS to routine physical therapy improves movement-evoked pain between baseline and 60 days compared with physical therapy alone among patients with fibromyalgia (Post et al 2022).
Learn more about the NIH Collaboratory Trials.
The FM-TIPS research team originally calculated the trial’s sample size such that a 2-tailed statistical test at the 0.05 significance level would be able to detect a difference of at least 1.0 in mean change in movement-evoked pain (on a scale of 0 to 10) with an assumed SD of 2.0. To ensure 80% statistical power for the primary analysis, they calculated the sample size assuming an equal number of participants per clinic, considering a range of 9 to 12 clinics per arm. They conservatively estimated an ICC of up to 0.14, which would require complete outcome data for 456 patients, assuming 19 patients per clinic and 12 clinics per arm. To account for a dropout rate of up to 24% by day 60, the research team aimed to enroll a total of 600 patients (300 per arm; 25 from each of 24 clinics). Because some variability in the number of patients enrolled from each clinic was expected, the research team capped enrollment at 30 patients per clinic, allowing each clinic to enroll up to 20% more patients than originally planned.
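As a back-of-the-envelope check, the reported figures can be approximately reproduced from the stated assumptions using the standard design-effect formula (which may differ in detail from the calculation the research team actually used):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf
delta, sd, icc, m = 1.0, 2.0, 0.14, 19  # design assumptions stated above

# Per-arm n under individual randomization, inflated by the design effect.
n_individual = 2 * (z(0.975) + z(0.80)) ** 2 * (sd / delta) ** 2
n_per_arm = n_individual * (1 + (m - 1) * icc)

clinics_per_arm = math.ceil(n_per_arm / m)  # 12
total_complete = 2 * clinics_per_arm * m    # 456
```

Rounding up to whole clinics of 19 patients gives the reported 12 clinics per arm and 456 patients with complete outcome data.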
During study design, the research team also planned an interim reestimation of the ICC. They considered enrollment targets of one-quarter, one-third, one-half, and three-quarters of the total planned sample size (N = 600) as options for the timing of the assessment. For example, van Schie and Moerbeek (2014) recommend recalculating the ICC after enrollment of 50% of the planned number of participants. The research team determined that conducting the reassessment after enrolling half or three-quarters of the patients would yield the best estimate of the ICC. However, they were concerned that, if the original ICC estimate proved too conservative, this timing might be too late to obtain the necessary approvals from the study’s sponsor for a potential sample size reduction. Therefore, they planned the interim reassessment to occur after enrollment of the first 200 participants from both arms combined, corresponding to one-third of the planned sample size. Although fewer than 200 participants would have 60-day primary outcome data available at the time of the interim reassessment, the analysis would still allow the research team to evaluate the SD of the outcome while accounting for important aspects of the study design, such as the number of clinics and the number of patients per clinic. Thus, while the interim reassessment would not allow the research team to evaluate the treatment effect, it would allow them to assess outcome variability for sample size reestimation.
It is worth noting that an interim reassessment of the mean and SD of the primary outcome could have been considered. The FM-TIPS research team did not take this approach, because the minimal clinically important difference in the outcome was available in the existing literature and from previous studies. Since there were limited preliminary data available at the time of the interim reassessment, the research team relied on the original estimate of the mean and SD and focused on reestimating the ICC to support a reassessment of the sample size.
Methods
The FM-TIPS research team activated 24 clinics in 5 healthcare systems beginning in January 2021. They added a healthcare system and several clinics, and deactivated some clinics, in November 2022. (It was not feasible to add more healthcare systems or clinics due to challenges associated with the COVID-19 public health emergency and due to a lack of physical therapy clinics interested in participating in a research study.) Observed enrollment was variable across clinics. The research team determined that the sample size reassessment should account for this variability through the use of the coefficient of variation (CV = SD of cluster size/mean cluster size), which is commonly used to characterize variability in cluster sizes. Therefore, using a formula described by Ahn and colleagues (2015) and the online calculator developed by Campbell and colleagues (2004), they included the CV of patients per clinic in the sample size reassessment.
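One common way to fold the CV of cluster sizes into the design effect is the adjustment sketched below; this is an approximation of the general kind used in such formulas, and exact expressions vary by method:

```python
def design_effect_unequal(mean_m, cv, icc):
    """Design effect allowing unequal cluster sizes: the equal-size
    factor 1 + (mean_m - 1) * icc, inflated further as the CV of
    cluster size grows."""
    return 1 + ((cv ** 2 + 1) * mean_m - 1) * icc

equal  = design_effect_unequal(19, 0.0, 0.05)  # reduces to 1 + 18 * 0.05
varied = design_effect_unequal(19, 0.6, 0.05)  # larger variance inflation
```

With CV = 0 the expression collapses to the equal-cluster-size design effect; a CV of 0.6 noticeably increases the required sample size at the same ICC.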
The research team reestimated the ICC by using a modeling approach aimed at evaluating the relative contributions of different sites to the variance compared with other study characteristics. Specifically, they used a generalized linear mixed model with type I sums of squares to obtain the ICC estimate. To maintain blinding to the treatment effect in the interim reassessment, they did not include a main effect of treatment in the model. They considered the size of each clinic (small or large), the interaction between clinic size and treatment arm (size × arm), and a categorical variable for movement-evoked pain at baseline (0-3, 4-6, 7-10) to be fixed factors, and they considered sites × (size × arm) to be a random factor.
Although an interim analysis can provide a new estimate of the ICC, any sample size reassessment should account for uncertainty in the new ICC estimate. To this end, the FM-TIPS research team considered the jackknife method, which allows the analyst to estimate the SE of the ICC without making parametric assumptions about the data. Using the jackknife method, the analyst performs calculations based on a leave-one-out resampling of the data wherein resampling of clusters, rather than individuals, is used to account for cluster randomization. In FM-TIPS, this approach corresponded to calculating the ICC while leaving 1 clinic out at a time and assessing the influence of each clinic. These ICC estimates were then used to calculate the jackknife-based SE for the interim ICC estimate.
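A minimal sketch of the leave-one-cluster-out jackknife is given below. For simplicity it uses a one-way ANOVA ICC estimator in place of the adjusted mixed model actually fitted in FM-TIPS, and the data are toy values; the resampling logic (drop one cluster at a time, recompute the ICC, combine) is the same.

```python
import math

def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for a list of clusters,
    each given as a list of outcome values (unequal sizes allowed)."""
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum(sum((x - m) ** 2 for x in c)
                    for c, m in zip(clusters, means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # ANOVA constant adjusting for unequal cluster sizes
    m0 = (n_total - sum(n ** 2 for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (m0 - 1) * ms_within)

def jackknife_se_icc(clusters):
    """Leave-one-cluster-out jackknife SE of the ANOVA ICC estimate:
    clusters, not individuals, are resampled to respect the design."""
    k = len(clusters)
    loo = [anova_icc(clusters[:i] + clusters[i + 1:]) for i in range(k)]
    loo_mean = sum(loo) / k
    return math.sqrt((k - 1) / k * sum((t - loo_mean) ** 2 for t in loo))

# Toy data: 4 clinics with strong between-clinic differences
clinics = [[1, 1, 2], [5, 5, 6], [9, 9, 10], [3, 4, 4]]
icc_hat = anova_icc(clinics)
se_hat = jackknife_se_icc(clinics)
```

Because each replicate omits an entire clinic, the SE reflects the influence of individual clinics on the ICC estimate rather than that of individual patients.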
Results
At the time of the interim reassessment in FM-TIPS, there were 28 active clinics, of which 26 clinics (13 in each study arm) had at least 1 patient who had completed the day 60 visit at which the primary outcome could be ascertained, and of which 21 clinics had more than 1 such patient. Consistent with the statistical analysis plan, there were 228 patients enrolled (including 183 patients with a day 1 assessment and 144 patients with a day 60 assessment). The reestimated ICC based on the adjusted model was 0.05, and the jackknife-based estimate of the SE of the ICC was 0.07.
Based on the observed enrollment of the clinics during the initial part of the study, the research team assumed the sample size per clinic would have a CV of 0.6. The Table shows the statistical power for different ICC values assuming 13 or 14 clinics per arm and the observed CV of 0.6, using the originally assumed difference in means and SD for the primary outcome variable. The degrees of freedom for the t test were calculated based on the number of clusters in the study.
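An illustrative power calculation along these lines is sketched below, using a CV-adjusted design effect and a normal approximation (the study's t test with cluster-based degrees of freedom would give somewhat lower power). The inputs are the design values stated above, with 19 patients per clinic used as a stand-in for complete outcome data:

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, icc, k_per_arm, mean_m, cv, alpha=0.05):
    """Approximate power for the between-arm comparison, using a
    CV-adjusted design effect and a normal approximation."""
    nd = NormalDist()
    deff = 1 + ((cv ** 2 + 1) * mean_m - 1) * icc
    n_per_arm = k_per_arm * mean_m
    se = sd * math.sqrt(2 * deff / n_per_arm)
    return nd.cdf(abs(delta) / se - nd.inv_cdf(1 - alpha / 2))

# Illustrative inputs: delta = 1.0, SD = 2.0, 13 clinics/arm,
# 19 patients per clinic with complete data, CV = 0.6
p_reestimated = approx_power(1.0, 2.0, 0.05, 13, 19, 0.6)  # interim ICC
p_original    = approx_power(1.0, 2.0, 0.14, 13, 19, 0.6)  # conservative ICC
```

Comparing power at the reestimated ICC of 0.05 against the original conservative ICC of 0.14 shows how strongly the ICC assumption drives the power calculation at a fixed number of clinics.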