Data and Safety Monitoring
Section 4
Data Issues With Monitoring Pragmatic Trials
Many pragmatic clinical trials involve outcomes or endpoints that are available as part of routine care and are obtained using real-world data, including data from EHRs or claims. Although the use of real-world data in pragmatic trials can improve the relevance and efficiency of the research, these secondary research data (data not originally collected for research purposes) can also present challenges for data monitoring. Below, we describe 2 relevant considerations for DSMBs when monitoring pragmatic trials that use real-world data: data quality and data timeliness. (See also the Assessing Fitness for Use of Real-World Data Sources chapter of the Living Textbook.)
Data Quality
In explanatory trials, the research team often controls the entire chain of custody for outcome data, directly ascertaining study outcomes and recording them in study-specific databases. In pragmatic clinical trials, by contrast, outcome data are often extracted from real-world data generated during routine healthcare operations (Simon et al 2019).
The use of real-world data in pragmatic trials raises several challenges relevant for DSMBs. First, data generated from routine healthcare operations, such as those reported by clinicians during routine clinical interactions, may have more variability than data generated in explanatory research contexts, which typically rely on strict reporting protocols and separate systems for documentation (such as case report forms). Early in the trial, monitoring for data quality and integrity issues related to inconsistent measurement across study arms or sites may be needed to help ensure that the trial can generate valid results.
Data in pragmatic trials may also be more heterogeneous across sites because of real-world variations in patient populations and in the approaches used by healthcare delivery organizations and their practicing clinicians. Consequently, DSMBs may need to pay close attention to site-specific data to inform assessments of whether an emerging result is generalizable or may instead be attributable to a limited number of sites (Ellenberg et al 2015). For example, researchers in the EMBED trial (Melnick et al 2019; Melnick et al 2022) found that coding practices related to patient race and ethnicity varied not only from system to system but also within systems, which could undermine validity and the ability to assess whether the data are generalizable (Curtis et al 2025). To monitor for this issue, the EMBED researchers increased the number of data pulls for review (Curtis et al 2025). As another example, the ACP PEACE trial planned to use the electronic health record to examine documentation of advance care planning (Volandes et al 2025). However, chart review revealed that fewer than 60% of patients had correct documentation, forcing the investigators to pivot to another method (natural language processing) for detecting documentation (Curtis et al 2025).
A related challenge is that of data completeness. Unlike traditional explanatory trials, which may involve frequent follow-up visits and extensive data collection, some pragmatic trials involve no more follow-up than is typical in usual care, or may allow flexibility in follow-up according to standard clinical practices at the participating sites. The DSMB, investigators, and study sponsor will need to agree on the data needed for follow-up, commensurate with the risks of the trial intervention. Variations in follow-up practices across sites may also need to be accounted for in the randomization process so that sites with more frequent follow-up are balanced across study arms.
Finally, data recording practices (or even entire data systems) may change during the course of the trial, due to changes in the business practices or EHR systems of the participating trial sites (Simon et al 2019). In such cases, DSMBs should be prepared to collaborate with investigators and sponsors to assess how best to mitigate threats to study validity.
Data Timeliness
Data obtained from sources such as claims, state mortality records, or even the EHR are often delayed and not available in real time. In some trials, sites may perform data analysis locally because of privacy concerns and then submit results for centralized analysis and aggregation. This process, along with necessary data quality checks, can add to the time before data are available for review (Ellenberg et al 2015). In some cases, these delays may limit the feasibility of interim analyses to inform decisions about early termination for futility, safety, or effectiveness. DSMBs and study investigators should prospectively consider the implications of these delays when developing plans for data monitoring.
Case Example
In a traditional study of a mental health intervention, a suicide attempt or suicide death would be considered a serious adverse event, calling for immediate reporting to the DSMB and a determination by the DSMB of the event's relatedness to study treatment. However, in the SPOT suicide prevention trial, an NIH Pragmatic Trials Collaboratory trial, information on deaths was obtained from state mortality data, which could be delayed by up to 16 months. Death was also an expected occurrence in this population at high risk for suicide. In this situation, it would not be sensible to stop the trial to investigate a death 16 months after it occurred. These special circumstances required negotiation with the DSMB to work out a more practical monitoring plan.
Resources
Assessing Data Quality for Healthcare Systems Data Used in Clinical Research
Guidance document from the NIH Pragmatic Trials Collaboratory with best practices for assessing data quality in pragmatic trials
Data Quality Assessment for Comparative Effectiveness Research in Distributed Data Networks
Article exploring requirements for data quality assessment in comparative effectiveness research using distributed data networks
REFERENCES
Curtis LH, Morain S, O'Rourke PP, et al. 2025. Monitoring in pragmatic trials: lessons from the NIH Pragmatic Trials Collaboratory. Contemp Clin Trials. 152:107866. doi:10.1016/j.cct.2025.107866. PMID: 40015598.
Ellenberg SS, Culbertson R, Gillen DL, Goodman S, Schrandt S, Zirkle M. 2015. Data monitoring committees for pragmatic clinical trials. Clin Trials. 12(5):530-536. doi:10.1177/1740774515597697. PMID: 26374679.
Melnick ER, Jeffery MM, Dziura JD, et al. 2019. User-centred clinical decision support to implement emergency department-initiated buprenorphine for opioid use disorder: protocol for the pragmatic group randomised EMBED trial. BMJ Open. 9:e028488. doi:10.1136/bmjopen-2018-028488. PMID: 31152039.
Melnick ER, Nath B, Dziura JD, et al. 2022. User-centered clinical decision support to implement initiation of buprenorphine for opioid use disorder in the emergency department: EMBED pragmatic cluster randomized controlled trial. BMJ. e069271. doi:10.1136/bmj-2021-069271. PMID: 35760423.
Simon GE, Shortreed SM, Rossom RC, Penfold RB, Sperl-Hillen JAM, O'Connor P. 2019. Principles and procedures for data and safety monitoring in pragmatic clinical trials. Trials. 20(1):690. doi:10.1186/s13063-019-3869-3. PMID: 31815644.
Volandes AE, Chang Y, Lakin JR, et al. 2025. An intervention to increase advance care planning among older adults with advanced cancer: a randomized clinical trial. JAMA Netw Open. 8:e259150. doi:10.1001/jamanetworkopen.2025.9150. PMID: 40343696.