EHR Data Extraction

Analysis Plan

Section 5


Elizabeth R. DeLong, PhD

For the NIH Health Care Systems Collaboratory Biostatistics and Study Design Core


Contributing Editor

Jonathan McCall, MS

Many pragmatic clinical trials (PCTs), whether designed as cluster randomized trials (CRTs) or individually randomized trials, rely on data extracted from participants’ electronic health records (EHRs). Although extracting study data from the EHR allows PCTs to be performed more quickly and at less expense than “traditional” randomized controlled trials (RCTs) that establish redundant parallel data capture systems, it also introduces methodological and logistical challenges, such as those described in the white paper Assessing Data Quality for Healthcare Systems Data Used in Clinical Research. EHR data extraction also poses challenges for statistical analysis. Data gathered from patient EHRs (which, by definition, are not purposely designed or optimized to support research activities) may have higher rates of missingness and error than data captured with purpose-built systems and subjected to “cleaning” and validation. Missing data, including those caused by the dropout of whole clusters, pose special issues for PCTs. Preliminary data capture and assessment can indicate whether the intended study is feasible, given the availability and quality of the data.
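As one illustration of what a preliminary data assessment might involve, the sketch below tabulates the fraction of missing values for each field within each cluster and flags the combinations that exceed a chosen cutoff. It is a minimal example under stated assumptions, not a method prescribed by the Core: the record layout, the field names (`cluster_id`, `systolic_bp`, `hba1c`), the treatment of empty strings as missing, and the 0.4 threshold are all hypothetical and would need to be replaced by study-specific definitions.

```python
from collections import defaultdict


def missingness_by_cluster(records, fields, cluster_key="cluster_id"):
    """Return {cluster: {field: fraction of records with a missing value}}."""
    counts = defaultdict(lambda: {"n": 0, "missing": defaultdict(int)})
    for rec in records:
        entry = counts[rec[cluster_key]]
        entry["n"] += 1
        for field in fields:
            value = rec.get(field)
            # Assumption: None and empty strings both count as missing.
            if value is None or value == "":
                entry["missing"][field] += 1
    return {
        cluster: {f: entry["missing"][f] / entry["n"] for f in fields}
        for cluster, entry in counts.items()
    }


def flag_clusters(rates, threshold=0.2):
    """List (cluster, field) pairs whose missingness exceeds the threshold."""
    return sorted(
        (cluster, field)
        for cluster, by_field in rates.items()
        for field, rate in by_field.items()
        if rate > threshold
    )


# Hypothetical extract: four records from two clusters (clinics).
records = [
    {"cluster_id": "clinic_A", "systolic_bp": 128, "hba1c": 7.1},
    {"cluster_id": "clinic_A", "systolic_bp": None, "hba1c": 6.8},
    {"cluster_id": "clinic_B", "systolic_bp": 141, "hba1c": None},
    {"cluster_id": "clinic_B", "systolic_bp": 135, "hba1c": ""},
]

rates = missingness_by_cluster(records, ["systolic_bp", "hba1c"])
flagged = flag_clusters(rates, threshold=0.4)  # illustrative cutoff only
print(flagged)
```

A summary of this kind only describes how much is missing and where. It does not address why values are missing or whether recorded values are accurate, both of which bear on feasibility and on the choice of analysis method.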




Using Electronic Health Record Data

Living Textbook chapter describing considerations for the use of EHR data in pragmatic trials

Key Issues in Extracting Usable Data from Electronic Health Records for Pragmatic Clinical Trials

A guidance document from the Biostatistics and Study Design Core


Version History

Published August 25, 2017


DeLong ER. Analysis Plan: EHR Data Extraction. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory. Available at: Updated November 13, 2018. DOI: 10.28929/019.