EHR Data Extraction

Analysis Plan

Section 7


Elizabeth R. DeLong, PhD

For the NIH Health Care Systems Collaboratory Biostatistics and Study Design Core


Contributing Editors

Jonathan McCall, MS

Damon M. Seils, MA

Many pragmatic clinical trials (PCTs), whether designed as cluster randomized trials (CRTs) or individually randomized trials, rely on data extracted from participants’ electronic health records (EHRs). Although extracting study data from the EHR allows PCTs to be performed more quickly and at lower cost than traditional randomized controlled trials (RCTs) that establish redundant parallel data capture systems, it also introduces methodological and logistical challenges, such as those described in the white paper Assessing Data Quality for Healthcare Systems Data Used in Clinical Research. EHR data extraction also poses challenges for statistical analysis. Because EHRs are not purposely designed or optimized to support research activities, data gathered from them may have higher rates of missingness and error than data captured with purpose-built systems and subjected to “cleaning” and validation. Missing data, including those caused by the dropout of whole clusters, pose special issues for PCTs. Preliminary data capture and assessment can indicate whether the intended study is feasible, given the availability and quality of the data.
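As one illustration of what a preliminary data assessment might look like, the following Python sketch tabulates the proportion of missing values for each variable within each cluster of a test extract. It is a minimal example under stated assumptions, not a method prescribed by this chapter: the field names (clinic, sbp, hba1c), the sample records, and the 0.4 threshold are all hypothetical, and a real assessment would also examine error rates and plausibility, not missingness alone.

```python
from collections import defaultdict

# Values treated as missing in this hypothetical extract.
MISSING = (None, "")


def missingness_by_cluster(records, variables, cluster_key="clinic"):
    """Return {cluster: {variable: fraction of records with a missing value}}."""
    counts = defaultdict(lambda: {"n": 0, "missing": {v: 0 for v in variables}})
    for rec in records:
        entry = counts[rec[cluster_key]]
        entry["n"] += 1
        for v in variables:
            # A variable absent from the record is also counted as missing.
            if rec.get(v) in MISSING:
                entry["missing"][v] += 1
    return {
        cluster: {v: entry["missing"][v] / entry["n"] for v in variables}
        for cluster, entry in counts.items()
    }


def flag_clusters(rates, threshold=0.2):
    """List (cluster, variable) pairs whose missingness exceeds the threshold."""
    return sorted(
        (cluster, v)
        for cluster, by_var in rates.items()
        for v, rate in by_var.items()
        if rate > threshold
    )


# Hypothetical test extract: one dictionary per participant record.
records = [
    {"clinic": "A", "sbp": 132, "hba1c": 7.1},
    {"clinic": "A", "sbp": None, "hba1c": 6.8},
    {"clinic": "B", "sbp": 141, "hba1c": None},
    {"clinic": "B", "sbp": 128, "hba1c": None},
]

rates = missingness_by_cluster(records, ["sbp", "hba1c"])
flagged = flag_clusters(rates, threshold=0.4)  # illustrative threshold only
print(flagged)
```

A variable that is entirely missing within a cluster (here, hba1c at clinic B) may indicate that the site does not record it in an extractable field, which bears directly on whether the intended study is feasible.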



Version History

January 16, 2019: Made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017


DeLong ER. Analysis Plan: EHR Data Extraction. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory. Available at: Updated August 5, 2019. DOI: 10.28929/019.