NIH Collaboratory
Living Textbook of
Pragmatic Clinical Trials
Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials
Using Electronic Health Record Data in Pragmatic Clinical Trials


Section 1

Introduction

Rachel Richesson, MS, PhD, MPH

Keith A Marsolo, PhD
Richard Platt, MD, MSc
Gregory Simon, MD, MPH
Lesley Curtis, PhD
Reesa Laws, BS

Adrian Hernandez, MD, MSH
Jon Puro, MPA-HA
Doug Zatzick, MD
Erik van Eaton, MD, FACS
Vincent Mor, PhD

Contributing Editor
Karen Staman, MS

Using electronic health record (EHR) data for research is fundamentally different from collecting research data prospectively, as is traditional for controlled clinical trials. Several features of EHR systems create these differences, the most important being the lack of investigator control over data collection and recording processes in health care facilities. Other factors include the lack of standard definitions for identifying patient cohorts and study-specific outcomes, the challenges associated with completeness of longitudinal data, and potential errors in linkage of records across systems (Zozus et al. 2015). All of these factors challenge investigators to assure and demonstrate that data are of adequate quality to support research conclusions. While many of the issues addressed in this chapter apply to a broad range of study designs that might use data from the EHR, this chapter describes the use cases and associated challenges for using EHR data in pragmatic clinical trials (PCTs), particularly those that include randomization. Specifically, we will discuss:

  • Prerequisites for conducting pragmatic research using EHR systems
  • Developing and refining the research question and defining the data that are essential and necessary to answer that question
  • Data sources for explanatory trials vs PCTs
  • The role of data as a partial representation of (or surrogate for) clinical phenomena under investigation
  • Considerations for the use of EHR data, including understanding bias and provenance, completeness and other dimensions of data quality, and methods for linking between multiple data sources

Challenges and Prerequisites for Using EHR Systems

For an NIH Pragmatic Trials Collaboratory manuscript, 20 NIH Collaboratory Trials responded to a survey about the challenges they encountered when using EHR systems for pragmatic clinical research (Richesson et al. 2021). The goal of the study was to elucidate challenges and develop solutions, framed as prerequisites for pragmatic research, that enable healthcare system leaders, policy makers, and EHR designers to improve the national capacity for generating real-world evidence. The table below summarizes the 6 broad challenges and corresponding solutions identified by the study’s authors. If implemented as part of the health system and research infrastructure, the solution for each challenge can enable the rapid conduct of future pragmatic trials, and hence can be conceptualized as a prerequisite for successful EHR-based pragmatic research.

Challenge | Prerequisite
Inadequate collection of patient-centered data | Integrate collection of patient-centered data into EHR systems
Lack of structured data collection | Facilitate structured research data collection by leveraging standard EHR functions, usable interfaces, and standard workflows
Lack of standardization | Support creation of high-quality research data by using standards
Lack of resources to support customization of EHRs | Ensure adequate IT staff to support embedded research
Difficulties aggregating data across sites | Create aggregate, multidata type resources for multisite trials
Inefficiencies accessing EHR data | Create reusable and automated queries

This study highlights the need to tailor the use of EHR systems to enable the collection of patient-centered outcomes and the extraction of high-quality, standardized data. Although EHR data systems are designed to support clinical care and billing, high-quality data derived from these systems can also help improve population health by generating reliable evidence and advancing continuous learning within and across healthcare systems.

For further descriptions of the 6 challenges and prerequisites, read “Enhancing the use of EHR systems for pragmatic embedded research: lessons from the NIH Health Care Systems Research Collaboratory” (Richesson et al. 2021).

Data Sources for Explanatory Trials vs PCTs

There is a marked contrast between using data collected within an EHR system for research and using data that were collected outside of an EHR explicitly for a trial. Traditionally in clinical research, a study protocol specifies the data to be collected, and those data are collected through a separate, stand-alone system. The circumstances around data collection for traditional trials, including procedures for taking samples, making observations, and recording data (e.g., patient positioning, timing, and anatomical location), are clearly defined in the protocol, and the data are collected in accordance with those specifications. Further, in traditional research, the protocol defines the timing of data relative to the trial milestones or activities, for example, “the second assessment occurs 14 days post baseline.” In designing traditional (or explanatory) research studies, a top-down approach is usually taken, starting with the research question and working down to the required data.

In contrast, the use of existing data streams, a defining feature of pragmatic clinical trials, presents a number of issues and requires a different approach from that of traditional explanatory trials. Data captured in EHRs during routine care, or in insurance claims, have a different context from prospectively collected research data. While the context of care and data collection is often unspecified, it is certainly not defined around a research question or protocol. Consequently, the structure and representation of clinical data are imposed by the facility according to its standards for clinical documentation and its business needs rather than by the needs of the research study. This structure, along with local context, record linkage considerations, and the use of diagnosis or other structured codes, brings substantial and unique challenges for using data from EHR systems in research.
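The contrast in timing can be made concrete with a small sketch. In an explanatory trial, the protocol guarantees an assessment at day 14; with routine-care data, the analyst can only look for whatever observation happens to fall near that target. The following Python example is illustrative only: the `Observation` structure, the `nearest_in_window` helper, the 7-day tolerance, and the sample values are all assumptions made for this sketch and do not come from the chapter or from any particular EHR system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One routinely recorded value (hypothetical structure for illustration)."""
    recorded_on: date
    value: float


def nearest_in_window(
    observations: list[Observation],
    baseline: date,
    target_day: int = 14,
    tolerance_days: int = 7,
) -> Optional[Observation]:
    """Return the observation closest to baseline + target_day, or None if
    routine care produced nothing within +/- tolerance_days of that target."""
    target = baseline + timedelta(days=target_day)
    candidates = [
        obs for obs in observations
        if abs((obs.recorded_on - target).days) <= tolerance_days
    ]
    if not candidates:
        # Unlike a protocol-driven visit, a routine-care value may simply not exist.
        return None
    return min(candidates, key=lambda obs: abs((obs.recorded_on - target).days))


# Made-up example data: three readings recorded whenever the patient happened to be seen.
baseline = date(2024, 3, 1)
routine_care = [
    Observation(date(2024, 3, 4), 148.0),
    Observation(date(2024, 3, 18), 139.0),
    Observation(date(2024, 4, 20), 135.0),
]
match = nearest_in_window(routine_care, baseline)
```

Here the reading recorded on March 18 is selected as the stand-in for the day-14 assessment. The choice of window is itself an analytic decision: a narrower tolerance leaves more participants without a usable value, while a wider one mixes observations taken at clinically different times.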

CHAPTER SECTIONS

  1. Introduction
  2. Interoperability
  3. Data as a Surrogate for Clinical Phenomena
  4. Developing and Refining the Research Questions
  5. Specific Uses for EHR Data in PCTs
  6. Estimating and Identifying the Study Population and Assessing Baseline Prognostic Characteristics
  7. Implementing and Monitoring the Delivery of an Intervention
  8. Assessing Outcomes
  9. The Research Question Drives the Data Requirements
  10. Additional Resources

Resources

Linking Demographic and Socioeconomic Data to the Electronic Health Record

This tool introduces the methodology used at Duke Medicine for linking and enriching demographic and socioeconomic data within its enterprise-wide EHR system.

REFERENCES


Richesson RL, Marsolo KS, Douthit BJ, et al. 2021. Enhancing the use of EHR systems for pragmatic embedded research: lessons from the NIH Health Care Systems Research Collaboratory. Journal of the American Medical Informatics Association. 28:2626–2640. doi:10.1093/jamia/ocab202.

Zozus MN, Richesson R, Hammond WE, Simon GE. 2015. Acquiring and Using Electronic Health Record Data. https://dcricollab.dcri.duke.edu/sites/NIHKR/KR/Acquiring%20and%20Using%20Electronic%20Health%20Record%20Data.pdf. Accessed July 14, 2025.



Version History

October 7, 2025: Updated text as part of annual review (changes made by K. Staman).

July 14, 2025: Updated references and resources (changes made by G. Uhlenbrauck).

August 26, 2022: Updated text as part of annual update (changes made by K. Staman).

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

November 30, 2018: Updated text as part of annual update (changes made by K. Staman).

Published August 25, 2017

Citation:

Richesson RL, Platt R, Simon G, et al. Using Electronic Health Record Data in Pragmatic Clinical Trials: Introduction. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/design/using-electronic-health-record-data-pragmatic-clinical-trials-top/using-electronic-health-record-data-in-pragmatic-clinical-trials-introduction/. Updated December 3, 2025. DOI: 10.28929/030.
