Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials

Acquiring Real-World Data


Section 8

Methods of Access

Contributors

Keith A. Marsolo, PhD
Rachel Richesson, MS, PhD, MPH
W. Edward Hammond, PhD
Michelle Smerek, BS
Lesley Curtis, PhD

Contributing Editors
Karen Staman, MS
Damon M. Seils, MA

There are several approaches to obtaining real-world data. Real-world data may be obtained directly from a site (such as a healthcare organization) or data holder, via a distributed research network, or directly from patients. Depending on the data needed, real-world data may be provisioned into a protected computing environment, often referred to as an enclave. We detail the trade-offs between the different approaches below.

Direct From Sites or Data Holders

Healthcare organizations, particularly those that participate in research, can often provide data in a variety of formats, which need to be aligned with the requirements of the project. Many other data holders, such as those that maintain disease or device registries, have similar capabilities. Examples include:

  • Clinician-generated reports: Most electronic health records (EHRs) provide functionality that allows clinicians to generate on-demand reports geared toward answering care management questions (for example, who received a flu shot in the last 30 days, or who was in the emergency department last night). These reports are relatively inexpensive to create and typically take seconds to run, with real-time results. The drawback is that they have limited ability to include longitudinal results. They are geared toward "most recent" values: most recent lab result, date of last test, and so on. Adoption and uptake also vary. Clinicians may not realize that they can generate such reports, as training and support vary by healthcare system.
  • Database reports: Almost every EHR includes a reporting database and/or data warehouse. Extracts can be programmed against these repositories, and it may be possible to reuse the same query, or a large portion of it, across sites that use the same vendor. Once a query is developed, it is usually possible to automate the production and delivery of the data. This approach may not be feasible for smaller sites or sites without local information technology support, and complex queries will rely heavily on the skill set and knowledge of the local analyst responsible for programming. This can lead to variation in quality across sites.
  • Common data model (CDM) extracts: Many academic medical centers that participate in distributed research networks may have their local source data transformed into a CDM. After the cohort or study population has been defined, local analysts can generate extracts of the relevant tables and fields.
  • Application programming interface (API): Healthcare organizations (both healthcare systems and payers) are working to make their data available via API standards such as Fast Healthcare Interoperability Resources (FHIR). Most sites have limited experience delivering data in this format, which means they may not yet have robust processes for allowing access by external parties.
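As a rough illustration of the API option above, the following sketch builds a FHIR search request for one patient's laboratory observations and reads the resources out of the returned Bundle. The base URL, patient identifier, and sample Bundle are hypothetical placeholders, and no network call is made. A real site would also require authorization, which is omitted here.

```python
from urllib.parse import urlencode


def build_observation_search(base_url, patient_id, loinc_code, since):
    """Build a FHIR search URL for one patient's lab observations."""
    params = {
        "patient": patient_id,
        "code": f"http://loinc.org|{loinc_code}",  # token search: system|code
        "date": f"ge{since}",                      # 'ge' prefix: on or after
    }
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"


def resources_from_bundle(bundle):
    """Return the resources carried in a FHIR searchset Bundle."""
    return [e["resource"] for e in bundle.get("entry", []) if "resource" in e]


def next_page_url(bundle):
    """Return the URL of the next page of results, or None if there is none."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


# Hypothetical endpoint and patient; 4548-4 is the LOINC code for hemoglobin A1c.
url = build_observation_search(
    "https://fhir.example.org/r4/", "example-123", "4548-4", "2020-01-01"
)

# Hand-written stand-in for a server response (assumption, not real data).
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "link": [{"relation": "self", "url": url}],
    "entry": [
        {"resource": {"resourceType": "Observation", "id": "obs-1"}},
        {"resource": {"resourceType": "Observation", "id": "obs-2"}},
    ],
}

observations = resources_from_bundle(sample_bundle)
print(url)
print([obs["id"] for obs in observations], next_page_url(sample_bundle))
```

Because results can be paged, a study extract would typically keep following the "next" link until it is absent.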

Distributed Research Networks

Study teams can partner directly with distributed research networks to obtain data to support their trials. Developing and distributing a query within these networks is usually straightforward, though there are often governance processes that must be followed. A single query can be distributed to retrieve results from the whole network (or from participating sites). An added benefit is that most distributed research networks perform some level of curation or quality assessment on data within their networks. This approach has two major drawbacks. First, the data elements of interest for the study may not be in the network's CDM, which means they must be obtained through other means (for example, added to the CDM or abstracted through chart review). Second, large studies will likely need to go beyond a single distributed research network, meaning study teams will need to deal with data in multiple formats.
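The idea of distributing one query to many sites can be sketched as follows. Each "site" here is an in-memory SQLite database holding a simplified, hypothetical diagnosis table. The table and column names are illustrative assumptions and do not represent any particular network's CDM. The same query runs locally at each site, and only aggregate counts are returned.

```python
import sqlite3

# One query, written once against the shared (hypothetical) table layout.
QUERY = "SELECT COUNT(DISTINCT patid) FROM diagnosis WHERE dx_code = ?"


def make_site(rows):
    """Create an in-memory database standing in for one site's local data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diagnosis (patid TEXT, dx_code TEXT)")
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", rows)
    return conn


def run_distributed(sites, query, params):
    """Run the same query at every site and collect one count per site."""
    results = {}
    for name, conn in sites.items():
        (count,) = conn.execute(query, params).fetchone()
        results[name] = count
    return results


# Made-up rows for illustration only.
sites = {
    "site_a": make_site([("a1", "I48.91"), ("a1", "I48.91"), ("a2", "E11.9")]),
    "site_b": make_site([("b1", "I48.91"), ("b2", "I48.91"), ("b3", "I10")]),
}

counts = run_distributed(sites, QUERY, ("I48.91",))
print(counts, "network total:", sum(counts.values()))
```

Patient-level rows never leave a site in this sketch, which reflects the usual appeal of the distributed approach. It also shows the limitation noted above: a data element that is missing from the shared table layout cannot be queried this way.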

Direct From Patient

While patients in the United States have always had the right to receive their health records from providers under the Health Insurance Portability and Accountability Act (HIPAA), it was not always possible to receive them in a machine-readable, electronic format (that is, something other than a scanned PDF). Efforts by the US federal government over the past decade to promote interoperability and patients' access to their own healthcare data have made it increasingly viable to obtain data directly from patients. Certified EHRs historically have provided patients with the ability to download structured documents, which contain information about the most recent visit and some longitudinal values.

More recent regulations will require that EHRs provide data via FHIR APIs, which should streamline the process somewhat, especially given that technology companies such as Apple have made it easy for users to download their EHR data into the Health app on their devices. Once the records have been downloaded, users can decide whether to share them with other applications, including those designed for research. CMS has enabled similar workflows through its Blue Button 2.0 initiative, for Medicare beneficiaries as well as for CMS-regulated payers, including those that support Medicare Advantage, Medicaid, CHIP, and Qualified Health Plans (QHPs) on the federally facilitated exchanges.

There are some drawbacks to this approach: (1) the "completeness" of the implementation of the standard varies by site and/or EHR vendor (D’Amore et al 2014); (2) study teams must broker access through a secondary app such as Apple Health, Hugo, or 1upHealth; and (3) if a patient receives care in multiple healthcare systems, they must make multiple requests to receive all of their records. Despite this, there may be studies that can benefit from such an approach, for instance, a study on a rare disease with a small number of patients who receive care across multiple healthcare systems. Negotiating agreements with multiple systems is time-consuming, so it may be faster to engage directly with patients.

The research community has much to learn about best practices in engaging with patients to obtain data in this manner (including how to encourage high response rates and how to ensure access is provided for the life of the study), but it remains an encouraging possibility.

Protected Computing Environments

Most academic medical centers and many healthcare organizations struggle with the need to provide access to clinical data for research while protecting sensitive data from EHRs and other systems. For this reason, many organizations set up protected computing environments, or limited-access platforms, where only individuals with the proper permissions can access data in a protected, secure environment that is separated from the rest of the network used for clinical and/or research purposes. (Another term for such a platform is a data or computing enclave.) In such an environment, users generally do not have the ability to download data to their local machines, access is provided via a remote or virtual computer, and analyses are contained within the protected space. In order to remove data from such an environment, users must go through an honest broker process, where the content is reviewed to ensure that it can be removed from the secure environment. For example, users may be able to transfer aggregate counts without additional approvals but may need special permission to remove patient-level records.
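The export rule described above can be sketched as a simple gate. This is a toy illustration under stated assumptions, not how any particular enclave works. The `ExportRequest` class, the list of identifying columns, and the decision labels are all hypothetical, and a real honest broker process involves human review of the content.

```python
from dataclasses import dataclass

# Hypothetical column names treated as markers of patient-level data.
PATIENT_LEVEL_COLUMNS = {"patient_id", "mrn", "date_of_birth"}


@dataclass
class ExportRequest:
    requester: str
    columns: list
    has_special_permission: bool = False


def review(request):
    """Decide whether a requested export may leave the protected environment."""
    is_patient_level = bool(PATIENT_LEVEL_COLUMNS & set(request.columns))
    if not is_patient_level:
        return "release"            # aggregate output, no extra approval
    if request.has_special_permission:
        return "release"            # patient-level, permission already granted
    return "hold for approval"      # patient-level, needs special permission


aggregate = ExportRequest("analyst1", ["site", "arm", "n_events"])
row_level = ExportRequest("analyst1", ["patient_id", "arm", "event_date"])
approved = ExportRequest("analyst2", ["patient_id", "arm"], has_special_permission=True)

for req in (aggregate, row_level, approved):
    print(req.columns, "->", review(req))
```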

Examples of organizations that use protected computing environments include CMS and the Department of Veterans Affairs. Such environments also exist within many academic medical centers; one example is PACE (Protected Analytics Computing Environment). When used as part of a pragmatic clinical trial, prospective study data can be uploaded into the computing enclave, linked with the relevant records stored there, and used in the resulting analysis. Summary or analysis datasets can then be downloaded by the study team. Despite these extra steps, there can be benefits to using a protected computing environment. In some cases, it is the only way to gain access to a particular data source, while in others, the data may be refreshed or updated more frequently, since it is not necessary to generate stand-alone flat files. As concerns grow about the organizational risk of sharing sensitive patient information, these protected environments are likely to increase in prevalence.
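The upload, link, and summarize workflow described above might look like the following minimal sketch. All identifiers, field names, and values are made up for illustration. Real linkage inside an enclave would depend on the identifiers and governance rules of the data holder.

```python
# Records already held inside the enclave, keyed by a shared identifier
# (hypothetical data).
enclave_records = {
    "P001": {"hospitalized": True},
    "P002": {"hospitalized": False},
    "P003": {"hospitalized": True},
}

# Prospective trial data uploaded by the study team (hypothetical data).
study_uploads = [
    {"study_id": "P001", "arm": "intervention"},
    {"study_id": "P002", "arm": "intervention"},
    {"study_id": "P003", "arm": "control"},
    {"study_id": "P999", "arm": "control"},  # no matching enclave record
]


def link_and_summarize(uploads, records):
    """Link uploads to enclave records and return per-arm aggregate counts."""
    summary = {}
    unlinked = 0
    for row in uploads:
        record = records.get(row["study_id"])
        if record is None:
            unlinked += 1  # report linkage failures rather than hide them
            continue
        arm = summary.setdefault(row["arm"], {"n": 0, "hospitalized": 0})
        arm["n"] += 1
        arm["hospitalized"] += int(record["hospitalized"])
    return summary, unlinked


summary, unlinked = link_and_summarize(study_uploads, enclave_records)
print(summary, "unlinked:", unlinked)
```

Only the per-arm summary would be downloaded by the study team in this sketch; the linked patient-level rows stay inside the protected environment.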

Real-world data have particular relevance to pragmatic clinical trials, as they generally represent data collected or generated in the course of routine operations. There are many different types of real-world data and different approaches by which to obtain them. However, real-world data sources are not interchangeable, and any given source may not be applicable for a specific study. As a result, care must be taken to ensure that the real-world data source aligns with the study in question and that the data are obtained in a format that supports the proposed analysis.

CHAPTER SECTIONS

  1. Introduction
  2. Common Real-World Data Sources
  3. Data Formats
  4. Acquiring Electronic Health Record Data
  5. Acquiring Claims Data and CMS Research-Identifiable Files
  6. Acquiring Patient-Reported Data
  7. Gaining Permission to Use Real-World Data
  8. Methods of Access
  9. Case Study: The IMPACT-AFib Trial

Resources

Approaches to Patient Follow-Up for Clinical Trials: What’s the Right Choice for Your Study?
NIH Pragmatic Trials Collaboratory PCT Grand Rounds; March 1, 2019


Data Linkage Within, Across, and Beyond PCORnet
NIH Pragmatic Trials Collaboratory PCT Grand Rounds; November 9, 2018

REFERENCES

D'Amore JD, Mandel JC, Kreda DA, et al. 2014. Are meaningful use stage 2 certified EHRs ready for interoperability? Findings from the SMART C-CDA Collaborative. J Am Med Inform Assoc. 21(6):1060-1068. doi: 10.1136/amiajnl-2014-002883. PMID: 24970839.

Version History

December 3, 2025: Updated hyperlinks (changes made by G. Uhlenbrauck).

October 14, 2022: Made nonsubstantive changes to the text, added images to the Resources sidebar, added Seils as a contributing editor, and reordered the section within the chapter as part of the annual content update (changes made by D. Seils).

Published August 25, 2020

Citation:

Marsolo KA, Richesson RL, Hammond WE, et al. Acquiring Real-World Data: Methods of Access. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/acquiring-real-world-data/methods-of-access/. Updated December 3, 2025. DOI: 10.28929/182.
