Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials

Acquiring Real-World Data


Section 2

Common Real-World Data Sources

Contributors

Keith A. Marsolo, PhD
Rachel Richesson, MS, PhD, MPH
W. Edward Hammond, PhD
Michelle Smerek, BS
Lesley Curtis, PhD

Contributing Editors
Karen Staman, MS
Damon M. Seils, MA

As part of its framework for using real-world evidence derived from real-world data to support regulatory decision making, the US Food and Drug Administration (FDA) has identified several potential sources of real-world data and information (FDA 2017):

Electronic health records: Electronic health records (EHRs) contain information collected during the course of clinical care. They may span multiple care settings, including outpatient visits, inpatient stays, emergency and urgent care visits, and home health. EHRs can include a variety of data from structured domains, including diagnoses, procedures, laboratory results, vital signs, medication orders, and medication administrations. They may also include less standardized data, such as information captured in inpatient flowsheets, questionnaires and surveys completed directly by patients, signs and symptoms recorded by clinicians, data on surgical care and anesthesia, and provider and nursing documentation.

Administrative claims: Administrative claims are insurance claims related to services from healthcare providers. In the United States, federal insurance programs include Medicare and Medicaid. The Medicare population includes adults 65 years and older, patients with certain disabilities, and patients with end-stage renal disease. Medicaid is an insurance program for people with low income. Private insurance claims include those in employer-sponsored health plans, insurance claims for those who are self-employed, and claims for insurance plans administered on behalf of the federal government. These administrative data can include information about physician services, institutional costs, demographic characteristics, deaths, dispensed medications, home health services, and skilled nursing facilities.

Patient-reported outcomes: Patient-reported outcomes (PROs) are defined by the FDA as any report of the status of a patient's health that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else (FDA Guidance for Industry 2009). PROs might include information about symptoms, functioning, satisfaction with care, adherence to prescribed medications or other therapy, and perceived value of treatment. Typically captured in the form of surveys or questionnaires, PROs may be obtained via paper forms, online portals, or mobile apps. See the Patient-Reported Outcomes chapter of the Living Textbook for more information.

Patient-generated health data: Patient-generated health data are data generated from devices that provide information on a patient’s status (for example, internet-connected scales, pedometers, home blood pressure monitors). These data may be obtained directly from the device via a mobile application or through some other type of instrument. Patient-generated health data can include the raw sensor values and summary statistics calculated from the underlying data.
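The distinction between raw sensor values and summary statistics can be illustrated with a minimal Python sketch. The readings, field layout, and the summarize function below are hypothetical and chosen only for illustration; actual devices and applications define their own formats and summaries.

```python
from statistics import mean, stdev

# Hypothetical raw readings from a home blood pressure monitor:
# (ISO date, systolic mmHg, diastolic mmHg). Values are illustrative only.
raw_readings = [
    ("2020-08-01", 128, 82),
    ("2020-08-02", 134, 86),
    ("2020-08-03", 122, 78),
    ("2020-08-04", 140, 90),
]

def summarize(readings):
    """Return simple summary statistics calculated from the raw readings."""
    systolic = [r[1] for r in readings]
    diastolic = [r[2] for r in readings]
    return {
        "n": len(readings),
        "mean_systolic": mean(systolic),
        "mean_diastolic": mean(diastolic),
        "sd_systolic": stdev(systolic),  # sample standard deviation
        "max_systolic": max(systolic),
    }

summary = summarize(raw_readings)
```

A study team that receives only the summary cannot recompute other statistics later, so it may be worth confirming whether the raw values, the summaries, or both are available from a given source.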

Medical product/device registries: Registries are typically created after a product or device has been approved in order to support postmarketing surveillance. These registries often contain rich information about the product or device but limited data on the characteristics or health status of patients, generally far less data than what is available in EHRs.

Condition-specific or disease registries: Registries contain information from patients who have a specific condition or disease. These patient-focused registries often include information about disease onset, symptoms, changing phenotypes, treatments, and outcomes. Because they are designed for research or targeted care, condition-specific or disease registries often have more condition-specific data than are collected in EHRs.

Environmental factors and social determinants of health: Environmental factors and social determinants of health (for example, food insecurity, access to transportation) are increasingly being captured in EHRs as healthcare systems focus more on population health. The data may be collected directly from patients through surveys or derived from community or geographically organized resources (for example, the American Community Survey) based on a patient’s current or historical address. Environmental sources can be used in the same way, for example, to estimate exposure to pollution based on distance to a freeway, power plant, or other industrial source.
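As a rough illustration of deriving an exposure indicator from an address, the Python sketch below computes the great-circle (haversine) distance between a patient location and candidate pollution sources. It assumes the address has already been geocoded to latitude and longitude. The function names, coordinates, and distance threshold are hypothetical, and real exposure models are typically more sophisticated than a simple distance cutoff.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def near_source(patient_coord, source_coords, threshold_km):
    """True if any source lies within threshold_km of the patient's location."""
    return any(
        haversine_km(*patient_coord, *source) <= threshold_km
        for source in source_coords
    )

# Hypothetical geocoded patient address and industrial sources (illustrative only).
patient = (36.00, -78.90)
sources = [(36.01, -78.91), (36.50, -79.50)]
exposed = near_source(patient, sources, threshold_km=2.0)
```

Because the indicator depends on the address used, a study would need to decide whether to use the current address or a historical one for the exposure window of interest.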

In most cases, PROs and patient-generated health data are obtained directly from patients as part of a specific trial or study. In this context, PROs and patient-generated health data are collected prospectively using the procedures that govern prospective data collection, such as patient consent. PROs and patient-generated health data collected in the EHR or as part of a registry for other purposes (such as follow-up to a surgical procedure or monitoring of patients with a chronic disease) will typically be treated like the rest of the data that are contained in that source.

Identifying the Appropriate Source

Since data in secondary real-world data sources were collected or generated for purposes other than research, they include gaps and biases that reflect the nature of the underlying activity (Kahn and Ranade 2010; Hersh et al 2013; Weiskopf et al 2013; Raebel et al 2014; Rusanov et al 2014). Therefore, given a specific research question or study, it is important to assess whether the real-world data source is relevant and can reliably fulfill its intended purpose (FDA 2017; FDA 2021), whether it is for patient identification or recruitment, monitoring outcomes, or assessing endpoints. (See the Assessing Fitness-for-Use of Real-World Data chapter of the Living Textbook.)

In many cases, the same study concept can be present in multiple real-world data sources. For example, a disease diagnosis can be identified through a query of the EHR, administrative claims, a patient-reported medical history, or a disease registry. When designing a study, investigators should understand the trade-offs between different sources in terms of completeness across a potential study population, length of follow-up, and similar factors. Depending on how data are captured, multiple sources may be needed to adequately support a study. When multiple sources are used, an adjudication process is often necessary to resolve discordance between them (Rockhold et al 2020), and investigators should plan for that process in advance.
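One way to see where adjudication would be needed is to flag patients whose sources disagree about the same concept. The Python sketch below is a minimal example under stated assumptions: the source names, patient identifiers, and the find_discordance function are hypothetical, and a value of None stands for a source that has no information about the patient. It only identifies discordance; how to resolve it remains a study-specific decision.

```python
def find_discordance(source_flags):
    """
    source_flags maps patient ID -> {source name: True / False / None},
    where None means the source has no information for that patient.
    Returns the patient IDs whose informative sources disagree.
    """
    discordant = []
    for patient_id, flags in source_flags.items():
        observed = {value for value in flags.values() if value is not None}
        if len(observed) > 1:  # at least one True and one False
            discordant.append(patient_id)
    return discordant

# Hypothetical indicator: does each source record a diagnosis of interest?
diagnosis_by_source = {
    "P001": {"ehr": True, "claims": True, "self_report": True},
    "P002": {"ehr": True, "claims": False, "self_report": None},
    "P003": {"ehr": None, "claims": False, "self_report": False},
}

needs_adjudication = find_discordance(diagnosis_by_source)
```

In this illustration only P002 would be routed to adjudication, because the EHR and claims disagree; P003 is treated as concordant because the missing EHR value is not counted as a disagreement, which is itself an assumption a study team would need to justify.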

In addition, when combining study data with real-world data sources, some form of record linkage is usually required to match patients across sources. Several techniques are available. Some rely on deterministic matches of clear-text identifiers, while others rely on probabilistic weighting of encrypted tokens generated from combinations of identifiers (such as first and last names, date of birth, and current zip code) (Grannis et al 2002; Durham et al 2010; Kum et al 2014; Setoguchi et al 2014; Durojaiye et al 2018; Karr et al 2019). Not all data holders are able to support all of these methods, so it is important to understand their capabilities, as well as the identifiers to which they have access. Study teams may then need to collect these same identifiers to allow for linkage to occur.
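The idea of generating a token from a combination of identifiers can be sketched in Python as follows. This is a simplified illustration, not the method of any cited paper or linkage product: it uses a keyed hash (HMAC-SHA256) from the standard library and matches tokens exactly, which is a deterministic match on tokens rather than the probabilistic weighting described above. The field names, the shared key, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret that the data holders would agree on in advance.
SECRET_KEY = b"replace-with-a-shared-secret"

def normalize(value):
    """Lowercase and keep only letters and digits so formatting differences do not block a match."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def make_token(record, fields=("first_name", "last_name", "dob", "zip")):
    """Build a keyed hash from a combination of identifiers."""
    message = "|".join(normalize(record[field]) for field in fields)
    return hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()

def link(study_records, source_records):
    """Return (study_id, source_id) pairs whose tokens match exactly."""
    index = {make_token(r): r["id"] for r in source_records}
    pairs = []
    for record in study_records:
        token = make_token(record)
        if token in index:
            pairs.append((record["id"], index[token]))
    return pairs

# Illustrative records only; no real patient data.
study = [
    {"id": "S1", "first_name": "Ana", "last_name": "O'Neil", "dob": "1950-01-02", "zip": "27701"},
    {"id": "S2", "first_name": "Lee", "last_name": "Park", "dob": "1948-07-15", "zip": "27705"},
]
claims = [
    {"id": "C9", "first_name": "ANA ", "last_name": "ONeil", "dob": "1950-01-02", "zip": "27701"},
]

matches = link(study, claims)
```

Exact token matching fails whenever any identifier differs after normalization (for example, a changed zip code or a misspelled name), which is one reason probabilistic approaches and multiple token combinations are used in practice.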

CHAPTER SECTIONS

  1. Introduction
  2. Common Real-World Data Sources
  3. Data Formats
  4. Acquiring Electronic Health Record Data
  5. Acquiring Claims Data and CMS Research-Identifiable Files
  6. Acquiring Patient-Reported Data
  7. Gaining Permission to Use Real-World Data
  8. Methods of Access
  9. Case Study: The IMPACT-AFib Trial

Resources

Real-World Data and Real-World Evidence in Regulatory Decisions
NIH Pragmatic Trials Collaboratory EHR Workshop video module. Jacqueline Corrigan-Curay of the US Food and Drug Administration discusses recent trends in incorporating real-world data and real-world evidence in regulatory decisions.

Using Real-World Data to Plan Eligibility Criteria and Enhance Recruitment
NIH Pragmatic Trials Collaboratory PCT Grand Rounds; July 31, 2020

REFERENCES

Durham E, Xue Y, Kantarcioglu M, Malin B. 2010. Private medical record linkage with approximate matching. AMIA Annu Symp Proc. 2010:182-186. PMID: 21346965.

Durojaiye AB, Puett LL, Levin S, et al. 2018. Linking electronic health record and trauma registry data: assessing the value of probabilistic linkage. Methods Inf Med. 57:261-269. doi:10.1055/s-0039-1681087. PMID: 30875705.

Grannis SJ, Overhage JM, McDonald CJ. 2002. Analysis of identifier performance using a deterministic linkage algorithm. Proc AMIA Symp. 2002:305-309. PMID: 12463836.

Hersh WR, Weiner MG, Embi PJ, et al. 2013. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 51:S30-S37. doi:10.1097/MLR.0b013e31829b1dbd. PMID: 23774517.

Kahn MG, Ranade D. 2010. The impact of electronic medical records data sources on an adverse drug event quality measure. J Am Med Inform Assoc. 17:185-191. doi:10.1136/jamia.2009.002451. PMID: 20190062.

Karr AF, Taylor MT, West SL, et al. 2019. Comparing record linkage software programs and algorithms using real-world data. PLoS One. 14:e0221459. doi:10.1371/journal.pone.0221459. PMID: 32352389.

Kum H-C, Krishnamurthy A, Machanavajjhala A, Reiter MK, Ahalt S. 2014. Privacy preserving interactive record linkage (PPIRL). J Am Med Inform Assoc. 21:212-220. doi:10.1136/amiajnl-2013-002165. PMID: 24201028.

Raebel MA, Haynes K, Woodworth TS, et al. 2014. Electronic clinical laboratory test results data tables: lessons from Mini-Sentinel. Pharmacoepidemiol Drug Saf. 23:609-618. doi:10.1002/pds.3580. PMID: 24677577.

Rockhold FW, Tenenbaum JD, Richesson R, Marsolo KA, O'Brien EC. 2020. Design and analytic considerations for using patient-reported health data in pragmatic clinical trials: report from an NIH Collaboratory roundtable. J Am Med Inform Assoc. 27:634-638. doi:10.1093/jamia/ocz226. PMID: 32027359.

Rusanov A, Weiskopf NG, Wang S, Weng C. 2014. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Med Inform Decis Mak. 14:51. doi:10.1186/1472-6947-14-51. PMID: 24916006.

Setoguchi S, Zhu Y, Jalbert JJ, Williams LA, Chen C-Y. 2014. Validity of deterministic record linkage using multiple indirect personal identifiers: linking a large registry to claims data. Circ Cardiovasc Qual Outcomes. 7:475-480. doi:10.1161/CIRCOUTCOMES.113.000294. PMID: 24755909.

US Food and Drug Administration. Guidance for Industry. 2009. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/patient-reported-outcome-measures-use-medical-product-development-support-labeling-claims. Accessed August 21, 2020.

US Food and Drug Administration. 2017. Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices: Guidance for Industry and Food and Drug Administration Staff. https://www.fda.gov/science-research/science-and-research-special-topics/real-world-evidence. Accessed August 20, 2020.

US Food and Drug Administration. 2021. Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products—Draft Guidance for Industry. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/real-world-data-assessing-electronic-health-records-and-medical-claims-data-support-regulatory. Accessed July 21, 2022.

Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. 2013. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform. 46:830-836. doi:10.1016/j.jbi.2013.06.010. PMID: 23820016.


Version History

October 14, 2022: Made nonsubstantive changes to the text, added an image and updated links in the Resources sidebar, and added Seils as a contributing editor as part of the annual content update (changes made by D. Seils).

January 18, 2021: Added EHR Workshop video module to the resource bar (changes made by K. Staman).

Published August 25, 2020

Citation:

Marsolo KA, Richesson RL, Hammond WE, et al. Acquiring Real-World Data: Common Real-World Data Sources. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/acquiring-real-world-data/common-real-world-data-sources/. Updated December 3, 2025. DOI: 10.28929/179.
