Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials


Assessing Fitness for Use of Real-World Data Sources

Section 2: Defining Fitness for Use

Contributors

Keith A. Marsolo, PhD
Rachel Richesson, MS, PhD, MPH
Bradley G. Hammill, DrPH, MA
Emily O’Brien, PhD
Michelle Smerek, BS
Lesley Curtis, PhD

Contributing Editor
Karen Staman, MS

Given the widespread adoption of electronic health records (EHRs) and the general availability of administrative claims in electronic formats, there is a corresponding interest in leveraging these data sources for research (Bayley et al 2013; Botsis et al 2010; Coorevits et al 2013; Etheredge 2007; Friedman et al 2010; Hersh et al 2013; Jensen et al 2012; Weiner and Embi 2009). This interest has spurred the development of approaches to assess the underlying data quality, often by defining data checks that can be executed against a dataset (Kahn and Todd 2008; Kahn et al 2010; Kahn et al 2012; Khare et al 2017; Qualls et al 2018; Rogers et al 2019).

Data checks, or metrics, can describe characteristics of a dataset, including missing values, outliers, and frequency distributions. However, whether the result or value of a particular metric is acceptable depends on the needs of the research project and the intended use of the data. For example, in determining eligibility for a study, it may be sufficient to simply know whether a patient has an available laboratory result, because study coordinators will need to complete a screening form that involves chart review. In this case, the presence of a result (regardless of unit or value) is an adequate filter. However, if a lab result were going to serve as the biomarker endpoint for a trial, more rigorous thresholds might be needed; for example, each result might need an actual value, a unit of measure, and a measure of confidence in the accuracy of the result. In other words, when it comes to using real-world data in clinical research, datasets must be considered in the context of a specific project or analysis to determine whether they are suitable, or fit, for use.
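The point that the same metric can yield different fitness-for-use conclusions can be illustrated with a short sketch. This is a hypothetical example, not part of any specific data model or toolkit: the records, field names (`patient_id`, `ldl_value`, `ldl_unit`), and use cases are invented for illustration.

```python
# Hypothetical lab-result records; field names are illustrative only.
records = [
    {"patient_id": "A", "ldl_value": 130.0, "ldl_unit": "mg/dL"},
    {"patient_id": "B", "ldl_value": 128.0, "ldl_unit": None},
    {"patient_id": "C", "ldl_value": None,  "ldl_unit": None},
]

def completeness(recs, field):
    """Fraction of records with a non-missing value for `field`."""
    return sum(r[field] is not None for r in recs) / len(recs)

# Screening use case: any available result is enough to flag a chart
# for coordinator review, regardless of unit or value.
screen_eligible = [r for r in records if r["ldl_value"] is not None]

# Endpoint use case: require both a value and a unit of measure.
endpoint_usable = [
    r for r in records
    if r["ldl_value"] is not None and r["ldl_unit"] is not None
]

print(completeness(records, "ldl_value"))      # 2 of 3 records have a value
print(len(screen_eligible), len(endpoint_usable))
```

The same completeness metric is computed once, but the screening use case accepts two records while the endpoint use case accepts only one: fitness is a property of the data plus the intended use, not of the data alone.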

“Fitness-for-use” is a nebulous concept, and defining it is more art than science, with few hard-and-fast rules established thus far. For real-world evidence derived from real-world data and used in regulatory decision-making, the US Food and Drug Administration (FDA) has provided guidance through the recommendations in the Framework for FDA's Real-World Evidence Program (FDA 2018), further elaborated in Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products (FDA 2021). The FDA defines fitness for use in terms of relevance and reliability. Relevance “includes the availability of key data elements (exposures, outcomes, covariates) and sufficient number of representative patients for the study,” while reliability focuses on “data accuracy, completeness, provenance and traceability” (FDA 2021). These terms are described in more detail below. The FDA has stated that it will respond to study teams on whether a specific set of assessments is sufficient to determine the fitness for use of a real-world data source for a given study or analysis, but it has not endorsed any single assessment package as sufficient for all studies that use real-world data. As the field gains experience and confidence with real-world data and real-world evidence, we expect further refinement in this area.

Key Point: On a study-by-study basis, FDA will work with stakeholders to evaluate whether a given assessment is suitable for a particular research question.

Relevance

A real-world data source is said to be relevant if:

  • the data apply to the question at hand;
    • For example, the data contain sufficient detail to capture use of, or exposure to, the product or device and/or the outcome of interest.
  • the data are amenable to sound clinical and statistical analysis; and
    • For example, the data can be used to answer the specified question using the proposed statistical plan.
  • the data and evidence the source provides are interpretable using informed clinical and statistical judgment.
    • For example, the use of the device or product captured in the data source is representative of real-world use and generalizable to the relevant population under study (FDA 2018).

The "sufficient detail" needed to capture use or exposure depends on the intended use case. For instance, medication prescription data may be sufficient if an investigator is planning to screen patients for a trial, while dispensing or medication administration data would be a more reliable indicator of exposure as part of an outcome or endpoint. Along the same lines, "amenable to analysis" means that the data are specific enough to support the study question. For example, having death data may be sufficient for one study but not for another in which it is important to distinguish all-cause mortality from mortality due to a specific cause.
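The prescription-versus-dispensing distinction above can be sketched as a small filter over medication events. This is a hypothetical illustration; the event types and field names are invented, not drawn from any specific claims or EHR data model.

```python
# Hypothetical medication events; event types are illustrative only.
med_events = [
    {"patient_id": "A", "drug": "statin", "event": "prescribed"},
    {"patient_id": "A", "drug": "statin", "event": "dispensed"},
    {"patient_id": "B", "drug": "statin", "event": "prescribed"},
]

def patients_with_event(events, drug, kinds):
    """Patients with at least one event of the given kinds for `drug`."""
    return {e["patient_id"] for e in events
            if e["drug"] == drug and e["event"] in kinds}

# Screening use case: a prescription alone is enough to flag a candidate.
candidates = patients_with_event(med_events, "statin",
                                 {"prescribed", "dispensed"})

# Exposure for an endpoint: require evidence the drug reached the patient.
exposed = patients_with_event(med_events, "statin", {"dispensed"})

print(sorted(candidates))  # both patients have a prescription
print(sorted(exposed))     # only one has a dispensing record
```

Patient B was prescribed the drug but has no dispensing record, so B counts for screening but not for an exposure definition that demands stronger evidence.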

Investigators will often have a general sense of the relevance of a data source before attempting to use it as part of a study, but it may be necessary to include additional analyses to better demonstrate applicability. This is particularly true for real-world data sources like EHRs. While administrative claims tend to have complete capture of all medically attended events during a given enrollment period, the same concept does not exist within EHRs. Encounters may occur outside a given health system; even within a health system, data collection within the EHR can be variable, particularly for workflows that are not tied to reimbursement. Practices vary by hospital, clinic, and/or provider, and the availability of data for longitudinal analysis may be affected by when the EHR or other clinical information system was deployed across the health system. All of these factors should be taken into account when assessing fitness for use, particularly for studies that rely on EHR data from different healthcare systems.

Reliability: Data Accrual

Data accrual relates to aspects of how the data in the source are collected or captured. Reliable documentation of data accrual methods for a real-world data source includes:

  • an operational manual that pre-specifies the data elements to be collected;
  • the definitions of those data elements;
  • methods of data aggregation, transformation, and documentation; and
  • the relevant time window (FDA 2018).

This information is expected for real-world data sources like patient or device registries (Agency for Healthcare Research and Quality 2010; International Medical Device Regulators Forum Group 2015; Krucoff et al 2015; Patient-Centered Outcomes Research Institute 2012), as well as those that collect data directly as part of a study (eg, patient-reported outcomes or patient-generated data). Secondary real-world data sources like EHRs and administrative claims lack many of these characteristics, though it is possible to approximate some of them through items like data dictionaries or data model specifications, provenance surveys that detail the source of certain data elements, and workflow descriptions of how data elements were captured over time, including any changes or modifications (eg, patient-reported outcomes initially captured at an in-clinic kiosk in the waiting room and later completed at home via questionnaires delivered through the healthcare system’s patient portal). Documentation of the procedures and specifications used to translate EHR data from the source system to the target database (eg, a common data model or database extract) can provide further insight into data accrual practices.

Reliability

For secondary data sources like EHRs and administrative claims, data reliability concerns aspects of data quality and provenance over the “life cycle” of the data, or the steps that occur as data are curated and transformed from initial capture in the source system(s) to data repositories/common data models to a final analytic dataset. Activities to ensure data reliability include the execution of data checks that can describe the completeness, conformance, and plausibility of the data (see Section 4), and documentation of the data quality processes for the various transformation steps along the data life cycle to ensure the overall validity and integrity of the data (see Section 7).
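The three check categories named above (completeness, conformance, and plausibility) can be sketched for a single field. This is a hypothetical illustration under assumed inputs: the field, type expectation, and plausibility range are invented for the example, and real data quality frameworks define these categories far more richly.

```python
# Hypothetical checks for one lab field; thresholds are illustrative only.
def check_completeness(values):
    """Completeness: fraction of non-missing values."""
    return sum(v is not None for v in values) / len(values)

def check_conformance(values, expected_type=float):
    """Conformance: non-missing values match the expected type/format."""
    return all(isinstance(v, expected_type) for v in values if v is not None)

def check_plausibility(values, lo=0.0, hi=500.0):
    """Plausibility: non-missing values fall in a clinically sensible range."""
    return all(lo <= v <= hi for v in values if v is not None)

ldl = [130.0, 128.0, None, 9999.0]
print(check_completeness(ldl))   # 0.75
print(check_conformance(ldl))    # True
print(check_plausibility(ldl))   # False: 9999.0 is out of range
```

Note that the checks can disagree: the example field conforms to the expected type yet fails plausibility, which is exactly why a fitness-for-use assessment looks at several dimensions of quality rather than a single metric.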

Real-world data sources like patient or device registries (Gliklich et al 2010; International Medical Device Regulators Forum Group 2015; Krucoff et al 2015; Patient-Centered Outcomes Research Institute 2012), as well as data sources collected directly as part of a study (eg, patient-reported outcomes or patient-generated data), can provide a template for the types of information that should be documented to demonstrate reliability. Secondary real-world data sources like EHRs and administrative claims lack some of the characteristics of these data sources, but the same items noted above for data accrual (data dictionaries and data model specifications, provenance surveys, workflow descriptions, and documentation of the procedures used to translate data from the source system to the target database) can help approximate them.

CHAPTER SECTIONS

  1. Introduction
  2. Defining Fitness for Use
  3. Evaluating Fitness for Use
  4. Data Quality Measures
  5. Use of Medicare Data in PCTs
  6. Data Source Accuracy: Case Study from TRANSLATE-ACS
  7. Data Provenance
  8. Operationalizing Fitness-for-Use Assessments

Resources

Leveraging RWE to Support Regulatory Decisions–An Update on Efforts to Inform Policy; NIH Collaboratory Grand Rounds; March 15, 2019

Expanding Use of Real-World Evidence: A National Academies Workshop Series; NIH Collaboratory Grand Rounds; April 27, 2018

REFERENCES


Gliklich RE, Dreyer NA, eds. 2010. Registries for Evaluating Patient Outcomes: A User's Guide. Rockville, Maryland: Agency for Healthcare Research and Quality. https://effectivehealthcare.ahrq.gov/products/registries-guide-4th-edition/. Accessed August 24, 2020.

Bayley KB, Belnap T, Savitz L, Masica AL, Shah N, Fleming NS. 2013. Challenges in using electronic health record data for CER experience of 4 learning organizations and solutions applied. Med Care. 51:S80-S86. doi:10.1097/MLR.0b013e31829b1d48. PMID: 23774512.

Botsis T, Hartvigsen G, Chen F, Weng C. 2010. Secondary use of EHR: data quality issues and informatics opportunities. Summit Transl Bioinform. 2010:1-5. PMID: 21347133.

Coorevits P, Sundgren M, Klein GO, et al. 2013. Electronic health records: new opportunities for clinical research. J Intern Med. 274:547-560. doi:10.1111/joim.12119. PMID: 23952476.

Etheredge LM. 2007. A rapid-learning health system. Health Aff (Millwood). 26(2):w107-w118. doi:10.1377/hlthaff.26.2.w107. PMID: 17259191.

FDA. 2018. Framework for FDA's Real-World Evidence Program. https://www.fda.gov/media/120060/download. Accessed August 25, 2020.

FDA. 2021. Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products. https://www.fda.gov/media/152503/download. Accessed August 26, 2021.

Friedman CP, Wong AK, Blumenthal D. 2010. Achieving a nationwide learning health system. Sci Transl Med. 2:57cm29. doi:10.1126/scitranslmed.3001456. PMID: 21068440.

Hersh WR, Cimino J, Payne PRO, et al. 2013. Recommendations for the use of operational electronic health record data in comparative effectiveness research. EGEMS (Washington, DC). 1(1):1018. doi:10.13063/2327-9214.1018. PMID: 25848563.

International Medical Device Regulators Forum Group. 2015. Patient Registry: Essential Principles. https://www.imdrf.org/sites/default/files/2021-09/imdrf-cons-essential-principles-151124.pdf. Accessed December 3, 2025.

Jensen PB, Jensen LJ, Brunak S. 2012. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet. 13:395-405. doi:10.1038/nrg3208. PMID: 22549152.


Kahn MG, Todd J. 2008. Comparative quality measures: putting evidence above expediency. Pediatrics. 122(1):182-183. doi:10.1542/peds.2008-1042. PMID: 18596002.

Kahn MG, Eliason BB, Bathurst J. 2010. Quantifying clinical data quality using relative gold standards. AMIA Annu Symp Proc. 2010 Nov 13:356-360. PMID: 21347000.

Kahn MG, Raebel MA, Glanz JM, Riedlinger K, Steiner JF. 2012. A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Med Care. 50 Suppl(0):S21-S29. doi:10.1097/MLR.0b013e318257dd67. PMID: 22692254.

Khare R, Utidjian L, Ruth BJ, et al. 2017. A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc. 24(6):1072-1079. doi:10.1093/jamia/ocx033. PMID: 28398525.

Krucoff M, Normand S, Edwards F, et al. 2015. Recommendations for a National Medical Device Evaluation System: Strategically Coordinated Registry Networks to Bridge Clinical Care and Research. https://www.fda.gov/media/93140/download. Accessed August 24, 2020.

Patient-Centered Outcomes Research Institute. 2012. Standards in the Conduct of Registry Studies for Patient-Centered Outcomes Research.

PCORnet. PCORnet Common Data Model (CDM). https://pcornet.org/data-driven-common-model/. Accessed January 26, 2017.

Qualls LG, Phillips TA, Topping J, et al. 2018. Evaluating foundational data quality in the national Patient-Centered Clinical Research Network (PCORnet). EGEMS (Wash DC). 6(1):3. doi:10.5334/egems.199. PMID: 29881761.

Weiner MG, Embi PJ. 2009. Toward reuse of clinical data for research and quality improvement: the end of the beginning? Ann Intern Med. 151(5):359-360. doi:10.7326/0003-4819-151-5-200909010-00141. PMID: 19638404.


Version History

December 3, 2025: Updated hyperlinks (changes made by G. Uhlenbrauck).

August 26, 2022: Updated as part of annual review (changes made by K. Staman).

Published August 25, 2020


Citation:

Marsolo KA, Richesson RL, Hammill BG, et al. Assessing Fitness for Use of Real-World Data Sources: Defining Fitness for Use. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/assessing-fitness-for-use-of-real-world-data-sources/defining-fitness-for-use/. Updated December 3, 2025. DOI: 10.28929/184.
