Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials

Assessing Fitness for Use of Real-World Data Sources


Section 7

Data Provenance

Contributors

Keith A. Marsolo, PhD
Rachel Richesson, MS, PhD, MPH
Bradley G Hammill, DrPH, MA
Emily O’Brien, PhD
Michelle Smerek, BS
Lesley Curtis, PhD

Contributing Editor
Karen Staman, MS

There is widespread variability in how information is captured in EHRs within and across healthcare systems. The same is true of how health insurance providers process administrative claims. There is also variability in how each site maps data from its source systems to the value sets within a dataset or CDM. Knowledge about data collection practices, and about the decisions made when translating source-system data into the CDM, can provide additional insight and context into the reliability of a dataset as it relates to data accrual (Johnson et al 2014).

Many DRNs, for instance, ask their collaborators to complete surveys that describe the provenance of their data sources and that detail the characteristics of their clinical workflows and/or source systems (Qualls et al 2018). Similarly, the NIH Collaboratory data quality white paper recommends that researchers explore the data collection and transformation procedures at each site, although this can be a highly manual process involving interviews and discussions between local data experts and the research team. It is important not to make this process overly burdensome, however: because clinical systems change over time, these surveys should not be a one-time event. Collecting multiple responses over time can provide a more complete longitudinal history.

Provenance information can also be incorporated at the record level, for instance by assigning values during the data transformation process that indicate whether a diagnosis originated in a billing system, was entered by a clinician, or was derived from a note through natural language processing. These details matter because a dataset that includes records from only one source and a dataset that pools records from several sources without distinguishing among them will generate very different profiles.
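The effect of record-level provenance on a data profile can be illustrated with a small sketch. It assumes a hypothetical provenance field with three values that mirror the examples above: billing, clinician-entered, and NLP-derived. The names `DxSource`, `DiagnosisRecord`, `transform`, `profile`, and `source_mix` are illustrative only. They are not part of any CDM specification. The patient IDs and codes are toy values, not real data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class DxSource(Enum):
    """Hypothetical record-level provenance values for a diagnosis."""
    BILLING = "billing"
    CLINICIAN = "clinician_entered"
    NLP = "nlp_derived"


@dataclass(frozen=True)
class DiagnosisRecord:
    patient_id: str
    code: str          # diagnosis code as it appears in the source system
    source: DxSource   # provenance value stamped during transformation


def transform(raw_rows: Iterable[dict], source: DxSource) -> list[DiagnosisRecord]:
    """Map rows from one source system into a common record shape, keeping provenance."""
    return [DiagnosisRecord(row["patient_id"], row["code"], source) for row in raw_rows]


def profile(records: Iterable[DiagnosisRecord],
            sources: Optional[set[DxSource]] = None) -> dict[str, int]:
    """Count distinct patients per code, optionally restricted to some sources."""
    patients_by_code: dict[str, set[str]] = {}
    for rec in records:
        if sources is None or rec.source in sources:
            patients_by_code.setdefault(rec.code, set()).add(rec.patient_id)
    return {code: len(pids) for code, pids in patients_by_code.items()}


def source_mix(records: Iterable[DiagnosisRecord]) -> Counter:
    """Count records per (code, provenance value) pair."""
    return Counter((rec.code, rec.source.value) for rec in records)


# Toy rows (hypothetical, not real data) from three feeds at one site.
billing_rows = [{"patient_id": "p1", "code": "E11.9"},
                {"patient_id": "p2", "code": "E11.9"}]
clinician_rows = [{"patient_id": "p1", "code": "E11.9"},
                  {"patient_id": "p3", "code": "I10"}]
nlp_rows = [{"patient_id": "p4", "code": "E11.9"},
            {"patient_id": "p3", "code": "I10"},
            {"patient_id": "p5", "code": "I10"}]

records = (transform(billing_rows, DxSource.BILLING)
           + transform(clinician_rows, DxSource.CLINICIAN)
           + transform(nlp_rows, DxSource.NLP))

print(profile(records))                                          # → {'E11.9': 3, 'I10': 2}
print(profile(records, {DxSource.BILLING}))                      # → {'E11.9': 2}
print(profile(records, {DxSource.BILLING, DxSource.CLINICIAN}))  # → {'E11.9': 2, 'I10': 1}
```

In this toy example, excluding the NLP-derived records halves the patient count for I10. If the provenance value were dropped during transformation, that difference could not be recovered from the pooled dataset. `source_mix(records)` gives a quick per-source breakdown for each code, which can be compared across sites or across survey periods.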

CHAPTER SECTIONS

  1. Introduction
  2. Defining Fitness for Use
  3. Evaluating Fitness for Use
  4. Data Quality Measures
  5. Use of Medicare Data in PCTs
  6. Data Source Accuracy: Case Study from TRANSLATE-ACS
  7. Data Provenance
  8. Operationalizing Fitness-for-Use Assessments

Resources

National Evaluation System for Health Technology Coordinating Center (NESTcc) Data Quality Framework; NEST Coordinating Center; 2019.

REFERENCES

Johnson KE, Kamineni A, Fuller S, Olmstead D, Wernli KJ. 2014. How the provenance of electronic health record data matters for research: a case example using system mapping. EGEMS (Washington, DC). 2:1058. doi:10.13063/2327-9214.1058. PMID: 25821838.

Qualls LG, Phillips TA, Topping J, et al. 2018. Evaluating foundational data quality in the National Patient-Centered Clinical Research Network (PCORnet). eGEMs (Generating Evidence & Methods to improve patient outcomes). 6(1):3. doi:10.5334/egems.199. PMID: 29881761.

Version History

Published August 25, 2020

Citation:

Marsolo KA, Richesson RL, Hammill BG, et al. Assessing Fitness for Use of Real-World Data Sources: Data Provenance. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/assessing-fitness-for-use-of-real-world-data-sources/data-provenance/. Updated December 3, 2025. DOI: 10.28929/188.
