Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials

Electronic Health Records–Based Phenotyping


Section 5

Data Quality

Contributors

Rachel L. Richesson, PhD, MPH
Laura K. Wiley, PhD
Sigfried Gold, MA, MFA
Luke Rasmussen, MS
For the NIH Pragmatic Trials Collaboratory Electronic Health Records Core Working Group
See the Acknowledgments for additional contributors.

Contributing Editors
Damon M. Seils, MA
Gina Uhlenbrauck

The quality of the data in healthcare information systems can affect the results of phenotype-based queries to the point that the resulting data are not useful for research. Secondary use of healthcare data is defined as use of the data for a purpose other than that for which the data were originally collected (Safran et al 2007). Because the data were collected for another purpose, secondary users should not assume that the data will meet their needs. For these reasons, data quality assessment should accompany phenotype validation.

Using healthcare data without understanding their accuracy, consistency, missingness, and possible biases can lead to misleading answers. The capacity of the data to support research conclusions is so important that requests for applications for NIH Pragmatic Trials Collaboratory Trials require applicants to address data validation. A 2018 methodology report from the Patient-Centered Outcomes Research Institute (PCORI) (Kahn et al 2018) recommends reporting data quality along with study results for observational and comparative effectiveness research. The report also provides a data quality assessment model and framework. Other guidelines from research networks provide practical advice for data quality checks and reporting (Brown, Kahn, and Toh 2013; Kahn et al 2015).
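As a rough illustration of what a basic check of missingness and consistency might look like, the following Python sketch profiles a small set of records. It is not drawn from the PCORI report or the network guidelines cited above. The field names (`patient_id`, `birth_date`, `encounter_date`, `hba1c`), the sample values, and the single plausibility rule (an encounter cannot precede birth) are all hypothetical and chosen only for the example.

```python
from datetime import date


def profile_records(records, required_fields):
    """Summarize missingness and one plausibility check for a list of dict records."""
    n = len(records)
    missing = {field: 0 for field in required_fields}
    encounter_before_birth = 0
    for rec in records:
        # Missingness: treat None and empty strings as missing values.
        for field in required_fields:
            if rec.get(field) in (None, ""):
                missing[field] += 1
        # Consistency: an encounter dated before the birth date is implausible.
        birth = rec.get("birth_date")
        encounter = rec.get("encounter_date")
        if birth is not None and encounter is not None and encounter < birth:
            encounter_before_birth += 1
    return {
        "n_records": n,
        "missing_fraction": {
            field: (missing[field] / n if n else 0.0) for field in required_fields
        },
        "encounter_before_birth": encounter_before_birth,
    }


# Hypothetical records, for illustration only.
example_records = [
    {"patient_id": "A", "birth_date": date(1960, 1, 1),
     "encounter_date": date(2020, 5, 1), "hba1c": 7.2},
    {"patient_id": "B", "birth_date": date(1975, 3, 2),
     "encounter_date": date(2019, 7, 9), "hba1c": None},
    {"patient_id": "C", "birth_date": date(2021, 1, 1),
     "encounter_date": date(2020, 1, 1), "hba1c": 6.1},
    {"patient_id": "D", "birth_date": None,
     "encounter_date": date(2018, 2, 2), "hba1c": ""},
]

report = profile_records(example_records, ["birth_date", "encounter_date", "hba1c"])
print(report)
```

A summary of this kind could be reported alongside study results. Which fields and rules matter depends on the study question and the source system.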

The NIH Pragmatic Trials Collaboratory has developed a data quality assessment framework to help investigators and research teams identify and implement necessary assessments. (See “Assessing Data Quality for Healthcare Systems Data Used in Clinical Research.”) There are few validated electronic methods for data quality assessment that can be executed on a dataset. Instead, current methods are comparison-based. They compare chart review findings with the data returned from a phenotype-based query, or they compare 2 datasets, in order to quantify the number and type of discrepancies and understand how the discrepancies might be distributed in a dataset.
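The two comparison-based approaches described above can be sketched in Python as follows. This is a minimal illustration, not the Collaboratory's framework. The function names, patient identifiers, fields, and values are hypothetical, and chart review is assumed to be available as a simple true/false determination for each reviewed patient. Estimating sensitivity this way also assumes that the chart review sample includes patients the query did not flag.

```python
from collections import Counter


def compare_to_chart_review(query_positive_ids, chart_review):
    """Compare phenotype query results with chart review findings.

    chart_review maps patient_id -> True if chart review confirmed the condition.
    """
    flagged_ids = set(query_positive_ids)
    counts = Counter()
    for patient_id, confirmed in chart_review.items():
        flagged = patient_id in flagged_ids
        if flagged and confirmed:
            counts["true_positive"] += 1
        elif flagged and not confirmed:
            counts["false_positive"] += 1
        elif not flagged and confirmed:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    tp = counts["true_positive"]
    fp = counts["false_positive"]
    fn = counts["false_negative"]
    return {
        "counts": dict(counts),
        # Positive predictive value: share of flagged patients confirmed by review.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        # Sensitivity: share of confirmed patients that the query flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
    }


def diff_datasets(dataset_a, dataset_b, fields):
    """Tally discrepancies by type between 2 datasets keyed by patient_id."""
    discrepancies = Counter()
    for patient_id in sorted(set(dataset_a) | set(dataset_b)):
        if patient_id not in dataset_a:
            discrepancies["missing_from_a"] += 1
        elif patient_id not in dataset_b:
            discrepancies["missing_from_b"] += 1
        else:
            for field in fields:
                if dataset_a[patient_id].get(field) != dataset_b[patient_id].get(field):
                    discrepancies[f"mismatch:{field}"] += 1
    return dict(discrepancies)


# Hypothetical inputs, for illustration only.
review = {"p1": True, "p2": False, "p3": True, "p4": False}
validation = compare_to_chart_review({"p1", "p2"}, review)

source_a = {"p1": {"sex": "F", "dx": "E11"}, "p2": {"sex": "M", "dx": "E11"}}
source_b = {"p1": {"sex": "F", "dx": "E10"}, "p3": {"sex": "F", "dx": "E11"}}
differences = diff_datasets(source_a, source_b, ["sex", "dx"])

print(validation)
print(differences)
```

Tallying discrepancies by type, rather than as a single total, makes it easier to see whether problems cluster in particular fields or patient groups.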

CHAPTER SECTIONS

  1. Introduction
  2. Definitions
  3. Finding Existing Phenotype Definitions
  4. Evaluating Phenotype Definitions
  5. Data Quality
  6. Using Phenotypes in PCTs—How Do I Get Started?

Resources

Assessing Data Quality for Healthcare Systems Data Used in Clinical Research
Guidance document from the NIH Collaboratory's Electronic Health Records Core Working Group.

REFERENCES

Brown JS, Kahn M, Toh S. 2013. Data quality assessment for comparative effectiveness research in distributed data networks. Med Care. 51(8 Suppl 3):S22-S29. doi:10.1097/MLR.0b013e31829b1e2c. PMID: 23793049.

Kahn MG, Brown JS, Chun AT, et al. 2015. Transparent reporting of data quality in distributed data networks. EGEMS (Wash DC). 3(1):1052. doi:10.13063/2327-9214.1052. PMID: 25992385.

Kahn M, Ong T, Barnard J, Maertens J. 2018. Developing Standards for Improving Measurement and Reporting of Data Quality in Health Research. Washington, DC: Patient-Centered Outcomes Research Institute. https://doi.org/10.25302/3.2018.ME.13035581. Accessed June 30, 2020.

Safran C, Bloomrosen M, Hammond WE, et al. 2007. Toward a national framework for the secondary use of health data: an American Medical Informatics Association white paper. J Am Med Inform Assoc. 14:1-9. doi:10.1197/jamia.M2273. PMID: 17077452.

ACKNOWLEDGMENTS

Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.

The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.


Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list and throughout the text as part of the annual content update (changes made by D. Seils).

July 8, 2020: Updated links in the list of contributors; and made nonsubstantive corrections to the text (changes made by D. Seils).

July 1, 2020: Addition of Resources sidebar; and minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Citation:

Richesson R, Wiley LK, Gold S, Rasmussen L; for the NIH Health Care Systems Research Collaboratory Electronic Health Records Core Working Group. Electronic Health Records–Based Phenotyping: Data Quality. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/electronic-health-records-based-phenotyping/data-quality/. Updated December 3, 2025. DOI: 10.28929/147.
