Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials


Using Phenotypes in PCTs—How Do I Get Started?

Electronic Health Records–Based Phenotyping


Section 6

Contributors

Rachel L. Richesson, PhD, MPH
Laura K. Wiley, PhD
Sigfried Gold, MA, MFA
Luke Rasmussen, MS
For the NIH Pragmatic Trials Collaboratory Electronic Health Records Core Working Group
See the Acknowledgments for additional contributors.

Contributing Editors
Damon M. Seils, MA
Gina Uhlenbrauck

Before developing any phenotype definition, researchers should search for existing phenotype definitions and review how those definitions performed in validation testing. They should then assess the candidate definitions for feasibility in their particular settings (for example, by determining whether the data domains available locally match those required by the authoritative source definition). If no suitable definition can be found from authoritative sources, a new definition must be developed and validated. In either case, once a candidate phenotype definition is identified, it must be validated against a gold standard in clinical populations, as shown in the figure below.
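The feasibility check described above can be sketched as a comparison between the data domains each candidate definition requires and the domains each site can supply. The definition names, domain labels, and site inventories below are hypothetical placeholders, not drawn from any authoritative phenotype library; a real assessment would also consider data completeness and coding practices, not just whether a domain exists.

```python
# Hypothetical candidate definitions and the data domains each one requires.
CANDIDATE_DEFINITIONS = {
    "obesity_codes_only": {"diagnoses"},
    "obesity_bmi_or_codes": {"diagnoses", "vital_signs"},
    "obesity_bmi_codes_meds": {"diagnoses", "vital_signs", "medication_dispensing"},
}

# Hypothetical inventory of the data domains each participating site can supply.
SITE_DOMAINS = {
    "site_a": {"diagnoses", "vital_signs", "medication_orders"},
    "site_b": {"diagnoses", "vital_signs", "medication_dispensing"},
}


def missing_domains(required: set[str], available: set[str]) -> set[str]:
    """Return the domains a definition needs that a site cannot supply."""
    return required - available


def feasibility_report(definitions: dict[str, set[str]],
                       sites: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each candidate definition to the sites that hold every domain it requires."""
    return {
        name: sorted(site for site, available in sites.items()
                     if not missing_domains(required, available))
        for name, required in definitions.items()
    }


if __name__ == "__main__":
    for name, sites in feasibility_report(CANDIDATE_DEFINITIONS, SITE_DOMAINS).items():
        print(f"{name}: {sites or 'not feasible at any site'}")
```

In this toy inventory, site_a records medication orders but not dispensing, so the definition that depends on dispensing data is feasible only at site_b.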

Figure. Phenotype Evaluation Process

 

The Figure is a flow diagram of the phenotype evaluation process.
Abbreviations: AHRQ, Agency for Healthcare Research and Quality; CMS, Centers for Medicare & Medicaid Services. Adapted with permission from Shelley Rusincovitch, Center for Predictive Medicine, Duke Clinical Research Institute.
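Validating a candidate definition against a gold standard usually means comparing, patient by patient, whether the phenotype flags a case with what the gold standard says. The sketch below computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), which are common summary measures for such a comparison; it assumes the gold-standard labels come from something like manual chart review, and it is not necessarily the set of metrics the evaluation process in the Figure prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    sensitivity: float  # share of true cases the phenotype finds
    specificity: float  # share of true non-cases the phenotype correctly excludes
    ppv: float          # share of phenotype-positive patients who are true cases
    npv: float          # share of phenotype-negative patients who are true non-cases


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validate(phenotype: dict[str, bool], gold_standard: dict[str, bool]) -> ValidationResult:
    """Compare phenotype flags with gold-standard labels (e.g., chart review) per patient."""
    tp = fp = tn = fn = 0
    for patient_id, is_case in gold_standard.items():
        flagged = phenotype[patient_id]
        if flagged and is_case:
            tp += 1
        elif flagged:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    return ValidationResult(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )
```

PPV and NPV depend on how common the condition is in the validation sample, so they may not carry over to a population with a different prevalence.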

If a new phenotype definition is needed, the researchers must first operationalize the disease concept in terms of electronic health record (EHR) data; that is, they must explicitly define how the concept will be measured, observed, or manipulated within a particular study, given the available data sources. A theoretical or conceptual variable of interest (such as a disease) must be translated into a set of specific diagnoses or procedures paired with implementation specifications that define the variable's meaning in that study. In the context of healthcare data, this means explicitly defining the diagnoses, treatments, and clinical and patient characteristics that are indicative or suggestive of the condition. The researchers must specify the clinical condition they are looking for and how that condition would be represented in various EHRs.
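One way to picture an operationalized definition is as a set of codes paired with explicit implementation specifications. The sketch below assumes two hypothetical specifications, a minimum number of qualifying coded events and a lookback window before an index date; real definitions typically carry more criteria (procedures, laboratory values, exclusions). The ICD-10-CM codes in the usage example are an illustrative subset only and should be confirmed against the current code set.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PhenotypeDefinition:
    """A conceptual variable translated into codes plus implementation specifications."""
    concept: str
    diagnosis_codes: frozenset[str]  # which codes count as evidence of the concept
    min_occurrences: int = 1         # spec: how many qualifying coded events are required
    lookback_days: int = 365         # spec: how far before the index date to search


def meets_definition(defn: PhenotypeDefinition,
                     coded_events: list[tuple[str, date]],
                     index_date: date) -> bool:
    """Apply the definition to one patient's (code, event_date) history."""
    window_start = index_date - timedelta(days=defn.lookback_days)
    qualifying = [event_date for code, event_date in coded_events
                  if code in defn.diagnosis_codes
                  and window_start <= event_date <= index_date]
    return len(qualifying) >= defn.min_occurrences


if __name__ == "__main__":
    # Illustrative ICD-10-CM obesity codes; confirm against the current code set.
    obesity = PhenotypeDefinition("obesity", frozenset({"E66.01", "E66.9"}), min_occurrences=2)
    history = [("E66.9", date(2024, 3, 1)), ("I10", date(2024, 4, 1)), ("E66.01", date(2024, 9, 1))]
    print(meets_definition(obesity, history, index_date=date(2024, 12, 31)))  # prints True
```

Keeping the specifications as named fields, rather than burying them in query logic, makes it easier to document exactly what the variable means in a given study and to reproduce it at another site.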

For example, to identify obesity, the researchers would first identify the diagnosis and procedure codes for the condition and investigate whether those codes are reliable and applied consistently. If the researchers cannot reasonably assume that all patients with obesity would carry a given diagnosis or procedure code, they must use other data sources.
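One possible way to probe whether obesity codes are applied consistently is to take patients whose measured body mass index (BMI) meets the conventional adult cutoff of 30 kg/m² and check what share of them also carry an obesity diagnosis code. The sketch below assumes adult patients, reasonably complete height and weight data, and a hypothetical record layout; the code list is an illustrative subset of ICD-10-CM. A low share would suggest that codes alone would miss many patients with obesity.

```python
# Illustrative subset of ICD-10-CM obesity codes (an assumption; confirm against the current code set).
OBESITY_CODES = frozenset({"E66.01", "E66.9"})
ADULT_OBESITY_BMI = 30.0  # conventional adult cutoff, in kg/m^2


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def coding_completeness(patients: list[dict]) -> float:
    """Share of patients with a measured BMI at or above the cutoff who also carry an obesity code.

    Each patient is a dict with hypothetical keys 'weight_kg', 'height_m', and 'codes'.
    """
    obese_by_measurement = [p for p in patients
                            if bmi(p["weight_kg"], p["height_m"]) >= ADULT_OBESITY_BMI]
    if not obese_by_measurement:
        return float("nan")
    coded = sum(1 for p in obese_by_measurement if OBESITY_CODES & set(p["codes"]))
    return coded / len(obese_by_measurement)
```

Measured BMI is itself an imperfect reference (height and weight can be missing or distorted), so this check estimates coding completeness rather than establishing it.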

The next step is to review the available data sources (such as EHR data, claims data, registry data, and patient-reported outcomes data). If a phenotype definition is to be applied in multiple organizations, the researchers must consider the data sources that are available in other organizations. Possible data sources for obesity might include patient height and weight, the ordering or dispensing of medications associated with weight management, or patient-reported data on weight or a previous diagnosis of obesity. It is also important to consider other factors that may affect these measurements (such as the effect of pregnancy on weight, or the effect of amputation on height). Within each data type, the researchers should identify which data are available to them (for example, some EHR data include medication orders but not administration data, or billing diagnoses rather than problem lists). Knowing the types of data available can support an early feasibility assessment of existing phenotype definitions.
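The sketch below shows how evidence from the data sources named above (diagnosis codes, height and weight, weight-management medications, and patient-reported data) might be gathered for one patient, with the pregnancy and amputation caveats applied to the BMI-based evidence. The record layout, field names, and the BMI cutoff of 30 kg/m² are assumptions for illustration, and the code list is an illustrative ICD-10-CM subset. The function reports which sources support obesity rather than issuing a final classification.

```python
from dataclasses import dataclass, field
from datetime import date

# Illustrative subset of ICD-10-CM obesity codes (assumption; confirm against the current code set).
OBESITY_CODES = frozenset({"E66.01", "E66.9"})


@dataclass
class Measurement:
    taken_on: date
    weight_kg: float
    height_m: float


@dataclass
class PatientRecord:
    """Hypothetical per-patient view assembled from several data sources."""
    measurements: list[Measurement] = field(default_factory=list)      # EHR vital signs
    diagnosis_codes: set[str] = field(default_factory=set)             # EHR or claims
    weight_management_medication: bool = False                         # orders or dispensing
    self_reported_obesity: bool = False                                # patient-reported data
    pregnancy_periods: list[tuple[date, date]] = field(default_factory=list)
    has_amputation: bool = False


def _during_pregnancy(day: date, periods: list[tuple[date, date]]) -> bool:
    return any(start <= day <= end for start, end in periods)


def obesity_evidence(patient: PatientRecord) -> set[str]:
    """Return the names of the data sources that suggest obesity for this patient."""
    evidence = set()
    if OBESITY_CODES & patient.diagnosis_codes:
        evidence.add("diagnosis_code")
    # Skip height/weight-derived BMI after amputation, and ignore weights taken during pregnancy.
    if not patient.has_amputation:
        for m in patient.measurements:
            if (not _during_pregnancy(m.taken_on, patient.pregnancy_periods)
                    and m.weight_kg / m.height_m ** 2 >= 30.0):
                evidence.add("measured_bmi")
                break
    if patient.weight_management_medication:
        evidence.add("medication")
    if patient.self_reported_obesity:
        evidence.add("patient_reported")
    return evidence
```

How the evidence is combined (for example, any single source versus agreement between at least two) is a design choice that would itself need to be validated, and a source is only usable at sites that actually capture it.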

CHAPTER SECTIONS

  1. Introduction
  2. Definitions
  3. Finding Existing Phenotype Definitions
  4. Evaluating Phenotype Definitions
  5. Data Quality
  6. Using Phenotypes in PCTs—How Do I Get Started?

Resources

A User’s Guide to Computable Phenotypes
Master’s thesis providing a practical framework to help physicians, clinical researchers, and informaticians evaluate published phenotype algorithms for reuse for various purposes. The framework is divided into 3 phases aligned with expected user roles: overall assessment, clinical validation, and technical review.

ACKNOWLEDGMENTS

Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.

The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.


Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added a Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 22, 2020: Added the alt text attribute and corrected the caption for the Figure (changes made by D. Seils).

July 8, 2020: Updated links in the list of contributors; and made minor corrections to layout and formatting (changes made by D. Seils).

July 1, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Citation:

Richesson R, Wiley LK, Gold S, Rasmussen L; for the NIH Health Care Systems Research Collaboratory Electronic Health Records Core Working Group. Electronic Health Records–Based Phenotyping: Using Phenotypes in PCTs—How Do I Get Started? In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/electronic-health-records-based-phenotyping/using-phenotypes-in-pcts-how-do-i-get-started/. Updated December 3, 2025. DOI: 10.28929/148.
