Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials

Electronic Health Records–Based Phenotyping


Section 3

Finding Existing Phenotype Definitions

Contributors

Rachel L. Richesson, PhD, MPH
Laura K. Wiley, PhD
Sigfried Gold, MA, MFA
Luke Rasmussen, MS
For the NIH Pragmatic Trials Collaboratory Electronic Health Records Core Working Group
See the Acknowledgments for additional contributors.

Contributing Editors
Damon M. Seils, MA
Gina Uhlenbrauck

Several key groups are involved in establishing phenotype definitions. This section describes some authoritative sources.

Phenotype definitions may be developed by government entities, universities, healthcare systems, professional societies, or clinical trial consortia. Some research networks, such as PCORnet and OHDSI, are committed to a common data model, and phenotype definitions can be more easily shared if they use that model. (See the Inpatient Endpoints in Pragmatic Clinical Trials section in the Choosing and Specifying Endpoints and Outcomes chapter of the Living Textbook for more about the use of common data models in extracting data from electronic health records [EHRs].)
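
As a rough illustration of why a common data model eases sharing, the sketch below runs one phenotype query, unchanged, against two sites' databases. The table and column names (`diagnosis`, `patid`, `dx`) and the code pattern are simplified assumptions made for this example; they are not the actual PCORnet or OHDSI specifications.

```python
import sqlite3

# Hypothetical phenotype query, written once against the assumed shared schema.
PHENOTYPE_SQL = """
    SELECT DISTINCT patid
    FROM diagnosis
    WHERE dx LIKE 'E11%'   -- illustrative pattern: ICD-10-CM type 2 diabetes codes
    ORDER BY patid
"""

def make_site_db(rows):
    """Build an in-memory database that follows the assumed shared schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diagnosis (patid TEXT, dx TEXT)")
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", rows)
    return conn

def run_phenotype(conn):
    """Return the patient identifiers that satisfy the phenotype query."""
    return [row[0] for row in conn.execute(PHENOTYPE_SQL)]

# Two sites with different patients but the same table layout.
site_a = make_site_db([("A1", "E11.9"), ("A2", "I10"), ("A1", "E11.65")])
site_b = make_site_db([("B7", "J45.909"), ("B8", "E11.9")])

cohort_a = run_phenotype(site_a)
cohort_b = run_phenotype(site_b)
print(cohort_a, cohort_b)
```

Because both sites store diagnoses in the same structure, the definition travels as a single query rather than being rewritten for each local system.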

The NIH Pragmatic Trials Collaboratory recognizes that many related efforts exist and that this field changes quickly. It continually surveys phenotype-related work to keep this content in context and to avoid duplicating previous efforts.

Chronic Conditions Data Warehouse

The Centers for Medicare & Medicaid Services developed the Chronic Conditions Data Warehouse to enable research on 27 chronic conditions that the agency determined to be of particular importance to Medicare beneficiaries. This resource includes the algorithms that define the 27 chronic conditions, as well as links to the references used in the creation of the categories.
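
To give a sense of what a claims-based condition algorithm can look like, here is a minimal sketch of a rule that combines qualifying codes, care settings, and a lookback period. The code prefix, the thresholds, and the function name `meets_definition` are hypothetical and chosen only for illustration; this is not one of the warehouse's published algorithms.

```python
from datetime import date

# Hypothetical qualifying codes and thresholds (assumptions for illustration).
QUALIFYING_PREFIXES = ("E11",)
LOOKBACK_DAYS = 730
MIN_INPATIENT = 1
MIN_OUTPATIENT = 2

def meets_definition(claims, as_of):
    """Return True if claims within the lookback window satisfy the rule.

    claims: iterable of (service_date, setting, code) tuples, where setting
    is "inpatient" or "outpatient".
    """
    inpatient = outpatient = 0
    for service_date, setting, code in claims:
        age_days = (as_of - service_date).days
        if not (0 <= age_days <= LOOKBACK_DAYS):
            continue  # outside the lookback window
        if not code.startswith(QUALIFYING_PREFIXES):
            continue  # not a qualifying diagnosis code
        if setting == "inpatient":
            inpatient += 1
        elif setting == "outpatient":
            outpatient += 1
    # One inpatient claim or two outpatient claims satisfies this sketch rule.
    return inpatient >= MIN_INPATIENT or outpatient >= MIN_OUTPATIENT

claims = [
    (date(2021, 3, 1), "outpatient", "E11.9"),
    (date(2021, 9, 15), "outpatient", "E11.65"),
    (date(2018, 1, 10), "inpatient", "E11.9"),  # falls outside the window
]
print(meets_definition(claims, as_of=date(2022, 1, 1)))
```

Keeping the codes and thresholds as named constants makes it easier to compare a local implementation with the published definition it is meant to follow.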

Clinical Classifications Software

The Healthcare Cost and Utilization Project is a well-established collection of databases and tools sponsored by the Agency for Healthcare Research and Quality. This project has produced Clinical Classifications Software that groups codes from ICD-9-CM, ICD-10, and other coding systems into clinically meaningful categories.
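
The idea of grouping individual codes into broader clinical categories can be sketched as follows. The prefix-to-category mapping is a small hypothetical example, not the actual Clinical Classifications Software tables, and the prefix-matching rule is an assumption made to keep the example short.

```python
from collections import Counter

# Hypothetical mapping from 3-character code prefixes to categories
# (not the actual Clinical Classifications Software tables).
CATEGORY_BY_PREFIX = {
    "E10": "Diabetes mellitus",
    "E11": "Diabetes mellitus",
    "I10": "Hypertension",
    "J45": "Asthma",
}

def categorize(code):
    """Map a diagnosis code to a category using its 3-character prefix."""
    return CATEGORY_BY_PREFIX.get(code[:3], "Unclassified")

def summarize(codes):
    """Count diagnosis codes per clinical category."""
    return Counter(categorize(code) for code in codes)

codes = ["E11.9", "E10.9", "I10", "J45.909", "Z00.00"]
summary = summarize(codes)
print(summary)
```

Collapsing many specific codes into a few categories is what makes such groupings convenient for describing populations, though a real grouper would use the published mapping files rather than a hand-built dictionary.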

Electronic Medical Records and Genomics (eMERGE) Network

The Electronic Medical Records and Genomics (eMERGE) Network was organized by the National Human Genome Research Institute to connect EHR data with specimens from biorepositories to enable genetic research. The ultimate goal of the network is to provide genetic data for clinical care or personalized medicine. Equipped with genotyping data made available by the advent of the genome-wide association studies era, researchers are now turning to the expanding volume of clinical data in EHRs to identify genotype–phenotype associations. PheKB is a collaborative environment, organized via a website and facilitated by the eMERGE consortium, that enables access to validated phenotype definitions (“algorithms”), validation of existing phenotype algorithms on EHRs, collaboration on existing and new phenotype algorithms, and interaction with potential phenotype algorithm collaborators.

Sentinel Initiative

The Sentinel Initiative is a project sponsored by the US Food and Drug Administration (FDA) with the goal of creating a system of safety surveillance for drugs and medical devices after they have been approved for marketing (“postmarket surveillance”). The phenotyping efforts of the Sentinel Initiative include the accurate identification and characterization of clinical outcomes experienced by people who are using a specific FDA-regulated device or drug. Additional background information, methods, and protocols can be accessed on the Sentinel Initiative website.

QualityNet

QualityNet is an effort sponsored by the Centers for Medicare & Medicaid Services to improve the quality of healthcare for Medicare beneficiaries. It provides a secure environment for the exchange of healthcare information, as well as tools and quality improvement news and information. It also provides specifications for reporting quality measures; these specifications define clinical populations using the standardized coding systems found in healthcare claims data.

Strategic Health IT Advanced Research Projects (SHARP)

The Strategic Health IT Advanced Research Projects (SHARP) program was established by the Office of the National Coordinator for Health Information Technology (ONC) to facilitate research that would enable increased adoption of health information technology. Area 4 of the SHARP project, known as SHARPn, focused on enabling secondary use of EHR data. The SHARPn group developed the Phenotype Portal for “generating and executing Meaningful Use standards–based phenotyping algorithms that can be shared across multiple institutions and investigators.”

Value Set Authority Center (VSAC)

The Value Set Authority Center (VSAC) is a repository hosted by the National Library of Medicine in collaboration with the ONC and the Centers for Medicare & Medicaid Services. The VSAC provides access to official versions of all value sets contained in the Meaningful Use 2014 Clinical Quality Measures (CQMs). Each value set consists of the alphanumeric values (“codes”) and their respective human-readable names (“terms”). The value sets are derived from standard vocabularies, such as SNOMED CT, RxNorm, Logical Observation Identifiers Names and Codes (LOINC), and ICD-10-CM, which are used to define clinical concepts for quality assessment purposes. The VSAC has expanded to incorporate value sets for other use cases, as well as for new measures and updates to existing measures. The VSAC Data Element Catalog was previously used to provide 2014 CQMs and value set names, and has been replaced with more robust metadata available in the Binding Parameter Specification. Value sets are available for viewing or download after obtaining a free Unified Medical Language System Metathesaurus License (required due to usage restrictions on some of the codes included in the value sets).
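
A value set, as described above, pairs codes with human-readable terms. The following minimal sketch represents one as a small data structure and uses it to flag matching patient records. The value set contents, the record layout, and the names `Concept` and `find_matches` are hypothetical; this is not an actual VSAC value set or download format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    code_system: str  # vocabulary the code comes from
    code: str         # alphanumeric value ("code")
    term: str         # human-readable name ("term")

# Hypothetical value set for illustration; not an actual VSAC value set.
DIABETES_VALUE_SET = frozenset({
    Concept("ICD-10-CM", "E11.9", "Type 2 diabetes mellitus without complications"),
    Concept("ICD-10-CM", "E11.65", "Type 2 diabetes mellitus with hyperglycemia"),
})

def build_lookup(value_set):
    """Index a value set by (code system, code) for fast membership checks."""
    return {(c.code_system, c.code): c.term for c in value_set}

def find_matches(records, value_set):
    """Return (patient_id, term) for records whose code is in the value set."""
    lookup = build_lookup(value_set)
    return [
        (patient_id, lookup[(system, code)])
        for patient_id, system, code in records
        if (system, code) in lookup
    ]

records = [
    ("P1", "ICD-10-CM", "E11.9"),
    ("P2", "ICD-10-CM", "I10"),
    ("P3", "ICD-10-CM", "E11.65"),
]
matches = find_matches(records, DIABETES_VALUE_SET)
print(matches)
```

Keying the lookup on both the code system and the code avoids accidental matches when the same string appears in more than one vocabulary.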

CHAPTER SECTIONS

  1. Introduction
  2. Definitions
  3. Finding Existing Phenotype Definitions
  4. Evaluating Phenotype Definitions
  5. Data Quality
  6. Using Phenotypes in PCTs—How Do I Get Started?

Resources

A User’s Guide to Computable Phenotypes
Master’s thesis providing a practical framework to help physicians, clinical researchers, and informaticians evaluate published phenotype algorithms for reuse for various purposes. The framework is divided into 3 phases aligned with expected user roles: overall assessment, clinical validation, and technical review.

Suggestions for Identifying Phenotype Definitions Used in Published Research
Guidance document from the NIH Collaboratory's Electronic Health Records Core Working Group providing suggestions for searching for phenotype definitions in the peer-reviewed literature.

Phenotype KnowledgeBase (PheKB)
Platform that enables access to validated phenotype definitions, validation of existing phenotype algorithms in electronic health records, collaboration on existing and new phenotype algorithms, and interaction with potential phenotype algorithm collaborators.

ACKNOWLEDGMENTS

Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.

The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.


Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added an item to the Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

May 13, 2021: Added PheKB to the Resources sidebar (change made by D. Seils).

July 8, 2020: Updated links in the list of contributors (changes made by D. Seils).

July 1, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Citation:

Richesson R, Wiley LK, Gold S, Rasmussen L; for the NIH Health Care Systems Research Collaboratory Electronic Health Records Core Working Group. Electronic Health Records–Based Phenotyping: Finding Existing Phenotype Definitions. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/electronic-health-records-based-phenotyping/finding-existing-phenotype-definitions/. Updated December 3, 2025. DOI: 10.28929/145.
