Electronic Health Records–Based Phenotyping

Section 6 Using Phenotypes in PCTs—How Do I Get Started?

Before developing any phenotype definition, researchers should search for existing phenotype definitions and consider how they performed in validation testing. They should then assess the candidate definitions for feasibility in their particular settings (for example, by determining whether the data domains available locally match those required by the authoritative source phenotype definition). If no suitable phenotype definition can be found from authoritative sources, a new definition must be developed. In either case, once a candidate phenotype definition is identified, it must be validated against a gold standard in clinical populations, as shown in the figure below.

Figure. Phenotype Evaluation Process

The Figure is a flow diagram of the phenotype evaluation process. Abbreviations: AHRQ, Agency for Healthcare Research and Quality; CMS, Centers for Medicare & Medicaid Services. Adapted with permission from Shelley Rusincovitch, Center for Predictive Medicine, Duke Clinical Research Institute.
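Validation against a gold standard is commonly summarized with sensitivity, specificity, and predictive values computed from a 2x2 table. The sketch below is one minimal way to do that in Python. It assumes the gold standard is a manual chart review of a sample of patients. The class name, the function name, and the counts are hypothetical and are not part of the evaluation process described in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """2x2 counts comparing phenotype output with gold-standard review."""
    true_pos: int   # phenotype positive, gold standard positive
    false_pos: int  # phenotype positive, gold standard negative
    false_neg: int  # phenotype negative, gold standard positive
    true_neg: int   # phenotype negative, gold standard negative


def validation_metrics(c: ValidationCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV for a 2x2 table."""
    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a margin of the table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.true_pos, c.true_pos + c.false_neg),
        "specificity": ratio(c.true_neg, c.true_neg + c.false_pos),
        "ppv": ratio(c.true_pos, c.true_pos + c.false_pos),
        "npv": ratio(c.true_neg, c.true_neg + c.false_neg),
    }


# Made-up counts, for illustration only.
counts = ValidationCounts(true_pos=80, false_pos=20, false_neg=10, true_neg=890)
metrics = validation_metrics(counts)
print(metrics)
```

Which metric matters most depends on how the phenotype will be used. A definition used to select a trial cohort may favor positive predictive value, whereas one used for surveillance may favor sensitivity.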

If a new phenotype definition is needed, the researchers must first operationalize a disease concept against electronic health record (EHR) data. The researchers must explicitly define how a concept should be measured, observed, or manipulated within a particular study and available data sources. A theoretical or conceptual variable of interest (such as a disease) must be translated into a set of specific diagnoses or procedures paired with implementation specifications that define the variable's meaning in a specific study. In the context of healthcare data, this means explicitly defining diagnoses, treatments, and clinical and patient characteristics that are indicative or suggestive of the condition. The researchers must specify the clinical condition they are looking for and how the condition would be represented in various EHRs.

For example, to identify obesity, the researchers would first identify diagnostic and procedure codes for the condition and investigate whether the codes are reliable and are applied consistently. If the researchers cannot reasonably assume that all patients with obesity would be coded with a given diagnosis or procedure code, they must use other data sources.
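The code-based starting point for the obesity example can be sketched as a simple lookup. The sketch assumes diagnoses are stored as dotted ICD-10-CM strings and uses category E66 (overweight and obesity) minus E66.3 (overweight) as an illustrative code list. This list is an assumption made for the example, not a validated value set, and the function and variable names are hypothetical.

```python
# Illustrative only: ICD-10-CM category E66 covers overweight and obesity.
# A real study would use a curated, validated value set instead.
OBESITY_CODE_PREFIXES = ("E66",)
EXCLUDED_CODES = {"E66.3"}  # E66.3 denotes overweight rather than obesity


def has_obesity_code(diagnosis_codes):
    """Return True if any diagnosis code falls in the illustrative obesity set."""
    for code in diagnosis_codes:
        normalized = code.strip().upper()
        if normalized in EXCLUDED_CODES:
            continue
        if normalized.startswith(OBESITY_CODE_PREFIXES):
            return True
    return False


# Made-up patients, for illustration only.
patients = {
    "p1": ["E66.9", "I10"],  # obesity, unspecified; essential hypertension
    "p2": ["E66.3"],         # overweight only
    "p3": ["I10"],           # no weight-related code
}
flagged = {pid for pid, codes in patients.items() if has_obesity_code(codes)}
print(sorted(flagged))
```

A rule like this finds only patients whose obesity was actually coded, which is why the reliability and consistency of coding must be investigated before it is relied on.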

The next step is to review the available data sources (such as EHR data, claims data, registry data, and patient-reported outcomes data). If a phenotype definition is to be applied in multiple organizations, the researchers must consider the data sources that are available in other organizations. Possible data sources for obesity might include patient height and weight, the ordering or dispensing of medications associated with weight management, or patient-reported data on weight or a previous diagnosis of obesity. It is also important to consider other factors that may affect these measurements (such as the effect of pregnancy on weight, or the effect of amputation on height). Within each data type, the researchers should identify which data are available to them (for example, some EHR data include medication orders but not administration data, or billing diagnoses rather than problem lists). Knowing the types of data available can support an early feasibility assessment of existing phenotype definitions.
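The two ideas in this paragraph, checking which data domains a site can supply and deriving a criterion from height and weight while accounting for confounding factors, can be sketched as follows. The domain names, the `Measurement` fields, and the function names are hypothetical. The body mass index (BMI) threshold of 30 kg/m² is the conventional adult cutoff for obesity and is an assumption here, not something this chapter specifies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical domains that a phenotype definition might require.
REQUIRED_DOMAINS = {"diagnoses", "vitals"}


def missing_domains(site_domains):
    """Return the required domains a site cannot supply, sorted by name."""
    return sorted(REQUIRED_DOMAINS - set(site_domains))


@dataclass
class Measurement:
    height_m: Optional[float]
    weight_kg: Optional[float]
    pregnant: bool = False    # weight is not interpretable as usual
    amputation: bool = False  # height or weight may not reflect body composition


def bmi_indicates_obesity(m: Measurement, threshold: float = 30.0) -> Optional[bool]:
    """Return True or False when BMI is usable, or None when it is not."""
    if m.pregnant or m.amputation:
        return None  # defer to other data sources
    if not m.height_m or not m.weight_kg:
        return None  # missing or zero measurements
    bmi = m.weight_kg / (m.height_m ** 2)
    return bmi >= threshold


# A site with diagnoses and medication orders but no vitals cannot run this criterion.
print(missing_domains({"diagnoses", "medication_orders"}))
print(bmi_indicates_obesity(Measurement(height_m=1.70, weight_kg=95.0)))
```

Returning `None` instead of `False` keeps "not obese" separate from "cannot be determined from these data," which matters when results from several data sources are combined.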

Resources

A User’s Guide to Computable Phenotypes
Master’s thesis providing a practical framework to help physicians, clinical researchers, and informaticians evaluate published phenotype algorithms for reuse for various purposes. The framework is divided into 3 phases aligned with expected user roles: overall assessment, clinical validation, and technical review.

ACKNOWLEDGMENTS

Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.

The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.

Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added a Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 22, 2020: Added the alt text attribute and corrected the caption for the Figure (changes made by D. Seils).

July 8, 2020: Updated links in the list of contributors and made minor corrections to layout and formatting (changes made by D. Seils).

July 1, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020
