Electronic Health Records–Based Phenotyping
Section 3
Finding Existing Phenotype Definitions
Several key groups are involved in establishing phenotype definitions. This section describes some authoritative sources.
Phenotype definitions may be developed by government entities, universities, healthcare systems, professional societies, or clinical trial consortia. Some research networks, such as PCORnet and OHDSI, are committed to a common data model, and phenotype definitions can be more easily shared if they use that model. (See the Inpatient Endpoints in Pragmatic Clinical Trials section in the Choosing and Specifying Endpoints and Outcomes chapter of the Living Textbook for more about the use of common data models in extracting data from electronic health records [EHRs].)
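To illustrate why a common data model makes phenotype definitions easier to share, the following minimal sketch maps two sites' local records into one shared row layout and then runs a single phenotype definition over both. All field names (`mrn`, `dx`, `patient`, `vocab`, `diag_code`, `person_id`) and the row layout are hypothetical; they are not the actual PCORnet or OHDSI table or column names.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical shared row layout: person_id, code_system, code.
CommonRow = Dict[str, str]


def site_a_to_common(row: Dict[str, str]) -> CommonRow:
    """Site A (hypothetical) stores only ICD-10-CM diagnoses, keyed by 'mrn'."""
    return {"person_id": row["mrn"], "code_system": "ICD-10-CM", "code": row["dx"]}


def site_b_to_common(row: Dict[str, str]) -> CommonRow:
    """Site B (hypothetical) stores the vocabulary name alongside each code."""
    return {
        "person_id": row["patient"],
        "code_system": row["vocab"],
        "code": row["diag_code"],
    }


def hypertension_phenotype(rows: Iterable[CommonRow]) -> Set[str]:
    """Toy definition: anyone with ICD-10-CM code I10 (essential hypertension)."""
    return {
        r["person_id"]
        for r in rows
        if r["code_system"] == "ICD-10-CM" and r["code"] == "I10"
    }


# Local extracts in each site's own (hypothetical) format.
site_a_rows: List[Dict[str, str]] = [
    {"mrn": "A1", "dx": "I10"},
    {"mrn": "A2", "dx": "E11.9"},
]
site_b_rows: List[Dict[str, str]] = [
    {"patient": "B1", "vocab": "ICD-10-CM", "diag_code": "I10"},
    {"patient": "B2", "vocab": "ICD-9-CM", "diag_code": "401.9"},
]

# After mapping, the same definition runs unchanged on both sites' data.
cases_a = hypertension_phenotype(site_a_to_common(r) for r in site_a_rows)
cases_b = hypertension_phenotype(site_b_to_common(r) for r in site_b_rows)
```

In this sketch, only the two mapping functions are site specific. The phenotype definition itself is written once against the shared layout, which is the property that makes it portable.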
The NIH Pragmatic Trials Collaboratory recognizes that this field is dynamic and that many related efforts are under way. It continually surveys phenotype-related work to keep its own efforts in context and to avoid duplicating what has already been done.
Chronic Conditions Data Warehouse
The Centers for Medicare & Medicaid Services developed the Chronic Conditions Data Warehouse to enable research on 27 chronic conditions that it determined to be of particular importance to Medicare beneficiaries. This resource includes the algorithms that define the 27 chronic conditions, as well as links to the references used in the creation of the categories.
Clinical Classifications Software
The Healthcare Cost and Utilization Project is a well-established collection of databases and tools sponsored by the Agency for Healthcare Research and Quality. This project has produced Clinical Classifications Software that groups ICD-9-CM codes, ICD-10 codes, and codes from other systems into clinically meaningful categories.
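The idea of grouping codes from different coding systems into clinically meaningful categories can be sketched as a simple lookup. The mapping below is illustrative only: the category labels are hypothetical and are not the actual Clinical Classifications Software categories, and the function names are invented for this example. The four diagnosis codes themselves are standard ICD-9-CM and ICD-10-CM codes.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CodeKey = Tuple[str, str]  # (code system, code)

# Illustrative grouper: codes from two systems roll up to one category.
# Category labels are hypothetical, not real CCS category names.
ILLUSTRATIVE_GROUPER: Dict[CodeKey, str] = {
    ("ICD-9-CM", "250.00"): "Diabetes mellitus",
    ("ICD-10-CM", "E11.9"): "Diabetes mellitus",
    ("ICD-9-CM", "401.9"): "Hypertension",
    ("ICD-10-CM", "I10"): "Hypertension",
}


def categorize(code_system: str, code: str) -> str:
    """Return the category for a code, or 'Unclassified' if it is not mapped."""
    return ILLUSTRATIVE_GROUPER.get((code_system, code), "Unclassified")


def count_by_category(diagnoses: Iterable[CodeKey]) -> Counter:
    """Count diagnoses per category, regardless of the source code system."""
    return Counter(categorize(system, code) for system, code in diagnoses)


diagnoses = [
    ("ICD-9-CM", "250.00"),
    ("ICD-10-CM", "E11.9"),
    ("ICD-10-CM", "I10"),
    ("ICD-10-CM", "Z00.00"),  # not in the illustrative grouper
]
category_counts = count_by_category(diagnoses)
```

Because the lookup key includes the code system, records coded before and after a transition from ICD-9-CM to ICD-10-CM land in the same category in this sketch.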
Electronic Medical Records and Genomics (eMERGE) Network
The Electronic Medical Records and Genomics (eMERGE) Network was organized by the National Human Genome Research Institute to connect EHR data with specimens from biorepositories to enable genetic research. The ultimate goal of the network is to provide genetic data for clinical care or personalized medicine. Equipped with genotyping data made available in the era of genome-wide association studies, researchers are now turning to the expanding volume of clinical data in EHRs to identify genotype–phenotype associations. PheKB is a collaborative environment, organized via a website and facilitated by the eMERGE consortium. It enables access to validated phenotype definitions (“algorithms”), validation of existing phenotype algorithms on EHRs, collaboration on existing and new phenotype algorithms, and interaction with potential phenotype algorithm collaborators.
Sentinel Initiative
The Sentinel Initiative is a project sponsored by the US Food and Drug Administration with the goal of creating a system of safety surveillance for drugs and medical devices after they have been approved for marketing (“postmarket surveillance”). The phenotyping efforts of the Sentinel Initiative include the accurate identification and characterization of clinical outcomes experienced by people who are using a specific FDA-regulated device or drug. Additional background information, methods, and protocols can be accessed on the Sentinel Initiative website.
QualityNet
QualityNet is an effort sponsored by the Centers for Medicare & Medicaid Services to improve the quality of healthcare for Medicare beneficiaries. It provides a secure environment for the exchange of healthcare information, as well as tools and quality improvement news and information. It also provides specifications for reporting quality measures; these specifications define clinical populations using the standardized coding systems found in healthcare claims data.
Strategic Health IT Advanced Research Projects (SHARP)
The Strategic Health IT Advanced Research Projects (SHARP) program was established by the Office of the National Coordinator for Health Information Technology (ONC) to facilitate research that would enable increased adoption of health information technology. Area 4 of the SHARP project, known as SHARPn, focused on enabling secondary use of EHR data. The SHARPn group developed the Phenotype Portal for “generating and executing Meaningful Use standards–based phenotyping algorithms that can be shared across multiple institutions and investigators.”
Value Set Authority Center (VSAC)
The Value Set Authority Center (VSAC) is a repository hosted by the National Library of Medicine in collaboration with the ONC and the Centers for Medicare & Medicaid Services. The VSAC provides access to official versions of all value sets contained in the Meaningful Use 2014 Clinical Quality Measures (CQMs). Each value set consists of the alphanumeric values (“codes”) and their respective human-readable names (“terms”). The value sets are derived from standard vocabularies, such as SNOMED CT, RxNorm, Logical Observation Identifiers Names and Codes (LOINC), and ICD-10-CM, which are used to define clinical concepts for quality assessment purposes. The VSAC has expanded to incorporate value sets for other use cases, as well as for new measures and updates to existing measures. The VSAC Data Element Catalog was previously used to provide 2014 CQMs and value set names, and has been replaced with more robust metadata available in the Binding Parameter Specification. Value sets are available for viewing or download after obtaining a free Unified Medical Language System Metathesaurus License (required due to usage restrictions on some of the codes included in the value sets).
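As a rough illustration of the structure described above, a value set can be modeled as a named collection of code and term pairs drawn from one or more standard vocabularies. The sketch below is an assumption for teaching purposes only: the class names, the helper function, and the example value set are hypothetical, and the example is not an official VSAC value set or the VSAC data format. The three codes shown are standard ICD-10-CM and SNOMED CT codes for type 2 diabetes.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Concept:
    code_system: str  # vocabulary the code comes from
    code: str         # alphanumeric value ("code")
    term: str         # human-readable name ("term")


@dataclass(frozen=True)
class ValueSet:
    name: str
    concepts: Tuple[Concept, ...]

    def contains(self, code_system: str, code: str) -> bool:
        """True if the (code system, code) pair is a member of this value set."""
        return any(
            c.code_system == code_system and c.code == code for c in self.concepts
        )


# Illustrative value set; NOT an official VSAC value set.
ILLUSTRATIVE_T2DM = ValueSet(
    name="Type 2 diabetes (illustrative only)",
    concepts=(
        Concept("ICD-10-CM", "E11.9", "Type 2 diabetes mellitus without complications"),
        Concept("ICD-10-CM", "E11.65", "Type 2 diabetes mellitus with hyperglycemia"),
        Concept("SNOMED CT", "44054006", "Diabetes mellitus type 2 (disorder)"),
    ),
)


def patients_in_value_set(
    records: Iterable[Tuple[str, str, str]], value_set: ValueSet
) -> Set[str]:
    """Return patient IDs with at least one record whose code is in the value set.

    Each record is a (patient_id, code_system, code) tuple.
    """
    return {
        patient_id
        for patient_id, code_system, code in records
        if value_set.contains(code_system, code)
    }


records = [
    ("p1", "ICD-10-CM", "E11.9"),
    ("p2", "ICD-10-CM", "I10"),
    ("p3", "SNOMED CT", "44054006"),
]
matched = patients_in_value_set(records, ILLUSTRATIVE_T2DM)
```

Keeping the code system alongside each code matters because the same string could in principle appear in more than one vocabulary, so membership is tested on the pair rather than on the code alone.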
Resources
A User’s Guide to Computable Phenotypes
Master’s thesis providing a practical framework to help physicians, clinical researchers, and informaticians evaluate published phenotype algorithms for reuse for various purposes. The framework is divided into 3 phases aligned with expected user roles: overall assessment, clinical validation, and technical review.
Suggestions for Identifying Phenotype Definitions Used in Published Research
Guidance document from the NIH Collaboratory's Electronic Health Records Core Working Group providing suggestions for searching for phenotype definitions in the peer-reviewed literature.
Phenotype KnowledgeBase (PheKB)
Platform that enables access to validated phenotype definitions, validation of existing phenotype algorithms in electronic health records, collaboration on existing and new phenotype algorithms, and interaction with potential phenotype algorithm collaborators.
ACKNOWLEDGMENTS
Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.
The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.