Electronic Health Records–Based Phenotyping

Section 4

Evaluating Phenotype Definitions

What makes a "good" phenotype definition?

Computable phenotype definitions should be explicit, reproducible, reliable, and valid. Details of the components of a phenotype definition (such as data elements and value sets) should be provided and should be sufficient to allow the query to be reproduced in another system or by another data operator. For a phenotype definition to be reliable, it must produce a similar result with the same dataset every time it is applied. For a phenotype definition to be valid, it must identify the condition for which it was developed and meet the desired degrees of sensitivity and specificity.

Various metrics are used to measure the performance of a phenotype definition in different data sources or populations, analogous to measuring the performance of a case definition or diagnostic test. These metrics include sensitivity, specificity, positive predictive value, and negative predictive value.

Moreover, to become consistently used, computable phenotype definitions must leverage data that are routinely collected in most, if not all, electronic health records (EHRs) and/or ancillary data sources.

How can the validity of a phenotype definition be determined?

The validity of a phenotype definition is its ability to correctly distinguish people with the condition of interest from those without it; that is, its ability to correctly identify which individuals exhibit the true phenotype and which do not.

Estimation of validity requires a gold standard, defined as the best available classification for assessing the true phenotype status. Establishing a gold standard is a resource-intensive process that requires careful manual review of current and historical individual patient data. Owing to logistical and efficiency considerations, multiple clinical reviewers are usually involved in the process; however, to ensure consistency in the conclusions drawn from patient records, initial training of the reviewers is crucial. Most studies use expert clinicians to review identified cases but do not specify the training of the reviewers or the details of their assessment of true disease or case status.

Many phenotype developers have conducted validation studies (Newton et al 2013; Peissig et al 2012; Rosenman et al 2014), but none appear to have used a controlled approach. Some investigators attempt to characterize the validity of a phenotype definition by using agreement rates between the phenotype definition and a known standard, while other investigators report the sensitivity or specificity of the phenotype definition compared with a known or gold standard. In this context, sensitivity is the ability to correctly identify individuals who have the phenotype, and specificity is the ability to correctly identify those who do not have the phenotype.

Positive predictive value is the proportion of individuals identified by the phenotype definition who truly have the condition. Negative predictive value is the proportion of individuals not identified by the phenotype definition who truly do not have the condition. Both positive and negative predictive values are indicators of the success rate of a phenotype definition when it is used in practice. Like sensitivity and specificity, positive and negative predictive values require knowledge of the true phenotype status; they can also be estimated from the sensitivity, specificity, and prevalence of the condition in the population being examined.
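These relationships can be sketched in code. The Python sketch below computes the four metrics from a hypothetical chart-review 2x2 table (all counts are invented for illustration) and shows how positive predictive value can alternatively be estimated from sensitivity, specificity, and prevalence.

```python
def phenotype_metrics(tp, fp, fn, tn):
    """Validation metrics from a 2x2 table comparing a phenotype
    definition against a gold standard."""
    sensitivity = tp / (tp + fn)  # phenotype-positive among those with the condition
    specificity = tn / (tn + fp)  # phenotype-negative among those without it
    ppv = tp / (tp + fp)          # condition truly present among phenotype-positives
    npv = tn / (tn + fn)          # condition truly absent among phenotype-negatives
    return sensitivity, specificity, ppv, npv

def ppv_from_prevalence(sens, spec, prev):
    """Estimate PPV from sensitivity, specificity, and prevalence
    (Bayes' rule), useful when only summary figures are available."""
    return (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

# Hypothetical chart-review counts: 90 TP, 10 FP, 30 FN, 870 TN (n = 1,000)
sens, spec, ppv, npv = phenotype_metrics(90, 10, 30, 870)
```

With these counts the prevalence is 120/1,000, and the prevalence-based estimate agrees with the direct calculation from the table.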

Researchers at Duke University’s Center for Predictive Medicine are developing and testing methods to quantify the validity and reliability of certain computable phenotype definitions. (See “Practical Development and Implementation of EHR Phenotypes”; NIH Collaboratory Grand Rounds; November 15, 2013.)

Identifying a gold standard, and ultimately a "source of truth," is a critical complicating factor in assessing data quality in EHRs. For conditions in which lab values are diagnostic, a lab value can serve as the gold standard, though the clinical context is critical in many cases. For behavioral or mental health conditions, the gold standard or best approximation of "truth" is often the patient's own report or an observation by an expert clinician. For many diseases with complex etiologies, subjective diagnoses, or broad ranges of clinical presentations, the best source of "truth" is unclear, and a variety of data sources will likely be needed to determine a patient's true disease state or to identify the condition. Because of these challenges, recent efforts have turned to "silver standard" definitions to produce more cost-effective validation sets without the need for extensive record review (Swerdel, Hripcsak, and Ryan 2019; Wagholikar et al 2020).

How can the reliability and reproducibility of a phenotype definition be determined?

"Reliability" is defined as the extent to which an experiment, test, or measuring procedure—or phenotype definition—yields the same results in repeated trials. Reliability is an attribute of any computer-related component (such as software, hardware, or a network) that consistently performs according to its specifications. One way to assess reliability is to implement the phenotype definition algorithm multiple times and observe whether the results on the same patients are the same over repeated implementations.

In contrast, "reproducibility" is the consistency of results when the algorithm is implemented multiple times under similar conditions. To assess reliability, the analyst repeatedly runs the same implementation on the same set of patients and checks whether the phenotype results match across runs. To assess reproducibility, the algorithm is implemented by different "coders," on either the same or different patient populations.
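The reliability check can be made concrete with a small Python sketch. The phenotype rule, field names, codes, and threshold below are all hypothetical; the point is that repeated runs of one implementation on the same patients should produce identical positive sets.

```python
def diabetes_phenotype(patient):
    """Hypothetical phenotype rule: a qualifying diagnosis code OR an
    HbA1c value of 6.5% or higher (codes and fields are illustrative)."""
    has_code = bool(set(patient.get("dx_codes", [])) & {"E11.9", "E11.65"})
    has_lab = any(v >= 6.5 for v in patient.get("hba1c", []))
    return has_code or has_lab

def is_reliable(phenotype, patients, runs=3):
    """Reliability check: repeated runs on the same patients must yield
    the same set of phenotype-positive patient IDs."""
    results = [{p["id"] for p in patients if phenotype(p)} for _ in range(runs)]
    return all(r == results[0] for r in results)

patients = [
    {"id": 1, "dx_codes": ["E11.9"], "hba1c": []},
    {"id": 2, "dx_codes": [], "hba1c": [5.4]},
    {"id": 3, "dx_codes": [], "hba1c": [7.1]},
]
```

A deterministic rule like this one passes trivially; the check becomes informative when the algorithm involves sampling, external lookups, or changing source data. A reproducibility check would instead compare the positive sets produced by two independently coded implementations of the same written definition.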

Ultimately, what is required is an unequivocal algorithm that is implemented without room for confusion. For most diseases (especially those with a subjective diagnosis or broad range of clinical presentations), a variety of data sources must be included in the phenotype definition. The more complex the phenotype definition, the more difficult it can be to reproduce and the more likely errors will influence the reliability of the algorithm (Richesson et al 2013).

Several well-known issues can affect reliability, including changes in coding terminology over time and variations in coding practices at the provider, healthcare system, and regional levels. An active area of research involves studying data quality and testing various phenotype definitions in different settings or time periods to represent variations in data quality.

How can the reproducibility of a phenotype definition be optimized?

Careful attention to 2 features of phenotype definition development can enhance the likelihood that a phenotype definition will be applied consistently: clearly articulated specifications for the definition, and guidance for implementers. Development of meaningful specifications and documentation is complicated by variation in healthcare information systems and lack of data standards for EHR data.

Ideally, a phenotype definition should be reproducible across institutions, but many factors can affect reproducibility, including regional differences in patient populations, differences in EHR systems, variations in the work flows that generate the data, and variations in coding practices. The process of implementing a phenotype definition at multiple institutions can result in a more robust definition that accounts or adjusts for localization of the data.

What are potential limitations of EHR data and computable phenotypes?

Data contained in EHRs and ancillary data sources are generated through the provision of clinical care. The data are not optimized for secondary uses, and using the data for research purposes has multiple limitations (Bayley et al 2013).

Missing Data

Because EHR data are derived from patient encounters with providers or healthcare systems, data are only recorded during healthcare episodes. This fact can result in bias, because healthier individuals are missing from the dataset. "Missingness" is a common problem and is often nonrandom, a challenge known as "informative censoring" (National Research Council 2010; Shih 2002). Patients are also lost to follow-up if they move out of the area or obtain care from a provider in a different healthcare system. Therefore, in pragmatic clinical trials, it is important to distinguish between "not present" in the dataset and "did not assess."
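One lightweight way to preserve that distinction downstream is to represent the two cases differently, as in this Python sketch (the field name and sentinel approach are illustrative assumptions, not a standard):

```python
NOT_ASSESSED = object()  # sentinel: the item was never measured or asked

def get_assessment(encounter, field):
    """Return the recorded value if the field was captured, or the
    NOT_ASSESSED sentinel if it was never assessed at this encounter."""
    return encounter.get(field, NOT_ASSESSED)

visit_with_answer = {"smoking_status": "never"}
visit_without_answer = {}  # the question was not asked at this encounter
```

Keeping "did not assess" distinct from a recorded negative finding lets analysts treat the two appropriately, rather than silently conflating absence of evidence with evidence of absence.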

Inaccurate or Uninterpretable Data

Errors are common in data from EHRs and ancillary data sources, because most data are entered by busy healthcare providers during a patient visit or from recall after the visit. Phenotype definitions based on coding that is influenced by billing are susceptible to systematic biases. In addition, data may be uninterpretable if, for example, units of measurement are missing or if analyzable information cannot be gleaned from qualitative assessments.

Complex and Inconsistent Data

Clinical definitions, coding rules, and data collection systems vary over time. Data collection practices can also vary among providers at different locations. Finally, much information is still captured as unstructured data and stored in narrative notes. Though extracting unstructured data poses many challenges, these data are increasingly being used to support various types of clinical decision making and research through an evolving set of tools (Nadkarni, Ohno-Machado, and Chapman 2011).


Resources


Practical Development and Implementation of EHR Phenotypes
NIH Collaboratory Grand Rounds; November 15, 2013

A User’s Guide to Computable Phenotypes
Master’s thesis providing a practical framework to help physicians, clinical researchers, and informaticians evaluate published phenotype algorithms for reuse for various purposes. The framework is divided into 3 phases aligned with expected user roles: overall assessment, clinical validation, and technical review.

REFERENCES


Bayley KB, Belnap T, Savitz L, et al. 2013. Challenges in using electronic health record data for CER: experience of 4 learning organizations and solutions applied. Med Care. 51:S80-86. doi:10.1097/MLR.0b013e31829b1d48. PMID: 23774512.

Nadkarni PM, Ohno-Machado L, Chapman WW. 2011. Natural language processing: an introduction. J Am Med Inform Assoc. 18:544-551. doi:10.1136/amiajnl-2011-000464. PMID: 21846786.

National Research Council. 2010. The Prevention and Treatment of Missing Data in Clinical Trials. Washington, DC: National Academies Press.

Newton KM, Peissig PL, Kho AN, et al. 2013. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc. 20:e147-154. doi:10.1136/amiajnl-2012-000896. PMID: 23531748.

Peissig PL, Rasmussen LV, Berg RL, et al. 2012. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. J Am Med Inform Assoc. 19:225-234. doi:10.1136/amiajnl-2011-000456. PMID: 22319176.

Richesson RL, Rusincovitch SA, Wixted D, et al. 2013. A comparison of phenotype definitions for diabetes mellitus. J Am Med Inform Assoc. 20:e319-e326. doi:10.1136/amiajnl-2013-001952. PMID: 24026307.

Rosenman M, He J, Martin J, et al. 2014. Database queries for hospitalizations for acute congestive heart failure: flexible methods and validation based on set theory. J Am Med Inform Assoc. 21:345-352. doi:10.1136/amiajnl-2013-001942. PMID: 24113802.

Swerdel JN, Hripcsak G, Ryan PB. 2019. PheValuator: development and evaluation of a phenotype algorithm evaluator. J Biomed Inform. 97:103258. doi:10.1016/j.jbi.2019.103258. PMID: 31369862.

Wagholikar KB, Estiri H, Murphy M, Murphy SN. 2020. Polar labeling: silver standard algorithm for training disease classifiers. Bioinformatics. 36(10):3200-3206. doi:10.1093/bioinformatics/btaa088. PMID: 32049335.

ACKNOWLEDGMENTS


Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.

The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.


Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added an item to the Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 8, 2020: Updated links in the list of contributors; and made nonsubstantive corrections to the text (changes made by D. Seils).

July 1, 2020: Addition of Resources sidebar; and minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Electronic Health Records–Based Phenotyping

Section 3

Finding Existing Phenotype Definitions

Several key groups are involved in establishing phenotype definitions. This section describes some authoritative sources.

Phenotype definitions may be developed by government entities, universities, healthcare systems, professional societies, or clinical trial consortia. Some research networks, such as PCORnet and OHDSI, are committed to a common data model, and phenotype definitions can be more easily shared if they use that model. (See the Inpatient Endpoints in Pragmatic Clinical Trials section in the Choosing and Specifying Endpoints and Outcomes chapter of the Living Textbook for more about the use of common data models in extracting data from electronic health records [EHRs].)

The NIH Pragmatic Trials Collaboratory is aware of the many related efforts and the dynamic nature of this field, and it continually surveys phenotype-related efforts to keep this work in context and avoid duplicating previous work.

Chronic Conditions Data Warehouse

The Centers for Medicare & Medicaid Services developed the Chronic Conditions Data Warehouse to enable research on 27 chronic conditions that they determined to be of particular importance to Medicare beneficiaries. This resource includes the algorithms that define the 27 chronic conditions, as well as links to the references used in the creation of the categories.

Clinical Classifications Software

The Healthcare Cost and Utilization Project is a well-established collection of databases and tools sponsored by the Agency for Healthcare Research and Quality. This project has produced Clinical Classifications Software that groups ICD-9-CM codes, ICD-10 codes, and other systems into clinically meaningful categories.

Electronic Medical Records and Genomics (eMERGE) Network

The Electronic Medical Records and Genomics (eMERGE) Network was organized by the National Human Genome Research Institute to connect EHR data with specimens from biorepositories to enable genetic research. The ultimate goal of the network is to provide genetic data for clinical care or personalized medicine. Equipped with genotyping data made available by the advent of the genome-wide association studies era, researchers are now turning to the expanding volume of clinical data in EHRs to identify genotype–phenotype associations. PheKB is a collaborative environment, organized via a website and facilitated by the eMERGE consortium, that enables access to validated phenotype definitions (“algorithms”), validation of existing phenotype algorithms on EHRs, collaboration on existing and new phenotype algorithms, and interaction with potential phenotype algorithm collaborators.

Sentinel Initiative

The Sentinel Initiative is a project sponsored by the US Food and Drug Administration with the goal of creating a system of safety surveillance for drugs and medical devices after they have been approved for marketing (“postmarket surveillance”). The phenotyping efforts of the Sentinel Initiative include the accurate identification and characterization of clinical outcomes experienced by people who are using a specific FDA-regulated device or drug. Additional background information, methods, and protocols can be accessed on the Sentinel Initiative website.

QualityNet

QualityNet is an effort sponsored by the Centers for Medicare & Medicaid Services to improve the quality of healthcare for Medicare beneficiaries. QualityNet provides a secure environment for the exchange of healthcare information, as well as tools and quality improvement news and information. QualityNet provides specifications for reporting quality measures that include definitions of clinical populations using standardized coding systems used in healthcare claims data.

Strategic Health IT Advanced Research Projects (SHARP)

The Strategic Health IT Advanced Research Projects (SHARP) program was established by the Office of the National Coordinator for Health Information Technology (ONC) to facilitate research that would enable increased adoption of health information technology. Area 4 of the SHARP project, known as SHARPn, focused on enabling secondary use of EHR data. The SHARPn group developed the Phenotype Portal for “generating and executing Meaningful Use standards–based phenotyping algorithms that can be shared across multiple institutions and investigators.”

Value Set Authority Center (VSAC)

The Value Set Authority Center (VSAC) is a repository hosted by the National Library of Medicine in collaboration with the ONC and the Centers for Medicare & Medicaid Services. The VSAC provides access to official versions of all value sets contained in the Meaningful Use 2014 Clinical Quality Measures (CQMs). Each value set consists of the alphanumeric values (“codes”) and their respective human-readable names (“terms”). The value sets are derived from standard vocabularies, such as SNOMED CT, RxNorm, Logical Observation Identifiers Names and Codes (LOINC), and ICD-10-CM, which are used to define clinical concepts for quality assessment purposes. The VSAC has expanded to incorporate value sets for other use cases, as well as for new measures and updates to existing measures. The VSAC Data Element Catalog was previously used to provide 2014 CQMs and value set names, and has been replaced with more robust metadata available in the Binding Parameter Specification. Value sets are available for viewing or download after obtaining a free Unified Medical Language System Metathesaurus License (required due to usage restrictions on some of the codes included in the value sets).


Resources

A User’s Guide to Computable Phenotypes
Master’s thesis providing a practical framework to help physicians, clinical researchers, and informaticians evaluate published phenotype algorithms for reuse for various purposes. The framework is divided into 3 phases aligned with expected user roles: overall assessment, clinical validation, and technical review.

Suggestions for Identifying Phenotype Definitions Used in Published Research
Guidance document from the NIH Collaboratory's Electronic Health Records Core Working Group providing suggestions for searching for phenotype definitions in the peer-reviewed literature.

Phenotype KnowledgeBase (PheKB)
Platform that enables access to validated phenotype definitions, validation of existing phenotype algorithms in electronic health records, collaboration on existing and new phenotype algorithms, and interaction with potential phenotype algorithm collaborators.



Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added an item to the Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

May 13, 2021: Added PheKB to the Resources sidebar (change made by D. Seils).

July 8, 2020: Updated links in the list of contributors (changes made by D. Seils).

July 1, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Electronic Health Records–Based Phenotyping

Section 2

Definitions

What is a phenotype?

A phenotype is the observable physical or biochemical expression of a specific trait in an organism, such as a disease, stature, or blood type, based on genetic information and environmental influences. The phenotype of an organism includes physical appearance, biochemical processes, and behaviors. In short, an organism's phenotype is the appearance the organism presents to observers.

In contemporary biomedical research contexts, a phenotype is understood as a measurable biological marker (such as a physiological, biochemical, or anatomical feature), a behavioral marker (such as a psychometric pattern), or a cognitive marker that is found more often in individuals with a disease or condition than in the general population.

What is a computable phenotype?

A computable phenotype is a clinical condition, characteristic, or set of clinical features that can be determined solely from data in electronic health records (EHRs) and ancillary data sources and does not require chart review or interpretation by a clinician. We use the term "EHR" broadly to refer to data generated through healthcare delivery and reimbursement practices. These functions may be covered in multiple systems and can contain both practice management data and data that are strictly limited to the clinical domain. We use the term "ancillary data sources" to refer to disease registries, claims data, supplemental data collection, and other sources that are related to healthcare delivery but may not be directly integrated into the EHR system. Computable phenotypes are also sometimes referred to as "EHR condition definitions," "EHR-based phenotype definitions," or simply "phenotypes."

What is a computable phenotype definition?

A computable phenotype definition is a specification for identifying patients or populations with a given characteristic or condition of interest from EHRs using data that are routinely collected in EHRs or ancillary data sources. A computable phenotype definition consists of data elements and logical expressions (such as AND, OR, and NOT) that can be interpreted and executed by a computer. In other words, the syntax defining a computable phenotype is designed to be interpreted and executed programmatically without human intervention. Computable phenotype definitions often rely on value sets—lists of codes from standardized medical vocabularies that indicate a condition, drug exposure, or other clinical phenomenon of interest. Data elements and the difference between data elements and phenotypes are described under "How are Data Elements and Phenotypes Different?" later in this section.
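As an illustration, a computable phenotype definition can be rendered as value sets plus boolean logic that a computer evaluates directly. In the Python sketch below, the codes, drug names, and rule are small hypothetical examples, not a validated definition.

```python
# Hypothetical value sets: enumerated codes from standard vocabularies.
HEART_FAILURE_DX = {"I50.9", "I50.22"}         # ICD-10-CM (illustrative subset)
LOOP_DIURETICS = {"furosemide", "bumetanide"}  # drug names (illustrative subset)

def heart_failure_phenotype(record):
    """Logical expression over data elements:
    (diagnosis in value set) AND (diuretic ordered) AND NOT (rule-out flag)."""
    has_dx = bool(set(record["dx_codes"]) & HEART_FAILURE_DX)
    has_rx = bool(set(record["medications"]) & LOOP_DIURETICS)
    return has_dx and has_rx and not record.get("ruled_out", False)
```

Because the definition is expressed entirely as data (value sets) and executable logic, another site can run it against its own records without a clinician interpreting each chart.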

Why are computable phenotype definitions important?

Computable phenotype definitions can support reproducible queries of EHR data from multiple systems (such as clinical and ancillary health information systems, research networks, and aggregated databases). These queries can then be replicated at multiple sites in a consistent fashion, enabling efficiencies and ensuring that populations identified from different healthcare organizations have similar features, or were at least identified in the same way.

The ability to identify people with particular conditions across healthcare organizations by using common definitions has value for clinical quality measurement, health improvement, and research. Standard phenotype definitions can enable direct identification of cohorts based on population characteristics, risk factors, and complications, allowing decision makers to identify and target patients for screening tests and interventions that have been demonstrated to be effective in similar populations. This identification process can be integrated with the EHR for real-time clinical decision support. (See the Clinical Decision Support chapter of the Living Textbook.)

Computable phenotype definitions are essential to the conduct of pragmatic clinical trials and comparative effectiveness research. These studies, which may involve multiple hospitals or healthcare systems, rely on standard phenotype definitions for EHR-based inclusion and exclusion of participants and consistent data analysis and reporting across data sources. Computable phenotype definitions have applications in interventional, observational, prospective, and retrospective studies.

How do computable phenotypes relate to the true presence of a condition?

Although computable phenotypes can be used to identify patients for whom the data are suggestive of a particular condition, the presence of a computable phenotype does not guarantee that the patients have the condition. As shown in the figure below, EHR-based computable phenotypes make use of the data constructs and coding systems that are available to providers when they record patient data in the EHR system. These EHR data may reflect a patient's state or disease status, but the data are generated from perception, interpretation, and recording by the clinicians who observe the patient. Thus, EHR data represent a limited view of a patient's condition and are by definition incomplete—and often biased.

Figure. EHR Phenotyping (a flow diagram of the EHR phenotyping process)
Source: Hripcsak and Albers 2013. (Used under a Creative Commons license.)

EHR data are available only for patients who are motivated (often by a disease or illness) and able to visit a clinician. Other attributes related to the clinician and the healthcare organization influence the nature of the data in the EHR, including the experience of the clinician, the availability and use of diagnostic equipment and therapeutic procedures, interactions with clinical specialists, insurance coverage and limitations, and the coding and reimbursement practices of the healthcare organization (Hsia et al 1988). The quantitative impact of each of these features on the performance of clinical phenotypes is largely unknown. Measurement and estimation of these factors, and the development of strategies to mitigate their impact on data quality, are active areas of methodological research in health services research and informatics.

How are data elements and phenotypes different?

Every data element has a set of possible values, called a "value set." A value set might include a limited set of categorical values, a range of numeric values, or a more extensive list of codes from standardized coding systems, such as the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) or RxNorm. For example, the data element for "sex" includes a single variable with that name, along with a set of discrete values, and perhaps with a definition and associated descriptive metadata. To query the sex of a person, a single data element is assessed. Table 1 shows examples of data elements for sex, birth date, and race and their associated value sets.

Table 1. Examples of EHR Data Elements With Associated Value Sets

Data Element | Value Set
sex | male, female, unknown/not reported
birth date | a date value including the present date and no later than 150 years prior
race | American Indian or Alaska Native, Asian, black or African American, Native Hawaiian or other Pacific Islander, white, unknown/not reported

The value set for a given data element may reference an entire coding system or a smaller enumerated list, as shown in Table 2.

Table 2. Examples of EHR Data Elements With Associated Value Sets

Data Element | Value Set
Final diagnosis | ICD-10-CM codes (all)
Final diagnosis of diabetes | E08.9, E09.9, E13.9, E08.65 (from ICD-10-CM); 249.xx, 250.xx, 357.2, 362.01-362.06, 366.41 (from ICD-9-CM)
Medications ordered | local medication list; clinical drugs coded in RxNorm
Diabetes-related medications ordered | acarbose, Precose, acetohexamide, Dymelor, etc.

Abbreviations: ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification; ICD-10-CM, International Classification of Diseases, Tenth Revision, Clinical Modification.

Phenotype definitions are represented as logical query criteria using 1 or more data elements with a defined value set. For example, to infer that a patient has a clinical characteristic, such as type 2 diabetes mellitus, evidence can come from a single data element or many data elements. Table 3 shows possible data elements and their associated value sets for identifying the presence of type 2 diabetes mellitus.

Table 3. Possible Data Elements and Value Sets to Identify the Presence of Type 2 Diabetes Mellitus

Data Element | Value Set
ICD-10-CM codes for type 2 diabetes mellitus | E11.xx
Diabetes-related medications | acarbose, glipizide, metformin, etc.
Hemoglobin A1c values suggestive of uncontrolled diabetes | ≥ 6.5%

Abbreviation: ICD-10-CM, International Classification of Diseases, Tenth Revision, Clinical Modification.

Any single data element in Table 3, all of the data elements collectively, or any combination of the data elements could be used to create a phenotype definition for type 2 diabetes mellitus. Such a definition could specify that any or all of the data elements must contain at least 1 appropriate code. The definition might also specify that the patient must be older than a certain age at the first diagnosis of diabetes, or that the patient must have received diabetes medication but have no history of type 1 diabetes.
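A combination rule of this kind might be sketched as follows in Python. The value sets here are small illustrative subsets and the age threshold is hypothetical; this is a sketch of the "any supporting element, with exclusions" pattern, not a validated definition.

```python
T2DM_DX = {"E11.9", "E11.65"}                       # ICD-10-CM E11.xx (subset)
T2DM_MEDS = {"acarbose", "glipizide", "metformin"}  # illustrative subset
T1DM_DX = {"E10.9"}                                 # type 1 exclusion (subset)

def t2dm_phenotype(patient, min_age_at_dx=30):
    """Qualify on a diagnosis code, a diabetes medication, or an elevated
    HbA1c, but exclude patients with a type 1 history or an early first
    diagnosis (the threshold is a hypothetical parameter)."""
    dx = set(patient["dx_codes"])
    if dx & T1DM_DX:                                    # NOT: type 1 history
        return False
    evidence = [
        bool(dx & T2DM_DX),                             # diagnosis code
        bool(set(patient["medications"]) & T2DM_MEDS),  # medication order
        any(v >= 6.5 for v in patient["hba1c"]),        # elevated HbA1c
    ]
    if not any(evidence):
        return False
    age = patient.get("age_at_first_dx")
    return age is None or age >= min_age_at_dx
```

Switching `any(evidence)` to `all(evidence)` turns the same data elements into a stricter definition, which is exactly the kind of design choice that trades sensitivity against positive predictive value.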

What data sources are used?

The number of data fields that are truly standardized and routinely collected across EHR systems is small. Therefore, most phenotype definitions use a combination of International Classification of Diseases, Tenth Revision (ICD-10) codes, medication names, and/or laboratory values. ICD-10 Clinical Modification (ICD-10-CM) diagnosis codes (or ICD-9-CM diagnosis codes before October 1, 2015) can be found in technical billing, professional billing, and/or problem lists. EHRs may use or link ICD-10-CM diagnosis codes or Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT) codes for problem lists and other sections of EHRs. EHRs also contain a significant volume of unstructured narrative data. Use of natural language processing techniques in the biomedical domain is evolving and has allowed researchers to use computable phenotypes to leverage clinically rich narrative data within EHRs (Ludvigsson et al 2013). There are many opportunities to validate and improve these algorithms (Vanderbilt University 2017).

The Office of the National Coordinator for Health Information Technology (ONC) in the US Department of Health and Human Services maintains standards and implementation specifications for EHR systems to ensure that certified systems support "meaningful use" criteria (ONC 2012). Accordingly, data elements required by the ONC can be collected in all certified EHR systems in the United States in a manner that is consistent with ONC specifications.

Because EHR data may be available from different types of encounters, including inpatient, outpatient, and emergency department visits, phenotype definitions should take into consideration which sources are relevant to answering the question at hand. In some cases, multiple sources will be needed for complete data capture. For example, medication data can be obtained from reconciliation of various data sources, such as records from inpatient administration, provider ordering, or outpatient dispensing. It is also important to consider the applicability of a captured data element during certain encounters. For example, a lab value for a patient may be abnormal during an emergency department visit but not reflect the typical range of lab values for that patient.
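The medication-reconciliation idea above can be sketched in a few lines. The source names and record shapes here are hypothetical; the point is simply that merging sources while preserving provenance lets a phenotype definition weigh, say, a dispensing record differently from an inpatient administration.

```python
# Illustrative sketch: merge medication mentions from multiple EHR
# sources into one de-duplicated view, keeping track of which source(s)
# each drug came from.
def reconcile_medications(sources: dict) -> dict:
    """sources maps a source name (e.g., 'inpatient_admin',
    'provider_orders', 'outpatient_dispense') to a list of drug names.
    Returns a dict mapping each drug to the set of sources naming it."""
    merged = {}
    for source, drugs in sources.items():
        for drug in drugs:
            merged.setdefault(drug.lower(), set()).add(source)
    return merged
```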

What are the benefits of "standard" phenotypes or condition definitions and phenotype definition libraries?

Explicit documentation of computable phenotype definitions can support their use in multiple organizations and settings for consistent identification of patient populations for various purposes. Standardized or explicitly defined phenotype definitions can also streamline the development of registries and applications using healthcare data and can enable development of consistent inclusion criteria to support regional surveillance in the identification of infectious diseases and rare disease complications.

Differences across phenotype definitions can affect their application in healthcare organizations and subsequent interpretation of data. It is unlikely that a single phenotype definition—for example, in type 2 diabetes mellitus or heart failure—will be sufficient for all intended uses. Rather, the ideal phenotype definition depends on the purpose and analytical requirements.

Research networks and collaborations are increasingly seeking to share phenotype definitions for a given characteristic or condition and intended use. For example, Observational Health Data Sciences and Informatics (OHDSI) is "researching and developing strategies for establishing a standardized, evidence-based approach to constructing algorithms to define disease phenotypes that can be used in observational analytics" (OHDSI 2020). The Agency for Healthcare Research and Quality has developed the Clinical Classifications Software and related tools as part of the Healthcare Cost and Utilization Project for use with ICD-9-CM, ICD-10-CM, and other classification systems. Finally, phenotype libraries such as the Phenotype KnowledgeBase (PheKB) have emerged to assist researchers in using standard phenotype definitions appropriate for a given characteristic or condition and intended use.

See the Finding Existing Phenotype Definitions section in this chapter of the Living Textbook for more information about standardized definitions from authoritative sources.

Resources

OHDSI Phenotype Library
Landing page for cohort and phenotype definitions and discussions for the Observational Health Data Sciences and Informatics (OHDSI) community. (Under development.)

Phenotype Phebruary Daily Threads & What We Learned
Condition-specific phenotype definitions developed during a “28 phenotypes for 28 days” initiative held within the OHDSI forums during February 2022.

HDR UK Phenotype Library
A comprehensive, open-access resource providing information, tools, and phenotyping algorithms for electronic health records in the United Kingdom.

REFERENCES

Hripcsak G, Albers DJ. 2013. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc. 20:117-121. doi:10.1136/amiajnl-2012-001145. PMID: 22955496.

Hsia DC, Krushat WM, Fagan AB, et al. 1988. Accuracy of diagnostic coding for Medicare patients under the prospective-payment system. N Engl J Med. 318:352-355. doi:10.1056/NEJM198802113180604. PMID: 3123929.

Ludvigsson JF, Pathak J, Murphy S, et al. 2013. Use of computerized algorithm to identify individuals in need of testing for celiac disease. J Am Med Inform Assoc. 20:e306-310. doi:10.1136/amiajnl-2013-001924. PMID: 23956016.

Observational Health Data Sciences and Informatics (OHDSI). 2020. Phenotype library. https://www.ohdsi.org/resources/libraries/phenotype-library/. Accessed June 23, 2022.

Office of the National Coordinator for Health Information Technology (ONC), Department of Health and Human Services. 2012. Health Information Technology: Standards, Implementation Specifications, and Certification Criteria for Electronic Health Record Technology, 2014 Edition; Revisions to the Permanent Certification Program for Health Information Technology. Final Rule. Fed Regist. 77(171):54163-54292. PMID: 22946139.

Vanderbilt University. 2017. Collaboration phenotypes. PheKB. https://phekb.org/phenotypes/collaboration. Accessed June 23, 2022.

ACKNOWLEDGMENTS

Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.

The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.


Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list, added a Resources sidebar, and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

July 22, 2020: Added the alt text attribute for the Figure (change made by D. Seils).

July 8, 2020: Updated links in the list of contributors; made nonsubstantive corrections to the text; and made minor corrections to layout and formatting (changes made by D. Seils).

July 1, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Introduction

Electronic Health Records–Based Phenotyping


Section 1

Introduction

In the context of electronic health records (EHRs), a "computable phenotype," or simply "phenotype," is a clinical condition or characteristic that can be ascertained by means of a computerized query to an EHR system or clinical data repository using a defined set of data elements and logical expressions. These queries can identify patients with particular conditions and can be used to support a variety of purposes, including population management, quality measurement, and observational and interventional research. Standardized computable phenotypes can facilitate large-scale pragmatic clinical trials across multiple healthcare systems while ensuring reliability and reproducibility (Richesson et al 2013).

In this chapter, we offer an overview of considerations for identifying, defining, and evaluating computable phenotypes, focusing in particular on standardization efforts within the NIH Pragmatic Trials Collaboratory.

Resources

Advances at the Intersection of Digital Health, Electronic Health Records and Pragmatic Clinical Trials: An NIH Collaboratory Grand Rounds EHR Workshop Series

Keynote: Can the COVID-19 Crisis Lead to Evolution of the Evidence Generation Ecosystem?; NIH Collaboratory Grand Rounds; May 1, 2020

Real World Evidence: Contemporary Experience and Future Directions; NIH Collaboratory Grand Rounds; May 8, 2020

Experiences from the Collaboratory PCTs; NIH Collaboratory Grand Rounds; May 29, 2020

Keys to Success in the Evolving EHRs Environment; NIH Collaboratory Grand Rounds; June 26, 2020

Reflection on Advances at the Intersection of Digital Health, Electronic Health Records, and Pragmatic Clinical Trials; NIH Collaboratory Grand Rounds Podcast; July 8, 2020

REFERENCES

Richesson RL, Hammond WE, Nahm M, et al. 2013. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc. 20:e226-e231. doi:10.1136/amiajnl-2013-001926. PMID: 23956018.



Version History

July 8, 2020: Added an item to the Resources sidebar (changes made by D. Seils).

July 8, 2020: Updated links in the list of contributors (changes made by D. Seils).

July 1, 2020: Addition of Resources sidebar; and minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Additional Resources

Real-World Evidence: Patient-Reported Outcomes (PROs)


Section 10

Additional Resources

Patient-Centered Outcomes Core Toolkit

The purpose of this toolkit is to provide resources to support the capture of patient-reported outcome measures in diverse study populations participating in the NIH Pragmatic Trials Collaboratory Trials and other pragmatic clinical trials. This toolkit contains a Checklist focused on health equity considerations and PROs, along with Additional Resources.

A CONSORT (Consolidated Standards of Reporting Trials) Statement Extension for PROs
The CONSORT PRO extension recommends 5 checklist items for randomized controlled trials in which PROs are primary or secondary endpoints:

  1. Identify PROs as primary or secondary outcomes in the abstract.
  2. Describe the hypothesis of the PROs and relevant domains (ie, if a multidimensional PRO tool has been used).
  3. Provide or cite evidence of the PRO instrument’s validity and reliability.
  4. Explicitly state statistical approaches for dealing with missing data.
  5. Discuss PRO-specific limitations of study findings and generalizability of results to other populations and clinical practice.
National Institutes of Health (NIH)-sponsored Patient-Reported Outcomes Measurement Information System (PROMIS)
PROMIS provides approximately 1900 adult and 600 pediatric measures for health-related PRO domains in a variety of conditions. The measures have been standardized to provide common domains and metrics across a wide range of conditions and diseases.

NIH Toolbox for the Assessment of Neurological and Behavioral Function
The NIH Toolbox is a multidimensional set of brief measures assessing cognitive, emotional, motor, and sensory function in patients ranging from 3 to 85 years of age. The Toolbox is intended to provide consistent measurement across studies and a scientific basis for identifying evidence-based best practices (Gershon et al. 2013).

Neuro-QoL
Neuro-QoL (Quality of Life in Neurological Disorders) is a set of PRO measures that assesses the health-related quality of life (HRQOL) of adults and children with neurological disorders such as stroke, multiple sclerosis, amyotrophic lateral sclerosis, Parkinson disease, epilepsy, and muscular dystrophy.

The Adult Sickle Cell Quality of Life Measurement Information System (ASCQ-Me)
ASCQ-Me is an HRQOL instrument for adults with sickle cell disease that was designed to add specificity to the PROMIS HRQOL instrument for this population (Panepinto 2012).


REFERENCES

Gershon RC, Wagster MV, Hendrie HC, Fox NA, Cook KF, Nowinski CJ. 2013. NIH toolbox for assessment of neurological and behavioral function. Neurology. 80(11 Suppl 3):S2-6. doi:10.1212/WNL.0b013e3182872e5f. PMID: 23479538.

Panepinto JA. 2012. Health-related quality of life in patients with hemoglobinopathies. Hematology Am Soc Hematol Educ Program. 2012:284-289. doi:10.1182/asheducation-2012.1.284. PMID: 23233593.


Version History

Published May 30, 2020

Cultural Adaptation and Linguistic Translation

Real-World Evidence: Patient-Reported Outcomes (PROs)


Section 6

Cultural Adaptation and Linguistic Translation

When preparing PRO measures for use with populations who are culturally and linguistically diverse, it is critically important that the intended audience understands both the measure and what is being asked of them. Researchers should recognize that cultural adaptation of a PRO measure is typically more involved than simple linguistic translation. The following working definitions are useful in considering the distinction:

  • Cultural adaptation: Adapting an existing instrument to measure a phenomenon in a different culture
  • Linguistic adaptation: Translating an existing instrument to measure a phenomenon in people who speak another language

Key Point: Cultural and linguistic adaptation of PRO measures can enable inclusion of a broader study population and enhanced generalizability of results. However, if a measure has not been adapted or translated appropriately, then the population may not understand what is being asked of them. Therefore, the data collected from the measure may not accurately reflect the underlying construct(s) of interest (Gjersing et al. 2010).

We surveyed study investigators from the first round of PRISM NIH Collaboratory Trials to understand and describe efforts for cultural adaptation and linguistic translation of PROs. We received survey responses from 6 NIH Collaboratory Trial investigators. Of these, BackInAction and BeatPain Utah were the only studies that performed PRO adaptations. Below, we describe the strategies the study investigators used to ensure PRO cultural and linguistic appropriateness.

BackInAction (Nielsen et al. 2021)

The BackInAction study is comparing a standard 12-week course of acupuncture and an enhanced course of acupuncture (the 12-week standard course plus a 12-week maintenance course) with usual medical care for chronic low back pain in older adults. The study sample will be recruited from 4 diverse health plans to represent the ethnic and racial composition of Medicare enrollees. Whenever possible, per FDA guidance (FDA Guidance for Industry 2009), the study used published Spanish versions of the selected PRO instruments. When translations were not readily available, the study team performed linguistic adaptation using standard translation and back-translation techniques. The investigators created a Spanish version of all instruments used in the baseline and follow-up interviews. One of the healthcare systems had experience translating for clinical research, so a translator within that health system conducted the translation for the trial. Although cognitive testing was not performed with a sample of native speakers, the study team felt they had gathered the evidence needed to support use of the adapted instruments.

BeatPain Utah

BeatPain Utah compares the effectiveness of nonpharmacologic intervention strategies for patients with back pain seeking care in Federally Qualified Health Centers throughout the state of Utah. Because a large proportion of the population is expected to be culturally Latinx and Spanish-speaking, the study team made cultural and linguistic adaptations to the PROs and other materials used for the project. Spanish translation proceeded through the following steps:

  • Informal review of instrument content by native speakers or other stakeholder(s) (aka face validity assessment)
  • Pilot testing in sample of native speakers to evaluate content (aka cognitive testing of adapted measure)
  • Role playing intervention sessions with members of the research team who share the cultural background of many of our anticipated participants

The study team also provided training in cultural considerations for all members of the study intervention delivery team.

For translation of some study-related materials, such as the consent cover letter and IRB-approved materials, a certified translator was used. For other aspects of the study, such as screening scripts, intervention materials and exercise videos, etc., translation was performed by members of the study team who were both culturally Latinx and Spanish-speaking.

Spanish versions of the PROs used as primary and secondary endpoints were all previously validated in other studies.

FM TIPS, OPTIMUM, GRACE, and NOHARM did not make cultural or linguistic adaptations, with all 4 noting that it was not feasible in the timeframe. OPTIMUM also noted that they are not enrolling participants who speak other languages. Only NOHARM has future plans to perform cultural and linguistic translations for Hispanic/Latino participants. See Section 2 for a list of PROs collected by all the trials.

In this initial assessment, consideration of cultural and linguistic adaptation of PRO instruments was found to be relevant to half of the PRISM NIH Collaboratory Trials. For the projects that neither performed nor planned adaptations, feasibility within the study timeframe was a major barrier. When planning pragmatic trials, choosing PRO instruments with readily available versions for the relevant target populations should be a priority. Performing individual linguistic and cultural adaptations for specific tools can be time-consuming and costly, as it typically requires cognitive testing to ensure the translation is interpreted in the intended way.

The Report of the ISPOR Task Force for Translation and Cultural Adaptation suggests involving relevant stakeholders for both forward and backward translation and cognitive debriefing, which is intended to ensure that the target audience understands the materials (Wild et al. 2005).


REFERENCES

FDA Guidance for Industry. 2009. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/patient-reported-outcome-measures-use-medical-product-development-support-labeling-claims. Accessed March 15, 2021.

Gjersing L, Caplehorn JR, Clausen T. 2010. Cross-cultural adaptation of research instruments: language, setting, time and statistical considerations. BMC Med Res Methodol. 10(1):13. doi:10.1186/1471-2288-10-13. [accessed 2022 Jan 29]. https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-10-13. PMID: 20144247.

Wild D, Grove A, Martin M, et al. 2005. Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: report of the ISPOR Task Force for Translation and Cultural Adaptation. Value Health. 8(2):94-104. doi:10.1111/j.1524-4733.2005.04054.x. PMID: 15804318.

Nielsen A, Ocker L, Majd I, et al. 2021. Acupuncture Intervention Protocol: Consensus Process for a Pragmatic Randomized Controlled Trial of Acupuncture for Management of Chronic Low Back Pain in Older Adults: An NIH HEAL Initiative Funded Project. Glob Adv Health Med. 10:216495612110070. doi:10.1177/21649561211007091. [accessed 2022 Jan 24]. http://journals.sagepub.com/doi/10.1177/21649561211007091. PMID: 34104574.


Version History

September 14, 2022: Updated as part of annual review (changes made by K. Staman).

January 29, 2022: Added information from the survey of 2 more PRISM projects (changes made by K. Staman).

March 15, 2021: Added information from the survey of the 4 PRISM projects (changes made by K. Staman).

Published May 30, 2020

Incorporating PRO Data Into the EHR

Real-World Evidence: Patient-Reported Outcomes (PROs)


Section 8

Incorporating PRO Data Into the EHR

Although there is increasing interest in incorporating PROs into the EHR, many obstacles remain, and best practices are still evolving (Snyder and Wu 2017). Collaboratory investigators have encountered challenges with incorporating PRO data into the EHR. In a recent NIH Collaboratory study, 20 NIH Collaboratory Trials responded to a survey about challenges encountered when using the EHR for pragmatic clinical research (Richesson et al. 2021). The goal of the study was to elucidate challenges and develop solutions—or prerequisites for pragmatic research—to enable healthcare system leaders, policymakers, and EHR designers to improve the national capacity for generating real-world evidence.

“Seven projects—particularly those researching management of pain or psychological trauma—voiced a need for patient-centered data within health systems and EHRs. These studies reported having to collect primary data that were expected to be readily available, including patient-reported outcomes, previously completed questionnaires, and advance care planning conversations (Lakin et al. 2020). This led to time- and monetary-consuming adjustments.”

To counter this problem, the authors suggest the integration and collection of patient-centered data, including PROs, questionnaires, and advance care plans, into EHR systems and clinical workflows to support pragmatic research and personalized clinical care.

The process of implementing collection of PROs through the EHR may take substantial investments of time and money; current data collection approaches that support use of PROs are nascent and need better integration in clinical care (Van Der Wees et al. 2014). Additionally, due to concerns related to workflow and time constraints, uptake by clinical care staff may be slower than expected (Owen-Smith et al. 2018).

Another potential issue involves the concordance (or lack of concordance) between PROs and EHR data. For example, in the ADAPTABLE (Aspirin Dosing: A Patient-centric Trial Assessing Benefits and Long-Term Effectiveness) pragmatic clinical trial, there were inconsistencies between PRO data and EHR-derived data for demographics and clinical events that occurred during follow-up (O’Brien et al, in press), indicating that more work is needed to ensure adequate capture of information.

Burden and acceptability of PROs relate to the time it takes a participant to complete the measure and how well the measure captures how the patient is feeling and functioning.

For example, PPACT was a large mixed-methods, pragmatic, cluster-randomized clinical trial conducted in 3 regions of Kaiser Permanente health systems. PPACT was designed to evaluate the integration of multidisciplinary services within the primary care environment compared with usual care in these settings. (See Section 2 for a complete list of PROs used by PPACT.)

Lessons learned from PPACT indicate several key factors for success:

  • Brevity of the instrument
  • Interpretability of results by those providing care
  • Actionable information
  • Flexible, multimodal approach (web-based personal health records, interactive voice response, live outreach with support staff as necessary) (Owen-Smith et al. 2018)

PCORI’s Users' Guide to Integrating Patient-Reported Outcomes in Electronic Health Records is designed to facilitate integration of PRO data into the EHR. The guide addresses 11 key questions for incorporating PROs into the EHR:

  1. What strategy will be used for integrating PROs in EHRs?
  2. How will the PRO-EHR system be governed?
  3. How can users be trained and engaged?
  4. Which populations and patients are most suitable for collection and use of PRO data, and how can EHRs support identification of suitable patients?
  5. Which outcomes are important to measure for a given population?
  6. How should candidate PRO measures be evaluated?
  7. How, where, and with what frequency will PROs be administered?
  8. How will PRO data be displayed in the EHR?
  9. How will PRO data be acted upon?
  10. How can PRO data from multiple EHRs be pooled?
  11. What are the ethical and legal issues? (Snyder et al. 2012)

Although PROs are being used in clinical research to support and inform labeling claims, clinical care, and reimbursement policy, the quality of content regarding PROs in protocols is often suboptimal (Calvert et al. 2018). Therefore, the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT-PRO) Extension was developed to provide guidelines for the inclusion of PROs in clinical trial protocols. The extension provides many PRO-specific recommendations for protocols in which PRO data are a primary or secondary outcome (Calvert et al. 2018). These include:

  • Specify the individual(s) responsible for the PRO content of the trial protocol
  • Describe the PRO-specific research question and rationale
  • State specific PRO objectives or hypotheses
  • Specify any PRO-specific eligibility criteria (eg, language/reading requirements or prerandomization completion of PRO)
  • Specify the PRO concepts/domains used to evaluate the intervention
  • Include a schedule of PRO assessments
  • When a PRO is the primary endpoint, state the required sample size and recruitment target
  • Justify the PRO instrument to be used
  • Include a data collection plan
  • Specify whether more than 1 language version will be used and state whether translated versions have been developed using currently recommended methods
  • State and justify the use of a proxy respondent, where necessary
  • Specify PRO data collection and management strategies for minimizing avoidable missing data
  • Describe the process of PRO assessment for participants who discontinue or deviate from the assigned intervention protocol
  • State PRO analysis methods
  • State how missing data will be described
  • State whether or not PRO data will be monitored during the study to inform the clinical care of individual trial participants … Describe how this process will be explained to participants; eg, in the participant information sheet and consent form.


Resources

Design and analytic considerations for using patient-reported health data in pragmatic clinical trials: report from an NIH Collaboratory roundtable
This article describes approaches for ascertaining and classifying patient-reported health data as study endpoints and addressing issues of incomplete data, data alignment, and concordance.

Applying patient-reported outcome methodology to capture patient-reported health data: Report from an NIH Collaboratory roundtable
This article summarizes key scientific literature on the accuracy of patient-reported data compared with health data.

Patient-Centered Outcomes Core Working Group for the NIH Collaboratory
Creates guidelines and defines best practices with respect to (1) selecting, compiling, and curating the most appropriate PRO measures (and stimulating the development of new instruments when new solutions are needed); (2) creating efficient, high-quality PRO data collection systems compatible with EHRs and registries; and (3) conducting statistical analyses of PRO endpoints.

REFERENCES

Calvert M, Kyte D, Mercieca-Bebber R, et al. 2018. Guidelines for inclusion of patient-reported outcomes in clinical trial protocols: The SPIRIT-PRO Extension. JAMA. 319(5):483. doi:10.1001/jama.2017.21903. PMID: 29411037.

Lakin JR, Brannen EN, Tulsky JA, et al. 2020. Advance Care Planning: Promoting Effective and Aligned Communication in the Elderly (ACP-PEACE): the study protocol for a pragmatic stepped-wedge trial of older patients with cancer. BMJ Open. 10:e040999. doi:10.1136/bmjopen-2020-040999.

Owen-Smith A, Mayhew M, Leo MC, et al. 2018. Automating collection of pain-related patient-reported outcomes to enhance clinical care and research. J Gen Intern Med. 33(S1):31-37. doi:10.1007/s11606-018-4326-9. PMID: 29633139.

Richesson RL, Marsolo KS, Douthit BJ, et al. 2021. Enhancing the use of EHR systems for pragmatic embedded research: lessons from the NIH Health Care Systems Research Collaboratory. J Am Med Inform Assoc. 28:2626-2640. doi:10.1093/jamia/ocab202.

Snyder CF, Aaronson NK, Choucair AK, et al. 2012. Implementing patient-reported outcomes assessment in clinical practice: a review of the options and considerations. Qual Life Res. 21(8):1305-1314. doi:10.1007/s11136-011-0054-x. PMID: 22048932.

Snyder C, Wu AW, eds. 2017. Users’ Guide to Integrating Patient-Reported Outcomes in Electronic Health Records. http://www.pcori.org/document/users-guide-integrating-patient-reported-outcomes-electronic-health-records.

Van Der Wees PJ, Nijhuis-Van Der Sanden MWG, Ayanian JZ, Black N, Westert GP, Schneider EC. 2014. Integrating the use of patient-reported outcomes for both clinical practice and performance measurement: views of experts from 3 countries. Milbank Quarterly. 92(4):754-775. doi:10.1111/1468-0009.12091. PMID: 25492603.


Version History

September 14, 2022: Updated as part of annual review (changes made by K. Staman).

January 29, 2021: Updated with new information from the PRISM projects on acceptability and burden (changes made by K. Staman).

Published May 30, 2020

Choosing PRO Measures

Real-World Evidence: Patient-Reported Outcomes (PROs)


Section 5

Choosing PRO Measures

Where possible, investigators are encouraged to use measures with adequate support for validity that are in the public domain. When choosing a PRO measure for an ePCT, investigators should ask the following questions:

  • Are there core outcome sets or other widely used measures that are appropriate for my research question?
  • Are PRO data already being collected? If so, how and where are the data collected? Are the data available in the EHR or through other mechanisms?
  • If the measure is not in the EHR system, what is the process for adding it? Who will be responsible for the cost? Are there alternative data capture platforms (e.g., REDCap)? Will the measure be acceptable and not burdensome to my patient population?
  • Does the measure have support for validity in this use case (i.e., in this target population, setting, etc)?
    • If not, is there a similar measure with existing support for validity in this population?
      • If not, how will you gather necessary data to support validity?
  • Is the measure in the public domain?
    • If not, is there a similar measure in the public domain that would be acceptable?

Investigators may want to consider NIH-sponsored measures (see Additional Resources) or the Core Outcome Sets (Section 3). If challenges arise in accessing appropriate PROs, funders may be open to troubleshooting or adapting plans to give the project the best chance of success.

Availability of Data

If PRO measures are already being collected as part of clinical care or quality assurance, then the investigator's priority is to determine how consistently the measures are collected across sites and in what format. In some cases, PRO data may be extracted from the EHR for research, but investigators should not assume that all sites are collecting PRO outcome data in the same way, if at all. For example, PPACT enrolled a patient population who had chronic pain and were on long-term opioid therapy. Although the investigators were initially assured that routine PRO measures for pain were collected (at each primary care clinic visit and at least quarterly) as part of required opioid treatment plan contracts and that the measures would be in the EHR, collection of PRO measures was far less systematic than anticipated. Investigators from PPACT stated, “Despite the recognition of the potential benefits of using pain-related PROs, their systematic use in everyday clinical care is rare. In general, the use of pain-related PROs is not embedded into routine clinical practice in health care systems or coordinated with electronic health record (EHR) systems” (Owen-Smith et al. 2018). The investigators worked closely with clinical stakeholders at the participating health care systems to establish an acceptable assessment (DeBar et al. 2018).

Acceptability and Burden

Acceptability of the measure and perception of burden are important considerations for PROs, as lengthy questionnaires can lead to burnout for both the clinician and the patient. In general, PROs are more acceptable to patients if they provide value, such as informing treatment decisions, fostering meaningful communication between the provider and the patient, or helping with triage. Investigators should also ensure that the measure is not too burdensome: brief forms have generally been found to be preferable to longer forms.

For example, the PPACT trial was designed to coordinate and integrate services that help patients adopt self-management skills for managing chronic pain, limit use of opioid medications, and identify exacerbating factors amenable to treatment, in ways that are feasible and sustainable within the primary care setting. The primary outcome of pain and pain-related disability was measured through a patient questionnaire. Investigators worked closely with each of the participating healthcare systems to identify a suitable questionnaire that was brief enough, had acceptable psychometrics, focused on functioning, and was easily interpretable (DeBar et al. 2018). Given these needs, clinical leaders and primary care physicians (PCPs) at participating healthcare systems agreed that the 3-item PEG (Krebs et al. 2009) was more acceptable to clinicians than the 12-item Brief Pain Inventory (Cleeland and Ryan 1994) from which the PEG was derived. Clinicians also viewed favorably the PEG’s emphasis on important functional domains (enjoyment of life, general activity, and an additional item requested by clinical stakeholders on sleep) rather than pain intensity (the focus of the frequently used numerical rating scale) when setting goals with patients.

“PCPs want to ask what’s really needed, and we should consider how we can marry that with clinical research.” —Lynn Debar, PI of PPACT

Patient feedback indicated concern that patients who self-reported improved pain might be taken off opioids. The PEG’s emphasis on functioning therefore helped clinicians shift the focus of their conversations toward what would improve patients’ day-to-day functioning, rather than getting mired in a sometimes contentious exchange about pain levels and opioids.

The 3-item PEG asks the participant to score the following questions on a 0-10 scale:

1. What number best describes your pain on average in the past week?

2. What number best describes how, during the past week, pain has interfered with your enjoyment of life?

3. What number best describes how, during the past week, pain has interfered with your general activity?

PPACT investigators also included a fourth item on sleep:

4. What number best describes how, during the past week, pain has interfered with your sleep?
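The PEG is conventionally scored as the average of its item responses (Krebs et al. 2009). A minimal sketch of that scoring in Python follows; the function name and range check are illustrative, not from the source:

```python
def peg_score(pain_avg, enjoyment_interference, activity_interference):
    """Score the 3-item PEG as the mean of its item responses.

    Each item is answered on a 0-10 numeric scale; the total score
    is the average of the three items (Krebs et al. 2009).
    """
    items = (pain_avg, enjoyment_interference, activity_interference)
    if any(not 0 <= x <= 10 for x in items):
        raise ValueError("each PEG item must be scored 0-10")
    return sum(items) / len(items)

# A respondent answering 6, 7, and 5 has a PEG score of 6.0
print(peg_score(6, 7, 5))
```

Whether a study including the fourth (sleep) item averages over four items instead is a protocol-level decision; the standard published scoring covers the three-item version.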

REFERENCES

Cleeland CS, Ryan KM. 1994. Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singap. 23:129-38. PMID: 8080219

DeBar L, Benes L, Bonifay A, et al. 2018. Interdisciplinary team-based care for patients with chronic pain on long-term opioid treatment in primary care (PPACT) - Protocol for a pragmatic cluster randomized trial. Contemp Clin Trials. 67:91-99. doi:10.1016/j.cct.2018.02.015. PMID: 29522897.


Krebs EE, Lorenz KA, Bair MJ, et al. 2009. Development and initial validation of the PEG, a three-item scale assessing pain intensity and interference. J Gen Intern Med. 24(6):733-738. doi:10.1007/s11606-009-0981-1. PMID: 19418100.

Owen-Smith A, Mayhew M, Leo MC, et al. 2018. Automating collection of pain-related patient-reported outcomes to enhance clinical care and research. J Gen Intern Med. 33(S1):31-37. doi:10.1007/s11606-018-4326-9. PMID: 29633139.



Version History

March 6, 2026: Added considerations for choosing a PRO measure for an ePCT (changes made by K. Staman).

September 14, 2022: Updates made as part of annual review (changes made by K. Staman).

Published May 30, 2020

Updated November 17, 2020: Moved the Common Data Elements to a new section (changes made by K Staman).

How Are PRO Measures Used?

Real-World Evidence: Patient-Reported Outcomes (PROs)


Section 2

How Are PRO Measures Used?

Within the context of pragmatic clinical trials embedded within health care systems (ePCTs), PRO measures that are already being collected as part of routine clinical care and/or for quality assurance purposes could easily be integrated into the trial. However, in some cases, PROs are not collected routinely, necessitating a separate data collection protocol (although this seems to be rapidly changing).

To use PRO data most effectively in pragmatic research, it is useful to understand the different roles played by PROs in clinical research, clinical care, and quality assurance.

Clinical Research

PRO Measures as Study Endpoints

PROs play a significant role as study endpoints in the development and evaluation of new therapies (Willke et al. 2004; Gnanasakthy et al. 2016; Gnanasakthy et al. 2017). PROs have not been used as frequently as study endpoints in ePCTs, in large part because PROs are often not included as part of routine clinical care and thus are often absent from the electronic health record (EHR), which is typically the main source of data for many ePCTs. Still, several trials within the NIH Pragmatic Trials Collaboratory have assessed or are assessing PROs as trial endpoints (see Table).

Table. NIH Collaboratory Trials With PROs

Columns: Study Name | Project Goal | Indication | Primary and Secondary Outcomes | PRO Measures
BackInAction

Pragmatic Trial of Acupuncture for Chronic Low Back Pain in Older Adults

Goal: Evaluate the safety and effectiveness of acupuncture in older adults with chronic low back pain

Indication: Chronic low back pain in adults ≥65 years of age at 4 performance sites (789 patients)

Primary outcome: Back-related function

Secondary outcomes: Pain intensity, pain interference

PRO measures:

Euro-QOL-5d

global impression of change (PGIC) - 1 item

Pain Catastrophizing Scale (PCS) 6 item scale

Patient Health Questionnaire (PHQ-2)

GAD-2

PEG

PROMIS Ability to participate in social roles and activities 4a

PROMIS physical functioning  6b

PROMIS sleep disturbance 6a

Roland Morris Disability Questionnaire (RMDQ)

TAPS 1

acupuncture outside the study

Adherence to assigned treatment

Adverse events

back pain history

EHR

High impact chronic pain

Sleep duration question

cognitive functioning screener

Frailty profile

PHQ4 screener for depression and anxiety

Impact of COVID on overall health and access to healthcare

Sciatica detection

Pain related healthcare and self-mgmt practices

NIH LBP Task Force fear avoidance (1-item)

EXPECT acupuncture expectation questions (1 item)

PROMIS fatigue scale

Heal CDE demographic questions + BMI

BeatPain Utah

Nonpharmacologic Pain Management in Federally Qualified Health Centers (FQHCs) Primary Care Clinics

Goal: To compare the effectiveness of nonpharmacologic intervention strategies for patients with back pain

Indication: 500 English- or Spanish-speaking patients with chronic low back pain seeking care in FQHCs throughout the state of Utah

Primary outcome: The Pain, Enjoyment and General Activity (PEG) measure of pain

Secondary outcomes: HEAL measures

PRO measures:

GAD-2

global impression of change (PGIC) - 1 item

Pain Catastrophizing Scale (PCS) 6 item scale

Patient Health Questionnaire (PHQ-2)

Patient global impression change (PGIC)

PEG

PHQ-2

PROMIS physical functioning  6b

PROMIS sleep disturbance 6a

Sleep duration question

TAPS 1

back pain history

HICP (High Impact Chronic Pain)

Pain Medications

PSEQ-4

FM TIPS

Fibromyalgia TENS in Physical Therapy Study

Goal: Test the feasibility and effectiveness of adding transcutaneous electrical nerve stimulation (TENS), a nonpharmacologic treatment, for pain and fatigue in patients with fibromyalgia (FM)

Indication: Fibromyalgia in adults at 24 routine physical therapy clinics and 6 health systems in rural and urban settings (~600 patients)

Primary outcome: Pain

Secondary outcome: Physical functioning

PRO measures:

Brief pain inventory - short form

Fibromyalgia Impact Questionnaire- Revised (FIQR)

GAD-7

Pain Catastrophizing Scale (PCS) - 13 item scale

global impression of change (PGIC) - 1 item

Movement evoked (5x sit to stand) pain

Movement evoked fatigue

Multidimensional Assessment of Fatigue (MAF)

PHQ-8

PROMIS physical functioning  6b

PROMIS sleep disturbance 6a

Rapid Assessment of Physical Activity (RAPA)

Resting fatigue by NRS

Resting pain by NRS

TAPS 1

Adverse events

Barrier to TENS

Medications (targeted to pain, mood, sleep)

Patient Specific Functional Scale (PSFS)

Sleep duration question

Symptom Severity Score

Widespread pain index (WPI)

 

GRACE

Hybrid Effectiveness-Implementation Trial of Guided Relaxation and Acupuncture for Chronic Sickle Cell Disease Pain

Goal: To assess the effects of guided relaxation and acupuncture treatments for people with sickle cell disease

Indication: 366 people, aged 18 and up, living with chronic pain resulting from sickle cell disease

Primary outcomes: Pain interference, enjoyment of life, and physical function

Secondary outcomes: Anxiety, depression, sleep disturbance, and substance use

PRO measures:

GAD-7

global impression of change (PGIC) - 1 item

Pain Catastrophizing Scale (PCS) 6 item scale

Patient Health Questionnaire (PHQ-2)

PEG

PHQ-9

PROMIS GI Constipation 9a

PROMIS pain interference 4a

PROMIS physical functioning  6b

Sleep duration question

TAPS 1

Acupuncture Protocol Checklist

ED Visits and Hospitalizations

Non-Pharm Treatments

Opioid Followback

PROMIS sleep disturbance 8a

Implementation Questionnaire

Adverse Events Form

 

NOHARM

Nonpharmacologic Options in Postoperative Hospital-based and Rehabilitation Pain Management

Goal: Evaluate the feasibility of EHR-embedded patient- and clinician-facing decision support for nonpharmacologic pain care after surgery

Indication: Postsurgical pain following eligible procedures in adults at 6 large integrated health systems; 23 practice clusters (~100,000 patients) in 5 tranches

Primary outcomes: Physical function and pain interference

Secondary outcomes: Anxiety, sleep disturbance, use of opioids, and nonpharmacologic pain care modalities

PRO measures:

PHQ-2

TAPS 1

GAD-2

Pain Catastrophizing Scale (PCS) 6 item scale

Pain NRS

PROMIS CAT Anxiety

PROMIS CAT Pain Interference

PROMIS CAT physical function

OPTIMUM

Group-Based Mindfulness for Patients With Chronic Low Back Pain in the Primary Care Setting

Goal: Evaluate a group-based mindfulness program (mindfulness-based stress reduction) for patients with chronic low back pain within primary care

Indication: Chronic low back pain in patients at primary care clinics in 3 large health systems (~450 patients)

Primary outcomes: Pain intensity, physical function

Secondary outcomes: Pain interference, psychological function, opioid prescriptions

PRO measures:

Charlson co-morbidity index

Cognitive and affective mindfulness scale - revised (cams-r)

current opioid misuse measure (comm)

GAD-2

global impression of change (PGIC) - 1 item

healing encounters and attitudes list (heal-expectation)

health care system utilization (patient report)

opioid use, single item

Pain Catastrophizing Scale (PCS) 6 item scale

Patient Health Questionnaire (PHQ-2)

PEG

PHQ-2

PROMIS physical functioning  6b

PROMIS sleep disturbance 6a

satisfaction, single item

Sleep duration question

TAPS 1

COPC measure

EHR

PROMIS-29

Ethics, single item

Telehealth Usability Questionnaire

PPACT

Pain Program for Active Coping and Training

Goal: Help patients adopt self-management skills for chronic pain, limit use of opioid medications, and identify factors amenable to treatment in the primary care setting

Indication: Chronic pain in patients on long-term opioid therapy at 3 staff model health plans; involves 106 primary care clusters (860 patients)

Primary outcome: Pain impact

Secondary outcomes: Pain-related disability, patient satisfaction, opioids and benzodiazepines dispensed, and health care utilization

PRO measures:

Primary: PEG, a validated 3-item measure for pain (Cleeland and Ryan 1994; Keller et al. 2004)

Secondary: 24-item Roland Morris Disability Questionnaire (RMDQ) (Roland and Fairbank 2000; Jordan et al. 2006)

TSOS

Trauma Survivors Outcomes and Support

 

Goal: To coordinate care and improve outcomes for trauma survivors with post-traumatic stress disorder (PTSD) and comorbidity, and to provide the American College of Surgeons with multisite pragmatic trial evidence that could further inform regulatory policy

Indication: PTSD and comorbidity in trauma survivors at 25 US level 1 trauma centers (635 patients)

Primary outcome: PTSD symptoms

Secondary outcomes: Depression, alcohol use, physical functioning

PRO measures:

Primary: 17-item PTSD checklist, civilian version (Weathers et al. 1991)

Secondary: The 9-item Patient Health Questionnaire (PHQ-9) brief depression severity measure (Kroenke et al. 2001; Arroll et al. 2010)

The Alcohol Use Disorder Identification Test (AUDIT), a 10-item screening instrument for the early identification of problem drinkers (Bohn et al. 1995)

The SF-12 Physical Component Summary score at baseline and the SF-36 Physical Component Summary score at the follow-up time points

Abbreviations: NRS = numeric rating scale; PROMIS = Patient-Reported Outcomes Measurement Information System; CAT = computer adaptive testing; PEG = pain, enjoyment of life, and general activity; SF = short form

For NOHARM, all outcomes are collected via the EHR. For all other trials, separate mechanisms were needed to collect the measures.

As with all trial endpoints, researchers should specify in the research protocol whether a PRO endpoint will serve as a primary, secondary, or exploratory endpoint (FDA Guidance for Industry 2009). Often there is interest in the effect of an intervention on more than one aspect of the patient’s experience (for example, pain severity, pain frequency, and interference in daily activities due to pain). With multiple PRO endpoints, care must be taken to create a strong a priori rationale for how the multiple endpoints will be handled at the analysis phase, because of the risk of Type I error inflation and/or challenges in interpreting patterns of results across endpoints. FDA has published guidance entitled Multiple Endpoints in Clinical Trials: Guidance for Industry, which describes strategies for managing multiple endpoints in a study.
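One common way to control family-wise Type I error across several PRO endpoints is a p-value adjustment such as Holm's step-down procedure. The sketch below is illustrative only and is not drawn from the FDA guidance, which discusses a range of strategies; the function name and example p-values are hypothetical:

```python
def holm_adjust(pvals):
    """Holm step-down adjustment for m endpoints.

    Controls the family-wise Type I error rate without assuming
    independence among the endpoints. Returns adjusted p-values in
    the same order as the input.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for pain severity, frequency, interference
print(holm_adjust([0.01, 0.04, 0.03]))
```

Each adjusted p-value can then be compared directly to the overall alpha (e.g., 0.05), so an endpoint is declared significant only if its adjusted value falls below that threshold.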

It should be noted that ethical issues arise when collecting sensitive data that could signal distress, such as suicidal ideation, opioid use disorder, or depression. According to Ali et al., investigators have an ethical obligation to monitor these signals and to identify in advance if, when, and how such signals will trigger a response (Ali et al. 2022). Using examples from the NIH Collaboratory Trials, the authors offered preliminary recommendations and identified opportunities for future work, which include:

  • Understanding and aligning stakeholder expectations
  • Considering characteristics of the trial and study population to inform a response
  • Defining triggers, thresholds, and responsibilities for action
  • Identifying appropriate response mechanisms and capabilities
  • Integrating with clinical practices and systems
  • Addressing patient-subject privacy

PRO Measures as Monitoring Tools (Adverse Events and Symptoms)

PRO measures can also be used to capture or monitor the adverse effects of an intervention separately from its effectiveness. For example, clinical trial investigators collect adverse event (AE) data to ensure patient safety and inform sponsors, regulators, patients, caregivers, and clinicians about adverse effects of treatment. Clinicians typically grade AEs using the National Cancer Institute’s Common Terminology Criteria for Adverse Events (CTCAE) (National Cancer Institute 2018). In order to better capture negative, uncomfortable, and impactful symptoms from the patient’s perspective, such as nausea and anxiety, the National Cancer Institute developed the PRO-CTCAE, which was designed for adults participating in oncology trials (Basch et al. 2014; Dueck et al. 2015).

PRO Measures as the Intervention

In some cases, the PRO measure can be the intervention. A study by Basch et al. demonstrated that clinical benefits were associated with self-report of symptoms in patients receiving care for cancer, such as improved health-related quality of life, fewer emergency department visits, fewer hospitalizations, longer duration of palliative therapy, and superior survival rates (Basch et al. 2016). In the study, patients recorded symptoms on a tablet, and an automated alert was sent to clinicians when patient-reported symptoms were severe or worsening. The authors postulate that the benefits were due to increased rates of discussion between patients and clinicians resulting in intensified symptom management and improved symptom control.

Clinical Care

Ideally, a PRO instrument will not only be a valid and reliable way to collect data, but it will also make a positive contribution to clinical care (Farnik and Pierzchała 2012). Data collected from a PRO instrument can be used in longitudinal reporting at the point of care and as part of clinical decision-making and review of systems. In addition, PRO data can be used to trigger patient education and interventions and as a means to triage patients to receive other services, helping the patient understand that the information they are reporting is meaningful to their care.

One of the NIH Collaboratory Trials provides a good example of how a PRO-based intervention for research can be incorporated into clinical care. The Collaborative Care for Chronic Pain in Primary Care project was a mixed-methods, cluster-randomized pragmatic clinical trial designed to evaluate the integration of psychosocial services into the primary care of patients with chronic pain. The intervention, the Pain Program for Active Coping and Training (PPACT), involved behavioral skills training designed to engage patients in their own care and help them manage their pain.

The study compared the effects of the intervention versus usual care on a number of measures, including patients’ pain symptoms, functional ability, satisfaction with healthcare services, and receipt of opioid and benzodiazepine medications. As part of the project, the patient completed a brief pain inventory (online using the EHR patient portal, using interactive voice response technology, or via a call with a medical assistant). For patients randomized to the active intervention, a more extensive intake evaluation was completed, compiled into an electronic summary outside the firewall of the EHR, and sent to the participant’s primary care physician through the EHR. The report incorporated real-time analysis and scoring of the data and presented the information in clinical context; as a result, it provided the physician with easily interpretable, actionable information derived from PRO data in order to promote discussion with the patient, trigger educational interventions, and aid in clinical decision-making (DeBar et al. 2022).

Patient Satisfaction and Quality Assurance

A systematic review of 27 studies in the cancer setting suggests that PROs improved patient-provider communication and patient satisfaction, in part, because clinicians talked to patients about their feelings and health status and were able to develop a shared view of treatment goals, health status, or reason for the visit (Chen et al. 2013). Online patient self-reporting of toxicity symptoms during chemotherapy has been shown to be feasible, even among patients with advanced cancer and high symptom burdens (National Quality Forum 2013). PROs can also be used as reliable measures of healthcare performance; for example, the National Quality Forum endorses the use of PRO-based performance measures for the purposes of performance improvement and accountability (National Quality Forum 2013).



REFERENCES

Ali J, Morain SR, O’Rourke PP, et al. 2022. Responding to signals of mental and behavioral health risk in pragmatic clinical trials: Ethical obligations in a healthcare ecosystem. Contemp Clin Trials. 113:106651. doi:10.1016/j.cct.2021.106651.

Arroll B, Goodyear-Smith F, Crengle S, et al. 2010. Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population. Ann Fam Med. 8(4):348-353. doi:10.1370/afm.1139. PMID: 20644190.

Basch E, Abernethy AP, Mullins CD, et al. 2012. Recommendations for incorporating patient-reported outcomes into clinical comparative effectiveness research in adult oncology. J Clin Oncol. 30(34):4249-4255. doi:10.1200/JCO.2012.42.5967. PMID: 23071244.

Basch E, Deal AM, Kris MG, et al. 2016. Symptom monitoring with patient-reported outcomes during routine cancer treatment: a randomized controlled trial. J Clin Oncol. 34(6):557-565. doi:10.1200/JCO.2015.63.0830. PMID: 26644527.

Basch E, Reeve BB, Mitchell SA, et al. 2014. Development of the National Cancer Institute’s Patient-Reported Outcomes Version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE). J Natl Cancer Inst. 106(9):dju244. doi:10.1093/jnci/dju244.

Bohn MJ, Babor TF, Kranzler HR. 1995. The Alcohol Use Disorders Identification Test (AUDIT): validation of a screening instrument for use in medical settings. J Stud Alcohol. 56(4):423-432. doi:10.15288/jsa.1995.56.423. PMID: 7674678.

Chen J, Ou L, Hollis SJ. 2013. A systematic review of the impact of routine collection of patient reported outcome measures on patients, providers and health organisations in an oncologic setting. BMC Health Serv Res. 13:211. doi:10.1186/1472-6963-13-211. PMID: 23758898.

Cleeland CS, Ryan KM. 1994. Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singap. 23(2):129-138. PMID: 8080219.

Dueck AC, Mendoza TR, Mitchell SA, et al. 2015. Validity and reliability of the US National Cancer Institute’s Patient-Reported Outcomes Version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE). JAMA Oncol. 1(8):1051-1059. doi:10.1001/jamaoncol.2015.2639. PMID: 26270597.

Farnik M, Pierzchała WA. 2012. Instrument development and evaluation for patient-related outcomes assessments. Patient Relat Outcome Meas. 3:1-7. doi:10.2147/PROM.S14405. PMID: 22915979.

FDA Guidance for Industry. 2009. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. Accessed September 25, 2013. http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf.

Gnanasakthy A, DeMuro C, Clark M, Haydysch E, Ma E, Bonthapally V. 2016. Patient-reported outcomes labeling for products approved by the Office of Hematology and Oncology Products of the US Food and Drug Administration (2010-2014). J Clin Oncol. 34(16):1928-1934. doi:10.1200/JCO.2015.63.6480. PMID: 27069082.

Gnanasakthy A, Mordin M, Evans E, Doward L, DeMuro C. 2017. A review of patient-reported outcome labeling in the United States (2011–2015). Value Health. 20(3):420-429. doi:10.1016/j.jval.2016.10.006. PMID: 28292487.

Jordan K, Dunn KM, Lewis M, Croft P. 2006. A minimal clinically important difference was derived for the Roland-Morris Disability Questionnaire for low back pain. J Clin Epidemiol. 59(1):45-52. doi:10.1016/j.jclinepi.2005.03.018.

Keller S, Bann CM, Dodd SL, Schein J, Mendoza TR, Cleeland CS. 2004. Validity of the brief pain inventory for use in documenting the outcomes of patients with noncancer pain. Clin J Pain. 20(5):309-318. doi:10.1097/00002508-200409000-00005. PMID: 16360560.

Kroenke K, Krebs EE, Turk D, et al. 2019. Core outcome measures for chronic musculoskeletal pain research: recommendations from a Veterans Health Administration Work Group. Pain Med. 20(8):1500–1508. doi:10.1093/pm/pny279. PMID: 30615172.

National Cancer Institute. 2018. Common Terminology Criteria for Adverse Events (CTCAE). Accessed August 5, 2019. https://ctep.cancer.gov/protocoldevelopment/electronic_applications/ctc.htm#ctc_50.

National Quality Forum. 2013. Patient Reported Outcomes (PROS) in Performance Measurement. Accessed December 5, 2013. http://www.qualityforum.org/Publications/2012/12/Patient-Reported_Outcomes_in_Performance_Measurement.aspx.

Roland M, Fairbank J. 2000. The Roland-Morris Disability Questionnaire and the Oswestry Disability Questionnaire. Spine. 25(24):3115-3124. doi:10.1097/00007632-200012150-00006. PMID: 11124727.

Weathers F, Huska J, Keane T. 1991. The PTSD checklist-civilian version. Accessed May 5, 2020. https://www.mirecc.va.gov/docs/visn6/3_PTSD_CheckList_and_Scoring.pdf.

Willke RJ, Burke LB, Erickson P. 2004. Measuring treatment impact: a review of patient-reported outcomes and other efficacy endpoints in approved product labels. Control Clin Trials. 25(6):535-552. doi:10.1016/j.cct.2004.09.003. PMID: 15588741.


Version History

September 14, 2022: Updated as part of the annual review process (changes made by K. Staman).

October 7, 2020: Changed the title of AcuOA NIH Collaboratory Trial to its new title, BackInAction (change made by L. Wing).

Published May 30, 2020

Introduction

Real-World Evidence: Patient-Reported Outcomes (PROs)


Section 1

Introduction

When pragmatic trials test interventions designed to change how someone feels or functions in their day-to-day lives, outcomes can be obtained through direct patient report, as patients are typically the best source of information about how they are feeling and managing outside of the clinic. For example, the PPACT trial tested whether multidisciplinary services, such as physical therapy and psychological interventions, would lead to improvements in pain, as reported by participants.

As defined by the U.S. Food and Drug Administration (FDA), a patient-reported outcome (PRO) is “any report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else” (FDA Guidance for Industry 2009). Work funded by the Patient-Centered Outcomes Research Institute (PCORI) defines PROs as “the patient’s report of the impact of health, disease, and treatment from the patient perspective, generally collected via questionnaires” (Snyder and Wu 2017). This definition makes a distinction between patient-generated data collected from devices (such as pedometers or home blood pressure monitors) and patients’ direct reports of outcomes, which typically include information about symptoms, functioning, satisfaction with care or symptoms, adherence to prescribed medications or other therapy, and perceived value of treatment.

In this chapter, we describe how PROs are used in different settings and how to choose and integrate a PRO measure into an embedded pragmatic clinical trial protocol. In addition, we briefly describe resources for measures, including core outcome sets for PROs.



Resources

Patient-Centered Outcomes Core Toolkit 

The purpose of this toolkit is to provide resources to support the capture of patient-reported outcome measures in diverse study populations participating in the NIH Collaboratory Trials and other pragmatic clinical trials. This toolkit contains a Checklist focused on health equity considerations and PROs, along with Additional Resources.

Grand Rounds:

Users’ Guide for Integrating Patient-Reported Outcomes in Electronic Health Records (Claire Snyder, PhD, Albert W. Wu, MD, MPH)

Living Textbook Chapter:

Endpoints and Outcomes: Outcomes Measured via Direct Patient Report

White Paper:

Patient-Reported Outcomes

This white paper covers how to use, measure, interpret, and implement PRO measures.

 

REFERENCES

FDA Guidance for Industry. 2009. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf. Accessed June 1, 2020.

Snyder C, Wu AW, eds. 2017. Users’ Guide to Integrating Patient-Reported Outcomes in Electronic Health Records. http://www.pcori.org/document/users-guide-integrating-patient-reported-outcomes-electronic-health-records. Accessed May 28, 2020.


Version History

June 26, 2020: Added resource for PRO white paper (changes made by K. Staman)

Published May 30, 2020