Electronic Health Records–Based Phenotyping

Section 5 Data Quality

The quality of data in healthcare information systems can affect the results of phenotype-based queries to the point that the returned data are not useful. Secondary use of healthcare data is defined as use of the data for a purpose other than that for which they were originally collected (Safran et al 2007). Because the data were not collected with the secondary purpose in mind, secondary users should not assume that the data will meet their needs. For these reasons, data quality assessment should accompany phenotype validation.

Using healthcare data without understanding their accuracy, consistency, missingness, and possible biases can lead to misleading answers. The capacity of the data to support research conclusions is so important that requests for applications for NIH Pragmatic Trials Collaboratory trials require that data validation be addressed. A recent methodology report from the Patient-Centered Outcomes Research Institute (PCORI) (Kahn et al 2018) recommends reporting data quality along with study results for observational and comparative effectiveness research. The report also provides a data quality assessment model and framework. Other guidelines from research networks provide practical advice for data quality checks and reporting (Brown, Kahn, and Toh 2013; Kahn et al 2015).
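As a rough illustration of the missingness dimension mentioned above, the following minimal sketch reports the fraction of records in which each field is absent. It is not taken from the cited guidelines; the function name `missingness_by_field`, the field names, and the sample records are all hypothetical and invented for illustration only.

```python
def missingness_by_field(records, fields):
    """Return, for each field, the fraction of records where it is absent or None."""
    total = len(records)
    if total == 0:
        raise ValueError("cannot summarize missingness for an empty dataset")
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in fields
    }


# Invented toy records; real EHR extracts will have different fields.
sample_records = [
    {"patient_id": "p1", "sex": "F", "bmi": 27.4},
    {"patient_id": "p2", "sex": None, "bmi": None},
    {"patient_id": "p3", "sex": "M"},  # bmi key absent entirely
    {"patient_id": "p4", "sex": "F", "bmi": 31.0},
]

missing = missingness_by_field(sample_records, ["sex", "bmi"])
for field, fraction in missing.items():
    print(f"{field}: {fraction:.0%} missing")
```

A summary like this describes only completeness. It says nothing about whether the values that are present are accurate or consistent, which is why comparison against another source is usually also needed.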

The NIH Pragmatic Trials Collaboratory has developed a data quality assessment framework to help investigators and research teams identify and implement necessary assessments. (See “Assessing Data Quality for Healthcare Systems Data Used in Clinical Research.”) Few validated electronic methods exist that can be run directly on a dataset to assess its quality. Instead, current methods are comparison-based: they compare chart review with the data returned from a phenotype-based query, or compare 2 datasets, to quantify the number and type of discrepancies and to understand how the discrepancies might be distributed in a dataset.
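The comparison-based approach can be sketched as follows, assuming both sources have already been reduced to one record per patient and share a patient identifier. This is a minimal illustration, not the NIH Collaboratory framework itself; the function `compare_records`, the discrepancy labels, and the sample data are hypothetical.

```python
from collections import Counter


def compare_records(source_a, source_b, fields):
    """Count discrepancies between two datasets keyed by patient ID.

    source_a, source_b: dict mapping patient ID -> dict of field values.
    Returns a Counter keyed by (field, discrepancy_type).
    """
    discrepancies = Counter()
    for patient_id in sorted(set(source_a) | set(source_b)):
        rec_a = source_a.get(patient_id)
        rec_b = source_b.get(patient_id)
        # A patient present in only one source is a record-level discrepancy.
        if rec_a is None:
            discrepancies[("record", "missing_in_a")] += 1
            continue
        if rec_b is None:
            discrepancies[("record", "missing_in_b")] += 1
            continue
        for field in fields:
            val_a, val_b = rec_a.get(field), rec_b.get(field)
            if val_a == val_b:
                continue
            # Separate "one side has no value" from "both have values that differ".
            kind = "missing_value" if val_a is None or val_b is None else "value_mismatch"
            discrepancies[(field, kind)] += 1
    return discrepancies


# Invented toy data: chart review findings vs. phenotype query output.
chart_review = {
    "p1": {"diabetes": True, "hba1c": 7.2},
    "p2": {"diabetes": False, "hba1c": None},
    "p3": {"diabetes": True, "hba1c": 8.1},
}
query_result = {
    "p1": {"diabetes": True, "hba1c": 7.2},
    "p2": {"diabetes": True, "hba1c": 6.9},
    "p4": {"diabetes": True, "hba1c": 9.0},
}

result = compare_records(chart_review, query_result, ["diabetes", "hba1c"])
for (field, kind), count in sorted(result.items()):
    print(f"{field:10s} {kind:16s} {count}")
```

Tallying discrepancies by field and by type, rather than as a single count, is what makes it possible to see how they are distributed in a dataset, for example whether they are concentrated in one variable.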

Resources

Assessing Data Quality for Healthcare Systems Data Used in Clinical Research
Guidance document from the NIH Collaboratory's Electronic Health Records Core Working Group.

REFERENCES

Brown JS, Kahn M, Toh S. 2013. Data quality assessment for comparative effectiveness research in distributed data networks. Med Care. 51(8 Suppl 3):S22-S29. doi:10.1097/MLR.0b013e31829b1e2c. PMID: 23793049.

Kahn MG, Brown JS, Chun AT, et al. 2015. Transparent reporting of data quality in distributed data networks. EGEMS (Wash DC). 3(1):1052. doi:10.13063/2327-9214.1052. PMID: 25992385.

Kahn M, Ong T, Barnard J, Maertens J. 2018. Developing Standards for Improving Measurement and Reporting of Data Quality in Health Research. Washington, DC: Patient-Centered Outcomes Research Institute. https://doi.org/10.25302/3.2018.ME.13035581. Accessed June 30, 2020.

Safran C, Bloomrosen M, Hammond WE, et al. 2007. Toward a national framework for the secondary use of health data: an American Medical Informatics Association white paper. J Am Med Inform Assoc. 14:1-9. doi:10.1197/jamia.M2273. PMID: 17077452.

ACKNOWLEDGMENTS

Key contributors to previous versions of this chapter included Michelle Smerek, Shelley Rusincovitch, Meredith Nahm Zozus, Paramita Saha Chaudhuri, Ed Hammond, Robert Califf, Greg Simon, Beverly Green, Michael Kahn, and Reesa Laws.

The Electronic Health Records Core Working Group (formerly the Phenotypes, Data Standards, and Data Quality Core Working Group) of the NIH Collaboratory influenced much of this content through monthly meetings. These additional contributors included Monique Anderson, Nick Anderson, Alan Bauck, Denise Cifelli, Lesley Curtis, John Dickerson, Chris Helker, Michael Kahn, Cindy Kluchar, Melissa Leventhal, Rosemary Madigan, Renee Pridgen, Jon Puro, Jennifer Robinson, Jerry Sheehan, and Kari Stephens. We are also grateful to the Duke Center for Predictive Medicine for development and clarification of the scientific validity and evaluation of phenotype definitions.

Version History

June 23, 2022: Updated the name of the NIH Collaboratory in the contributors list and throughout the text as part of the annual content update (changes made by D. Seils).

July 8, 2020: Updated links in the list of contributors and made nonsubstantive corrections to the text (changes made by D. Seils).

July 1, 2020: Added the Resources sidebar and made minor corrections to layout and formatting (changes made by D. Seils).

Published June 30, 2020

Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials
