Assessing Fitness for Use of Real-World Data Sources
Section 7
Data Provenance
There is wide variability in how information is captured in EHRs within and across healthcare systems, and the same is true of how health insurance providers process administrative claims. Sites also vary in how they map data from source systems to the value sets of a dataset or CDM. Knowledge of data collection practices and of the decisions made in the source system–to–CDM translation can provide additional insight and context into the reliability of a dataset as it relates to data accrual (Johnson et al 2014). Many DRNs, for instance, ask their collaborators to complete surveys that describe the provenance of their data sources and to provide detail about the characteristics of their clinical workflows and/or source systems (Qualls et al 2018). Similarly, the NIH Collaboratory data quality white paper recommends that researchers explore the data collection and transformation procedures at each site, though this can be a highly manual process involving interviews and discussions between local data experts and the research team. Because clinical systems change over time, these surveys should not be a one-time event, so it is important to keep the process from becoming overly burdensome. Repeated responses provide a more complete longitudinal history. Provenance information can also be incorporated at the record level, for instance by assigning values as part of the data transformation process (eg, whether a diagnosis originated in a billing system, was entered by a clinician, or was derived from a note through natural language processing). These details are important because datasets that include records from only one source and datasets that combine records from several sources without distinguishing among them will generate very different profiles.
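Record-level provenance of the kind described above can be illustrated with a short sketch. This is only one possible approach, not a method prescribed by any particular CDM: the origin value set, the source table names, the record layout, and the sample rows are all hypothetical and chosen for illustration. The sketch tags each diagnosis with its origin during transformation and then shows how the patient count for a single diagnosis code differs by origin.

```python
from dataclasses import dataclass
from enum import Enum


class DiagnosisOrigin(Enum):
    """Hypothetical value set describing where a diagnosis record came from."""
    BILLING = "billing"
    CLINICIAN_ENTERED = "clinician_entered"
    NLP_DERIVED = "nlp_derived"
    UNKNOWN = "unknown"


# Hypothetical mapping from source-system table names to origin values.
SOURCE_TABLE_TO_ORIGIN = {
    "billing_claims": DiagnosisOrigin.BILLING,
    "problem_list": DiagnosisOrigin.CLINICIAN_ENTERED,
    "note_nlp_output": DiagnosisOrigin.NLP_DERIVED,
}


@dataclass(frozen=True)
class DiagnosisRecord:
    patient_id: str
    code: str
    origin: DiagnosisOrigin


def transform(source_rows):
    """Convert raw source rows to records, tagging each with its origin."""
    records = []
    for row in source_rows:
        # Unmapped source tables are flagged rather than silently dropped.
        origin = SOURCE_TABLE_TO_ORIGIN.get(
            row["source_table"], DiagnosisOrigin.UNKNOWN
        )
        records.append(DiagnosisRecord(row["patient_id"], row["code"], origin))
    return records


def profile_by_origin(records, code):
    """Count distinct patients with the given code, split by origin."""
    patients = {}
    for record in records:
        if record.code == code:
            patients.setdefault(record.origin, set()).add(record.patient_id)
    return {origin: len(ids) for origin, ids in patients.items()}


# Made-up sample rows; not real patient data.
source_rows = [
    {"patient_id": "p1", "code": "E11.9", "source_table": "billing_claims"},
    {"patient_id": "p1", "code": "E11.9", "source_table": "problem_list"},
    {"patient_id": "p2", "code": "E11.9", "source_table": "billing_claims"},
    {"patient_id": "p3", "code": "E11.9", "source_table": "note_nlp_output"},
    {"patient_id": "p4", "code": "I10", "source_table": "legacy_import"},
]

records = transform(source_rows)
profile = profile_by_origin(records, "E11.9")
all_patients = {r.patient_id for r in records if r.code == "E11.9"}

for origin, count in profile.items():
    print(f"{origin.value}: {count} patient(s)")
print(f"any origin: {len(all_patients)} patient(s)")
```

In this made-up sample, a dataset limited to billing records would report two patients with the diagnosis, one limited to clinician-entered records would report one, and one that pooled all origins would report three. Without the origin field, the pooled count could not be broken down after the fact.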
Resources
National Evaluation System for health Technology Coordinating Center (NESTcc) Data Quality Framework; NEST Coordinating Center; 2019.
REFERENCES
Johnson KE, Kamineni A, Fuller S, Olmstead D, Wernli KJ. 2014. How the provenance of electronic health record data matters for research: a case example using system mapping. EGEMS (Washington, DC). 2:1058. doi:10.13063/2327-9214.1058. PMID: 25821838.
Qualls LG, Phillips TA, Topping J, et al. 2018. Evaluating Foundational Data Quality in the National Patient-Centered Clinical Research Network (PCORnet). eGEMs (Generating Evidence & Methods to improve patient outcomes). Apr 13;6(1):3. doi:10.5334/egems.199. PMID: 29881761.