Data Sharing and Embedded Research

Section 2 Data Sharing Concerns

Type of Information Disclosed

Traditional clinical trials, such as tests of the efficacy of a new drug, device, behavioral treatment, or process, typically create research data sets that are readily deidentified and contain a limited number of data elements pertaining to the research question. With these data sets, it is generally feasible for researchers who did not participate in the original trial both to reproduce the primary results and to perform additional analyses addressing different questions. However, the range of these additional questions is typically limited by the design of the trial data set. Pragmatic trials and other embedded research typically compare alternative treatments, treatment strategies, or policies. In those comparisons, variation in practice patterns among providers or facilities is a potentially important confounder, especially in trials that randomize providers or facilities rather than individual patients.

Embedded research data sets may contain rich information extracted from health system records. Those practice-based data often contain more specific information about the providers and the systems themselves than do data from conventional clinical trials. Examples include the number, size, or location of facilities and practices; practice volume; the number, size, and census of primary, specialty, and inpatient care units; the number or type of personnel they employ; the structure of their formularies; and information about their vendors and supply chain. For example, the NIH Pragmatic Trials Collaboratory's Active Bathing to Eliminate (ABATE) Infection NIH Collaboratory Trial used data from over 600,000 admissions in 53 hospitals. (For a complete description of all the trials, see the table in Definition of a Pragmatic Trial.) The data set included information on every hospital’s census and length of stay on most wards, plus individuals’ procedure and comorbidity data, all of which could reveal sensitive business information regarding patient volume, size of individual services, length of stay on individual wards, and case mix. The size and richness of the data set effectively precluded protection against re-identification of the hospitals by comparison with external data sources. These facilities varied in size and could readily be identified by the simple release of numerators and denominators. In addition, because the ABATE study evaluated changes in the number of multidrug-resistant organisms in clinical cultures in these facilities, the data could be misused or misinterpreted for purposes unrelated to the original research question (e.g., to make biased comparisons of the quality of care at these facilities), a risk that would be unacceptable to the healthcare system.

Similarly, the PROVEN NIH Collaboratory Trial worked with 2 nursing home systems operating in over 20 states and obtained a wide array of clinical data, downloaded monthly, on over 200,000 admissions and 60,000 long-stay residents treated in 360 skilled nursing facilities (Mor et al 2017). Detailed clinical and demographic data from standardized patient assessments on all these patients were automatically merged with longitudinal information about staffing, treatments, and hospitalizations from the facility, which in turn was merged with Medicare claims data to track hospital use and vital status over the entire study period, regardless of whether the patient switched facilities. Although facilities participating in the intervention group represented less than 1% of all US facilities, it would not be difficult to identify them, depending on the level of detail in which study results were presented.
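The re-identification risk described for these trials can be illustrated with a minimal sketch. It assumes a released table that labels facilities only by an opaque study code but includes a denominator (annual admissions), and an external public listing of admissions by named facility. All facility names, counts, function names, and the 5% matching tolerance are hypothetical and chosen only for illustration; none come from ABATE or PROVEN.

```python
# Hypothetical illustration: every name and count below is invented.

# "Released" study table: facilities carry only an opaque study code,
# but each row still exposes a denominator (annual admissions).
released = {
    "site_01": 41250,
    "site_02": 8730,
    "site_03": 15980,
}

# "External" source, e.g. a public report of admissions by named facility.
public = {
    "Hospital Alpha": 8700,
    "Hospital Beta": 41000,
    "Hospital Gamma": 16100,
    "Hospital Delta": 27500,
}


def candidate_matches(released_count, public_counts, tolerance=0.05):
    """Return named facilities whose public volume lies within a relative
    `tolerance` of the released denominator, sorted by name."""
    return sorted(
        name
        for name, count in public_counts.items()
        if abs(count - released_count) <= tolerance * released_count
    )


def reidentify(released_counts, public_counts, tolerance=0.05):
    """Map each study code to its list of plausible named facilities."""
    return {
        code: candidate_matches(n, public_counts, tolerance)
        for code, n in released_counts.items()
    }


matches = reidentify(released, public)
for code, names in matches.items():
    status = "uniquely identified" if len(names) == 1 else "ambiguous"
    print(code, names, status)
```

In this toy example each study code matches exactly one named facility, because the facilities differ enough in size that the denominator alone acts as an identifier. Real data sets would carry many more such quasi-identifiers (ward census, length of stay, case mix), which only narrows the candidate list further.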

Provider and Institutional Confidentiality

There are two types of confidentiality risks to provider groups and healthcare systems. The first involves revealing business information (e.g., which drugs are purchased or what price is paid for specific services), for which there is a clear right of privacy. The second is revealing information that could be used for naïve and potentially biased comparisons of quality of care or performance, especially if that information differs (more detailed, broken into more subsets, limited to vulnerable populations, or lacking case-mix adjustment) from what is publicly reported by all systems or facilities. In a perfect world, there would be no right of privacy about the quality of healthcare delivery. In practice, however, the scope of disclosures for those participating in embedded research could be far greater than that required for assessing quality parameters. Health systems volunteer to participate in research to improve public health, and bearing an additional risk of misuse of sensitive information may be unacceptable (Platt et al 2016). Moreover, healthcare entities that are incentivized for their performance on quality metrics may be especially concerned about research that produces data inconsistent with public reports because of differences between the definitions or methods used by a study and those used for public quality measures (for example, Healthcare Effectiveness Data and Information Set [HEDIS] measures based on claims data versus HEDIS measures that rely on medical record abstraction) (Simon et al 2017). Also, the information may be extremely sensitive and may involve vulnerable populations. For example, for PPACT, part of the reason that healthcare systems and individual providers partnered for the research was the tremendous concern about overprescribing opioids and the dangers it presents. Yet there were substantial sensitivities about the prescribing patterns of individual primary care providers, which in turn influenced what data could be made available in the shared data sets. Other medical domains with complex and sensitive issues that might be the focus of an embedded trial could present similar concerns.
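The concern about naïve comparisons that lack case-mix adjustment can be illustrated with a small worked example. The sketch below uses invented event counts for two hypothetical facilities and compares crude event rates with rates obtained by direct standardization to a shared reference population. The facility names, strata, counts, and the choice of direct standardization are assumptions made for illustration, not a method described in the cited trials.

```python
# Hypothetical counts: (events, patients) by risk stratum for two invented
# facilities. Facility A treats mostly high-risk patients.
facilities = {
    "Facility A": {"low_risk": (2, 100), "high_risk": (90, 900)},
    "Facility B": {"low_risk": (30, 900), "high_risk": (12, 100)},
}


def crude_rate(strata):
    """Overall event rate, ignoring case mix."""
    events = sum(e for e, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return events / patients


def standardized_rate(strata, reference):
    """Direct standardization: apply each stratum-specific rate to a
    shared reference case mix and average over that reference."""
    total = sum(reference.values())
    weighted = sum((e / n) * reference[name] for name, (e, n) in strata.items())
    return weighted / total


# Reference case mix: the pooled patient counts of both facilities.
reference = {
    stratum: sum(f[stratum][1] for f in facilities.values())
    for stratum in ("low_risk", "high_risk")
}

crude = {name: crude_rate(s) for name, s in facilities.items()}
adjusted = {name: standardized_rate(s, reference) for name, s in facilities.items()}

for name in facilities:
    print(f"{name}: crude={crude[name]:.3f} adjusted={adjusted[name]:.3f}")
```

With these invented numbers, Facility A has the higher crude rate yet the lower rate within each risk stratum and after standardization. A reader given only the unadjusted numerators and denominators would rank the two facilities in the wrong order.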

Consent

Current and proposed disclosure policies are particularly problematic for observational studies and cluster-randomized trials because providers and delivery systems, like their patients, have some of the attributes of research subjects. This is especially problematic for providers because, although care systems may authorize use of their data, individual providers typically are not given that opportunity.

Many embedded research studies are granted a waiver of consent from patients, with the requirement that personal health information be protected from disclosure. For providers, practices, and health systems that participate in research studies, although there are no similar regulatory protections, there is a reasonable corollary, especially for individuals whose involvement is determined by their inclusion in a randomized cluster without their explicit consent. Some have argued that health systems, providers, and/or individual practitioners are participants in embedded research, much like patients, and that we therefore have an ethical obligation to provide suitable assurances regarding legitimate privacy and confidentiality concerns about the use and re-use of proprietary data collected during clinical care. However, this ethical argument has proved contentious; the scientific community is encouraging a shift to a more transparent clinical trials enterprise, and this type of data sharing is required in other industries, including the pharmaceutical and device industries. The crux of the argument boils down to a very practical matter, which is ensuring voluntary participation.

Risk of Breach in Data Security

Embedded research studies are typically orders of magnitude larger than conventional clinical trials, making delivery systems especially sensitive to the potential for breaches of data security. For example, the median sample size for the NIH Pragmatic Trials Collaboratory trials is 19,500 individuals; the largest involves 600,000 individuals. Thus, the potential for harm from a single security breach is substantial.

Resources

Data Sharing and Pragmatic Clinical Trials: Law and Ethics Amidst a Changing Policy Landscape; NIH Pragmatic Trials Collaboratory PCT Grand Rounds; November 11, 2022

REFERENCES

NIH Collaboratory Healthcare Systems Interactions Core. 2016. Lessons Learned from the NIH Health Care Systems Research Collaboratory Trials. Accessed September 1, 2016.

Mor V, Volandes AE, Gutman R, Gatsonis C, Mitchell SL. 2017. PRagmatic trial Of Video Education in Nursing homes: the design and rationale for a pragmatic cluster randomized trial in the nursing home setting. Clin Trials. 14(2):140–151. doi:10.1177/1740774516685298. PMID: 28068789.

Platt R, Ramsberg J. 2016. Challenges for sharing data from embedded research. N Engl J Med. 374(19):1897–1897. doi:10.1056/NEJMc1602016. PMID: 27096325.

Simon GE, Coronado G, DeBar LL, et al. 2017 Oct 3. Data sharing and embedded research. Ann Intern Med. doi:10.7326/M17-0863. PMID: 28973353.

Version History

March 9, 2023: Changed the description of the trials to past tense (changes made by K. Staman).

February 16, 2023: Added a resource to the Resources sidebar (changes made by D. Seils).

December 5, 2018: Added references as part of the annual update (changes made by K. Staman).

Published August 25, 2017

Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials
