The contributors to this chapter initially wrote an opinion piece for Annals of Internal Medicine (Simon et al. 2017) on data sharing. In this chapter, we expand on the ideas presented there and frame them using lessons learned from the Collaboratory.

Video originally published in Annals of Internal Medicine (Simon et al. 2017) and used with permission.

Emerging policies and procedures for sharing analyzable research datasets hold great promise for increasing transparency and reproducibility in medical research. Expanded use of the data can increase the knowledge base through secondary analyses, decrease selective reporting, and lead to improvements in clinical care (Institute of Medicine 2015; Krumholz et al. 2016; Warren 2016; NIH Data Sharing Policy 2015). Enabling the responsible sharing of data is a global priority, and a number of solutions have been proposed, including by the National Academy of Medicine (2015), the International Committee of Medical Journal Editors (ICMJE; Taichman et al. 2016), and the National Institutes of Health. In Europe, access to data from industry-sponsored trials has increased markedly, and there are encouraging programs in the U.S., such as the Yale University Open Data Access (YODA) partnership with Medtronic (Krumholz et al. 2013), the Academic Research Organization Consortium for Continuing Evaluation of Scientific Studies — Cardiovascular (ACCESS CV 2016), the Supporting Open Access to Researchers (SOAR) Initiative (Pencina et al. 2016), and the OptumLabs healthcare industry collaborative research and innovation center.

While we enthusiastically support data sharing, the conceptual framework for it is rooted in individually randomized clinical trials (RCTs) with participants’ explicit informed consent, which can include authorization for data sharing. Pragmatic research embedded in health systems is different from conventional trials: it often involves a waiver of patient consent, uses data from the electronic health record (EHR), and often includes information that could identify patients, health care providers, and health care facilities or organizations. In some cases, the primary data for enrolled patients include every encounter, medication, and procedure. As we describe in the Annals article, even if study data would not allow identification of individual participants, the potential for disclosure of sensitive information regarding providers or health systems may still be substantial. These data have the capacity to do harm if taken out of context, used inappropriately or for comparative purposes, or used to single out an individual, provider, or institution. Healthcare systems participate in embedded research voluntarily and have raised these concerns about releasing information from electronic health records. Their specific concern is that health systems or facilities volunteering to participate in research might be penalized by release of detailed operational information that others are not required to make public. Participation in public-domain research is distinct from health systems’ participation in public quality reporting programs, where measures are standardized and public comparison of providers or facilities is either required or a clear expectation of multiple organizations.

Because of the unique concerns of clinicians and health care systems participating in embedded research, a requirement to share data using mechanisms designed for conventional, individually randomized trials will be challenging and might dissuade some healthcare systems from participating, thereby reducing opportunities to answer important scientific and healthcare questions using data acquired from clinical health care delivery.

Embedding clinical trials in healthcare systems as part of the delivery of care could improve the speed, quality, and cost of research, but the systems are not currently required to participate, and participation often imposes opportunity costs that can distract from operational priorities. Although the health care systems sometimes derive direct benefit from participation in research (e.g., when there is congruence with prioritized quality improvement efforts), their principal motive is typically altruistic: to contribute to the knowledge base about the relative benefits, risks, and burdens of treatments. In this respect, health system participants in pragmatic research are similar to individual participants in conventional clinical trials. For individuals who participate in clinical research, researchers offer guarantees through the informed consent process that sensitive information will not be misused and ensure that individual protected health information is not exposed through trial activities or data sharing. In the same way, pragmatic research needs to consider the specific confidentiality concerns of participating health care providers or systems and to identify appropriate processes and technical structures for data sharing. In this chapter, we use experiences from the NIH Collaboratory, which supports embedded clinical trials that address major national priorities, to explore the most important concerns and potential solutions.





Institute of Medicine. 2015. Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk. Washington, DC: National Academies Press.

Krumholz HM, Ross JS, Gross CP, et al. 2013. A historic moment for open science: the Yale University Open Data Access project and Medtronic. Ann Intern Med. 158:910–911. doi:10.7326/0003-4819-158-12-201306180-00009. PMID:23778908.

Krumholz HM, Terry SF, Waldstreicher J. 2016. Data acquisition, curation, and use for a continuously learning health system. JAMA. 316:1669–1670. doi:10.1001/jama.2016.12537. PMID:27668668.

Pencina MJ, Louzao DM, McCourt BJ, et al. 2016. Supporting open access to clinical trial data for researchers: The Duke Clinical Research Institute–Bristol-Myers Squibb Supporting Open Access to Researchers Initiative. Am Heart J. 172:64–69. doi:10.1016/j.ahj.2015.11.002. PMID:26856217.

Simon GE, Coronado G, DeBar LL, et al. 2017. Data Sharing and Embedded Research. Ann Intern Med. doi:10.7326/M17-0863. PMID:28973353.

Taichman DB, Backus J, Baethge C, et al. 2016. Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. Ann Intern Med. 164:505. doi:10.7326/M15-2928. PMID:26792258.

The Academic Research Organization Consortium for Continuing Evaluation of Scientific Studies — Cardiovascular (ACCESS CV). 2016. Sharing data from cardiovascular clinical trials — a proposal. New Engl J Med. 375:407–409. doi:10.1056/NEJMp1605260. PMID:27518659.

Warren E. 2016. Strengthening research through data sharing. New Engl J Med. 375:401–403. doi:10.1056/NEJMp1607282. PMID:27518656.

Version History

Published August 25, 2017


Simon G, Coronado G, DeBar L, et al. Data Sharing and Embedded Research: Introduction. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory. Available at: Updated July 25, 2019. DOI: 10.28929/068.