NIH Collaboratory
Living Textbook of
Pragmatic Clinical Trials


Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials


Data Sharing and Embedded Research

Section 3

Data Sharing Solutions for Embedded Research

Contributors
Gregory E. Simon, MD, MPH
Gloria Coronado, PhD
Lynn L. DeBar, PhD, MPH
Laura M. Dember, MD
Beverly Green, MD, MPH
Susan S. Huang, MD, MPH
Jeffrey G. Jarvik, MD, MPH
Vincent Mor, PhD
Joakim Ramsberg, PhD
Edward J. Septimus, MD
Miguel A. Vazquez, MD
William M. Vollmer, PhD
Douglas Zatzick, MD
Adrian F. Hernandez, MD, MHS
Richard Platt, MD, MS

Contributing Editor
Karen Staman, MS

The last few years have seen real progress toward greater openness in sharing clinical trial data (Ebrahim et al. 2014). Several methods, with varying degrees of restriction, transparency, and cost, have been deployed (see table below), ranging from public release of datasets to private data enclaves, including distributed research networks. These methods afford health systems different levels of protection but also require different levels of support to implement. We discuss these solutions below, using the NIH Collaboratory Trials as examples.

 

Technical Structures for Data Sharing, From Least Restrictive (and Least Expensive) to Most Restrictive (and Most Expensive)

Public archive
  • Description
    • Analyzable data can be obtained by any user for any use
    • No restriction on the kinds of research questions new users can address
  • Additional elements
    • May impose restrictions such as prohibitions against re-identification or against access to small cell counts
    • May de-identify certain elements, such as study site or demographics, or present sensitive data as an aggregate summary variable
  • Resource needs
    • Initial development and annotation
    • Maintenance and access costs
  • Example: Agency for Healthcare Research and Quality (AHRQ) Healthcare Cost and Utilization Project (HCUP)

Private archive
  • Description
    • Analyzable data can be obtained by authorized users
    • An honest broker or the original owner of the data decides which uses to authorize
  • Additional elements
    • Requires a binding agreement by the recipient regarding protection and use of transferred data
    • As noted for public archive
  • Resource needs
    • As noted for public archive
    • Evaluation of requests
    • Execution of data sharing, data use, data transfer, and other agreements, including agreements covering data with full identifiers
    • Monitoring of compliance with agreements, and response to breaches of agreements
  • Examples: Yale University Open Data Access (YODA) Project; Centers for Medicare & Medicaid Services (CMS) Limited Data Sets; National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository

Public enclave
  • Description
    • Any user may query the data but not take possession of it; only aggregate results may be removed from the enclave
    • No restriction on the kinds of questions users can address
  • Additional elements
    • May impose restrictions such as prohibitions against re-identification, passing the data to other users, or access to small cell counts
    • May de-identify certain elements, such as study site or demographics
  • Resource needs
    • Initial development and annotation
    • Ongoing curation and governance
    • Creation and maintenance of informatics support for analyses, including software licenses, computational capabilities, and file storage
    • Personnel needed to ensure data quality, etc.
  • Example: Centers for Medicare & Medicaid Services (CMS) Virtual Research Data Center (VRDC)

Private enclave
  • Description
    • Similar to public enclave with regard to provisions for analyzing data without taking possession of it
    • An honest broker or the original owner of the data decides which uses to authorize
  • Additional elements
    • Queries or results (or both) are moderated by an honest broker or by representatives of the study and/or site
  • Resource needs
    • As noted for public enclave
    • Additional resources to evaluate requests and supervise the conduct of approved studies
  • Example: Food and Drug Administration (FDA) Sentinel Distributed Database

 

 

Public and Private Data Archives

With a data archive, data are annotated, de-identified as deemed necessary, and stored for later analyses by interested users. A publicly available archive is the least restrictive and least expensive option for sharing data, and a number of Collaboratory trials have used this method (see table below). In most cases, some modification to or restriction of the full analytic dataset was necessary to protect the privacy of health systems or providers. For example, the Suicide Prevention Outreach Trial (SPOT) was designed to compare suicide attempt rates among patients who received one of two suicide prevention strategies versus usual care. The investigators did not plan to include study site (health system) in the publicly available dataset because participating health systems were concerned that such data could be used for inappropriate comparisons of suicide attempt rates across health systems. A naïve analysis of these data could compare rates of suicide attempts across health systems without considering well-established variation by geographic region and race/ethnicity. In such an analysis, a health system making extra efforts to engage higher-risk populations could paradoxically appear to have high suicide attempt rates. To facilitate examination of variation in intervention effects across health systems, datasets that include health system identifiers are available on request through a supervised (private) data archive, subject to specific agreements regarding use and re-disclosure. Because SPOT was randomized at the patient level, omitting study site from the released dataset may lead to misestimation of variance, but the data still have scientific and public health value.

As another example, the Collaboratory’s Pain Program for Active Coping and Training (PPACT) trial was developed to coordinate and integrate services that help patients adopt self-management skills for chronic pain, limit use of opioid medications, and identify factors amenable to treatment in the primary care setting (DeBar et al. 2012). The study was conducted in Kaiser Permanente’s Northwest, Georgia, and Hawaii regions, thereby representing a diversity of patients and health care systems. Because these three regions have different racial and ethnic distributions, releasing demographic information would readily identify the regions and potentially the participating primary care providers (PCPs). Participating health plans were concerned that naïve analyses of region-specific data could be used for inappropriate or invalid comparisons of pain treatment and outcomes across health systems, so their data-sharing plan sought to ensure regional anonymity. Similarly, there were sensitivities about examining individual clinicians’ opioid prescribing patterns, so these data were included only in aggregated form. The PPACT investigators therefore created a public-release data archive that can be shared and that enables others to replicate, or at least closely replicate, the primary analysis. The public-release dataset was expected to include anonymous patient and cluster identifiers but no information on region or clinic facility.
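The kind of modification described for PPACT can be illustrated with a short Python sketch. It is not the PPACT team’s procedure: the `Record` fields, the `make_public_release` function, and the coding scheme are hypothetical. The sketch replaces patient and clinic identifiers with arbitrary codes assigned in shuffled order, so the codes do not reflect the original identifiers, and drops the region and race/ethnicity fields while keeping the cluster structure needed to approximate a clustered primary analysis.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical analytic-dataset fields, for illustration only.
    patient_id: str
    clinic_id: str
    region: str
    race_ethnicity: str
    arm: str
    outcome: float


def _random_codes(ids, prefix, rng):
    """Map each distinct id to an arbitrary code assigned in shuffled order,
    so code numbers carry no information about the original ids."""
    unique = sorted(set(ids))  # sort first so results depend only on the seed
    rng.shuffle(unique)
    return {orig: f"{prefix}{i + 1:03d}" for i, orig in enumerate(unique)}


def make_public_release(records, seed=None):
    """Build public-use rows with anonymous patient and cluster codes;
    region, clinic identity, and race/ethnicity are withheld."""
    rng = random.Random(seed)
    patient_codes = _random_codes((r.patient_id for r in records), "P", rng)
    cluster_codes = _random_codes((r.clinic_id for r in records), "C", rng)
    return [
        {
            "patient": patient_codes[r.patient_id],
            "cluster": cluster_codes[r.clinic_id],
            "arm": r.arm,
            "outcome": r.outcome,
        }
        for r in records
    ]


# Illustrative records only.
demo = [
    Record("pt-17", "clinic-A", "Region 1", "category 1", "intervention", 3.5),
    Record("pt-42", "clinic-A", "Region 1", "category 2", "usual care", 5.0),
    Record("pt-08", "clinic-B", "Region 2", "category 1", "intervention", 4.0),
]
for row in make_public_release(demo, seed=2017):
    print(row)
```

Because every row from the same clinic receives the same cluster code, an analyst can still account for clustering without learning which region or facility a cluster represents.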

The TiME and ICD-Pieces trials both used the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository, a private archive, to share data. Using this archive transfers the administrative, financial, and oversight responsibilities to NIDDK, substantially decreasing the burden on investigators. Wider availability of such repositories will help future investigators share data more effectively and efficiently.

Among national trauma care systems, there are incentives that support data sharing in multisite pragmatic trials. For example, the TSOS trial maintains a private archive and shares data with researchers whose aim is to influence future policy or clinical care in trauma centers nationwide. Although the data are housed in an archive, the team will consider an enclave approach (analyzing the data themselves and returning results), depending on the research question and any ethical obligations not to reveal or receive sensitive data. Perhaps most importantly, the TSOS team was incentivized to share data and publish with teams at other U.S. trauma sites as part of the larger study goal of disseminating knowledge to inform American College of Surgeons practice guidelines for PTSD and comorbidity screening and intervention (Zatzick et al. 2016).

Public and Private Data Enclaves

A data enclave allows investigators to perform analyses without taking possession of the data. In a public enclave, any user may conduct research on any topic; in a private enclave, an honest broker or the original owners of the data determine appropriate uses, and the enclave may establish its own rules regarding users and uses of its data. The NIH Collaboratory’s ABATE trial used this type of private enclave for all primary and secondary analyses of trial data. All analyses were conducted behind Hospital Corporation of America’s firewall using a supervised data enclave model to prevent misuse of data for comparative purposes. This model requires a data use agreement (DUA), and all data are de-identified. With approval, other investigators could reproduce the results if needed. This approach avoids any transfer of the data, but it is costly and remains in place only for a finite period of time.

A distributed research network is a variation of a private data enclave and has been an important factor in obtaining the voluntary participation of health care organizations in public health activities and research for the public good. It allows organizations to maintain physical and operational control over both their patients’ confidential data and their own. They can thus opt in to a wide array of societally beneficial programs without concern that their data will be put at risk of other uses. The National Patient-Centered Clinical Research Network (PCORnet) and FDA’s Sentinel System are examples of networks in which each participating site keeps its data behind its own firewall but can make them available (opt in) through a distributed research network for approved queries.
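The division of labor in a distributed research network can be sketched in a few lines of Python. This is a simplified illustration, not the design of PCORnet or Sentinel: the `Site` class, the `distributed_count` function, the record layout, and the small-cell threshold `MIN_CELL = 11` are all hypothetical. Each site runs an approved query behind its own firewall, may decline to participate, and returns only an aggregate count, withholding small nonzero counts; the coordinating center sees nothing else.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical minimum cell size; real networks and data owners set their own rules.
MIN_CELL = 11

Query = Callable[[dict], bool]


@dataclass
class Site:
    name: str
    records: List[dict]  # patient-level data; stays behind this site's firewall
    opted_in: bool = True

    def run(self, query: Query) -> Tuple[str, Optional[int]]:
        """Execute an approved query locally and return only an aggregate.

        Returns ("declined", None) if the site opts out, ("suppressed", None)
        if the count is positive but below MIN_CELL, else ("ok", count).
        """
        if not self.opted_in:
            return ("declined", None)
        count = sum(1 for rec in self.records if query(rec))
        if 0 < count < MIN_CELL:
            return ("suppressed", None)
        return ("ok", count)


def distributed_count(sites: List[Site], query: Query) -> Tuple[int, Dict[str, str]]:
    """Coordinator: combine site-level aggregates; never sees individual records."""
    results = {site.name: site.run(query) for site in sites}
    total = sum(count for status, count in results.values() if status == "ok")
    statuses = {name: status for name, (status, _) in results.items()}
    return total, statuses


# Illustrative data only.
sites = [
    Site("Site A", [{"age": a} for a in range(20, 80)]),
    Site("Site B", [{"age": a} for a in range(60, 66)]),
    Site("Site C", [{"age": a} for a in range(30, 90)], opted_in=False),
]
total, statuses = distributed_count(sites, lambda rec: rec["age"] >= 65)
print(total, statuses)
```

Here Site A’s count of 15 is released, Site B’s single matching record is suppressed, and Site C declines, so the coordinator reports a total of 15. Per-site suppression alone does not prevent every disclosure (for example, by differencing overlapping queries), which is one reason private enclaves and networks also review queries before they run.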

Limitations of These Solutions

All of these data sharing mechanisms have drawbacks. Greater control inevitably involves greater expense because of the added leadership, legal, statistical, and information technology resources required. Further, when sensitive health system characteristics are important potential confounders, the least restrictive and least expensive methods are often also the least useful, because data that can be shared without restriction will lack information needed to replicate the primary analysis or to address some additional questions. The most restrictive option, an enclave controlled by the original data owner, does not guarantee access. All of these solutions incur meaningful costs for annotating the data, and all but public archives require ongoing support for oversight. Enclaves also incur substantial ongoing costs for maintaining a computing environment that can support analyses.

In a JAMA Viewpoint article, NIH Collaboratory investigator Dr. Richard Platt and colleague Dr. Tracy Lieu discuss the value of data enclaves in facilitating information sharing to support research, quality improvement, and public health reporting (Platt and Lieu 2018).

“Data enclaves address 2 major barriers to data sharing. First, they allow health systems to protect patients’ interests and their own by maintaining physical and operational control, permitting the systems to opt in or out of proposed analyses. Second, they obviate the need to build new secure systems.” (Platt and Lieu 2018)

 

Collaboratory Data Sharing Plans (Assumes HIPAA-Compliant De-identification for All Patients and a Data Use Agreement Where Appropriate)

PPACT: Pain Program for Active Coping and Training
  • Risks to providers or health systems: Data on opioid prescribing patterns could be misused for inappropriate comparisons of providers or facilities.
  • Data sharing structure: Public archive of a modified dataset
  • Steps to mitigate risks: The public-use dataset does not include facility or health system identifiers, characteristics or prescribing/referral practices of individual providers, or patient-level data on race or ethnicity.

STOP CRC: Strategies and Opportunities to Stop Colon Cancer in Priority Populations
  • Risks to providers or health systems: Data on screening rates could be misused for inappropriate or biased comparisons of performance across clinics or for inaccurate comparisons with public quality measures.
  • Data sharing structure: Private archive managed by the study team
  • Steps to mitigate risks: De-identified patient-level data are available, with permissions and data use agreements in place. Data use agreements are limited to specific research uses and require destruction of the data after authorized analyses are completed.

SPOT: Suicide Prevention Outreach Trial
  • Risks to providers or health systems: Data on suicide attempt rates could be used for biased or inappropriate comparisons of suicide attempts or suicide mortality across health systems.
  • Data sharing structure: Public archive of a modified dataset
  • Steps to mitigate risks: The public-use dataset does not include an indicator for health system.

TiME: Time to Reduce Mortality in End-Stage Renal Disease
  • Risks to providers or health systems: Data regarding mortality could be misused for inappropriate or biased comparisons of facilities or health care systems. Detailed data regarding patterns of care could reveal proprietary business information.
  • Data sharing structure: Private archive managed by NIDDK
  • Steps to mitigate risks: De-identified patient-level data were aggregated across provider organizations and stored at the NIDDK Central Repository. Facility identifiers, dialysis provider organization identifiers, and data elements unique to one of the dialysis providers were removed. Data are available through a formal request and a data use agreement between the requester and NIDDK.

PROVEN: Pragmatic Trial of Video Education in Nursing Homes
  • Risks to providers or health systems: Data regarding mortality could be misused for inappropriate or biased comparisons of participating facilities or systems. Data regarding admissions and discharges could reveal proprietary business information.
  • Data sharing structure: Public archive of an aggregate-level dataset
  • Steps to mitigate risks: The public-use dataset includes facility-level aggregate data, with restrictions to prevent re-identification of participating facilities.

LIRE: Lumbar Image Reporting with Epidemiology
  • Risks to providers or health systems: Data regarding treatment patterns and resource use could be used for inappropriate or biased comparisons across health systems and could reveal proprietary health system business information.
  • Data sharing structure: Private archive managed by the study team
  • Steps to mitigate risks: Patient-level datasets were de-identified with respect to health systems, clinics, providers, and patients. Investigators authorize release to specific users for specific purposes.

ABATE: Active Bathing to Eliminate Infection
  • Risks to providers or health systems: Data regarding infection rates could be used for inappropriate comparisons of facilities or with public reports. Detailed information regarding facilities and utilization patterns could reveal proprietary business information.
  • Data sharing structure: Private enclave managed by the study team
  • Steps to mitigate risks: Potential users may propose specific queries. Only query results (not individual-level data) will be shared.

ICD-Pieces: Improving Chronic Disease Management with Pieces
  • Risks to providers or health systems: Data regarding patterns of care could be used for biased or inappropriate comparisons across facilities or health systems. Given different specifications, comparison with publicly reported quality measures would be misleading.
  • Data sharing structure: Private archive managed by NIDDK
  • Steps to mitigate risks: Patient-level data were de-identified and stored in an aggregate database. Identifiers for health care system, primary practice, and patients were removed. Use of the aggregate dataset is governed by authorized agreements with NIDDK.

TSOS: Trauma Survivors Outcomes and Support
  • Risks to providers or health systems: Data regarding baseline patient characteristics and study outcomes could be used for biased or inappropriate comparisons of care in participating facilities.
  • Data sharing structure: Private archive managed by the study team
  • Steps to mitigate risks: De-identified patient-level data are provided, with priority given to Collaboratory investigators and to research that affects trauma care systems nationwide.

Chapter Sections

  1. Introduction
  2. Data Sharing Concerns
  3. Data Sharing Solutions for Embedded Research
  4. Patient Perspectives on Data Sharing
  5. Data-sharing Policy at the NIH, Collaboratory, and HEAL
  6. Incentive Structure and Citations for Data Sets
  7. Preparing for Data Sharing
  8. Moving Forward
  9. Additional Resources
  10. FAQ

Resources


Introduction to the NIMH Data Archive
Presentation from the BackInAction trial with an overview of the basics and the steps involved in using the NIMH Data Archive.

REFERENCES

DeBar LL, Kindler L, Keefe FJ, et al. 2012. A primary care-based interdisciplinary team approach to the treatment of chronic pain utilizing a pragmatic clinical trials framework. Transl Behav Med. 2:523–530. doi:10.1007/s13142-012-0163-2. PMID:23440672.

Ebrahim S, Sohani ZN, Montoya L, et al. 2014. Reanalyses of randomized clinical trial data. JAMA. 312:1024–1032. doi:10.1001/jama.2014.9646. PMID:25203082.

Platt R, Lieu T. 2018. Data enclaves for sharing information derived from clinical and administrative data. JAMA. 320:753. doi:10.1001/jama.2018.9342. PMID:30083726.

Zatzick DF, Russo J, Darnell D, et al. 2016. An effectiveness-implementation hybrid trial study protocol targeting posttraumatic stress disorder and comorbidity. Implement Sci. 11:58. doi:10.1186/s13012-016-0424-4. PMID:27130272.


Version History

September 25, 2025: Added resources box with NIMH Data Archive presentation (changes made by G. Uhlenbrauck).

March 9: Updated to make descriptions of the trials past tense (changes made by K. Staman).

December 5, 2018: Updated and added reference as part of annual review (change made by K. Staman).

Published August 25, 2017


Citation:

Simon G, Coronado G, DeBar L, et al. Data Sharing and Embedded Research: Data Sharing Solutions for Embedded Research. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/dissemination/data-share-top/data-sharing-solutions-for-embedded-research/. Updated December 3, 2025. DOI: 10.28929/070.


Reference in this Web site to any specific commercial products, process, service, manufacturer, or company does not constitute its endorsement or recommendation by the U.S. Government or National Institutes of Health (NIH). NIH is not responsible for the contents of any “off-site” Web page referenced from this server.
