Data Sharing and Embedded Research
Section 7
Preparing for Data Sharing
Investigators need to consider and prepare for data sharing throughout the ePCT lifecycle, from writing the grant through publication of results and sharing of data sets. Each of these stages can take time and resources (Li and Rockhold 2019).
- Grant submission: The Draft NIH Policy for Data Management and Sharing and its supplemental draft guidance propose that applicants for research funding submit a plan describing how scientific data will be managed and shared.
- Trial registration: After funding has been awarded, investigators will be asked to provide a data sharing statement on ClinicalTrials.gov as part of trial registration (Taichman et al. 2016).
- Conduct: Curating, cleaning, and preparing data for sharing continuously during data collection, rather than only after the trial ends, can substantially improve the evidence generated from ePCTs and help speed dissemination.
- Dissemination: Investigators will be asked by medical journals to provide a data sharing statement as a condition of publication (Taichman et al. 2017).
- Sharing of data: After trial completion, data will need to be shared in a repository that uses a mechanism to promote re-use and proper citation of the data (Pierce et al. 2019).
To help investigators think through the considerations for their data sharing plans and statements, NIH Collaboratory Trials are given a Data and Resource Sharing Informational Document and an Onboarding Data and Resource Sharing Questionnaire during the onboarding process. Upon closeout, NIH Collaboratory Trials are provided a Closeout Data and Resource Sharing Checklist and are expected to use this checklist to prepare a final data sharing package, which is shared on the Living Textbook Resources page.
Organizations such as Vivli can support the data sharing process at every stage and can also make data available to those who request it. For more information, see the NIH Collaboratory Grand Rounds presentation Preparing for Clinical Trial Data Sharing and Re-use: The New Reality for Researchers (Li and Rockhold 2019).
When preparing for data sharing, investigators should understand the unique aspects of sharing data from embedded research that uses healthcare system data. NIH Collaboratory leadership and NIH Collaboratory Trials principal investigators, along with their colleagues, highlighted these considerations in their response to the Draft NIH Policy for Data Management and Sharing.
The main topics covered in the response are:
- Assessing and mitigating re-identification risk: Embedded pragmatic research occurs in a different context than traditional research. It uses routinely collected data from electronic health records and claims databases, and may involve detailed data on large populations, often including hundreds of thousands of patients. In many cases, these studies are conducted with waiver of informed consent. Before sharing data, investigators may need to do more than simply remove or alter explicit identifiers; they may also need to remove or alter data elements that could enable re-identification through data linkage.
- Protecting secondary subjects: Embedded pragmatic trials require additional considerations to protect the privacy and confidentiality of everyone involved, including not only trial participants but also participants' friends and family members, providers, healthcare systems, and members of vulnerable populations.
- Use of data enclaves: Health systems are often voluntary participants in embedded research, with the goal of answering specific questions. They may not be willing to bear the risk of having sensitive organizational information used to address unrelated topics. Their providers often cannot opt out of embedded research in which their delivery system participates. The potential for disclosure of sensitive information about providers or health systems could be substantial, with commensurate harm. Data archives and enclaves are acceptable, routinely used data sharing mechanisms that can help mitigate these risks. The Centers for Medicare and Medicaid Services Virtual Research Data Center is an example of a research enclave: it permits investigators to conduct research on approved topics by working with the data inside the enclave, and only aggregated data can be removed from it. This approach has provided a good balance between data access and protection of patients’ privacy.
- Credit those who share data: As stated in Credit Data Generators for Data Reuse, we need to develop and mandate the use of a data set ID that links any use of a data set, and any analysis published from it, back to the original researchers (Pierce et al. 2019).
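The re-identification concern above is about data elements that are not explicit identifiers but could still be linked to outside sources. One common way to gauge that risk, offered here only as an illustration and not as a method prescribed by the response letter, is a k-anonymity check: every combination of values for a chosen set of quasi-identifiers should be shared by at least k records. The minimal sketch below assumes hypothetical field names (age_band, zip3, sex) and an arbitrary threshold k; real quasi-identifier lists and thresholds would come from a study's own disclosure risk review.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical quasi-identifiers (assumption for illustration): fields that are
# not explicit identifiers but could be linked to outside data sources.
QUASI_IDENTIFIERS = ("age_band", "zip3", "sex")


def equivalence_class_sizes(
    records: Iterable[Mapping[str, str]],
    quasi_identifiers: Sequence[str] = QUASI_IDENTIFIERS,
) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risky_combinations(
    records: Iterable[Mapping[str, str]],
    k: int = 5,
    quasi_identifiers: Sequence[str] = QUASI_IDENTIFIERS,
) -> dict:
    """Return quasi-identifier combinations shared by fewer than k records.

    A non-empty result means the data set is not k-anonymous for these fields,
    so those values may need to be generalized or suppressed before sharing.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


if __name__ == "__main__":
    sample = [
        {"age_band": "40-49", "zip3": "277", "sex": "F"},
        {"age_band": "40-49", "zip3": "277", "sex": "F"},
        {"age_band": "80+", "zip3": "995", "sex": "M"},
    ]
    print(risky_combinations(sample, k=2))  # {('80+', '995', 'M'): 1}
```

In this toy example the single 80+ male record in one three-digit ZIP area is unique, so it would be flagged for generalization (for example, a wider age band) or suppression. A k-anonymity check addresses linkage on the listed fields only; it does not by itself protect secondary subjects or sensitive organizational information.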
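In an enclave where only aggregated data can leave, releases are typically screened so that small counts, which could point to individual patients or providers, are not exported. The minimal sketch below illustrates small-cell suppression under an assumed minimum cell size of 11; the threshold, the grouping field (arm), and the function names are hypothetical, and each enclave's own data use agreement governs the actual rules.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional

# Assumed minimum cell size (illustrative only); real thresholds are set by
# each enclave's data use agreement and may differ.
MIN_CELL_SIZE = 11


def aggregate_counts(records: Iterable[Mapping[str, str]], group_by: str) -> Counter:
    """Count records per value of one grouping field (e.g., study arm)."""
    return Counter(r[group_by] for r in records)


def suppress_small_cells(
    counts: Mapping[str, int], min_cell: int = MIN_CELL_SIZE
) -> Dict[str, Optional[int]]:
    """Replace nonzero counts below min_cell with None before export.

    Zero counts are kept as-is here; whether zeros must also be masked
    depends on the enclave's rules.
    """
    return {
        group: (None if 0 < n < min_cell else n)
        for group, n in counts.items()
    }


if __name__ == "__main__":
    records = [{"arm": "intervention"}] * 40 + [{"arm": "usual_care"}] * 7
    counts = aggregate_counts(records, "arm")
    print(suppress_small_cells(counts))  # {'intervention': 40, 'usual_care': None}
```

Suppressing a cell is not sufficient on its own if a total is released alongside it, because the hidden value can be recovered by subtraction; in practice, complementary suppression or coarser groupings may also be needed.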
Other signatories include participants in the National Academy of Medicine’s Clinical Effectiveness Research Innovation Collaborative of the Leadership Consortium for Value and Science-Driven Health Care, and leaders of the Health Care Systems Research Network.
The full letter is available for download and includes the list of signatories.
REFERENCES
Li R, Rockhold F. 2019. Preparing for Clinical Trial Data Sharing and Re-use: The New Reality for Researchers. NIH Collaboratory Grand Rounds. https://rethinkingclinicaltrials.org/news/september-27-2019-preparing-for-clinical-trial-data-sharing-and-re-use-the-new-reality-for-researchers-rebecca-li-phd-frank-rockhold-phd/.
Pierce HH, Dev A, Statham E, Bierer BE. 2019. Credit data generators for data reuse. Nature. 570(7759):30–32. doi:10.1038/d41586-019-01715-4. PMID: 31164773.
Taichman DB, Backus J, Baethge C, et al. 2016. Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. Ann Intern Med. 164(7):505. doi:10.7326/M15-2928. PMID: 26792258.
Taichman DB, Sahni P, Pinborg A, et al. 2017. Data sharing statements for clinical trials: a requirement of the International Committee of Medical Journal Editors. Lancet. doi:10.1016/S0140-6736(17)31282-5. PMID: 28596041.