Data Sharing and Embedded Research
Section 6
Incentive Structure and Citations for Data Sets
Increased data sharing is expected to bolster scientific advancement and research integrity; however, the incentive structure for academic researchers rewards publication in scholarly journals, not the creation of data sets that others can share and reuse to generate new knowledge. Some have suggested changing the incentive structure to recognize that generating data that others use for secondary research is a valuable scientific contribution (Pierce et al. 2019; Popkin 2019). We note that investigators may need to devote considerable effort to annotating data sets and analytic programs so that publicly available data sets are easy for others to use. Providing financial resources to support this effort can address part of this need. However, true success will require shifting the paradigm from simply requiring data sharing to creating incentives that make investigators want their data sets to gain wider use.
One way to do this is for universities to revise their appointment, promotion, and tenure (APT) processes to incorporate effective data sharing into decision-making and to recognize creators of data sets that gain meaningful use by others (Hernandez 2019). To accomplish this, however, academic researchers need a well-defined system that links them to their data sets, so that data sets can be cited and their creators credited (Pierce et al. 2019). In a recent article, “Credit data generators for data reuse,” Pierce et al. describe a mechanism that links a data set’s persistent identifier to its creator’s ORCID iD and to the digital object identifier (DOI) of the published article, ensuring appropriate credit in a “virtuous cycle.”
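The linkage Pierce et al. describe depends on identifiers that can be checked mechanically. As a minimal sketch, the Python below pairs a data set DOI with its creator’s ORCID iD and the DOI of the describing article, and rejects ORCID iDs whose final check character is wrong (ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm). The `CreditLink` record is hypothetical and not part of any ORCID or DOI service, and the DOIs shown are placeholders.

```python
from dataclasses import dataclass

DIGITS = set("0123456789")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if a hyphenated ORCID iD such as 0000-0002-1825-0097 has a correct check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not set(compact[:15]) <= DIGITS:
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


@dataclass(frozen=True)
class CreditLink:
    """Hypothetical record linking a data set to its creator and the article describing it."""
    dataset_doi: str
    creator_orcid: str
    article_doi: str

    def __post_init__(self):
        # Refuse to record credit against a mistyped ORCID iD.
        if not is_valid_orcid(self.creator_orcid):
            raise ValueError(f"invalid ORCID iD: {self.creator_orcid}")


link = CreditLink(
    dataset_doi="10.1234/example-dataset",  # placeholder DOI
    creator_orcid="0000-0002-1825-0097",  # ORCID's published example iD
    article_doi="10.1234/example-article",  # placeholder DOI
)
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

Validating at the point of linking is a design choice: a single mistyped digit would otherwise silently send credit to the wrong person, or to no one.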
Figure from Pierce et al. Nature 2019. Used with permission.
The infrastructure for sharing data should ensure that data are cited properly. Data management strategies that make data sets “FAIR” (findable, accessible, interoperable, and reusable) (Wilkinson et al. 2016) have been endorsed by the US National Academies of Sciences, Engineering, and Medicine and by the European Commission.
“If a system linked data sets to individuals and reliably tracked the subsequent uses of those data, would institutions incorporate these metrics into the promotion process?
“The answer is an unambiguous ‘yes’,” says Antony Rosen, vice-dean for research at Johns Hopkins School of Medicine in Baltimore, Maryland. “Having an objective method to assess the uses of data would give faculty additional ways to communicate the contributions of their work.”—from Pierce et al. 2019
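If each reuse of a data set were recorded, turning those records into a per-investigator metric of the kind Rosen describes would be simple arithmetic. The sketch below is hypothetical: it assumes reuses are logged as (data set DOI, citing article DOI) pairs and that a separate mapping lists the ORCID iDs of each data set’s creators; no existing tracking service is implied. It counts distinct citing articles per creator, so an article that reuses several of one person’s data sets counts once for that person.

```python
from collections import defaultdict


def reuse_counts(creators_by_dataset, reuses):
    """Count distinct citing articles per creator ORCID iD.

    creators_by_dataset: dict mapping a data set DOI to its creators' ORCID iDs.
    reuses: iterable of (dataset_doi, citing_article_doi) pairs.
    """
    # DOIs are case-insensitive, so compare them in lower case.
    creators = {doi.lower(): orcids for doi, orcids in creators_by_dataset.items()}
    citing = defaultdict(set)  # ORCID iD -> DOIs of articles that reused that person's data
    for dataset_doi, article_doi in reuses:
        for orcid in creators.get(dataset_doi.lower(), []):
            citing[orcid].add(article_doi.lower())
    return {orcid: len(articles) for orcid, articles in citing.items()}


# Placeholder DOIs and example ORCID iDs, for illustration only.
dataset_creators = {
    "10.1234/ds-a": ["0000-0002-1825-0097"],
    "10.1234/ds-b": ["0000-0002-1825-0097", "0000-0002-1694-233X"],
}
reuses = [
    ("10.1234/ds-a", "10.1234/paper-1"),
    ("10.1234/ds-b", "10.1234/paper-1"),  # same article, second data set: counted once per creator
    ("10.1234/ds-b", "10.1234/paper-2"),
    ("10.1234/DS-A", "10.1234/paper-3"),  # same DOI as ds-a, different case
]
print(reuse_counts(dataset_creators, reuses))
# {'0000-0002-1825-0097': 3, '0000-0002-1694-233X': 2}
```

Counting distinct articles rather than raw citation events is one possible choice; an institution might instead weight reuses or count per data set.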
How to attach a DOI to a data set:
Digital object identifiers (DOIs) are unique, persistent identifiers that can be attached to data sets or other objects. These persistent identifiers can be cited in order to give credit for the creation of the data set.
A DOI is essentially a permanent name for an entity (or object) on a digital network; it does not change even when the entity’s location (URL) or other characteristics change.
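Because the DOI, not the URL, is the stable name, a citation can store the DOI and build a link through the public resolver at https://doi.org/ whenever one is needed. A minimal Python sketch, assuming only that a DOI consists of a prefix beginning with “10.” and a suffix after the first slash: the regular expression is a deliberately loose shape check (an assumption for illustration, not the full DOI syntax), and the function accepts the common written forms “doi:…” and a doi.org URL.

```python
import re
from urllib.parse import quote

# Loose shape check: "10." + numeric registrant code, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def doi_to_url(doi: str) -> str:
    """Return a doi.org resolver URL for a DOI written as bare, 'doi:'-prefixed, or URL form."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, keeping the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.1038/d41586-019-01715-4"))
# https://doi.org/10.1038/d41586-019-01715-4
```

The example DOI is the Pierce et al. article from this section’s reference list; if that article moved to a new web address, the resolver link built from its DOI would still work.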
To assign a DOI to a clinical data set (or other object), an individual should:
1. Deposit the data set in an appropriate data repository, which can include public or private enclaves or archives, as described in the section Data Sharing Solutions for Embedded Research. The journal Scientific Data also provides a list of public repositories for clinical data.
2. Obtain a URL for the data set from the data repository, and assemble the data set’s metadata.
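3. Contact a registration agency or DOI-issuing service appropriate for the domain of the data to be shared. For clinical data sets, options include the registration agency CrossRef and repositories such as Figshare, Zenodo, and Dryad, which assign DOIs to the data sets deposited with them, among others.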
For example, the registration agency used to create DOIs for the Living Textbook chapters is CrossRef, an agency dedicated to the scholarly communication of research outputs.
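Step 2 in the list above, assembling the metadata, can be made concrete with a small pre-submission check. The sketch below is a hypothetical example: the field names in `REQUIRED_FIELDS` are assumptions chosen for illustration, since each registration agency defines its own metadata schema, and the draft record uses placeholder values. It reports missing fields and confirms that the repository URL the DOI will point to is a web address.

```python
from urllib.parse import urlparse

# Hypothetical minimum fields for a DOI request; each registration agency
# defines its own metadata schema, so adjust these to the agency you use.
REQUIRED_FIELDS = ("title", "creators", "publisher", "publication_year", "resource_type", "url")


def metadata_problems(record: dict) -> list:
    """List problems that would block a DOI request for this draft record."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    url = record.get("url")
    # The DOI will resolve to this landing page, so it must be an http(s) URL.
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append("url must be an http(s) landing page")
    return problems


draft = {
    "title": "Example embedded pragmatic trial data set",  # placeholder values throughout
    "creators": ["0000-0002-1825-0097"],
    "publisher": "Example Data Repository",
    "publication_year": 2020,
    "resource_type": "Dataset",
    "url": "https://repository.example.org/datasets/123",
}
print(metadata_problems(draft))  # []
print(metadata_problems({**draft, "url": "ftp://repository.example.org/123", "publisher": ""}))
# ['missing publisher', 'url must be an http(s) landing page']
```

Running a check like this before contacting the agency catches incomplete records early, when they are cheapest to fix.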
REFERENCES
Hernandez AF. 2019. Open Science: Are we there yet? NIH Collaboratory Grand Rounds. [accessed 2020 Feb 12]. https://rethinkingclinicaltrials.org/news/august-9-2019-open-science-are-we-there-yet-adrian-hernandez-md/.
Pierce HH, Dev A, Statham E, Bierer BE. 2019. Credit data generators for data reuse. Nature. 570(7759):30–32. doi:10.1038/d41586-019-01715-4. PMID: 31164773.
Popkin G. 2019. Data sharing and how it can benefit your scientific career. Nature. 569(7756):445–447. doi:10.1038/d41586-019-01506-x. PMID: 31081499.
Wilkinson MD, Dumontier M, Aalbersberg IJJ, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 3(1):160018. doi:10.1038/sdata.2016.18. PMID: 26978244.