June 7, 2018: NIH Releases First Strategic Plan for Data Science

On June 4, the National Institutes of Health (NIH) released its first Strategic Plan for Data Science. The plan outlines steps the agency will take to modernize research data infrastructure and resources and to maximize the value of data generated by NIH-supported research.

Data science challenges for NIH have evolved and grown rapidly since the launch of the Big Data to Knowledge (BD2K) program in 2014. The most pressing challenges include the growing costs of data management, limited interconnectivity and interoperability among data resources, and a lack of generalizable tools to transform, analyze, and otherwise support the usability of data for researchers, institutions, industry, and the public.

The goals of the NIH Strategic Plan for Data Science are to:

  • support an efficient, effective data infrastructure by optimizing data storage, security, and interoperability;
  • modernize data resources by improving data repositories, supporting storage and sharing of individual data sets, and integrating clinical and observational data;
  • develop and disseminate both generalizable and specialized tools for data management, analytics, and visualization;
  • enhance workforce development for data science by expanding NIH’s internal data science workforce and supporting expansion of the national research workforce, and by engaging a broader community of experts and the general public in developing best practices; and
  • enact policies that promote stewardship and sustainability of data science resources.

As part of the implementation of the strategic plan, the NIH will hire a chief data strategist. For information about the position, see the job announcement.

March 21, 2018: Dr. Rob Califf to Speak on Data Science at March 23 Grand Rounds

Robert Califf, MD, former FDA Commissioner and current Vice Chancellor for Health Data Science at Duke University School of Medicine, will present at NIH Collaboratory Grand Rounds on Friday, March 23 at 1 pm ET. The webinar will be broadcast live and is open to the public. Following the presentation, Dr. Califf will answer questions from the Grand Rounds audience.

As Director of Duke Forge, Duke’s interdisciplinary center for actionable health data science, Dr. Califf is currently working on initiatives designed to harness biostatistics, machine learning, and sophisticated informatics approaches to improve health and healthcare. Dr. Califf is also an adjunct professor of medicine at Stanford University and is employed by Verily Life Sciences as a scientific advisor. Verily, part of the Alphabet (Google) family of companies, aims to transform the growth of health-related data into practical applications.

Dr. Califf has been a pioneer in the fields of clinical, translational, and outcomes research, and the NIH Collaboratory looks forward to hearing his thoughts on the pragmatic applications of data that will advance health and health care strategies and practice.

Topic: Data Science in the Era of Data Ubiquity

Date: Friday, March 23, 2018, 1:00-2:00 p.m. ET

Meeting Info: To check whether you have the appropriate players installed for UCF (Universal Communications Format) rich media files, go to https://dukemed.webex.com/dukemed/systemdiagnosis.php.

To join the online meeting:
Go to https://dukemed.webex.com/dukemed/j.php?MTID=m1a4a0665a615ae0382440edecedbdd33

November 17, 2017: New Video in Living Textbook Explores Data Sharing and Embedded Research

As part of an article published in Annals of Internal Medicine, Dr. Greg Simon created a short video in which he describes concerns related to data sharing and embedded research, as well as potential solutions for those concerns. We recently added this video to the Living Textbook chapter on Data Sharing and Embedded Research. In the chapter, the authors expand on the ideas presented in the Annals article and frame them using lessons learned from the Collaboratory’s Demonstration Projects. Data collected as part of research embedded in a health system come from a fundamentally different context than data from stand-alone explanatory trials. When taken out of context or used for comparisons, such data have the potential to do harm, which can discourage health systems from volunteering to participate in embedded research. The authors suggest that data sharing plans for embedded research be developed in partnership with health system leaders in ways that maximize the amount of data that can be shared while protecting patient privacy and healthcare system interests.

“Ultimately, it’s a practical question: if we want healthcare providers and healthcare systems to participate in research, we shouldn’t expect them to bear extra risk. In an ideal world, all information about the quality of health care and healthcare outcomes across the country would be completely open to everyone, but we don’t live in that world now. So if we are asking healthcare providers and healthcare systems to open up and be more transparent by participating in research, we certainly would not want to punish those who volunteer.” — Simon et al. in video for Ann Intern Med


Simon G, Coronado G, DeBar L, et al. Data Sharing and Embedded Research: Introduction. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory. Available at: http://rethinkingclinicaltrials.org/data-share-top/data-sharing-and-embedded-research-introduction/. Updated November 13, 2017.

October 3, 2017: New Collaboratory Article Explores Data Sharing and Embedded Research

In an article published in Annals of Internal Medicine, authors from the NIH Collaboratory describe concerns and solutions regarding data sharing and embedded research. Pragmatic research embedded in health systems uses data from the electronic health record and comes from a fundamentally different context than explanatory trials, which collect research-specific data. Data from embedded research have the potential to do harm if taken out of context or used for comparisons. Therefore, while the authors enthusiastically support data sharing, they also recognize that mandating data sharing may discourage health systems from volunteering to participate in embedded research.

“In an ideal world of transparency regarding healthcare processes and outcomes, health systems would have no expectation of or need for privacy regarding quality of health care delivery.  But the current world is not perfect, and unintentional disclosures from participation in embedded research could be far greater than that required for public quality measures. Health systems volunteering to participate in research to improve public health may not be willing to bear the additional risk of misuse of sensitive information.” — Simon et al. Ann Intern Med

The authors use examples from the NIH Collaboratory Demonstration Projects to illustrate potential solutions, and emphasize that data sharing plans for embedded research should be developed in partnership with health system leaders in ways that maximize the amount of data that can be shared while protecting patient privacy and healthcare system interests.

Journal Editors Propose New Requirements for Data Sharing

On January 20, 2016, the International Committee of Medical Journal Editors (ICMJE) published an editorial in 14 major medical journals in which they propose that, as a condition of publication in one of their member journals, clinical researchers must agree to share the deidentified data set used to generate results (including tables, figures, and appendices or supplementary material) no later than six months after publication. By changing the requirements for manuscripts they will consider for publication, they aim to ensure reproducibility (independent confirmation of results), foster data sharing, and enhance transparency. To meet the new requirements, authors will need to include a plan for data sharing as a component of clinical trial registration that includes where the data will be stored and a mechanism for sharing the data.

Evolving Standards for Data Reporting and Sharing

As early as 2003, the National Institutes of Health published a data sharing policy for research funded through the agency, stipulating that “Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data.” Under this policy, federally funded studies receiving over $500,000 per year were required to have a data sharing plan describing how data would be shared, to make shared data available in a usable form for an extended period of time, and to use the least restrictive method for sharing research data.

In 2007, Congress enacted the Food and Drug Administration Amendments Act. Section 801 of the Act requires study sponsors to report certain kinds of clinical trial data within a specified interval to the ClinicalTrials.gov registry, where it is made available to the public. Importantly, this requirement applied to any study classified as an “applicable clinical trial” (typically, an interventional clinical trial), regardless of whether it was conducted with NIH or other federal funding or supported by industry or academic funding. However, recent academic and journalistic investigations have demonstrated that overall compliance with FDAAA requirements is relatively poor.

In 2015, the Institute of Medicine (now the National Academy of Medicine) published a report that advocates for responsible sharing of clinical trial data to strengthen the evidence base, allow for replication of findings, and enable additional analyses. In addition, these efforts are being complemented by ongoing initiatives aimed at widening access to clinical trial data and improving results reporting, including the Yale University Open Data Access project (YODA), the joint Duke Clinical Research Institute/Bristol-Myers Squibb Supporting Open Access to clinical trials data for Researchers initiative (SOAR), and the international AllTrials project.

Responses to the Draft ICMJE Policy

The ICMJE recommendations appear amid a growing focus on issues relating to the integrity of clinical research, including reproducibility of results, transparent and timely reporting of trial results, and widespread data sharing. The release of the draft policy is amplifying ongoing national and international conversations on social media and in prominent journals. Although many researchers and patient advocates have hailed the policy as timely and needed, others have expressed concerns, including questions about implementation and possible unforeseen consequences.

The ICMJE is welcoming feedback from the public regarding the draft policy at www.icmje.org and will continue to collect comments through April 18, 2016.


Journal editors publish editorial in 14 major medical journals stipulating that clinical researchers must agree to share a deidentified data set: Sharing clinical trial data: A proposal from the International Committee of Medical Journal Editors (Annals of Internal Medicine version). January 20, 2016.

A New England Journal of Medicine editorial in which deputy editor Dan Longo and editor-in-chief Jeffrey Drazen discuss details of the ICMJE proposal: Data sharing. January 21, 2016.

A follow-up editorial in the New England Journal of Medicine by Jeffrey Drazen: Data sharing and the Journal. January 25, 2016.

Editorial in the British Medical Journal: Researchers must share data to ensure publication in top journals. January 22, 2016.

Commentary in Nature from Stephan Lewandowsky and Dorothy Bishop: Research integrity: Don’t let transparency damage science. January 25, 2016.

National Public Radio interview on Morning Edition: Journal editors to researchers: Show everyone your clinical data with Harlan Krumholz. January 27, 2016.

Institute of Medicine (now the National Academy of Medicine) report advocating for responsible sharing of clinical trial data: Sharing clinical trial data: maximizing benefits, minimizing risk. National Academies Press, 2015.

Rethinking Clinical Trials Living Textbook Chapter, Acquiring and using electronic health record data, which describes the use of data collected in clinical practice for research and the complexities involved in sharing data. November 3, 2015.

NIH Health Care Systems Research Collaboratory data sharing policy. June 23, 2014.

List of International Committee of Medical Journal Editors (ICMJE) member journals.

New Living Textbook Chapter on Acquiring and Using Electronic Health Record Data for Research

Meredith Nahm Zozus and colleagues from the NIH Collaboratory’s Phenotypes, Data Standards, and Data Quality Core have published a new Living Textbook chapter about key considerations for secondary use of electronic health record (EHR) data for clinical research.

In contrast to traditional randomized controlled clinical trials where data are prospectively collected, many pragmatic clinical trials use data that were primarily collected for clinical purposes and are secondarily used for research. The chapter describes the steps a prospective researcher will take to acquire and use EHR data:

  • Gain permission to use the data. When a prospective researcher wishes to use data, a data use agreement (DUA) is usually required that describes the purpose of the research and the proposed use of the data. This section also describes use of de-identified data and limited data sets.
  • Understand fundamental differences in context. Data collected in routine care settings reflect standard procedures at an individual’s healthcare facility, and are not collected in a standard, structured manner.
  • Assess the availability of health record data. Few assumptions can be made about what is available from an organization’s healthcare records; up-front, detailed discussions about data element collection over time at each facility are required.
  • Understand the available data. A secondary data user must understand both the data meaning and the data quality; both can vary greatly across organizations and affect a study’s ability to support research conclusions.
  • Identify populations and outcomes of interest. Because healthcare facilities are obligated to provide only the minimum necessary data to answer a research question, investigators must identify the needed patients and data elements with specificity and sensitivity to answer the research question given the available data.
  • Consider record linkage. Studies using data from multiple records and sources will require matching data to ensure they refer to the correct patient.
  • Manage the data. The investigator is responsible for receiving, managing, and processing data and must demonstrate that the data are reproducible and support research conclusions.
  • Archive and share the data after the study. Data may be archived and shared to ensure reproducibility, enable auditing for quality assurance and regulatory compliance, or to answer other questions about the research.
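
To make the record linkage step above more concrete, here is a minimal sketch of deterministic (exact-match) linkage between two data sources. It is not taken from the chapter: the `PatientRecord` fields, the source names, and the matching rule (normalized name plus birth date) are hypothetical choices for illustration only. Real studies often need probabilistic matching and careful privacy safeguards.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientRecord:
    source: str       # hypothetical source system name
    record_id: str    # identifier local to that source
    last_name: str
    first_name: str
    birth_date: date


def match_key(rec):
    """Build a normalized key from name and birth date (illustrative rule only)."""
    return (rec.last_name.strip().lower(),
            rec.first_name.strip().lower(),
            rec.birth_date)


def link_records(left, right):
    """Return pairs of records from two sources that share the same match key."""
    index = {}
    for rec in right:
        index.setdefault(match_key(rec), []).append(rec)
    return [(l, r) for l in left for r in index.get(match_key(l), [])]


# Made-up example records; no real patient data.
ehr = [
    PatientRecord("ehr", "E1", "Smith ", "Ana", date(1970, 5, 2)),
    PatientRecord("ehr", "E2", "Jones", "Lee", date(1985, 1, 9)),
]
claims = [
    PatientRecord("claims", "C7", "smith", "ANA", date(1970, 5, 2)),
    PatientRecord("claims", "C8", "Jones", "Lee", date(1985, 1, 10)),
]

links = link_records(ehr, claims)
for l, r in links:
    print(l.record_id, "<->", r.record_id)
```

In this sketch, E1 and C7 link despite differences in case and trailing whitespace, while E2 and C8 do not link because their birth dates differ by one day. That second case shows how small data quality differences across organizations can cause an exact-match rule to miss true matches.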

In Nature: The Precision Medicine Initiative & DNA Data Sharing

A recent article in Nature highlights the Precision Medicine Initiative, launched in January 2015 and spearheaded by the National Institutes of Health. Precision medicine is an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person. This initiative will involve collection of data on genomes, electronic health records, and physiological measurements from 1 million participants. A main objective is for participants to be active partners in research.

But a major decision faced by the initiative’s working group is how much information to share with participants about disease risk, particularly genetic data. Though there is much debate in the field, the article suggests that public opinion on data sharing may be shifting toward openness.

The Precision Medicine Initiative working group will be releasing a plan soon. For details on the goals of the Precision Medicine Initiative, read the perspective by NIH Director Dr. Francis Collins in the New England Journal of Medicine.


Task Force Releases Recommendations for National Medical Device Evaluation System

A new report (PDF) containing recommendations for the creation of a national registry system for evaluating and monitoring medical devices has been released for public comment today. The report, a joint project of the Medical Device Registry Task Force and the Medical Device Epidemiology Network (MDEpiNet), is available on both the US Food and Drug Administration (FDA) website and the MDEpiNet website.

The report reflects the results of a year-long effort, prompted by the FDA’s Center for Devices and Radiological Health (CDRH), that is focused on fostering a national system for monitoring the use of medical devices in the “real-world” setting of patient care, once the devices have been approved for the market (known as “postmarket surveillance”).

The term “medical devices” encompasses a wide range of technologies, including implantable pacemakers, cardiovascular stents, robotic surgical devices, and artificial joint replacements, among many others. At present, information about the use of these devices in routine care settings, including safety issues reported by doctors and patients, is collected in a variety of registries and health record systems. A networked national system, such as the one described in the task force report, would be able to unite and build upon both existing and novel data resources, thereby improving safety monitoring and accelerating the development of new devices:

“Task Force recommendations for [Coordinated Registry Network] CRN architecture, and thus for the National System, center on leveraging existing, self sustaining electronic resources, such as device registries, electronic health records, administrative data and even social media and personal mobile device sources.”

The Task Force Report offers recommendations in several key areas, including:

  • Establishing a national dialog about medical device evaluation that includes all stakeholders;
  • Leveraging existing efforts in the arena of device registries and electronic data systems;
  • Describing the desired characteristics of a national Coordinated Registry Network (CRN) for medical devices;
  • Outlining priorities for developing and refining medical devices in multiple therapeutic areas;
  • Identifying and improving methods for analyzing data on medical devices; and
  • Addressing network governance and issues related to patient privacy and informed consent.

Each of these key areas also features suggested pilot projects designed to inform ongoing efforts.

A related perspective article summarizing the National Registry System project has also been published online in the Journal of the American Medical Association.

ClinicalTrials.gov Analysis Dataset Available from CTTI

As part of a project that examined the degree to which sponsors of clinical research are complying with federal requirements for the reporting of clinical trial results, the Clinical Trials Transformation Initiative (CTTI) and the authors of the study are making the primary dataset used in the analysis available to the public. The full analysis dataset, study variables, and data definitions are available as Excel worksheets from the CTTI website and on the Living Textbook’s Tools for Research page.

Study Examines Public Attitudes Toward Data-Sharing Networks

A new study examining public attitudes about the sharing of personal medical data through health information exchanges and distributed research networks finds a mixture of receptiveness and concerns about privacy and security. The study, conducted by researchers from the University of California, Davis and University of California, San Diego and published online in the Journal of the American Medical Informatics Association (JAMIA), reports results from a telephone survey of 800 California residents. Participants were asked for their opinions about the importance of sharing personal health data for research purposes and their feelings about related issues of security and privacy, as well as the importance of notification and permission for such sharing.

The authors found that a majority of respondents felt that sharing health data would “greatly improve” the quality of medical care and research. Further, many either somewhat or strongly agreed that the potential benefits of sharing data for research and care improvement outweighed privacy considerations (50.8%) or the right to control the use of their personal information (69.8%), although study participants also indicated that transparency regarding the purpose of any data sharing and controlling access to data remained important considerations.

However, the study’s investigators also found evidence of widespread concern over privacy and security issues, with substantial proportions of respondents reporting a belief that data sharing would have negative effects on the security (42.5%) and privacy (40.3%) of their health data. The study also explored attitudes about the need to obtain permission for sharing health data, as well as whether attitudes toward sharing data differed according to the purpose (e.g., for research vs. care) and the groups or individuals among which the data were being shared.

The authors note that while data-sharing networks are increasingly viewed as a crucial tool for enabling research and improving care on a national scale, they ultimately rely upon trust and acceptance from patients. As such, the long-term success of efforts aimed at building effective data-sharing networks may depend on accurately understanding the views of patients and accommodating their concerns.

Read the full article here: 

Kim KK, Joseph JG, Ohno-Machado L. Comparison of consumers' views on electronic data sharing for healthcare and research. J Am Med Inform Assoc. 2015 Mar 30. pii: ocv014. doi: 10.1093/jamia/ocv014. [Epub ahead of print]