June 7, 2019: In Dreams Begin Responsibilities: Data Science as a Service—Using AI to Risk Stratify a Medicare Population and Build a Culture (Erich Huang, MD, PhD)

Speaker

Erich S. Huang, MD, PhD
Co-Director, Duke Forge
Departments of Biostatistics & Bioinformatics and Surgery
Duke University School of Medicine

Topic

In Dreams Begin Responsibilities: Data Science as a Service—Using AI to Risk Stratify a Medicare Population and Build a Culture

Keywords

Data science; Data liquidity; Data standards; Machine learning; Duke Forge; Application programming interface; Artificial intelligence

Key Points

  • Duke Forge focuses on bringing the best methodological approaches to actionable data problems in health. It is motivated by a framework of value-based healthcare to address societal inequities in health.
  • Essential components to building a data science culture include clinical subject matter expertise, quantitative and methodological expertise, and software architecture and engineering expertise, along with interoperable tools and applications.
  • Like freight shipping containers, health-relevant data needs standardized containers that make any type of data easy to pack, grab, combine, and move around. The aim should be to build a “data liquidity ecosystem” equivalent to freighters, cranes, trains, and trucks that facilitate the logistics of health data transport.

Discussion Themes

If we’re trying to build an ecosystem, then the electronic health record (EHR) platform needs to be evaluated by whether it is truly participatory in this ecosystem. If not, then its deficiencies must be remediated.

The faster we can move to the cloud and use building blocks that “snap” together, the faster we can get answers. We want to be building applications instead of infrastructure.

Algorithms don’t have ethics; some have hidden biases. Algorithms need to be scrutinized and tested for such biases. They also must be secured so they cannot be manipulated.

Read more about Duke Forge and check out articles on the blog.

Tags

#pctGR, @Collaboratory1, @DukeForge

March 1, 2019: Approaches to Patient Follow-Up for Clinical Trials: What’s the Right Choice for Your Study? (Keith Marsolo, PhD)

Speaker

Keith Marsolo, PhD
Department of Population Health Sciences
Duke Clinical Research Institute
Duke University School of Medicine

Topic

Approaches to Patient Follow-Up for Clinical Trials: What’s the Right Choice for Your Study?

Keywords

Pragmatic clinical trial; Real-world data; Distributed research network; Electronic health records; EHR; Health data sources; Data standardization; Common data model; Fast Healthcare Interoperability Resources (FHIR); Application programming interface (API)

Key Points

  • Different sites have different capabilities and levels of sophistication around data. Clinical trial investigators should think from the beginning about the questions they want to answer and how much data is needed.
  • Data can come from different sources, such as the EHR, claims, or the participant, and can be procured and provided in different ways: by the patient, by staff or clinicians, or through IT and data experts.
  • PCTs with many sites may require a “patchwork quilt” of approaches for patient follow-up depending on the needs of the trial. Clinician-generated reports, data collected directly from patients, and solutions involving application programming interfaces (APIs) are all good options for data exchange.

Discussion Themes

How do we think through the options for getting patient data when some sites are not in the distributed research network or do not use a common data model?

Fast Healthcare Interoperability Resources (FHIR) is a draft standard describing data formats and elements and an application programming interface (API) for exchanging electronic health records. Through a FHIR interface, data are requested as objects (resources). For each defined domain, the standard specifies the allowable variables and values, and so predefines the information that you get out of the system.
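As a rough illustration of what "requesting data as an object" with "allowable values" can look like, the sketch below builds a FHIR read URL of the form [base]/[type]/[id] and checks a Patient resource against the standard's allowable values for gender. The base URL, the function names, and the example payload are made up for illustration; no request is actually sent, and a real client would need authentication and fuller validation.

```python
import json

# Hypothetical base URL, for illustration only; no request is sent.
FHIR_BASE = "https://fhir.example.org/r4"

# Allowable values for Patient.gender in FHIR's administrative-gender value set.
ALLOWED_GENDER = {"male", "female", "other", "unknown"}


def read_url(resource_type, resource_id):
    """Build a FHIR 'read' URL of the form [base]/[type]/[id]."""
    return f"{FHIR_BASE}/{resource_type}/{resource_id}"


def check_patient(resource):
    """Return a list of problems found in a Patient resource (empty if none)."""
    problems = []
    if resource.get("resourceType") != "Patient":
        problems.append("resourceType is not Patient")
    gender = resource.get("gender")
    if gender is not None and gender not in ALLOWED_GENDER:
        problems.append(f"gender {gender!r} is not an allowable value")
    return problems


# Made-up payload, shaped like the JSON a FHIR server might return.
payload = json.loads(
    '{"resourceType": "Patient", "id": "example-1", '
    '"gender": "female", "birthDate": "1950-04-02"}'
)

print(read_url("Patient", "example-1"))
print(check_patient(payload))
```

Because the resource type fixes which elements and values are allowed, a receiving study team can check incoming data mechanically rather than site by site.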

Until data are collected/generated using the same standards/formats as the API, there will still be a need to understand the EHR-to-interface mapping.
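One way to picture the EHR-to-interface mapping problem is a lookup table that translates a site's local codes into the values an interface expects, while flagging anything it cannot translate. The codes, the mapping, and the function name below are hypothetical; real mappings are site-specific and have to be curated by people who know the local EHR.

```python
# Hypothetical local codes from one site's EHR, mapped to the values an
# interface expects. Assumed for illustration; not taken from any real system.
SITE_SEX_MAP = {"F": "female", "M": "male", "U": "unknown"}


def map_values(local_values, mapping):
    """Translate local codes; collect codes with no mapping for review."""
    mapped, unmapped = [], []
    for value in local_values:
        if value in mapping:
            mapped.append(mapping[value])
        else:
            unmapped.append(value)
    return mapped, unmapped


mapped, unmapped = map_values(["F", "M", "X", "U"], SITE_SEX_MAP)
print(mapped)    # values that translated cleanly
print(unmapped)  # local codes that need a human decision
```

Keeping the unmapped codes visible, rather than silently dropping them, is the point of the sketch: they show where the local data and the interface standard still disagree.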

For more information on using health data in embedded pragmatic clinical trials, visit the NIH Collaboratory’s EHR Core webpage.

Tags

#CommonDataModel, #RealWorldData, #FHIR, #pctGR, @Collaboratory1

July 23, 2018: New Report Summarizes Patient-Reported Health Data and Metadata Standards in the ADAPTABLE Trial

A new report in the Living Textbook describes results of a literature review of data standards and metadata standards for variables of interest to the ADAPTABLE trial. Based on the review, the authors recommend standards for ADAPTABLE, also known as the Aspirin Study, which is the first major randomized comparative effectiveness trial to be conducted by the National Patient-Centered Clinical Research Network (PCORnet). The trial aims to identify the optimal dose of aspirin therapy for secondary prevention in atherosclerotic cardiovascular disease.

Because the ADAPTABLE trial relies on patients to report key information at baseline and throughout follow-up, it represents a unique opportunity to develop, pilot, and evaluate methods to validate and integrate patient-reported information with data obtained from electronic health records (EHRs). In 2016, the National Institutes of Health implemented a project with the goal of using the ADAPTABLE study to develop methods to (1) assess the quality of patient-reported data and (2) integrate the data with existing EHR data. It is hoped that this project will inform future efforts to synthesize potentially inconsistent data from patient-reported and EHR sources and identify opportunities to streamline data.

Download the report.

June 7, 2018: NIH Releases First Strategic Plan for Data Science

On June 4, the National Institutes of Health (NIH) released its first Strategic Plan for Data Science. The plan outlines steps the agency will take to modernize research data infrastructure and resources and to maximize the value of data generated by NIH-supported research.

Data science challenges for NIH have evolved and grown rapidly since the launch of the Big Data to Knowledge (BD2K) program in 2014. The most pressing challenges include the growing costs of data management, limited interconnectivity and interoperability among data resources, and a lack of generalizable tools to transform, analyze, and otherwise support the usability of data for researchers, institutions, industry, and the public.

The goals of the NIH Strategic Plan for Data Science are to:

  • support an efficient, effective data infrastructure by optimizing data storage, security, and interoperability;
  • modernize data resources by improving data repositories, supporting storage and sharing of individual data sets, and integrating clinical and observational data;
  • develop and disseminate both generalizable and specialized tools for data management, analytics, and visualization;
  • enhance workforce development for data science by expanding NIH’s internal data science workforce and supporting expansion of the national research workforce, and by engaging a broader community of experts and the general public in developing best practices; and
  • enact policies that promote stewardship and sustainability of data science resources.

As part of the implementation of the strategic plan, the NIH will hire a chief data strategist.

New Living Textbook Chapter on Acquiring and Using Electronic Health Record Data for Research

Meredith Nahm Zozus and colleagues from the NIH Collaboratory’s Phenotypes, Data Standards, and Data Quality Core (now the Electronic Health Records Core) have published a new Living Textbook chapter about key considerations for secondary use of electronic health record (EHR) data for clinical research.

In contrast to traditional randomized controlled clinical trials where data are prospectively collected, many pragmatic clinical trials use data that were primarily collected for clinical purposes and are secondarily used for research. The chapter describes the steps a prospective researcher will take to acquire and use EHR data:

  • Gain permission to use the data. When a prospective researcher wishes to use data, a data use agreement (DUA) is usually required that describes the purpose of the research and the proposed use of the data. This section also describes use of de-identified data and limited data sets.
  • Understand fundamental differences in context. Data collected in routine care settings reflect standard procedures at an individual’s healthcare facility, and are not collected in a standard, structured manner.
  • Assess the availability of health record data. Few assumptions can be made about what is available from an organization’s healthcare records; up-front, detailed discussions about data element collection over time at each facility are required.
  • Understand the available data. A secondary data user must understand both the data meaning and the data quality; both can vary greatly across organizations and affect a study’s ability to support research conclusions.
  • Identify populations and outcomes of interest. Because healthcare facilities are obligated to provide only the minimum necessary data to answer a research question, investigators must identify the needed patients and data elements with specificity and sensitivity to answer the research question given the available data.
  • Consider record linkage. Studies using data from multiple records and sources will require matching data to ensure they refer to the correct patient.
  • Manage the data. The investigator is responsible for receiving, managing, and processing data and must demonstrate that the data are reproducible and support research conclusions.
  • Archive and share the data after the study. Data may be archived and shared to ensure reproducibility, to enable auditing for quality assurance and regulatory compliance, or to answer other questions about the research.
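The record linkage step in the list above can be sketched in a few lines. This is a minimal deterministic example, assuming exact agreement on a normalized name and birth date; it is not the chapter's method, and the field names and sample records are invented. Real linkage often uses probabilistic matching and more identifiers.

```python
def link_key(record):
    """Build a match key from normalized name parts and birth date."""
    last = record["last_name"].strip().lower()
    first = record["first_name"].strip().lower()
    return (last, first, record["birth_date"])


def link_records(ehr_records, claims_records):
    """Pair EHR and claims rows whose keys agree exactly."""
    index = {}
    for rec in ehr_records:
        index.setdefault(link_key(rec), []).append(rec)
    pairs = []
    for claim in claims_records:
        candidates = index.get(link_key(claim), [])
        # Accept only unambiguous matches; anything else needs manual review.
        if len(candidates) == 1:
            pairs.append((candidates[0]["ehr_id"], claim["claim_id"]))
    return pairs


# Invented sample data for illustration.
ehr = [
    {"ehr_id": "E1", "first_name": "Ana ", "last_name": "Lopez", "birth_date": "1950-04-02"},
    {"ehr_id": "E2", "first_name": "Sam", "last_name": "Lee", "birth_date": "1961-01-15"},
]
claims = [
    {"claim_id": "C9", "first_name": "ana", "last_name": "LOPEZ", "birth_date": "1950-04-02"},
    {"claim_id": "C10", "first_name": "Pat", "last_name": "Kim", "birth_date": "1970-07-07"},
]

pairs = link_records(ehr, claims)
print(pairs)
```

Rejecting ambiguous candidates is a deliberate choice here: a missed link loses data, but a wrong link attaches one patient's outcomes to another.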

FDA Releases Action Plan to Encourage Greater Patient Diversification in Trials


In August 2014, the Food and Drug Administration (FDA) released an action plan (link opens as a PDF) aimed at encouraging more diverse patient participation in drug and medical device clinical trials. The Action Plan to Enhance the Collection and Availability of Demographic Subgroup Data includes 27 responsive and pragmatic actions, divided into 3 overarching priorities:

  • Data quality: improving the completeness and quality of demographic subgroup data collection, reporting, and analysis
  • Participation: identifying barriers to subgroup enrollment in clinical trials and employing strategies to encourage greater participation
  • Transparency: making demographic subgroup data more available and transparent

The plan follows an August 2013 report to Congress on these concerns and reflects the agency’s commitment to encouraging the inclusion of a diverse patient population (with reference to sex, age, race, and ethnicity) in biomedical research that supports applications for FDA-regulated medical products. Increasing representation is a multifaceted challenge that requires a multifaceted approach and collaboration of federal partners, industry, healthcare providers, patients and patient advocacy groups, academicians, and community groups.

A message from the Commissioner of the FDA contains background and details.


Collaboratory Phenotypes, Data Standards, and Data Quality Core Releases Data Quality Assessment White Paper


The NIH Collaboratory’s Phenotypes, Data Standards, and Data Quality Core (now the Electronic Health Records Core) has released a new white paper on data quality assessment in the setting of pragmatic research. The white paper, titled Assessing Data Quality for Healthcare Systems Data Used in Clinical Research (V1.0), provides guidance, based on the best available evidence and practice, for assessing data quality in pragmatic clinical trials (PCTs) conducted through the Collaboratory. Topics covered include an overview of data quality issues in clinical research settings, data quality assessment dimensions (completeness, accuracy, and consistency), and a series of recommendations for assessing data quality. Also included as appendices are a set of data quality definitions and review criteria, as well as a data quality assessment plan inventory.
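To make two of the assessment dimensions named above concrete, the sketch below computes a simple completeness rate for a field and runs one consistency check (a discharge date should not precede the admission date). The field names, records, and checks are assumptions made for illustration and are not taken from the white paper. Accuracy is left out because it generally requires comparison against an independent reference source.

```python
from datetime import date


def completeness(records, field):
    """Fraction of records with a non-missing value for `field`."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)


def inconsistent_stays(records):
    """IDs of records whose discharge date precedes the admission date."""
    return [
        r["id"]
        for r in records
        if r.get("admit") and r.get("discharge") and r["discharge"] < r["admit"]
    ]


# Invented records for illustration; "sbp" stands for systolic blood pressure.
records = [
    {"id": 1, "admit": date(2018, 1, 3), "discharge": date(2018, 1, 5), "sbp": 132},
    {"id": 2, "admit": date(2018, 2, 10), "discharge": date(2018, 2, 9), "sbp": None},
    {"id": 3, "admit": date(2018, 3, 1), "discharge": date(2018, 3, 4), "sbp": 118},
    {"id": 4, "admit": date(2018, 4, 7), "discharge": None, "sbp": 141},
]

print(completeness(records, "sbp"))
print(inconsistent_stays(records))
```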

The full text of the document can be accessed through the “Tools for Research” tab on the Living Textbook or can be downloaded directly here (PDF).