April 13, 2018: OHDSI: Drawing Reproducible Conclusions from Observational Clinical Data


George Hripcsak, MD, MS
Vivian Beaumont Allen Professor of Biomedical Informatics, Columbia University
Chair, Department of Biomedical Informatics, Columbia University
Director, Medical Informatics Services, NewYork-Presbyterian Hospital/Columbia




Clinical research; Observational research; Precision medicine; Electronic health records; OHDSI; Clinical data

Key Points

  • Observational Health Data Sciences and Informatics (OHDSI) was developed to improve health by empowering communities to collaboratively generate the evidence that promotes better health decisions and patient care.
  • Patient-level predictions for personalized evidence require big data. OHDSI provides a global network of at least a billion open-source records on more than 400 million patients in 80 databases.
  • It is feasible to encode the world population in a single data model if results are shared through a multi-stakeholder, interdisciplinary approach.
  • Through access to observational data, researchers can look at multiple outcomes or treatments at once to produce large numbers of comparisons.
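Encoding many populations in a single data model means that each source is converted once into a shared structure. After that, one analysis can run unchanged against every converted source. The sketch below illustrates this idea in miniature.

It is a simplification. OHDSI's own model, the OMOP Common Data Model, defines its own tables and standardized vocabularies, and none of the record types, field names, or functions below come from it. All of them are hypothetical. The sketch also does not try to link the same patient across sources.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConditionRecord:
    """Hypothetical common-model row: one diagnosis for one person on one date."""
    person_id: str
    condition_code: str
    start_date: date


def from_claims(claim):
    """Map a hypothetical claims row {'member', 'dx', 'service_date'} to the common model."""
    return ConditionRecord(
        person_id=f"claims:{claim['member']}",  # prefix keeps IDs distinct per source
        condition_code=claim["dx"],
        start_date=date.fromisoformat(claim["service_date"]),
    )


def from_ehr(entry):
    """Map a hypothetical EHR problem-list entry {'mrn', 'code', 'noted_on'} to the common model."""
    return ConditionRecord(
        person_id=f"ehr:{entry['mrn']}",
        condition_code=entry["code"],
        start_date=date.fromisoformat(entry["noted_on"]),
    )


def count_persons_with(records, condition_code):
    """An analysis written once against the common model; it never sees source formats."""
    return len({r.person_id for r in records if r.condition_code == condition_code})


if __name__ == "__main__":
    # Hypothetical toy data from two differently structured sources.
    claims = [
        {"member": "1", "dx": "I10", "service_date": "2017-03-01"},
        {"member": "2", "dx": "E11", "service_date": "2017-05-09"},
    ]
    ehr = [
        {"mrn": "A", "code": "I10", "noted_on": "2016-11-20"},
        {"mrn": "A", "code": "I10", "noted_on": "2017-01-02"},
    ]
    records = [from_claims(c) for c in claims] + [from_ehr(e) for e in ehr]
    print(count_persons_with(records, "I10"))  # 2 distinct persons: claims:1 and ehr:A
```

The conversion functions are the only source-specific code. The same reasoning applies when the same query is run across many databases in a network.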

Discussion Themes

There are five steps in reproducible research:

  1. Address measured confounding variables
  2. Identify residual confounding variables
  3. Use multiple databases, locations, and practice types
  4. Publish hypotheses, parameters, and runs
  5. Generate evidence

The problem with data dredging is not the analyses researchers run but the results they throw out, so selective reporting has no place in the literature. Every last parameter should be published so that a study can be reproduced.

One “upside down” approach to the problem of multiple comparisons and selective publication is to tell researchers that all comparisons are required, so that no result can be selectively left out.
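The "all comparisons are required" idea, together with publishing every parameter and run, can be sketched as follows. Every treatment pair, outcome, and database is enumerated before any results exist. Every specification is run, and each result is serialized alongside its parameters, with nothing discarded afterward.

This is a minimal illustration under stated assumptions, not OHDSI's actual tooling. The `AnalysisSpec` fields, the fixed `time_at_risk_days` parameter, and the caller-supplied `estimate` function are all hypothetical.

```python
import itertools
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AnalysisSpec:
    """Hypothetical record of one pre-specified comparison and its parameters."""
    target: str
    comparator: str
    outcome: str
    database: str
    time_at_risk_days: int


def enumerate_comparisons(treatments, outcomes, databases, time_at_risk_days=365):
    """Build every treatment pair x outcome x database before any results exist."""
    specs = []
    # combinations() yields each unordered pair once (A vs B, not also B vs A).
    for target, comparator in itertools.combinations(treatments, 2):
        for outcome in outcomes:
            for database in databases:
                specs.append(
                    AnalysisSpec(target, comparator, outcome, database, time_at_risk_days)
                )
    return specs


def run_all(specs, estimate):
    """Run every specification and keep every result, whatever it shows.

    `estimate` is a hypothetical caller-supplied function mapping a spec to an effect estimate.
    """
    return [{"spec": asdict(spec), "estimate": estimate(spec)} for spec in specs]


def publish(results):
    """Serialize parameters and results together so each run can be reproduced."""
    return json.dumps(results, indent=2, sort_keys=True)


if __name__ == "__main__":
    specs = enumerate_comparisons(
        ["drug_a", "drug_b", "drug_c"], ["outcome_x", "outcome_y"], ["db_1", "db_2"]
    )
    print(len(specs))  # 3 pairs x 2 outcomes x 2 databases = 12
    # Placeholder estimator for illustration only; it is not a real effect estimate.
    results = run_all(specs, lambda spec: 1.0)
    print(publish(results[:1]))
```

The design choice is that the list of comparisons is fixed up front and the output includes every run. A reader can therefore check that nothing was filtered out between analysis and publication.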

Because people get their care from multiple institutions, a single institution's records can miss outcomes, so OHDSI is looking for ways to ensure that outcomes are not left out. Administrative databases tend to be more complete in terms of longitudinal information.

National Cardiovascular Data Registry (NCDR) data, certain laboratory results, radiology and pathology notes, and sociodemographic information are all key elements being considered as sources to enrich observational research.


To learn more about OHDSI, visit http://ohdsi.org or follow @OHDSI on Twitter.

For information on electronic health record (EHR) phenotyping, visit The Living Textbook https://bit.ly/2Hk5pQ8.


