Developing and Refining the Research Questions

Using Electronic Health Record Data in Pragmatic Clinical Trials

Section 3


Rachel Richesson, MS, PhD, MPH

Richard Platt, MD, MSc

Gregory Simon, MD, MPH

Lesley Curtis, PhD

Reesa Laws, BS

Adrian Hernandez, MD, MSH

Jon Puro, MPA-HA

Doug Zatzick, MD

Erik van Eaton, MD, FACS

Vincent Mor, PhD


Contributing Editor

Karen Staman, MS

As with any type of research study, PCTs begin with a scientific question. A clearly articulated research question defines the phenomena of interest, the purpose for using EHR data, the possible sources of data to detect those phenomena, and, more specifically, the data requirements, definitions, quality, and data collection plan. In practice, however, there is often an iterative process between defining the data requirements for the EHR and understanding what is actually available at a given institution. The researcher must consider what information is needed to answer the question, and in turn, the available data may influence the research question. This often takes the form of ongoing cycles of the following conversation. Health System: What data do you need for the trial? Researcher: That depends… What data are available? This dialogue may lead investigators to refine their research question slightly to one that is more “answerable” given the data that are collected or available. Anecdotally, many researchers see the potential for discovery in clinical data warehouses derived from EHRs and are excited to use the available data to generate questions and answers. While this approach is understandable and practical in theory, the complexity of EHR architectures, and their inherent bias and possible error, could produce inaccurate data and lead to misinterpretation of results and erroneous conclusions. Consequently, we assert that the scientific research question should be the fundamental driver for the study design and hence the foundation for any PCT.

Good clinical research practice and ethics dictate that clinical trials collect the necessary data (and ONLY the necessary data) to answer a specific research question (ICH Harmonised Tripartite Guideline 1996). In most PCTs, it is vitally important to identify and measure covariates across study arms or clusters. In these cases, high-quality (accurate, complete) data on a number of variables are needed so that researchers can assess the comparability of the groups. The objective is to achieve balance between the groups along as many dimensions as are relevant, important, and feasible.
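One way to make the idea of comparability between groups concrete is to compute a standardized mean difference (SMD) for each covariate. The sketch below is illustrative only and is not a method prescribed by this chapter. The function names, the covariate names, the sample values, and the 0.1 threshold (a commonly used rule of thumb) are all assumptions made for the example. It handles continuous covariates only and ignores clustering, which a cluster-randomized PCT would need to account for.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Absolute standardized mean difference for one continuous covariate."""
    diff = abs(mean(group_a) - mean(group_b))
    # Pool the two sample variances with equal weight.
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        # No spread in either group: identical means are balanced, otherwise not.
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd


def balance_report(arm_a, arm_b, threshold=0.1):
    """Return {covariate: (smd, is_balanced)} for covariates present in both arms.

    The threshold is an assumed rule of thumb, not a value from the source text.
    """
    report = {}
    for name in sorted(set(arm_a) & set(arm_b)):
        smd = standardized_mean_difference(arm_a[name], arm_b[name])
        report[name] = (smd, smd <= threshold)
    return report


# Hypothetical covariate values extracted from an EHR for two study arms.
intervention = {
    "age": [61, 58, 70, 66, 59],
    "systolic_bp": [132, 128, 141, 137, 130],
}
usual_care = {
    "age": [62, 57, 69, 67, 60],
    "systolic_bp": [150, 146, 155, 149, 152],
}

for name, (smd, balanced) in balance_report(intervention, usual_care).items():
    print(f"{name}: SMD={smd:.2f} balanced={balanced}")
```

A covariate flagged as unbalanced, or one with too many missing values to compute an SMD at all, is a signal to revisit either the data source or the research question.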

Because developing a PCT that will use data from the EHR can be extremely complicated, the NIH Collaboratory has developed a set of papers and chapters that explore many of the issues in greater depth, including developing data definitions and phenotyping, planning and carrying out data quality assessments, and acquiring and managing the data. See the resources listed below.



Electronic Health Records-Based Phenotyping

A resource chapter describing mechanisms for identifying and evaluating phenotype definitions, with a particular focus on standardization efforts from the Collaboratory

Suggestions for Identifying Phenotype Definitions Used in Published Research

A guide to help those conducting a literature search for publications related to utilizing EHR data for the purpose of characterizing patients, populations, or cohorts

Phenotypes Environmental Scan

A catalog of identified phenotype-related efforts

Resources for Computable Phenotype Development: Sources of Existing Phenotypes

List of sources of existing phenotypes



ICH Harmonised Tripartite Guideline. 1996. Guideline for Good Clinical Practice E6(R1). Accessed Aug 14, 2017.


Richesson R, Platt R, Simon G, et al. Using Electronic Health Record Data in Pragmatic Clinical Trials: Developing and Refining the Research Questions. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory. Available at: Updated November 7, 2017.