Statistical Design Considerations – ARCHIVED

Experimental Designs and Randomization Schemes


Section 2


Although PCTs do not necessarily require a specific statistical design approach, both the kinds of questions PCTs are designed to answer and the settings in which they take place may favor certain approaches, such as cluster randomization. Because the interventions PCTs test often involve changes to healthcare delivery, they may be better implemented through randomization at the practice, clinic, or even hospital level.

In the following sections, we examine key considerations in statistical study design and analysis for PCTs.

Watch the video module: Choosing the Right Study Design

Three Kinds of Randomized Trials

A key design factor for PCTs is the choice of unit of randomization. There are 3 kinds of randomized trials:

In traditional RCTs, the unit of randomization is generally the individual trial participant, and each individual is randomly assigned to receive an experimental intervention, a comparator therapy, or a placebo. There is no interaction among trial participants after randomization. Most drug trials are traditional RCTs.

In individually randomized group treatment trials (IRGTs), like traditional RCTs, the unit of randomization is the individual trial participant, and each individual is randomly assigned to a study condition. However, there is interaction among trial participants after randomization in 1 or more of the study conditions. Many surgical trials and behavioral trials are IRGTs. These trials are sometimes called "partially nested" or "partially clustered" designs.

In cluster randomized trials (CRTs), randomization takes place at the level of the physician, practice, hospital, health system, city block, or other unit that comprises multiple patients or other participants. The groups are randomly assigned to study conditions, and there is interaction among members of the same group before and after randomization. Many trials conducted in communities, worksites, schools, and other settings are CRTs. CRTs are sometimes called "group randomized trials" or "community trials." There are 2 kinds of CRTs:

  • Parallel CRTs: In a parallel CRT, there are parallel intervention and control conditions throughout the trial with no crossover.
  • Stepped-wedge CRTs: In a stepped-wedge CRT, all groups start the trial in the control condition. The groups cross over to the intervention condition in random order and on a staggered schedule. All groups receive the intervention before the end of the trial.
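These two schemes can be made concrete with a small sketch. The Python example below is a minimal illustration with invented clinic names, not trial software: it randomly assigns clusters to parallel arms and builds a staggered stepped-wedge crossover schedule in which every cluster starts in the control condition and has crossed over by the final step.

```python
import random

def parallel_assignment(clusters, seed=0):
    """Parallel CRT: randomly split clusters evenly between arms."""
    rng = random.Random(seed)
    shuffled = clusters[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Stepped-wedge CRT: assign each cluster a random crossover step.

    Every cluster starts in control and crosses over to the
    intervention at its assigned step (1..n_steps), so all clusters
    receive the intervention before the end of the trial.
    """
    rng = random.Random(seed)
    order = clusters[:]
    rng.shuffle(order)
    # Spread clusters as evenly as possible across the crossover steps.
    return {cluster: 1 + i % n_steps for i, cluster in enumerate(order)}
```

In a real stepped-wedge trial, the number of clusters crossing over at each step and the step timing would be fixed by the design; here clusters are simply spread evenly across the steps in random order.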

Case Example: Unit of Randomization vs Unit of Measurement

The “Time to Reduce Mortality in End-Stage Renal Disease” (TiME) trial, an NIH Collaboratory Trial, provides an example of the difference between the unit of randomization and the unit of measurement typical of cluster randomized trials.

In the TiME trial, participating dialysis clinics providing care to patients with end-stage renal disease were randomly assigned to provide one of two interventions: an “extended” period of hemodialysis for a minimum of 4.25 hours or standard care. The trial was designed to evaluate whether the extended period of dialysis would be associated with better survival and quality-of-life outcomes. Thus, the unit of randomization for TiME was the dialysis clinic, but the measurements of interest were the outcomes of individual patients.
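This distinction can be sketched in a few lines of Python. In the hypothetical example below (clinic and patient identifiers are invented; this is not TiME study code), randomization happens once per clinic, while every outcome measurement belongs to an individual patient, who simply inherits the arm of the clinic where they receive care.

```python
import random

def randomize_clinics(clinic_patients, seed=0):
    """Cluster randomization: assign whole clinics to study arms.

    clinic_patients maps each clinic ID to the list of its patients'
    IDs. Returns (arm per clinic, arm per patient).
    """
    rng = random.Random(seed)
    clinics = sorted(clinic_patients)
    rng.shuffle(clinics)
    half = len(clinics) // 2
    # Unit of randomization: the clinic.
    arm_of = {c: ("extended" if i < half else "usual")
              for i, c in enumerate(clinics)}
    # Unit of measurement: the patient. Each patient inherits the
    # arm of the clinic where they are treated.
    patient_arm = {p: arm_of[c]
                   for c, patients in clinic_patients.items()
                   for p in patients}
    return arm_of, patient_arm
```

Because all patients within a clinic share an assignment, their outcomes are correlated, which is why CRT sample-size and analysis methods must account for clustering rather than treating patients as independently randomized.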


Resources

Essentials of ePCTs Seminar: 2019 AcademyHealth Annual Research Meeting
Electronic booklet from a full-day preconference seminar hosted by AcademyHealth in partnership with the NIH Collaboratory, June 1, 2019

Workshop on Design and Analysis of Embedded Pragmatic Clinical Trials
NIH Collaboratory Steering Committee Meeting, May 2, 2019. Panel discussions of challenges in the design and analysis of ePCTs, including measurement and data, choosing a parallel group or stepped-wedge design, and unique complications of conducting research in dynamic healthcare settings.

Linking Design to Analysis of Cluster Randomized Trials: Covariate Balancing Strategies
NIH Collaboratory Grand Rounds, February 9, 2018


Version History

January 22, 2021: Added embedded video (change made by G. Uhlenbrauck).

July 2, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

May 27, 2020: Revised the text to further describe the 3 kinds of randomized trials, added the “Essentials of ePCTs Seminar” to the Resources sidebar, added Heagerty to the contributors list, and reordered the sections of this chapter as part of the annual content update (changes made by D. Seils).

April 21, 2020: Added information about the NIH Collaboratory Workshop on the Design and Analysis of Embedded Pragmatic Trials (changes made by K. Staman).

January 16, 2019: Added a Resources box and made nonsubstantive changes to the text as part of the annual content update (changes made by D. Seils).

Published August 25, 2017

Data on Different Approaches to Disclosure – ARCHIVED

Consent, Disclosure, and Non-Disclosure


Section 5


Version 1.0 (removed October 18, 2022)

People's Preferences for Different Approaches

Research has been done to elicit people’s views on different types of research and their preferences regarding consent and disclosure.

  • In a survey regarding research on medical practices (ROMP), three-quarters of US respondents indicated that they prefer to have conversations about participating in randomized or retrospective observational studies with their physician rather than with researchers (Cho et al 2015).
  • Focus group data indicate that people want and expect information about ROMP to come from their physicians, in part because of the physician’s responsibility for the patient, and in part because people trust their physicians and believe they will propose only research that is worthwhile and safe (Kelley et al 2015).
  • Two-thirds of the people in the ROMP survey indicated that they would be comfortable using an alternative approach to written consent if the research could not otherwise take place, but would prefer to have traditional consent documented in a signed form if possible (Cho et al 2015).
  • In another survey, people were generally comfortable waiving consent for minimal-risk quality improvement activities (Kaplan et al 2016), but pragmatic research can exist on the borderline between quality improvement and traditional clinical research (Finkelstein et al 2015), raising questions about what people are comfortable with across the continuum of these activities.

In another study, focus group respondents’ “desire to be actively notified and asked was more prominent with regard to CER [comparative effectiveness research] studies than with regard to Operations studies” (Weinfurt et al 2016). The authors suggest that effective policy and guidance will involve a balancing of “different patients' interests and potentially different sets of interests for different types of research studies on usual medical practices.”

Although people generally endorse research and want to participate, and a majority indicate that they are comfortable with alternate approaches, a substantial minority of people want to be at least minimally engaged in decision making about this type of research (Sugarman 2016). This raises ethical questions regarding the most appropriate ways to protect these individuals’ rights and interests.

Weinfurt et al conducted a series of web-based surveys to compare different models for notification and authorization in different types of CER to determine how acceptable these approaches were to participants as well as their level of understanding (Weinfurt et al 2017).

Key findings:

  • Many people have significant difficulty understanding certain aspects of pragmatic trials of commonly used medical practices, especially randomization and extra burden. They tend to automatically assume that a medical research study will involve extra visits, calls, etc, even when the consent form explicitly states that there will be no extra requirements of them.
  • Although willingness to participate did not differ by notification or authorization approach (opt out, opt in, broad notification, oral or written consent, etc), it was not universal; 28% to 49% of respondents would decline to participate. This could lead to nontrivial consent bias because the people who decline could differ in some significant way from the people who agree to participate. However, this bias was the same for all approaches to notification and authorization.
  • Most of the respondents viewed less active approaches to notification, such as no notification ahead of time or broad notification, as unacceptable for some types of pragmatic research.
  • When using written consent in cases where researchers are testing accepted medical interventions that have known clinical risks but with no incremental risks of participating in the research, it could be acceptable to omit the clinical risks from the consent documents, thereby shortening the form.
  • Active alternatives to written consent—such as oral consent—did not compromise consent quality (from the patient’s perspective).

Based on these findings, the authors suggest that effort and resources should be expended to develop, field, and test alternate approaches to notification and authorization (Weinfurt et al 2017). As research practices evolve, more work is needed to balance our understanding of what people want with the desire to gather more information about the risks, burdens, and benefits of interventions and therapies to improve health care.

Methodological Challenges for Studying Different Approaches to Disclosure/Notification and Authorization

In most of the research regarding people’s preferences, it is unclear whether people have a full understanding of the nature of the research being described and the differences between conventional research and related activities, such as pragmatic research, quality improvement research, and comparative effectiveness research. Our current understanding relies, in large part, on hypothetical examples, such as those described above, and may not reflect what would happen in actual practice. To gain further insight, investigators will need to test different approaches to disclosure and authorization as part of future embedded trials to help determine what is and is not acceptable across a spectrum of different types of research (Sugarman 2016).


REFERENCES


Cho MK, Magnus D, Constantine M, et al. 2015. Attitudes toward risk and informed consent for research on medical practices: a cross-sectional survey. Ann Intern Med. 162:690–696. doi:10.7326/M15-0166. PMID: 25868119.

Finkelstein JA, Brickman AL, Capron A, et al. 2015. Oversight on the borderline: quality improvement and pragmatic research. Clin Trials. 12:457–466. doi:10.1177/1740774515597682. PMID: 26374685.

Kaplan SH, Gombosev A, Fireman S, et al. 2016. The patient’s perspective on the need for informed consent for minimal risk studies: development of a survey-based measure. AJOB Empirical Bioethics. 7:116–124. doi:10.1080/23294515.2016.1161672.

Kelley M, James C, Alessi Kraft S, et al. 2015. Patient perspectives on the learning health system: the importance of trust and shared decision making. Am J Bioeth. 15:4–17. doi:10.1080/15265161.2015.1062163. PMID: 26305741.

Sugarman J. 2016. Ethics of research in usual care settings: data on point. AJOB Empirical Bioethics. 7:71–75. doi:10.1080/23294515.2016.1152104.

Weinfurt KP, Bollinger JM, Brelsford KM, et al. 2017. Comparison of approaches for notification and authorization in pragmatic clinical research evaluating commonly used medical practices. Med Care. doi:10.1097/MLR.0000000000000762. PMID: 28650924.

Weinfurt KP, Bollinger JM, Brelsford KM, et al. 2016. Patients’ views concerning research on medical practices: implications for consent. AJOB Empirical Bioethics. 7:76–91. doi:10.1080/23294515.2015.1117536. PMID: 27800531.


Version History

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published August 25, 2017

Non-disclosure of Research Activities – ARCHIVED

Consent, Disclosure, and Non-Disclosure


Section 4


Version 1.0 (removed October 18, 2022)

Ethical Foundations and Requirements

Currently, the threshold for a determination is the same for both waiver and alteration of informed consent, which arguably limits the options of the IRB when determining what the consent process will be for a given study (McKinney et al 2015). Some have suggested that new standards are necessary to enable pragmatic research while ensuring the protection of patients’ rights and interests (McKinney et al 2015).

With non-disclosure (ie, when consent is waived), no attempt is made to notify the people whose information will be used or who may be affected by a trial, or to request their permission.

Case Example: Lumbar Imaging With Reporting of Epidemiology (LIRE)

Lumbar Imaging With Reporting of Epidemiology (LIRE), an NIH Collaboratory Trial, was designed to test whether inserting epidemiological evidence (essentially representing the normal range) in lumbar spine imaging reports would reduce subsequent diagnostic and therapeutic interventions, including cross-sectional imaging (MR/CT), opioid prescriptions, spinal injections, and surgery (Jarvik et al 2015). As part of their justification for a waiver of patient consent in the Project Summary, the investigators state that the project is minimal risk and will not adversely affect the rights and welfare of the subjects, and that the research could not practicably be carried out without a waiver (because there is no easily implementable means of obtaining consent and the study population is large [~250,000]), and:

“4.1.3c By informing primary care providers and patients of the study, we risk invalidating the results. If providers and patients are aware of the intervention but are allocated to the control group, they may nevertheless change their behavior.

4.1.3d The risk of contacting subjects is greater than the risk of the study procedures. The risk for breach of patient confidentiality increases when subject contact information is maintained for the purposes of contacting patients for their consent. It is our opinion that this increased risk far exceeds the risk to subjects associated with the insertion of epidemiologic data into the radiology report interpreted by their provider (Jarvik 2013).”

Individual Notification

Even when a waiver of consent is granted, some investigators provide individual notification and may offer an opportunity to opt out as a mechanism to preserve autonomy. For example, in the Time to Reduce Mortality in End-Stage Renal Disease (TiME) trial (described in Section 3), all patients initiating treatment with maintenance hemodialysis at participating facilities were provided with written information that included the trial sponsor, the purpose of the trial, how the trial would affect dialysis session duration, the treating physician’s role, a description of the transmission of de-identified patient data to the University of Pennsylvania, a statement that no additional testing would be performed for the trial, and a toll-free telephone number to contact with questions or to opt out of participation.


REFERENCES


Jarvik JG. 2013. LIRE Supplementary Material.

Jarvik JG, Comstock BA, James KT, et al. 2015. Lumbar Imaging With Reporting of Epidemiology (LIRE): protocol for a pragmatic cluster randomized trial. Contemp Clin Trials. doi:10.1016/j.cct.2015.10.003. PMID: 26493088

McKinney RE, Beskow LM, Ford DE, et al. 2015. Use of altered informed consent in pragmatic clinical research. Clin Trials. 12:494–502. doi:10.1177/1740774515597688. PMID: 26374677.


Version History

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published August 25, 2017

Alternative Approaches to Disclosure and Authorization – ARCHIVED

Consent, Disclosure, and Non-Disclosure


Section 3


Version 1.0 (removed October 18, 2022)

Ethical Foundations and Requirements

Alternative procedures for disclosing information and obtaining authorization for research participation may be allowable as a regulatory matter, but as an ethical matter they should be consistent with the ethical principles outlined in the Belmont Report. Alternative forms of disclosure and authorization may be less burdensome yet still ethically sound in the conduct of some types of research, such as some PCTs or research randomized by cluster (ie, by hospital, primary care provider, etc).

An IRB may waive or alter the requirements of informed consent if all of the below are deemed true:

“(1) The research involves no more than minimal risk to the subjects;

(2) The waiver or alteration will not adversely affect the rights and welfare of the subjects;

(3) The research could not practicably be carried out without the waiver or alteration; and

(4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation.” §46.116

In this section, we expand on 2 key elements in the requirements to waive or alter informed consent: minimal risk and practicability.

Minimal Risk Research

Numerous questions have been raised about how best to define minimal risk: Does randomization itself cause more than minimal risk? In a cluster-randomized trial, if sites are randomized to interventions thought to be in clinical equipoise, is that minimal risk? Should minimal risk be judged relative to healthy people or to people with the disease or problem under investigation? In a cluster, is minimal risk determination an assessment of the average or does the study need to be minimal risk for every person in it?

A definition of “minimal risk” is provided in the Common Rule:

“Minimal risk means that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests.” §46.102

However, this definition has proved hard to apply in actual practice. For example, the Time to Reduce Mortality in End-Stage Renal Disease (TiME) trial, an NIH Collaboratory Trial, evaluated the effect on survival and quality of life of a minimum hemodialysis session duration of at least 4.25 hours (if determined medically appropriate by the treating nephrologist) compared with usual care (no trial-driven approach to session duration) for patients with end-stage renal disease (Dember et al 2016). The participants in TiME were, arguably, a high-risk population because they had end-stage renal disease. Although observational data suggested that longer dialysis times are associated with better outcomes, there was initial concern about the ability to designate this research as “minimal risk” because of the study population.

Practicability

What, exactly, “practicability” means in the regulatory clause “the research could not practicably be carried out without a waiver” can also be difficult to discern. To clarify this, the Secretary’s Advisory Committee on Human Research Protections (SACHRP) issued a set of considerations, including:

  1. “scientific validity would be compromised if consent were required because it would introduce bias to the sample selection
  2. subjects’ behaviors or responses would be altered, such that study conclusions would be biased
  3. the consent procedure would itself create additional threats to privacy that would otherwise not exist
  4. there is risk of inflicting significant psychological, social or other harm by contacting individuals or families.”

Once the IRB has determined that the waiver or alteration of consent is permissible, a modified approach to disclosure and authorization may be considered.

Additional New Regulatory Requirements Related to Alternative Approaches to Disclosure and Authorization

The revisions to the Common Rule establish new exempt categories of research, and research fitting these categories would need only limited IRB review to ensure adequate privacy protections.

The revised Common Rule also allows the use of broad consent for the use of identifiable information or identifiable biospecimens for other research studies (other than the proposed one) for

  • Storage and maintenance for secondary research use
  • Secondary research (including future uses)

Broad consent may be permitted as an alternative to specific informed consent for a future research study. If so, in addition to selected standard elements of traditional informed consent, broad consent must also include:

  • A description of the types of research that may be conducted
  • What information/biospecimens will be used
  • Period of time of storage and maintenance
  • If applicable, a statement that the subject will not be informed about specific research studies
  • Disposition of clinically relevant research results
  • Contact information §__.116.d

If broad consent is implemented for a particular study and a potential participant refuses to consent, the IRB can no longer grant a waiver of consent for that individual. This may make conducting research under broad consent logistically complicated or infeasible.

Approaches to Alteration of Consent

In cases when traditional informed consent is waived or alteration of consent is approved, the approaches to disclosure and authorization shown below may be appropriate from both regulatory and ethics perspectives.

Broad Notification

  • Broad notification includes using posters, emails, brochures, social media or web portals, etc. to provide opportunities to inform patients that research is being conducted (McGraw et al 2015). In some cases, a waiver of consent may be granted, but an IRB may still require that particular study information be posted.
  • Example: The Randomized Evaluation of Decolonization versus Universal Clearance to Eliminate MRSA (REDUCE MRSA) trial (Huang et al 2013) compared three strategies in current use for preventing methicillin-resistant Staphylococcus aureus (MRSA) infections in adult intensive care units (ICUs) of a single health system. Although a consent waiver was granted, the IRB required that notices be posted in each ICU room to provide an opportunity for patients and their families to be informed.

Opt-out

  • Potential participants can decline to participate.
  • Example: The Strategies and Opportunities to Stop Colorectal Cancer in Priority Populations (STOP CRC) trial was designed to improve rates of colorectal-cancer screening among the predominantly minority and low-income patients who receive health care services through Federally Qualified Health Centers (FQHCs; Coronado et al 2014). Prospective participants were identified in the electronic health record as being aged 50 to 74 years and not up-to-date with colorectal-cancer screening guidelines. Investigators sent these individuals an Introduction Letter explaining the program and provided an option to opt out of the mailed program by contacting the FQHC (Coronado et al 2014). The letter stated: “If you have had a colonoscopy in the past 9 years or prefer that we not mail you a test, please contact us at xxx-xxx-xxxx.” Patients who did not opt out subsequently received a fecal immunochemical test (FIT) kit in the mail with a letter and a set of wordless instructions. Participants who did not opt out were also reminded by postcard or phone to complete their tests. It is worth noting that the letter did not include an option to opt out of data collection for research activities. De-identified data were provided to the investigators by the Oregon Community Health Information Network (OCHIN).

Opt-in

  • Prospective participants are asked (either in writing or orally) if they would like to participate in a research trial, and are included if they say “yes” or opt in. This agreement is usually documented in the electronic health record (EHR).
  • Example: The goal of the Collaborative Care for Chronic Pain in Primary Care/Pain Program for Active Coping and Training (PPACT) trial was to coordinate and integrate services for helping patients adopt self-management skills for managing chronic pain, limit use of opioid medications, and identify exacerbating factors amenable to treatment that are feasible and sustainable within the primary care setting (Debar et al 2012). Prospective participants were sent a letter explaining the program and indicating that someone from the PPACT team may call to explain more about the program. The letter provided a phone number to call if the individual would prefer no further contact with the investigative team—to opt out. When the team member discussed the program, the individual was given the opportunity to opt in, which was recorded in the EHR.

Short Form

  • A short form may be used, stating that the elements of informed consent required by §46.116 were presented orally. Although the consent document is shorter, the consent process itself may be longer (McKinney et al 2015). According to §46.117, the requirements for using a short form include:
    • A witness to the oral presentation
    • An IRB-approved written summary of what is to be said to the participant (or representative)
    • Signatures, as follows:
      • Short form signed by the participant (or representative) and the witness
      • Copy of the summary signed by the witness and the person actually obtaining consent

Electronic Consent

The Office for Human Research Protections (OHRP) and the Food and Drug Administration (FDA) published a Guidance for Institutional Review Boards, Investigators, and Sponsors on use of electronic consent. According to the guidance:

  • Electronic consent must contain all required elements of informed consent, be presented in an easily understandable manner, and minimize the possibility of coercion.
  • If the consent process takes place remotely, the electronic system “must include a method to ensure that the person electronically signing the informed consent is the subject who will be participating in the research study or is the subject’s legally authorized representative.”
  • Subjects should be given contact information for questions.
  • Interactive technology (such as quizzes) may be used to assess comprehension.
  • The electronic consent form must include a statement that significant findings that could affect the participant’s willingness to participate will be shared.
  • Electronic signatures can be used to document consent, although steps should be taken to verify the identity of participants before commencing study-related activities.
  • For pediatric studies, parental permission may be obtained electronically using the same procedures as informed consent.


Resources

For pragmatic clinical trials, minimal risk determinations have been variable and confusing. In Considerations in the evaluation and determination of minimal risk in pragmatic clinical trials, Lantos et al (2015) examine factors involved in the determination of minimal risk for pragmatic clinical trials and advocate for an assessment based on incremental risk. The implications for informed consent are also explored.

In October and November of 2017, the PCORnet E-consent Workgroup hosted a two-part webinar series to share real-world experience with implementing electronic consent within PCORnet and to disseminate best practices.

REFERENCES


Coronado GD, Burdick T, Petrik A, Kapka T, Retecki S, Green B. Using an Automated Data-driven, EHR-Embedded Program for Mailing FIT kits: Lessons from the STOP CRC Pilot Study. J Gen Pract (Los Angel). 2014;2:1000141. doi:10.4172/2329-9126.1000141. PMID: 25411657.

Coronado GD, Vollmer WM, Petrik A, et al. Strategies and Opportunities to STOP Colon Cancer in Priority Populations: design of a cluster-randomized pragmatic trial. Contemp Clin Trials. 2014;38(2):344-349. doi:10.1016/j.cct.2014.06.006. PMID: 24937017.

Debar LL, Kindler L, Keefe FJ, et al. 2012. A primary care-based interdisciplinary team approach to the treatment of chronic pain utilizing a pragmatic clinical trials framework. Transl Behav Med. 2:523-530. doi:10.1007/s13142-012-0163-2. PMID: 23440672.

Dember LM, Archdeacon P, Krishnan M, et al. 2016. Pragmatic trials in maintenance dialysis: perspectives from the Kidney Health Initiative. J Am Soc Nephrol. 27:2955-2963. doi:10.1681/ASN.2016030340. PMID: 27401689.

Huang SS, Septimus E, Kleinman K, et al. 2013. Targeted versus universal decolonization to prevent ICU infection. N Engl J Med. 368:2255-2265. doi:10.1056/NEJMoa1207290. PMID: 23718152.

Lantos JD, Wendler D, Septimus E, Wahba S, Madigan R, Bliss G. 2015. Considerations in the evaluation and determination of minimal risk in pragmatic clinical trials. Clin Trials. 12:485–493. doi:10.1177/1740774515597687. PMID: 26374686.

McGraw D, Greene SM, Miner CS, Staman KL, Welch MJ, Rubel A. 2015. Privacy and confidentiality in pragmatic clinical trials. Clin Trials. 12:520–529. doi:10.1177/1740774515597677. PMID: 26374682.

McKinney RE, Beskow LM, Ford DE, et al. 2015. Use of altered informed consent in pragmatic clinical research. Clin Trials. 12:494–502. doi:10.1177/1740774515597688. PMID: 26374677.


Version History

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published August 25, 2017

Informed Consent – ARCHIVED

Consent, Disclosure, and Non-Disclosure


Section 2


Version 1.0 (removed October 18, 2022)

Ethical Foundation of Informed Consent

In the United States, federal regulations for protecting the rights, interests, and welfare of human subjects who participate in research are based in part on the Belmont Report, which articulates three ethical principles that always warrant consideration in research:

  • Respect for persons
    • Individuals should be treated as autonomous agents
    • Those with diminished autonomy are entitled to protection
  • Beneficence
    • Obligation to do no harm
    • Maximize possible benefits and minimize possible harm
  • Justice
    • The benefits and burdens of research are distributed fairly

Informed consent is one of the primary means of upholding the ethical principle of Respect for Persons in much research. The Belmont Report lists 3 important components of informed consent:

Information. The consent process should include information about the research procedure, the purpose of the research, risks and anticipated benefits, alternative procedures and a statement offering the subject the opportunity to ask questions and to withdraw at any time from the research.

Comprehension. The information should be conveyed in a manner and context that promotes understanding of the information.

Voluntariness. Consent is valid only if it is voluntarily given.

Current Regulatory Requirements (for Whom, When, What to Disclose)

The Common Rule applies to most federally funded research (or research conducted in federally funded institutions that elect to follow these rules for all research conducted within their institution; see the list of the departments and agencies that adhere to the Common Rule). The Common Rule defines research as “a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.” §46.102 The rule outlines the basic requirements for Institutional Review Boards (IRBs), including provisions related to obtaining and documenting informed consent.

According to 45 CFR §46.116 (General requirements for informed consent), an investigator must obtain legally effective informed consent before involving a human being as a subject in research, and the prospective subject must be given sufficient opportunity to decide whether or not to participate. Under certain circumstances, an IRB may approve a consent procedure that alters or waives the requirements for informed consent; this is described in more detail in Section 3: Alternative Approaches to Disclosure and Authorization.

The Common Rule defines a human research subject as “a living individual about whom an investigator (whether professional or student) conducting research obtains

(1) Data through intervention or interaction with the individual, or

(2) Identifiable private information.” §46.102

New Regulatory Requirements

On January 19, 2017, the Department of Health and Human Services (DHHS) and 15 other agencies published revisions to the Common Rule, and some of the most significant changes are aimed at enhancing the informed consent process for research (Sugarman 2017).

In order to give institutions additional time to prepare, implementation of most of the revisions was delayed until January 21, 2019. The compliance date for cooperative research (and use of a central or single IRB) has not changed and is still January 20, 2020.

When the revised Common Rule goes into effect, the definition of human subject will change:

Human subject “means a living individual about whom an investigator (whether professional or student) conducting research: (i) Obtains information or biospecimens through intervention or interaction with the individual, and uses, studies, or analyzes the information or biospecimens; or (ii) Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens.” §__.102.e

The revised rule includes new requirements regarding the information that must be provided as part of the informed consent process:

  • The rule specifies that the informed consent process must begin with a summary of the main reasons why a person might or might not want to participate in the research.
    • It must start with “a concise and focused presentation of the key information that is most likely to assist a prospective subject or legally authorized representative in understanding the reasons why one might or might not want to participate in the research.” This part of the informed consent must be organized and presented in a way that facilitates comprehension. §__.116.a.5.i
  • Key information that is most important to the subject and likely to help a potential subject (or their legally authorized representative) make a decision about participation must be provided. The approach should emphasize fostering overall understanding and comprehension.
    • “Informed consent as a whole must present information in sufficient detail relating to the research, and must be organized and presented in a way that does not merely provide lists of isolated facts, but rather facilitates the prospective subject’s or legally authorized representative’s understanding of the reasons why one might or might not want to participate.” §__.116.a.5.i
  • The prospective subject should also be provided with an opportunity to discuss the information

There are also additional disclosure requirements, including some for research that involves the collection of identifiable private information or identifiable biospecimens. There must either be

  • A statement that identifiers might be removed so the information or biospecimens could be used for future research, OR
  • A statement that, even if identifiers are removed, the information or biospecimens will not be used or distributed for future research studies. §__.116.b.9

To facilitate transparency, at least one IRB-approved version of the consent form for a given clinical trial must be posted on a publicly available federal website “after the trial is closed to recruitment and no later than 60 days after the last study visit by any subject.” §__.116.h

Resources

Proposed revisions to the Common Rule.

For detailed information on the theoretical and regulatory foundations of informed consent and what should be disclosed as part of the consent process, see the white paper: Informed Consent.

REFERENCES


Sugarman J. 2017. Examining provisions related to consent in the revised Common Rule. Am J Bioeth. 17:22-26. doi:10.1080/15265161.2017.1329483. PMID: 28661754.


Version History

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published August 25, 2017


Using Electronic Health Record Data in Pragmatic Clinical Trials


Section 8

Assessing Outcomes

Because people receive health care from a variety of sources, it is often necessary to collect data from multiple providers in order to adequately determine whether patient outcomes have changed as a result of the intervention. Some considerations when defining outcomes based on the EHR are discussed in the Living Textbook Chapter Choosing and Specifying Endpoints and Outcomes. Key questions from that chapter include:

  • Is the outcome medically significant such that a patient would seek care?
  • How will the endpoint be medically attended or documented?
  • Does it require hospitalization?
  • Is the treatment for the outcome generally inpatient or outpatient?
    • Outpatient events may not be coded very specifically in the EHR or claims.
  • What is the intensity of medical care?
    • If high, as with a myocardial infarction, then there will be a clear record in claims and/or EHR data.
    • If low, as with a gout flare, there may or may not be a record of the event. A solution to this problem is to use a PRO and reach out to the participant at specified intervals.

For more, read the full chapter.

Strategies to mitigate the incomplete capture of patient health care services and follow-up include:

  • Patient-targeted prospective (new) data collection, such as online or telephone surveys regarding patient-reported or patient-centered outcomes, using dedicated research staff
  • Linkage to other sources, such as insurance claims

Collecting Patient-Reported Outcomes in the EHR

Investigators from six of the NIH Pragmatic Trials Collaboratory’s PCTs shared lessons learned from challenges they encountered while collecting patient-reported outcome (PRO) measures during their trials and the tactics used to mitigate them.

PRO measures reflect meaningful aspects of health and provide information about outcomes that are experienced uniquely by the patient, such as pain intensity, fatigue, and satisfaction with social roles. The authors of a paper on using PROs in the EHR present case examples for each of the challenges encountered (Zigler et al 2024), as summarized below.

Case Examples

  • Healthcare systems do not collect the necessary PRO measures for research.
    • Recommendation: Realize differences in system priorities and facilitate discussion within the healthcare systems to understand how changes would impact their work.
  • Healthcare systems are complex and often have “information overload.”
    • Recommendation: The EHR should display only relevant, actionable information to providers rather than raw PRO measure data.
  • Healthcare systems have unique processes, cost structures, and timelines for prioritization. Additionally, health system IT timelines often don’t match grant timelines.
    • Recommendation: Create temporary solutions for gathering the necessary PRO data before integration with the EHR.
  • Data entry is an additional burden on clinical staff, and care teams may not see the benefit.
    • Recommendation: Utilize electronic PRO measures as much as possible to lift some of the data entry burden from care teams and demonstrate their value by using them during specific encounters.
  • Scores must be interpretable; clinicians don’t always know what is clinically significant.
    • Recommendation: Ensure the PRO measures are meaningful, and the results can be interpreted by clinicians and patients alike.
  • Low adoption and reach of technologies such as personal health records (PHRs) in low-resource settings such as safety-net community health clinics (CHCs).
    • Recommendation: Utilize more familiar technologies, such as bidirectional SMS messaging that can connect patients to health services.
  • PRO measures are typically chosen based on their specific intended use, so the “best” PRO measure for an embedded pragmatic clinical trial would likely be less optimal in a clinical care setting.
    • Recommendation: Clinicians and patients should work together to identify the PRO measures best suited to support their individual treatment goals.

The authors of this paper suggest that these barriers require study teams to use separate data collection systems or integrate externally collected PRO data into the electronic health record.

“When using patient-reported outcome measures for embedded pragmatic clinical trials investigators must make important decisions about whether to use data collected from the participating health system’s electronic health record, integrate externally collected patient-reported outcome data into the electronic health record, or collect these data in separate systems for their studies” (Zigler et al 2024).

Case Example: Assessing Outcomes

The goal of the Collaborative Care for Chronic Pain in Primary Care (PPACT) study was to enable patients to adopt self-management skills for chronic pain, limit use of opioid medications, and identify factors amenable to treatment in the primary care setting. Investigators needed patient-reported outcome (PRO) data for their primary endpoints. The study was conducted in three distinct regions of Kaiser Permanente: the Northwest, Hawaii, and Georgia. Investigators determined that the PRO data collected via standard clinical practices in each region were not sufficient to meet the needs of the project. To address this, project leadership worked with national Kaiser to create buy-in for using a common instrument across the regions, and local IT then built it within each region. In addition, a multi-tiered approach was developed to supplement the clinically collected PRO data at 4 project-required time points (3, 6, 9, and 12 months). Two tiers were within the clinical system: a secure email from the EHR was sent with an attached survey, followed by an automated interactive voice recognition phone call. A follow-up phone call by research staff was necessary to maximize data collection at each time point. These follow-up calls were consistent with the standard clinical practice of having medical assistant staff follow up with patients via phone calls.

While data linkage may appear to be a ‘pragmatic’ option for including data from multiple providers, there is still no guarantee that all patient outcome data can be accessed and included in the PCT. Linkage with other data sources fills in gaps only to the extent that those sources can provide complete data on care provided to the trial population throughout the given study time period. This may not always be true—especially for transient populations or areas where access and benefits for health care are variable and dynamic.

Endpoints

EHR data can be used to construct outcome measures and endpoint definitions for pragmatic research. The AHRQ-sponsored Outcome Measurement Framework provides specifications and guidance to support standard approaches for querying heterogeneous and highly granular EHR data for calculated and derived measures that can be used to assess patient outcomes and study endpoints (Leavy et al 2019).
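As a simplified illustration of how a derived endpoint might be computed from granular EHR data, the sketch below flags whether any qualifying diagnosis code appears during a follow-up window. The code set, record fields, and window length are hypothetical choices for illustration, not part of the Outcome Measurement Framework; a real endpoint definition would be validated against local coding practices.

```python
from datetime import date

# Hypothetical ICD-10-CM code set for an acute myocardial infarction endpoint
MI_CODES = {"I21.0", "I21.1", "I21.2", "I21.3", "I21.4", "I21.9"}

def had_endpoint(encounters, index_date, followup_days=365):
    """Derive a binary endpoint: any qualifying diagnosis within follow-up."""
    for enc in encounters:
        days_out = (enc["date"] - index_date).days
        if 0 <= days_out <= followup_days and enc["dx_code"] in MI_CODES:
            return True
    return False

# Example encounter records as they might be extracted from an EHR
encounters = [
    {"date": date(2024, 3, 2), "dx_code": "I10"},    # hypertension visit
    {"date": date(2024, 7, 9), "dx_code": "I21.4"},  # NSTEMI admission
]
print(had_endpoint(encounters, index_date=date(2024, 1, 1)))  # prints True
```

In practice, queries like this must also handle the heterogeneity discussed throughout this chapter: multiple coding systems, rule-out codes, and events documented only in free text.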

Longitudinal Data Linkage

Compounding the difficulties described above, care systems are highly fragmented, and the EHR from one system will lack information about care a patient received outside that system. To fully capture all the care that a patient receives—complete longitudinal data—linking research data to insurance claims or other real-world data sources may be necessary. For example, if the outcome being measured is myocardial infarction, the system’s EHR will only capture information about the event if the patient was treated for the myocardial infarction at a participating health system. If the patient travels and is treated in a different facility, the only way to capture this information is through linkage with insurance claims or data from the treating health system. The United States still lacks a universal patient identifier (Carpenter and Chute 1993), which means linkage efforts rely on techniques like privacy-preserving record linkage (PPRL) that match patients based on encrypted combinations of patient identifiers (e.g., first name + last name + date of birth + current zip code) (Marsolo et al 2023). The application of PPRL methods in research is often referred to as “tokenization” because the methods generate encrypted tokens that are used to match patients across sources. In recent years, vendors have begun to provide tokenization services that allow trial data to be linked with real-world data sources such as EHRs and claims. Patient consent is usually required, and there are governance considerations in accessing the linked sources, but tokenization remains an option for obtaining additional information on participants.
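To make the tokenization idea concrete, the sketch below generates a token as a keyed hash over a normalized combination of identifiers. This is a simplified illustration only: commercial PPRL vendors use their own proprietary, more robust schemes, and the key, field choices, and normalization rules here are hypothetical.

```python
import hmac
import hashlib

# Hypothetical shared secret; in practice a trusted third party holds the key
# so no single data partner can reverse or regenerate tokens on its own.
SECRET_KEY = b"data-sharing-agreement-key"

def normalize(value: str) -> str:
    """Reduce formatting variation (case, whitespace) before hashing."""
    return "".join(value.lower().split())

def make_token(first, last, dob, zip_code):
    """Build one privacy-preserving token from a fixed identifier combination."""
    cleartext = "|".join(normalize(v) for v in (first, last, dob, zip_code))
    return hmac.new(SECRET_KEY, cleartext.encode(), hashlib.sha256).hexdigest()

# Records from two sources match when their tokens are equal, without
# either side exchanging the underlying identifiers.
ehr_token = make_token("Ana", "Lopez", "1970-01-31", "27701")
claims_token = make_token("ANA ", "lopez", "1970-01-31", "27701")
print(ehr_token == claims_token)  # prints True
```

Real deployments typically generate several tokens per patient from different identifier combinations, so that a typo in one field does not prevent all matches.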

Even with tokenization techniques, there are issues to consider with linkage. For instance, although claims data may indicate that care was provided for a given patient for a certain condition, they might not contain the medical detail needed for a particular study (e.g., vital signs such as blood pressure, or laboratory results). Also, if a patient does not have insurance, loses coverage, or has inadequate coverage, there may be no claims record of their visits in the sources available to link. Other questions to consider include:

  • Is there a need for vital statistics or mortality data from a state health department (e.g., birth data, death data, cancer repositories), or external claims data repositories?
  • Are data from organizations such as the state or insurers needed?

The challenge of getting claims data is described in the example below.

Case Example: Longitudinal Data Linkage

The Aspirin Dosing: A Patient-Centric Trial Assessing Benefits and Long-term Effectiveness (ADAPTABLE) study was a large, pragmatic trial designed to determine the optimal dose of maintenance aspirin for patients with coronary artery disease, and enrolled approximately 15,000 patients (Jones et al 2021). Patients were referred to the study by their physicians and enrolled through an online portal, which also provided online consent and randomization. Each patient was given a unique identifier in the EHR, and data were collected during routine care.

However, to accurately assess outcomes—such as myocardial infarction, mortality, hospitalizations—and capture longitudinal information about all the care that each patient receives, claims data needed to be linked to study data.

How claims data were linked in ADAPTABLE depended on the type of insurer.

Centers for Medicare and Medicaid Services (CMS)

Patients on Medicare provided the last four digits of their Social Security number, date of birth (DOB), and sex through the consent form; this information enabled linkage of the majority of participants to CMS beneficiary files. Without the explicit consent provided through the online portal, in order to link the CMS data to data in the EHR, health systems would have needed to allow access to a patient’s date of birth, Medicare ID, and sex, which are considered protected health information (PHI).

The ADAPTABLE investigators, with funding from PCORI, engaged two large, national insurers (Humana and Blue Cross Blue Shield Anthem) to support record linkage for participating members.

Some issues investigators had with linking to large insurers are described below.

  • Health insurance companies do not collect Social Security numbers, and the team needed both the group number and the member number to accurately identify individuals.
  • Some insurers required modification of consent language to include explicit authorization for the insurer to release certain records.
  • Large, national insurers often have subsidiaries—smaller groups with different names—even though they all operate under the same umbrella company.

For patients with other types of insurance, the follow-up strategy was multi-tiered:

  • The participant was asked to log in to the portal and enter information
  • If the patient did not return to the portal, the call center at the Duke Clinical Research Institute called for follow-up.

All of the above are complicated topics that require a set of multi-disciplinary experts with solutions that are customized to the specific trial (scientific question, study aims, and specific research settings). Because of these complexities, EHRs are rarely sufficient by themselves to support the needs of PCTs. Even though EHRs can be extremely valuable resources for clinical trials (and for some trials, EHRs may be essential), there are many steps and factors to consider, as described in previous sections.


REFERENCES


Carpenter PC, Chute CG. 1993. The Universal Patient Identifier: a discussion and proposal. Proc Annu Symp Comput Appl Med Care. 49–53. PMID: 8130521.

Dusetzina SB, Tyree S, Meyer A-M, Meyer A, Green L, Carpenter WR. 2014. Linking Data for Health Services Research: A Framework and Instructional Guide. (Prepared by the University of North Carolina at Chapel Hill under Contract No. 290-2010-000141.) AHRQ Publication No. 14-EHC033-EF. Rockville, MD: Agency for Healthcare Research and Quality. www.effectivehealthcare.ahrq.gov/reports/final.cfm. PMID: 25392892.

Jones WS, Mulder H, Wruck LM, et al. 2021. Comparative Effectiveness of Aspirin Dosing in Cardiovascular Disease. N Engl J Med. 384:1981–1990. doi: 10.1056/NEJMoa2102137. PMID: 33999548.

Leavy MB, Schur C, Kassamali FQ, et al. 2019. Development of Harmonized Outcome Measures for Use in Patient Registries and Clinical Practice: Methods and Lessons Learned. Agency for Healthcare Research and Quality (AHRQ). https://effectivehealthcare.ahrq.gov/topics/registry-of-patient-registries/standardized-library. Accessed March 5, 2022. doi:10.23970/AHRQEPCLIBRARYFINALREPORT.

Marsolo K, Kiernan D, Toh S, et al. 2023. Assessing the impact of privacy-preserving record linkage on record overlap and patient demographic and clinical characteristics in PCORnet®, the National Patient-Centered Clinical Research Network. J Am Med Inform Assoc. 30(3):447-455. doi: 10.1093/jamia/ocac229. PMID: 36451264.

Zigler CK, Adeyemi O, Boyd AD, et al. 2024. Collecting patient-reported outcome measures in the electronic health record: lessons from the NIH Pragmatic Trials Collaboratory. Contemp Clin Trials. 136:107426. doi:10.1016/j.cct.2023.107426.


Version History

October 7, 2025: Updated text as part of annual review (changes made by K. Staman).

April 16, 2024: Added information from Zigler et al. paper (changes made by K. Staman).

August 26, 2022: Updated text as part of annual update (changes made by K. Staman).

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

November 30, 2018: Updated text as part of annual update (changes made by K. Staman).

Published August 25, 2017


Using Electronic Health Record Data in Pragmatic Clinical Trials


Section 10

Additional Resources

  • Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products
    • This guidance is part of the FDA’s Real-World Evidence (RWE) program and applies to clinical studies that use real-world data (RWD) sources, such as information from routine clinical practice, to derive RWE. Its purpose is to provide sponsors, researchers, and other interested stakeholders with considerations when proposing to use EHRs or medical claims data in clinical studies to support a regulatory decision on effectiveness or safety.
  • The Observational Health Data Sciences and Informatics (OHDSI, pronounced "Odyssey") program
    • OHDSI is a multi-stakeholder, interdisciplinary collaborative designed to bring out the value of health data through large-scale analytics. The network is experienced in the use of clinical data for research and brings in-depth understanding of clinical and administrative data from many different organizations, together with rigorous observational research methods. OHDSI develops and shares tools for assessing data quality, transforming data to reference data standards, and visualizing and analyzing data distributed across many organizations. Findings and recommendations from a recent workshop describe the significant challenges of, and recommendations for, using data from electronic health records and other non-traditional data sources to inform the development and evaluation of medications.
  • User’s Guide to Computable Phenotypes
    • This document provides a practical framework to help physicians, clinical researchers, and informaticians evaluate published phenotype algorithms for reuse for various purposes. The framework is divided into three phases, aligned with expected user roles: overall assessment, clinical validation, and technical review.
  • Key Issues in Extracting Usable Data from Electronic Health Records for Pragmatic Clinical Trials
    • A working document from the NIH Collaboratory Biostatistics/Study Design Core.
  • Principles and Practice of Clinical Research
    • This book provides input from experts at the NIH on the principles and practice of clinical research.
  • ADAPTABLE tools for using patient-reported outcomes
    • ADAPTABLE Supplement Report: Patient-Reported Health Data and Metadata Standards in the ADAPTABLE Study. A summary of patient-reported health data and metadata standards for the ADAPTABLE (Aspirin Dosing: A Patient-centric Trial Assessing Benefits and Long-Term Effectiveness) trial.
    • LOINC ADAPTABLE patient-reported outcome set. LOINC (Logical Observation Identifiers Names and Codes) provides reusable standards for clinical information in electronic reports.
    • Reference material for the patient-reported item set from ADAPTABLE in LOINC version 2.64. A GitHub repository for reference materials and slides used in the development of the ADAPTABLE item set.
News

  • June 7, 2018: NIH Releases First Strategic Plan for Data Science

Grand Rounds

  • June 29, 2018: Policy & Priorities: Rethinking University Research with State Data (Aaron McKethan, PhD)
  • April 18, 2018: OHDSI: Drawing Reproducible Conclusions from Observational Clinical Data (George Hripcsak, MD, MS)
  • April 27, 2018: Expanding Use of Real-World Evidence: A National Academies Workshop Series (Greg Simon, MD)
  • March 23, 2018: Data Science in the Era of Data Ubiquity (Robert Califf, MD)
  • December 1, 2017: Providing a Shared Repository of Detailed Clinical Models for All of Health and Healthcare (Stanley Huff, MD)
  • October 20, 2017: Automated Public Health Surveillance Using Electronic Health Record Data (Michael Klompas, MD)

Podcasts

  • July 3, 2018: Policy & Priorities: Rethinking University Research with State Data (Aaron McKethan, PhD)
  • April 9, 2018: Data Science in the Era of Data Ubiquity (Robert Califf, MD)

 



Version History

October 7, 2025: Updated text as part of annual review (changes made by K. Staman).

August 26, 2022: Added resources as part of annual update (changes made by K. Staman).

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

November 30, 2018: Added resources as part of annual update (changes made by K. Staman).

Published August 25, 2017


Using Electronic Health Record Data in Pragmatic Clinical Trials


Section 9

The Research Question Drives the Data Requirements

Using data from existing EHR systems (rather than prospectively collecting new data explicitly for the trial) generally entails trading control for efficiency. That loss of control often involves sacrifices, which may or may not be acceptable to the research team, depending on the study and its aims. To determine what trade-offs are acceptable for a given study, the investigator must begin with a very clear idea of the specific information needs of the trial (e.g., assessing baseline characteristics) and then assess how much study integrity would be lost by using existing records rather than collecting new data. If EHR data can support the study’s data requirements in a way that ensures the integrity of the data and subsequent results, then researchers should take steps to ensure that possible errors, biases, and variation can be identified and minimized. As with any research study, and consistent with Good Clinical Practice, investigators should clearly define and collect only the “minimum necessary” data for the trial and conduct quality assessments of all the data being used, usually at multiple timepoints. Ultimately, as in all research, the scientific research question provides the foundation that guides all decisions and activities of PCT design and implementation, including the appropriate and optimal use of EHR systems and data.



Version History

October 7, 2025: Updated text as part of annual review (changes made by K. Staman).

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

Published August 25, 2017


Using Electronic Health Record Data in Pragmatic Clinical Trials


Section 7

Implementing and Monitoring the Delivery of an Intervention

In PCTs, EHR systems might be used as part of the intervention, and/or the data might be used to target or apply the intervention. For example, a best practice alert or a checklist could be embedded into the EHR system and evaluated as part of the PCT. EHR data can also support the important process of monitoring the fidelity of the intervention during the conduct of the trial. An ongoing monitoring plan to ensure that data about delivery of the intervention are being captured is very important. In the case of multi-site research, investigators will need to determine whether the intervention can be applied in a clinically equivalent way.

  • Is it possible to target and assess the intervention across multiple sites?

Because clinical systems will never be able to capture the data elements needed to answer all research questions, standardized approaches may be needed to augment EHR systems with additional data collection and research add-ons (that are equivalent across sites). Researchers can opt to add new data elements to existing EHR systems, with the caveat that organizations move slowly and many stakeholders (outside of the PCT) will need to approve and support the addition. Other modes of data collection (e.g., laptops for web-based forms) can be considered, as can dedicated research information systems. Sampling might also be performed to identify and fill gaps in data collection. Not only is it a challenge to get modified records designed to document implementation of an intervention into the EHR, but even when leadership allows this, line staff and providers may not actually use the new records to document their activities. For example, in PROVEN, because the documented rate of showing the videos in a partner facility was quite low, a more aggressive approach was undertaken to stimulate staff to show the video. The approach revealed that, not infrequently, the video had been shown and patients’ advance directives were changed, but neither of these outcomes was documented in the EHR.

For PCTs, the act of randomization might mitigate at least some of the variation in data quality and thereby reduce the impact of the different sources of error mentioned above. However, the assumption that, with proper randomization, any and all confounders will be randomly distributed across groups or clusters is not always correct. Additionally, while PCTs by nature strive to collect as little new data as possible, it is often necessary to prospectively collect new process data on whether individuals are in the control or intervention group in order to ensure completeness and fidelity to the intervention. Researchers, informaticists, and healthcare administrators often must work together to support the collection of this type of process data, which is critical for PCTs (Marsolo 2013; Richesson et al 2017). Further, process data can allow pragmatic trials to identify and adapt to external forces that threaten the integrity of the PCT design. Reporting data on where the study was done and the characteristics of the populations may also help readers assess the generalizability of the trial results to other populations (Kahn et al 2015; Zozus et al 2014).


REFERENCES


Kahn MG, Brown JS, Chun AT, et al. 2015. Transparent reporting of data quality in distributed data networks. EGEMS (Wash DC). 3:1052. doi:10.13063/2327-9214.1052. PMID: 25992385.

Marsolo K. 2013. Informatics and operations--let’s get integrated. J Am Med Inform Assoc. 20:122–124. doi:10.1136/amiajnl-2012-001194. PMID: 22940670.

Richesson RL, Green BB, Laws R, et al. 2017 Mar 14. Pragmatic (trial) informatics: a perspective from the NIH Health Care Systems Research Collaboratory. J Am Med Inform Assoc. doi:10.1093/jamia/ocx016. PMID: 28340241.

Zozus MN, Hammond WE, Green BB, et al. 2014. Assessing Data Quality for Healthcare Systems Data Used in Clinical Research. https://dcricollab.dcri.duke.edu/sites/NIHKR/KR/Assessing-data-quality_V1%200.pdf. Accessed Aug 14, 2017.


Version History

October 7, 2025: Updated text as part of annual review (changes made by K. Staman).

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

November 30, 2018: Updated text as part of annual update (changes made by K. Staman).

Published August 25, 2017


Using Electronic Health Record Data in Pragmatic Clinical Trials


Section 6

Estimating and Identifying the Study Population and Assessing Baseline Prognostic Characteristics

Retrospective EHR data may be used during the planning of a pragmatic trial to estimate the number of eligible patients, to estimate rates of outcome events among those eligible, or to estimate clustering or intraclass correlation to inform power or sample size calculations. The same data may be used prospectively during trial recruitment or enrollment to identify eligible patients, assess baseline characteristics, or identify randomization clusters. For all of these activities—defining the study population and assessing baseline characteristics—investigators will need to know what fields are used in the EHR, what the sources of the data are, and why and how the data were collected. This information is also necessary for outcome data, and we discuss this in greater detail below. In particular, it is important to identify which biases are inherent in the data based upon their source. For example, healthcare data sets only include people who seek and receive medical care. Many people with the condition of interest may not appear in the data set because they lack insurance or access to that health center (e.g., cancer center data are limited to patients who have a diagnosis and care plan with the center; people who are undiagnosed or have early diagnoses might not be included), or because they simply did not seek care during the study timeframe.
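As a rough sketch of the kind of planning calculation that retrospective EHR data can support, the snippet below estimates the intraclass correlation (ICC) from clustered outcome data using the standard one-way ANOVA (moment) estimator, then converts it to the usual design effect for a cluster randomized trial. The function names and example values are illustrative only.

```python
from collections import defaultdict

def estimate_icc(values, clusters):
    """One-way ANOVA (moment) estimator of the intraclass correlation."""
    groups = defaultdict(list)
    for v, c in zip(values, clusters):
        groups[c].append(v)
    k, n_total = len(groups), len(values)
    grand_mean = sum(values) / n_total
    ssb = ssw = 0.0
    for obs in groups.values():
        m = sum(obs) / len(obs)
        ssb += len(obs) * (m - grand_mean) ** 2     # between-cluster sum of squares
        ssw += sum((v - m) ** 2 for v in obs)       # within-cluster sum of squares
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    # Adjusted average cluster size for unequal clusters
    n0 = (n_total - sum(len(obs) ** 2 for obs in groups.values()) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

def design_effect(mean_cluster_size, icc):
    """Factor by which a cluster randomized trial's sample size must grow."""
    return 1 + (mean_cluster_size - 1) * icc

# Illustrative: with 20 patients per clinic and an ICC of 0.05, the trial
# needs roughly twice as many patients as an individually randomized design.
print(round(design_effect(20, 0.05), 2))  # prints 1.95
```

Even a modest ICC can inflate sample size requirements substantially, which is why estimating it from retrospective data before finalizing the design is worthwhile.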

Health systems collect data for a number of different reasons: documentation of clinical events, payment/reimbursement, quality improvement, etc. The data may or may not be structured, and data collection workflows may differ across sites. Many people assume that EHR data remain consistent over time, but this is rarely—if ever—the case. EHR system upgrades, changing workflows and EHR interfaces, clinicians' autonomy to implement different processes, the availability of charting and abstraction support, and organizational changes can all affect data over time, and any PCT design must account for them. Coding support tools implemented for business purposes may influence the recording of diagnoses and procedures, and these influences often change over time.

Case Example: Unstructured Data and Varying Sources of Data

The Trauma Survivors Outcomes and Support (TSOS) study, an NIH Pragmatic Trials Collaboratory study, was developed to coordinate care and improve outcomes for trauma survivors with post-traumatic stress disorder (PTSD) and comorbidity and was conducted at 25 US trauma centers. The study used EHR data collected during the routine delivery of trauma care to identify injury patterns of enrolled participants. Clinical sites generally describe traumatic injuries in free text on admission, or pick one injury and use it to select an ICD code. This admission-time EHR information may not accurately reflect the true burden of injury or psychiatric comorbidity that is eventually diagnosed over the course of the hospitalization. To improve completeness of data for TSOS, data from local trauma registries (using data definitions published by the National Trauma Data Bank) were used. These registry data are often manually abstracted from the EHR and entered into the trauma registry; for example, handwritten results that appear as scanned forms in the EHR are manually entered as structured text in the trauma registry. The TSOS study team collected the trauma registry reports for enrolled patients and manually linked and reconciled these registry data with the data already collected at the time of admission. Patient identifiers permit positive linkage, but—as is common in multisite IT projects—variations in site IT configurations and resources meant that the linkage was best accomplished through manual work by the research team. Baseline data were collected in real time upon patient recruitment by the TSOS study team; trauma registry data were collected months later from each of the 25 participating sites.
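The linkage-and-reconciliation logic described above can be sketched in miniature, assuming hypothetical field names and records; this is an illustration of the general approach, not the TSOS team's actual code:

```python
# Sketch: linking manually abstracted trauma registry records to
# admission-time EHR records by a shared patient identifier, preferring the
# registry's structured injury codes when both sources are present.
# All field names and records below are hypothetical.

def reconcile(ehr_records, registry_records):
    """Each input maps patient_id -> {"injury_codes": [...], ...}.
    Returns merged records, keeping registry injury codes when present."""
    merged = {}
    for pid, ehr in ehr_records.items():
        record = dict(ehr)
        registry = registry_records.get(pid)
        if registry and registry.get("injury_codes"):
            # Registry data are manually abstracted and typically more
            # complete than admission-time free text.
            record["injury_codes"] = registry["injury_codes"]
            record["source"] = "registry"
        else:
            record["source"] = "ehr_admission"
        merged[pid] = record
    return merged

ehr = {"p1": {"injury_codes": ["S06.0"]}, "p2": {"injury_codes": []}}
reg = {"p1": {"injury_codes": ["S06.0", "S22.4"]}}
print(reconcile(ehr, reg))
```

In practice, as the case example notes, site-to-site variation in identifiers and IT configurations often means this reconciliation cannot be fully automated and requires manual review.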


When considering using EHR data for a trial, researchers should ask a series of questions about the patient population and the health system features that may affect the completeness or quality of documentation and data collection for each EHR data source under consideration.

  • What patients are included and excluded from the data source?
  • Is bias introduced by the study of specific populations (e.g., insured vs uninsured)?
  • Are any of the needed data derived or calculated from other data? If so, who does this and at what point in the data lifecycle does this happen?
  • Are there standardized data collection and documentation practices or are clinicians or clinical sites allowed the freedom to implement their own processes?
  • Are the data structured or unstructured?
  • Are the data of interest captured or generated in multiple places in the EHR? If so, which source is best to address the study question?

Health system processes and financial incentives can influence how care events are represented in EHRs or claims. Such incentives may affect data completeness and can introduce confounding, because documentation practices vary with health system priorities. Some examples include:

  • Payors may require that certain data be collected for reimbursement: Federally Qualified Health Center (FQHC) clinics are required by the Health Resources and Services Administration (HRSA) to collect elements such as Federal Poverty Level and the patient’s primary language.
  • Research or quality improvement projects may encourage collection of certain data while funding is available, yielding high-quality data; when the funding expires, data quality may decline.
  • In capitated financing arrangements, such as Medicare Advantage, risk-adjustment policies may create financial incentives to identify and record particular diagnoses.
  • In fee-for-service financing arrangements, financial incentives may increase recording of particular diagnoses, procedures, or services.
  • Provider productivity incentives or performance improvement initiatives might incentivize use of some procedure codes over others.

These processes or incentives can vary across health systems or across time within a health system. For PCTs, it is particularly important to identify incentives or business processes that might influence the actual care delivered (e.g., formulary policies that encourage prescription of one drug over another) and those that might influence the recording or representation of care (e.g., risk-adjusted reimbursement favoring one depression diagnosis over another).

Integrating Data from Heterogeneous Systems

If the planned study is a multi-site trial, the investigator must consider whether clinically “equivalent” populations can be identified at multiple sites. What assessments or validation plan can be used to ensure that sample populations at each site are clinically equivalent? How much heterogeneity is there between sites? To answer these questions, a researcher can compare summary data (e.g., counts, distributions) and examine clinical workflows across sites. Information about workflows can help explain problems with data quality and completeness, or even proactively alert a researcher to impending unexpected data issues. Ideally, sources of heterogeneity across study sites (clinics, health plans, etc.) should be explored both quantitatively (by comparing relevant indicators across sites during the planning phase) and qualitatively (by interviewing clinicians and managers about workflows and incentives).
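One simple quantitative heterogeneity check of the kind described above is to compare each site's event rate with the pooled rate and flag large deviations. The site names, counts, and three-standard-error threshold below are hypothetical illustrations, not data or thresholds from any study:

```python
import math

# Sketch: flag sites whose event rate deviates markedly from the pooled rate,
# as a screening step before qualitative follow-up on workflows.

def flag_outlier_sites(site_counts, threshold=3.0):
    """site_counts maps site -> (events, total patients). Returns sites whose
    event rate differs from the pooled rate by more than `threshold`
    approximate standard errors, with the rate and z-score for each."""
    events = sum(e for e, _ in site_counts.values())
    total = sum(t for _, t in site_counts.values())
    pooled = events / total
    flagged = []
    for site, (e, t) in site_counts.items():
        se = math.sqrt(pooled * (1 - pooled) / t)  # SE of a site-level proportion
        z = (e / t - pooled) / se
        if abs(z) > threshold:
            flagged.append((site, round(e / t, 3), round(z, 1)))
    return flagged

# Hypothetical per-site counts of an outcome event
counts = {"Site A": (120, 1000), "Site B": (95, 900), "Site C": (30, 1100)}
print(flag_outlier_sites(counts))
```

A flagged site is not necessarily wrong; as the text notes, the divergence may reflect a genuinely different population, a different documentation workflow, or a data-completeness problem, which is why the quantitative screen should be paired with qualitative review.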

Investigators will also need to communicate effectively and coordinate IT efforts with the business and healthcare organizations in multi-site trials. With embedded research and PCTs, researchers must engage stakeholders to work within systems that were built to optimize clinical operations, not research. There are sociopolitical challenges to obtaining, evaluating, and interpreting clinical data in PCTs; in fact, the governance issues around linking multiple data sources often prove far more problematic than the technical approaches to the linkage itself.

Case Example: Integrating Data From Heterogeneous Systems

The Pragmatic Trial of Video Education in Nursing Homes (PROVEN; NCT02612688) was conducted to determine whether showing advance care planning videos in nursing homes affects rates of hospitalization. PROVEN had two health system partners. While investigators benefited from both partners' use of the federally mandated Minimum Data Set (MDS) assessments for nursing home residents on Medicare and Medicaid, the sites used different EHR vendors and therefore had different non-mandatory assessment modules covering physician orders, the electronic Medication Administration Record (eMAR), nursing notes, transfer notes, social service records, and so on. This would have presented a problem had investigators sought to use data from one of these non-systematic sources as an outcome or independent variable, but there was also the problem of inconsistent or incomplete implementation of the various modules within a health system's EHR. For example, one nursing home partner had implemented the physician order module—where advance directives are located—in its EHR several years earlier. However, not all facilities that were supposed to adopt this module had actually implemented it, a fact established only after a deliberate facility-by-facility review of data completeness. Thus, before using the data from any facility, investigators first had to determine, based on the degree of use, whether a facility was using the record at all and then, based on the dates on which completed records appeared, when it started using the record. In the end, only a minority of the sites had what appeared to be usable data on advance directives.

To address the missing data problem and include the other facilities in the trial, investigators used an earlier version of the mandatory nursing home resident assessment that included fields noting residents' "code status." The current MDS 3.0 no longer captured this information because it was not necessarily maintained as a clinically meaningful field. One health system partner had incorporated physician order sets into its EHR, but only about one third of its facilities had instituted this feature. From the perspective of the PROVEN pragmatic trial, this meant roughly 30 intervention facilities and some 60 control facilities with advance directives in the physician order set. Because the number of facilities with this data feature was insufficient for the overall study, investigators could not restrict the study to these alone. Nonetheless, they were able to compare the effect of the intervention on the adoption of advance directives such as do-not-resuscitate (DNR) or do-not-hospitalize (DNH) orders in this subset of facilities.

Another example of heterogeneous data sources for PROVEN relates to the use of the CMS Virtual Research Data Center (VRDC) for monitoring mortality and hospital transfers. Data on hospital transfers, the primary outcome, were available only for fee-for-service beneficiaries because Medicare Advantage encounter data were not available in the VRDC. This is another example of how only a subset of the population (75% to 80%) had the relevant outcome data in the primary data source.


Version History

October 7, 2025: Updated text as part of annual review (changes made by K. Staman).

August 26, 2022: Updated text as part of annual update (changes made by K. Staman).

July 3, 2020: Minor corrections to layout and formatting (changes made by D. Seils).

November 30, 2018: Updated text as part of annual update (changes made by K. Staman).

Published August 25, 2017