Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials

Acquiring Real-World Data


Section 7

Gaining Permission to Use Real-World Data

Contributors

Keith A. Marsolo, PhD
Rachel Richesson, MS, PhD, MPH
W. Edward Hammond, PhD
Michelle Smerek, BS
Lesley Curtis, PhD
Luke Gelinas, PhD
Pearl O’Rourke, MD

Contributing Editors

Karen Staman, MS
Damon M. Seils, MA

In the United States, using patient data for research generally requires institutional review board (IRB) approval if the study is a clinical investigation that supports applications for research or marketing permits for products regulated by the US Food and Drug Administration (21 CFR Parts 50 and 56) or, more broadly, research involving human subjects conducted, supported, or otherwise subject to regulation by any federal department or agency (45 CFR Part 46 [the Common Rule]).

In addition, use of patient data for research is generally subject to federal and state privacy laws, notably the Health Insurance Portability and Accountability Act (HIPAA), which limits access to identifiable or protected health information (PHI) outside the context of treatment, payment, or healthcare operations. Use or disclosure of PHI for research generally requires either patient authorization or a waiver of the requirement for authorization approved by an IRB or privacy board.

Recent changes to the Common Rule have made it easier to reuse data collected as part of routine healthcare operations, including EHR data, for research purposes, with additional categories of studies now exempt from IRB oversight. However, to publish research involving data from human subjects, virtually all peer-reviewed journals require that the study have been reviewed and approved by an IRB or ethics board or have received a determination that the research is exempt from oversight or is not human subjects research (Zozus et al 2015). In all of these cases, investigators are advised to locate an appropriate IRB or ethics board before embarking on research using patient data, even if their institution does not require it.

Data From Healthcare Organizations

Most healthcare organizations have procedures in place that define the permissible internal uses of the data they collect and store. These routine uses typically fall into the categories of treatment, payment, or operations, which are consistent with HIPAA regulations. Examples include data access for members of the care team, information exchange for care transitions, data use in quality improvement projects, and administrative reporting for organizational management. Facilities that conduct research also have procedures in place for secondary use of these health data. Secondary use of data for research is governed by federal regulations, privacy laws, and procedures established by the facility's IRB, privacy board, or research compliance office. Research uses of internal PHI typically fall under HIPAA privacy protections, requiring either patient authorization or a partial or full waiver of the authorization requirement, which may be granted by an IRB or privacy board under certain conditions.

Additional contractual agreements and regulatory compliance are required when investigators want to use data from institutions with which they are not directly associated (for example, a university researcher who wants to use data from local community hospitals). This will almost certainly be the case when using healthcare data as part of a multicenter pragmatic clinical trial. It is important to understand the requirements of HIPAA, which is the relevant regulation for such data disclosures. HIPAA applies to all covered entities, defined as health plans, healthcare clearinghouses, and healthcare providers who electronically transmit any health information in connection with transactions for which the US Department of Health and Human Services (DHHS) has adopted standards.

HIPAA addresses both the internal use and the external disclosure of PHI. Disclosure of PHI, defined as sharing outside of the covered entity, is allowed without patient authorization only in certain controlled situations, including release to healthcare reimbursement or operations departments, individual patients, or regulatory authorities and for national priority purposes. Disclosure of PHI for research and some other purposes requires an authorization from each individual, an approved waiver of authorization, or the creation of a limited dataset (discussed below). To limit risk, many healthcare organizations will try to ensure that the data released are the “minimum necessary” to support a project. As a result, even in prospective studies for which investigators obtain patient consent and authorization, organizations may not wish to release more PHI than necessary.

A limited dataset, which is considered PHI, may contain identifiers including certain dates (for example, dates of hospital admission or discharge), gender, age, and elements of a patient's address (such as zip code, city, and state), but may not contain other more “direct” HIPAA identifiers, such as name, telephone number, or street address (National Institutes of Health 2003). For both limited datasets and datasets containing more PHI than allowed in a limited dataset, the recipient of the data must execute a data use agreement (DUA), which is a contractual arrangement for the transfer of PHI that describes the purposes for which the data can be used and prohibits reidentification (45 CFR 164.514). The DUA will also include language securing the data by specifying limits on its use and sharing, and in most cases will prohibit reidentification of patients or linking to other data from the patients. (If patient consent is obtained and a project explicitly involves linkage, this language would not apply.)
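As a rough illustration of the distinction drawn above, the following Python sketch flags requested fields that would not be permitted in a limited dataset. The column names and the `direct_identifiers_in` helper are hypothetical, and the set of direct identifiers contains only the three examples named in this section; it is not the complete regulatory list and is no substitute for review by an IRB or privacy office.

```python
# Hypothetical column names. Only the direct identifiers named in the text
# are listed here; the full regulatory list is longer.
DIRECT_IDENTIFIERS = {"name", "telephone_number", "street_address"}


def direct_identifiers_in(columns):
    """Return the requested columns that are direct identifiers, sorted by name."""
    return sorted(c for c in columns if c in DIRECT_IDENTIFIERS)


# Example request: dates, zip code, and age may appear in a limited dataset,
# but "name" may not.
requested = ["admission_date", "discharge_date", "zip_code", "age", "name"]
problems = direct_identifiers_in(requested)

if problems:
    print("Not permitted in a limited dataset:", ", ".join(problems))
```

A check like this is most useful early, when drafting the list of data elements for a DUA, so that direct identifiers are either dropped or justified before the request reaches the data holder.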

When approaching a healthcare organization for a DUA, a prospective researcher should be prepared to provide a detailed, precise statement of what data elements are required, from what sources, and over what period. In addition, the investigator must describe how the data will be used and transferred securely and provide a list of all external personnel who will be permitted to use the information. Timelines for working out DUAs between stakeholders at healthcare facilities and external investigators vary greatly; in our experience, intervals range from 6 months to more than 2 years.
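One way to keep track of the items listed above is to record the request in a small structure and check it for gaps before approaching the data holder. The sketch below is illustrative only: the `DataRequest` class, its field names, and the example values are assumptions, not a format required by any organization.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DataRequest:
    """Hypothetical checklist of what a DUA request should spell out."""
    data_elements: List[str] = field(default_factory=list)   # what data are needed
    sources: List[str] = field(default_factory=list)         # from which systems
    period_start: Optional[date] = None                      # over what period
    period_end: Optional[date] = None
    intended_use: str = ""                                   # how the data will be used
    transfer_method: str = ""                                # how they will be moved securely
    external_personnel: List[str] = field(default_factory=list)  # who will have access

    def missing_items(self) -> List[str]:
        """List the parts of the request that are still empty."""
        names = ("data_elements", "sources", "intended_use",
                 "transfer_method", "external_personnel")
        missing = [n for n in names if not getattr(self, n)]
        if self.period_start is None or self.period_end is None:
            missing.append("period")
        return missing


# Example draft with two items still to be filled in.
draft = DataRequest(
    data_elements=["admission_date", "discharge_date", "diagnosis_codes"],
    sources=["EHR"],
    period_start=date(2020, 1, 1),
    period_end=date(2023, 12, 31),
    intended_use="Outcome ascertainment for a pragmatic trial",
)
print(draft.missing_items())
```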

Of note, research with deidentified data is not considered to be research with human subjects and is not covered by HIPAA limitations.

HIPAA presents 2 approaches for covered entities to follow in creating deidentified datasets: Safe Harbor and Expert Determination.

The Safe Harbor method requires the removal of all 18 HIPAA identifiers and a determination by the entity sharing the data that they do “not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.” The identifiers include name, dates, address, Social Security number, phone and fax numbers, email addresses, biometric information, and other individually unique information (45 CFR 164.514(b)(2); Guidance on the Safe Harbor Method).

The Expert Determination method, by contrast, involves a determination by a qualified individual that the risk of reidentifying individuals from the dataset, either alone or in combination with other “reasonably available information,” is not more than “very small.” This approach can allow identifiers such as dates to be retained in the dataset, though certain variables may need to go through a transformation to minimize the risk of reidentification. Examples of such transformations include date shifting (shifting all dates by an offset, with a different random offset used for each patient), generalizing zip codes, and suppressing patient ages (DHHS 2012). Because the Safe Harbor approach is simpler, most deidentified datasets have historically been produced with it. The use of the Expert Determination method has gained favor in recent years as health systems and others within the healthcare industry have begun to license EHR datasets, aggregate and link them, and make deidentified versions of record-level data available for use by researchers and others (Noel and Bartelt 2023; Truveta 2025).
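The transformations mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated deidentification procedure: the record layout, function names, 365-day offset window, 3-digit zip prefix, and age threshold of 89 are all hypothetical choices, and in practice the parameters would be set by the expert making the determination.

```python
import random
from datetime import date, timedelta


def make_offsets(patient_ids, max_days=365, seed=None):
    """Assign each patient one random day offset in [-max_days, max_days]."""
    rng = random.Random(seed)
    return {pid: rng.randint(-max_days, max_days) for pid in patient_ids}


def shift_date(d, offset_days):
    """Shift a date by the patient's offset; intervals within a patient are preserved."""
    return d + timedelta(days=offset_days)


def generalize_zip(zip_code, keep_digits=3):
    """Keep only the leading digits of a zip code (assumed prefix length)."""
    return zip_code[:keep_digits]


def suppress_age(age, max_age=89):
    """Return None for ages above an assumed threshold, else the age unchanged."""
    return None if age > max_age else age


def transform(record, offsets):
    """Apply all three transformations to one record."""
    offset = offsets[record["patient_id"]]
    return {
        # In practice this would be a study-specific pseudonym, not a real ID.
        "patient_id": record["patient_id"],
        "admit_date": shift_date(record["admit_date"], offset),
        "discharge_date": shift_date(record["discharge_date"], offset),
        "zip": generalize_zip(record["zip"]),
        "age": suppress_age(record["age"]),
    }


records = [
    {"patient_id": "p1", "admit_date": date(2024, 1, 10),
     "discharge_date": date(2024, 1, 14), "zip": "27705", "age": 67},
    {"patient_id": "p2", "admit_date": date(2024, 3, 2),
     "discharge_date": date(2024, 3, 3), "zip": "27514", "age": 93},
]
offsets = make_offsets([r["patient_id"] for r in records], seed=0)
shifted = [transform(r, offsets) for r in records]
```

Because every date for a given patient moves by the same offset, within-patient intervals such as length of stay are unchanged, while calendar dates no longer match the source record. The offset table itself would need to be kept secure or destroyed, since it allows the shift to be reversed.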

Deidentified datasets are most appropriate for retrospective, observational studies, though the lack of identifiers such as dates of service and dates of death can prove problematic for certain analyses.

Data From Other Sources

Datasets managed by government agencies (such as the Centers for Medicare & Medicaid Services, the US Census Bureau, or the Environmental Protection Agency) will often have similar restrictions to those of healthcare organizations. IRB approval and DUAs may be required, and there may be requirements that any future publication cite the organization that provided the data. Similar processes may also be in place for groups that manage product, device, or disease registries.

CHAPTER SECTIONS

  1. Introduction
  2. Common Real-World Data Sources
  3. Data Formats
  4. Acquiring Electronic Health Record Data
  5. Acquiring Claims Data and CMS Research-Identifiable Files
  6. Acquiring Patient-Reported Data
  7. Gaining Permission to Use Real-World Data
  8. Methods of Access
  9. Case Study: The IMPACT-AFib Trial

Resources

Using Electronic Health Record Data in Pragmatic Clinical Trials
Living Textbook chapter from the NIH Pragmatic Trials Collaboratory's Electronic Health Records Core

REFERENCES

US Department of Health and Human Services (DHHS). 2012. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html. Accessed September 20, 2025.

National Institutes of Health. 2003. How can covered entities use and disclose protected health information for research and comply with the Privacy Rule? In: Protecting Personal Health Information in Research: Understanding the HIPAA Privacy Rule. NIH Publication Number 03-5388. https://privacyruleandresearch.nih.gov/pdf/HIPAA_Privacy_Rule_Booklet.pdf. Accessed September 24, 2025.

Noel A, Bartelt K. 2023. Cosmos: Real-world data powered by the healthcare community. J Soc Clin Data Manag. 3(S1):1-4. doi: 10.47912/jscdm.246.

Truveta. 2025. Truveta Data. https://www.truveta.com/truveta-data. Accessed September 20, 2025.

Zozus MN, Richesson RL, Hammond WE, Simon GE. 2015. Acquiring and Using Electronic Health Record Data. NIH Collaboratory Electronic Health Records Core. https://dcricollab.dcri.duke.edu/sites/NIHKR/KR/Acquiring%20and%20Using%20Electronic%20Health%20Record%20Data.pdf. Accessed September 24, 2025.


Version History

Published October 6, 2025

Citation:

Marsolo KA, Richesson R, Hammond WE, Smerek M, Curtis L, Gelinas L, O'Rourke P. Acquiring Real-World Data: Gaining Permission to Use Real-World Data. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/acquiring-real-world-data/gaining-permission-to-use-real-world-data-v2/. Updated December 3, 2025. DOI: 10.28929/285.

Reference in this Web site to any specific commercial products, process, service, manufacturer, or company does not constitute its endorsement or recommendation by the U.S. Government or National Institutes of Health (NIH). NIH is not responsible for the contents of any “off-site” Web page referenced from this server.
