ARCHIVED PAGE

Archived on October 6, 2025.

Acquiring Real-World Data

Section 7 Gaining Permission to Use Real-World Data – ARCHIVED

In the United States, using patient data for research generally requires institutional review board (IRB) approval if the study is a clinical investigation that supports applications for research or marketing permits for products regulated by the US Food and Drug Administration (21 CFR Parts 50 and 56) or, more broadly, if it is research involving human subjects that is conducted, supported, or otherwise subject to regulation by any federal department or agency (45 CFR Part 46 [the Common Rule]). Recent changes to the Common Rule have made it easier to reuse data collected as part of routine healthcare operations, including electronic health record (EHR) data, for research purposes, and additional categories of studies are now exempt from IRB oversight. However, to publish research involving data from human subjects, virtually all peer-reviewed journals require that the study have been reviewed and approved by an IRB or ethics board, or have received a determination that the research is exempt from oversight or is not human subjects research (Zozus et al 2015). Investigators typically cannot make an exemption or non-human-subjects determination on their own, so they are advised to locate an appropriate IRB or ethics board before embarking on research using patient data, even if their institution does not require it.

Data From Healthcare Organizations

Most healthcare organizations have procedures in place that define the permissible internal uses of the data they collect and store. Routine uses typically fall into the categories of treatment, payment, or operations. Examples include data access for members of the care team, information exchange for care transitions, data use in quality improvement projects, and administrative reporting for organizational management. Facilities that conduct research also have procedures in place for secondary use of these health data. Secondary use of data for research is governed by federal regulations and by procedures established by the facility's IRB or research compliance office.

Additional contractual agreements and regulatory compliance are required when investigators want to use data from institutions with which they are not directly associated (for example, a university researcher who wants to use data from local community hospitals). This will almost certainly be the case when using healthcare data as part of a multicenter pragmatic clinical trial. The Health Insurance Portability and Accountability Act (HIPAA) requires that covered entities and their business associates release protected health information (PHI) only in certain controlled situations, including release to healthcare reimbursement departments or operations, to individual patients, to regulatory authorities, for national priority purposes, with authorization from the individual, and as a limited dataset. The last 2 mechanisms are the ones primarily used when creating datasets for research that contain PHI.

Covered entities include health plans, healthcare clearinghouses, and healthcare providers who electronically transmit any health information in connection with transactions for which the US Department of Health and Human Services (DHHS) has adopted standards.

Under HIPAA, a research dataset is considered to be a deidentified dataset, a limited dataset, or a dataset containing more PHI than allowed in a limited dataset. Of the 3 types, deidentified datasets come with the fewest restrictions. Research using deidentified data is not considered to be human subjects research, and these datasets can often be obtained without additional usage agreements. DHHS has outlined 2 approaches that covered entities can follow to create deidentified datasets in accordance with HIPAA: the Safe Harbor method and the Expert Determination Method.

The Safe Harbor method involves the removal of all 18 types of HIPAA identifiers before the dataset is shared with an outside party. The Expert Determination Method involves verifying that the resulting dataset is statistically deidentified. Because the Safe Harbor method is simpler, most deidentified datasets have historically been produced with it. Given the growth in the use of real-world data for research and analytics in the life science and healthcare industries, however, the use of the Expert Determination Method has increased, and several companies now provide privacy frameworks and other services that can be used to attest that a dataset is deidentified. The Expert Determination Method is often used with projects that involve privacy-preserving record linkage. In such projects, a dataset containing hashed, encrypted PHI along with other deidentified clinical variables is demonstrated to be statistically deidentified. Multiple centers can generate similar datasets with the same privacy-preserving methods and link matching records to generate a longitudinal view of a patient's history.
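The linkage idea described above can be sketched in a few lines. This is a minimal illustration only, assuming each site holds first name, last name, and birth date and that all sites share a secret key; the field names, the key, and the example records are hypothetical. Producing tokens this way does not by itself make a dataset deidentified, since that conclusion would still rest on an expert's determination.

```python
import hashlib
import hmac

# Hypothetical secret shared by the participating sites (an assumption for
# this sketch; real projects manage keys under their own agreements).
SHARED_KEY = b"example-key-for-illustration-only"


def normalize(value):
    """Lowercase and keep only letters and digits so that differences in
    case, spacing, or punctuation do not prevent a match."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def make_token(record, key=SHARED_KEY):
    """Derive a keyed hash (HMAC-SHA256) from the normalized identifiers."""
    parts = (record["first_name"], record["last_name"], record["birth_date"])
    message = "|".join(normalize(p) for p in parts)
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records, key=SHARED_KEY):
    """Return records that carry only a token and clinical variables."""
    return [{"token": make_token(r, key), "clinical": r["clinical"]} for r in records]


def link(*sites):
    """Group clinical entries from all sites by matching token."""
    linked = {}
    for site in sites:
        for rec in site:
            linked.setdefault(rec["token"], []).append(rec["clinical"])
    return linked


# Made-up example records, not real patient data.
site_a = tokenize([
    {"first_name": "Ana", "last_name": "Lopez", "birth_date": "1980-02-03",
     "clinical": {"site": "A", "hba1c": 7.1}},
])
site_b = tokenize([
    {"first_name": "ANA ", "last_name": "Lopez", "birth_date": "1980-02-03",
     "clinical": {"site": "B", "hba1c": 6.8}},
    {"first_name": "Ben", "last_name": "Okafor", "birth_date": "1975-11-30",
     "clinical": {"site": "B", "hba1c": 5.9}},
])
linked = link(site_a, site_b)
```

Here `linked` groups the two entries for the same person under one token, even though neither tokenized dataset contains a name or birth date. A keyed hash is used rather than a plain hash so that someone without the key cannot rebuild tokens from guessed identifiers.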

A limited dataset contains more identifiers than a deidentified dataset, including dates and elements of a patient's address (such as zip code and state), but the other HIPAA identifiers must still be removed or masked. To receive a dataset that contains more PHI than is allowable in a limited dataset, investigators are almost always required to obtain patient consent. For both limited datasets and datasets containing more PHI than allowed in a limited dataset, the recipient of the data must agree to a data use agreement (DUA) that describes the purpose of the research and the proposed uses for the data. The DUA will also include language on securing the data and in most cases will prohibit reidentifying patients or linking to other data about the same patients (if patient consent is obtained and a project explicitly involves linkage, this language would not apply). Thus, use of healthcare data from organizations requires both a contractual agreement with the organization and compliance with HIPAA with respect to use and disclosure of the data. Recipients of deidentified datasets may also need to agree to a DUA, though with fewer restrictions (for example, language limited to securing the data and not attempting to reidentify patients).

Safe Harbor: A method of deidentifying health information that involves removing 18 identifiers from the data before sharing them with an outside party. The identifiers include name, address, Social Security number, phone and fax numbers, email addresses, biometric information, and other individually unique information (45 CFR 164.514(b)(2); Guidance on the Safe Harbor Method).
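As a rough illustration of what Safe Harbor processing can look like for a single record, the sketch below drops a few direct identifiers and generalizes dates, ZIP codes, and advanced ages in the way 45 CFR 164.514(b)(2) describes. The field names and the `LOW_POPULATION_ZIP3` set are hypothetical placeholders, and the code covers only a handful of the 18 identifier categories, so it should be read as a sketch under those assumptions rather than a compliant implementation.

```python
from datetime import date

# Hypothetical field names for a few direct identifiers. A real dataset
# would need review against all 18 categories in 45 CFR 164.514(b)(2).
DIRECT_IDENTIFIER_FIELDS = {"name", "street_address", "ssn", "phone", "fax", "email"}

# Placeholder value only: 3-digit ZIP prefixes whose area holds 20,000 or
# fewer people must be replaced with "000". The real list comes from
# census data and is not reproduced here.
LOW_POPULATION_ZIP3 = {"999"}


def safe_harbor_sketch(record):
    """Return a copy of the record with illustrative Safe Harbor steps applied."""
    # Remove the direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Dates tied to the individual: keep only the year.
    if "service_date" in out:
        out["service_year"] = out.pop("service_date").year

    # Ages over 89 are aggregated into a single "90 or older" category.
    if "age" in out and out["age"] > 89:
        out["age"] = "90+"

    # ZIP code: keep only the first 3 digits, or "000" for small areas.
    if "zip" in out:
        zip3 = out.pop("zip")[:3]
        out["zip3"] = "000" if zip3 in LOW_POPULATION_ZIP3 else zip3

    return out


# Made-up example record, not real patient data.
example = {
    "name": "Ana Lopez",
    "ssn": "000-00-0000",
    "email": "ana@example.org",
    "zip": "27705",
    "age": 93,
    "service_date": date(2019, 5, 17),
    "diagnosis_code": "E11.9",
}
deidentified = safe_harbor_sketch(example)
```

The loss of full service dates in `deidentified` is exactly the kind of constraint that can make deidentified data awkward for analyses that depend on timing.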

Data use agreement: A DUA is a contractual document for the transfer of PHI that describes the purposes for which the data can be used and prohibits reidentification (45 CFR 164.514).

Deidentified datasets are most appropriate for retrospective, observational studies, though the lack of identifiers like dates of service and date of death can prove problematic for certain analyses. In order to limit risk, many healthcare organizations will try to ensure that the data released are the “minimum necessary” to support a project. As a result, even in prospective studies where investigators have obtained patient consent, organizations may not wish to release more PHI than necessary.

When approaching a healthcare organization for a DUA, a prospective researcher should be prepared to provide a detailed, precise statement of what data elements are required, from what sources, and over what time period. In addition, the investigator must describe how the data will be used and transferred securely to the investigator and provide a list of all personnel who will be permitted to use the information. Timelines for working out DUAs between stakeholders at healthcare facilities and external investigators vary greatly; in our experience, intervals range from 6 months to more than 2 years.
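One way to prepare for that conversation is to keep the request in a structured form and check it for gaps before approaching the organization. The sketch below is a hypothetical checklist, not a format any organization is known to require; the class name, field names, and example values are assumptions chosen to mirror the items listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DataRequest:
    """Hypothetical summary of what an investigator brings to a DUA discussion."""
    data_elements: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    period_start: Optional[date] = None
    period_end: Optional[date] = None
    intended_use: str = ""
    transfer_method: str = ""
    personnel: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """List the items that are still undefined or inconsistent."""
        missing = []
        if not self.data_elements:
            missing.append("data elements")
        if not self.sources:
            missing.append("data sources")
        if self.period_start is None or self.period_end is None:
            missing.append("time period")
        elif self.period_end < self.period_start:
            missing.append("time period (end precedes start)")
        if not self.intended_use.strip():
            missing.append("intended use")
        if not self.transfer_method.strip():
            missing.append("secure transfer method")
        if not self.personnel:
            missing.append("personnel list")
        return missing


# Made-up draft request that is not yet complete.
draft = DataRequest(
    data_elements=["diagnosis codes", "laboratory results"],
    sources=["EHR"],
    period_start=date(2018, 1, 1),
    period_end=date(2020, 12, 31),
    intended_use="Retrospective cohort analysis",
)
gaps = draft.missing_items()
```

For this draft, `gaps` flags the secure transfer method and the personnel list as the items still to be specified.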

Data From Patients

Obtaining healthcare data directly from patients is more straightforward from a regulatory perspective because the HIPAA regulations that pertain to covered entities do not apply. While it is necessary to obtain patient permission, additional DUAs are typically not required. The logistics of obtaining the data may be more complicated, as data must be brokered through each patient (see the Methods of Access section of this chapter), but the regulatory process is much simpler.

Data From Other Sources

Datasets managed by government agencies (such as the US Census Bureau or the Environmental Protection Agency) will often have restrictions similar to those of healthcare organizations. IRB approval and DUAs may be required, and there may be requirements that any future publication cite the organization that provided the data. Similar processes may also be in place for groups that manage product, device, or disease registries.

Resources

Using Electronic Health Record Data in Pragmatic Clinical Trials
Living Textbook chapter from the NIH Pragmatic Trials Collaboratory's Electronic Health Records Core

REFERENCES

Zozus MN, Richesson RL, Hammond WE, Simon GE. 2015. Acquiring and Using Electronic Health Record Data. NIH Collaboratory Electronic Health Records Core. https://dcricollab.dcri.duke.edu/sites/NIHKR/KR/Acquiring%20and%20Using%20Electronic%20Health%20Record%20Data.pdf. Accessed August 21, 2020.

Version History

July 14, 2025: Updated resources (changes made by G. Uhlenbrauck).

October 14, 2022: Made nonsubstantive changes to the text, added Seils as a contributing editor, and reordered the section within the chapter as part of the annual content update (changes made by D. Seils).

Published August 25, 2020
