Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials

Acquiring Real-World Data


Section 5

Acquiring Claims Data and CMS Research-Identifiable Files

Contributors

Eric L. Eisenstein, DBA
Kevin J. Anstrom, PhD
Meredith Zozus, PhD
Davera Gabriel, RN
Keith A. Marsolo, PhD
Bradley G. Hammill, PhD
Miguel Vazquez, MD
Lesley H. Curtis, PhD

Contributing Editors
Karen Staman, MS
Damon M. Seils, MA

Administrative claims are another secondary data source for collecting event information from healthcare systems. PCTs embedded in health insurance plans can be used to answer specific questions, such as whether an intervention is effective across different geographic locations, populations, and complex organizations (Cocoros et al. 2023). This design is best suited to studies that require large sample sizes. Health insurance data can be used to identify eligible individuals, facilitate patient and provider contact, and/or analyze the study outcomes.

“There are unique opportunities related to the design and conduct of pragmatic trials embedded in health insurance plans, which have longitudinal data on member/patient demographics, dates of coverage, and reimbursed medical care, including prescription drug dispensings, vaccine administrations, behavioral healthcare encounters, and some laboratory results.” (Cocoros et al. 2023)

Although trials embedded in health insurance plans hold the potential to generate evidence to improve care and population health, there are special challenges that must be considered in the planning, implementation, and analytic phases (Cocoros et al. 2023). Important logistical challenges require careful planning, including timing (plan enrollment and disenrollment typically occur at the beginning and end of a calendar year), lag time for data availability, and engagement of staff from health plans and providers. The intervention itself must also be fairly simple, because interventions will be disseminated through health plans.
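Because enrollment and disenrollment affect which members have complete claims during follow-up, analysts often check whether a member was covered for the whole observation window. The following is a minimal sketch of such a check, not a method described by the authors. The function name, the inclusive date-pair layout, and the `max_gap_days` allowance are assumptions made for illustration.

```python
from datetime import date, timedelta


def is_continuously_enrolled(coverage_spans, start, end, max_gap_days=0):
    """Check whether coverage spans cover [start, end] without a long gap.

    coverage_spans: iterable of (first_day, last_day) date pairs, inclusive.
    max_gap_days: longest run of uncovered days tolerated (hypothetical rule).
    """
    covered_through = start - timedelta(days=1)
    for span_start, span_end in sorted(coverage_spans):
        # Skip spans that add no new coverage inside the window.
        if span_end <= covered_through or span_start > end:
            continue
        gap_days = (span_start - covered_through).days - 1
        if gap_days > max_gap_days:
            return False
        covered_through = span_end
    # Uncovered days left at the end of the window count as a gap too.
    return (end - covered_through).days <= max_gap_days


# Illustrative member who re-enrolls after one uncovered month (April).
spans = [
    (date(2023, 1, 1), date(2023, 3, 31)),
    (date(2023, 5, 1), date(2023, 12, 31)),
]
window = (date(2023, 1, 1), date(2023, 12, 31))
strict = is_continuously_enrolled(spans, *window)
lenient = is_continuously_enrolled(spans, *window, max_gap_days=45)
print(strict, lenient)
```

Whether any gap is tolerated, and how long it may be, is a study-specific decision that would normally be set in the analysis plan.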

In addition, billed diagnoses have been shown to identify potential events less reliably than physician adjudication (Guimarães et al. 2017). For example, in the Treatment With Adenosine Diphosphate Receptor Inhibitors: Longitudinal Assessment of Treatment Patterns and Events After Acute Coronary Syndrome (TRANSLATE-ACS) study, investigators compared the 1-year incidence of events after acute myocardial infarction as identified by medical claims or physician adjudication. They found modest agreement for myocardial infarction and stroke and poor agreement for bleeding (Guimarães et al. 2017).
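Agreement between two event sources is often summarized with a chance-corrected statistic such as Cohen's kappa. The sketch below shows how kappa could be computed from a 2-by-2 table of claims-identified versus adjudicated events. It is an illustration only: the counts are hypothetical, and the text above does not state which agreement statistic the investigators used.

```python
def cohens_kappa(both_yes, claims_only, adjudicated_only, both_no):
    """Cohen's kappa for two binary ratings of the same participants."""
    n = both_yes + claims_only + adjudicated_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_yes + both_no) / n
    claims_rate = (both_yes + claims_only) / n
    adjudicated_rate = (both_yes + adjudicated_only) / n
    # Agreement expected by chance if the two sources were independent.
    expected = (claims_rate * adjudicated_rate
                + (1 - claims_rate) * (1 - adjudicated_rate))
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not data from any study.
kappa = cohens_kappa(both_yes=40, claims_only=10, adjudicated_only=10, both_no=40)
print(round(kappa, 2))
```

With these made-up counts, observed agreement is 0.80 and chance agreement is 0.50, so kappa is 0.60.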

There are several sources of claims data that can be used for research:

  • Medicare claims data, which include data from Medicare beneficiaries who enroll in the traditional fee-for-service Medicare program (and do not include data from patients who enroll in Medicare Advantage plans).
  • Medicare Advantage data, which include both Part A and Part B claims but are provided by private insurance companies and, therefore, are not included in the data sources described below.
  • Claims from participants enrolled in Medicaid or the Children’s Health Insurance Program (CHIP).
  • Collected bills from private insurance companies. For example, in the ADAPTABLE study (Aspirin Dosing: A Patient-Centric Trial Assessing Benefits and Long-term Effectiveness), investigators engaged with 2 large, national insurance companies to support record linkage for participating members. (See the Using Electronic Health Record Data chapter of the Living Textbook for more information.)
  • Collected bills from a patient’s inpatient care facilities. (See the TRANSLATE-ACS case study.)

It is important to note that, for the data sources described below, patients provide consent or otherwise authorize data to be provided to a study. This is different from some workflows in which EHR data are used for research, which may not require direct consent from the patient.

Research-Identifiable Files

The traditional method for obtaining Centers for Medicare & Medicaid Services (CMS) data for research is a formal request process followed by shipment of data files. Data can be requested from the Medicare program, which covers 95% of people aged 65 years and older, or from the Medicaid program, which covers children from low-income families, pregnant women, people with disabilities, and some elderly and nonelderly adults. Medicaid coverage differs by state because Medicaid programs are state-run. Researchers can request a public-use data set, a limited data set with deidentified data, or a research-identifiable file with individual-level data. For trial follow-up, research-identifiable files are the only option. To obtain research-identifiable files for individuals, an investigator must collect protected health information, such as Social Security number and date of birth, and send it to the CMS data distributor for linking, which adds difficulty to the process. Although these data are well curated, gaining access can be expensive, the CMS request process can be time consuming, and data latency can be an issue (Marsolo 2019).

Application Programming Interfaces

As with EHRs, many administrative claims sources allow participants to obtain copies of their data. Blue Button was created by the US Department of Health and Human Services as an online tool that allows patients to view, print, and download their medical records (Turvey et al. 2014) and was intended to help with coordination of care. Blue Button is available on the patient portal for Medicare beneficiaries (MyMedicare.gov), for veterans (MyHealtheVet), and on the patient portals of those practices and clinics that choose to use it. Medicare beneficiaries can download 3 years of claims data, and veterans can download “demographic information (age, gender, ethnicity and more), emergency contacts, a list of their prescription medications, clinical notes, and wellness reminders” (from https://www.healthit.gov/topic/health-it-initiatives/blue-button). With Blue Button, the patient provides the data for research; however, the completeness of the data varies by site and EHR. The downloaded file is a structured, text-based document (an XML file) that must be parsed by an app (such as Hugo) before it can be used for research, and a patient would need to request a file from each site where they receive care (Marsolo 2019).

The CMS Blue Button 2.0 application programming interface (API) enables Medicare beneficiaries to authorize third parties to obtain their Part A, Part B, and Part D claims data directly from CMS (as opposed to through the Blue Button patient portal) and to use the data for coordinating care, services, and research. The API is based on the Fast Healthcare Interoperability Resources (FHIR) standard. A Final Rule from the ONC and CMS mandates that CMS-regulated payers make claims data available via FHIR.
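In FHIR, claims are typically represented as ExplanationOfBenefit resources returned inside a Bundle. The sketch below shows how a study team might pull service dates and diagnosis codes out of such a bundle once a beneficiary has authorized access. It is an illustration under stated assumptions: the sample bundle is handwritten rather than real API output, the function name is hypothetical, and only generic FHIR field names (`billablePeriod`, `diagnosis`, `diagnosisCodeableConcept`) are used. Authorization, paging, and the exact content of Blue Button 2.0 responses are not covered.

```python
def summarize_eob_bundle(bundle):
    """Extract the id, billing period, and diagnosis codes of each claim."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "ExplanationOfBenefit":
            continue  # ignore any other resource types in the bundle
        period = resource.get("billablePeriod", {})
        codes = []
        for diagnosis in resource.get("diagnosis", []):
            concept = diagnosis.get("diagnosisCodeableConcept", {})
            for coding in concept.get("coding", []):
                if "code" in coding:
                    codes.append(coding["code"])
        rows.append({
            "id": resource.get("id"),
            "start": period.get("start"),
            "end": period.get("end"),
            "diagnosis_codes": codes,
        })
    return rows


# Handwritten, illustrative bundle; not a real response.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [{
        "resource": {
            "resourceType": "ExplanationOfBenefit",
            "id": "example-claim-1",
            "billablePeriod": {"start": "2023-02-01", "end": "2023-02-04"},
            "diagnosis": [{
                "sequence": 1,
                "diagnosisCodeableConcept": {
                    "coding": [{"system": "http://hl7.org/fhir/sid/icd-10-cm",
                                "code": "I21.9"}]
                },
            }],
        }
    }],
}
rows = summarize_eob_bundle(sample_bundle)
print(rows)
```

The flattened rows could then be compared with trial-collected events, but any production use would need to follow the API's own documentation for authorization and data elements.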

Collected Participant Bills

Economic and quality-of-life studies in explanatory trials frequently collect and abstract participant bills from study sites. This process is expensive and requires specially trained individuals with expertise in hospital billing and accounting systems. Hence, it would not be preferred for a large pragmatic trial, though it may be necessary if the relevant information cannot be obtained in another way.

CHAPTER SECTIONS

  1. Introduction
  2. Common Real-World Data Sources
  3. Data Formats
  4. Acquiring Electronic Health Record Data
  5. Acquiring Claims Data and CMS Research-Identifiable Files
  6. Acquiring Patient-Reported Data
  7. Gaining Permission to Use Real-World Data
  8. Methods of Access
  9. Case Study: The IMPACT-AFib Trial

Resources

Data Infrastructure for Implementing the PROVEN Trial
NIH Collaboratory EHR Workshop video module. Vince Mor of Brown University describes the PROVEN trial’s use of electronic health records linked with Medicare claims to measure outcomes of a nursing home–based intervention.


PCORnet COVID-19 Common Data Model Design and Results
NIH Pragmatic Trials Collaboratory PCT Grand Rounds; June 5, 2020

REFERENCES

Beckman AL, Gupta S. 2018. Empowering people with their healthcare data: an interview with Harlan Krumholz. Healthc (Amst). 6:238-239. doi:10.1016/j.hjdsi.2018.08.002. PMID: 30143459.

Dhruva SS, Mena-Hurtado C, Curtis J, et al. 2019. Learning how to successfully enroll and engage people in a mobile sync-for-science platform to inform shared decision making. J Am Coll Cardiol. 73:3039. doi:10.1016/S0735-1097(19)33645-9.

Guimarães PO, Krishnamoorthy A, Kaltenbach LA, et al. 2017. Accuracy of medical claims for identifying cardiovascular and bleeding events after myocardial infarction: a secondary analysis of the TRANSLATE-ACS study. JAMA Cardiol. 2(7):750-757. doi:10.1001/jamacardio.2017.1460. PMID: 28538984.

Marsolo K. 2019. Approaches to Patient Follow-Up for Clinical Trials: What’s the Right Choice for Your Study? NIH Pragmatic Trials Collaboratory PCT Grand Rounds; March 1, 2019. Available at: https://rethinkingclinicaltrials.org/news/approaches-to-patient-follow-up-for-clinical-trials-whats-the-right-choice-for-your-study-keith-marsolo-phd/. Accessed October 14, 2022.

McCarthy EP, Chang CH, Tilton N, Kabeto MU, Langa KM, Bynum JPW. 2022. Validation of claims algorithms to identify Alzheimer's disease and related dementias. J Gerontol A Biol Sci Med Sci. 77(6):1261-1271. doi:10.1093/gerona/glab373. PMID: 3491968.

Turvey C, Klein D, Fix G, et al. 2014. Blue Button use by patients to access and share health record information using the Department of Veterans Affairs' online patient portal. J Am Med Inform Assoc. 21(4):657-63. doi:10.1136/amiajnl-2014-002723. PMID: 24740865.


Version History

October 14, 2022: Made nonsubstantive changes to the text, added an image to the Resources sidebar, added Seils as a contributing editor, and reordered the section within the chapter as part of the annual content update (changes made by D. Seils).

February 7, 2022: Added example sentence on using CMS data to identify patients with dementia (changes made by K. Staman).

January 15, 2021: Moved this section from “Inpatient Endpoints” to “Acquiring Real-World Data”, added EHR Workshop video module to resource bar, updated proposed rule to final rule (changes made by K. Staman).

Published June 19, 2019.

Citation:

Eisenstein E, Anstrom K, Zozus M, et al. Acquiring Real-World Data: Acquiring Claims Data and CMS Research-Identifiable Files. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/conduct/acquiring-real-world-data/acquiring-claims-data-and-cms-research-identifiable-files/. Updated December 3, 2025. DOI: 10.28929/155.
