NIH Collaboratory
Living Textbook of
Pragmatic Clinical Trials


Rethinking Clinical Trials

A Living Textbook of Pragmatic Clinical Trials


Ethics for Artificial Intelligence and Machine Learning in Pragmatic Clinical Trials


Section 4

Training Data Generation

Contributors

Vasiliki N. Rahimzadeh, PhD
Kaitlyn Jaffe, PhD
Kayte Spector-Bagdady, JD, MBE

Contributing Editor

Elizabeth McCamic, MA

Choices about which data are procured and how they are used to build algorithmic models are critical to the integrity of the resulting AI/ML system. To enhance equity, researchers should develop AI systems with the practical realities of the point of care in mind (McCradden et al 2022). They should therefore confirm that training data are representative of the intended populations and, if they are not, assess any biases that might result. Researchers must also ascertain who is (and is not) represented in training data and what effects, if any, this will have on scientific endpoints.
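One way to begin checking representativeness is to compare each group's share of the training data with its share of the intended population. The sketch below is illustrative only and is not a method described by the authors: the function names, the group labels ("A", "B", "C"), the counts, the target shares, and the 5-percentage-point tolerance are all hypothetical assumptions. A real assessment would use study-specific categories and reference data, and would probably need to consider intersecting characteristics as well.

```python
from collections import Counter


def representation_gaps(training_groups, target_shares):
    """Return {group: training share minus intended-population share}.

    A negative gap means the group is under-represented in the training
    data relative to the intended population.
    """
    if not training_groups:
        raise ValueError("training_groups is empty")
    counts = Counter(training_groups)
    total = len(training_groups)
    # Include groups that appear in either source so that a group that is
    # entirely missing from the training data is still reported.
    groups = set(counts) | set(target_shares)
    return {
        g: counts.get(g, 0) / total - target_shares.get(g, 0.0)
        for g in groups
    }


def flag_underrepresented(gaps, tolerance=0.05):
    """List groups whose shortfall exceeds the (hypothetical) tolerance."""
    return sorted(g for g, gap in gaps.items() if gap < -tolerance)


# Hypothetical example data: one group label per training record, and the
# assumed share of each group in the intended population.
training = ["A"] * 65 + ["B"] * 30 + ["C"] * 5
target = {"A": 0.50, "B": 0.30, "C": 0.20}

gaps = representation_gaps(training, target)
flagged = flag_underrepresented(gaps)
```

A flagged group does not by itself show that a model will be biased. It marks where the assessment of resulting biases, and of any effects on scientific endpoints, should start.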

Investigator tip: If patient data from electronic health records are used, consider whether healthcare access points affect representation. Biases can arise because training datasets generated from “patients” represent only people with access to healthcare and are much more likely to be drawn from major academic medical centers with the most up-to-date data technology. Likewise, datasets generated from “research participants” reflect only those who were recruited to enroll and who subsequently consented to participate. Studies show that consent rates differ between historically included and excluded populations (Spector-Bagdady et al 2021).


REFERENCES


McCradden MD, Anderson JA, Stephenson EA, et al. 2022. A research ethics framework for the clinical translation of healthcare machine learning. Am J Bioeth. 22(5):8-22. doi:10.1080/15265161.2021.2013977. PMID: 35048782.

Spector-Bagdady K, Tang S, Jabbour S, et al. 2021. Respecting autonomy and enabling diversity: The effect of eligibility and enrollment on research data demographics. Health Aff. 40(12):1892-1899. doi:10.1377/hlthaff.2021.01197. PMID: 34871076.



Version History

Published November 7, 2023


Citation:

Rahimzadeh V, Jaffe K, Spector-Bagdady K. Ethics for Artificial Intelligence and Machine Learning in Pragmatic Clinical Trials: Training Data Generation. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Pragmatic Trials Collaboratory. Available at: https://rethinkingclinicaltrials.org/chapters/ethics-and-regulatory/ethics-and-equity-for-ai-and-ml/training-data-generation/. Updated December 3, 2025. DOI: 10.28929/234.
