Ethics for Artificial Intelligence and Machine Learning in Pragmatic Clinical Trials
Section 4
Training Data Generation
Choices regarding what data are procured and how they are used to build algorithmic models are critical to the integrity of the resulting AI/ML system. To enhance equity, researchers should develop AI systems with the practical realities of the point of care in mind (McCradden et al 2022) and therefore confirm that training data are representative of the intended populations and, if not, assess any biases that might result. Researchers must also ascertain who is (and is not) represented in training data and the effects, if any, this will have on scientific endpoints.
Investigator tip: If patient data from electronic health records are used, consider whether healthcare access points affect representation. Biases can arise because training datasets generated from “patients” represent only people with access to healthcare, and are much more likely to be drawn from major academic medical centers with the most up-to-date data technology. Likewise, datasets generated from “research participants” reflect only those recruited to enroll and who subsequently consent to participate, and there are documented differences in consent rates between historically included and excluded populations (Spector-Bagdady et al 2021).
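A representativeness check of the kind described above can be sketched as a simple comparison of demographic shares in the training cohort against reference-population shares. The function name, group labels, reference proportions, and the 0.5 underrepresentation threshold below are all illustrative assumptions, not prescribed values; in practice the reference distribution and cutoff should come from the trial's intended population and protocol.

```python
from collections import Counter

def representation_gaps(cohort_groups, reference_props, threshold=0.5):
    """Flag groups whose share in the training cohort falls below
    `threshold` times their share in the reference population.
    (Group names and the cutoff are illustrative assumptions.)"""
    counts = Counter(cohort_groups)
    n = len(cohort_groups)
    gaps = {}
    for group, ref_p in reference_props.items():
        cohort_p = counts.get(group, 0) / n
        if cohort_p < threshold * ref_p:
            gaps[group] = (cohort_p, ref_p)  # (cohort share, reference share)
    return gaps

# Hypothetical cohort drawn from EHR data vs. reference population shares
cohort = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
reference = {"A": 0.60, "B": 0.25, "C": 0.15}
print(representation_gaps(cohort, reference))  # group C is underrepresented
```

A flagged group would then prompt the bias assessment the section calls for, rather than automatic exclusion or reweighting.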
REFERENCES
McCradden MD, Anderson JA, Stephenson EA, et al. 2022. A research ethics framework for the clinical translation of healthcare machine learning. Am J Bioeth. 22(5):8-22. doi:10.1080/15265161.2021.2013977. PMID: 35048782.
Spector-Bagdady K, Tang S, Jabbour S, et al. 2021. Respecting autonomy and enabling diversity: The effect of eligibility and enrollment on research data demographics. Health Aff. 40(12):1892-1899. doi:10.1377/hlthaff.2021.01197. PMID: 34871076.