Investigating Presence of Ethnoracial Bias in Clinical Data using Machine Learning

Publication: Article, Conference object, Other literature type | 03 Sep 2021 | Publisher: openRxiv | Funded by: EC | WideHealth, NIH | Critical Care Informatics...

Authors: Velichkovska, Bojana; Gjoreski, Hristijan; Denkovski, Daniel; Kalendar, Marija; Anthony Celi, Leo; Osmani, Venet

doi: 10.1101/2021.09.01.21262949, 10.5281/zenodo.10038004, 10.5281/zenodo.10038005

Abstract

An important target for machine learning research is obtaining unbiased results, which requires addressing bias that might be present in the data as well as in the methodology. This is of utmost importance in medical applications of machine learning, where trained models should be unbiased so that the resulting systems are widely applicable, reliable, and fair. Since bias can sometimes be introduced through the data itself, in this paper we investigate the presence of ethnoracial bias in patients’ clinical data. We focus primarily on vital signs and demographic information and classify patient ethnorace within each pair of groups drawn from three ethnoracial groups (African Americans, Caucasians, and Hispanics). Our results show that ethnorace can be identified in two out of three patients, setting the initial base for further investigation of the complex issue of ethnoracial bias.
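The pairwise setup described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: the abstract does not name the classifier or the features used, so the nearest-centroid model, the function names, and the toy data with placeholder groups "A", "B", and "C" are all assumptions made for the example. It shows one way to restrict a labelled dataset to each pair of groups and report a per-pair accuracy. For reference, identifying the group in two out of three patients corresponds to an accuracy of about 0.67, against 0.5 for chance in a balanced two-class problem.

```python
from itertools import combinations
from statistics import mean


def fit_nearest_centroid(X, y):
    """Return {label: centroid} for the given rows (hypothetical stand-in model)."""
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)
    # Column-wise mean of each group's feature rows.
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in by_label.items()}


def predict(model, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, row))
    return min(model, key=lambda label: sq_dist(model[label]))


def pairwise_accuracy(X_train, y_train, X_test, y_test):
    """Train and score one binary classifier per pair of groups."""
    results = {}
    for a, b in combinations(sorted(set(y_train)), 2):
        # Keep only the samples belonging to the current pair of groups.
        train = [(x, y) for x, y in zip(X_train, y_train) if y in (a, b)]
        test = [(x, y) for x, y in zip(X_test, y_test) if y in (a, b)]
        if not test:
            continue  # no held-out samples for this pair
        model = fit_nearest_centroid([x for x, _ in train],
                                     [y for _, y in train])
        correct = sum(predict(model, x) == y for x, y in test)
        results[(a, b)] = correct / len(test)
    return results


# Toy, fully synthetic data: two made-up features, three placeholder groups.
X_train = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
y_train = ["A", "A", "B", "B", "C", "C"]
X_test = [[0, 0.5], [10, 10.5], [20.5, 0]]
y_test = ["A", "B", "C"]

scores = pairwise_accuracy(X_train, y_train, X_test, y_test)
for pair, acc in scores.items():
    print(pair, acc)
```

With three groups this yields three binary problems, one per pair. In a real study the per-pair accuracy would be compared against the chance level implied by the class balance of that pair.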

Related Organizations

Fondazione Bruno Kessler
Italy
Massachusetts Institute of Technology
United States
Saints Cyril and Methodius University of Skopje
Former Yugoslav Republic of Macedonia

Keywords

machine learning, clinical data, ethnoracial bias, vital signs

Impact by BIP!

selected citations: 1. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science (3)

medical and health sciences

basic medicine

Funded by

EC | WideHealth, NIH | Critical Care Informatics: Ethical considerations around the use and sharing of health-related data