Publication · Preprint · 2018

Predicting Race and Ethnicity From the Sequence of Characters in a Name

Sood, Gaurav; Laohaprapanon, Suriyan
Open Access · English
  • Published: 05 May 2018
Abstract
To answer questions about racial inequality, we often need a way to infer race and ethnicity from a name. Until now, the bulk of the focus has been on optimally exploiting the last-name list provided by the Census Bureau. But first names carry additional information, especially for African Americans. To estimate the relationship between full names and race, we use Florida voter registration data and Wikipedia data. In particular, we model how the sequence of characters in a name relates to race and ethnicity using Long Short-Term Memory (LSTM) networks. Our out-of-sample (OOS) precision and recall for the full-name model estimated on the F...
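The abstract describes treating a name as a sequence of characters and feeding it to an LSTM. Below is a minimal, untrained pure-Python sketch of that idea. It is not the authors' implementation. Several details are illustrative assumptions: the lowercase character vocabulary with id 0 for unseen characters, the one-hot inputs, the single LSTM layer with a hidden size of 8, the softmax over the final hidden state, and the placeholder labels in `CLASSES`.

```python
import math
import random

# Hypothetical placeholder labels; this record does not list the paper's categories.
CLASSES = ["label_0", "label_1", "label_2", "label_3"]


def build_vocab(names):
    """Map each character seen in `names` to an integer id; id 0 is reserved for unseen characters."""
    chars = sorted({ch for name in names for ch in name.lower()})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(name, vocab):
    """Turn a name into its sequence of character ids (unseen characters map to 0)."""
    return [vocab.get(ch, 0) for ch in name.lower()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Dense matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class CharLSTMClassifier:
    """Single-layer LSTM over one-hot characters; the final hidden state feeds a softmax.

    Weights are random and untrained, so outputs only illustrate the data flow.
    """

    def __init__(self, vocab_size, hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.hidden = hidden
        in_dim = vocab_size + hidden  # each gate sees the concatenation [x_t ; h_{t-1}]
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: rand_matrix(hidden, in_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden for k in "ifog"}
        self.W_out = rand_matrix(n_classes, hidden, rng)
        self.b_out = [0.0] * n_classes

    def one_hot(self, idx):
        v = [0.0] * self.vocab_size
        v[idx] = 1.0
        return v

    def forward(self, ids):
        """Run the LSTM over the character ids and return one probability per class."""
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for idx in ids:
            z = self.one_hot(idx) + h  # list concatenation: [x_t ; h_{t-1}]
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
            i = [sigmoid(x) for x in pre["i"]]
            f = [sigmoid(x) for x in pre["f"]]
            o = [sigmoid(x) for x in pre["o"]]
            g = [math.tanh(x) for x in pre["g"]]
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage on made-up names (not data from the paper).
vocab = build_vocab(["john smith", "maria garcia"])
model = CharLSTMClassifier(vocab_size=len(vocab) + 1, hidden=8, n_classes=len(CLASSES))
ids = encode("Jane Doe", vocab)  # 'e' and 'd' are unseen, so they map to id 0
probs = model.forward(ids)
```

In practice the weights would be learned from labeled names. The sketch only shows how a name's characters pass through the input, forget, and output gates, one character at a time, to produce a probability for each label.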
Subjects
free text keywords: Statistics - Applications, Statistics - Machine Learning

References

Abadi, Martín, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard et al. 2016. "TensorFlow: A System for Large-Scale Machine Learning." In OSDI. Vol. 16, pp. 265-283.

Sood, Gaurav. 2017. "Florida Voter Registration Data." URL: https://doi.org/10.7910/DVN/UBIG3F

Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov. 2014. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15(1):1929-1958.
