Eliminating Racial Bias in AI/ML: Solving the Training Data Problem

Learn about the approach and impact of the model we built to predict skin tone from face images and videos using a unique combination of computer vision and machine learning to help organizations monitor for racial biases in large datasets.

The data science community is well aware of the risk that dataset imbalances can lead to biased models that negatively impact downstream predictions [1-3] and create unfair or undesirable outcomes for underrepresented groups. Racial bias is one of the most discussed contributors to poor model performance on underrepresented populations [4-9]; however, analyzing racial biases in image and video datasets can be challenging without a labor-intensive manual review effort to quantify demographic diversity.

To measure racial biases in image and video datasets and understand how dataset demographics relate to machine learning model performance, we developed a unique approach to quantify skin tones from facial images.

Our Approach

Fundamentally, our method combines computer vision with machine learning models to categorize skin tone into one of the six Fitzpatrick skin types [10].

We first run a face detection algorithm on each video frame to locate faces. Then we apply a face landmark prediction model and a convex hull masking approach to extract only the areas of the face that contain skin, eliminating color interference from areas like the eyes, nostrils, and mouth. To transform the resulting skin mask into meaningful skin tone colors, we use unsupervised machine learning to cluster all skin pixels into a summary set of 10 colors. Finally, a concluding model maps the 10-color summary to one of the six Fitzpatrick skin types, which correspond to the amount of melanin in a person’s skin.
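The masking and clustering steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not our production code: the image is a list of rows of (R, G, B) tuples, the landmark coordinates are assumed to come from an upstream landmark model, k-means stands in for the unspecified unsupervised clustering method, and all function names are hypothetical.

```python
import random

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True

def skin_pixels(image, face_landmarks, excluded_regions=()):
    """Keep pixels inside the face hull but outside the eye/nostril/mouth hulls."""
    face = convex_hull(face_landmarks)
    holes = [convex_hull(region) for region in excluded_regions]
    kept = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if inside_hull(face, (x, y)) and not any(
                inside_hull(h, (x, y)) for h in holes
            ):
                kept.append(pixel)
    return kept

def summarize_colors(pixels, k=10, iterations=20, seed=0):
    """Cluster pixels with plain k-means (an assumed choice) into k mean colors."""
    k = min(k, len(pixels))
    centers = random.Random(seed).sample(pixels, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for px in pixels:
            nearest = min(
                range(k),
                key=lambda c: sum((u - v) ** 2 for u, v in zip(px, centers[c])),
            )
            groups[nearest].append(px)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers
```

Calling `summarize_colors(skin_pixels(image, face_landmarks, excluded_regions))` would yield the color summary that a downstream classifier consumes. On real frames, a vectorized image library would be far faster than these nested loops.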


Ensuring that racial bias is not baked into machine learning models through imbalanced training data is essential, but the time required to manually estimate the demographic makeup of faces in a large dataset is prohibitive. Our model runs on cloud computing infrastructure, enabling quantification of demographic diversity in video and image datasets that is orders of magnitude faster than manual review. Monitoring diversity during a company’s dataset and model development phases is an important way to keep racial bias out of model performance.
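Once every face has a predicted skin type, monitoring a dataset comes down to summarizing the label distribution. The sketch below is one hypothetical way to do that: it reports the share of each Fitzpatrick type plus a balance score based on normalized Shannon entropy, where 1.0 means a perfectly uniform dataset and 0.0 means a single type. The function name and the choice of entropy as the metric are illustrative assumptions, not a description of our tooling.

```python
import math
from collections import Counter

FITZPATRICK_TYPES = ("I", "II", "III", "IV", "V", "VI")

def skin_type_report(predicted_types):
    """Return (share per Fitzpatrick type, balance score in [0, 1])."""
    counts = Counter(predicted_types)
    unknown = set(counts) - set(FITZPATRICK_TYPES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions supplied")

    shares = {t: counts[t] / total for t in FITZPATRICK_TYPES}
    # Shannon entropy of the label distribution, normalized by its maximum
    # (log of the number of classes) so the score is comparable across datasets.
    entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
    balance = entropy / math.log(len(FITZPATRICK_TYPES))
    return shares, balance
```

A low balance score, or a near-zero share for any single type, flags where additional data collection is likely needed before training.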

Furthermore, we utilized our model to compile over 10,000 faces into a novel face dataset that is unique for its approximately uniform distribution across the six Fitzpatrick skin tones. This broad dataset enabled us to study differences in the performance of common facial recognition models across demographic groups and work to improve performance across the board. It also led us to conduct experiments on emotion perception between people with different demographics, further informing our model design.

Our continued work focuses on improving the robustness of our model as well as examining biases in facial recognition and face emotion detection algorithms. Extremely high or low illumination in a video can interfere with accurate skin tone predictions, so we are currently incorporating a robust illumination normalization technique to combat outlier video lighting and coloration. Lastly, skin tone diversity is rich, so the Fitzpatrick scale is a simplified and suboptimal representation of the full range of skin complexions and undertones. Our team is currently testing alternatives such as the individual typology angle (ITA) for skin tone categorization [11].
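For readers unfamiliar with the angle-based alternative mentioned above, the individual typology angle (ITA) is computed from the CIELAB lightness L* and yellow-blue component b* of a skin color as ITA = arctan((L* - 50) / b*) × 180/π. The sketch below assumes the color has already been converted to CIELAB, and uses the class boundaries commonly cited in the dermatology literature; both the thresholds and the function names are assumptions for illustration, not our final categorization.

```python
import math

# Lower bounds (in degrees) of the commonly cited ITA classes, highest first.
# Angles at or below -30 degrees fall into the "dark" class.
ITA_CLASSES = [
    (55.0, "very light"),
    (41.0, "light"),
    (28.0, "intermediate"),
    (10.0, "tan"),
    (-30.0, "brown"),
]

def ita_degrees(l_star, b_star):
    """Individual typology angle in degrees from CIELAB L* and b*.

    atan2 equals arctan((L* - 50) / b*) for the positive b* values typical
    of skin, and avoids a division by zero when b* is 0.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))

def ita_category(angle):
    """Map an ITA angle to its class name."""
    for lower_bound, name in ITA_CLASSES:
        if angle > lower_bound:
            return name
    return "dark"
```

Because ITA is a continuous angle, it can also be used directly as a numeric feature rather than binned into classes, which is part of its appeal over a six-level scale.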


We believe it’s important to apply our expertise in computer vision and machine learning towards building tools for fair AI. Many of our clients are interested in leveraging video and image data to develop AI solutions in the technology and healthcare sectors. With insight from our racial bias quantification tool, we can inform clients on where additional data needs to be collected and apply nuanced modeling to achieve more accurate predictions for all clients and use cases.

Connect with us

We build data science applications to support drug development, digital & molecular biomarker discovery, and digital health. Our team works at the intersection of biology and technology to accelerate innovation. If you have an AI/ML-related question or would like to discuss your AI strategy, we’d love to hear from you! Reach out today at inquire@mercuryds.com, on Twitter @mercurydatasci, or on LinkedIn.


  1. A. J. Larrazabal, N. Nieto, V. Peterson, D.H. Milone, & E. Ferrante (2020). Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences, 117(23), 12592-12594.
  2. B. Mac Namee, P. Cunningham, S. Byrne, and O.I. Corrigan. The problem of bias in training data in regression problems in medical decision support. Artificial intelligence in medicine, 24(1):51–70, 2002.
  3. C. Cardie and N. Howe. Improving minority class prediction using case-specific feature weights. In ICML, pages 57–65, 1997.
  4. D.A. Vyas, L.G. Eisenstein, D.S. Jones, “Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms.” N Engl J Med 2020. DOI: 10.1056/NEJMms2004740.
  5. J. Vincent, “Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech,” The Verge, Jan. 12, 2018. [Online]. Available: https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai
  6. I. D. Raji and J. Buolamwini, “Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products,” in Proc. AAAI/ACM Conf. AI Ethics and Society, 2019. [Online]. Available: http://www.aies-conference.com/wp-content/uploads/2019/01/AIES-19_paper_223.pdf
  7. J. Angwin, J. Larson, S. Mattu, and L. Kirchner, “Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks,” ProPublica, May 23, 2016. [Online]. Available: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  8. A. Howard and J. Borenstein, “The ugly truth about ourselves and our robot creations: The problem of bias and social inequity,” Sci. Eng. Ethics, vol. 24, no. 5, pp. 1521–1536, Oct. 2017. doi: 10.1007/s11948-017-9975-2.
  9. B. Wilson, J. Hoffman, and J. Morgenstern, Predictive inequity in object detection. 2019. [Online]. Available: https://arxiv.org/abs/1902.11097
  10. S. Sachdeva, “Fitzpatrick skin typing: Applications in dermatology.” Indian J Dermatol Venereol Leprol 2009;75:93-6. Available: http://www.bioline.org.br/pdf?dv09029
  11. “Colorimetry - Pigmentation of the skin.” Dermatest. Available: https://www.dermatest.de/en/measuring-techniques/colorimetry
Published on: July 14, 2021