What does an AI ethicist do?

AI ethicists work with AI researchers and data science teams to ensure the safety of algorithms. The specific role of an AI ethicist depends on context. Some AI ethicists detect algorithmic bias or immoral behavior and work to mitigate it. Research ethicists often focus on long-term existential risks posed by AI.

Can machines be moral?

The moral status of machines is heavily debated in philosophy and computer science. In general, the morality of an AI system reflects its data and design. AI systems in use today often have discriminatory effects or reasoning, which can be mitigated. The resulting mitigated systems are usually viewed as fairer.

Is artificial intelligence moral?

Whether AI is moral is heavily debated in philosophy and computer science. It’s safe to say that the morality of an AI system reflects its data and design. AI systems in use today often have discriminatory effects or reasoning. Systems in which the design mitigates these effects are usually viewed as safer.

What is high bias in machine learning?

Unwanted bias in AI can be measured in different ways depending on context. High bias is sometimes measured as a disparate impact score lower than 0.8 or higher than 1.2.

How does machine learning deal with bias?

Sophisticated methods exist to reduce unwanted bias in machine learning. State-of-the-art methods such as disparate impact removers or adversarial debiasing are implemented in the AIF360 toolkit.

What is AI bias?

AI bias is the propensity of AI tools to create unfair outcomes, such as privileging one user group over others. It often arises from biases that exist in the data used to train an AI model. AI bias can lead to discriminatory decisions and practices, potentially affecting healthcare, lending, public safety, and hiring.

Machines and Trust: How to Mitigate AI Bias

Unwanted AI bias is already a widespread problem. Machine learning models can replicate or exacerbate existing biases, often in ways that are not detected until release. So what can be done about it?

Last updated: May 28, 2026

Michael McKenna

Verified Expert in Engineering

8 Years of Experience

Mike is a data scientist, data ethicist, and machine learning engineer specializing in health and retail. He currently serves as the Director of Data Ethics at Services Australia. As a senior data scientist at CVS, Mike led COVID-19 vaccine demand forecasting, liaising closely with the White House and the CDC as part of Operation Warp Speed.

Expertise

Artificial Intelligence

Previous Role

Director of Data Science

As reported by McKinsey & Company, among others, we are experiencing the fourth wave of the Industrial Revolution: automation using cyber-physical systems. Key elements of this wave include machine intelligence, blockchain-based decentralized governance, and genome editing. As has been the case with previous waves, these technologies reduce the need for human labor but pose new ethical challenges, especially for artificial intelligence development companies and their clients.

The purpose of this article is to review recent ideas on detecting and mitigating unwanted bias in machine learning models. We will discuss recently created guidelines around trustworthy AI, review examples of bias in AI arising from both model choice and underlying societal bias, suggest business and technical practices to detect and reduce bias in AI models, and discuss legal obligations as they currently exist under the GDPR and where they might develop in the future.

What Is the Role of Bias in AI Models?

All models are made by humans and reflect human biases. Machine learning models can reflect the biases of organizational teams, of the designers in those teams, of the data scientists who implement the models, and of the data engineers who gather data. Naturally, they also reflect the bias inherent in the data itself. Just as we expect a level of trustworthiness from human decision-makers, we should expect and deliver a level of trustworthiness from our models.

A trustworthy model will still contain many biases because bias (in its broadest sense) is the backbone of machine learning. A breast cancer prediction model will correctly predict that patients with a history of breast cancer are biased towards a positive result. Depending on the design, it may learn that women are biased towards a positive result. The final model may have different levels of accuracy for women and men, and be biased in that way. The key question to ask is not “Is my model biased?”, because the answer will always be yes.

Searching for better questions, the European Union High Level Expert Group on Artificial Intelligence has produced guidelines applicable to model building. In general, machine learning models should be:

Lawful: respecting all applicable laws and regulations
Ethical: respecting ethical principles and values
Robust: both from a technical perspective and taking into account the social environment

These short requirements, and their longer form, include and go beyond issues of bias, acting as a checklist for engineers and teams. We can develop more trustworthy AI systems by examining those biases within our models that could be unlawful, unethical, or un-robust, in the context of the problem statement and domain.

Historical Cases of Bias in AI

Below are three historical models with dubious trustworthiness, owing to AI bias that is unlawful, unethical, or un-robust. The first and most famous case, the COMPAS model, shows how even the simplest models can discriminate unethically according to race. The second case illustrates a flaw in most natural language processing (NLP) models: They are not robust to racial, sexual and other prejudices. The final case, the Allegheny Family Screening Tool, shows an example of a model fundamentally flawed by biased data, and some best practices in mitigating those flaws.

COMPAS

The canonical example of biased, untrustworthy AI is the COMPAS system, used in Florida and other states in the US. The COMPAS system used a regression model to predict whether or not a perpetrator was likely to recidivate. Though optimized for overall accuracy, the model produced twice as many false positives for recidivism among African Americans as among Caucasians.
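To make the distinction between overall accuracy and group-level error rates concrete, here is a minimal sketch. The counts are invented for illustration and are not COMPAS data; the group names `A` and `B` and the helper functions are hypothetical. It shows how two groups can have identical accuracy while one receives double the false positive rate.

```python
from collections import Counter


def make_records(group, tp, fp, tn, fn):
    """Expand confusion-matrix counts into one record per person (toy data)."""
    cells = [(True, True, tp), (True, False, fp),
             (False, False, tn), (False, True, fn)]
    return [
        {"group": group, "predicted_high_risk": pred, "reoffended": actual}
        for pred, actual, count in cells
        for _ in range(count)
    ]


def accuracy(records, group):
    """Share of people in `group` whose prediction matched the outcome."""
    rows = [r for r in records if r["group"] == group]
    correct = sum(r["predicted_high_risk"] == r["reoffended"] for r in rows)
    return correct / len(rows)


def false_positive_rate(records, group):
    """FP / (FP + TN): share of non-reoffenders flagged as high risk."""
    counts = Counter(
        (r["predicted_high_risk"], r["reoffended"])
        for r in records
        if r["group"] == group
    )
    fp = counts[(True, False)]
    tn = counts[(False, False)]
    return fp / (fp + tn)


# Invented counts, chosen so that accuracy matches but error types differ.
records = (
    make_records("A", tp=30, fp=20, tn=30, fn=20)
    + make_records("B", tp=20, fp=10, tn=40, fn=30)
)

for group in ("A", "B"):
    print(group, accuracy(records, group), false_positive_rate(records, group))
```

In this toy data both groups see 60% accuracy, yet group A's false positive rate is twice group B's. A model tuned only for overall accuracy gives no signal that this is happening, which is why per-group error rates need to be checked explicitly.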

The COMPAS example shows how unwanted bias can creep into our models no matter how comfortable our methodology. From a technical perspective, the approach taken to COMPAS data was extremely ordinary, though the underlying survey data contained questions with questionable relevance. A small supervised model was trained on a dataset with a small number of features. (In my practice, I have followed a similar technical procedure dozens of times, as is likely the case for any data scientist or ML engineer.) Yet, ordinary design choices produced a model that contained unwanted, racially discriminatory bias.

The biggest issue in the COMPAS case was not with the simple model choice, or even that the data was flawed. Rather, the COMPAS team failed to consider that the domain (sentencing), the question (detecting recidivism), and the answers (recidivism scores) are known to involve disparities on racial, sexual, and other axes even when algorithms are not involved. Had the team looked for bias, they would have found it. With that awareness, the COMPAS team might have been able to test different approaches and recreate the model while adjusting for bias. This would have then worked to reduce unfair incarceration of African Americans, rather than exacerbating it.

Corpus Bias in Pre-trained Models

Large, pre-trained models form the base for most NLP tasks. Unless these base models are specially designed to avoid bias along a particular axis, they are certain to be imbued with the inherent prejudices of the corpora they are trained on, for the same reason that these models work at all. The results of this bias, along racial and gendered lines, have been shown on Word2Vec and GloVe models trained on Google News and Common Crawl, respectively.

Large language models (LLMs) have since displaced BERT-style contextual models as the dominant paradigm for NLP, trained on orders of magnitude more web-scraped data from similar sources. Research consistently shows that even models explicitly designed to reduce bias continue to exhibit implicit racial and gender biases, with post-training alignment techniques such as reinforcement learning from human feedback shifting how bias surfaces rather than reliably eliminating it.

Although the best model architectures for any NLP problem are imbued with discriminatory sentiment, the solution is not to abandon pretrained models but rather to consider the particular domain in question, the problem statement, and the data in totality with the team. If an application is one where discriminatory prejudice by humans is known to play a significant part, developers should be aware that models are likely to perpetuate that discrimination.

Allegheny Family Screening Tool: Unfairly Biased, But Well-designed and Mitigated

In this final example, we discuss a model built from unfairly discriminatory data, but the unwanted bias is mitigated in several ways. The Allegheny Family Screening Tool is a model designed to assist humans in deciding whether a child should be removed from their family because of abusive circumstances. The tool was designed openly and transparently with public forums and opportunities to find flaws and inequities in the software.

The unwanted bias in the model stems from a public dataset that reflects broader societal prejudices. Middle- and upper-class families have a higher ability to “hide” abuse by using private health providers. Referrals to Allegheny County occur over three times as often for African-American and biracial families as for white families. Commentators like Virginia Eubanks and Ellen Broad have claimed that data issues like these can only be fixed if society is fixed, a task beyond any single engineer.

In production, the county combats inequities in its model by using it only as an advisory tool for frontline workers, and designs training programs so that frontline workers are aware of the failings of the advisory model when they make their decisions. With new developments in debiasing algorithms, Allegheny County has new opportunities to reduce bias in the AI model.

The development of the Allegheny tool has much to teach engineers about the limits of algorithms to overcome latent discrimination in data and the societal discrimination that underlies that data. It provides engineers and designers with an example of consultative model building which can mitigate the real-world impact of potential discriminatory bias in a model.

Avoiding and Mitigating AI Bias: Key Business Awareness

Fortunately, there are some debiasing approaches and methods—many of which use the COMPAS dataset as a benchmark.

Improve Diversity, Mitigate Diversity Deficits

Maintaining diverse teams, both in terms of demographics and in terms of skillsets, is important for avoiding and mitigating unwanted AI bias. Despite continuous lip service paid to diversity by tech executives, women and people of color remain under-represented.

Various ML models perform worse on groups that are statistical minorities within the AI industry itself, and the first people to notice these issues are often users who are women and/or people of color. With more diversity in AI teams, issues around unwanted bias can be noticed and mitigated before release into production.

Be Aware of Proxies: Removing Protected Class Labels from a Model May Not Work!

A common, naïve approach to removing bias related to protected classes (such as sex or race) is to delete the labels marking race or sex from the data. In many cases, this will not work, because the model can reconstruct these protected classes from other labels, such as postal codes. The usual practice involves removing these proxy labels as well, both to improve the results of the models in production and to meet legal requirements. The recent development of debiasing algorithms, which we will discuss below, represents a way to mitigate bias in AI algorithms without removing labels.
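One rough way to check whether a remaining feature acts as a proxy is to see how well it predicts the protected class on its own. The sketch below uses invented data, and the function name `proxy_strength` and the field names are hypothetical. It compares a majority-class guess made per postal code against a baseline guess that ignores the feature.

```python
from collections import Counter, defaultdict


def proxy_strength(rows, feature, protected):
    """Return (baseline_accuracy, accuracy_using_feature).

    baseline: always guess the most common protected class overall.
    using feature: guess the most common protected class within each
    value of `feature` (for example, within each postal code).
    """
    by_value = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        by_value[row[feature]][row[protected]] += 1
        overall[row[protected]] += 1
    total = len(rows)
    baseline = overall.most_common(1)[0][1] / total
    guessed = sum(c.most_common(1)[0][1] for c in by_value.values()) / total
    return baseline, guessed


# Invented data: two postal codes with very different group make-up.
rows = (
    [{"postal_code": "1000", "group": "x"}] * 45
    + [{"postal_code": "1000", "group": "y"}] * 5
    + [{"postal_code": "2000", "group": "x"}] * 5
    + [{"postal_code": "2000", "group": "y"}] * 45
)

baseline, guessed = proxy_strength(rows, "postal_code", "group")
print(f"baseline={baseline:.2f}, using postal code={guessed:.2f}")
```

A large gap between the two numbers suggests that the feature carries much of the information the deleted label did. This is only a screening heuristic under simple assumptions: combinations of several weak features can act as a proxy even when no single feature does.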

Be Aware of Technical Limitations

Even best practices in product design and model building will not be enough to remove the risks of unwanted bias, particularly in cases of biased data. It is important to recognize the limitations of our data, models, and technical solutions to bias, both for awareness’ sake, and so that human methods of limiting machine learning bias such as human-in-the-loop can be considered.

Avoiding and Mitigating AI Bias: Key Technical Tools for Awareness and Debiasing

Data scientists have a growing number of technical awareness and debiasing tools available to them, which supplement a team’s capacity to avoid and mitigate AI bias. Currently, awareness tools are more sophisticated and cover a wide range of model choices and bias measures, while debiasing tools are nascent and can mitigate bias in models only in specific cases.

Awareness and Debiasing Tools for Supervised Learning Algorithms

IBM has released a suite of awareness and debiasing tools for binary classifiers under the AI Fairness project. To detect AI bias and mitigate against it, all methods require a class label (e.g., race, sexual orientation). Against this class label, a range of metrics can be run (e.g., disparate impact and equal opportunity difference) that quantify the model’s bias toward particular members of the class. We include an explanation of these metrics at the bottom of the article.

Once bias is detected, the AI Fairness 360 library (AIF360) has 10 debiasing approaches (and counting) that can be applied to models ranging from simple classifiers to deep neural networks. Some are preprocessing algorithms, which aim to balance the data itself. Others are in-processing algorithms which penalize unwanted bias while building the model. Yet others apply postprocessing steps to balance favorable outcomes after a prediction. The particular best choice will depend on your problem.
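As an illustration of the preprocessing family, here is a minimal sketch of the reweighing idea: each training example gets a weight so that, after weighting, group membership and outcome look statistically independent. This is a plain-Python sketch on invented counts, not the AIF360 API; the function names and the `priv`/`unpriv` group labels are hypothetical.

```python
from collections import Counter


def reweighing_weights(samples):
    """samples: list of (group, label) pairs.

    Weight for each (group, label) cell is
    P(group) * P(label) / P(group, label),
    which up-weights under-represented combinations.
    """
    total = len(samples)
    group_counts = Counter(group for group, _ in samples)
    label_counts = Counter(label for _, label in samples)
    joint_counts = Counter(samples)
    return {
        (group, label): (group_counts[group] * label_counts[label])
        / (total * joint_counts[(group, label)])
        for (group, label) in joint_counts
    }


def weighted_favorable_rate(samples, weights, group):
    """Weighted share of favorable labels (label == 1) within a group."""
    in_group = [(g, y) for g, y in samples if g == group]
    total = sum(weights[pair] for pair in in_group)
    favorable = sum(weights[pair] for pair in in_group if pair[1] == 1)
    return favorable / total


# Invented training labels: 60% favorable for one group, 30% for the other.
samples = (
    [("priv", 1)] * 60 + [("priv", 0)] * 40
    + [("unpriv", 1)] * 30 + [("unpriv", 0)] * 70
)
weights = reweighing_weights(samples)

rate_priv = weighted_favorable_rate(samples, weights, "priv")
rate_unpriv = weighted_favorable_rate(samples, weights, "unpriv")
print(rate_priv, rate_unpriv)
```

After weighting, both groups show the same favorable rate in the training data, so a learner that honors sample weights no longer sees group membership as predictive of the label. Whether that is the right intervention depends on the problem, as noted above.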

AIF360 has a significant practical limitation in that the bias detection and mitigation algorithms are designed for binary classification problems, and need to be extended to multiclass and regression problems. Other libraries, such as Aequitas and LIME, have good metrics for some more complicated models—but they only detect bias. They aren’t capable of fixing it. But even just the knowledge that a model is biased before it goes into production is still very useful, as it should lead to testing alternative approaches before release.

General Awareness Tool: LIME

The Local Interpretable Model-agnostic Explanations (LIME) toolkit can be used to measure feature importance and explain the local behavior of most models—multiclass classification, regression, and deep learning applications included. The general idea is to fit a highly interpretable linear or tree-based model to the predictions of the model being tested for bias.

For instance, deep CNNs for image recognition are very powerful but not very interpretable. By training a linear model to emulate the behavior of the network, we can gain some insight into how it works. Optionally, human decision-makers can review the reasons behind the model’s decision in specific cases through LIME and make a final decision on top of that. This process in a medical context is demonstrated with the image below.

Explaining individual predictions to a human decision-maker. The model predicts that a patient has the flu based on symptoms or lack thereof. The explainer, LIME, reveals to the doctor the weighting behind each symptom and how it fits the data. The doctor still makes the final decision but is better informed about the model's reasoning. Based on an image made by Marco Tulio Ribeiro
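The local-surrogate idea behind LIME can be sketched in a few lines. The code below does not use the LIME library. It is a simplified illustration under stated assumptions: the `black_box` function is an invented stand-in for an opaque model, perturbations come from a fixed grid rather than random sampling so the result is reproducible, and the exponential proximity kernel and its width are arbitrary choices.

```python
import math


def black_box(x1, x2):
    """Invented stand-in for an opaque model; nonlinear in x1."""
    return x1 ** 2 + 0.1 * x2


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def local_surrogate(model, point, steps=(-0.2, -0.1, 0.0, 0.1, 0.2),
                    kernel_width=0.2):
    """Fit a proximity-weighted linear model to `model` around `point`.

    Returns [intercept, slope_x1, slope_x2], where slopes are expressed
    relative to offsets from `point`.
    """
    rows, targets, weights = [], [], []
    for d1 in steps:
        for d2 in steps:
            rows.append([1.0, d1, d2])
            targets.append(model(point[0] + d1, point[1] + d2))
            # Nearby perturbations count more than distant ones.
            weights.append(math.exp(-(d1 * d1 + d2 * d2) / kernel_width ** 2))
    k = 3
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [
        [sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
        for i in range(k)
    ]
    xtwy = [
        sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
        for i in range(k)
    ]
    return solve(xtwx, xtwy)


intercept, slope_x1, slope_x2 = local_surrogate(black_box, point=(1.0, 0.0))
print(f"local slopes near (1, 0): x1={slope_x1:.3f}, x2={slope_x2:.3f}")
```

The fitted slopes describe how the opaque model behaves near one input only. Moving the point elsewhere gives a different linear explanation, which is why such explanations are called local.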

Debiasing NLP Models

Earlier, we discussed the biases latent in most corpora used for training NLP models. For systems still built on static word embeddings, readily available debiased embeddings remain a practical starting point. For LLMs, now the dominant paradigm, bias assessment extends to output-level auditing: methods such as WEAT and SEAT address representational bias in encoders, while prompt-based counterfactual testing and tools such as Amazon SageMaker Clarify and Microsoft’s Azure Responsible AI tooling provide accessible pipelines for evaluating generative models.
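For readers unfamiliar with WEAT, the following sketch computes a WEAT-style effect size on toy two-dimensional vectors. The vectors are invented and stand in for real word embeddings. The use of the population standard deviation is an assumption, since implementations differ on this detail, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def association(word, attrs_a, attrs_b):
    """How much closer `word` is to attribute set A than to attribute set B."""
    return (mean(cosine(word, a) for a in attrs_a)
            - mean(cosine(word, b) for b in attrs_b))


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference in mean association between two target sets,
    normalized by the spread of associations over all target words."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / pstdev(s_x + s_y)


# Invented toy vectors; real tests use embeddings of curated word lists.
attrs_a = [(1.0, 0.0), (0.9, 0.1)]
attrs_b = [(0.0, 1.0), (0.1, 0.9)]
targets_x = [(0.8, 0.2), (0.7, 0.3)]
targets_y = [(0.2, 0.8), (0.3, 0.7)]

effect = weat_effect_size(targets_x, targets_y, attrs_a, attrs_b)
print(f"effect size: {effect:.3f}")
```

A value near zero would indicate that the two target sets relate to the attribute sets in similar ways. A large positive or negative value indicates that one target set sits systematically closer to one attribute set in the embedding space.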

Debiasing Convolutional Neural Networks (CNNs)

Although LIME can explain the importance of individual features and provide local explanations of behavior on particular image inputs, LIME does not explain a CNN’s overall behavior or allow data scientists to search for unwanted bias.

In famous cases where unwanted CNN bias was found, members of the public (such as Joy Buolamwini) noticed instances of bias based on their membership of an underprivileged group. Hence the best approaches in mitigation combine technical and business approaches: Test often, and build diverse teams that can find unwanted bias in AI through testing before production.

Legal Obligations and Future Directions Around AI Ethics

In this section, we focus on the European Union’s General Data Protection Regulation (GDPR). The GDPR is globally the de facto standard in data protection legislation. (It is not the only legislation, however: there is also China’s Personal Information Security Specification, for example.) The scope and meaning of the GDPR are highly debatable, so we are not offering legal advice in this article. Nevertheless, it is in the interests of organizations globally to comply, as the GDPR applies not only to European organizations but also to any organization handling data belonging to European citizens or residents.

The GDPR is separated into binding articles and non-binding recitals. While the articles impose some burdens on engineers and organizations using personal data, the most stringent provisions for bias mitigation are under Recital 71, and not binding. Recital 71 is among the most likely future regulations as it has already been contemplated by legislators. Commentaries explore GDPR obligations in further detail.

We will zoom in on two key requirements and what they mean for model builders.

1. Prevention of Discriminatory Effects

The GDPR imposes requirements on the technical approaches to any modeling on personal data. Data scientists working with sensitive personal data will want to read the text of Article 9, which forbids many uses of particularly sensitive personal data (such as racial identifiers). More general requirements can be found in Recital 71:

[. . .] use appropriate mathematical or statistical procedures, [. . .] ensure that the risk of errors is minimised [. . .], and prevent discriminatory effects on the basis of racial or ethnic origin, political opinion, religion or beliefs, trade union membership, genetic or health status, or sexual orientation.
GDPR (emphasis mine)

Much of this recital is accepted as fundamental to good model building: Reducing the risk of errors is the first principle. However, under this recital, data scientists are obliged not only to create accurate models but models which do not discriminate! As outlined above, this may not be possible in all cases. The key remains to be sensitive to the discriminatory effects which might arise from the question at hand and its domain, using business and technical resources to detect and mitigate unwanted bias in AI models.

2. The Right to an Explanation

Rights to “meaningful information about the logic involved” in automated decision-making can be found throughout GDPR Articles 13-15. Recital 71 explicitly calls for “the right [. . .] to obtain an explanation” (emphasis mine) of automated decisions. (However, debate continues as to the extent of any binding right to an explanation.)

As we have discussed, some tools for providing explanations for model behavior do exist, but complex models (such as those involving computer vision or NLP) cannot be easily made explainable without losing accuracy. Debate continues as to what an explanation would look like. As a minimum best practice, for models likely to be used in the future, LIME or other interpretation methods should be developed and tested for production.

Ethics and AI: A Worthy and Necessary Challenge

In this post, we have reviewed the problems of unwanted bias in our models, discussed some historical examples, provided some guidelines for businesses and tools for technologists, and discussed key regulations relating to unwanted bias.

As the intelligence of machine learning models surpasses human intelligence, they also surpass human understanding. But, as long as models are designed by humans and trained on data gathered by humans, they will inherit human prejudices.

Managing these human prejudices requires careful attention to data, using AI to help detect and combat unwanted bias when necessary, building sufficiently diverse teams, and having a shared sense of empathy for the users and targets of a given problem space. Ensuring that AI is fair is a fundamental challenge of automation. As the humans and engineers behind that automation, it is our ethical and legal obligation to ensure AI acts as a force for fairness.

Debiasing Conference Papers and Journal Articles

Definitions of AI Bias Metrics

Disparate Impact

Disparate impact is defined as “the ratio in the probability of favorable outcomes between the unprivileged and privileged groups.” For instance, if women are 70% as likely to receive a perfect credit rating as men, this represents a disparate impact. Disparate impact may be present both in the training data and in the model’s predictions: in these cases, it is important to look deeper into the underlying training data and decide if disparate impact is acceptable or should be mitigated.
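The credit-rating example can be worked through in a short sketch. The data is invented to match the 70% figure in the text, and the 0.8 and 1.2 thresholds are the ones quoted elsewhere in this article for what is sometimes treated as high bias. The function names are hypothetical and this is not the AIF360 API.

```python
def disparate_impact(outcomes, unprivileged, privileged):
    """Ratio of favorable-outcome rates: unprivileged over privileged.

    outcomes: list of (group, favorable) pairs, with favorable a bool.
    """
    def favorable_rate(group):
        flags = [favorable for g, favorable in outcomes if g == group]
        return sum(flags) / len(flags)

    return favorable_rate(unprivileged) / favorable_rate(privileged)


def is_high_bias(ratio, low=0.8, high=1.2):
    """Flag ratios outside the band sometimes used as a rule of thumb."""
    return ratio < low or ratio > high


# Invented data: 50% of men and 35% of women receive the favorable outcome.
outcomes = (
    [("men", True)] * 50 + [("men", False)] * 50
    + [("women", True)] * 35 + [("women", False)] * 65
)

ratio = disparate_impact(outcomes, unprivileged="women", privileged="men")
print(f"disparate impact: {ratio:.2f}, high bias: {is_high_bias(ratio)}")
```

A ratio of 1.0 would mean both groups receive favorable outcomes at the same rate. Running the same calculation on the training labels and on the model's predictions shows whether the model inherits, reduces, or amplifies the disparity already present in the data.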

Equal Opportunity Difference

Equal opportunity difference is defined (in the AI Fairness 360 article found above) as “the difference in true positive rates [recall] between unprivileged and privileged groups.” The famous example of high equal opportunity difference discussed in the paper is the COMPAS case. As discussed above, African-Americans were being erroneously assessed as high-risk at a higher rate than Caucasian offenders. If a low-risk assessment is treated as the favorable outcome, this means that African-Americans who did not reoffend were correctly assessed as low-risk less often, so the true positive rate was lower for the unprivileged group. This discrepancy constitutes an equal opportunity difference.
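The definition can be made concrete with a small sketch. The counts are invented and are not COMPAS data. The favorable outcome is assumed to be a low-risk assessment, and the function names and group labels are hypothetical.

```python
def true_positive_rate(rows, group):
    """Among people in `group` who truly deserve the favorable outcome,
    the share who were predicted to receive it.

    rows: list of (group, actual_favorable, predicted_favorable) tuples.
    """
    predictions = [
        predicted for g, actual, predicted in rows if g == group and actual
    ]
    return sum(predictions) / len(predictions)


def equal_opportunity_difference(rows, unprivileged, privileged):
    """TPR of the unprivileged group minus TPR of the privileged group."""
    return (true_positive_rate(rows, unprivileged)
            - true_positive_rate(rows, privileged))


# Invented data. "Favorable" means assessed as low risk; "actual favorable"
# means the person did not reoffend.
rows = (
    [("unpriv", True, True)] * 30 + [("unpriv", True, False)] * 20
    + [("unpriv", False, False)] * 30 + [("unpriv", False, True)] * 20
    + [("priv", True, True)] * 40 + [("priv", True, False)] * 10
    + [("priv", False, False)] * 20 + [("priv", False, True)] * 30
)

difference = equal_opportunity_difference(rows, "unpriv", "priv")
print(f"equal opportunity difference: {difference:.2f}")
```

A value of zero would mean that people who merit the favorable outcome receive it at the same rate in both groups. A negative value, as in this toy data, means the unprivileged group is wrongly denied the favorable outcome more often.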

Special thanks to Jonas Schuett for providing some useful pointers about the GDPR section.

Michael McKenna

Melbourne, Victoria, Australia

Member since July 16, 2019
