Editor's Note: This article was first published by The AI Journal.
(The author would like to thank his colleague Apoorv Agarwal, Ph.D., CEO and co-founder of Text IQ, for providing some of the central ideas anchoring this article. Apoorv’s deconstruction of the fundamental ways in which bias can percolate into AI models, and his examples of systemic bias that predate AI, were particularly valuable in drafting this piece.)
When the markets were in the middle of a historic correction a few months ago, Cathie Wood, noted investor and CEO of ARK Invest, made a bold prediction: AI could accelerate the annual global GDP growth rate to 50 percent over the next decade. Even though Wood’s projection may seem exaggerated, we cannot deny that AI is the most significant technological breakthrough of our time. Indeed, AI is becoming ubiquitous as organizations across every sector of the economy seize on its power to maximize productivity and automate decision-making at a historically unprecedented scale and accuracy.
We may not be aware of AI’s impact on us as individuals, but the complex web of decision points inside an algorithm can shape both the personal and professional facets of our lives: where we work (AI tools used to screen job candidates), the mortgage we qualify for (AI-driven risk scoring of loan applications), whom we date (AI-driven matching on dating apps), the content we are nudged to consume (AI-powered recommendations on streaming platforms), and the medical care we receive (risk assessment algorithms used to gauge the need for referrals and specialized medical care).
Bias: AI’s Achilles Heel
In recent years, AI researchers, policymakers, journalists, and ethicists have studied the impact of AI on individuals and societies. They have uncovered a panoply of cases in which AI models made decisions that were biased and discriminatory against individuals based on their gender, race, and other demographic attributes. Here is a sampling of these cases:
- A major technology company built a recruiting solution that used machine learning to screen CVs of job candidates. But the AI solution reportedly downrated candidates who had attended women’s colleges or used the word “women’s” in their CVs. The solution was subsequently scrapped.
- An AI system was used by judges to make bail and sentencing decisions. Essentially, the system predicted a defendant’s recidivism risk, i.e., their likelihood of reoffending. A ProPublica investigation, which examined the system’s predictive accuracy over a two-year period, found that Black defendants who did not reoffend were misclassified as high risk at nearly twice the rate of white defendants who did not reoffend (45 percent vs. 23 percent). Conversely, white defendants who did reoffend were misclassified as low risk at close to twice the rate of Black defendants who did reoffend (48 percent vs. 28 percent).
- An algorithm used by American hospitals to identify patients who needed closer monitoring and additional primary-care visits reportedly discriminated against Black patients who needed extra care.
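The disparity ProPublica reported is a comparison of error rates computed separately for each group: the false positive rate (people who did not reoffend but were labeled high risk) and the false negative rate (people who did reoffend but were labeled low risk). The minimal Python sketch below shows that computation on a handful of made-up records; the `Record` fields and the sample data are hypothetical and are not drawn from the actual investigation.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    group: str                  # demographic group label (hypothetical)
    predicted_high_risk: bool   # the model's classification
    reoffended: bool            # observed outcome over the follow-up period


def error_rates_by_group(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Return the false positive and false negative rate for each group."""
    counts = defaultdict(lambda: {"fp": 0, "negatives": 0, "fn": 0, "positives": 0})
    for r in records:
        c = counts[r.group]
        if r.reoffended:
            c["positives"] += 1
            if not r.predicted_high_risk:
                c["fn"] += 1  # reoffended but labeled low risk
        else:
            c["negatives"] += 1
            if r.predicted_high_risk:
                c["fp"] += 1  # did not reoffend but labeled high risk
    return {
        group: {
            "false_positive_rate": c["fp"] / c["negatives"] if c["negatives"] else float("nan"),
            "false_negative_rate": c["fn"] / c["positives"] if c["positives"] else float("nan"),
        }
        for group, c in counts.items()
    }


# Made-up records for illustration only.
sample = [
    Record("A", True, False), Record("A", True, False),
    Record("A", False, False), Record("A", False, False),
    Record("A", True, True), Record("A", True, True),
    Record("B", True, False), Record("B", False, False),
    Record("B", False, False), Record("B", False, False),
    Record("B", True, True), Record("B", False, True),
]

rates = error_rates_by_group(sample)
# Group A: false positive rate 0.5, false negative rate 0.0
# Group B: false positive rate 0.25, false negative rate 0.5
```

In this toy sample both groups have the same overall accuracy (4 of 6 predictions correct), yet the errors fall very differently on each group. That is why audits of this kind break errors out by group rather than relying on a single accuracy figure.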
These cases are unequivocally concerning. They present a dystopian picture of how AI models could further entrench existing inequalities and deepen systemic oppression and discrimination against women, people of color, and other marginalized groups.
To prevent these outcomes, we need strong institutional guidelines and standards that enable effective human oversight of AI models. Transparency is key: AI models, particularly in high-risk areas, cannot be opaque or operate as the proverbial black box. Ethicists, technologists, policymakers, attorneys, and business leaders have an ethical obligation to interrogate AI models for bias and other inconsistencies.
Yet, however much our culture likes to anthropomorphize AI and warn us of its imminent “sentience,” AI is not inherently biased. Rejecting AI wholesale is not an option: its benefits are undeniable, and AI will be the engine of unprecedented economic growth for generations to come.
So if AI is not inherently biased, why does it make biased decisions?
How Bias Creeps into AI Models
Let’s first consider the three salient stages of building an AI application and examine how bias could creep into each of them.
Stage 1: Defining the Business Goal of the AI Application
Think of a financial institution that’s building an AI model to determine the creditworthiness of consumers applying for a loan. Who is deemed creditworthy will depend on how the institution defines its business goal. For example, if the goal is to maximize profitability and minimize risk, the AI model could reject new immigrants who haven’t had a chance to build enough credit history. If the goal is to optimize for financial inclusion, the AI model could also analyze non-traditional data sources (such as mobile, rent, and utility payments) to determine creditworthiness and approve more applicants who lack sufficient credit histories. Different business goals will drive different and, at times, unequal and biased business decisions. Bias, therefore, can begin at the earliest stage of AI modeling, i.e., at the level of business intent.
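To see how the business goal alone can change outcomes, here is a minimal Python sketch of two approval rules applied to the same applicant. Everything in it is hypothetical: the `Applicant` fields, the 24-month history requirement, and the 95 percent on-time-payment threshold are illustrative assumptions, not criteria used by any real lender.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    credit_history_months: int   # length of traditional credit history
    on_time_bill_ratio: float    # share of rent, utility, and mobile bills paid on time


MIN_HISTORY_MONTHS = 24   # hypothetical threshold
MIN_ON_TIME_RATIO = 0.95  # hypothetical threshold


def approve_risk_averse(a: Applicant) -> bool:
    # Goal: minimize risk. Only traditional credit history counts.
    return a.credit_history_months >= MIN_HISTORY_MONTHS


def approve_inclusive(a: Applicant) -> bool:
    # Goal: financial inclusion. A thin credit file can be offset by
    # a strong record of paying non-traditional bills on time.
    if a.credit_history_months >= MIN_HISTORY_MONTHS:
        return True
    return a.on_time_bill_ratio >= MIN_ON_TIME_RATIO


newcomer = Applicant("recent immigrant", credit_history_months=6, on_time_bill_ratio=0.98)
print(approve_risk_averse(newcomer))  # False
print(approve_inclusive(newcomer))    # True
```

Nothing about the applicant changes between the two calls; only the objective encoded in the rule does.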
On a related note, the same technology company that scrapped its AI recruiting solution because it discriminated against female applicants is now building another internal AI solution with a focus on diversity.
Stage 2: Data Collection
If the data used to train the AI is itself biased, the AI will reflect that bias in the decisions it makes. There are two ways in which bias can creep into the training data collection process: 1) the people collecting the training data gather a biased sample that is not representative of the wider data set; or 2) the wider data set is itself biased, so any sample used to train the AI will inevitably show the same biases.
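The first failure mode, an unrepresentative sample, can at least be checked when the makeup of the wider data set is known. The sketch below compares each group’s share of a training sample with its share of the population and reports groups that fall short by more than a tolerance; the function names, the group labels, and the five-point tolerance are hypothetical. A check like this cannot catch the second failure mode: if the wider data set is itself biased, a perfectly proportional sample inherits that bias.

```python
from collections import Counter
from typing import Dict, List, Sequence


def group_shares(labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of records belonging to each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented_groups(
    sample: Sequence[str], population: Sequence[str], tolerance: float = 0.05
) -> List[str]:
    """Groups whose sample share is more than `tolerance` below their population share."""
    population_shares = group_shares(population)
    sample_shares = group_shares(sample)
    return sorted(
        group
        for group, share in population_shares.items()
        if sample_shares.get(group, 0.0) < share - tolerance
    )


population = ["men"] * 50 + ["women"] * 50
training_sample = ["men"] * 80 + ["women"] * 20
print(underrepresented_groups(training_sample, population))  # ['women']
```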
For example, one plausible reason why the AI system used in bail and sentencing decisions was biased against Black defendants is that the historical data used to train the model reflected the long history of racism and inequality within the policing and justice system.
Another curious example is a Twitter chatbot launched by a major technology company as an experiment in conversational understanding. The bot was meant to learn from interacting with other Twitter accounts, but less than 24 hours after its launch, it started tweeting racist, misogynistic rants and conspiracy theories. The key point here is that the chatbot was neither inherently bigoted nor a believer in “fake news”; it was merely learning from the data it was exposed to, in this case tweets by online trolls.
Stage 3: Attribute Extraction
In the next stage, data scientists decide which attributes to extract from the data when building the AI application. At this stage, bias can creep into the AI model depending on which attributes are chosen: sensitive ones such as gender or race, or seemingly neutral ones that turn out to be flawed or biased.
For example, let’s reconsider the AI application that determined which patients qualified for extra medical attention. One reason it reportedly overlooked Black patients who needed that attention is that it decided who needed extra care based on a biased and ultimately flawed attribute: patients’ prior healthcare spending. In other words, patients who had historically spent more on healthcare than other patients were singled out for additional medical care. On average, and for a long list of complex reasons, Black patients have historically spent less on healthcare. In effect, Black patients were erroneously classified as low risk because their prior healthcare spending was not a reliable indicator of the state of their health.
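The effect of choosing a flawed attribute can be illustrated with a toy ranking. In this hypothetical Python sketch, four made-up patients are ranked for extra care twice: once by prior healthcare spending, the attribute described above, and once by an equally hypothetical, more direct measure of need, the number of chronic conditions. The data and the alternative attribute are assumptions for illustration, not the design of the actual algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Patient:
    patient_id: str
    group: str
    prior_spending: float     # hypothetical annual healthcare spending in dollars
    chronic_conditions: int   # hypothetical, more direct measure of health need


def select_for_extra_care(
    patients: List[Patient], key: Callable[[Patient], float], k: int
) -> List[str]:
    """Return the IDs of the k patients ranked highest by `key`."""
    ranked = sorted(patients, key=key, reverse=True)
    return [p.patient_id for p in ranked[:k]]


# Made-up patients: group B is sicker but has spent less on care.
patients = [
    Patient("p1", "A", prior_spending=12000, chronic_conditions=2),
    Patient("p2", "A", prior_spending=9000, chronic_conditions=1),
    Patient("p3", "B", prior_spending=5000, chronic_conditions=4),
    Patient("p4", "B", prior_spending=4000, chronic_conditions=3),
]

by_spending = select_for_extra_care(patients, key=lambda p: p.prior_spending, k=2)
by_need = select_for_extra_care(patients, key=lambda p: p.chronic_conditions, k=2)
print(by_spending)  # ['p1', 'p2']
print(by_need)      # ['p3', 'p4']
```

Group B’s patients are the sickest in this toy data, but because they have spent less, the spending-based ranking never selects them.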
Bias Predates the Advent of AI; AI Presents an Opportunity to Confront It
It would be naïve and ahistorical to blame AI for biasing our economic, cultural, and societal structures. Bias, discrimination, and systemic oppression plagued our society long before the advent of AI.
An example that illustrates this legacy of systemic bias can be found in healthcare. Until the early 1990s, women were largely excluded from participating in clinical research. This deliberate exclusion has had enduring effects on how well many drugs work for female patients, and in many cases, treatments have backfired because they were not suited to women. In cardiovascular research, where women are still underrepresented, results from studies on male patients have historically been extrapolated to women, leading to harmful side effects and ineffective treatments. The effects persist: women with cardiovascular disease have a disproportionately higher mortality rate than men.
When it comes to bias, AI is neither the culprit nor the panacea. Its biases and inconsistencies hark back to our own, as reflected in our data and our prejudiced actions, be they conscious or unconscious. Indeed, as we build and implement AI applications, we are called to examine longstanding systems and processes that we have collectively not thought to question. In doing so, we uncover biases, inconsistencies, and inequalities that have long persisted, and we become able to confront them.
As an example, consider the legal industry. Multiple studies show that male attorneys generally receive more favorable performance reviews than their female counterparts, and that lawyers of color in particular receive biased performance evaluations that can slow their career advancement and lead to attrition.
To encourage fairer performance reviews and foster a more meritocratic workplace, Ballard Spahr, a Philadelphia-based law firm, is piloting the Unconscious Bias Detector, an AI-driven tool that helps organizations identify instances of unconscious bias in performance reviews. Unconscious bias has a vocabulary: it can be encoded in gendered phrases such as “bubbly and energetic personality” or in phrases that speak to an employee’s racial, cultural, or physical attributes.
By using AI, the Unconscious Bias Detector can catch language that does not relate directly to the employee’s work contributions and flag it for further examination.
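The tool’s internals are not described here, so what follows is only a deliberately simple stand-in: a rule-based Python sketch that flags phrases from a fixed watch list in the text of a review. It is not how the Unconscious Bias Detector works; an AI-driven system would rely on trained models rather than a hard-coded list. The watch list and the `flag_review` function are hypothetical.

```python
import re
from typing import List, Sequence

# Hypothetical watch list of phrases that describe personality rather than work.
WATCH_LIST = ["bubbly", "energetic personality", "abrasive"]


def flag_review(text: str, phrases: Sequence[str] = WATCH_LIST) -> List[str]:
    """Return watch-list phrases found in `text`, in the order they appear."""
    hits = []
    for phrase in phrases:
        # Whole-word, case-insensitive match; re.escape guards against special characters.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0)))
    # Sort by position so flags appear in reading order.
    return [matched for _, matched in sorted(hits)]


review = "She has a bubbly and energetic personality and closed the deal ahead of schedule."
print(flag_review(review))  # ['bubbly', 'energetic personality']
```

Work-related language passes through untouched, and flagged phrases are returned for a human reviewer to examine rather than being removed automatically.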