A recent report claims that popular AI models are 1.5 times more likely to flag tweets by black users as “offensive” compared to tweets from other users.
Recode reports that while platforms such as Facebook, YouTube, and Twitter are betting on AI technology to monitor their platforms for them, there have been significant issues of racial bias in how the algorithms flag content. Two new studies show that AI software trained to identify hate speech online may actually amplify racial bias.
One of the studies showed that AI models used to process hate speech were 1.5 times more likely to flag tweets as offensive or hateful if they were posted by black people and 2.2 times more likely to flag tweets written in black dialect as offensive. The other study found widespread racial bias against black speech in five popular academic data sets for studying hate speech.
The issue arises from social context, something AI systems struggle to understand. Terms that are often used as slurs, such as the “n-word” or “queer,” can be offensive in some contexts but not in others. The two papers were presented at a recent conference for computational linguistics to show how natural language processing AI can amplify biases that people already hold.
Maarten Sap, a PhD student in computer science and engineering and an author of one of the papers, stated: “The academic and tech sector are pushing ahead with saying, ‘let’s create automated tools of hate detection,’ but we need to be more mindful of minority group language that could be considered ‘bad’ by outside members.”
Thomas Davidson, a researcher at Cornell University, ran a study similar to Sap’s and commented on his findings, stating: “What we’re drawing attention to is the quality of the data coming into these models. You can have the most sophisticated neural network model, but the data is biased because humans are deciding what’s hate speech and what’s not.”
Read the full report at Recode here.