Recent research published in the Proceedings of the National Academy of Sciences suggests that large language models have a hard time accurately estimating the moral values of people outside Western societies. The scientists found that these artificial intelligence systems tend to overestimate the moral concerns of Western countries while underestimating those of non-Western cultures. This pattern provides evidence that relying on these models to measure world opinion can unintentionally reinforce cultural stereotypes.
Large language models are advanced artificial intelligence systems trained on vast amounts of text data to generate human-like sentences and answer complex questions. Common examples include ChatGPT, created by OpenAI, and similar tools built by companies like Google and Meta. More and more people are using these models for communication, business, and even academic research.
Recently, some scholars have proposed using these models to simulate human participants in social science research. This idea is based on the assumption that the model accurately understands diverse human populations. Researchers conducted this study to test that assumption.
Mohammad Atari, an assistant professor of psychology and brain sciences at the University of Massachusetts Amherst, explained the team’s motivation. “We already know from moral psychology that people are not very good at judging the moral values of other groups,” Atari said. “Liberals often misunderstand conservatives, and conservatives misunderstand liberals in predictable ways.”
“As AI plays an increasing role in daily life and even scientific workflows, we had a simple question: Do these systems make the same kinds of accuracy errors?” Atari explained. “In other words, does AI ‘stereotype’ the moral values of different cultural groups?”
“This question is important because the biases built into these systems can covertly influence how information is produced, interpreted, and acted upon,” Atari added. “If so, those biases can shape research questions, influence decision-making, and reinforce misconceptions at scale.”
The authors wanted to see whether these models actually understand the morality of the world. Most of the text these artificial intelligence systems learn from comes from societies that are Western, educated, industrialized, rich, and democratic. In psychology, these societies are often referred to by the acronym “WEIRD”.
Because the training data was heavily biased toward a Western perspective, the researchers suspected that the model could produce biased estimates of right and wrong. When a model lacks sufficient information about a particular culture, it tends to fill in the gaps based on statistical patterns from the primary training data. This process is very similar to human stereotyping, where limited exposure leads to overgeneralized beliefs about unfamiliar groups.
In human psychology, one common form of stereotyping is known as valence imprecision. This occurs when people overestimate the positive characteristics of groups similar to their own and underestimate the same positive characteristics in outgroups. The researchers theorized that large language models may exhibit a similar pattern, projecting higher moral concern onto Western societies while downplaying the moral principles of other countries.
To investigate this, researchers compared moral judgments generated by artificial intelligence to real-world survey data. Human data were obtained from 90,802 participants from 48 countries. These people completed a widely used psychological survey that measures six core dimensions of morality, based on a framework known as Moral Foundations Theory.
These six dimensions include care, equality, proportionality, loyalty, authority, and purity. Care is related to the virtue of compassion, and equality focuses on egalitarianism. Proportionality revolves around merit and fair compensation, while loyalty deals with solidarity with a group. Authority relates to tradition and respect for leaders, while purity involves ideas of holiness and avoidance of degradation.
Participants rated these foundation statements on a scale of 1 to 5. The researchers then used statistical techniques to adjust the human survey data so that it better reflected each country’s actual age and gender demographics, based on World Bank census data.
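The article does not include the authors’ analysis code, but a minimal sketch of this kind of demographic weighting, assuming hypothetical column names and a simple post-stratification scheme (the paper’s exact adjustment procedure is not described here), might look like the following:

```python
import pandas as pd

FOUNDATIONS = ["care", "equality", "proportionality", "loyalty", "authority", "purity"]

def poststratified_means(survey: pd.DataFrame, census: pd.DataFrame) -> pd.DataFrame:
    """Weighted country-level means on the six moral foundations.

    survey: one row per participant with columns
        ['country', 'age_group', 'gender'] plus the six 1-5 ratings
    census: one row per demographic cell with columns
        ['country', 'age_group', 'gender', 'pop_share'] (shares sum to 1 per country)
    """
    # How common each age-by-gender cell is in the online sample, per country
    counts = (survey.groupby(["country", "age_group", "gender"])
                    .size().rename("n").reset_index())
    counts["sample_share"] = counts["n"] / counts.groupby("country")["n"].transform("sum")

    # Weight = population share / sample share, so over-represented cells count less
    merged = (survey.merge(counts, on=["country", "age_group", "gender"])
                    .merge(census, on=["country", "age_group", "gender"]))
    merged["weight"] = merged["pop_share"] / merged["sample_share"]

    # Weighted mean rating per country and foundation
    def weighted_means(group: pd.DataFrame) -> pd.Series:
        return pd.Series({f: (group[f] * group["weight"]).sum() / group["weight"].sum()
                          for f in FOUNDATIONS})

    return merged.groupby("country").apply(weighted_means)
```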
The researchers then prompted several versions of OpenAI’s language models, including GPT-3.5, GPT-4, and GPT-4o. They asked the models to estimate how the average person in each of the 48 countries would respond to the exact same moral questions on the same 1-to-5 scale. To ensure consistency, they repeated these queries 10 times per question, producing a large dataset of 103,680 artificial intelligence responses.
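The exact prompt wording and model settings are not reproduced in this article, but a simplified sketch of this kind of repeated, country-by-country querying, using the OpenAI Python client and an invented example statement and prompt (both assumptions, not the authors’ materials), might look like this:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4o"]  # the study also queried GPT-4
COUNTRIES = ["United States", "Nigeria", "Morocco", "Indonesia"]  # 48 in the study
ITEMS = ["Caring for people who are suffering is among the most important virtues."]  # invented example item
N_REPEATS = 10  # each query was repeated 10 times for consistency

def estimate(model: str, country: str, item: str) -> str:
    """Ask the model how an average person in `country` would rate one statement."""
    prompt = (
        f"Estimate how the average person in {country} would respond to the "
        f"following statement on a scale from 1 (strongly disagree) to 5 "
        f"(strongly agree). Answer with a single number.\n\nStatement: {item}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

ratings = [
    (model, country, item, estimate(model, country, item))
    for model in MODELS
    for country in COUNTRIES
    for item in ITEMS
    for _ in range(N_REPEATS)
]
```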
The authors also conducted similar tests using Meta’s LLaMa models and Google’s Gemini Pro. They then calculated the statistical difference between the human responses and the computer-generated estimates. To measure the overall inaccuracy of the estimates for each nation, the researchers calculated the Euclidean distance, which captures how far the artificial intelligence’s estimates deviate from actual human data across all six moral dimensions.
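The distance calculation itself is straightforward. A short sketch, with made-up numbers purely to illustrate the arithmetic (not values from the study), could look like this:

```python
import numpy as np

FOUNDATIONS = ["care", "equality", "proportionality", "loyalty", "authority", "purity"]

def moral_distance(human_means: dict, model_means: dict) -> float:
    """Euclidean distance between a country's human and model-estimated
    mean ratings across the six moral foundations."""
    h = np.array([human_means[f] for f in FOUNDATIONS])
    m = np.array([model_means[f] for f in FOUNDATIONS])
    return float(np.linalg.norm(h - m))

# Made-up example values on the 1-5 scale, just to show the calculation
human = {"care": 3.9, "equality": 3.6, "proportionality": 3.8,
         "loyalty": 3.2, "authority": 3.1, "purity": 3.0}
model = {"care": 4.2, "equality": 3.1, "proportionality": 3.7,
         "loyalty": 3.0, "authority": 3.4, "purity": 2.5}

print(moral_distance(human, model))  # larger values mean less accurate estimates
```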
The models failed to accurately capture the diversity of global moral values. The artificial intelligence systems consistently overestimated the moral concerns of people in Western countries such as the United States, Canada, and Australia. At the same time, the models underestimated the moral values of people in non-Western countries such as Nigeria, Morocco, and Indonesia.
Specifically, the models tended to overestimate concerns such as care and authority in Western countries. They also systematically underestimated concerns such as equality and purity in most countries, especially in less Westernized regions. The distance between the human and machine data was greatest for countries in the Middle East and sub-Saharan Africa.
To verify these patterns, the authors conducted additional experiments to rule out language bias. They collected new data from 4,666 participants in nine non-English-speaking countries using questionnaires translated into local languages such as Arabic, Spanish, and Urdu. They then prompted the artificial intelligence systems in those same local languages.
Even when communicating in the local languages, the models still underestimated the moral values of non-Western populations. The researchers also looked at country-level factors that may explain these discrepancies. “In countries with greater press freedom (such as the Netherlands or Sweden), AI may be able to estimate moral values more accurately,” Atari said.
To ensure that their findings were not just a quirk of a particular psychological theory, the researchers performed another test using a different framework called morality-as-cooperation. This framework views morality through the lens of seven cooperative strategies, including family values, reciprocity, and courage. Using a dataset covering 63 countries and 29 languages, the researchers found the same pattern, with large deviations in the estimated moral profiles of non-Western populations.
A potential misconception about this research is that the artificial intelligence models are intentionally or inherently prejudiced. Instead, the study provides evidence that these systems simply absorb and reproduce the statistical patterns present in their training data. Because the models lack real-world social experience, they are unable to correct for distortions in the texts they consume.
The exact cause of the models’ behavior requires further investigation. “These patterns may reflect cultural biases in the data and in how these models are ‘debiased’ or made suitable as chatbots,” Atari said. This debiasing process uses human feedback to make the software safer and more polite, but it often relies on human reviewers in the West enforcing their own cultural norms.
This study has some limitations. The primary human dataset was collected online, which may mean that participants represent a more globally connected or better-educated demographic within their respective countries. Although the researchers used statistical adjustments and translated replication studies to account for this, sampling bias remains an ongoing challenge in psychological research worldwide.
Atari advises readers to exercise caution when using these technologies. “Don’t assume that AI is an objective observer,” he said. “Our findings suggest that different AI systems (such as ChatGPT and Llama) can reproduce the same kinds of distorted views of different groups that people already have.”
“This means that it is worth approaching AI-generated information (particularly on morally charged issues, from abortion and social justice to military applications and religion) with a degree of skepticism, especially when it claims to reflect the beliefs and values of other groups,” Atari continued. “Next time ChatGPT implicitly or explicitly claims to know what the people of Egypt, Turkey, or Argentina value, take it with a grain of salt.”
“Our research shows that AI’s estimates of the moral values of non-Western cultures are particularly divergent,” Atari said. “This is part of my broader research investigating cultural bias in AI. Morality shapes the way people form opinions, justify laws, and participate in politics, so biased representations can misrepresent public sentiment.”
The researchers note that these findings carry significant risks as the technology becomes more integrated into daily life. If artificial intelligence systems provide distorted moral representations, they may mischaracterize public sentiment or offer culturally inappropriate advice. For example, a mental health chatbot trained on Western norms may prioritize personal boundaries over family loyalty, which could conflict with the moral values of East Asian cultures.
Future research could investigate how these moral distortions affect specific real-world applications, such as automated hiring systems or political polling. The scientists suggest that developers should focus on diversifying their training data by incorporating more language content from different regions of the world. Greater transparency from technology companies about the precise composition of their training data would also help researchers build culturally inclusive tools.
The study, “Moral Stereotyping in Large-Scale Language Models,” was authored by Aliah Zewail, Alexandra Figueroa, Jesse Graham, and Mohammad Atari.

