New study reveals how humans judge the morality of artificial intelligence

Artificial intelligence systems are increasingly being asked to make recommendations with morally significant consequences, raising questions about how humans decide to trust these artificial advisors. New research published in Computers in Human Behavior suggests that people trust AI based not simply on whether it behaves in a friendly or logical way, but on how well the AI’s communication style fits the seriousness of the situation and the ethical choices it makes.

Researchers Lianshan Zhang and Mei Ying Zhao wanted to understand how humans perceive the morality of artificial conversational agents. “AI chatbots are increasingly used not only in casual conversations, but also in situations where their recommendations may have moral or social implications, such as medicine, autonomous driving, and national security,” said Zhang, an associate professor at the School of Media and Communication at Shanghai Jiao Tong University.

“This made us interested in how people judge whether an AI system is morally competent and trustworthy—not just whether it gives the right answer, but whether it appears to be reasoning, caring, and communicating appropriately, even in morally difficult situations.”

Two key psychological concepts explain how people view moral decision makers in these digital situations. The first is perceived moral agency, which refers to the belief that an entity can make deliberate, intentional choices and be held responsible for its actions. The second is perceived moral emotion, the belief that an entity can feel and express concern, sympathy, and care for others, even if those emotions are simulated by a machine.

Zhang and Zhao wanted to know how different observable behaviors influence these perceptions. They focused on two conversational styles: a warm and friendly one, and a highly logical and competent one. They also considered the specific moral position the system might take in a difficult situation. A utilitarian position focuses on the greatest good for the greatest number, even if that sometimes means causing a smaller harm to prevent a larger one. In contrast, a deontological position strictly follows established moral rules and holds that it is always wrong to cause harm, regardless of the potential positive consequences.

To test these ideas, Zhang and Zhao set up an experiment with 447 participants recruited from a Chinese online survey platform called Credamo. The sample consisted primarily of young people living in urban areas, with the majority holding at least a bachelor’s degree. Participants chatted live with a custom chatbot built on a platform called Coze, which allowed the researchers to script specific responses and control the flow of the conversation.

To establish baseline impressions, participants completed 20 turns of conversation with the chatbot about their personal stress and coping mechanisms. This interaction lasted approximately six minutes and was designed to expose the user to a specific programmed personality. In the warm condition, the chatbot used an empathetic and supportive tone, complete with caring expressions and emojis. In the competent condition, the chatbot was highly formal and task-focused, and it prioritized efficiency and logic.

After this initial conversation, participants read a specific moral dilemma. The researchers randomly assigned participants to read either a low-severity scenario that involved harm but no risk of death, or a high-severity scenario that involved life or death. The chatbot then provided a pre-written response to the dilemma. This response drew on either a utilitarian logic that accepts trade-offs or a rule-based deontological logic that refuses to do harm.

After the interaction, participants filled out a questionnaire rating the chatbot on 7-point scales. They assessed the system’s perceived moral agency, its capacity for moral emotion, and its overall trustworthiness across the dimensions of competence, benevolence, and integrity.

The data provide evidence that a single conversational cue does not guarantee trust. “One of the findings that surprised us was that conversational style alone did not strongly shape perceptions of the chatbot’s moral agency or moral emotion,” Zhang told PsyPost. “Just making a chatbot warmer doesn’t automatically make people think it’s more morally emotional, and just making a chatbot more competent doesn’t automatically make it seem more of a moral agent.”

Instead, human evaluations relied heavily on how well the chatbot’s tone matched its moral choices. The researchers found that how much moral emotion participants attributed to the system depended on how well its personality matched its recommendations.

“Another unexpected finding was that people attributed slightly more moral emotion to the chatbot when it chose to accept harm to one person in order to save more people than when it refused to directly harm the individual,” said Zhang. “This may sound counterintuitive, since sacrificing one person for the greater good is often seen as callous and calculating. However, participants may have perceived the chatbot’s response not as pure calculation, but as a concern for saving as many lives as possible.”

The severity of the situation also significantly altered these perceptions. “The main takeaway is that a trusted AI is not necessarily the warmest or most professional AI,” Zhang said. “What matters is whether the communication style fits the moral situation.”

In less serious situations, participants tended to view the warmer chatbot as more capable of expressing moral emotion and as more trustworthy overall, treating its warmth as a sign of goodwill. “For example, when the stakes are relatively low, a warmer, more empathetic tone may help people perceive the chatbot as caring and morally considerate,” Zhang said.

For high-severity scenarios, the pattern changed dramatically. Participants attributed greater moral agency and moral emotion to the competent chatbot, especially when it provided utilitarian recommendations that took serious consequences into account. “But when the stakes are high, especially when chatbots recommend difficult trade-offs, such as accepting harm to one person to save more, people may expect a more measured and rational response, with clear explanations and a sense of responsibility,” Zhang said. “In other words, trust depends less on whether the AI sounds ‘warm’ or ‘cold’ and more on whether the AI communicates in a way that seems appropriate to the situation.”

As with all studies, there are limitations to consider. “One important caveat is that our study does not show that AI systems actually have moral emotions or moral agency,” Zhang said. “We studied user perceptions: whether people ascribe moral feelings or moral agency to chatbots, based on how chatbots communicate and what moral judgments chatbots make.”

This study also relies on relatively short, single-episode interactions with text-based conversational agents. “Trust in real-world AI systems may evolve differently as people use them repeatedly, rely on them over time, or face real-world consequences,” Zhang said.

The participant pool imposes another limitation on the generalizability of the findings. “Our participants were also recruited from an online panel in China and were relatively young, lived in urban areas, and were highly educated, so future studies should test whether the same pattern emerges in other groups and cultural backgrounds,” Zhang said.

“One of the misconceptions we want to avoid is the idea that AI should simply be more empathetic,” Zhang said. “Our findings suggest that ‘more empathy’ is not necessarily the answer. In morally sensitive situations, people may also need clear explanations, transparency, boundaries between what AI can and cannot do, and accountability.”

In the future, researchers plan to investigate how human-machine trust develops across more diverse demographics and scenarios. “Our long-term goal is to better understand how people form trust in AI systems that participate in socially and morally sensitive interactions,” Zhang said.

“In future research, we hope to study trust in AI in longer, more realistic interactions across different cultures and age groups and in a wider range of sensitive situations, such as health advice, education, emotional support, and public safety. We are also interested in how AI communicates uncertainty, explains difficult trade-offs, and clarifies the limits of its role.”

The authors suggest that developers should prioritize contextual appropriateness over simple friendliness. “One of the broader messages of this study is that AI should not be designed to simply sound more human, more approachable, or more persuasive,” Zhang said. “In morally sensitive situations, the more important question is whether AI can communicate in a way that is commensurate with the gravity of the decision and helps people understand why.”

“As AI becomes part of everyday decision-making, the goal should not be to make people trust AI more, but to help them trust AI smarter,” said Zhang. “That means knowing when AI advice is helpful, when to question it, and when human judgment and responsibility should remain front and center.”

By keeping human responsibility at the center, developers can create tools that provide better support. “In this sense, our research is not only about chatbot design, but also about building a healthier human-AI relationship,” Zhang said. “Ultimately, this line of research is about designing AI systems that are not only useful and persuasive, but also accountable, transparent, and appropriately trusted.”

The study, “Being Relevant, Not Warm or Cold: How Consequence Severity Changes Moral Reasoning and Trust in AI Chatbots,” was authored by Lianshan Zhang and Mei Ying Zhao.
