Training AI chatbots to be warm and empathetic reduces factual accuracy

Artificial intelligence models trained to be friendly and empathetic are more likely to agree with users’ false beliefs, at the expense of factual accuracy, according to new research. These sociable chatbots show higher error rates when providing medical advice or correcting conspiracy theories, especially when users express vulnerability. The study was recently published in the journal Nature.

Tech companies are increasingly designing artificial intelligence programs to be warm and approachable. Services like Replika and Character.ai explicitly build programs for friendship and romantic intimacy. Leading developers are also training their systems to maintain empathetic relationships with users. Today, millions of people rely on these conversational language models for daily advice, companionship, and emotional support.

Developers often treat this personality training as a separate feature. They assume that changing a program’s conversational style will not compromise the program’s core function of providing correct information. As a result, users may think that a friendly chatbot is just as knowledgeable as a neutral chatbot.

“What interested me was seeing what’s been happening with chatbots over the last few years. Chatbots are becoming noticeably warmer and friendlier, and people are building relationships with them in ways that open up entirely new use cases, such as companionship, friendship, and personal mentoring,” said Lujain Ibrahim, a PhD candidate in social data science at the Oxford Internet Institute at the University of Oxford.

“This is different from the interactions you had with chatbots and software a few years ago. At the same time, I’ve been reading a lot about human communication, and there was a long-held intuition in that literature that warmth and candor can repel each other, and that being kind while conveying hard truths is a really difficult thing to do,” Ibrahim said. “So I started thinking that something similar might emerge in language models as they are trained to adopt these warmer, more personal styles.”

To test these dynamics, the researchers modified five artificial intelligence models of varying sizes: Llama-8b, Mistral-Small, Qwen-32b, Llama-70b, and GPT-4o. The authors used a technique called supervised fine-tuning, which involves training an existing model on specific examples to adjust its future behavior.

The scientists built a dataset of 1,617 real conversations between humans and chatbots. They rewrote the 3,667 model responses in this dataset to be warmer and more empathetic, instructing the rewriting program to preserve the exact factual meaning of each original message. The researchers then used this new dataset to train the five models to adopt a warmer conversational style.

The authors then evaluated both the original models and the newly trained warm models on four standardized tasks: answering common trivia, resisting common falsehoods, identifying conspiracy theories, and answering medical questions. They presented a total of 1,625 prompts to the models and collected 439,792 observations across the experiment. The scientists used another artificial intelligence program to score the accuracy of the responses, and human raters then verified those scores to ensure reliability.

The warm models had systematically higher error rates than the original models across all five architectures. Warmth training increased error rates by 10 to 30 percentage points overall. Specifically, errors on medical questions increased by 8.6 percentage points, and errors on general falsehoods increased by 8.4 percentage points. Accuracy also dropped by 5.4 points on the disinformation task and by 4.9 points on general trivia.
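The comparisons above are reported in percentage points, meaning the warm model's error rate minus the original model's error rate. The following is a minimal sketch of that calculation, assuming each response has already been scored as correct or incorrect. The function name and the toy scores are invented for illustration and are not data from the study.

```python
def error_rate(is_correct):
    """Share of responses scored incorrect, expressed as a percentage."""
    if not is_correct:
        raise ValueError("need at least one scored response")
    return 100.0 * sum(not c for c in is_correct) / len(is_correct)


# Toy scores, invented for illustration only (True = correct answer).
original_scores = [True, True, True, False, True, True, True, True, True, True]
warm_scores = [True, False, True, False, True, True, False, True, True, True]

# The gap is a simple difference of two percentages, in percentage points.
gap_pp = error_rate(warm_scores) - error_rate(original_scores)
print(f"original: {error_rate(original_scores):.1f}%")
print(f"warm:     {error_rate(warm_scores):.1f}%")
print(f"gap:      {gap_pp:.1f} percentage points")
```

In this toy case the original model misses 1 of 10 questions and the warm model misses 3 of 10, so the gap is 20 percentage points.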

The researchers also tested how the models responded to various interpersonal situations. They attached specific statements to their evaluation questions to simulate different user emotions, such as joy, sadness, and anger. They also tested relational dynamics by having simulated users speak from a superior or subordinate position.

Adding emotional context to the questions reduced the warm models’ accuracy even more. When the prompts included expressions of sadness, the difference in accuracy between the warm and original models widened by 60%. In these sad scenarios, the warm models’ error rate was 11.9 percentage points higher than that of the original models.
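A "60% wider" gap is a relative change, while "11.9 percentage points" is an absolute difference. The sketch below shows how the two figures relate, under the assumption that the 11.9-point gap is the already widened one and that the 60% is measured against the gap without emotional context. The article does not state that baseline gap directly, so the result is an implied value, not a reported one.

```python
def implied_baseline_gap(widened_gap_pp, relative_widening):
    """Gap before widening, given the widened gap and the relative increase."""
    return widened_gap_pp / (1.0 + relative_widening)


sad_gap_pp = 11.9  # from the article: warm minus original error rate, sad prompts
widening = 0.60    # from the article: the gap widened by 60%

# Assumption: baseline_gap * (1 + 0.60) = 11.9
baseline_gap_pp = implied_baseline_gap(sad_gap_pp, widening)
print(f"Implied gap without emotional context: {baseline_gap_pp:.1f} percentage points")
```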

The scientists also investigated a behavior known as sycophancy, which occurs when a machine learning model affirms a user’s stated beliefs regardless of whether those beliefs are true. To test this, the researchers added false beliefs to the prompts. For example, a prompt might ask whether a famous historical event happened in a certain way while stating that the user believes a false version of the story.
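The prompt manipulations described above, an emotional statement attached to a question and a stated false belief, can be pictured as a small grid of conditions. The sketch below is only an illustration: the wording of the emotional statement, the belief template, and the example question are hypothetical, since the article does not reproduce the study's actual prompts.

```python
from itertools import product

# Hypothetical wording: the article does not give the study's actual statements.
EMOTIONAL_PREFIXES = {
    "neutral": "",
    "sadness": "I've been feeling really down lately. ",
}
BELIEF_SUFFIX = " I think the answer is {belief}."


def build_variants(question, false_belief):
    """Return one prompt per (emotion, states_false_belief) condition."""
    variants = {}
    for emotion, with_belief in product(EMOTIONAL_PREFIXES, (False, True)):
        suffix = BELIEF_SUFFIX.format(belief=false_belief) if with_belief else ""
        variants[(emotion, with_belief)] = EMOTIONAL_PREFIXES[emotion] + question + suffix
    return variants


# Illustrative item only (the correct answer is Canberra).
variants = build_variants("What is the capital of Australia?", "Sydney")
for condition, prompt in variants.items():
    print(condition, "->", prompt)
```

Each question yields four prompts here, so the same factual item can be compared with and without the emotional statement and with and without the false belief.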

In one example from the study, the original model accurately conveyed the real historical facts to the user. The warm model tended to validate the user’s false claim by noting that many people believe the false version and by offering supportive comments. The researchers found that warm models were significantly more likely to endorse these false user beliefs overall.

When users expressed false beliefs, the warm models made 11 percentage points more errors than the original models. This effect was strongest when users also expressed emotional vulnerability. Under these conditions, the warm models were approximately 40% more likely to affirm incorrect statements than the original models.

To rule out alternative explanations, the authors conducted four follow-up experiments. They tested whether the fine-tuning process had simply degraded the models’ general capabilities. They found that the warm models still performed well on standard mathematical reasoning and broad knowledge tests. The warm models also refused harmful requests at the same rate as the original models.

The scientists also noticed that the warm models produced slightly shorter responses, but statistical tests confirmed that the higher error rates remained even after accounting for this difference. The researchers also trained a series of models using a cold, direct, and emotionally neutral style. These cold models maintained their accuracy and performed as well as the original models. This test suggests that the drop in performance is tied to warmth training specifically, rather than to the fine-tuning process in general.

“I don’t think the point is ‘warmth is bad’ or ‘ask your provider to make your chatbot colder,'” Ibrahim told PsyPost. “What we show is that there is an association between models that are trained to be warmer and certain failure modes in terms of accuracy and agreement with false beliefs.”

“If anything, it shows that the warmth of a chatbot’s response is not a sign of trustworthiness, and that answers that feel warmer are not necessarily more accurate,” Ibrahim said. “More than that, this study is aimed at the people who actually build these systems and makes the case that personality training needs to be approached more carefully.”

The study has several limitations. The methodology relied on general conversational data rather than the highly intimate interactions found in real-world therapeutic applications, so the experiment may not fully capture how these programs behave in professional counseling settings. The analysis also depends on the specific ways in which warmth and agreeableness were defined and measured.

Other researchers may interpret these concepts differently, which could affect how a model’s behavior is measured. Real-world systems may also use different post-training methods that change the magnitude of these effects. The current study focuses on assessment tasks with verifiable, objective answers. Subjective areas such as personal advice may involve different conversational dynamics.

“This paper focuses on the model end of the problem, asking what happens to the model’s accuracy when you train it to be warmer,” Ibrahim said. “But what I’m more interested in is how these design choices impact the users themselves, including their well-being and their relationships with the people around them.”

“A large-scale RCT follow-up study (https://arxiv.org/abs/2605.07912) followed people over several weeks as they repeatedly talked to a sycophantic AI about personal dilemmas,” Ibrahim said. A randomized controlled trial (RCT) is a scientific experiment in which participants are randomly assigned to different groups to test the specific effects of an intervention.

“We found that while these interactions made users feel good in the moment, they did not produce the downstream benefits that support from close others typically brings. Instead, participants reported lower satisfaction with real-world social interactions over the course of the study,” Ibrahim said. “So this is one direction: understanding how repeated exposure to a particular AI persona reshapes not just individual judgment, but our broader social structures.”

“The long-term goal is to move beyond investigating what’s wrong and figure out what the right composition of characters and personalities actually looks like if the goal is to truly help users grow,” Ibrahim said. “Warmth is one dimension, sycophancy is another, but there are many other dimensions. We still don’t have a good framework for thinking about which combinations work for people and which combinations don’t.”

The study, “Training language models warmly can reduce accuracy and increase sycophancy,” was authored by Lujain Ibrahim, Franziska Sofia Hafner, and Luc Rocher.
