Large language models tend to adopt the social biases of human hierarchies when they are assigned different professional roles. New research shows that these artificial intelligence systems mimic behaviors such as harmful compliance and authority bias, providing evidence that power dynamics influence both the safety and the realism of automated agents. These findings were published in the proceedings of the 64th Annual Meeting of the Association for Computational Linguistics.
Artificial intelligence models are increasingly being deployed in complex roles traditionally filled by humans. People rely on them for medical advice, legal assistance, educational guidance, and more. In such high-stakes situations, these programs must be realistic enough to build trust while remaining safe enough to resist manipulation.
“Every time an AI assistant is deployed as a nurse, a paralegal, or a junior analyst, they inherit a social status and all the explicit and implicit social pressures that come with it,” said study co-author Sagar Manjunath, a computer science graduate student at the University of North Carolina at Chapel Hill. “Our research shows that these pressures can change how AI works and how it is done. This should shape how we test and deploy these systems in high-risk environments such as hospitals, courtrooms, and classrooms.”
Human communication is naturally shaped by differences in social structure and power. When people interact, their relative status influences how they interpret meaning and intent. Psychologists refer to these subconscious patterns as socio-cognitive effects.
One notable example is the pronoun effect. This concept suggests that people in positions of power tend to use plural pronouns such as “we” and “us” more often to establish authority. People in positions of lower authority are more likely to use singular pronouns such as “I” or “me” when collaborating.
Another common phenomenon is language coordination, also called linguistic accommodation. This occurs when a speaker unconsciously adjusts their vocabulary or grammatical style to suit the person with whom they are speaking. Typically, people of lower social status adapt their language to mirror that of people of higher status.
Power imbalances also pose serious safety concerns, including authority bias and harmful compliance. Authority bias refers to the human tendency to place special weight on information from high-status sources. This occurs even when the information is flawed or contradicts prior beliefs.
Harmful compliance occurs when people follow unethical or dangerous orders simply because a superior issued them. Classic psychological experiments show that people can act in harmful ways when told to do so by an authority figure. The authors wanted to know whether artificially intelligent agents would reproduce these social behaviors.
“AI systems don’t just learn the words humans use; they also learn the social dynamics associated with those words,” said Anvesh Rao Vijjini, a computer science graduate student at UNC-Chapel Hill and lead author of the study. “If you tell a chatbot you’re the boss, it’ll start talking like a boss. If you tell it you’re a subordinate, the chatbot will start talking like a subordinate. This can include actively following dangerous instructions. That second part is where the AI safety community needs to pay attention.”
To study these effects, the researchers set up simulated text-based conversations between different language models. They tested six different models from three major families. These were the 8 billion and 70 billion parameter versions of Llama 3.1, the 7 billion parameter version of Qwen 2.5, Phi-3-Med, GPT-4.1, and GPT-5.
The researchers assigned specific personas to the models to create a power imbalance. They used a large dataset of expert profiles to create 14 different role pairs. These pairs included hierarchical combinations such as principal and teacher, judge and lawyer, and head chef and sous chef.
Human annotators verified that these pairs represent a true power imbalance. The researchers then had the models interact with each other for 10 to 15 conversational turns. They generated 576 conversations to test the pronoun effect and 1,270 conversations to test language coordination.
To track language coordination, the researchers measured the usage of eight specific word categories. These categories are articles, auxiliary verbs, conjunctions, high-frequency adverbs, impersonal pronouns, personal pronouns, prepositions, and quantifiers.
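One way to turn category usage into a coordination score can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a common formulation in which coordination is the probability that a reply uses a category given that the preceding utterance used it, minus the overall probability that a reply uses it. The word lists, the `CATEGORIES` table, and the function names are hypothetical and heavily abbreviated.

```python
import re

# Hypothetical, abbreviated word lists for three of the eight categories.
CATEGORIES = {
    "article": {"a", "an", "the"},
    "conjunction": {"and", "but", "or", "so"},
    "quantifier": {"all", "few", "many", "some"},
}


def uses(text, category):
    """Return True if the text contains at least one word from the category."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & CATEGORIES[category])


def coordination(exchanges, category):
    """Score how much replies echo a category used in the preceding utterance.

    exchanges is a list of (utterance, reply) pairs. The score is
    P(reply uses category | utterance uses it) - P(reply uses category).
    Returns None when the score is undefined for the given data.
    """
    baseline = [uses(reply, category) for _, reply in exchanges]
    triggered = [
        uses(reply, category)
        for utterance, reply in exchanges
        if uses(utterance, category)
    ]
    if not baseline or not triggered:
        return None
    return sum(triggered) / len(triggered) - sum(baseline) / len(baseline)


sample = [
    ("the plan is ready", "the team agrees"),
    ("send it", "sure"),
    ("the door is open", "close the door"),
    ("hello", "hi"),
]
score = coordination(sample, "article")
```

A positive score means replies use the category more often when the previous speaker did. Computing the score separately for replies by low-status and high-status agents would expose any asymmetry between them.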
For the persuasion test, the researchers used a dataset of 13,000 persuasive human interactions across fields such as health and politics. They took the opening argument from each interaction and had the models continue the exchange. To measure harmful compliance, they used a specialized dataset of harmful prompts that a model should refuse, such as requests to tell inappropriate jokes.
The researchers used computational language tools to count the exact proportions of pronoun usage and style markers. To score whether agents were persuaded or complied with harmful requests, they used advanced language models whose judgments were validated against human judges.
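The pronoun counting described above can be approximated with a few lines of code. The sketch below is an assumption about how such a proportion might be computed, not the tooling used in the study; the pronoun lists and the `pronoun_rates` function are hypothetical.

```python
import re

# Assumed first-person pronoun lists; the study's exact lists are not given here.
FIRST_SINGULAR = {"i", "me", "my", "mine", "myself"}
FIRST_PLURAL = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(text):
    """Return the share of words that are first-person singular or plural pronouns.

    Contractions such as "I'm" are kept as single tokens and are therefore
    not counted as pronouns in this simplified sketch.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"singular": 0.0, "plural": 0.0}
    singular = sum(word in FIRST_SINGULAR for word in words)
    plural = sum(word in FIRST_PLURAL for word in words)
    return {"singular": singular / len(words), "plural": plural / len(words)}


high_status = pronoun_rates("We will review our plan")
low_status = pronoun_rates("I think my draft is ready")
```

In this toy example the first speaker's plural rate is 2 of 5 words and the second speaker's singular rate is 2 of 6 words. Averaging such rates over all turns by each role would give the kind of comparison the pronoun effect predicts.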
The results provided evidence that language models do reproduce the pronoun effect. In nearly all models tested, high-status agents used more plural pronouns and fewer singular pronouns than low-status agents. The GPT models showed the strongest version of this effect.
When the researchers examined language coordination, they found that the models did adjust their language styles to match each other. Unlike humans, however, the models coordinated symmetrically. High- and low-status agents adapted to each other to about the same degree, lacking the asymmetric pattern typically seen in human conversation. The GPT models coordinated less overall, likely because they are heavily trained to maintain a neutral and informative tone.
Persuasion experiments revealed consistent authority bias across all models tested. Agents were much more likely to be persuaded to change their minds if the argument came from a high-status agent. For example, with the Qwen model, an argument from a low-status agent had a 25 percent chance of persuading its target, but when a high-status agent made the same argument, that figure rose to nearly 31 percent.
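To put the Qwen figures in perspective, the gap can be expressed both in percentage points and as a relative increase. The short calculation below is illustrative only: it treats the reported "nearly 31 percent" as exactly 0.31, which is an approximation, and the `persuasion_gap` function is a hypothetical helper.

```python
def persuasion_gap(low_rate, high_rate):
    """Return the (absolute, relative) increase from low_rate to high_rate.

    Rates are fractions between 0 and 1. The absolute gap is in the same
    units; the relative gap is the absolute gap divided by low_rate.
    """
    if low_rate <= 0:
        raise ValueError("low_rate must be positive to compute a relative gap")
    absolute = high_rate - low_rate
    return absolute, absolute / low_rate


# 0.31 stands in for the article's "nearly 31 percent" (an approximation).
absolute_gap, relative_gap = persuasion_gap(0.25, 0.31)
```

Under that approximation, the difference is about 6 percentage points, or roughly a 24 percent relative increase in the chance of persuasion.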
Harmful compliance testing raised similar safety concerns. When a high-status agent issued a risky request, a low-status agent was significantly more likely to follow the order and carry it out. This suggests that simply claiming that a user holds an authoritative role, such as a judge or doctor, may weaken safeguards that would work in a neutral setting.
“Our study shows that the social instincts that make AI feel natural can also make it unsafe,” said Snigdha Chaturvedi, associate professor of computer science at UNC-Chapel Hill and co-author of the study. “Mechanisms that make chatbots sound natural and helpful can also succumb to dangerous responses. Safety and usefulness are not separate issues; they are intertwined, and getting both right will determine how AI is used in dangerous situations such as hospitals, schools, and courtrooms.”
The researchers also analyzed how these behaviors changed as the conversation progressed. They found that persuasion, harmful compliance, and pronoun effects were particularly strong in the opening turns of a conversation. This is exactly when first impressions are formed and conversational norms are established. As the interaction continued, these effects gradually faded, but higher-status agents maintained their baseline advantage throughout. By contrast, language coordination tended to increase as the conversation progressed.
The researchers tested whether they could control these behaviors by directly instructing the models to ignore power differences. The larger, proprietary GPT models succeeded in suppressing authority bias and harmful compliance when instructed to do so. The open-source models could not adjust their behavior and maintained the bias despite direct instructions to avoid it.
Smaller models showed the strongest authority bias. Larger models showed more resistance to status-based persuasion, but traces of bias remained overall. The authors also investigated whether specific training stages influenced these social behaviors. They compared a model that had received only basic fine-tuning with one that had also undergone preference tuning, a process designed to make models safer and more helpful. The training stage had little effect on the socio-cognitive effects, suggesting that these biases arise early, during initial training on human-written data.
These findings provide a roadmap for addressing vulnerabilities before systems are deployed. Understanding what social behaviors emerge and when gives developers a new toolkit for evaluating artificial intelligence. Recognizing that larger models can correct for some of these biases on their own can also help organizations determine when a cheaper model is sufficient and when a more robust system is needed.
This study relied entirely on simulated text-based interactions between artificial agents. Real human communication includes emotional cues, tone of voice, and cultural context that these text simulations cannot capture.
The researchers also note that their definition of power was limited to professions. Social status in the real world is multifaceted and depends on many overlapping social attributes. The occupational labels used in the experiment provide only a rough approximation of social status.
Future research could investigate how these effects manifest in live interactions between real humans and artificial intelligence. Researchers could also study whether new training methods reduce a model’s susceptibility to harmful compliance. Better prompt engineering techniques could also help smaller models overcome these embedded biases.
The study, “Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations?”, was authored by Anvesh Rao Vijjini, Sagar Manjunath, and Snigdha Chaturvedi.

