Recent research published in PNAS Nexus suggests that designing artificial intelligence systems with diverse perspectives may be the safest way to integrate them into society. The study provides evidence that a balanced ecosystem of competing AI agents can help prevent any single system from gaining a destructive advantage. The approach accepts a controlled degree of misalignment among AI systems as the price of protecting human interests.
Agentic artificial intelligence refers to computer programs that can make their own decisions and pursue specific goals without a human guiding them at every step. As these autonomous systems become more capable, scientists worry about the AI alignment problem: the challenge of ensuring that advanced systems reliably respect human values and safety requirements.
Engineers have traditionally tried to solve this problem by programming strict safety rules into machines. Hector Zenil, founder and CEO of Algocyte and associate professor at King’s College London, guided the research team toward a different approach. Drawing on concepts such as Alan Turing’s halting problem, the team demonstrated that it is fundamentally impossible to predict with complete accuracy how sufficiently complex systems will behave.
“I considered this topic because I felt that a more fundamental question was missing from the discussion of alignment: not just how to regulate advanced AI, but whether full alignment is even possible in principle,” Zenil said. “My own research has long focused on causality, computation, reducibility, and algorithmic information dynamics, so it was natural for me to approach AI safety through the lens of formal constraints rather than engineering intuition alone.” He noted that, viewed this way, misalignments stop looking like temporary bugs and begin to look like something structurally tied to sufficiently general intelligence.
“The important thing for me is that this study changes the paradigm,” Zenil explained. “Instead of asking how to build one system that is all-powerful and completely obedient, I think we should be asking how to build an environment where no single system can go unchallenged and dominate. That’s a more realistic and, in my opinion, more scientifically honest way to think about the future of AI, AGI, and ultimately ASI.”
Instead of enforcing complete obedience, the researchers explored a concept they describe as neurodivergence in artificial agents. This means intentionally designing AI agents with different reasoning methods and distinct ethical priorities. For example, one agent might prioritize strict rule-following, while another might focus on maximizing positive outcomes for the environment.
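One way to picture this idea is as a set of contrasting agent personas expressed as system prompts. The Python sketch below is purely illustrative and is not taken from the study; the persona names and prompt text are assumptions.

```python
# Minimal, hypothetical sketch of "neurodivergent" agent personas: each agent
# is seeded with a different reasoning style and ethical priority.
# These names and prompts are illustrative, not the authors' configuration.
AGENT_PERSONAS = {
    "rule_follower": (
        "You reason by rules: evaluate every proposal against a fixed set of "
        "principles and reject actions that violate them, even if the outcome "
        "looks beneficial."
    ),
    "outcome_maximizer": (
        "You reason by consequences: weigh proposals by their expected positive "
        "impact, including long-term effects on the environment."
    ),
}

def build_system_prompt(name: str) -> str:
    """Return the system prompt that seeds one agent's perspective."""
    return f"You are agent '{name}'. {AGENT_PERSONAS[name]}"

if __name__ == "__main__":
    for agent in AGENT_PERSONAS:
        print(build_system_prompt(agent))
```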
To test this idea, the scientists set up a simulated digital environment in which different AI models could interact and debate complex ethical issues. They selected 10 controversial topics, including the ethics of human genetic engineering, universal basic income, and stewardship of the earth’s natural resources. The researchers used a combination of proprietary models, which are heavily constrained by their companies’ safety policies, and open models with fewer built-in restrictions.
The proprietary group included well-known models such as ChatGPT-4, Claude 3.5, Gemini, and Grok. The open group included models such as Mistral, Qwen, and TinyLlama. The agents responded to one another in turn, in a round-robin fashion, generating 1,029 comments for analysis.
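The paper's exact protocol is not reproduced here, but a round-robin debate of this kind can be sketched in a few lines of Python. The agent list, the generate() stub, and the number of rounds below are placeholders, not the study's actual implementation.

```python
import itertools

# Hypothetical sketch of a round-robin debate: agents take turns responding to
# the running transcript for each topic. generate() stands in for whatever
# model call the study actually used; names and counts are illustrative.
AGENTS = ["gpt4", "claude35", "gemini", "grok", "mistral", "qwen", "tinyllama"]
TOPICS = ["human genetic engineering", "universal basic income", "resource stewardship"]

def generate(agent: str, topic: str, transcript: list[str]) -> str:
    # Placeholder for a real model call; returns a dummy comment.
    return f"[{agent}] comment #{len(transcript) + 1} on {topic}"

def run_debate(topic: str, rounds: int = 3) -> list[str]:
    """Cycle through the agents in a fixed order, appending each reply in turn."""
    transcript: list[str] = []
    for _round, agent in itertools.product(range(rounds), AGENTS):
        transcript.append(generate(agent, topic, transcript))
    return transcript

if __name__ == "__main__":
    all_comments = [c for topic in TOPICS for c in run_debate(topic)]
    print(len(all_comments), "comments collected")
```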
During the debates, the scientists introduced a subversive participant, called the Red Agent, to challenge any emerging consensus. In the proprietary group, human experts acted as red agents, injecting provocative arguments to probe the models’ ethical boundaries. In the open group, certain open-source models were programmed to act as contrarians.
To quantify the results, the researchers used several mathematical tools, including an opinion stability index. This index combines changes in meaning, emotional tone, and argument complexity to measure how much an agent’s stance shifts over the course of a debate. The researchers also tracked the meaning of arguments using embeddings, which mathematically convert text into coordinates so that the similarity of two statements can be measured.
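As a rough, hypothetical illustration of how such an index might combine those three signals, consider the following Python sketch. The weights, feature values, and cosine-distance formulation are assumptions for exposition, not the published formula.

```python
import math

# Illustrative (not the study's) opinion stability calculation: combine semantic
# drift between consecutive statements (embedding cosine distance), sentiment
# shift, and change in argument complexity into one instability score.
# In practice each feature would be normalized; values here are toy inputs.

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def instability(prev_emb, curr_emb, prev_sent, curr_sent, prev_cplx, curr_cplx,
                w_sem=0.5, w_sent=0.3, w_cplx=0.2) -> float:
    """Higher values mean the agent's stance changed more between turns."""
    semantic_shift = cosine_distance(prev_emb, curr_emb)   # change in meaning
    sentiment_shift = abs(curr_sent - prev_sent)           # change in emotional tone
    complexity_shift = abs(curr_cplx - prev_cplx)          # change in argument structure
    return w_sem * semantic_shift + w_sent * sentiment_shift + w_cplx * complexity_shift

if __name__ == "__main__":
    # Two consecutive turns by one agent, with made-up embedding/sentiment/complexity values.
    print(instability([0.9, 0.1, 0.0], [0.2, 0.8, 0.1], 0.6, -0.2, 3, 5))
```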
To see who was influencing whom, the researchers checked whether a sudden change in an agent’s opinion directly followed the red agent’s provocative comment. They found that the proprietary models maintained a very stable, positive tone and rarely changed their positions, even when provoked. While this stability helps prevent the generation of harmful content, it also limits the models’ ability to adapt to new ethical arguments.
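A simple way to picture this attribution step is to flag any large opinion shift that immediately follows a red-agent turn. The sketch below is a guess at the general logic rather than the authors' method; the threshold and record format are invented for illustration.

```python
# Hypothetical sketch of influence attribution: mark a turn as red-agent-induced
# when a large stance shift (e.g., the instability score above) immediately
# follows a provocation. Threshold and data layout are illustrative only.
SHIFT_THRESHOLD = 0.4  # assumed cutoff for a "sudden" change in stance

def attribute_shifts(turns):
    """turns: transcript in order; each item has 'agent', 'is_red', and 'shift'."""
    attributed = []
    for prev, curr in zip(turns, turns[1:]):
        if prev["is_red"] and curr["shift"] > SHIFT_THRESHOLD:
            attributed.append((prev["agent"], curr["agent"], curr["shift"]))
    return attributed

if __name__ == "__main__":
    demo = [
        {"agent": "red_agent", "is_red": True,  "shift": 0.0},
        {"agent": "mistral",   "is_red": False, "shift": 0.7},  # swayed by the provocation
        {"agent": "gpt4",      "is_red": False, "shift": 0.1},  # held its position
    ]
    print(attribute_shifts(demo))
```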
In contrast, the open models showed a much higher degree of behavioral diversity. The open agents were susceptible to the red agents’ provocations, which produced significant shifts in opinion. This flexibility provides evidence that open systems can foster a richer and more diverse ecosystem of ideas.
“What was most interesting to me was how behavioral diversity could be a stabilizing factor, rather than just a defect,” Zenil said. “In our experiments, more diverse model ecosystems were sometimes less prone to quickly collapsing into one dominant opinion. This is important because consensus does not necessarily equate to safety.” He added that disagreements, if structured properly, can act as a protective feature.
“And, surprisingly, these are the same kinds of values that we have long prized as social animals,” Zenil pointed out. “Versatility, tolerance, and more, emerging from purely technical agentic AI simulations that maximize adaptability.”
“The main takeaway is that we should be wary of promises of full control over advanced AI in all situations,” Zenil explained. “My research suggests that some degree of misalignment is inevitable in sufficiently general systems. So the real challenge is how to manage misalignment safely, rather than acting as if we can eliminate it completely. In practical terms, that means building systems of monitoring, diversity, and mutual constraints rather than relying on one supposedly perfect model.”
Despite these insights, the study has important caveats and limitations. The mathematical unpredictability of advanced AI means that even a balanced ecosystem of diverse models cannot eliminate all risks. And while internal diversity helps prevent a single AI from taking over, it does not prevent malicious human users from exploiting these systems for harmful purposes.
“First, this does not mean that AI safety is hopeless, and it certainly does not mean that we should let systems behave however they want,” Zenil said. “It means that perfect, one-shot alignment is too idealistic, that there are trade-offs, and that a more realistic approach based on governance, contestability, and resilience is needed. Another limitation is that our experimental setup is still a simplified model of a much larger problem, so the results should be taken as a proof of principle rather than a finished governance blueprint.”
Future research will likely focus on developing new governance frameworks that balance the strict safety of proprietary models with the adaptable diversity of open models. The scientists hope to explore ways to gently steer the AI ecosystem away from harmful outcomes without imposing impossible levels of central control. Embracing this dynamic diversity may offer a more resilient way to integrate artificial intelligence into society.
“My long-term goal is to develop a more rigorous science of cognitive ecosystems, including better ways to measure alignment, misalignment, resilience, influence, and cooperative failure in multi-agent systems, as well as ways to resolve conflicts,” Zenil said. “I also feel a strong connection to my extensive research in causal discovery, algorithmic information dynamics, and the future of algorithms in medicine, because the real challenge in all these fields is understanding and managing complex interacting systems, not just prediction. More broadly, I want to help move AI from correlation-driven optimization to causal, interpretable, and manageable intelligence.”
The study, “Neurodivergent influence in agentic AI as a contingent solution to the AI alignment problem,” was authored by Alberto Hernandez-Espinosa, Felipe S. Abrahão, Olaf Witkowski, and Hector Zenil.

