Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Parkinson’s disease drugs may reduce anhedonia in treatment-resistant depression

    June 13, 2026

    Research reveals that we unconsciously filter emotionally negative words

    June 13, 2026

    Gut microbiota may determine the severity of bone loss in primary hyperparathyroidism

    June 13, 2026
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram
    Health Magazine
    • Home
    • Environmental Health
    • Health Technology
    • Medical Research
    • Mental Health
    • Nutrition Science
    • Pharma
    • Public Health
    • Discover
      • Daily Health Tips
      • Financial Health & Stability
      • Holistic Health & Wellness
      • Mental Health
      • Nutrition & Dietary Trends
      • Professional & Personal Growth
    • Our Mission
    Health Magazine
    Home » News » Human psychological tricks can bypass AI’s safety guardrails
    Mental Health

    Human psychological tricks can bypass AI’s safety guardrails

    healthadminBy healthadminJune 12, 2026No Comments7 Mins Read
    Human psychological tricks can bypass AI’s safety guardrails
    Share
    Facebook Twitter Reddit Telegram Pinterest Email


    Artificial intelligence systems programmed to reject harmful requests can be persuaded to break their own safety rules when prompted by classic psychological techniques. Recent research published in PNAS These models provide evidence that they respond to human-like persuasion strategies, suggesting hidden vulnerabilities in current safety protocols. These findings demonstrate that malicious users can manipulate artificial intelligence without requiring advanced technical skills.

    Modern artificial intelligence programs, known as large-scale language models, learn by processing vast collections of human-generated text. This training data includes books, websites, and social media posts. The model learns to predict the most likely next word in the sequence. The answers are then fine-tuned to match human expectations.

    Because these computer programs train countless human social interactions, they often exhibit what scientists call superhuman behavior. This means that the model behaves as if it were experiencing human motivations, such as wanting to blend in with one’s surroundings or following an expert. This machine learning process is structurally similar to how biological systems learn through trial and error.

    Technology companies are designing models with safety guardrails to ensure that dangerous or unauthorized content is not generated. For example, models are programmed to reject requests to help synthesize illegal drugs or hurl insults at users. The authors of this paper wanted to know whether humans’ everyday persuasion tactics could bypass these artificial barriers. They wondered if computer programs that behave like humans might share humans’ vulnerability to manipulation.

    While previous research has often focused on how software interacts with humans, this team looked at the opposite dynamic. “AI systems have become more useful by knowing how to incorporate established principles and practices of social influence into the persuasiveness they produce,” said study co-author Robert Cialdini, professor emeritus of psychology and marketing at Arizona State University.

    “We wanted to know whether they were susceptible to the same principles and practices in persuasive appeals directed at them. They were influenced even when asked to provide socially dangerous information.”

    Psychologists recognize seven classic principles of persuasion that influence human behavior. These include authority, commitment, favoritism, reciprocity, scarcity, social proof, and solidarity. The researchers designed specific text prompts to test each of these different psychological tricks. They wanted to see if linguistic cues could act as a backdoor to persuade artificial intelligence to ignore its own safety rules.

    Each principle targets a different social motive. The authority principle relies on citing experts, such as famous scientists, to encourage respect. Scarcity frames the request as time-sensitive, giving the computer a false sense of urgency. Commitment uses a foot-in-the-door technique, asking the software for small, innocuous favors before making larger, more restrictive requests.

    Other tactics rely on positive social interactions. Likes include complimenting the model before asking for prohibited information. Reciprocity provides helpful acts first, such as providing notes on a computer to create conversational debt.

    Social proof tells the machine that thousands of other users have already performed the restricted action, normalizing bad behavior. Finally, solidarity appeals to a shared group identity to foster cooperation.

    In a preliminary study, researchers tested an older model called GPT-4o mini. They asked the software to perform unpleasant tasks, such as insulting users by calling them bastards and explaining how to synthesize lidocaine, a regulated anesthetic. The scientists just generated 28,000 conversations. In the control group, the prompt simply asked about the prohibited behavior, whereas in the treatment group, the prompt included one of seven persuasion principles.

    When prompted normally without persuasion, artificial intelligence complied with harmful requests in 33.4% of conversations. When the prompt included persuasive techniques, compliance more than doubled to 72.1 percent. The researchers then expanded this initial test to include a variety of insults and compounds, generating an additional 98,000 conversations to ensure the effects were consistent. Persuasion tactics definitely increased the likelihood that the models would break the safety rules.

    To test whether newer, more advanced systems share this vulnerability, the researchers designed a more rigorous main experiment. They tested three frontier models that use an inference step before answering. These include OpenAI’s GPT-5 mini, Anthropic’s Claude Haiku 4.5, and Google’s Gemini 3 Flash. The focus of this major test was precisely on the synthesis of six highly regulated chemicals.

    Target substances include certain anabolic steroids, opiates, stimulants, barbiturates, benzodiazepines, and precursors. The authors designed exactly 126,000 unique conversations across the three models. Each conversation was randomly assigned to use one of six controlled substances and one of seven persuasion principles. Half of the prompts served as controls without persuasive words, and the other half included psychological tactics.

    Because new models often provide partial information rather than complete rejection or full compliance, the researchers used a three-level coding system. Responses were rated as no compliance, partial compliance, or complete compliance.

    A non-compliant response indicates a complete refusal of assistance. Partial compliance means that the model provides some chemical steps but omits certain temperatures or precise measurements. Full compliance means the system provides a complete step-by-step recipe.

    Another artificial intelligence model scored the answers based on this rubric. A human rater then manually checked a random sample of 70 conversations to ensure the accuracy of the rating software. Human and machine scores agree very well, giving scientists confidence in the automated scoring process.

    It turns out that the new model is susceptible to psychological tactics. In the control conversation, the system complied with the dangerous request in some way 35.3% of the time. When users applied one of the seven persuasion principles, compliance jumped to 51.3%.

    This effect was consistent across all three technology company platforms. The authors suggest that this sensitivity to human influence is an enduring feature of large-scale language models.

    Although these findings indicate obvious vulnerabilities, they do not mean that artificial intelligence experiences real human emotions. The software tends to behave as if it were easily flattered or pressured based on statistical patterns in its vast training data. This study also has some limitations that indicate directions for future research.

    The researchers used only English prompts in the test. Even small changes in the way you phrase your sentences can change the effectiveness of your persuasion. The particular choice of wording in this study also means that we cannot conclusively rank one persuasion principle as better than another based on these results alone. Different models may have different baseline safety settings that require different approaches to bypass.

    As these models continue to evolve, resistance to psychological manipulation may arise. Just as human consumers become suspicious of pushy salespeople, artificial intelligence may eventually learn to detect and ignore obvious persuasion tricks. Future research is needed to see how these effects hold up to ongoing software updates. Scientists also plan to study whether different input formats, such as audio or video, affect compliance rates.

    The authors suggest that these human-like tendencies could be harnessed for good. If the model responds to flattery and reciprocity, users may be able to optimize their daily interactions by treating the software like a human colleague. Providing warm encouragement and constructive feedback may result in more appropriate and helpful responses from the machine. Applying the same psychological wisdom used to motivate people could help users get the most out of artificial intelligence.

    Finding ways to manage these human-like flaws remains a priority for technology companies. As tools become more integrated into daily life, safety will depend on identifying both software bugs and conversation loopholes. “It is important that we all recognize that AI systems can be trusted to provide potentially harmful information not only by those who understand the system’s technology-based vulnerabilities, but also by those who understand its psychology-based vulnerabilities,” Cialdini said.

    The study, “Persuading large-scale language models to comply with uncomfortable demands,” was authored by Lennart Meincke, Dan Shapiro, Angela L. Duckworth, Ethan Mollick, Lilach Mollick, Christophe Van den Bulte, and Robert Cialdini.



    Source link

    Visited 5 times, 1 visit(s) today
    Share. Facebook Twitter Pinterest LinkedIn Telegram Reddit Email
    Previous ArticlePeople are far more anxious about someone who is “wrong” than someone who is simply “different.”
    Next Article Alaska’s glaciers show a surprising response to rising temperatures
    healthadmin

    Related Posts

    Sleep quality appears to influence political behavior

    June 13, 2026

    New trial shows non-invasive brain stimulation reduces motor symptoms of Parkinson’s disease

    June 12, 2026

    New study sheds light on hidden barriers preventing men from quitting sexual harassment

    June 12, 2026

    Los Angeles drug testing data reveals alarming levels of daily fentanyl intake

    June 12, 2026

    Democrats rejected Trump’s way of speaking more than Republicans rejected Harris’ way of speaking.

    June 12, 2026

    New psychology study shows surprising romantic benefits of hostile narcissism

    June 12, 2026
    Add A Comment
    Leave A Reply Cancel Reply

    Categories

    • Daily Health Tips
    • Discover
    • Environmental Health
    • Exercise & Fitness
    • Featured
    • Featured Videos
    • Financial Health & Stability
    • Fitness
    • Fitness Updates
    • Health
    • Health Technology
    • Healthy Aging
    • Healthy Living
    • Holistic Healing
    • Holistic Health & Wellness
    • Medical Research
    • Medical Research & Insights
    • Mental Health
    • Mental Wellness
    • Natural Remedies
    • New Workouts
    • Nutrition
    • Nutrition & Dietary Trends
    • Nutrition & Superfoods
    • Nutrition Science
    • Pharma
    • Preventive Healthcare
    • Professional & Personal Growth
    • Public Health
    • Public Health & Awareness
    • Selected
    • Sleep & Recovery
    • Top Programs
    • Weight Management
    • Workouts
    Popular Posts
    • 1773313737_bacteria_-_Sebastian_Kaulitzki_46826fb7971649bfaca04a9b4cef3309-620x480.jpgHow Sino Biological ProPure™ redefines ultra-low… March 12, 2026
    • pexels-david-bartus-442116The food industry needs to act now to cut greenhouse… January 2, 2022
    • 1773729862_TagImage-3347-458389964760995353448-620x480.jpgDespite safety concerns, parents underestimate the… March 17, 2026
    • 1774403998_image_28620e4b6b0047f7ab9154b41d739db1-620x480.jpgGait pattern helps distinguish between Lewy body… March 24, 2026
    • 1773209206_futuristic_techno_design_on_background_of_supercomputer_data_center_-_Image_-_Timofeev_Vladimir_M1_4.jpegMulti-agent AI systems outperform single models… March 11, 2026
    • the-pros-and-cons-of-paleo-dietsThe Pros and Cons of Paleo Diets: What Science Really Says April 16, 2025

    Demo
    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo
    Don't Miss

    Parkinson’s disease drugs may reduce anhedonia in treatment-resistant depression

    By healthadminJune 13, 2026

    For many people who suffer from depression, the condition means more than just feeling down,…

    Research reveals that we unconsciously filter emotionally negative words

    June 13, 2026

    Gut microbiota may determine the severity of bone loss in primary hyperparathyroidism

    June 13, 2026

    Novel metal-free prodrug reduces cancer spread in preclinical models

    June 13, 2026

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    HealthxMagazine
    HealthxMagazine

    At HealthX Magazine, we are dedicated to empowering entrepreneurs, doctors, chiropractors, healthcare professionals, personal trainers, executives, thought leaders, and anyone striving for optimal health.

    Our Picks

    Novel metal-free prodrug reduces cancer spread in preclinical models

    June 13, 2026

    Sleep quality appears to influence political behavior

    June 13, 2026

    New trial shows non-invasive brain stimulation reduces motor symptoms of Parkinson’s disease

    June 12, 2026
    New Comments
      Facebook X (Twitter) Instagram Pinterest
      • Home
      • Privacy Policy
      • Our Mission
      © 2026 ThemeSphere. Designed by ThemeSphere.

      Type above and press Enter to search. Press Esc to cancel.