The American Medical Association (AMA) on Thursday urged federal lawmakers to strengthen protections amid the increasing use of artificial intelligence chatbots for mental health support.
The organization sent a letter to the co-chairs of the Congressional Caucus on Artificial Intelligence, the Congressional Caucus on Digital Health, and the Senate Caucus on Artificial Intelligence. The group praised lawmakers' efforts to “promote the debate about the role of AI in society and mental health,” but said the rise of mental health chatbots, including reports of chatbots encouraging self-harm and violating user privacy, “highlights the urgent need for clear guardrails.”
Safety measures recommended by the AMA include:
- Strengthen transparency standards and penalize fraudulent practices such as systems impersonating licensed clinicians
- Create a modern risk-based oversight framework and clarify when an AI tool qualifies as a medical device
- Require continuous safety monitoring and adverse event reporting
- Require strict data protection standards
The AMA uses the term “augmented intelligence” when referring to AI to emphasize the technology’s complementary role in healthcare.
“AI-enabled tools have the potential to expand access to mental health resources and support innovation in health care delivery, but they lack consistent safeguards against serious risks such as emotional dependency, misinformation, and inadequate crisis response,” AMA CEO John Whyte, MD, said in a statement. “With thoughtful oversight and accountability, policymakers can support innovation so that technology prioritizes patient safety, strengthens public trust, and responsibly complements, rather than replaces, clinical care.”
A March Rock Health survey found that 32% of respondents use AI chatbots to find health information, and 28% of AI users report using chatbots to manage mental health and stress. Despite this growing reliance, researchers at Mass General Brigham found that publicly available generative AI models often fail to navigate diagnostic scenarios adequately.
All 21 large language models (LLMs) analyzed in the study reached an accurate final diagnosis more than 90% of the time, but every model failed to generate an appropriate differential diagnosis more than 80% of the time. The researchers stress that AI models should “augment, not replace, physician reasoning.”

