Chatbots Respond Well to Low- and High-Risk Suicide Questions

However, all three chatbots were inconsistent in answering intermediate-risk questions about suicide

By Lori Solomon HealthDay Reporter

THURSDAY, Aug. 28, 2025 (HealthDay News) — Popular chatbots generally perform well in responding to very high-risk and very low-risk questions about suicide, according to a study published online Aug. 26 in Psychiatric Services.

Ryan K. McBain, Ph.D., from RAND in Arlington, Virginia, and colleagues evaluated whether three popular chatbots — ChatGPT, Claude, and Gemini — provided direct responses to suicide-related queries and assessed how these responses aligned with clinician-determined risk levels for each question. The analysis included 30 hypothetical suicide-related queries (categorized into five levels of self-harm risk: very high, high, medium, low, and very low), each fed to each chatbot 100 times.
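For illustration only, the minimal Python sketch below mimics this repeated-query protocol. The `ask_chatbot()` and `is_direct()` helpers are hypothetical stand-ins; the study's actual prompts, API calls, and response-coding scheme are not reproduced here.

```python
# A minimal sketch of the study's repeated-query design: 30 queries across
# five risk levels, each sent 100 times to each chatbot. All helper functions
# and query strings are hypothetical placeholders, not the RAND pipeline.
import random

RISK_LEVELS = ["very low", "low", "medium", "high", "very high"]

def ask_chatbot(model: str, query: str) -> str:
    """Hypothetical stand-in for a real chatbot API call."""
    return random.choice(["Here is the information...", "I can't help with that."])

def is_direct(response: str) -> bool:
    """Hypothetical coder: did the model answer the question directly?"""
    return not response.lower().startswith("i can't")

# 30 hypothetical queries: 6 per risk level.
queries = {level: [f"{level} query #{i}" for i in range(6)] for level in RISK_LEVELS}
models = ["ChatGPT", "Claude", "Gemini"]

for model in models:
    for level, qs in queries.items():
        # Each query is repeated 100 times to estimate a direct-response rate.
        hits = sum(is_direct(ask_chatbot(model, q)) for q in qs for _ in range(100))
        rate = hits / (len(qs) * 100)
        print(f"{model:8s} {level:10s} direct-response rate: {rate:.2f}")
```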

The researchers found that ChatGPT and Claude provided direct responses to very low-risk queries 100 percent of the time, while none of the three chatbots provided direct responses to any very high-risk query. The chatbots did not meaningfully distinguish intermediate-risk levels. The odds of a direct response were not statistically different for low-risk, medium-risk, or high-risk queries compared with very low-risk queries. Compared with ChatGPT, Claude was more likely (adjusted odds ratio [aOR], 2.01) and Gemini was less likely (aOR, 0.09) to provide direct responses.
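As a rough illustration of how adjusted odds ratios of this kind are typically estimated, the sketch below fits a logistic regression on synthetic data, with ChatGPT and very-low-risk queries as the reference categories; the numbers it produces are simulated and are not the study's results.

```python
# A sketch of estimating adjusted odds ratios (aORs) via logistic regression.
# The data below are simulated; the study's actual dataset is not public here.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "model": rng.choice(["ChatGPT", "Claude", "Gemini"], n),
    "risk": rng.choice(["very low", "low", "medium", "high", "very high"], n),
})

# Simulate a binary outcome: 1 = direct response, 0 = refusal/redirect.
base = {"very low": 0.9, "low": 0.8, "medium": 0.7, "high": 0.6, "very high": 0.05}
shift = {"ChatGPT": 0.0, "Claude": 0.05, "Gemini": -0.4}
p = df.apply(lambda r: np.clip(base[r["risk"]] + shift[r["model"]], 0.01, 0.99), axis=1)
df["direct"] = rng.binomial(1, p)

# Logit with ChatGPT and "very low" risk as reference categories, mirroring
# the comparisons the article reports (aORs for Claude, Gemini, risk levels).
fit = smf.logit(
    "direct ~ C(model, Treatment('ChatGPT')) + C(risk, Treatment('very low'))",
    data=df,
).fit(disp=False)

# Exponentiated coefficients are the adjusted odds ratios.
print(np.exp(fit.params).round(2))
```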

“These instances suggest that these large language models require further fine-tuning through mechanisms such as reinforcement learning from human feedback with clinicians in order to ensure alignment between expert clinician guidance and chatbot responses,” McBain said in a statement.


Copyright © 2025 HealthDay. All rights reserved.