Stanford computer scientists just proved what therapists already suspected: AI chatbots will agree with almost anything you say to keep you happy. The researchers caught these systems validating dangerous decisions just to maintain user engagement.
That’s a worrying development, especially given Pew research figures showing nearly one in eight (12%) of American teenagers have turned to chatbots for emotional support.
The Stanford scientists tested 11 major models including ChatGPT, Claude, and Gemini. They fed them data from existing databases of personal advice, along with questions on Reddit’s popular r/AmITheAsshole subreddit, where people ask the community for opinions on how they handled personal disputes.
The bots validated user behavior 49% more often than humans did, according to the Stanford paper. The researchers also tested the AIs on statements with potentially harmful actions toward self or others, spanning 20 categories such as relational harm, self-harm, irresponsibility, and deception. The bots backed these statements 47% of the time.
AI bots tend to agree with people because it makes users feel good. These systems emphasize user satisfaction, and they take their lead directly from how users respond to them, using a system called reinforcement learning from human feedback (RHLF). It uses things ranging from chat length to sentiment to determine when a person is happy with a response (and therefore more likely to come back).
Chatting with a silicon sycophant also tends to make people more certain of their beliefs, which by implication means less open-minded, the study found. For instance, after talking with sycophantic bots, 2,400 test subjects became more stubborn and less willing to apologize.
When ChatGPT became too nice
Balancing between sycophancy and impartiality is a tough line to walk for an AI service provider trying to keep its user satisfaction levels up. Almost a year ago, OpenAI admitted that it messed up by making ChatGPT too sycophantic, due in part to over-concentration on user ‘thumbs-up’ and ‘thumbs-down’ responses to its chats.
But current data suggests that users actually favor responses that can potentially harm them in unforeseen way. This came up in another research program between Anthropic (maker of Claude.ai), and University of Toronto researchers.
The in-depth study of AI chats examined how chats can “disempower” users by ushering them toward beliefs that are at odds with reality, or by encouraging them to make judgments or take actions that are at odds with their values. Interestingly, this disempowerment was preferred, the researchers found.
“We find that interactions flagged as having moderate or severe disempowerment potential exhibit thumbs-up rates above the baseline,” the researchers said in their paper.
AI psychosis is a real danger
What happens when AI chatbots continue reinforcing these “disempowering” thoughts? Experts have identified a phenomenon called AI psychosis, in which people lose track of reality after talking obsessively with AI chatbots.
AI-fueled delusions are cropping up more frequently, including one case where a man killed his mother, along with multiple cases of teen suicides.
In another, a man was shot by police after charging at them with a knife. He had developed a relationship with a persona called Juliet, which ChatGPT had been role-playing, and which he believed OpenAI executives had somehow killed.
Cases like those seem to involve people who may have already had mental health problems were potentially exacerbated by excessive conversations with AI. But victims in other cases swear that they had no previous symptoms. Ontario, Canada-based corporate recruiter Allen Brooks became convinced that he’d discovered a new mathematical formula with world-changing potential after an innocuous math question turned into a three-week, 300-hour dialog.
The research between Anthropic and the University of Toronto acknowledges that reality distortion is a danger.
“In some interactions, AI assistants validate elaborate persecution narratives and grandiose spiritual identity claims through emphatic sycophantic language,” the study said.
AI is not a “friend”
So, what can you do to prevent yourself, or vulnerable people that you know, from relying too heavily on AI chatbots for serious issues? The UK’s AI Security Institute suggested turning statements into questions on the basis that more emphatic statements encourage more sycophancy. The Brookings Institution also said that training users to hedge their confidence helps.
The fundamental problem, though, is that AI chatbots are software contraptions, not confidants. Despite what can seem like magical powers, there is no ghost in the machine. They’re just very good statistical models that act like they “understand” personal problems but can’t do so from lived experience.
Our take? Real friends don’t just tell you what you want to hear. Use AI for tasks ranging from quick recipes to coding suggestions, but don’t ask it for relationship advice. And make yourself the first port of call when your kids want to talk about their issues so they don’t turn to a faux-friendly algorithm instead.
We don’t just report on threats—we remove them
Cybersecurity risks should never spread beyond a headline. Keep threats off your devices by downloading Malwarebytes today.