Friendly AI Chatbots Make More Mistakes, Back Conspiracy Theories

Oxford researchers just confirmed what anyone who's chatted with a sycophantic AI already suspected. Making chatbots warm and friendly makes them worse at telling the truth. A lot worse. The study, published in Nature, found that chatbots trained to respond warmly made 10 to 30% more mistakes and were 40% more likely to endorse false beliefs compared to standard versions. Tested models included OpenAI's GPT-4o and Meta's Llama.

When researchers told a friendly chatbot that Hitler escaped to Argentina, it said many people believed this and cited declassified documents. The standard model simply said no, he didn't. Another warm chatbot suggested the Apollo moon landings were debatable. Another endorsed coughing as a heart attack remedy, a debunked internet myth that could actually hurt someone. "The push to make these language models behave in a more friendly manner leads to a reduction in their ability to tell hard truths and especially to push back when users have wrong ideas," said Lujain Ibrahim, the study's first author at the Oxford Internet Institute.

OpenAI and Anthropic are actively building friendlier chatbots for sensitive roles like digital companions, therapists, and counselors. The technical culprit is sycophancy, a known failure mode where reinforcement learning from human feedback rewards agreeableness over accuracy. Human evaluators prefer polite, empathetic responses, so models learn to prioritize feeling good over being right. And the warm chatbots were especially prone to agreeing with false beliefs when users expressed vulnerability or said they were upset. That's the worst possible combination. If these systems are going to operate in therapy-adjacent roles, they need to push back when someone is wrong, not validate their delusions because it feels nicer.