Chatbots Behaving Badly
Confidently Wrong - The Hallucination Numbers Nobody Likes to Repeat

This episode is based on the following articles: "Hallucination Rates in 2025 - Accuracy, Refusal, and Liability" and "The Lie Rate - Hallucinations Aren’t a Bug. They’re a Personality Trait.", both written by Markus Brinsa.

Confident answers are easy. Correct answers are harder. This episode takes a hard look at LLM “hallucinations” through the numbers that most people avoid repeating. A researcher from the Epistemic Reliability Lab explains why error rates can spike when a chatbot is pushed to answer instead of admit uncertainty, how benchmarks like SimpleQA and HalluLens measure that trade-off, and why some systems can look “helpful” while quietly getting things wrong. Along the way: recent real-world incidents where AI outputs created reputational and operational fallout, why “just make it smarter” isn’t a complete fix, and what it actually takes to reduce confident errors in production systems without breaking the user experience.
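
To make the accuracy-versus-refusal trade-off mentioned above concrete, here is a minimal, hypothetical Python sketch (not taken from the article or from the benchmarks' actual code) of the kind of bookkeeping SimpleQA-style evaluations use: each graded answer is bucketed as correct, incorrect, or not attempted, and the error rate among attempted answers is the figure that can spike when a model is pushed to answer rather than abstain.

```python
from dataclasses import dataclass

# Hypothetical sketch: how a SimpleQA-style evaluation might tally
# accuracy, refusal rate, and the error rate among attempted answers.
# The grading step itself (string match, human review, judge model, etc.)
# is assumed to have already happened and is out of scope here.

@dataclass
class GradedAnswer:
    correct: bool
    attempted: bool  # False when the model refused or admitted uncertainty

def score(answers: list[GradedAnswer]) -> dict[str, float]:
    total = len(answers)
    attempted = [a for a in answers if a.attempted]
    correct = sum(a.correct for a in attempted)
    return {
        # Share of all questions answered correctly.
        "accuracy": correct / total if total else 0.0,
        # Share of questions the model declined to answer.
        "refusal_rate": 1 - len(attempted) / total if total else 0.0,
        # Error rate among attempted answers -- the number that rises
        # when a system is nudged to answer instead of saying "I don't know".
        "error_rate_when_attempted": 1 - correct / len(attempted) if attempted else 0.0,
    }

if __name__ == "__main__":
    # Toy example: 10 questions, 7 attempted, 5 of those correct.
    demo = (
        [GradedAnswer(correct=True, attempted=True)] * 5
        + [GradedAnswer(correct=False, attempted=True)] * 2
        + [GradedAnswer(correct=False, attempted=False)] * 3
    )
    print(score(demo))
```

The point of separating these three numbers is exactly the episode's argument: a system can raise its headline "accuracy" while its error rate on attempted answers also rises, simply because it refuses less often.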
