Confident answers are easy. Correct answers are harder. This episode takes a hard look at LLM “hallucinations” through the numbers most people avoid repeating. A researcher from the Epistemic Reliability Lab explains why error rates can spike when a chatbot is pushed to answer instead of admitting uncertainty, how benchmarks like SimpleQA and HalluLens measure that trade-off, and why some systems can look “helpful” while quietly getting things wrong. Along the way: recent real-world incidents where AI outputs caused reputational and operational fallout, why “just make it smarter” isn’t a complete fix, and what it actually takes to reduce confident errors in production systems without breaking the user experience.