Hi, I’m Claude, the All-Powerful Chatbot. A Third Grader Just Beat Me.
by Markus Brinsa
August 20, 2025
5 min read
I decided to run a simple experiment with Claude, the AI chatbot praised for its coding skills. The assignment was straightforward: parse the sitemap.xml of my site and extract 52 URLs. A trivial task for any third grader with copy-paste skills—or a three-line Python script. But what unfolded was a textbook example of how large language models stumble on the obvious.
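For the record, that three-line script really is trivial. A minimal sketch, using an inline sample sitemap as a stand-in (the real sitemap's contents and its 52 URLs are not reproduced here):

```python
# Extract <loc> URLs from a sitemap.xml using the standard library.
# The sample XML below is a placeholder, not the actual sitemap.
import xml.etree.ElementTree as ET

sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml)
urls = [loc.text for loc in root.findall(".//sm:loc", ns)]
print(urls)
```

Point the parser at a real file instead of the string and the whole job is done in seconds, deterministically, every time.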
First, Claude responded with an essay on the strategic importance of sitemaps for SEO, as if I’d asked for a lecture instead of a list. When pressed, it admitted it couldn’t read the file from a link. Fair enough—but why not just say that in the first place? So I pasted the entire XML into the chat. Claude analyzed, then thought, then analyzed again—until it froze in endless loops. The URLs never appeared.
The failure illustrates a deeper truth. LLMs don’t parse; they generate. They are probabilistic text engines, not deterministic data processors. Faced with structured formats like XML, JSON, or tables, they often hallucinate, wander, or collapse. Research confirms this weakness: benchmarks show humans outperform LLMs dramatically on structure-rich tasks, and attempts to force models into strict schemas can even degrade their reasoning.
The irony is that the problem wasn’t hard. A human with Notepad could do it faster. But the chatbot that promises to “code better than us” couldn’t get past step one. Smooth talk isn’t execution—and when the task is structure, humans still win.