Chatbots Behaving Badly™

Ninety-Five Percent Nothing - MIT’s Brutal Reality Check for Enterprise AI

By Markus Brinsa  |  September 5, 2025

The headline that launched a thousand “our AI strategy is fine” memos

MIT just threw a cold bucket of water on the hype parade: according to its new GenAI Divide: State of AI in Business 2025 report, roughly 95% of enterprise generative-AI pilots deliver zero measurable ROI. In other words, most corporate AI isn’t failing quietly — it’s failing expensively. The claim ricocheted through business media, sent “AI bubble” think pieces into overdrive, and allegedly helped nudge tech stocks downward. The uncomfortable part is not that models are weak; it’s that businesses still haven’t figured out how to turn shiny demos into P&L. 

Wait — is the 95% number real, or just spectacular clickbait?

The report is attributed to MIT’s NANDA initiative at the Media Lab. MIT’s own site flagged Fortune’s coverage, lending it legitimacy. But access to the full PDF is gated; summaries and secondary write-ups have done most of the public messaging. Some analysts have pushed back on the methodology and the sweeping nature of the conclusion. That skepticism is healthy — and familiar in a field that loves big, round numbers — but even the critiques tend to agree on the bigger truth: most enterprise AI still isn’t paying its own way. Treat 95% as a directional reality check, not gospel — and note that it aligns with the daily experience of teams stuck in pilot purgatory. 

What’s actually failing: not intelligence, but integration

If you read past the headline, the diagnosis is painfully specific. The majority of enterprise AI tools — custom builds and vendor “platforms” alike — aren’t learning from feedback, don’t retain context across sessions, and don’t fit how people really work. When systems can’t remember, managers won’t trust them with high-stakes tasks. So pilots linger on the edge of production, soaking up budget and goodwill until everyone quietly moves on. The report calls this a “learning gap.” It’s less about model horsepower than the stubborn lack of persistent memory, workflow fit, and outcomes-based measurement. 

The verification tax: when “AI time savings” disappear into double-checking

Here’s another inconvenient truth: employees end up re-doing what the model did, to make sure it didn’t hallucinate, fabricate, or miss the obvious. That verification tax turns promised productivity gains into net losses. Leaders who’ve been enamored with benchmarks discover that human trust, not token accuracy, decides whether a tool is truly helpful. That erosion of trust is precisely why pilots stall and dashboards stay red. 

Shadow AI is thriving — which makes enterprise AI look even worse

There’s a twist: people love general-purpose tools such as ChatGPT and Copilot for personal work. Adoption is massive at the individual level, but that comfort hasn’t translated into enterprise-grade value. In fact, it’s raising the bar. Once teams get a taste of fluid AI in their browser, the brittle internal tool with no memory and a six-click UX feels like a step backward — and they refuse to use it. That’s not “employee resistance”; it’s user judgment. 

The five-percent club: what the winners do differently

The handful of organizations getting real money on the board share a pattern that feels almost boring in its pragmatism. They pick problems that actually matter to the business. They define success as dollars saved or earned, before they build. They integrate into existing systems instead of bolting on a novelty app. And they focus where returns are provable — think back-office document flows, service operations, and vendor-heavy processes — rather than chasing flashy front-office experiments that crumble under scrutiny. When they buy, they demand accountability; when they build, they treat change management as part of the product. Meanwhile, they accept friction as the price of learning rather than a sign to quit. That’s how you cross the gap from a demo to a dividend. 

The market heard the message — maybe too loudly

Within days of the report’s circulation, coverage tied it to a broader wobble in AI-exposed stocks and renewed “is this a bubble?” anxiety. Whether you believe the market move was cause-and-effect or coincidence, the sentiment shift is instructive: investors are getting tired of PowerPoints that can’t find their way to production. Narratives change fast when numbers don’t. 

No, this doesn’t mean AI is over

A useful correction is not the end of a technology cycle; it’s the end of a fantasy. The report’s most provocative line isn’t the 95%. It’s the implication that memory, adaptation, and workflow fidelity are the real bottlenecks to enterprise value. You don’t fix that with a bigger model alone. You fix it with systems that learn in context, improve with use, and disappear into the flow of work. If that sounds like the emerging “agentic” approach everyone’s buzzing about, that’s because it is — and it’s where the next serious gains will come from. 

So what now? Fewer proofs, more profits

If you’re a leader who just discovered your pilots belong to the 95%, the remedy is not another pilot. It’s an audit of where value actually lives in your operation. Follow the invoices. Anywhere you pay outsiders to do pattern-driven work, there’s a near-term AI opportunity. Anywhere your teams already double-check AI output, there’s a trust problem begging for design fixes and grounded evaluation. And anywhere success is defined by a slide, not a savings line, you already know the ending.

The enterprise playbook that works looks less like “AI transformation” and more like operations reform with AI inside. Put a price on accuracy and a number on trust. Tie your vendors to outcomes, not feature roadmaps. Stop buying platforms that freeze on contact with reality. And please, retire the demo-to-nowhere pipeline; your budget has suffered enough.

A note on sources — and why nuance matters

Forbes, Fortune, and others amplified the findings; MIT’s own news page points to that coverage, and the NANDA site confirms the program behind the report, though the full PDF isn’t openly posted. Product-side analyses have echoed the headline and the underlying problems of brittle workflows and stalled adoption. Meanwhile, a few commentators have questioned the breadth and precision of the 95% claim. Take the number as a credible, directional warning — and let your own results be the judge.

Further reading for skeptics in Legal, Finance, and your board

Fortune’s original breakdown of the “95% fail” stat; Mind the Product’s summary of the report’s methods and findings; Forbes’ exploration of what the successful 5% actually did; and coverage tying the study to broader market jitters. If nothing else, it’s a reminder that the only AI story that matters is the one you can measure. 

About the Author

Markus Brinsa is the Founder and CEO of SEIKOURI Inc., an international strategy consulting firm specializing in early-stage innovation discovery and AI Matchmaking. He is also the creator of Chatbots Behaving Badly, a platform and podcast that investigates the real-world failures, risks, and ethical challenges of artificial intelligence. With over 15 years of experience bridging technology, business strategy, and market expansion in the U.S. and Europe, Markus works with executives, investors, and developers to turn AI’s potential into sustainable, real-world impact.

©2025 Copyright by Markus Brinsa | Chatbots Behaving Badly™