There’s a certain tone medical-device companies use when they add AI to something that already exists. It’s the tone of a person who just installed a smart thermostat and now speaks fluent “revolution.” Everything becomes faster, safer, more precise, more modern. The future arrives in a press release, usually with a glossy video and at least one sentence that sounds like it was written by a blender full of buzzwords.
Then reality shows up wearing scrubs.
A recent Reuters investigation threads together something the AI industry hates to discuss in public: the gap between “AI-enhanced” and “clinically trustworthy,” especially when the tool isn’t recommending a movie—it’s guiding instruments inside a patient’s head.
The story Reuters tells isn’t “AI is evil” or “doctors are careless.” It’s messier—and more useful than a morality play.
A sinus-navigation system called TruDi (originally marketed by Acclarent, a Johnson & Johnson subsidiary later sold to Integra LifeSciences) was updated with machine-learning features. After that, regulators received a sharp increase in reports describing malfunctions and adverse events associated with the device’s use. Reuters reports allegations that, in some cases, the system misinformed surgeons about where their instruments were during procedures, with reported injuries including cerebrospinal fluid leaks, skull-base punctures, and strokes.
If you’re waiting for the clean “AI caused this” sentence, you don’t get it—because that’s not how these systems fail in the real world. Device-event reports are often incomplete. They’re not designed to prove causality. They’re a smoke alarm, not a forensic report. Sometimes they ring because there’s a fire. Sometimes they ring because someone burned toast. The problem is: in an operating room, toast does not exist.
Here’s the part that should make any sober adult sit up: the allegations that AI was pushed as a marketing tool, and that internal goals like “80% accuracy” were considered acceptable for certain new features before they were integrated into the product.
Think about what “80%” means in your normal life. If your music app guesses your mood wrong two out of ten times, you shrug. If your spellcheck makes you look mildly illiterate in two out of ten emails, you blame autocorrect and move on. If a system that helps orient surgical tools inside the head is wrong two out of ten times, the word “upgrade” becomes performance art.
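To make that arithmetic concrete, here’s a back-of-the-envelope sketch in Python. The 80% figure is borrowed purely as an illustration of the reported internal target, and the assumption that readings fail independently is mine; real navigation errors are correlated, procedure-specific, and not this tidy.

```python
# Back-of-the-envelope: what an 80% per-reading accuracy would imply
# if each reading failed independently (a simplifying assumption,
# not a claim about how any real navigation system behaves).

def p_at_least_one_error(accuracy: float, readings: int) -> float:
    """Chance that at least one of `readings` independent readings is wrong."""
    return 1.0 - accuracy ** readings

if __name__ == "__main__":
    accuracy = 0.80  # illustrative figure, taken from the reported internal target
    for readings in (1, 5, 10, 50):
        p = p_at_least_one_error(accuracy, readings)
        print(f"{readings:>3} readings -> {p:.0%} chance of at least one bad readout")
```

Under those toy assumptions, ten consults of the screen leave you with roughly a nine-in-ten chance that at least one of them was wrong. The point isn’t the exact number; it’s that “mostly right” degrades fast the more often people rely on it.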
To be fair, not every feature is making a binary life-or-death call. But that’s exactly the point: once you slap AI into a workflow, people stop being able to tell which parts are “assistive” and which parts are “authoritative.” A screen in an OR doesn’t feel like a suggestion. It feels like reality with a UI.
Modern AI products behave like software because they are software. They iterate. They patch. They add features. They get “improved.” And that cadence collides head-on with how medical devices have traditionally been regulated, documented, and monitored.
The FDA’s own materials are blunt about what the public list of AI-enabled devices is—and what it is not. It’s a transparency effort, not a comprehensive registry, and it relies on what manufacturers and summaries disclose. The FDA also signals that it’s still working on how to identify and tag devices that incorporate foundation models and LLM-based functionality in the future. Translation: we’re still figuring out how to label what you’re selling, even as you sell more of it.
Reuters adds another uncomfortable layer: the flood. The number of authorized AI-enabled devices has grown dramatically, and the agency’s ability to keep pace is strained. You don’t need a conspiracy. You just need math: more products, more complexity, more edge cases, more incentives to move fast.
If you want a single metric that cuts through vibes, look at recalls. Not because recalls prove malice or incompetence, but because they reveal the practical burden of reality: the moment the real world starts disagreeing with the brochure.
A research letter in JAMA Health Forum looked at AI-enabled medical device recalls and found a sizable share occurred within the first year after clearance—an early-life failure pattern that should make anyone nervous about “rushed to market” incentives.
This isn’t a gotcha. It’s the normal consequence of deploying complex systems without strong clinical validation, robust post-market surveillance, and strict change control. In consumer software, the penalty is a bug report and an annoyed user. In medicine, the penalty can be a patient.
The most seductive lie in “AI-assisted medicine” is that the model is the product. It isn’t. The product is the system: training data, validation, user training, UI design, alerting behavior, logging, update procedures, and the rules for when the machine is allowed to speak with confidence.
When something goes wrong, everyone immediately looks for a villain: the surgeon, the vendor, the regulator, the algorithm. But the repeatable pattern in stories like this is organizational: unclear accountability paired with a technology that produces outputs people psychologically over-trust.
In other words: the tool doesn’t have to be “super intelligent” to be dangerous. It just has to be confident enough that humans stop treating it like a tool.
The fix is not “ban AI from medicine.” That’s fantasy. The fix is to treat AI-enabled behavior like a high-risk subsystem with governance that can survive audits, lawsuits, and bad days.
That means clinical validation that reflects real operating conditions, not demo conditions. It means human factors testing that anticipates how clinicians actually behave under time pressure. It means post-market monitoring that can detect drift, UI confusion, and systematic error patterns quickly. It means change control that treats “model update” as a safety event, not a product launch. And it means documentation that doesn’t hide behind “trade secrets” when the question is whether a device is behaving predictably.
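To show what the “detect drift quickly” piece could even look like in practice, here’s a minimal sketch: a rolling-window alarm over a stream of use outcomes. Everything in it is invented for illustration; the window size, baseline rate, and threshold are placeholders, and the toy outcome stream stands in for whatever complaint or adverse-event feed a real surveillance program would use.

```python
from collections import deque

# Minimal post-market drift alarm: watch a rolling window of outcomes
# (True = reported problem, False = uneventful use) and flag when the
# problem rate drifts past a multiple of the baseline. All parameters
# are invented for illustration, not taken from any real program.

class DriftAlarm:
    def __init__(self, window: int = 200, baseline_rate: float = 0.01, multiplier: float = 3.0):
        self.events = deque(maxlen=window)
        self.threshold = baseline_rate * multiplier

    def record(self, problem: bool) -> bool:
        """Record one use; return True if the rolling problem rate breaches the threshold."""
        self.events.append(1 if problem else 0)
        if len(self.events) < self.events.maxlen:
            return False  # not enough data yet to judge drift
        rate = sum(self.events) / len(self.events)
        return rate > self.threshold

alarm = DriftAlarm()
stream = [False] * 190 + [True] * 10  # toy data: a late cluster of problem reports
for use, problem in enumerate(stream, start=1):
    if alarm.record(problem):
        print(f"Drift alarm at use #{use}: rolling problem rate exceeds threshold")
        break
```

The unglamorous part is everything around the snippet: who owns the alarm, who is obliged to act when it fires, and whether a model update resets the baseline or triggers a new review. That’s governance, and no code supplies it.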
None of that is as sexy as “AI enters the operating room.” But it’s the difference between innovation and liability.
AI in healthcare isn’t arriving as one big robot doctor with a stethoscope and a personality. It’s arriving as a thousand little “smart” features stapled onto devices, dashboards, and workflows—often sold as an upgrade, sometimes implemented like one, and occasionally governed like one.
And that’s the problem. Because when software gets the scalpel, “move fast and break things” stops being a metaphor.