Chatbots Behaving Badly™

The FDA’s Rapid AI Integration - A Critical Perspective

By Markus Brinsa  |  May 9, 2025

A Historic First at Breakneck Speed

In a move that has seasoned regulators spitting out their coffee, the U.S. Food and Drug Administration is going all-in on generative AI – and fast. FDA Commissioner Dr. Martin Makary unveiled an “aggressive timeline” to deploy generative artificial intelligence across all FDA centers by June 30, 2025, after a pilot program’s smashing success.

This is a historic first for the agency: an internal AI rollout touching every corner of its drug, device, biologics, and food oversight divisions, all in a matter of weeks.

Makary was positively giddy about the pilot’s results. “I was blown away by the success of our first AI-assisted scientific review pilot,” he declared, lauding how the tool slashed tedious review tasks from three days to mere minutes. He’s not kidding: Jinzhong (Jin) Liu, a senior FDA scientist and Deputy Director of the Office of Drug Evaluation Sciences in the Office of New Drugs at the FDA’s Center for Drug Evaluation and Research (CDER), marveled that the AI “enabled me to perform scientific review tasks in minutes that used to take three days”. For an agency often chided for slow approvals, this must feel like strapping a rocket to a tricycle.

But it’s not just the speed of the AI – it’s the speed of the rollout itself. Rather than a cautious expansion, the FDA is treating this like an emergency mission. Makary has ordered every center to begin deploying the AI immediately to meet the June 30 deadline, effectively giving regulatory staff about as long to integrate AI into their workflows as it takes to get a dentist appointment. By that date, all centers are supposed to be humming along on a shared, secure generative AI platform hooked into the FDA’s internal data.

In bureaucratic terms, this is warp speed.

If you’ve ever watched a federal agency deliberate over font sizes in a guidance document, you’ll appreciate how unprecedented a six-week agency-wide tech integration truly is. The ambition is admirable – arguably necessary even – as the FDA faces mounting workload pressures. (Not incidentally, the agency has been hit by staffing cuts and review delays in recent months, with thousands of jobs slashed in a sweeping HHS reorganization, raising fears of backlogs. Little wonder the FDA is looking for a digital savior to pick up the slack.) So yes, chalk one up for bold leadership: the FDA is stepping on the gas and embracing AI with historic fervor.

The Allure of AI – and the Elephant in the Room

What’s striking, though, is what the FDA isn’t saying about this grand AI experiment. The agency’s public announcements have been brimming with urgency and optimism, but conspicuously silent on the risks and guardrails. It’s as if the FDA is so enamored with freeing its scientists from “non-productive busywork” that it forgot to mention how it will prevent the productive work from going off the rails. Dr. Makary’s messaging has a techno-utopian zeal – “years of talk… we cannot afford to keep talking. It is time to take action … too important to delay” – which sounds inspiring until you remember this isn’t a Silicon Valley app rollout; it’s an agency whose decisions can literally make the difference between life and death. The irony isn’t lost on those of us who follow the FDA: an organization famous for caution and exhaustive review is now barreling into uncharted AI territory with a rallying cry of full steam ahead, and barely a peep about safety nets.

Let’s address the elephant in the room: generative AI may be powerful, but it’s not exactly known for its infallibility or humility.

By design, large language models like the ones the FDA is deploying generate content, and sometimes they generate wrong content with equal aplomb. In casual tech lingo, we call these mistakes “hallucinations.” The FDA’s glowing press release didn’t mention the word once. Yet the risk is very real. Generative AI models sometimes “fabricate facts, deliver inaccurate assertions and misrepresent reality,” leading to flawed assessments and poor decision-making. An ex-FDA staffer who experimented with ChatGPT issued a polite warning: AI’s propensity to “fabricate convincing information” calls into question “how reliable such a chatbot might be” for regulatory work. In other words, these tools can speak very confidently while being very wrong. That’s a nightmare combo in a setting where scientific accuracy and evidence are paramount. One shudders to imagine an AI that drafts a section of a drug review incorrectly – perhaps overlooking a subtle safety signal or mis-summarizing trial data – and a human reviewer, swamped with other tasks, gives it a rubber stamp.

Over-reliance on AI is a genuine concern.

Makary says the AI will free scientists for higher-level thinking, but history warns us that humans can become complacent, especially if the AI usually sounds right. As one risk expert put it, “Nobody is going to establish guardrails for you. It’s critical that humans verify content from an AI system. A lack of oversight can lead to breakdowns with real-world repercussions.” In the FDA’s rush to automate the drudgery, will they maintain the discipline of double-checking the machine? The agency hasn’t exactly spelled that out.

Another challenge is what you might call the “unknown unknowns” – how will this AI handle truly novel situations or rapid shifts in the scientific landscape?

Regulatory science is not a static field; new therapies, new methodologies, and even new policies can emerge quickly. If the FDA’s AI was trained on yesterday’s data and rules, how swiftly can it adapt to today’s paradigm? Imagine an AI model happily citing decade-old clinical trial conventions that a new policy or guidance has since updated – a real possibility given that AI models can have outdated knowledge baked in. The FDA said it will keep improving the system post-June and “adapt to evolving needs”, but the devil is in those details. Continuous reinforcement learning or re-training for an AI in a regulated environment is no small feat. If a breakthrough gene therapy comes along that doesn’t fit the patterns the AI learned, will the system gracefully flag, “I haven’t seen this before, proceed with caution,” or will it forge ahead with a false sense of competence? The public updates promised for June are likely to tout more use cases, but we’ll be listening for signs of an early warning system or contingency plan for when the AI is out of its depth. Thus far, we’ve heard crickets on that.

High Stakes and Hidden Pitfalls

The stakes could not be higher. The FDA isn’t implementing AI to recommend Netflix movies; it’s using it to help decide whether drugs and medical devices are safe and effective. A mistake or oversight in this context can have far-reaching consequences – patients put at risk, companies derailed, public trust damaged. And yet, nowhere in the FDA’s splashy rollout did we see mention of an ethics review, an oversight framework, or an error-tracking protocol. There was no talk of transparency either. Will the FDA disclose which parts of a review were drafted or assisted by AI? Will there be an audit trail when the AI is used, so that any decisions can be traced back and examined for AI influence? Your guess is as good as ours; the FDA hasn’t said. Transparency is more than just a nicety here – it’s crucial for accountability. If (or when) an AI-driven slip-up occurs, we need to know how it happened. Did a human override the AI or blindly trust it? Was the AI operating on partial information? None of these questions were addressed in Dr. Makary’s enthusiastic memo to the world.

It’s a remarkable contrast: when the FDA evaluates AI algorithms made by industry (for instance, AI-based medical devices), it demands rigorous evidence, risk assessments, bias checks, and more.

Advisory committees debate how to manage “performance bias or hallucinations,” and the FDA openly grapples with how to fit generative AI into its risk-based regulatory frameworks. Yet when it comes to the FDA’s own internal use of generative AI, that same level of scrutiny hasn’t been visible publicly. The agency is essentially giving itself a free pass to fast-track this tech without sharing how it’s ensuring the system is safe and effective for its intended use. It’s as if the FDA wrote itself an emergency use authorization for AI without showing us the data. Even the initial pilot, for all its time-saving glory, came with no published details on methodology or outcome quality – “the agency did not provide specific details about the project” during the announcement. For a scientific regulator, that lack of peer-reviewable detail is noteworthy (and not exactly confidence-inspiring).

Accountability is another glaring question. The Commissioner’s statement makes it sound like the AI is a trusty new deputy that will streamline work. But if something goes wrong – say an important flaw in a drug application is missed because the AI failed to flag it – who takes responsibility? The human reviewers are still ostensibly in charge, but the agency’s culture may need to evolve to ensure people aren’t deferential to AI output. It’s easy to imagine an exhausted reviewer thinking, “The AI summarized this section; it looks reasonable, let’s move on.”

FDA leaders haven’t described any specific checks and balances, like requiring a second person to validate AI-generated portions, or instituting formal error logging when the AI’s suggestions are off-base.

Without such measures, the blame for errors could become a hot potato – was it the tool’s fault or the human who relied on it? In regulatory science, ambiguity in decision processes is dangerous. Clarity and documentation have long been the FDA’s safeguards. This rollout threatens to muddy those waters unless oversight is tightened in tandem.

Racing Ahead, Leaving Caution Behind

To be clear, nobody is outright opposing the FDA’s use of AI. Even long-time industry and academic observers (myself included) agree that AI could be transformative for regulatory efficiency. We’ve all seen drug review packages that stand taller than a person; if an AI can digest and summarize data, highlight key points, and cross-check references, that’s a big win. The FDA’s pilot evidently showed major productivity gains, and it would be foolish to ignore such a tool.

But there’s a world of difference between cautiously integrating AI as an assistive tool and leaning on it as the new backbone of the review process overnight.

The tone of the FDA’s public communication suggests the latter. Makary’s directive wasn’t “proceed thoughtfully with AI where appropriate” – it was essentially “everybody, turn this on now or be left behind.” That kind of urgency in a high-stakes environment can be worrisome. As one pharmaceutical industry group representative diplomatically put it, “While AI is still developing, harnessing it requires a thoughtful and risk-based approach with patients at the center.” Exactly. A thoughtful approach means anticipating failure modes and building in protections. Did FDA staff get training not just in how to use AI but also in when to distrust or override it? Will there be guidelines on the limits of AI assistance (for example, maybe AI drafts are fine for routine sections, but a human must independently analyze any critical safety findings)? We simply don’t know – the FDA hasn’t shared such nuance publicly.

There’s also the broader issue of whether the FDA’s historical frameworks can keep up with this AI acceleration.

The agency’s processes and regulations were built in an era of paper submissions and, later, static algorithms. A learning system that evolves (and possibly improves or degrades) over time is something new. Consider things like auditability: FDA inspections comb through trial data and analytics from sponsors; will FDA now have to inspect its own AI’s “thought process”? One could argue that the FDA should hold itself to the same standards it holds industry to: validate the tool, document its performance, and continuously monitor it. Indeed, external experts have urged that any AI used in regulatory decisions undergo thorough validation for accuracy, robustness, and reproducibility. It’s imperative to know that the AI isn’t just fast, but also reliable. So far, we’ve heard about speed, not reliability. And that’s concerning. A turbo-charged engine is great – unless the steering is faulty. Historically, the FDA has been the adult in the room, telling eager drug developers, “prove it” when they claim a new technology is ready. Now, with its own AI, the FDA risks looking like a teenager with a new sports car: this is so cool, let’s floor it – when what we want is a seasoned driver who checks the mirrors and wears a seatbelt.

Interestingly, the FDA does have an internal AI Governance Board – a fact buried in agency planning documents, not trumpeted in press releases.

This board’s mission is to “advance the safe, ethical, and effective deployment of AI at the FDA,” and it includes advisors in legal, ethics, privacy, and security. That’s reassuring on paper. One imagines this very board must be hard at work now, drafting guardrails for the June rollout. But the stark reality is that none of that careful governance talk has made it into the public narrative. The FDA’s communications have felt almost tech-bro in their hype, without the tempering voice of the cautious regulator we know and (usually) trust. It’s a jarring juxtaposition: inside FDA, the grown-ups may well be hashing out risk matrices and contingency plans, but outside FDA, the message is “we’re moving fast and breaking (no, not breaking – reinventing) things.” Perhaps FDA leadership calculated that broadcasting any hint of doubt or caution would undercut the “historic” urgency of this initiative. But in doing so, they’ve fed a perception that the agency is glossing over the very real pitfalls of generative AI.

Balancing Innovation with Vigilance

So, where does that leave us? On one hand, the FDA deserves credit for pushing its institutional boundaries. After all, if regulatory bodies don’t innovate, they risk falling behind the industries they oversee. No one wants the FDA stuck in a quill-and-parchment era while drug development races ahead. Embracing AI to augment human expertise could herald a new age of efficient, science-driven regulation – truly a big deal. The initial pilot’s apparent success in speeding up reviews validates that there is gold in them thar algorithms.

On the other hand, the lack of visible guardrails is troubling.

It’s a bit like watching a tightrope walker sprint across the wire without a net: thrilling, yes, but you can’t help but cringe at what might happen if they slip. The FDA has given us the thrill; it owes us assurance that it has a net. Explicit commitments to things like human-in-the-loop oversight, transparency of AI involvement in decisions, rigorous validation and auditing, and accountability protocols would go a long way to calming nerves. So far, we’ve heard none.

Dr. Makary’s bold language of urgency – “we cannot afford to keep talking… too important to delay” – captures a certain Silicon Valley-esque impatience that has now reached the halls of Washington. It’s almost unprecedented to hear an FDA Commissioner talk like a startup CEO exhorting his team to ship a product. That urgency can be double-edged. Yes, it might jolt the bureaucracy into action (frankly, it already has). However, it could also signal to staff that speed is king, potentially sidelining the very culture of thoroughness that has long defined FDA’s credibility. The high-stakes nature of the FDA’s work makes this a risky tightrope to walk.

We can admire the agency’s willingness to take a moonshot with AI, and at the same time, we can lampoon the apparent lack of an accompanying safety manual.

A famous saying in regulation is “make haste slowly” – ensure progress, but deliberately and with care. The FDA’s current AI charge feels more like “make haste hastily.” It leaves observers impressed and uneasy in equal parts.

In the coming weeks, as June 30 approaches, you can bet investors, health tech executives, and regulators around the world will be watching the FDA’s experiment closely. If the agency pulls this off – seamlessly integrating generative AI, improving turnaround times without any major snafus – it will indeed mark a new chapter in regulatory operations, one that others will be eager to follow. But if cracks start to show – an embarrassing AI error here, a whistleblower report of over-reliance there – the FDA will have a mini crisis of its own making.

The balance between innovation and vigilance is tricky, and right now, the FDA has tilted decidedly toward innovation.

It’s time to tilt back, at least a little, toward vigilance. This could mean the FDA proactively publishing its AI governance plan, openly acknowledging the risks and how it’s mitigating them. It could mean phasing in AI use so that, for critical decisions, human review remains the gold standard until the AI has proven itself over time. It almost certainly means educating all FDA reviewers that the AI is a tool, not an oracle.

The bottom line is that the FDA’s generative AI rollout is audacious and historic, but it is also fraught with unanswered questions.

It’s a daring gamble with huge potential upside for efficiency and equally huge potential downside if handled poorly. As a (mostly) optimistic analyst, I’ll root for the FDA to get this right. But I’ll do so with a healthy dose of skepticism and a plea to the agency: don’t let the seductive glow of AI blind you to its blind spots. In the quest to eliminate “busywork,” make sure you’re not also eliminating the caution and rigor that keep the public safe. Moving fast is fine – as long as you’ve got your ethical seatbelt on. Right now, it looks like the FDA is racing down the AI highway with no headlights and no seatbelt, and that’s a ride only the most foolhardy would take. Let’s hope they prove us worrywarts wrong and that this story becomes one of responsible innovation rather than a cautionary tale. Because if the FDA can’t do this right, who realistically can?

About the Author

Markus Brinsa is the Founder and CEO of SEIKOURI Inc., an international strategy consulting firm specializing in early-stage innovation discovery and AI Matchmaking. He is also the creator of Chatbots Behaving Badly, a platform and podcast that investigates the real-world failures, risks, and ethical challenges of artificial intelligence. With over 15 years of experience bridging technology, business strategy, and market expansion in the U.S. and Europe, Markus works with executives, investors, and developers to turn AI’s potential into sustainable, real-world impact.

©2025 Copyright by Markus Brinsa | Chatbots Behaving Badly™