There’s a new menace stalking the open-plan office. It arrives punctually at 09:17 as a 1,200-word memo with neat headings, crisp bullets, and an executive summary that reads like it went to business school. It looks finished. It sounds confident. It advances… nothing. That shiny void has a name now: AI-generated workslop—work product that masquerades as good work but lacks the substance to move a task forward. The term was introduced by researchers from BetterUp Labs and Stanford’s Social Media Lab in Harvard Business Review, and it’s already too familiar to managers who keep rereading decks that say a great deal while meaning very little.
In their data, two numbers stand out. First, 40% of U.S. desk workers say they received workslop in the last month. Second, each instance takes about one hour and 56 minutes to diagnose, untangle, and redo. Scale that across a 10,000-person company and you’re staring at a $9 million annual tax for the privilege of reading fake productivity. This isn’t an internet meme about AI “slop.” This is a P&L problem.
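Back-of-envelope, here is roughly how a figure that size falls out. The prevalence and time-per-instance numbers are the ones above; the one-instance-per-month rate and the fully loaded hourly cost are assumptions, so treat this as a sketch of the shape of the math, not the study’s own model.

```python
# Back-of-envelope reproduction of the "$9 million annual tax" figure.
# Prevalence and time-per-instance come from the article; the incident
# rate and fully loaded hourly cost are assumptions for illustration.

headcount          = 10_000      # desk workers in the hypothetical org
prevalence         = 0.40        # share who received workslop last month
instances_per_mo   = 1           # assumed: one instance per affected worker per month
hours_per_instance = 1 + 56/60   # ~1h56m to diagnose, untangle, and redo
loaded_hourly_cost = 95          # assumed fully loaded cost of a desk-work hour, USD

affected = headcount * prevalence
annual_cost = affected * instances_per_mo * hours_per_instance * loaded_hourly_cost * 12
print(f"~${annual_cost / 1e6:.1f}M per year")   # ≈ $8.8M with these assumptions
```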
If you’ve been told that generative AI boosts productivity, you’re not wrong—sometimes. Controlled studies have found significant gains in specific contexts. In one MIT-affiliated experiment, professionals using ChatGPT completed writing tasks 40% faster, while the quality of their work rose by 18%. In a field deployment at a Fortune 500 support center, an AI assistant lifted agent productivity ~14% on average, with the largest gains for novices. These are real, durable effects—inside well-bounded tasks with clear end states and abundant examples.
The problem arises when speed meets ambiguity. In government pilots of Microsoft 365 Copilot, users reported time savings but no measurable productivity improvement—because the tool produced output that still required judgment, verification, or wholesale rework. Fast drafts do not equal finished decisions. The draft accelerates; the thinking gets deferred. Slop leaks in through the gap.
Workslop isn’t just a tool problem; it’s an organizational physics problem. Three forces are doing the multiplying.
First, the mandate effect. Leadership says “everyone use AI.” Usage doubles. ROI… doesn’t. MIT’s “GenAI Divide” work found 95% of enterprise generative-AI initiatives showed no measurable return. When tools are adopted before workflows are redesigned, teams produce more artifacts, not more outcomes.
Second, the plausibility premium. Modern models are exquisitely good at tone, pacing, and structure. They sound like work. That veneer lowers our defenses, and we forward drafts that feel done. Stack Overflow recognized this dynamic early and simply banned LLM-generated answers: they looked correct enough to pass casual review while quietly being wrong. The workplace is re-learning that lesson at scale.
Third, the feedback vacuum. Many AI rollouts skip the governance step where teams define what “good” looks like, how we review model output, and where human judgment is non-negotiable. That vacuum creates a cottage industry of unreviewed summaries, hallucinated citations, and immaculate-looking decks that no one wants to admit they don’t trust. Google had to update Search in 2024 to throttle the abuse of scaled content—pages obviously engineered for machines, not people. Office ecosystems are experiencing their own version of scaled content abuse—just routed through email and Docs instead of the open web.
Workslop doesn’t only waste time; it corrodes trust. In the HBR/BetterUp/Stanford data, employees perceive senders of workslop as less creative, capable, and reliable. That’s reputational damage riding on an autocomplete. And recipients pay the cognitive tax of second-guessing everything they read: is this the colleague’s thinking—or the model’s filler? Collaboration slows because the quality signal is jammed with noise.
Meanwhile, a parallel dynamic is unfolding in the broader information ecosystem. As AI-generated content flooded the public web, platforms had to fight to keep “AI slop” out of search results and feeds. Companies now face the inside-out version of that fight. You don’t need a zombie internet to kill productivity; a zombie SharePoint will do.
Doesn’t AI also deliver real gains? Yes, and that’s precisely why this is tricky. The same literature that documents slop also documents real gains. GitHub Copilot studies repeatedly show roughly 55% faster completion for certain coding tasks. The right pairing of model, data, and workflow lifts output and upskills junior talent. The wrong pairing floods your org with off-the-shelf prose that nobody can trust. The difference is design, not hype.
You’ve seen the anti-pattern. A leader buys licenses. An enablement deck frames AI as a “co-pilot for everything.” Teams are told to “experiment.” A month later, inboxes bulge with immaculate briefs that skate past the hard parts: where the data is thin, where the tradeoffs are ugly, where the decision needs a spine.
We confuse content with conclusion, volume with value, and speed with sense-making. The result is a surfeit of “deliverables” that look like outcomes but are just very tidy raw material. And when these artifacts get recycled as inputs for the next round—summaries of summaries of summaries—you get a micro-version of model collapse: a feedback loop where reality gets laundered out and the work converges to cliché. The labs have a term for this degeneration in AI systems: Model Autophagy Disorder. Organizations exhibit a human version when they train themselves on their own lowest-effort drafts.
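To see the loop in miniature, consider a toy simulation (a sketch, not the cited research): fit a simple statistical model to some data, keep only its most “typical” outputs, and feed those back in as the next round’s training data. The numbers are arbitrary; the point is the direction of travel.

```python
# Toy illustration of the degeneration loop: fit a model, keep only its
# safely typical outputs (within one standard deviation), and train the
# next round on that. Each pass narrows the distribution, a crude stand-in
# for summaries of summaries converging to cliché.

import random
import statistics

random.seed(0)
data = [random.gauss(0, 1) for _ in range(500)]   # generation 0: real variety

for gen in range(1, 7):
    mu, sigma = statistics.mean(data), statistics.stdev(data)
    samples = [random.gauss(mu, sigma) for _ in range(500)]
    # plausibility filter: keep only outputs that look safely typical
    data = [x for x in samples if abs(x - mu) <= sigma]
    print(f"gen {gen}: spread = {statistics.stdev(data):.3f}")
# the spread shrinks every generation: the outliers, and the information, vanish
```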
The strongest evidence against slop isn’t a think piece; it’s task-specific, outcome-linked deployment. The studies that show durable gains don’t say “use AI more.” They say: for this task, with this assistant, under this review process, we get this improvement. That’s not magic; it’s workflow engineering.
Public-sector pilots echo the nuance. In a UK cross-government Copilot trial, users reported saving an average of roughly 26 minutes per day—a meaningful chunk of time. But even enthusiastic reports admit the tool struggles with nuanced judgments, and organizations still need to translate time saved into decisions made. Speed is the input; value is the output. If you don’t redesign the work to capture it, slop backfills the space.
Workslop is costly in four quiet ways.
It creates rework. Every polished nothingburger demands a second pass by someone who actually knows the domain. That’s labor you could have spent on the real problem.
It distorts decision-making. When the most articulate document wins the meeting instead of the best-supported argument, you ship the prettiest plan, not the right one. The plausibility premium is dangerous precisely because it’s invisible.
It discourages dissent. Push back on slop and you look “anti-AI.” Agree to it and you inherit its errors. Teams learn to stay quiet and move tickets. That’s how you end up with a stack of “approved” plans that don’t survive first contact with reality.
It degrades the data exhaust. Knowledge bases fill up with generic templates and hallucinated facts. Six months later, internal search feels like late-stage platform enshittification: a glossy pile of irrelevant results, except this time the pile is yours.
If you want less slop and more signal, don’t start with tools. Start with standards. Organizations already know how to govern risky, high-leverage systems. The NIST AI Risk Management Framework and ISO/IEC 42001 provide a vocabulary for establishing guardrails, defining review checkpoints, and determining where human accountability lies. Translate those into your content supply chain. Make “AI-assisted” a workflow with gates, not a vibe.
Then rewrite the local rules of engagement. Drafts from a model are raw material, not work product. “Looks finished” is an anti-pattern; evidence of reasoning is the standard. Require source links for claims. Require the author to state what they changed from the AI’s first pass and why. Incentivize fewer artifacts with clearer conclusions. Normalize saying, “the model doesn’t know”—because sometimes that’s the smartest thing in the room.
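What might a gate look like in practice? Here is a minimal sketch, with hypothetical field names and thresholds, of the kind of check a team could run before an AI-assisted draft is allowed to count as work product.

```python
# A minimal sketch of the "workflow with gates" idea. The Draft fields and
# thresholds are hypothetical, not any real tool's API: the point is that
# an AI-assisted draft must clear explicit checks before it moves forward.

from dataclasses import dataclass, field

@dataclass
class Draft:
    body: str
    source_links: list[str] = field(default_factory=list)  # evidence for claims
    author_changes: str = ""   # what the author changed from the AI's first pass, and why
    human_owner: str = ""      # named person accountable for the content

def review_gate(draft: Draft) -> list[str]:
    """Return the gates this draft fails; an empty list means it may proceed."""
    failures = []
    if not draft.source_links:
        failures.append("no sources cited for claims")
    if len(draft.author_changes.strip()) < 40:
        failures.append("no meaningful account of what changed from the AI draft")
    if not draft.human_owner:
        failures.append("no named human owner")
    return failures

# Usage: block the forward-and-forget reflex until the gates pass.
problems = review_gate(Draft(body="Q3 pricing proposal ..."))
print(problems or "clear to send")
```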
Finally, choose your sharp ends carefully. The success stories focus AI on narrow, high-leverage tasks with clear success metrics and quick human verification loops. Think: synthesizing customer chats into tickets you can spot-check, generating starter test cases you can run, refactoring boilerplate you can diff. If you can’t measure the lift and audit the output, you haven’t earned the right to scale.
Workslop is not a morality play about “good humans” and “bad machines.” It’s a structural story about what happens when we pour cheap, confident text into systems that reward looking done. The internet already learned this lesson the hard way; it took policy changes at Google to push back the tide. The enterprise will need its own pushback: not a ban, but a bias toward thinking before typing and ownership after generating. Because the only thing more expensive than bad work is good-looking work that’s wrong.