AI Agents Are Slowing Teams Down. Here's Why (and What to Do Instead).
Anthropic, Amazon, and Uber have all seen AI-generated output create more work, not less. The pattern applies beyond engineering, and SMB sales teams need to recognize it before it costs them.

Illustration generated with DALL-E 3 by Revenue Velocity Lab
Uber's developers who use AI tools 20 or more days per month produce 52% more pull requests than their peers. Impressive number. Except nobody at Uber is measuring whether those pull requests are any good.
That detail comes from Gergely Orosz's investigation in The Pragmatic Engineer, and it's the kind of number that should make anyone using AI tools stop and think. More output isn't the same as better output. And in some documented cases, AI is making teams measurably worse.
Three companies, three warnings
Anthropic's own product had a bug nobody caught
Here's an irony worth sitting with. Anthropic — the company that builds Claude — had a basic UX bug on Claude.ai where user input was being lost. More than 80% of the site's code was generated by Claude. The bug went unnoticed.
The problem wasn't the AI-generated code itself. It was that nobody was reviewing it with the same attention they'd give human-written code. When 80% of the codebase is AI-generated and everything compiles, the assumption becomes "it probably works." That assumption is where bugs hide.
Amazon's 13-hour outage from an AI-generated change
Amazon reported what they called "high blast radius" incidents involving AI-generated code changes. In one case, their AI tool Kiro decided to "delete the environment and recreate it" — causing a 13-hour AWS outage.
Amazon's response was structural: junior engineers' AI-assisted changes now require senior approval before deployment. They didn't ban the tool. They added a human judgment layer on top of it, because the tool alone couldn't be trusted to know when its output was dangerous.
Uber measures volume, not quality
Uber and Meta are both incorporating AI usage into performance reviews. At Uber, developers who use AI 20+ days per month are classified as "power users." These power users produce 52% more pull requests.
But here's what Uber isn't tracking: bug rates, code review rejection rates, production incidents, or long-term maintenance cost of AI-generated code. The metric is output volume. Quality is unmeasured.
Dax Raad, founder of OpenCode, put it bluntly: AI agents create an environment where developers "spend more time cleaning up" and refactoring is inhibited. The agents don't make the team faster. They make the team produce more stuff that then needs to be managed.
The pattern across all three cases is the same: AI increased output quantity while degrading — or at least not measuring — output quality. More pull requests, more commits, more generated code. But nobody checked whether the additional output was worth having.
This isn't just an engineering problem
If you're running a sales team, you might think these are engineering cautionary tales that don't apply to you. They do. The pattern translates directly.
Consider what happens when an AI sales tool generates outreach at scale:
Volume goes up. Your reps send 3x more emails per day. The tool writes personalized intros, handles follow-up sequences, and contacts companies your team would never have reached manually. The activity metrics look great.
Quality is unmeasured. Reply rates might be dropping. The "personalized" intros might sound generic to recipients who get five similar emails a week. The companies being contacted might not actually be in a buying window. But nobody's comparing reply rates per email before and after AI adoption — they're celebrating the 3x volume increase.
Net result: more work, not less. Reps spend time reviewing AI-generated drafts, editing messages that don't sound right, and following up on leads that were never qualified in the first place. The tool didn't compress their workday. It redirected it.
This is the Uber pattern applied to sales: measuring activity instead of outcomes.
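To see the trap in numbers, here's a back-of-the-envelope sketch in Python. Every figure is invented for illustration — swap in your own before/after data from the CRM:

```python
# Hypothetical before/after numbers -- substitute your own from the CRM.
before = {"emails_per_day": 50, "reply_rate": 0.08}   # pre-AI baseline
after = {"emails_per_day": 150, "reply_rate": 0.03}   # post-AI: 3x volume

for label, period in [("before", before), ("after", after)]:
    replies = period["emails_per_day"] * period["reply_rate"]
    print(f"{label}: {period['emails_per_day']} emails/day "
          f"-> {replies:.1f} replies/day")

# before: 50 emails/day -> 4.0 replies/day
# after: 150 emails/day -> 4.5 replies/day
# Activity tripled; outcomes barely moved. The dashboard celebrates the 3x.
```

The exact figures don't matter. What matters is which line you put on the dashboard: emails per day is activity, replies per day is an outcome.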
Two kinds of AI tools
The cases above reveal a distinction that matters for any team choosing AI tools:
AI that compresses work eliminates steps your team currently does manually. Finding target companies. Prioritizing which accounts to reach this week. Logging activity in the CRM. Routing leads to the right rep. After adoption, the team's total hours on that workflow go down. The hours shift to higher-value tasks — actual conversations, negotiations, relationship building.
AI that creates work generates output your team then needs to review, edit, approve, or fix. Drafting emails that reps rewrite. Scoring leads that reps re-qualify. Building reports that managers re-interpret. After adoption, the total hours on the workflow stay the same or increase. Individual tasks feel faster, but the end-to-end process doesn't actually shrink.
The test is simple. After adopting the tool, ask: is our team spending less total time on this workflow? Not "is each step faster?" but "is the whole thing shorter?"
If the answer is "each step is faster but we're doing more steps" — that's the Amazon/Uber pattern. The AI created work.
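If you want to run that test with more than a gut feeling, a rough sketch like the following works. The step names and minute counts are hypothetical placeholders for whatever your team actually logs:

```python
# Hypothetical per-step minutes for one rep's daily outreach workflow.
before_ai = {"research": 60, "write_emails": 90, "crm_logging": 30}
after_ai = {"research": 45, "review_ai_drafts": 55,
            "edit_ai_drafts": 50, "review_crm_entries": 35}

total_before = sum(before_ai.values())  # 180 minutes
total_after = sum(after_ai.values())    # 185 minutes

if total_after < total_before:
    print(f"Compressed: {total_before} -> {total_after} min/day")
else:
    # Each individual step feels faster, but the new review/edit
    # steps grew the end-to-end total. That's AI creating work.
    print(f"AI created work: {total_before} -> {total_after} min/day")
```

Note what happened in the made-up "after" column: writing disappeared, but reviewing and editing appeared. The steps changed; the total didn't shrink.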
What SMB teams should actually measure
If you're using AI in your sales process, here are the numbers that tell you whether it's compressing work or creating it:
| Metric | Compressing Work | Creating Work |
|---|---|---|
| Total outreach time per rep per day | Decreasing | Same or increasing (review + edit time) |
| Reply rate per outreach | Stable or improving | Declining (volume up, relevance down) |
| Meetings booked per hour of rep time | Increasing | Flat (more emails, same meetings) |
| CRM data entry time | Approaching zero | Moved to "review AI entries" time |
| Time from lead identification to first contact | Shrinking | Same (AI finds leads, rep still researches) |
If the right column describes your experience, the tool is adding steps rather than removing them. That's not a failure of AI in general — it's a signal that this particular tool is optimizing for output volume rather than workflow compression.
The judgment layer matters most
Amazon's fix is instructive. They didn't remove AI tools from junior engineers. They added a mandatory human review step. The AI generates, the human judges.
For sales teams, the equivalent is: AI should handle research and preparation, but humans should retain control of two things:
- Which companies to approach — AI can surface candidates, but the rep decides which ones are worth a conversation right now
- What the message says — AI can draft, but if every message gets sent without a rep reading it, you're in Uber territory: volume without quality control
The tools that work best for small teams are the ones where AI does the work that was eating rep time (research, data entry, prioritization) and the rep's role shifts from "doing the grunt work" to "making the judgment calls." If the rep's role shifts to "reviewing AI output," that's a different kind of burden, not a lighter one.
The question isn't "does AI make us faster?"
It's "does AI make us better?"
Faster at what? If AI makes your team faster at sending emails nobody replies to, that's not a productivity gain. If it makes your team faster at identifying the 5 companies worth reaching today — and the reps spend the saved time in actual conversations — that's a real gain.
The Pragmatic Engineer's reporting shows that even the companies building AI (Anthropic) and the companies most aggressively adopting it (Amazon, Uber, Meta) are struggling with this distinction. If it doesn't happen automatically for them, it won't happen for you. You have to measure it deliberately.
One thing to do this week
Pick one AI tool your team uses. Track two numbers for the next five days:
- Total time each rep spends on the workflow the tool is supposed to improve (include review, editing, and fixing time)
- The outcome metric for that workflow (reply rate, meetings booked, qualified leads)
If total time didn't go down and outcomes didn't go up, the tool is creating work. That doesn't mean AI is bad. It means this tool is solving for volume when your team needs compression.
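If you'd rather script the comparison than eyeball a spreadsheet, a minimal sketch might look like this. Every number below is a placeholder for what your reps actually record over the five days:

```python
# Pre-tool averages and five days of tracked data -- placeholder values.
# minutes: total workflow time, including review, editing, and fixing.
# outcomes: the workflow's outcome metric (here: meetings booked).
baseline = {"minutes": 120, "outcomes": 1.0}
week = [
    {"minutes": 135, "outcomes": 1},
    {"minutes": 125, "outcomes": 0},
    {"minutes": 140, "outcomes": 2},
    {"minutes": 130, "outcomes": 1},
    {"minutes": 128, "outcomes": 1},
]

avg_minutes = sum(d["minutes"] for d in week) / len(week)    # 131.6
avg_outcomes = sum(d["outcomes"] for d in week) / len(week)  # 1.0

time_down = avg_minutes < baseline["minutes"]
outcomes_up = avg_outcomes > baseline["outcomes"]

if not time_down and not outcomes_up:
    print("Creating work: time didn't drop and outcomes didn't improve.")
elif time_down:
    print("Compressing work: total time went down.")
else:
    print("Mixed signal: outcomes up, but the workflow didn't get shorter.")
```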
If you want to see what compression looks like — research that runs itself so reps spend time selling, not reviewing drafts — try Optifai free for 7 days. No credit card required.
AI detects buying signals and executes revenue actions automatically.
See weekly ROI reports proving AI-generated revenue.