"5 More Minutes… That Was 5 Hours Ago" — The Almost-Right Code Trap
The AI generates something that looks correct. It compiles. It almost works. You think you're 5 minutes from done — so you prompt again. And again. Hours later, you're still prompting, each fix introducing a subtle new issue.
This is the almost-right code trap — the most universally experienced developer pain in AI-assisted development. AI models are optimized to produce plausible output, not correct output. The result: code that passes a visual check but fails under real conditions, trapping developers in an endless guess-and-patch loop.
What We Observe
- "Almost right but not quite" — the #1 frustration for 66% of developers (Stack Overflow 2025)
- "Debugging AI code takes longer than writing it myself" — reported by 45% of developers (Stack Overflow 2025)
- "95% spend extra time correcting AI-generated code" — survey of 800 developers
- "AI generated PRs contain an average of 10.8 issues, nearly double the 6.4 found in human-written PRs" — CodeRabbit
- "AI actually slows experienced developers down by 19%" — METR 2025 randomized controlled trial
The debug loop pattern:
- "The agent got maybe 10% wrong and I thought I could fix it in 5 more minutes… that was 5 hours ago." — Yoko Li, cited by Addy Osmani
- "AI keeps 'fixing' the same error over and over, making no real progress." — AssurePath
- "Corrections cascade. Fixing one AI-generated assumption often reveals three more downstream." — Noqta
- "I spend most of my time babysitting agents… It's a different kind of exhausting." — developer cited by Addy Osmani
Developer language:
- "Guess-and-patch loop." — r/vibecoding
- "AI slop." — r/ExperiencedDevs
- "Almost every time I've asked AI to refactor, it ruins the codebase while cheerfully claiming it made everything simpler." — r/ChatGPTCoding
- "It took longer to vibe code it and make it work than it would if I wrote it myself." — r/webdev
The Structural Cause
AI models generate code that is statistically plausible, not logically verified. The output matches patterns from training data — which means it looks correct to a human reviewer. The issues are:
- Edge cases ignored — The AI generates the happy path correctly and skips boundary conditions, error handling, and null checks
- Business logic hallucinated — The AI invents plausible but wrong logic, especially in payments, math, and auth
- Sycophantic execution — The AI never pushes back on bad ideas. It enthusiastically generates whatever is asked, even when the approach is flawed
- Self-reinforcing tests — When AI writes both the code and the tests, the tests validate the wrong behavior. "Looks correct but isn't reliable."
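The happy-path failure mode is easy to reproduce. A minimal sketch, with hypothetical function names, of what "looks correct" versus what survives real inputs:

```python
# Typical AI-generated "happy path": correct for the demo input,
# but it crashes on an empty list and on orders missing a key.
def average_order_value(orders):
    total = sum(o["amount"] for o in orders)
    return total / len(orders)  # ZeroDivisionError on []

# The same logic with the boundary conditions the happy path skips.
def average_order_value_safe(orders):
    if not orders:
        return 0.0
    total = sum(o.get("amount", 0) for o in orders)
    return total / len(orders)
```

Both versions pass a visual check on a populated list; only the second survives the empty-cart case a reviewer rarely thinks to try.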
Detection
The signal is velocity: if the time from "AI generated a solution" to "solution actually works correctly" consistently exceeds the time it would take to write the solution manually, the almost-right trap is active.
Track how many prompt-fix cycles each feature requires before it works correctly. If the answer is consistently more than three, the codebase has structural issues that make AI-generated code unreliable.
This Is a Symptom Of
- Hidden Technical Debt (PF02) — Each "almost right" fix adds more debt to the codebase
- Delivery Slowdown (PF08) — The debug loop directly causes the slowdown founders experience
FAQ
Is this just a problem with current AI models? Will it get better?
Models are improving, but the fundamental issue is structural: AI generates plausible patterns, not verified logic. Better models reduce the frequency of obvious errors but increase the subtlety of the remaining ones — making them harder to catch. The fix is structural verification (tests, contracts, type safety), not better generation.
How do I break the guess-and-patch loop?
Stop prompting the AI to fix its own output. Instead: (1) write a test that defines the correct behavior, (2) use the AI to generate an implementation, (3) verify against the test. If the test fails, write the implementation manually. This inverts the loop from "generate → fix → fix → fix" to "define → generate → verify."
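In practice the inverted loop looks like ordinary test-first development. A minimal sketch, where `apply_discount` is a hypothetical function standing in for whatever the AI is asked to generate:

```python
# Step 1: define correct behavior as a test, BEFORE asking the AI for code.
# The edge cases here are exactly what an AI happy path tends to skip.
def test_apply_discount():
    assert apply_discount(100.0, 0.2) == 80.0
    assert apply_discount(100.0, 0.0) == 100.0
    assert apply_discount(0.0, 0.5) == 0.0
    try:
        apply_discount(100.0, 1.5)
        assert False, "a rate above 1 should be rejected"
    except ValueError:
        pass

# Step 2: have the AI generate an implementation against that contract.
def apply_discount(price, rate):
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return price * (1.0 - rate)

# Step 3: verify. If this keeps failing, write the implementation by hand.
test_apply_discount()
```

The test is the fixed point: the AI's output can change on every attempt, but the definition of "correct" never does, which is what breaks the guess-and-patch cycle.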