"5 More Minutes… That Was 5 Hours Ago" — The Almost-Right Code Trap
The AI generates something that looks correct. It compiles. It almost works. You think you're 5 minutes from done — so you prompt again. And again. Hours later, you're still prompting, each fix introducing a subtle new issue.
This is the almost-right code trap — the most universally experienced developer pain in AI-assisted development. AI models are optimized to produce plausible output, not correct output. The result: code that passes a visual check but fails under real conditions, trapping developers in an endless guess-and-patch loop.
What We Observe
- "Almost right but not quite" — the #1 frustration for 66% of developers (Stack Overflow 2025)
- "Debugging AI code takes longer than writing it myself" — reported by 45% of developers (Stack Overflow 2025)
- "95% spend extra time correcting AI-generated code" — survey of 800 developers
- "AI generated PRs contain an average of 10.8 issues, nearly double the 6.4 found in human-written PRs" — CodeRabbit
- "AI actually slows experienced developers down by 19%" — METR 2025 randomized controlled trial
The debug loop pattern:
- "The agent got maybe 10% wrong and I thought I could fix it in 5 more minutes… that was 5 hours ago." — Yoko Li, cited by Addy Osmani
- "AI keeps 'fixing' the same error over and over, making no real progress." — AssurePath
- "Corrections cascade. Fixing one AI-generated assumption often reveals three more downstream." — Noqta
- "I spend most of my time babysitting agents… It's a different kind of exhausting." — developer cited by Addy Osmani
Developer language:
- "Guess-and-patch loop." — r/vibecoding
- "AI slop." — r/ExperiencedDevs
- "Almost every time I've asked AI to refactor, it ruins the codebase while cheerfully claiming it made everything simpler." — r/ChatGPTCoding
- "It took longer to vibe code it and make it work than it would if I wrote it myself." — r/webdev
The Structural Cause
AI models generate code that is statistically plausible, not logically verified. The output matches patterns from training data — which means it looks correct to a human reviewer. The issues are:
- Edge cases ignored — The AI generates the happy path correctly and skips boundary conditions, error handling, and null checks
- Business logic hallucinated — The AI invents plausible but wrong logic, especially in payments, math, and auth
- Sycophantic execution — The AI never pushes back on bad ideas. It enthusiastically generates whatever is asked, even when the approach is flawed
- Self-reinforcing tests — When AI writes both the code and the tests, the tests validate the wrong behavior. "Looks correct but isn't reliable."
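The happy-path failure mode is easy to reproduce. A minimal sketch, with hypothetical function names, of what "looks correct" versus what survives real inputs:

```python
# Typical AI-generated "happy path": correct for the demo input,
# but it crashes on an empty list and on orders missing a key.
def average_order_value(orders):
    total = sum(o["amount"] for o in orders)
    return total / len(orders)  # ZeroDivisionError on []

# The same logic with the boundary conditions the happy path skips.
def average_order_value_safe(orders):
    if not orders:
        return 0.0
    total = sum(o.get("amount", 0) for o in orders)
    return total / len(orders)
```

Both versions pass a visual check on a populated list; only the second survives the empty-cart case a reviewer rarely thinks to try.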
Detection
The signal is velocity: if the time from "AI generated a solution" to "solution actually works correctly" consistently exceeds the time it would take to write the solution manually, the almost-right trap is active.
Track how many prompt-fix cycles each feature requires before it works correctly. If the answer is consistently more than three, the codebase has structural issues that make AI-generated code unreliable.
This Is a Symptom Of
- Hidden Technical Debt (PF02) — Each "almost right" fix adds more debt to the codebase
- Delivery Slowdown (PF08) — The debug loop directly causes the slowdown founders experience
FAQ
Is this just a problem with current AI models? Will it get better?
Models are improving, but the fundamental issue is structural: AI generates plausible patterns, not verified logic. Better models reduce the frequency of obvious errors but increase the subtlety of the remaining ones — making them harder to catch. The fix is structural verification (tests, contracts, type safety), not better generation.
How do I break the guess-and-patch loop?
Stop prompting the AI to fix its own output. Instead: (1) write a test that defines the correct behavior, (2) use the AI to generate an implementation, (3) verify against the test. If the test fails, write the implementation manually. This inverts the loop from "generate → fix → fix → fix" to "define → generate → verify."
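In practice the inverted loop looks like ordinary test-first development. A minimal sketch, where `apply_discount` is a hypothetical function standing in for whatever the AI is asked to generate:

```python
# Step 1: define correct behavior as a test, BEFORE asking the AI for code.
# The edge cases here are exactly what an AI happy path tends to skip.
def test_apply_discount():
    assert apply_discount(100.0, 0.2) == 80.0
    assert apply_discount(100.0, 0.0) == 100.0
    assert apply_discount(0.0, 0.5) == 0.0
    try:
        apply_discount(100.0, 1.5)
        assert False, "a rate above 1 should be rejected"
    except ValueError:
        pass

# Step 2: have the AI generate an implementation against that contract.
def apply_discount(price, rate):
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return price * (1.0 - rate)

# Step 3: verify. If this keeps failing, write the implementation by hand.
test_apply_discount()
```

The test is the fixed point: the AI's output can change on every attempt, but the definition of "correct" never does, which is what breaks the guess-and-patch cycle.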