Why AI-Generated Apps Break After Every Change
In AI-generated codebases past Day 30, every change eventually breaks something else. A new feature lands, and two unrelated screens stop working. A bug fix in the payment flow corrupts the user session. A UI update causes a database query to fail.
This is not bad luck. It is a structural failure mode — known as structural fragility — that appears consistently across AI-generated codebases. The mechanism is understood, and it is detectable before it becomes critical.
This page explains what causes it, how to confirm it in your codebase, and what the remediation path looks like.
Who This Is For
Founders and developers who built their application with AI tools — Lovable, Bolt.new, Cursor, Replit, or v0 — and are now experiencing one or more of the following:
- Every pull request introduces a regression somewhere unexpected
- The team has stopped deploying on Fridays — or stopped deploying confidently at all
- A change in one module consistently breaks something in a different module
- The codebase "worked fine" for the first few months, and now every change is expensive
If this matches your situation, the root cause is almost certainly structural — not a bug, not a bad developer, not a bad AI tool.
What We Observe
In AI-generated codebases past Day 30, the failure mode appears when earlier structural decisions fall out of enforcement.
The observable signals are:
- Regression frequency increases over time — not because the team is making more mistakes, but because the blast radius of each change is growing
- Changes in one layer affect unrelated layers — a route handler change breaks a UI component; a database model change breaks an unrelated API endpoint
- The codebase has no predictable structure — developers cannot reliably predict which files will be affected by a given change
- Test coverage is absent or stale — there is no feedback loop to catch regressions before they reach production
These are not symptoms of a single bug. They are symptoms of a class of structural problems that compound over time.
The Structural Cause
Two root causes are typically present simultaneously.
RC01: Architecture Drift
Prompt-driven development optimizes locally without global structural enforcement. Each prompt produces code that solves the immediate problem — but without awareness of the broader architecture. Over time:
- Business logic migrates into wrong layers (database queries appear in route handlers; pricing logic appears in UI components)
- Files grow without ownership clarity — a single file accumulates logic from multiple domains
- The architecture slowly dissolves as each AI-generated change erodes the original boundaries
The structural cause is that AI-assisted development at scale has no built-in mechanism for enforcing architectural decisions across sessions. What was decided in session 1 is not enforced in session 47.
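A quick, hedged way to spot this kind of drift is to grep for database access in a layer that should not contain it. The pattern and the `src/routes/` path below are illustrative assumptions; adjust both to your stack:

```shell
# Heuristic check for RC01: raw SQL or query calls inside route handlers.
# Assumes handlers live under src/routes/ -- adjust for your layout.
grep -rnE 'SELECT |INSERT INTO |UPDATE .+ SET |db\.query' src/routes/ \
  && echo "possible layer violations above" \
  || echo "no obvious database access in src/routes/"
```

Any hit is worth a look: a route handler that builds SQL is a route handler that owns logic belonging to a data layer.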
RC02: Dependency Graph Corruption
Without rules governing the direction of dependencies, modules begin importing each other's internals. Circular dependencies form silently. The import graph becomes a web rather than a tree.
The consequence: isolation becomes impossible. You cannot test one module without pulling in five others. You cannot refactor one file without cascading changes across the codebase. Every change has an unpredictable blast radius.
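A minimal sketch of why isolation breaks, using two hypothetical Python modules that import each other's internals. Importing either one alone fails, which is exactly what makes isolated testing impossible:

```shell
# Tiny repro of RC02: two modules with a circular dependency.
mkdir -p cycle_demo
touch cycle_demo/__init__.py
cat > cycle_demo/orders.py <<'EOF'
from cycle_demo.users import get_user   # orders depends on users

def get_order(order_id):
    return {"id": order_id, "user": get_user(1)}
EOF
cat > cycle_demo/users.py <<'EOF'
from cycle_demo.orders import get_order  # users depends back on orders -> cycle

def get_user(user_id):
    return {"id": user_id}
EOF
# Importing either module alone now raises ImportError ("partially
# initialized module") -- neither can be loaded or tested in isolation.
python3 -c "import cycle_demo.orders" 2>&1 | tail -n 1
```

At two modules this is obvious; at fifty, the cycle hides across several hops and only surfaces as "unrelated" breakage.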
Detection: How to Confirm This in Your Codebase
The following checks are concrete and reproducible. Each one maps to a specific failure pattern.
FP001: Oversized Files (Architecture Drift signal)
# Find files over 500 lines (warning threshold)
find . \( -name "*.py" -o -name "*.ts" -o -name "*.tsx" \) \
  -not -path "*/node_modules/*" | \
xargs wc -l 2>/dev/null | \
awk '$1 > 500 && $2 != "total" {print $1, $2}' | \
sort -rn
Interpretation:
- >500 LOC per file: warning — file is accumulating logic it should not own
- >800 LOC per file: critical — boundary erosion is confirmed
In AI-generated codebases, oversized files are the most reliable early signal of architecture drift. A file that started as a route handler and grew to 900 lines almost certainly contains business logic, database queries, and UI state management mixed together.
FP006: Circular Dependencies (Dependency Graph Corruption signal)
# Python: detect circular imports (pydeps is a rough equivalent of madge)
pip install pydeps
pydeps your_package --max-bacon=3 --show-cycles
# TypeScript/JavaScript
npx madge --circular --extensions ts,tsx src/
Interpretation:
- 1–2 circular dependency chains: warning
- 3 or more: critical — isolation is no longer possible
Circular dependencies are the structural reason why a change in module A breaks module B, which breaks module C. The dependency graph has become a cycle, and changes propagate in all directions.
FP014: Low Test Coverage Ratio (missing feedback loop)
# Count test files vs production files
PROD=$(find . -name "*.py" -not -path "*/test*" -not -path "*/__pycache__/*" | wc -l)
TEST=$(find . -name "test_*.py" -o -name "*_test.py" | wc -l)
echo "Test ratio: $(echo "scale=1; $TEST * 100 / $PROD" | bc)%"
# TypeScript
PROD=$(find src -name "*.ts" -o -name "*.tsx" | grep -v "\.test\." | wc -l)
TEST=$(find src -name "*.test.ts" -o -name "*.test.tsx" | wc -l)
echo "Test ratio: $(echo "scale=1; $TEST * 100 / $PROD" | bc)%"
Interpretation:
- <30% test-to-production ratio: warning — regressions will reach production
- <10%: critical — the system has no feedback loop
Without tests, there is no automated mechanism to detect that a change in one part of the system broke something in another. Every deployment is a manual verification exercise.
FP017: Missing CI/CD Configuration (no automated enforcement)
# Check for CI/CD configuration
ls .github/workflows/ 2>/dev/null || echo "No GitHub Actions"
ls .gitlab-ci.yml 2>/dev/null || echo "No GitLab CI"
ls Jenkinsfile 2>/dev/null || echo "No Jenkins"
Interpretation:
- Absence of any CI/CD configuration: critical
Even if RC01 and RC02 are addressed, without automated enforcement they return. The system has no immune system — unsafe changes pass directly to production without any automated check.
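One low-effort way to build that immune system is a single gate script that re-runs the detection checks above and reports failure when a threshold is crossed; any CI provider can call it on every push. A hedged sketch — the thresholds, paths, and script name are this page's examples, not universal constants:

```shell
# structural-gate.sh -- one script any CI provider can run on every push.
# Thresholds mirror the detection section above; tune them for your repo.
fail=0

# FP001: flag any source file over 800 lines (critical threshold)
oversized=$(find . \( -name "*.py" -o -name "*.ts" -o -name "*.tsx" \) \
    -not -path "*/node_modules/*" -exec wc -l {} + 2>/dev/null \
  | awk '$2 != "total" && $1 > 800' | wc -l)
if [ "$oversized" -gt 0 ]; then
  echo "FAIL FP001: $oversized file(s) over 800 lines"
  fail=1
fi

# FP006: fail on any circular dependency chain (TS/JS repos with madge)
if [ -f package.json ] && command -v npx >/dev/null 2>&1; then
  npx madge --circular --extensions ts,tsx src/ || fail=1
fi

echo "structural gate status: $fail"  # CI should exit with this code
```

Wire the script into whichever CI system exists (or gets created) in FP017, and the regressions described above start getting caught before merge instead of after deploy.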
Why This Compounds Over Time
The failure mode appears when earlier decisions fall out of enforcement. In AI-assisted development at scale, this is the structural pattern:
Month 1: App works. Architecture is implicit but coherent.
Month 2: New features added. Boundaries start to blur.
Month 3: First regressions appear. Team attributes them to bugs.
Month 4: Every change is expensive. The team is afraid to touch anything.
Month 6: "We need to rewrite this."
The compounding mechanism is that each architectural violation makes the next one more likely. Once business logic leaks into a route handler, the next developer (human or AI) treats that as the established pattern and continues it. The architecture slowly dissolves — not through a single catastrophic event, but through accumulated small decisions.
By the time the fragility is visible, the system already has multiple overlapping structural problems.
Remediation Path
Addressing this failure mode does not require a rewrite. The remediation follows three phases:
Phase 1 — Diagnosis
Confirm which failure patterns are present and at what severity. The AI Chaos Index (ACI) score quantifies the structural risk across all five root causes. This takes 24 hours.
Phase 2 — Stabilization (Core)
Establish enforced boundaries. This means:
- Correcting the most critical boundary violations (logic in wrong layers)
- Setting up automated enforcement (CI/CD pipeline with boundary linter)
- Establishing a test baseline for the highest-risk modules
After Phase 2, the system begins protecting itself. Future unsafe changes are automatically blocked before they reach production.
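The "boundary linter" in Phase 2 can be a declarative contract file checked in CI. For Python codebases, one existing open-source option is the import-linter tool; a minimal layers contract might look like this (the package and layer names are illustrative, not prescribed):

```ini
; .importlinter -- example layer contract; names are placeholders
[importlinter]
root_package = your_package

[importlinter:contract:layers]
name = Routes may use services, services may use models, never the reverse
type = layers
layers =
    routes
    services
    models
```

Running `lint-imports` in the pipeline then fails the build whenever a lower layer imports from a higher one — which is RC01 and RC02 being blocked mechanically rather than by code review.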
Phase 3 — Controlled Growth
New features are developed in isolated, independently testable modules. The legacy code is frozen — not rewritten — and new development happens in a clean architectural zone. This is the Cap & Grow methodology: the architecture becomes safe to scale without a big-bang rewrite.