PD06

AI Generated Code Regression: Why Every PR Becomes a Risk

Regression fear is a systemic failure pattern that emerges in AI-generated codebases past Day 30: every pull request becomes a structural risk event. The team reviews the diff, approves the change, merges — and something unrelated breaks in production. Not the code that was changed. Something else.

This is not a testing problem. It is not a developer skill problem. It is a structural failure mode — regression fear — that emerges when three root causes converge: architectural boundaries that have eroded, a dependency graph that has become circular, and a test infrastructure that was never built.

This page explains the mechanism, how to detect it in your codebase, and what the remediation path looks like.


Who This Is For

Founders and developers who built their application with AI tools — Lovable, Bolt.new, Cursor, Replit, or v0 — and are now experiencing one or more of the following:

  • Merging a PR requires manually testing features that were not touched
  • The team has an unwritten rule: "never deploy on Friday"
  • A change in the authentication flow broke the billing module
  • Developers spend more time verifying that nothing broke than writing new code
  • The codebase "worked fine" at launch, and now every change carries an invisible tax

If this matches your situation, the root cause is almost certainly structural — not a bad developer, not a bad AI tool, not insufficient testing discipline.


What We Observe

In AI-generated codebases past Day 30, regression fear appears when the blast radius of any change becomes unpredictable.

The observable signals are:

  • Regressions appear in unrelated modules — a change to the user profile endpoint breaks the payment confirmation screen
  • Full-file rewrites in git history — AI regeneration replaces entire files, silently overwriting custom logic from previous sessions
  • No automated safety net — CI/CD is absent or runs no tests; errors reach production without any automated check
  • Circular dependency chains — modules import each other's internals, so a change in one propagates in unexpected directions
  • Test files are absent, stale, or never run — there is no feedback loop to catch regressions before they reach production

These are not symptoms of a single bug. They are symptoms of a class of structural problems that compound over time.


The Structural Cause

Three root causes are typically present simultaneously in codebases where regression fear has set in.

RC01: Architecture Drift

Prompt-driven development optimizes locally without global structural enforcement. Over time, the architectural boundaries that were implicit in the original design erode. Business logic migrates into wrong layers. Files accumulate logic from multiple domains. The result: a change in one part of the system has an unpredictable blast radius because the boundaries that would contain it no longer exist.

The structural cause is that AI-assisted development at scale has no built-in mechanism for enforcing architectural decisions across sessions. What was decided in session 1 is not enforced in session 47. Each AI-generated change is locally coherent but globally erosive.
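One way to make an architectural decision survive across sessions is to encode it as an executable check that runs in CI. A minimal sketch, assuming a layered layout where src/domain/ must never import from src/api/ (the directory names and the demo file are illustrative, not from any specific codebase):

```shell
# Demo layout: a domain-layer file that (incorrectly) imports from the api layer
mkdir -p src/domain src/api
echo "import { handler } from '../api/routes';" > src/domain/user.ts

# The executable rule: files under src/domain/ must not import from src/api/
if grep -rn --include='*.ts' "from '\.\./api/" src/domain/; then
  echo "Architecture violation: src/domain imports from src/api"
else
  echo "Layer boundary intact"
fi
```

In a real pipeline the violation branch would exit non-zero so the merge is blocked; the demo only prints, to stay self-contained. The point is that the rule from session 1 is now enforced mechanically in session 47.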

A specific failure pattern that amplifies this: full-file rewrites (FP005). When prompt-driven regeneration replaces an entire file in a single commit, it silently overwrites custom logic — bug fixes, business rules, integration patches — that was added in previous sessions. The git history shows the file as "changed," but the actual loss is invisible until a regression surfaces in production.

A particularly dangerous variant: security patches applied manually between two AI prompting sessions. A developer patches an authentication bypass, commits the fix, and moves on. Two sessions later, AI regenerates the same file to add a new feature. The security patch is gone. The git diff shows the new feature. The regression is silent until the vulnerability is exploited.

RC02: Dependency Graph Corruption

Without rules governing the direction of dependencies, modules begin importing each other's internals. Circular dependency chains form silently. The import graph becomes a web rather than a tree.

The consequence: isolation becomes impossible. A change in module A propagates to module B, which propagates to module C, which propagates back to module A. The blast radius of any change is no longer bounded by the change itself — it is bounded by the entire circular subgraph.

This is the structural reason why a change in the authentication flow breaks the billing module. They are not logically related. But they are structurally coupled through a chain of circular imports.
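The coupling can be reproduced in miniature. A minimal sketch (module names are illustrative) of two Python modules that import each other's internals; importing either one fails, because each needs the other before it has finished loading:

```shell
# Two modules that reach into each other: a circular chain of length 2
mkdir -p cycle_demo
cat > cycle_demo/auth.py <<'EOF'
from billing import charge      # auth reaches into billing...
def login(user):
    return charge(user, 0)
EOF
cat > cycle_demo/billing.py <<'EOF'
from auth import login          # ...and billing reaches back into auth
def charge(user, amount):
    return "charged " + user
EOF
# Importing either module fails with a circular-import error
( cd cycle_demo && python3 -c "import auth" ) 2>&1 | tail -1
```

Breaking the cycle means extracting the shared piece into a third module that both sides can import, so the import graph becomes a tree again and the blast radius of a change is bounded.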

RC04: Test Infrastructure Failure

Without tests, there is no feedback loop. Regressions pass directly to production. The team discovers them through user reports, not automated checks.

In AI-generated codebases, the test infrastructure is typically missing from day one. AI tools are optimized for the speed of the first ship — they generate application code efficiently, minimizing time-to-demo. Test suites are not part of that optimization target. The result: a codebase that ships fast and breaks silently.

The structural reason is local optimization: each prompt session produces working code for the immediate feature. Tests require a second pass — a deliberate decision to invest in feedback infrastructure that has no visible impact on the demo. In practice, that second pass rarely happens.

The cost compounds with codebase size. At 10k LOC, adding tests retroactively is a weekend project. At 50k LOC, it is a multi-week initiative that competes with feature development. At 80k LOC with circular dependencies already present, writing tests against the current behavior is actively misleading — the behavior will change unpredictably as the structural issues propagate. The correct sequence is: fix the structural cause first, then add tests against the stabilized architecture.


Detection: How to Confirm This in Your Codebase

The following checks are concrete and reproducible. Each maps to a specific failure pattern.

FP005: Full-File Rewrites (Architecture Drift signal)

# Find files where >80% of lines changed in a single commit (last 60 days)
git log --since="60 days ago" --numstat --format= | \
  awk 'NF == 3 && $1 != "-" { n = $1 + $2; if (n > max[$3]) max[$3] = n }
       END { for (f in max) print max[f], f }' | \
  while read -r changes f; do
    total=$(wc -l < "$f" 2>/dev/null || echo 0)
    [ "$total" -gt 0 ] && [ $((changes * 100 / total)) -ge 80 ] && \
      echo "$f: $changes of $total lines changed in one commit"
  done

Simpler check:

# Files with large single-commit changes in the last 60 days (tune the threshold)
git log --since="60 days ago" --numstat --format= | \
  awk 'NF == 3 && $1 != "-" && $1 + $2 > 200 { print $1 + $2, $3 }' | \
  sort -rn | head -20

Interpretation:

  • ≥3 full-file rewrites in 60 days: finding — custom logic is being silently overwritten
  • Pattern in AI-generated codebases: AI regeneration replaces the entire file rather than patching specific sections

FP006: Circular Dependencies (Dependency Graph Corruption signal)

# Python: detect circular imports
pip install pydeps
pydeps your_package --max-bacon=3 --show-cycles

# TypeScript/JavaScript
npx madge --circular --extensions ts,tsx src/
# Note: for monorepos, ensure tsconfig.json paths are correctly configured

Interpretation:

  • 1–2 circular dependency chains: warning — blast radius is expanding
  • ≥3 circular chains: critical — isolation is no longer possible; every change is a regression risk

FP014: Low Test Coverage Ratio (missing feedback loop)

# Python: test-to-production file ratio
PROD=$(find . -name "*.py" -not -name "test_*.py" -not -name "*_test.py" -not -path "*/test*" -not -path "*/__pycache__/*" | wc -l)
TEST=$(find . -name "test_*.py" -o -name "*_test.py" | wc -l)
echo "Test ratio: $(echo "scale=1; $TEST * 100 / $PROD" | bc)%"

# TypeScript
PROD=$(find src -name "*.ts" -o -name "*.tsx" | grep -v "\.test\." | grep -v "\.spec\." | grep -v "__tests__" | wc -l)
TEST=$(find src -name "*.test.ts" -o -name "*.test.tsx" -o -name "*.spec.ts" -o -name "*.spec.tsx" -o -path "*/__tests__/*.ts" -o -path "*/__tests__/*.tsx" | wc -l)
echo "Test ratio: $(echo "scale=1; $TEST * 100 / $PROD" | bc)%"
# Note: some AI generators (Bolt.new) create __tests__/ directories — included above

Interpretation:

  • <30% test-to-production ratio: warning — regressions will reach production undetected
  • <10%: critical — the system has no feedback loop; every deployment is a manual verification exercise

FP017: Missing CI/CD Configuration (no automated enforcement)

# Check for CI/CD configuration
ls .github/workflows/ 2>/dev/null && echo "GitHub Actions: present" || echo "GitHub Actions: ABSENT"
ls .gitlab-ci.yml 2>/dev/null && echo "GitLab CI: present" || echo "GitLab CI: ABSENT"
ls Jenkinsfile 2>/dev/null && echo "Jenkins: present" || echo "Jenkins: ABSENT"
ls .circleci/config.yml 2>/dev/null && echo "CircleCI: present" || echo "CircleCI: ABSENT"

Interpretation:

  • Absence of any CI/CD configuration: critical — code goes from developer to production unchecked
  • CI/CD present but no test step: warning — the safety net exists but has no teeth
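If no CI exists at all, the smallest useful safety net is a single workflow that runs the test suite on every push and pull request. A sketch for GitHub Actions, assuming a Node project with an npm test script (adapt the install and test commands to your stack):

```shell
# Write a minimal GitHub Actions workflow that runs tests on every push
mkdir -p .github/workflows
cat > .github/workflows/ci.yml <<'EOF'
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm test    # without this step, the pipeline has no teeth
EOF
echo "Workflow written to .github/workflows/ci.yml"
```

Even a thin workflow like this changes the failure mode: a regression that a test catches now blocks the merge instead of reaching production.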

Why This Compounds Over Time

The failure mode follows a predictable trajectory in AI-generated codebases:

Month 1: App ships. No tests. No CI. Architecture is implicit but coherent.
Month 2: New features added via AI. Circular deps begin forming silently.
Month 3: First regressions appear. Team attributes them to "bugs."
Month 4: Every PR requires manual smoke testing. Velocity drops.
Month 5: A full-file rewrite overwrites a critical bug fix. Production incident.
Month 6: Team is afraid to merge anything. "We need to rewrite this."

The compounding mechanism is structural: each architectural violation makes the next regression more likely. Once circular dependencies form, every change has an unpredictable blast radius. Once tests are absent, every deployment is a gamble. Once full-file rewrites are the norm, no custom logic is safe.

By the time regression fear is visible, the system already has multiple overlapping structural problems. Addressing one in isolation — adding tests without fixing the circular dependencies, or fixing the circular dependencies without adding CI enforcement — provides temporary relief but does not break the cycle.


Remediation Path

Addressing regression fear does not require a rewrite. The remediation follows three phases:

Phase 1 — Diagnosis

Confirm which failure patterns are present and at what severity. The AI Chaos Index (ACI) score quantifies the structural risk across all five root causes. A Quick Scan returns results in 24 hours. A full Production Readiness Audit provides a complete architectural map with a prioritized upgrade path.

Phase 2 — Stabilization (Core)

Establish the safety net that prevents regressions from reaching production:

  • Correct the most critical circular dependency chains
  • Set up CI/CD with automated lint and test steps
  • Establish a test baseline for the highest-risk modules
  • Add marker-based code preservation to prevent full-file rewrites from overwriting custom logic

After Phase 2, the system begins protecting itself. Future unsafe changes are automatically blocked before they reach production.

Phase 3 — Controlled Growth

New features are developed in isolated, independently testable modules. The legacy code is frozen — not rewritten — and new development happens in a clean architectural zone. The Cap & Grow methodology ensures that the architecture becomes safe to scale without a big-bang rewrite.


Is This Happening in Your Codebase?

Get a structural assessment with your AI Chaos Index score — delivered in 24 hours.