
Low Test Coverage in AI-Generated Code: Detection and Remediation

Low test coverage is a failure pattern in AI-generated codebases where the proportion of production code with corresponding automated tests is insufficient to provide a functioning feedback loop. Regressions are not caught before deployment. Business rule changes are applied incompletely. Structural interventions — refactoring, dependency breaking, layer extraction — carry unquantified regression risk.

The structural mechanism: prompt-driven development is optimized for the speed of the first ship. Test generation requires a separate session with a different optimization target — correctness verification rather than feature delivery. Without a CI/CD enforcement mechanism that requires passing tests before merge, that second session does not happen. The coverage gap accumulates with every feature shipped.

This page explains how to measure the actual state of the test feedback loop, how to distinguish between coverage that provides protection and coverage that provides false confidence, and what the remediation path looks like.


What We Observe

Low test coverage in AI-generated codebases presents with specific structural signatures:

  • Regressions discovered in production — a change that should have been caught by an automated test reaches production because no test exists for the affected code path
  • Refactoring paralysis — the team cannot safely refactor a module because there are no tests to verify that the refactoring preserved behavior
  • "Manual QA" as the primary feedback loop — the team relies on manual testing before every deployment, which is slow, inconsistent, and does not scale with codebase size
  • Coverage numbers that do not reflect protection — a CI/CD dashboard shows 35% coverage, but the covered code is utility functions and UI components; the business logic — pricing, authorization, data validation — has 0% coverage
  • Stale tests that always pass — test files exist but test the previous version of regenerated code; they pass because the function names still match but verify nothing about current behavior

The critical distinction: low test coverage is not just a quantity problem. A codebase with 40% coverage concentrated on the wrong code paths provides less protection than a codebase with 20% coverage concentrated on the highest-risk business logic.
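That distinction can be made concrete with a risk-weighted coverage score. The sketch below is illustrative, not a standard metric: the layer names and weights are assumptions you would tune to your own codebase.

```python
# Sketch: weight each layer's line coverage by its regression risk.
# The weights here are illustrative assumptions, not a standard metric.
def risk_weighted_coverage(coverage_by_layer, risk_weights):
    """coverage_by_layer: {layer: fraction covered, 0..1}
    risk_weights: {layer: relative regression risk}"""
    total_weight = sum(risk_weights.values())
    return sum(
        coverage_by_layer.get(layer, 0.0) * weight
        for layer, weight in risk_weights.items()
    ) / total_weight

weights = {"services": 5, "repositories": 3, "utils": 1}

# Coverage concentrated on utils (high raw %, wrong code paths):
a = risk_weighted_coverage(
    {"services": 0.0, "repositories": 0.2, "utils": 1.0}, weights)  # ≈ 0.18

# Lower raw %, concentrated on business logic:
b = risk_weighted_coverage(
    {"services": 0.35, "repositories": 0.1, "utils": 0.0}, weights)  # ≈ 0.23
```

Under any weighting that ranks business logic above utilities, the second codebase scores higher despite lower raw coverage.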


Detection

Step 1: Test-to-Production File Ratio

echo "=== Test coverage ratio ==="

# Python
PROD_PY=$(find . -name "*.py" \
  -not -path "*/test*" -not -path "*/__pycache__/*" \
  -not -path "*/.venv/*" -not -path "*/migrations/*" \
  -not -name "conftest.py" -not -name "setup.py" | wc -l)
TEST_PY=$(find . \( -name "test_*.py" -o -name "*_test.py" \) \
  -not -path "*/.venv/*" | wc -l)
echo "Python — Production: $PROD_PY, Tests: $TEST_PY"
[ "$PROD_PY" -gt 0 ] && \
  echo "  Ratio: $(echo "scale=1; $TEST_PY * 100 / $PROD_PY" | bc)%" || \
  echo "  No Python files found"

# TypeScript
PROD_TS=$(find src -name "*.ts" -o -name "*.tsx" 2>/dev/null | \
  grep -v "\.test\." | grep -v "\.spec\." | grep -v "__tests__" | wc -l)
TEST_TS=$(find src \( -name "*.test.ts" -o -name "*.test.tsx" \
  -o -name "*.spec.ts" -o -path "*/__tests__/*.ts" \) 2>/dev/null | wc -l)
echo "TypeScript — Production: $PROD_TS, Tests: $TEST_TS"
[ "$PROD_TS" -gt 0 ] && \
  echo "  Ratio: $(echo "scale=1; $TEST_TS * 100 / $PROD_TS" | bc)%" || \
  echo "  No TypeScript files found"

Step 2: Coverage by Layer (where is coverage concentrated?)

echo "=== Coverage distribution by layer ==="

# Check which layers have test files
for layer in "services" "repositories" "handlers" "routes" "components" "utils" "domains"; do
  prod=$(find . -path "*/$layer/*.py" -o -path "*/$layer/*.ts" \
    -o -path "*/$layer/*.tsx" 2>/dev/null | \
    grep -v "test\|spec" | wc -l)
  tests=$(find . -path "*/$layer/test_*" -o -path "*/$layer/*.test.*" \
    -o -path "*/$layer/*_test.*" 2>/dev/null | wc -l)
  if [ "$prod" -gt 0 ]; then
    ratio=$(echo "scale=0; $tests * 100 / $prod" | bc 2>/dev/null || echo "0")
    echo "  $layer: $tests/$prod files tested ($ratio%)"
  fi
done

# Python: run actual coverage measurement
echo ""
echo "=== Actual line coverage (Python) ==="
python -m pytest --cov=. --cov-report=term-missing --tb=no -q 2>/dev/null | \
  tail -20 || echo "pytest-cov not configured — run: pip install pytest-cov"

# TypeScript: run actual coverage measurement
echo ""
echo "=== Actual line coverage (TypeScript) ==="
npx jest --coverage --coverageReporters=text-summary --passWithNoTests \
  2>/dev/null | tail -10 || echo "jest not configured"

Step 3: Business Logic Coverage Gap

echo "=== Business logic coverage gap ==="
# Find service/domain files with no corresponding test file
echo "Service files with no tests:"
find . \( -path "*/services/*.py" -o -path "*/domains/*/service.py" \
  -o -path "*/services/*.ts" -o -path "*/domains/*/service.ts" \) \
  -not -path "*/test*" -not -path "*/__pycache__/*" 2>/dev/null | \
  while read f; do
    dir=$(dirname "$f")
    base=$(basename "$f" .py)
    base=$(basename "$base" .ts)
    has_test=$(find "$dir" -name "test_${base}*" -o -name "${base}.test.*" \
      -o -name "${base}_test.*" 2>/dev/null | wc -l)
    [ "$has_test" -eq 0 ] && echo "  NO TEST: $f"
  done | head -20
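
For Python-only codebases, the same check can be written portably with pathlib. This sketch assumes the services/ and domains/ layout used above and only covers the Python naming conventions (test_*.py, *_test.py); adjust the globs to your tree.

```python
# Portable sketch of the "service files with no tests" check.
# Assumes a services/ or domains/*/service.py layout; adjust globs as needed.
from pathlib import Path

def untested_service_files(root="."):
    root = Path(root)
    services = [
        p
        for pattern in ("**/services/*.py", "**/domains/*/service.py")
        for p in root.glob(pattern)
        if "test" not in p.name and "__pycache__" not in p.parts
    ]
    missing = []
    for f in services:
        stem = f.stem
        # Look for sibling test files following Python conventions.
        candidates = (
            list(f.parent.glob(f"test_{stem}*"))
            + list(f.parent.glob(f"{stem}_test*"))
        )
        if not candidates:
            missing.append(f)
    return missing
```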

Step 4: Stale Test Detection

echo "=== Stale test detection ==="
# Python: test files importing non-existent modules
find . \( -name "test_*.py" -o -name "*_test.py" \) 2>/dev/null | \
  while read testfile; do
    python3 -c "
import ast, sys
try:
    tree = ast.parse(open('$testfile').read())
    imports = [a.name for n in ast.walk(tree) if isinstance(n, ast.Import)
               for a in n.names]
    imports += [n.module for n in ast.walk(tree)
                if isinstance(n, ast.ImportFrom) and n.module]
    print('\n'.join(imports))
except: pass
" 2>/dev/null | while read module; do
      path=$(echo "$module" | tr '.' '/').py
      [ -n "$module" ] && [ ! -f "$path" ] && \
        echo "STALE IMPORT: $testfile → $module (not found)"
    done
  done | head -15

# TypeScript: tests with always-passing assertions
echo ""
echo "=== Always-passing test patterns ==="
grep -rn "expect(true)\|toBe(true)\|toEqual({})\|expect(1).toBe(1)" \
  --include="*.test.ts" --include="*.test.tsx" --include="*.spec.ts" \
  . 2>/dev/null | head -10
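
The dotted-path-to-file heuristic above only resolves modules that live at the repository root. A sturdier check resolves imports the same way the interpreter does, via importlib, so src/ layouts and installed packages are handled correctly. Note the caveats in the comments: relative imports are skipped, and resolving a module can import its parent package.

```python
# Resolve each top-level import in a test file the way Python itself would.
# Caveat: find_spec may import parent packages, which can execute code;
# relative imports (from . import x) are skipped since they need a package
# context to resolve.
import ast
import importlib.util

def stale_imports(test_file):
    tree = ast.parse(open(test_file).read())
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    stale = []
    for mod in sorted(modules):
        try:
            if importlib.util.find_spec(mod) is None:
                stale.append(mod)
        except (ImportError, ValueError):
            # Raised when a parent package is itself missing or broken.
            stale.append(mod)
    return stale
```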

Interpretation Table

Test ratio   Coverage state       Risk
≥50%         Baseline present     Low — monitor for stale tests and coverage gaps
30–50%       Partial coverage     Medium — refactoring carries moderate regression risk
10–30%       Insufficient         High — structural interventions are high-risk without additional tests
<10%         Effectively absent   Critical — every change is unguarded; regressions reach production

Coverage quality signals:

  • Business logic (services, domain modules) with 0% coverage: critical regardless of overall ratio
  • Tests that require a live database or full application stack: integration tests, not unit tests — they do not run in CI/CD without infrastructure
  • Test files with expect(true).toBe(true) or equivalent: placeholder tests — they inflate coverage numbers without providing protection

Remediation Path

Step 1: Establish the Test Baseline — Priority Order

The test baseline is not written uniformly across the codebase. It is concentrated on the highest-risk code paths:

Priority 1 — Business logic (highest risk, write first):
  ✓ Pricing and discount calculations
  ✓ Permission and authorization checks
  ✓ Data validation logic (email, phone, date ranges)
  ✓ State transition logic (order status, payment flow)
  ✓ Any calculation that affects money or access

Priority 2 — Integration boundaries (medium risk):
  ✓ API endpoint contracts (input validation, response shape)
  ✓ Repository layer (query correctness with test DB)
  ✓ External service adapters (mocked)

Priority 3 — UI behavior (lower risk for backend-heavy systems):
  ✓ Form validation
  ✓ Critical user flows (login, checkout)

Step 2: Write Tests for Current Behavior Before Refactoring

# Before refactoring a service, write a characterization test
# that captures current behavior — even if the behavior is wrong

class TestCalculateOverageService:
    def test_pro_user_gets_discount(self):
        repo = MockUserRepository(subscription_tier="pro")
        service = CalculateOverageService(repo)
        result = service.execute(CalculateOverageRequest(
            user_id="user-123",
            amount=Decimal("100.00")
        ))
        # Capture current behavior — verify after refactoring
        assert result.total == Decimal("90.00")

    def test_free_user_no_discount(self):
        repo = MockUserRepository(subscription_tier="free")
        service = CalculateOverageService(repo)
        result = service.execute(CalculateOverageRequest(
            user_id="user-456",
            amount=Decimal("100.00")
        ))
        assert result.total == Decimal("100.00")
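
The tests above assume a CalculateOverageService, MockUserRepository, and request/result types from the surrounding codebase. To show the shape end to end, here is a self-contained sketch with hypothetical minimal doubles; your real service and repository will differ.

```python
# Hypothetical minimal doubles so the characterization tests above can run
# standalone. The 10% "pro" discount is an assumed rule for illustration.
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class CalculateOverageRequest:
    user_id: str
    amount: Decimal

@dataclass
class OverageResult:
    total: Decimal

class MockUserRepository:
    def __init__(self, subscription_tier):
        self.subscription_tier = subscription_tier

    def get_tier(self, user_id):
        return self.subscription_tier

class CalculateOverageService:
    DISCOUNT = {"pro": Decimal("0.10")}  # assumed: pro tier gets 10% off

    def __init__(self, repo):
        self.repo = repo

    def execute(self, request):
        tier = self.repo.get_tier(request.user_id)
        discount = self.DISCOUNT.get(tier, Decimal("0"))
        return OverageResult(total=request.amount * (Decimal("1") - discount))
```

With these doubles in place, both characterization tests pass unchanged; after refactoring the real service, the same asserts verify behavior was preserved.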

Step 3: Enforce Coverage Threshold in CI/CD

# .github/workflows/ci.yml
- name: Run tests with coverage threshold
  run: |
    # Python
    pytest --cov=. --cov-report=term-missing --cov-fail-under=30
    # TypeScript
    # npx jest --coverage --coverageThreshold='{"global":{"lines":30}}'

// jest.config.js — coverage threshold enforcement
module.exports = {
  coverageThreshold: {
    global: {
      lines: 30,
      functions: 30,
    },
    // Higher threshold for business logic
    "./src/domains/": {
      lines: 60,
      functions: 60,
    }
  }
};

How FP014 Interacts with Other Failure Patterns

Low test coverage does not occur in isolation. It interacts with other failure patterns, and the effects compound:

  • FP006 (Circular Dependencies) — circular dependencies make unit testing structurally impossible; a test for module A must load module B (which loads module A), preventing true isolation; FP006 must be addressed before FP014 can be fully remediated
  • FP002 (Business Logic in Wrong Layer) — logic in the wrong layer is structurally harder to test; a route handler with embedded DB queries requires a live database to test, which is why it is often left untested; FP002 remediation directly enables FP014 remediation
  • FP005 (Full-File Rewrites) — prompt-driven regeneration creates stale tests; a test written against the previous version of a regenerated file passes but verifies nothing; stale tests inflate coverage numbers while providing no protection
  • FP017 (Missing CI/CD) — without CI/CD enforcement, coverage thresholds are not enforced; coverage can drop to 0% without any automated signal

Addressing FP014 in isolation — writing tests without fixing FP006 and FP002 — produces tests that are either integration tests (requiring full stack) or stale tests (testing previous behavior). The correct sequence: fix FP006 and FP002 first to create the architectural preconditions for testable code, then write tests against the isolated modules.


Is This Happening in Your Codebase?

Get a structural assessment with your AI Chaos Index score — delivered in 24 hours.