PD02

The Same Function Copied Into Five Files — Codebase Entropy in AI-Generated Apps

AI tools generate code fast. They also generate the same code — over and over, in different files, with slightly different implementations. The codebase grows, but the functionality doesn't. What started as a lean MVP becomes a bloated, redundant system where every bug fix requires updating the same logic in a dozen places.

This is codebase entropy — the uncontrolled growth of complexity in AI-generated codebases. It is the primary mechanism behind delivery slowdown and the main reason AI-built MVPs become "hard to maintain."


What We Observe

  • Massive code duplication — "We see the same function copied and pasted into five or ten files. Each one works. None share a common utility." (Exoft audit of 50+ MVPs)
  • 8-fold increase in blocks of duplicated code with AI assistance (GitClear 2025)
  • Code duplication +48%, refactoring activity -60% (GitClear analysis of 153M lines)
  • Oversized files — "Scripts that should be only a few hundred lines are 800-1000 lines long." Single files accumulate logic from multiple domains.
  • Dependency sprawl — "When an AI agent needs functionality, it reaches for a package... dependency trees balloon." (Noqta analysis)
  • "1000 lines where 100 would suffice" — AI generates elaborate class hierarchies, unnecessary abstractions, and excessive boilerplate for simple tasks
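As a sketch of what duplication looks like in practice (file and function names are hypothetical), here is the same money formatter generated twice in separate sessions. Each copy works on the happy path — and that is exactly why nobody consolidates them — but they silently diverge on edge cases, so every bug fix must find every copy:

```typescript
// checkout.ts (hypothetical) — generated in one session
function formatPrice(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// invoice.ts (hypothetical) — same job, regenerated in another session
function moneyToString(amountCents: number): string {
  const dollars = Math.floor(amountCents / 100);
  const rest = Math.abs(amountCents % 100).toString().padStart(2, "0");
  return `$${dollars}.${rest}`;
}

// Both agree on the happy path...
console.log(formatPrice(1999), moneyToString(1999)); // $19.99 $19.99
// ...but diverge on negative amounts (refunds):
console.log(formatPrice(-50)); // $-0.50
console.log(moneyToString(-50)); // $-1.50
```

Neither copy is "wrong" in isolation; the defect only exists at the level of the codebase, which is why per-file review misses it.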

Developer language:

  • "Codebase not optimized at all, with lots of redundant code, and lots of different ways of doing similar things." — r/cursor
  • "Redundancies are stacked to make a disgusting slop code." — r/gamedev
  • "The codebase is just getting bigger and more confusing every day." — common developer complaint
  • "AI Code Bloat." / "300-line bloated mess." / "No factory pattern, no strategy pattern." — r/webdev

The Structural Cause

AI tools favor code addition over code mutation. For an LLM, emitting fresh boilerplate is an easier, lower-risk operation than a precise refactor that must edit multiple files consistently. The result:

  • No DRY principle — AI generates fresh implementations instead of reusing existing functions
  • No refactoring — AI adds code but never consolidates it. Refactoring activity dropped 60% with AI adoption.
  • Naming inconsistency — Different sessions use different naming conventions for the same concepts
  • File organization drift — Mix of flat and nested structures, duplicate code locations, models in three different places

Each sprint deposits a new unoptimized layer. The codebase grows faster than functionality.
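The fix the tools skip is mechanical: consolidate the copies into one utility, re-point the call sites, and delete the rest. A minimal sketch of the consolidated version (hypothetical names, continuing the money-formatter example):

```typescript
// money.ts (hypothetical shared utility) — one implementation, one place to fix bugs
export function formatCents(cents: number): string {
  const sign = cents < 0 ? "-" : "";
  return sign + "$" + (Math.abs(cents) / 100).toFixed(2);
}

// checkout.ts and invoice.ts now import formatCents instead of carrying
// their own copies; a rounding or sign bug gets fixed exactly once.
console.log(formatCents(1999)); // $19.99
console.log(formatCents(-50)); // -$0.50
```

This is a five-minute human refactor; the entropy problem is that no session ever performs it, so the copies compound sprint after sprint.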


Detection

# Find duplicate code blocks
npx jscpd --min-lines 5 --reporters console src/

# Find oversized files (>500 LOC)
find src \( -name "*.ts" -o -name "*.tsx" \) -exec wc -l {} + | awk '$1 > 500 && $2 != "total"' | sort -rn

# Count total vs unique logic (rough proxy)
cloc src/ --by-file --csv

If duplication exceeds 15% or multiple files exceed 500 LOC, entropy is confirmed.
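That 15% threshold can be enforced in CI rather than checked by hand. A sketch, assuming jscpd was run with `--reporters json` so it writes a `jscpd-report.json`; the exact field path (`statistics.total.percentage`) reflects jscpd's JSON report but may differ between versions, so verify it against your own report file:

```typescript
import { readFileSync } from "node:fs";

// Pure threshold check, separated out so it is testable without a report file.
function exceedsThreshold(percentage: number, threshold = 15): boolean {
  return percentage > threshold;
}

// Read the overall duplication percentage from a jscpd JSON report.
function duplicationPercent(reportPath: string): number {
  const report = JSON.parse(readFileSync(reportPath, "utf8"));
  return report.statistics.total.percentage; // assumed field path — check your jscpd version
}

// In CI (hypothetical wiring):
//   if (exceedsThreshold(duplicationPercent("report/jscpd-report.json"))) process.exit(1);
console.log(exceedsThreshold(16), exceedsThreshold(12)); // true false
```

Gating the build turns entropy from a vague complaint into a number that cannot silently creep upward.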


This Is a Symptom Of


FAQ

Can AI tools refactor the entropy they created?

Partially. AI can identify obvious duplicates but struggles with cross-file refactoring that requires understanding the full dependency graph. The "almost every time I've asked AI to refactor, it has ended up ruining the codebase" pattern (r/ChatGPTCoding) is common. Structural refactoring requires deterministic tooling, not probabilistic generation.
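"Deterministic" can be as simple as fingerprinting normalized code: the same input always produces the same answer, with no model in the loop. A toy sketch (not a real refactoring tool — tools like jscpd work on token streams and ASTs, which is more robust than this comment-and-whitespace normalization):

```typescript
import { createHash } from "node:crypto";

// Hash a function body after stripping line comments and collapsing
// whitespace; identical fingerprints mean identical logic, every run.
function bodyFingerprint(body: string): string {
  const normalized = body
    .replace(/\/\/[^\n]*/g, "") // strip line comments
    .replace(/\s+/g, " ")       // collapse whitespace and newlines
    .trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// Two copies that differ only in comments and indentation:
const copyA = `return s.trim().toLowerCase(); // slugify`;
const copyB = `    return s.trim().toLowerCase();`;
console.log(bodyFingerprint(copyA) === bodyFingerprint(copyB)); // true — found without an LLM
```

The point is the property, not the tool: a duplicate finder that gives the same verdict on every run can be trusted inside a refactor, while a probabilistic one cannot.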


Is This Happening in Your Codebase?

Get a structural assessment with your AI Chaos Index score — delivered in 24 hours.