The Same Function Copied Into Five Files — Codebase Entropy in AI-Generated Apps
AI tools generate code fast. They also generate the same code — over and over, in different files, with slightly different implementations. The codebase grows, but the functionality doesn't. What started as a lean MVP becomes a bloated, redundant system where every bug fix requires updating the same logic in a dozen places.
This is codebase entropy — the uncontrolled growth of complexity in AI-generated codebases. It is the primary mechanism behind delivery slowdown and the main reason AI-built MVPs become "hard to maintain."
What We Observe
- Massive code duplication — "We see the same function copied and pasted into five or ten files. Each one works. None share a common utility." (Exoft audit of 50+ MVPs)
- 8-fold increase in duplicated code blocks with AI assistance (GitClear 2025)
- Code duplication +48%, refactoring activity -60% (GitClear analysis of 153M lines)
- Oversized files — "Scripts that should be only a few hundred lines are 800-1000 lines long." Single files accumulate logic from multiple domains.
- Dependency sprawl — "When an AI agent needs functionality, it reaches for a package... dependency trees balloon." (Noqta analysis)
- "1000 lines where 100 would suffice" — AI generates elaborate class hierarchies, unnecessary abstractions, and excessive boilerplate for simple tasks
Developer language:
- "Codebase not optimized at all, with lots of redundant code, and lots of different ways of doing similar things." — r/cursor
- "Redundancies are stacked to make a disgusting slop code." — r/gamedev
- "The codebase is just getting bigger and more confusing every day." — common developer complaint
- "AI Code Bloat." / "300-line bloated mess." / "No factory pattern, no strategy pattern." — r/webdev
The Structural Cause
AI tools favor code addition over code mutation. For an LLM, emitting fresh boilerplate is a safer, more local operation than executing a precise refactor across multiple files. The result:
- No DRY principle — AI generates fresh implementations instead of reusing existing functions
- No refactoring — AI adds code but never consolidates it. Refactoring activity dropped 60% with AI adoption.
- Naming inconsistency — Different sessions use different naming conventions for the same concepts
- File organization drift — Mix of flat and nested structures, duplicate code locations, models in three different places
Each sprint deposits a new unoptimized layer. The codebase grows faster than functionality.
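The duplication pattern is easy to sketch. The functions below are hypothetical, but they mirror what audits report: two separate sessions produce near-identical helpers that should have been one shared utility.

```typescript
// Deposited in src/orders/utils.ts by session A (hypothetical):
function formatPrice(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// Deposited in src/cart/helpers.ts by session B (hypothetical):
function displayPrice(amountInCents: number): string {
  const dollars = amountInCents / 100;
  return `$${dollars.toFixed(2)}`;
}

// The single shared utility both call sites should import instead:
export function formatCents(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}
```

Each copy works in isolation, which is exactly why the duplication survives review. Once consolidated, a bug fix (say, locale-aware formatting) lands in one place instead of five.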
Detection
# Find duplicate code blocks
npx jscpd --min-lines 5 --reporters console src/
# Find oversized files (>500 LOC)
find src \( -name "*.ts" -o -name "*.tsx" \) -exec wc -l {} + | awk '$2 != "total" && $1 > 500' | sort -rn
# Count total vs unique logic (rough proxy)
cloc src/ --by-file --csv
If duplication exceeds 15% or multiple files exceed 500 LOC, entropy is confirmed.
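The oversized-file check can also run cross-platform as a small Node script using only the standard library. This is a sketch, not audited tooling: the 500-LOC threshold and the .ts/.tsx filter match the shell pipeline above, and the paths in the usage note are hypothetical.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join, extname } from "node:path";

// Recursively yield every file path under `dir`.
function* walk(dir: string): Generator<string> {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) yield* walk(full);
    else yield full;
  }
}

// Pure filter: given [path, lineCount] pairs, keep oversized
// .ts/.tsx files, largest first.
export function filterOversized(
  files: [string, number][],
  maxLines = 500,
): [string, number][] {
  return files
    .filter(
      ([path, lines]) =>
        [".ts", ".tsx"].includes(extname(path)) && lines > maxLines,
    )
    .sort((a, b) => b[1] - a[1]);
}

// Wire the filter to the filesystem.
export function oversizedFiles(root: string, maxLines = 500): [string, number][] {
  const counted = [...walk(root)].map(
    (f): [string, number] => [f, readFileSync(f, "utf8").split("\n").length],
  );
  return filterOversized(counted, maxLines);
}
```

Keeping the threshold logic in a pure function (`filterOversized`) makes it trivial to unit-test and to reuse in a CI gate.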
This Is a Symptom Of
- Hidden Technical Debt (PF02) — Entropy IS the debt, manifested as redundant code
- Architecture Drift (PD01) — Without enforced boundaries, entropy grows unchecked
FAQ
Can AI tools refactor the entropy they created?
Partially. AI can identify obvious duplicates but struggles with cross-file refactoring that requires understanding the full dependency graph. Reports like "almost every time I've asked AI to refactor, it has ended up ruining the codebase" (r/ChatGPTCoding) are common. Structural refactoring requires deterministic tooling, not probabilistic generation.