The Same Function Copied Into Five Files — Codebase Entropy in AI-Generated Apps
AI tools generate code fast. They also generate the same code — over and over, in different files, with slightly different implementations. The codebase grows, but the functionality doesn't. What started as a lean MVP becomes a bloated, redundant system where every bug fix requires updating the same logic in a dozen places.
This is codebase entropy — the uncontrolled growth of complexity in AI-generated codebases. It is the primary mechanism behind delivery slowdown and the main reason AI-built MVPs become "hard to maintain."
What We Observe
- Massive code duplication — "We see the same function copied and pasted into five or ten files. Each one works. None share a common utility." (Exoft audit of 50+ MVPs)
- 8-fold increase in duplicated code blocks with AI assistance (GitClear 2025)
- Code duplication +48%, refactoring activity -60% (GitClear analysis of 153M lines)
- Oversized files — "Scripts that should be only a few hundred lines are 800-1000 lines long." Single files accumulate logic from multiple domains.
- Dependency sprawl — "When an AI agent needs functionality, it reaches for a package... dependency trees balloon." (Noqta analysis)
- "1000 lines where 100 would suffice" — AI generates elaborate class hierarchies, unnecessary abstractions, and excessive boilerplate for simple tasks
Developer language:
- "Codebase not optimized at all, with lots of redundant code, and lots of different ways of doing similar things." — r/cursor
- "Redundancies are stacked to make a disgusting slop code." — r/gamedev
- "The codebase is just getting bigger and more confusing every day." — common developer complaint
- "AI Code Bloat." / "300-line bloated mess." / "No factory pattern, no strategy pattern." — r/webdev
The Structural Cause
AI tools favor code addition over code mutation. For an LLM, emitting fresh boilerplate is a safer, more local operation than executing a precise refactor across multiple files. The result:
- No DRY principle — AI generates fresh implementations instead of reusing existing functions
- No refactoring — AI adds code but never consolidates it. Refactoring activity dropped 60% with AI adoption.
- Naming inconsistency — Different sessions use different naming conventions for the same concepts
- File organization drift — Mix of flat and nested structures, duplicate code locations, models in three different places
Each sprint deposits a new unoptimized layer. The codebase grows faster than functionality.
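The duplication pattern is easy to sketch. The functions below are hypothetical, but they mirror what audits report: two separate sessions produce near-identical helpers that should have been one shared utility.

```typescript
// Deposited in src/orders/utils.ts by session A (hypothetical):
function formatPrice(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// Deposited in src/cart/helpers.ts by session B (hypothetical):
function displayPrice(amountInCents: number): string {
  const dollars = amountInCents / 100;
  return `$${dollars.toFixed(2)}`;
}

// The single shared utility both call sites should import instead:
export function formatCents(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}
```

Each copy works in isolation, which is exactly why the duplication survives review. Once consolidated, a bug fix (say, locale-aware formatting) lands in one place instead of five.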
Detection
# Find duplicate code blocks
npx jscpd --min-lines 5 --reporters console src/
# Find oversized files (>500 LOC)
find src \( -name "*.ts" -o -name "*.tsx" \) -exec wc -l {} + | awk '$2 != "total" && $1 > 500' | sort -rn
# Count total vs unique logic (rough proxy)
cloc src/ --by-file --csv
If duplication exceeds 15% or multiple files exceed 500 LOC, entropy is confirmed.
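The oversized-file check can also run cross-platform as a small Node script using only the standard library. This is a sketch, not audited tooling: the 500-LOC threshold and the .ts/.tsx filter match the shell pipeline above, and the paths in the usage note are hypothetical.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join, extname } from "node:path";

// Recursively yield every file path under `dir`.
function* walk(dir: string): Generator<string> {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) yield* walk(full);
    else yield full;
  }
}

// Pure filter: given [path, lineCount] pairs, keep oversized
// .ts/.tsx files, largest first.
export function filterOversized(
  files: [string, number][],
  maxLines = 500,
): [string, number][] {
  return files
    .filter(
      ([path, lines]) =>
        [".ts", ".tsx"].includes(extname(path)) && lines > maxLines,
    )
    .sort((a, b) => b[1] - a[1]);
}

// Wire the filter to the filesystem.
export function oversizedFiles(root: string, maxLines = 500): [string, number][] {
  const counted = [...walk(root)].map(
    (f): [string, number] => [f, readFileSync(f, "utf8").split("\n").length],
  );
  return filterOversized(counted, maxLines);
}
```

Keeping the threshold logic in a pure function (`filterOversized`) makes it trivial to unit-test and to reuse in a CI gate.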
This Is a Symptom Of
- Hidden Technical Debt (PF02) — Entropy IS the debt, manifested as redundant code
- Architecture Drift (PD01) — Without enforced boundaries, entropy grows unchecked
FAQ
Can AI tools refactor the entropy they created?
Partially. AI can identify obvious duplicates but struggles with cross-file refactoring that requires understanding the full dependency graph. Reports like "almost every time I've asked AI to refactor, it has ended up ruining the codebase" (r/ChatGPTCoding) are common. Structural refactoring requires deterministic tooling, not probabilistic generation.