Solving Hard Problems in Complex Codebases
The one-sentence version: AI coding tools get dumber the longer you talk to them, so you have to keep conversations short and structured.
AI coding tools (like Claude Code, Cursor, Copilot) are great at building new stuff from scratch. But when you point them at a big, messy, real-world codebase — the kind with ten years of history and thousands of files — they start writing bad code. A Stanford study found that the "extra code" AI writes often just cleans up the mess it made last week.
Think of the AI's brain as a whiteboard. It can only hold so much at once. When you keep chatting and pasting files and asking follow-ups, you fill the whiteboard with noise. Around the halfway mark, the AI enters what Dex calls the "Dumb Zone" — it stops thinking clearly and starts guessing.
Instead of just chatting at the AI and hoping, use a three-step process: have it research the code first, write a plan you can review, and only then build.
And the secret weapon: start fresh often. When the conversation gets long, ask the AI to write a summary, then start a brand new chat with that summary. Wipe the whiteboard, keep the notes.
The bumper sticker: Research. Plan. Build. Reset. Repeat.
A survey of 100,000 developers has exposed an awkward truth about AI coding tools: they make easy things easier and hard things harder. In greenfield projects — the blank-slate work of building from zero — AI copilots deliver genuine speed gains. But in brownfield codebases, the decade-old repositories where most real software lives, the gains evaporate. Much of the "extra code" AI produces turns out to be rework on its own earlier slop.
The common response divides neatly into pessimists ("this will never work") and pragmatists ("smarter models will fix it"). Dex Horthy, founder of HumanLayer and author of the influential "12 Factor Agents" essay that first coined the term "context engineering," argues both camps miss the point. The constraint is not intelligence. It is attention.
Large language models process information through a fixed-size "context window" — roughly analogous to working memory. Horthy's team found that model performance degrades sharply once the context window fills to about 40% of its capacity, a threshold he calls the Dumb Zone. Below that line, the model reasons clearly. Above it, responses grow repetitive, contradictory, and architecturally unsound. The solution, Horthy argues, is not better models but better information management.
He calls the discipline "frequent intentional compaction" — the practice of actively compressing working context into structured summaries and starting fresh sessions at regular intervals. It is, in effect, the intellectual hygiene of AI-assisted development: clearing the desk before it buries you.
From this insight, Horthy developed the RPI workflow: Research, Plan, Implement.
In the research phase, the AI examines the codebase without writing a line. It reads files, traces dependencies, and produces a concise research document. The goal is ground truth — what the code actually does, not what the documentation claims it does. ("Internal docs lie," Horthy observes. "Code doesn't.")
The plan phase is, by Horthy's reckoning, the most critical. The AI produces a detailed implementation plan: specific files, line numbers, code snippets, testing strategy. This document serves two purposes. First, it gives the human a chance to review the approach before a single line ships. Second, it acts as a compression of intent — a dense, actionable blueprint that carries into the implementation phase without the baggage of prior exploration.
Implementation then becomes mechanical execution of an approved plan, ideally in a fresh context window loaded only with the plan itself and the relevant files.
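Under those assumptions, the phase hand-off can be sketched as plain code. Everything here is a hypothetical stand-in: `call_model` is a placeholder for any LLM API, and the artifact names are illustrative, not part of any real tool.

```python
# Sketch of the RPI hand-off: each phase runs in a fresh session that
# receives only the previous phase's artifact, never the full chat history.
# `call_model` is a hypothetical stand-in for a real LLM API call.

def call_model(prompt: str) -> str:
    return f"<output for: {prompt[:40]}...>"  # placeholder response

def run_phase(name: str, artifact_in: str) -> str:
    # A brand-new context: only the phase instructions plus the compacted
    # artifact from the previous phase enter the prompt.
    prompt = f"Phase: {name}\nInput artifact:\n{artifact_in}"
    return call_model(prompt)

research_doc = run_phase("research", "task: refactor auth module")
plan_doc = run_phase("plan", research_doc)   # human reviews plan_doc here
result = run_phase("implement", plan_doc)    # fresh context, plan only
```

The design point is that `plan_doc` is the only thing that survives between phases; the exploration that produced it is deliberately discarded.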
Horthy's sharpest observation may be organisational. Senior engineers, who already think architecturally, report marginal speed gains from AI. Junior and mid-level engineers, who lack that instinct, often use AI to fill the gap, shipping plausible-looking code that senior engineers must then untangle. The result is a new form of technical debt, accumulating faster than any human could create it alone.
"Do not outsource the thinking," Horthy warns. AI amplifies whatever thinking you bring to it. If you bring rigour, you get leverage. If you bring vagueness, you get slop at scale.
The core tension: AI coding tools are force multipliers. That means they multiply carelessness just as readily as they multiply competence. The workflow — not the model — determines which.
LLMs operate on a fixed token budget. Claude's context window is nominally ~200k tokens (Horthy works from an effective ~168k), but attention quality degrades non-linearly with utilisation. Horthy's empirical finding: at ~40% utilisation (~67k tokens of that 168k effective window), the model enters a degraded regime characterised by repetitive output, self-contradiction, and architecturally unsound suggestions.
This degradation is not a cliff but a gradient. The "Dumb Zone" onset varies with task complexity: simple tasks tolerate higher fill, complex multi-file reasoning degrades earlier.
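The threshold can be expressed as a simple guard. This is a minimal sketch assuming the window size quoted above and a crude whitespace token count; a real tokenizer and a real transcript format would differ.

```python
# Guard against the "Dumb Zone": flag the session for compaction once
# context utilisation passes the ~40% threshold Horthy reports.
# Token counting here is a rough whitespace-split stand-in for a tokenizer.

DUMB_ZONE_THRESHOLD = 0.40
CONTEXT_WINDOW = 168_000  # effective window size, per the figures above

def utilisation(messages: list[str]) -> float:
    tokens = sum(len(m.split()) for m in messages)  # crude token estimate
    return tokens / CONTEXT_WINDOW

def should_compact(messages: list[str]) -> bool:
    return utilisation(messages) >= DUMB_ZONE_THRESHOLD

assert not should_compact(["short message"])  # well inside the Smart Zone
```

Since the onset varies with task complexity, a production version would likely make the threshold configurable per task type rather than hard-coding 40%.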
Frequent intentional compaction (FIC) is the core mechanism for staying in the Smart Zone. Three compaction strategies:
| Strategy | When | How |
|---|---|---|
| Session reset | After each RPI phase | Summarise current state into structured markdown. Start new session with summary + relevant files only. |
| Sub-agent delegation | During research phase | Spawn sub-agents for specific file exploration. Results flow back as compressed summaries, not raw file content. |
| Progressive context loading | During implementation | Load files on-demand as the plan dictates, not all at once. Each file enters context only when its section of the plan executes. |
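The first strategy, session reset, can be sketched as follows. `summarise` stands in for an LLM summarisation call, and the markdown section headings are illustrative conventions, not a prescribed format.

```python
# Sketch of "session reset" compaction: compress the transcript into a
# structured markdown summary, then seed a new session with only that
# summary plus the files the next phase needs.

def summarise(transcript: list[str]) -> str:
    # Real version: one LLM call asking for decisions made, open questions,
    # and file references. Here: keep the last few exchanges as bullets.
    bullets = "\n".join(f"- {line}" for line in transcript[-3:])
    return f"## Session summary\n{bullets}"

def reset_session(transcript: list[str], relevant_files: list[str]) -> list[str]:
    summary = summarise(transcript)
    files = "\n".join(f"- {p}" for p in relevant_files)
    # The new session's entire context: summary + file list, nothing else.
    return [summary, f"## Relevant files\n{files}"]

fresh = reset_session(
    ["explored auth.py", "found token bug", "decided to patch refresh flow"],
    ["src/auth.py", "tests/test_auth.py"],
)
```

The whiteboard metaphor from the summary maps directly: `transcript` is the full whiteboard, `fresh` is the wiped board with only the notes carried over.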
After the original talk, Horthy's team developed the Ralph loop, an automated version of the RPI cycle that runs continuously.
The name "Ralph" is informal — it refers to the ruthless reset-and-loop pattern. The key insight: each loop iteration operates in a fresh context window, carrying only the compacted summary from previous iterations plus the current sub-task plan.
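A minimal sketch of that reset-and-loop pattern, with `execute_subtask` and `compact` as hypothetical stand-ins for the LLM-driven steps:

```python
# Sketch of the Ralph loop: iterate over sub-tasks from an approved plan,
# giving each iteration a fresh context that carries only the running
# compacted summary plus the current sub-task.

def execute_subtask(context: str, subtask: str) -> str:
    return f"done: {subtask}"  # placeholder for one LLM-driven build pass

def compact(summary: str, result: str) -> str:
    # Fold the latest result into the running summary (real version: LLM call).
    return f"{summary}\n- {result}".strip()

def ralph_loop(subtasks: list[str]) -> str:
    summary = ""
    for task in subtasks:
        # Fresh context each iteration: only the summary and the sub-task.
        result = execute_subtask(summary, task)
        summary = compact(summary, result)
    return summary

log = ralph_loop(["add endpoint", "write tests", "update docs"])
```

Note that `summary` is the only state threaded between iterations; the exploration inside each `execute_subtask` call is discarded, which is what keeps every iteration in the Smart Zone.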
| Task type | Workflow | Example |
|---|---|---|
| Trivial | Direct conversation | Change button colour |
| Small feature | Brief plan, then build | Add a form field |
| Multi-file feature | Research + plan | New API endpoint across repos |
| Complex problem | Full RPI with compaction | Refactoring auth in 300k LOC Rust |
| Architectural | Whiteboard first, then RPI | Removing Hadoop dependency |
Horthy warns about semantic diffusion: the process by which precise technical terms lose meaning as they spread across communities and tools. His example: "spec-driven development" means different things to different tools and teams. Cursor interprets it one way, Claude Code another, and each team using those tools adds their own spin.
The result: what started as a sharp, actionable concept becomes a vague gesture. Teams think they're aligned because they use the same words, but execute differently because the words mean different things.
Defence: Define terms operationally within your team. Don't say "we do spec-driven development." Say "before implementation, we produce a markdown document containing file paths, line numbers, code snippets, and test strategy. The developer approves this document before any code is written."
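An operational definition like that can even be checked mechanically. A hypothetical gate, assuming one team's section names (these headings are illustrative, not any standard):

```python
# Hypothetical enforcement of an operational definition: reject a plan
# document unless it names files, includes code snippets, and states a
# test strategy. The required section names are a team convention.

REQUIRED_SECTIONS = ["## Files", "## Snippets", "## Test strategy"]

def plan_is_reviewable(plan_md: str) -> bool:
    return all(section in plan_md for section in REQUIRED_SECTIONS)

plan = """# Plan: add rate limiting
## Files
- src/middleware.py (lines 40-80)
## Snippets
(code to be inserted, reviewed inline)
## Test strategy
- unit test the limiter window
"""
assert plan_is_reviewable(plan)
assert not plan_is_reviewable("# Plan\njust wing it")
```

A check this crude obviously cannot judge plan quality, but it makes "we do spec-driven development" falsifiable: either the document has the agreed sections or it does not.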
The most culturally charged finding from Horthy's work: AI delivers only marginal gains to senior engineers, while junior and mid-level engineers use it to ship plausible-looking code faster than seniors can untangle it.
Horthy's position: AI amplifies existing thinking. If you bring architectural clarity, you get leverage. If you bring "make it work somehow," you get technical debt at machine speed.
The organisational fix is top-down: mandate the RPI workflow. Force the plan review step. Make thinking visible before code ships.
| Term | Definition | Why it matters |
|---|---|---|
| Context Engineering | Building dynamic systems to provide the right information in the right format so the LLM can complete the task | First coined in Horthy's "12 Factor Agents" essay (Apr 2025). Now the dominant framing for AI-assisted development. |
| Dumb Zone | The region (~40-80% context utilisation) where model quality degrades from attention fragmentation | Explains why long AI conversations produce worse output than short ones |
| Frequent Intentional Compaction (FIC) | Deliberately summarising and resetting context to stay in the Smart Zone | The primary defence against Dumb Zone degradation |
| RPI | Research → Plan → Implement. Three-phase workflow for complex AI-assisted coding. | Separates thinking from typing. Makes decisions visible before code ships. |
| Ralph Loop | Automated RPI cycle with mandatory compaction between iterations | Enables sustained autonomous coding without context degradation |
| Semantic Diffusion | Precise terms losing meaning as they spread across communities | Teams think they're aligned but execute differently. Define terms operationally. |
| Compression of Intent | A plan document that densely encodes the developer's thinking in a context-efficient format | Survives context resets. Carries intent without carrying the full exploration history. |
| Mental Alignment | Human and AI sharing the same understanding of what to build and why | The plan review step achieves this. Without it, AI drifts from the developer's intent. |
| Ground Truth | What the code actually does (vs. what docs say it does) | Research phase must establish this from source code, not documentation. |
Horthy's April 2025 essay that started it all. Originally about building reliable LLM applications, it became the intellectual foundation for "No Vibes Allowed." The core argument: don't use prompts for control flow.
The through-line: Factors 3, 9, and 10 directly feed the "No Vibes" thesis. Own context, compact errors, keep agents small. Everything else supports the structural reliability that makes these possible.
Horthy's most contrarian claim: the best AI agents are not autonomous reasoning loops. They are mostly deterministic software with strategic LLM integration points. The LLM makes classification decisions at branch points; the control flow, error handling, and state management are all regular code.
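A minimal sketch of that architecture, with `llm_classify` as a stand-in for a constrained LLM call; the ticket-routing domain is an invented example, not one from the essay.

```python
# Sketch of "mostly deterministic software with strategic LLM integration
# points": ordinary code owns control flow and error handling; the LLM is
# consulted only to make a classification decision at a branch point.

def llm_classify(ticket: str) -> str:
    # Real version: one constrained LLM call returning a fixed label set.
    return "bug" if "crash" in ticket.lower() else "feature"

def handle_ticket(ticket: str) -> str:
    label = llm_classify(ticket)   # the LLM decides *which* branch
    if label == "bug":             # deterministic code decides *what happens*
        return f"routed to bug queue: {ticket}"
    return f"routed to feature backlog: {ticket}"
```

The prompt never carries control flow; if the model misclassifies, the blast radius is one branch decision, not the whole program's behaviour.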
| Date | Work | Key contribution |
|---|---|---|
| Apr 2025 | "12 Factor Agents" blog post | Coined "context engineering." 12 principles for reliable LLM apps. |
| Jun 2025 | AI Engineer World's Fair (first talk) | Context engineering concept "blew up." First public Dumb Zone discussion. |
| Aug 2025 | First sharing of FIC techniques | Frequent Intentional Compaction methodology published |
| Dec 2025 | "No Vibes Allowed" (this talk) | Full RPI framework. 414k views. Definitive statement on AI coding discipline. |
| Mar 2026 | "Everything We Got Wrong About RPI" | 6-month post-mortem. Ralph loops. Lessons from scaling RPI in organisations. |
| Mar 2026 | Dev Interrupted podcast | Deep dive on Ralph loops, escaping the Dumb Zone at scale |
Verified outcomes using RPI workflow:
Synthesised March 2026 from video content, blog posts, podcast appearances, and community analysis.