No Vibes Allowed

Solving Hard Problems in Complex Codebases

Dex Horthy, HumanLayer (YC F24) — AI Engineer World's Fair, Dec 2025
414k views • 13k likes • 20:30 • Progressive education resource synthesized from talk + related works

Tier 1: The Simple Version

Plain English — No jargon

The one-sentence version: AI coding tools get dumber the longer you talk to them, so you have to keep conversations short and structured.

The problem

AI coding tools (like Claude Code, Cursor, Copilot) are great at building new stuff from scratch. But when you point them at a big, messy, real-world codebase — the kind with ten years of history and thousands of files — they start writing bad code. A Stanford study found that the "extra code" AI writes often just cleans up the mess it made last week.

Why it happens

Think of the AI's brain as a whiteboard. It can only hold so much at once. When you keep chatting and pasting files and asking follow-ups, you fill the whiteboard with noise. Around the halfway mark, the AI enters what Dex calls the "Dumb Zone" — it stops thinking clearly and starts guessing.

   Smart Zone          Dumb Zone          Disaster
|<-- clear ------>|<-- foggy -------->|<-- broken -->|
0%               ~40%                ~80%          100%
              of the AI's memory used

The fix

Instead of just chatting at the AI and hoping, use a three-step process:

  1. Research — Let the AI look around the codebase and take notes. No coding yet.
  2. Plan — The AI writes a detailed plan. You read it and approve it.
  3. Build — The AI follows the plan, step by step.

And the secret weapon: start fresh often. When the conversation gets long, ask the AI to write a summary, then start a brand new chat with that summary. Wipe the whiteboard, keep the notes.

The bumper sticker: Research. Plan. Build. Reset. Repeat.

Tier 2: The Economist Version

Clear, informed analysis — As if explaining to a smart friend

The productivity paradox

A survey of 100,000 developers has exposed an awkward truth about AI coding tools: they make easy things easier and hard things harder. In greenfield projects — the blank-slate work of building from zero — AI copilots deliver genuine speed gains. But in brownfield codebases, the decade-old repositories where most real software lives, the gains evaporate. Much of the "extra code" AI produces turns out to be rework on its own earlier slop.

The common response divides neatly into pessimists ("this will never work") and pragmatists ("smarter models will fix it"). Dex Horthy, founder of HumanLayer and author of the influential "12 Factor Agents" essay that first coined the term "context engineering," argues both camps miss the point. The constraint is not intelligence. It is attention.

The attention economy of AI

Large language models process information through a fixed-size "context window" — roughly analogous to working memory. Horthy's team found that model performance degrades sharply once the context window fills to about 40% of its capacity, a threshold he calls the Dumb Zone. Below that line, the model reasons clearly. Above it, responses grow repetitive, contradictory, and architecturally unsound. The solution, Horthy argues, is not better models but better information management.

He calls the discipline "frequent intentional compaction" — the practice of actively compressing working context into structured summaries and starting fresh sessions at regular intervals. It is, in effect, the intellectual hygiene of AI-assisted development: clearing the desk before it buries you.

A three-phase method

From this insight, Horthy developed the RPI workflow: Research, Plan, Implement.

In the research phase, the AI examines the codebase without writing a line. It reads files, traces dependencies, and produces a concise research document. The goal is ground truth — what the code actually does, not what the documentation claims it does. ("Internal docs lie," Horthy observes. "Code doesn't.")

The plan phase is, by Horthy's reckoning, the most critical. The AI produces a detailed implementation plan: specific files, line numbers, code snippets, testing strategy. This document serves two purposes. First, it gives the human a chance to review the approach before a single line ships. Second, it acts as a compression of intent — a dense, actionable blueprint that carries into the implementation phase without the baggage of prior exploration.

Implementation then becomes mechanical execution of an approved plan, ideally in a fresh context window loaded only with the plan itself and the relevant files.

The cultural fault line

Horthy's sharpest observation may be organisational. Senior engineers, who already think architecturally, report marginal speed gains from AI. Junior and mid-level engineers, who lack that instinct, often use AI to fill the gap — shipping plausible-looking code that senior engineers must then untangle. The result is a new form of technical debt, generated faster than any human could create alone.

"Do not outsource the thinking," Horthy warns. AI amplifies whatever thinking you bring to it. If you bring rigour, you get leverage. If you bring vagueness, you get slop at scale.

The core tension: AI coding tools are force multipliers. That means they multiply carelessness just as readily as they multiply competence. The workflow — not the model — determines which.

Tier 3: Deep Technical

Full implementation detail — For practitioners

Context window mechanics

LLMs operate on a fixed token budget. Claude's context window is nominally ~200k tokens (Horthy's figures assume ~168k usable), but attention quality degrades non-linearly with utilisation. Horthy's empirical finding: at ~40% utilisation (~67k tokens in a 168k window), the model enters a degraded regime characterised by attention fragmentation, a bias toward recent tokens, and the "lost in the middle" effect.

This degradation is not a cliff but a gradient. The "Dumb Zone" onset varies with task complexity: simple tasks tolerate higher fill, complex multi-file reasoning degrades earlier.

## CONTEXT WINDOW QUALITY MODEL

ZONE 1 (0-40% fill): "Smart Zone"
    High recall accuracy across full context
    Novel reasoning, architectural coherence
    Reliable tool use and structured output

ZONE 2 (40-80% fill): "Dumb Zone"
    Attention fragmentation begins
    Model favours recent tokens over early context
    "Lost in the middle" phenomenon intensifies
    Output quality degrades proportional to fill

ZONE 3 (80-100% fill): "Disaster Zone"
    Truncation of early context
    Incoherent outputs, contradictory statements
    Effectively unusable for complex tasks
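
The zone model can be expressed as a simple utilisation check. A minimal sketch using the thresholds from Horthy's figures; the function itself is illustrative, not his tooling:

```python
def context_zone(tokens_used: int, window: int = 168_000) -> str:
    """Classify context-window utilisation per the three-zone model."""
    fill = tokens_used / window
    if fill < 0.40:
        return "smart"      # full-context recall, coherent reasoning
    if fill < 0.80:
        return "dumb"       # attention fragmentation, recency bias
    return "disaster"       # early context truncated, output incoherent

# 67k tokens in a 168k window is right at the ~40% Dumb Zone threshold
```

In practice the onset varies with task complexity, as noted above, so the thresholds would be tuned per workload.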

Frequent Intentional Compaction (FIC)

FIC is the core mechanism for staying in the Smart Zone. Three compaction strategies:

| Strategy | When | How |
|---|---|---|
| Session reset | After each RPI phase | Summarise current state into structured markdown. Start new session with summary + relevant files only. |
| Sub-agent delegation | During research phase | Spawn sub-agents for specific file exploration. Results flow back as compressed summaries, not raw file content. |
| Progressive context loading | During implementation | Load files on demand as the plan dictates, not all at once. Each file enters context only when its section of the plan executes. |

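
The session-reset strategy can be sketched in a few lines. Here `llm` is a stand-in for whatever completion call your stack provides, and the prompt wording is illustrative:

```python
COMPACTION_PROMPT = (
    "Summarise this session into structured markdown: decisions made, "
    "files touched, open questions. Omit exploration dead ends."
)

def reset_session(llm, history: list[str], relevant_files: list[str]) -> list[str]:
    """Session-reset strategy: compress history, start fresh with the summary.

    `llm` is a hypothetical callable: prompt string in, completion string out.
    """
    summary = llm(COMPACTION_PROMPT + "\n\n" + "\n".join(history))
    # The new session carries only the compacted summary plus the files
    # the next phase actually needs -- never the raw transcript.
    return [summary] + relevant_files
```

The point of the structure is that the return value is the *entire* context for the next session; nothing from the old transcript leaks through except what survived compaction.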
The Ralph Loop (Post-Talk Evolution)

After the original talk, Horthy's team developed the Ralph loop — an automated version of the RPI cycle that runs continuously:

STEP 1: AGENT      receives task description
STEP 2: RESEARCH   scan codebase, identify relevant files
STEP 3: PLAN       generate detailed implementation spec
STEP 4: HUMAN      reviews plan: approve / reject / modify
STEP 5: IMPLEMENT  execute plan in fresh context
STEP 6: COMPACT    summarise what was done
STEP 7: GOTO STEP 1 with next sub-task

## Key: Step 6 resets context to prevent Dumb Zone drift
## Each loop iteration starts with minimal, high-signal context

The name "Ralph" is informal — it refers to the ruthless reset-and-loop pattern. The key insight: each loop iteration operates in a fresh context window, carrying only the compacted summary from previous iterations plus the current sub-task plan.
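
One pass of the loop might look like the following sketch, where `agent` and `review` are hypothetical stand-ins for your tooling and the human gate:

```python
def ralph_loop(agent, tasks, review):
    """Ralph pattern sketch: fresh context per sub-task, with only the
    compacted summary carried forward between iterations."""
    carry = ""  # compacted summary from previous iterations
    for task in tasks:
        context = [carry, task]           # minimal, high-signal context
        research = agent.research(context)
        plan = agent.plan(research)
        if not review(plan):              # human gate: approve / reject
            continue
        result = agent.implement(plan)    # runs in a fresh context window
        carry = agent.compact(result)     # reset before the next iteration
    return carry
```

Note that `carry` is the only state that crosses iterations, which is what keeps each sub-task out of the Dumb Zone.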

RPI workflow — full implementation detail

Phase 1: Research

The agent reads the codebase without writing a line: it traces dependencies, maps the relevant modules, and emits a concise research document recording ground truth from source code rather than documentation.

Phase 2: Plan (the critical step)

The agent produces a detailed implementation plan containing specific file paths, line numbers, code snippets, and a test strategy. The human approves this document before any code is written.

Phase 3: Implement

Mechanical execution of the approved plan in a fresh context window loaded with only the plan itself and the files it references.

Scaling by task complexity

| Task type | Workflow | Example |
|---|---|---|
| Trivial | Direct conversation | Change button colour |
| Small feature | Brief plan, then build | Add a form field |
| Multi-file feature | Research + plan | New API endpoint across repos |
| Complex problem | Full RPI with compaction | Refactoring auth in 300k LOC Rust |
| Architectural | Whiteboard first, then RPI | Removing Hadoop dependency |

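
A team could encode the table as a routing heuristic. The thresholds below are illustrative assumptions, not figures from the talk:

```python
def choose_workflow(files: int, loc: int = 0, architectural: bool = False) -> str:
    """Map task shape to a workflow tier. Cutoffs are assumptions for
    illustration; tune them to your codebase."""
    if architectural:
        return "whiteboard first, then RPI"
    if loc > 100_000:                    # sprawling scope, e.g. a large refactor
        return "full RPI with compaction"
    if files > 3:                        # multi-file feature
        return "research + plan"
    if files > 1:                        # small feature
        return "brief plan, then build"
    return "direct conversation"         # trivial change
```

The useful property is that the decision is explicit and reviewable, rather than each developer improvising a workflow per task.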
Semantic diffusion — why terms decay

Horthy warns about semantic diffusion: the process by which precise technical terms lose meaning as they spread across communities and tools. His example: "spec-driven development" means different things to different tools and teams. Cursor interprets it one way, Claude Code another, and each team using those tools adds their own spin.

The result: what started as a sharp, actionable concept becomes a vague gesture. Teams think they're aligned because they use the same words, but execute differently because the words mean different things.

Defence: Define terms operationally within your team. Don't say "we do spec-driven development." Say "before implementation, we produce a markdown document containing file paths, line numbers, code snippets, and test strategy. The developer approves this document before any code is written."
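
That operational definition lends itself to a mechanical gate. A minimal sketch, assuming plans are markdown documents and using illustrative regex checks:

```python
import re

# Required plan elements and illustrative patterns that detect them.
REQUIRED = {
    "file paths": r"\S+\.(py|rs|ts|go|java)\b",
    "line numbers": r"\bline\s+\d+|\bL\d+\b",
    "code snippets": "`" * 3,            # a fenced code block marker
    "test strategy": r"(?i)\btest",
}

def plan_is_reviewable(plan_markdown: str) -> list[str]:
    """Return the required elements missing from a plan ([] means reviewable)."""
    return [name for name, pattern in REQUIRED.items()
            if not re.search(pattern, plan_markdown)]
```

A check like this does not judge plan *quality*; it only enforces that the team's operational definition is met before human review begins.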

"Don't outsource the thinking" — the junior/senior rift

The most culturally charged finding from Horthy's work: senior engineers, who already think architecturally, see only marginal gains from AI, while junior and mid-level engineers use it to fill that gap, shipping plausible-looking code that seniors must later untangle.

Horthy's position: AI amplifies existing thinking. If you bring architectural clarity, you get leverage. If you bring "make it work somehow," you get technical debt at machine speed.

The organisational fix is top-down: mandate the RPI workflow. Force the plan review step. Make thinking visible before code ships.

Key Concepts Dictionary

| Term | Definition | Why it matters |
|---|---|---|
| Context Engineering | Building dynamic systems to provide the right information in the right format so the LLM can complete the task | First coined in Horthy's "12 Factor Agents" essay (Apr 2025). Now the dominant framing for AI-assisted development. |
| Dumb Zone | The region (~40-80% context utilisation) where model quality degrades from attention fragmentation | Explains why long AI conversations produce worse output than short ones |
| Frequent Intentional Compaction (FIC) | Deliberately summarising and resetting context to stay in the Smart Zone | The primary defence against Dumb Zone degradation |
| RPI | Research → Plan → Implement: a three-phase workflow for complex AI-assisted coding | Separates thinking from typing. Makes decisions visible before code ships. |
| Ralph Loop | Automated RPI cycle with mandatory compaction between iterations | Enables sustained autonomous coding without context degradation |
| Semantic Diffusion | Precise terms losing meaning as they spread across communities | Teams think they're aligned but execute differently. Define terms operationally. |
| Compression of Intent | A plan document that densely encodes the developer's thinking in a context-efficient format | Survives context resets. Carries intent without the full exploration history. |
| Mental Alignment | Human and AI sharing the same understanding of what to build and why | The plan review step achieves this. Without it, AI drifts from the developer's intent. |
| Ground Truth | What the code actually does (vs. what docs say it does) | The research phase must establish this from source code, not documentation. |

The 12 Factor Agents (Foundation)

Horthy's April 2025 essay that started it all. Originally about building reliable LLM applications, it became the intellectual foundation for "No Vibes Allowed." The core argument: don't use prompts for control flow.

All 12 factors at a glance
  1. Natural language to tool calls — Convert requests into structured JSON that triggers deterministic code
  2. Own your prompts — Treat prompts as first-class code you control completely
  3. Own your context window — Structure information for maximum token and attention efficiency
  4. Tools are just structured outputs — Separate the decision from execution
  5. Unify execution state and business state — Infer status from context, don't track separately
  6. Launch / pause / resume with simple APIs — Build agents that handle long operations gracefully
  7. Contact humans with tool calls — Human-in-the-loop as a structured operation, not an exception
  8. Own your control flow — Build custom structures for interrupts, approvals, error handling
  9. Compact errors into context window — Include failures in context so agents self-heal
  10. Small, focused agents — 3-20 steps maximum. Longer contexts cause focus loss.
  11. Trigger from anywhere — Slack, email, SMS, webhooks — meet users where they are
  12. Make your agent a stateless reducer — Pure functions transforming state through events

The through-line: Factors 3, 9, and 10 directly feed the "No Vibes" thesis. Own context, compact errors, keep agents small. Everything else supports the structural reliability that makes these possible.
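
Factor 12, the stateless reducer, can be sketched as a pure function over events. The types here are illustrative, not from the essay:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentState:
    steps: tuple = ()     # history of (event_type, payload) pairs
    done: bool = False

def reduce_event(state: AgentState, event: tuple[str, str]) -> AgentState:
    """Pure reducer: same (state, event) in, same state out. No hidden I/O,
    so any run can be replayed or resumed from its event log."""
    kind, _payload = event
    if kind == "finish":
        return AgentState(steps=state.steps + (event,), done=True)
    return AgentState(steps=state.steps + (event,), done=state.done)
```

Because state is only ever derived by folding events, launch/pause/resume (factor 6) falls out for free: persist the events, replay the fold.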

The key insight: agents are mostly deterministic code

Horthy's most contrarian claim: the best AI agents are not autonomous reasoning loops. They are mostly deterministic software with strategic LLM integration points. The LLM makes classification decisions at branch points; the control flow, error handling, and state management are all regular code.

## GOOD AGENT ARCHITECTURE (Horthy's model)

INPUT
  -> Deterministic pre-processing
  -> LLM decision point (classify intent)
  -> Deterministic routing based on classification
  -> LLM decision point (generate plan)
  -> Human review gate
  -> Deterministic execution of plan
  -> LLM decision point (evaluate results)
OUTPUT

## BAD AGENT ARCHITECTURE (the "bag of tools" anti-pattern)

INPUT
  -> LLM loop until done
       (context fills up)
       (enters Dumb Zone)
       (produces slop)
OUTPUT (probably wrong)
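
The good architecture reduces to ordinary control flow with a handful of pluggable decision points. All the callables in this sketch are hypothetical stand-ins:

```python
def handle(request: str, classify, plan_for, approve, execute) -> str:
    """Mostly-deterministic agent skeleton. Control flow, routing, and
    execution are plain code; `classify` and `plan_for` stand in for
    LLM calls made only at fixed decision points."""
    cleaned = request.strip().lower()        # deterministic pre-processing
    intent = classify(cleaned)               # LLM decision point
    if intent not in {"bugfix", "feature"}:  # deterministic routing
        return "escalated"
    plan = plan_for(intent, cleaned)         # LLM decision point
    if not approve(plan):                    # human review gate
        return "rejected"
    return execute(plan)                     # deterministic execution
```

Contrast with the anti-pattern: here the LLM never owns the loop, so a bad completion can misroute one branch but cannot fill the context or skip the review gate.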

Talk Structure: Chapter-by-Chapter

00:00  Intro: complex code — Stanford study, AI productivity paradox in brownfield
01:40  Context engineering — Definition, origin in 12 Factor Agents essay
02:53  Advanced context — Beyond basic prompting, structured information management
04:38  Context obsession — Why context quality matters more than model quality
05:55  Dumb Zone concept — 40% threshold, attention degradation mechanics
07:26  Context management — Frequent Intentional Compaction, sub-agents
09:37  Complex problem solved — 300k LOC Rust codebase case study
10:45  Semantic diffusion — Terms losing meaning, operational definitions
12:14  Onboarding agents — Treating AI like a new hire, research-first
13:57  Internal docs lie — Ground truth from code, not documentation
15:03  Mental alignment key — Plan review as alignment mechanism
16:12  Code snippet plans — Plans with file paths, line numbers, test strategy
17:38  Don't outsource thinking — Junior/senior rift, AI amplifies thinking quality
18:45  RPI: Smart Zone — Full workflow, scaling by task complexity
19:46  Cultural change hard — Organisational adoption, top-down mandate

Dex Horthy's Body of Work

| Date | Work | Key contribution |
|---|---|---|
| Apr 2025 | "12 Factor Agents" blog post | Coined "context engineering." 12 principles for reliable LLM apps. |
| Jun 2025 | AI Engineer World's Fair (first talk) | Context engineering concept "blew up." First public Dumb Zone discussion. |
| Aug 2025 | First sharing of FIC techniques | Frequent Intentional Compaction methodology published |
| Dec 2025 | "No Vibes Allowed" (this talk) | Full RPI framework. 414k views. Definitive statement on AI coding discipline. |
| Mar 2026 | "Everything We Got Wrong About RPI" | 6-month post-mortem. Ralph loops. Lessons from scaling RPI in organisations. |
| Mar 2026 | Dev Interrupted podcast | Deep dive on Ralph loops, escaping the Dumb Zone at scale |

Related resources worth exploring

Real-World Results

Verified outcomes using RPI workflow:

Sources

Synthesised March 2026 from video content, blog posts, podcast appearances, and community analysis.