Solving Hard Problems in Complex Codebases
The one-sentence version: AI coding tools get dumber the longer you talk to them, so you have to keep conversations short and structured.
AI coding tools (like Claude Code, Cursor, Copilot) are great at building new stuff from scratch. But when you point them at a big, messy, real-world codebase — the kind with ten years of history and thousands of files — they start writing bad code. A Stanford study found that the "extra code" AI writes often just cleans up the mess it made last week.
Think of the AI's brain as a whiteboard. It can only hold so much at once. When you keep chatting and pasting files and asking follow-ups, you fill the whiteboard with noise. Around the halfway mark, the AI enters what Dex calls the "Dumb Zone" — it stops thinking clearly and starts guessing.
Instead of just chatting at the AI and hoping, use a three-step process: have it research the code first, write a plan you can review, and only then build.
And the secret weapon: start fresh often. When the conversation gets long, ask the AI to write a summary, then start a brand new chat with that summary. Wipe the whiteboard, keep the notes.
The bumper sticker: Research. Plan. Build. Reset. Repeat.
A survey of 100,000 developers has exposed an awkward truth about AI coding tools: they make easy things easier and hard things harder. In greenfield projects — the blank-slate work of building from zero — AI copilots deliver genuine speed gains. But in brownfield codebases, the decade-old repositories where most real software lives, the gains evaporate. Much of the "extra code" AI produces turns out to be rework on its own earlier slop.
The common response divides neatly into pessimists ("this will never work") and pragmatists ("smarter models will fix it"). Dex Horthy, founder of HumanLayer and author of the influential "12 Factor Agents" essay that first coined the term "context engineering," argues both camps miss the point. The constraint is not intelligence. It is attention.
Large language models process information through a fixed-size "context window" — roughly analogous to working memory. Horthy's team found that model performance degrades sharply once the context window fills to about 40% of its capacity, a threshold he calls the Dumb Zone. Below that line, the model reasons clearly. Above it, responses grow repetitive, contradictory, and architecturally unsound. The solution, Horthy argues, is not better models but better information management.
He calls the discipline "frequent intentional compaction" — the practice of actively compressing working context into structured summaries and starting fresh sessions at regular intervals. It is, in effect, the intellectual hygiene of AI-assisted development: clearing the desk before it buries you.
From this insight, Horthy developed the RPI workflow: Research, Plan, Implement.
In the research phase, the AI examines the codebase without writing a line. It reads files, traces dependencies, and produces a concise research document. The goal is ground truth — what the code actually does, not what the documentation claims it does. ("Internal docs lie," Horthy observes. "Code doesn't.")
The plan phase is, by Horthy's reckoning, the most critical. The AI produces a detailed implementation plan: specific files, line numbers, code snippets, testing strategy. This document serves two purposes. First, it gives the human a chance to review the approach before a single line ships. Second, it acts as a compression of intent — a dense, actionable blueprint that carries into the implementation phase without the baggage of prior exploration.
Implementation then becomes mechanical execution of an approved plan, ideally in a fresh context window loaded only with the plan itself and the relevant files.
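Under those assumptions, the phase hand-off can be sketched as plain code. Everything here is a hypothetical stand-in: `call_model` is a placeholder for any LLM API, and the artifact names are illustrative, not part of any real tool.

```python
# Sketch of the RPI hand-off: each phase runs in a fresh session that
# receives only the previous phase's artifact, never the full chat history.
# `call_model` is a hypothetical stand-in for a real LLM API call.

def call_model(prompt: str) -> str:
    return f"<output for: {prompt[:40]}...>"  # placeholder response

def run_phase(name: str, artifact_in: str) -> str:
    # A brand-new context: only the phase instructions plus the compacted
    # artifact from the previous phase enter the prompt.
    prompt = f"Phase: {name}\nInput artifact:\n{artifact_in}"
    return call_model(prompt)

research_doc = run_phase("research", "task: refactor auth module")
plan_doc = run_phase("plan", research_doc)   # human reviews plan_doc here
result = run_phase("implement", plan_doc)    # fresh context, plan only
```

The design point is that `plan_doc` is the only thing that survives between phases; the exploration that produced it is deliberately discarded.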
Horthy's sharpest observation may be organisational. Senior engineers, who already think architecturally, report marginal speed gains from AI. Junior and mid-level engineers, who lack that instinct, often use AI to fill the gap, shipping plausible-looking code that senior engineers must then untangle. The result is a new form of technical debt, accumulating faster than any human could create it alone.
"Do not outsource the thinking," Horthy warns. AI amplifies whatever thinking you bring to it. If you bring rigour, you get leverage. If you bring vagueness, you get slop at scale.
The core tension: AI coding tools are force multipliers. That means they multiply carelessness just as readily as they multiply competence. The workflow — not the model — determines which.
LLMs operate on a fixed token budget. Claude's context window is nominally ~200k tokens (Horthy works from an effective ~168k), but attention quality degrades non-linearly with utilisation. Horthy's empirical finding: at ~40% utilisation (~67k tokens of that 168k effective window), the model enters a degraded regime characterised by repetitive output, self-contradiction, and architecturally unsound suggestions.
This degradation is not a cliff but a gradient. The "Dumb Zone" onset varies with task complexity: simple tasks tolerate higher fill, complex multi-file reasoning degrades earlier.
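The threshold can be expressed as a simple guard. This is a minimal sketch assuming the window size quoted above and a crude whitespace token count; a real tokenizer and a real transcript format would differ.

```python
# Guard against the "Dumb Zone": flag the session for compaction once
# context utilisation passes the ~40% threshold Horthy reports.
# Token counting here is a rough whitespace-split stand-in for a tokenizer.

DUMB_ZONE_THRESHOLD = 0.40
CONTEXT_WINDOW = 168_000  # effective window size, per the figures above

def utilisation(messages: list[str]) -> float:
    tokens = sum(len(m.split()) for m in messages)  # crude token estimate
    return tokens / CONTEXT_WINDOW

def should_compact(messages: list[str]) -> bool:
    return utilisation(messages) >= DUMB_ZONE_THRESHOLD

assert not should_compact(["short message"])  # well inside the Smart Zone
```

Since the onset varies with task complexity, a production version would likely make the threshold configurable per task type rather than hard-coding 40%.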
Frequent intentional compaction (FIC) is the core mechanism for staying in the Smart Zone. Three compaction strategies:
| Strategy | When | How |
|---|---|---|
| Session reset | After each RPI phase | Summarise current state into structured markdown. Start new session with summary + relevant files only. |
| Sub-agent delegation | During research phase | Spawn sub-agents for specific file exploration. Results flow back as compressed summaries, not raw file content. |
| Progressive context loading | During implementation | Load files on-demand as the plan dictates, not all at once. Each file enters context only when its section of the plan executes. |
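The first strategy, session reset, can be sketched as follows. `summarise` stands in for an LLM summarisation call, and the markdown section headings are illustrative conventions, not a prescribed format.

```python
# Sketch of "session reset" compaction: compress the transcript into a
# structured markdown summary, then seed a new session with only that
# summary plus the files the next phase needs.

def summarise(transcript: list[str]) -> str:
    # Real version: one LLM call asking for decisions made, open questions,
    # and file references. Here: keep the last few exchanges as bullets.
    bullets = "\n".join(f"- {line}" for line in transcript[-3:])
    return f"## Session summary\n{bullets}"

def reset_session(transcript: list[str], relevant_files: list[str]) -> list[str]:
    summary = summarise(transcript)
    files = "\n".join(f"- {p}" for p in relevant_files)
    # The new session's entire context: summary + file list, nothing else.
    return [summary, f"## Relevant files\n{files}"]

fresh = reset_session(
    ["explored auth.py", "found token bug", "decided to patch refresh flow"],
    ["src/auth.py", "tests/test_auth.py"],
)
```

The whiteboard metaphor from the summary maps directly: `transcript` is the full whiteboard, `fresh` is the wiped board with only the notes carried over.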
After the original talk, Horthy's team developed the Ralph loop, an automated version of the RPI cycle that runs continuously.
The name "Ralph" is informal — it refers to the ruthless reset-and-loop pattern. The key insight: each loop iteration operates in a fresh context window, carrying only the compacted summary from previous iterations plus the current sub-task plan.
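A minimal sketch of that reset-and-loop pattern, with `execute_subtask` and `compact` as hypothetical stand-ins for the LLM-driven steps:

```python
# Sketch of the Ralph loop: iterate over sub-tasks from an approved plan,
# giving each iteration a fresh context that carries only the running
# compacted summary plus the current sub-task.

def execute_subtask(context: str, subtask: str) -> str:
    return f"done: {subtask}"  # placeholder for one LLM-driven build pass

def compact(summary: str, result: str) -> str:
    # Fold the latest result into the running summary (real version: LLM call).
    return f"{summary}\n- {result}".strip()

def ralph_loop(subtasks: list[str]) -> str:
    summary = ""
    for task in subtasks:
        # Fresh context each iteration: only the summary and the sub-task.
        result = execute_subtask(summary, task)
        summary = compact(summary, result)
    return summary

log = ralph_loop(["add endpoint", "write tests", "update docs"])
```

Note that `summary` is the only state threaded between iterations; the exploration inside each `execute_subtask` call is discarded, which is what keeps every iteration in the Smart Zone.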
| Task type | Workflow | Example |
|---|---|---|
| Trivial | Direct conversation | Change button colour |
| Small feature | Brief plan, then build | Add a form field |
| Multi-file feature | Research + plan | New API endpoint across repos |
| Complex problem | Full RPI with compaction | Refactoring auth in 300k LOC Rust |
| Architectural | Whiteboard first, then RPI | Removing Hadoop dependency |
Horthy warns about semantic diffusion: the process by which precise technical terms lose meaning as they spread across communities and tools. His example: "spec-driven development" means different things to different tools and teams. Cursor interprets it one way, Claude Code another, and each team using those tools adds their own spin.
The result: what started as a sharp, actionable concept becomes a vague gesture. Teams think they're aligned because they use the same words, but execute differently because the words mean different things.
Defence: Define terms operationally within your team. Don't say "we do spec-driven development." Say "before implementation, we produce a markdown document containing file paths, line numbers, code snippets, and test strategy. The developer approves this document before any code is written."
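An operational definition like that can even be checked mechanically. A hypothetical gate, assuming one team's section names (these headings are illustrative, not any standard):

```python
# Hypothetical enforcement of an operational definition: reject a plan
# document unless it names files, includes code snippets, and states a
# test strategy. The required section names are a team convention.

REQUIRED_SECTIONS = ["## Files", "## Snippets", "## Test strategy"]

def plan_is_reviewable(plan_md: str) -> bool:
    return all(section in plan_md for section in REQUIRED_SECTIONS)

plan = """# Plan: add rate limiting
## Files
- src/middleware.py (lines 40-80)
## Snippets
(code to be inserted, reviewed inline)
## Test strategy
- unit test the limiter window
"""
assert plan_is_reviewable(plan)
assert not plan_is_reviewable("# Plan\njust wing it")
```

A check this crude obviously cannot judge plan quality, but it makes "we do spec-driven development" falsifiable: either the document has the agreed sections or it does not.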
The most culturally charged finding from Horthy's work: AI delivers only marginal gains to senior engineers, while junior and mid-level engineers use it to ship plausible-looking code faster than seniors can untangle it.
Horthy's position: AI amplifies existing thinking. If you bring architectural clarity, you get leverage. If you bring "make it work somehow," you get technical debt at machine speed.
The organisational fix is top-down: mandate the RPI workflow. Force the plan review step. Make thinking visible before code ships.
| Term | Definition | Why it matters |
|---|---|---|
| Context Engineering | Building dynamic systems to provide the right information in the right format so the LLM can complete the task | First coined in Horthy's "12 Factor Agents" essay (Apr 2025). Now the dominant framing for AI-assisted development. |
| Dumb Zone | The region (~40-80% context utilisation) where model quality degrades from attention fragmentation | Explains why long AI conversations produce worse output than short ones |
| Frequent Intentional Compaction (FIC) | Deliberately summarising and resetting context to stay in the Smart Zone | The primary defence against Dumb Zone degradation |
| RPI | Research → Plan → Implement. Three-phase workflow for complex AI-assisted coding. | Separates thinking from typing. Makes decisions visible before code ships. |
| Ralph Loop | Automated RPI cycle with mandatory compaction between iterations | Enables sustained autonomous coding without context degradation |
| Semantic Diffusion | Precise terms losing meaning as they spread across communities | Teams think they're aligned but execute differently. Define terms operationally. |
| Compression of Intent | A plan document that densely encodes the developer's thinking in a context-efficient format | Survives context resets. Carries intent without carrying the full exploration history. |
| Mental Alignment | Human and AI sharing the same understanding of what to build and why | The plan review step achieves this. Without it, AI drifts from the developer's intent. |
| Ground Truth | What the code actually does (vs. what docs say it does) | Research phase must establish this from source code, not documentation. |
Horthy's April 2025 essay that started it all. Originally about building reliable LLM applications, it became the intellectual foundation for "No Vibes Allowed." The core argument: don't use prompts for control flow.
The through-line: Factors 3, 9, and 10 directly feed the "No Vibes" thesis. Own context, compact errors, keep agents small. Everything else supports the structural reliability that makes these possible.
Horthy's most contrarian claim: the best AI agents are not autonomous reasoning loops. They are mostly deterministic software with strategic LLM integration points. The LLM makes classification decisions at branch points; the control flow, error handling, and state management are all regular code.
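A minimal sketch of that architecture, with `llm_classify` as a stand-in for a constrained LLM call; the ticket-routing domain is an invented example, not one from the essay.

```python
# Sketch of "mostly deterministic software with strategic LLM integration
# points": ordinary code owns control flow and error handling; the LLM is
# consulted only to make a classification decision at a branch point.

def llm_classify(ticket: str) -> str:
    # Real version: one constrained LLM call returning a fixed label set.
    return "bug" if "crash" in ticket.lower() else "feature"

def handle_ticket(ticket: str) -> str:
    label = llm_classify(ticket)   # the LLM decides *which* branch
    if label == "bug":             # deterministic code decides *what happens*
        return f"routed to bug queue: {ticket}"
    return f"routed to feature backlog: {ticket}"
```

The prompt never carries control flow; if the model misclassifies, the blast radius is one branch decision, not the whole program's behaviour.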
| Date | Work | Key contribution |
|---|---|---|
| Apr 2025 | "12 Factor Agents" blog post | Coined "context engineering." 12 principles for reliable LLM apps. |
| Jun 2025 | AI Engineer World's Fair (first talk) | Context engineering concept "blew up." First public Dumb Zone discussion. |
| Aug 2025 | First sharing of FIC techniques | Frequent Intentional Compaction methodology published |
| Dec 2025 | "No Vibes Allowed" (this talk) | Full RPI framework. 414k views. Definitive statement on AI coding discipline. |
| Mar 2026 | "Everything We Got Wrong About RPI" | 6-month post-mortem. Ralph loops. Lessons from scaling RPI in organisations. |
| Mar 2026 | Dev Interrupted podcast | Deep dive on Ralph loops, escaping the Dumb Zone at scale |
Verified outcomes using RPI workflow:
Synthesised March 2026 from video content, blog posts, podcast appearances, and community analysis.