The Winning AI Strategy in 2026 Is Just Loops
I built skills, configured subagents, and set up slash commands. Then a single loop running overnight outperformed all of it. Three loop architectures that actually deliver.
In March 2026, the way to get the most out of AI is a simple loop that never stops running.
Ralph Loop: One Line of Bash That Pushes Through Failure
The core is `while :; do cat PROMPT.md | claude-code; done`. When the agent finishes and tries to exit, a Stop Hook blocks the exit and feeds the same prompt back in.
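The same restart-forever pattern can be sketched in Python. This is a minimal sketch, not the actual tool: the agent command is parameterized (a real run would pass something like the `claude-code` CLI), and the iteration cap exists only so the example terminates.

```python
import subprocess
from pathlib import Path

def ralph_loop(prompt_file: str, command: list[str], iterations: int) -> list[int]:
    """Pipe the same prompt into a fresh agent process on every pass.

    Equivalent in spirit to `while :; do cat PROMPT.md | claude-code; done`:
    each iteration is a brand-new process, so nothing from the previous
    conversation carries over -- only git history and the file system persist.
    """
    exit_codes = []
    for _ in range(iterations):
        # Re-read the prompt each pass: learnings appended to disk between
        # iterations are picked up automatically.
        prompt = Path(prompt_file).read_text()
        proc = subprocess.run(command, input=prompt, text=True, capture_output=True)
        exit_codes.append(proc.returncode)  # failures are data, not stop signals
    return exit_codes
```

In a real run the loop never exits on failure; a nonzero return code simply means the next fresh-context pass gets another attempt.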
The key insight is that every iteration opens a fresh context window. Previous work lives only in git history and the file system. The context itself always starts clean. This eliminates the classic problem where agent loops degrade as conversations get longer.
After each pass, learnings get recorded in AGENTS.md. The next iteration’s agent reads those notes automatically, so it avoids repeating the same mistakes. When a single task fails more than 10 times, it gets flagged as stuck and automatically broken into smaller pieces for retry. Failure itself becomes data. As Huntley put it, “deterministically bad” results feed right into the next loop’s input.
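The record-and-retry mechanics might look like the following sketch. The >10-failure threshold comes from the article; the `AGENTS.md` entry format, class name, and method names are my assumptions for illustration.

```python
from collections import defaultdict
from pathlib import Path

STUCK_THRESHOLD = 10  # per the article: more than 10 failures flags a task as stuck

class LoopMemory:
    """Append learnings to AGENTS.md and track per-task failure counts."""

    def __init__(self, notes_path: str = "AGENTS.md"):
        self.notes = Path(notes_path)
        self.failures = defaultdict(int)

    def record(self, task: str, succeeded: bool, learning: str) -> bool:
        """Log the outcome; return True if the task should be split into smaller pieces."""
        with self.notes.open("a") as f:
            status = "OK" if succeeded else "FAIL"
            f.write(f"- [{status}] {task}: {learning}\n")
        if succeeded:
            self.failures.pop(task, None)
            return False
        self.failures[task] += 1
        # "Deterministically bad" results become next-loop input: once past the
        # threshold, the caller breaks the task down and retries the pieces.
        return self.failures[task] > STUCK_THRESHOLD
```

The next iteration's agent reads the accumulated notes file, which is why structuring what gets written there matters as much as the loop itself.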
One honest admission: the first time I ran Ralph, about 3 out of 10 loops burned tokens repeating the same error. The cumulative learning only kicked in after I refined the prompt to properly structure what gets written to AGENTS.md. The tool matters less than the prompt design around it.
RLM: A Model That Recursively Calls Itself to Reason
Feed a long document into an LLM and it loses accuracy toward the end. RLM (Recursive Language Models) attacks this problem in a fundamentally different way.
Instead of passing a long prompt directly to the model, it loads the text into Python REPL variables. The model then writes code to slice, search, and selectively read those variables, calling itself again with just the relevant pieces. Rather than expanding the context window, the model decides how to navigate its own context.
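The recursive decomposition can be sketched roughly as below. `call_lm` stands in for an actual model call, and the fixed-size chunking plus keyword filter is a deliberate simplification of the navigation code a real RLM writes on the fly inside the REPL.

```python
from typing import Callable

def rlm_answer(document: str, question: str,
               call_lm: Callable[[str], str],
               chunk_size: int = 2000) -> str:
    """Answer a question about a long document without ever putting
    the whole document into a single prompt.

    The document lives in a variable (the "REPL environment"); the model
    only sees fragments it navigates to, and the root call synthesizes
    from sub-answers rather than from the raw text.
    """
    # Slice the document into chunks the model can inspect selectively.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    # Crude relevance filter standing in for model-written search code.
    keywords = [w for w in question.lower().split() if len(w) > 3]
    relevant = [c for c in chunks if any(k in c.lower() for k in keywords)]
    # Recursive sub-calls: each relevant fragment gets its own LM call.
    sub_answers = [call_lm(f"Context:\n{c}\n\nQuestion: {question}") for c in relevant]
    # The root call never sees the full document, only the sub-answers.
    return call_lm("Combine these partial answers:\n" + "\n".join(sub_answers)
                   + f"\n\nQuestion: {question}")
```

The point of the structure: no single call ever receives the full document, yet nothing is irreversibly compressed away before a sub-call gets to look at it.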
GPT-5-mini with RLM scored more than double GPT-5's correct answers on the OOLONG benchmark. The entire trajectory of recursive calls is preserved as code, so you can trace exactly why the model reached a given answer. And unlike summarization or RAG, which compress information before the model sees it, RLM delegates specific fragments to sub-LM calls, so there is no structural information loss.
autoresearch: 100 Experiments While You Sleep
Give an agent a single `train.py` and let it modify the file freely: change the architecture, tweak the optimizer, whatever it wants. Training runs for exactly 5 minutes. If `val_bpb` improved, commit. If not, reset.
Repeat this overnight and by morning you have logs showing which changes worked and which failed. The human just writes the research direction in `program.md`.
The fixed 5-minute time budget is what makes it work. Whether the agent changes model size or batch size, every experiment runs under identical conditions. Fair comparison is the core of high-quality iteration. Everything runs on a git branch, so failed experiments vanish with reset and successful ones accumulate as commits. A morning `git log` tells the full improvement story.
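The commit-if-better, reset-if-not loop reduces to a greedy search. A minimal sketch, with assumptions labeled: real autoresearch commits and resets `train.py` via git and enforces the 5-minute budget inside the training run; here the "commit/reset" is a config snapshot and `evaluate` stands in for the fixed-budget training slice.

```python
import copy

def autoresearch(config: dict, evaluate, propose, budget_iters: int = 100):
    """Greedy overnight loop: propose a change, run it under a fixed budget,
    keep it only if the metric improves (lower is better, like val_bpb).

    `propose` mutates a copy of the config; `evaluate` runs the fixed time
    slice and returns the metric. Keeping the budget identical across
    candidates is what makes the comparisons fair.
    """
    best_score = evaluate(config)
    history = []
    for _ in range(budget_iters):
        candidate = propose(copy.deepcopy(config))  # agent edits a copy
        score = evaluate(candidate)                 # same fixed budget every time
        if score < best_score:                      # "commit": keep the change
            config, best_score = candidate, score
            history.append(("commit", score))
        else:                                       # "reset": discard it
            history.append(("reset", score))
    return config, best_score, history
```

By morning, `history` plays the role of the git log: every commit is a verified improvement, every reset a documented dead end.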
Karpathy’s next vision is a distributed research structure like SETI@home, where multiple agents experiment in different directions and merge results. That said, autoresearch currently runs on a single machine, and any experiment that doesn’t show meaningful difference within 5 minutes gets discarded. It’s not the right fit for every kind of research.
Why Repetition Works in AI
These three tools share a common principle. They all exploit test-time compute scaling: spending more computation at inference time improves performance without making the model larger.
OpenAI’s o1 already validated this principle. Ralph applies it to code quality. RLM applies it to context comprehension. autoresearch applies it to research.
When three ingredients come together, the output goes beyond simple code:
- A worthwhile idea
- A loop with clear verification conditions
- Enough token budget to run overnight
Your 8 sleeping hours are someone else’s window for 100 improvements. Not all 100 will succeed, of course. That’s fine. The accumulated failures are fuel for the next loop.