RLM ARCHITECTURE
▸ DEMO BRIEF #001 / SECTOR: DEV-INFRA / 2026.05
A two-tier, Cloudflare-native agentic CLI built on Recursive Language Models.

SO WE KILLED
THE CONTEXT
WINDOW.

▸ MISSION SUMMARY

Managed recursive AI coding CLI. The root model writes code; agent swarms decompose impossible problems in parallel.

No context limits. No API keys. No config. The prompt is the environment, not the input.

FIG. 01.2 // PARALLEL EXECUTION
α-01 scan auth/ · 45%
α-02 map imports · 31%
γ-01 gen tests · 77%
β-01 audit CVEs · 59%
β-02 fix oauth2 · 33%
δ-02 validate · 80%
FIG. 01.1 // LIVE DISPATCH ● RECORDING
~ hotcopy-term //
$ npx hotcopy login
▸ Opening browser... signing in.
$ hotcopy "refactor auth to use OAuth2 with PKCE"
▸ Scanning project... 2,847 files indexed
▸ Orchestrator dispatching to 12 agents...
▸ [α-01] analyzing auth flow [α-02] mapping dependencies
▸ [α-03] generating tests [β-01] security audit
▸ [β-02] fix oauth2 pkce impl [β-03] type inference
▸ [γ-01] validation checks [γ-02] migration scripts
▸ [δ-01] docs update [δ-02] changelog
✓ Hot copy. 14 files modified. All tests passing.
▸ METRIC 01
CONTEXT DEPTH
unbounded by design
▸ METRIC 02
50+
PARALLEL AGENTS
per single task
▸ METRIC 03
CHEAPER THAN OPUS
on the same task
▸ METRIC 04
0
API KEYS REQUIRED
fully managed
BRIEF 02 // HOW RLM WORKS
▸ ALGORITHM 1 — ZHANG ET AL. 2026

Recursive Language
Model architecture.

A root orchestrator decomposes your task. Worker agents execute in parallel. Results synthesize back. No single model sees everything — but everything gets seen.

01
PROMPT AS ENVIRONMENT

Your codebase loads as a variable in a sandboxed REPL — never into a model's context window. The orchestrator sees only metadata.
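A minimal sketch of the idea, with hypothetical names (`files`, `describeEnvironment` are stand-ins, not the real loader): the codebase sits in a plain REPL variable, and the orchestrator is handed only summary metadata.

```javascript
// Hypothetical sketch: the repo lives in a sandbox variable, never in a prompt.
// `files` stands in for whatever the real loader produces.
const files = {
  "src/auth/login.js": "export function login() { /* ... */ }",
  "src/auth/token.js": "export function refreshToken() { /* ... */ }",
  "README.md": "# demo",
};

// The orchestrator receives only metadata like this -- no file contents.
function describeEnvironment(files) {
  const paths = Object.keys(files);
  return {
    fileCount: paths.length,
    totalBytes: paths.reduce((n, p) => n + files[p].length, 0),
    topLevelDirs: [...new Set(paths.map((p) => p.split("/")[0]))],
  };
}

console.log(describeEnvironment(files));
```

The contents of `files` stay reachable only through code the root model writes against that variable.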

02
ROOT WRITES CODE

The root model generates JavaScript that runs in a Cloudflare V8 isolate. It reads slices of context, regex-filters with priors, chunks by AST.
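A hedged example of the kind of JavaScript the root model might emit inside the isolate (the regexes and variable names here are illustrative, and the function-boundary split is a crude stand-in for real AST chunking):

```javascript
// Illustrative root-generated code: slice and filter, never read everything.
const source = [
  "function login(user) { /* password flow */ }",
  "function logout(user) { /* clear session */ }",
  "function renderFooter() { /* ui */ }",
  "function refreshOAuthToken(t) { /* oauth2 */ }",
].join("\n");

// Regex-filter with a prior: keep only lines that look auth-related.
const authLines = source
  .split("\n")
  .filter((l) => /auth|login|logout|token/i.test(l));

// Chunk at function boundaries (a stand-in for proper AST chunking).
const chunks = source.split(/(?=function )/).filter((c) => c.trim());

console.log(authLines.length, chunks.length); // 3 4
```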

03
SUB-CALLS FAN OUT

llm_batch() dispatches worker agents in parallel. Each worker gets isolated context, returns a 1–2K token summary.
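The brief names `llm_batch()` but not its signature, so this sketch stubs it: each simulated worker sees only its own chunk and hands back a short summary, the way the real fan-out returns 1–2K token digests.

```javascript
// Stub of llm_batch() -- the managed call's real signature is not documented
// here. This version just simulates parallel workers over isolated contexts.
async function llm_batch(prompts) {
  return Promise.all(
    prompts.map(async ({ task, context }) => ({
      task,
      // A real worker would return a 1-2K token summary; we fake a short one.
      summary: `${task}: scanned ${context.length} chars`,
    }))
  );
}

llm_batch(
  ["function login() {}", "function refreshToken() {}"].map((c, i) => ({
    task: `worker-${i + 1}`,
    context: c,
  }))
).then((results) => console.log(results.map((r) => r.summary)));
```

The key property: no worker's context overlaps another's, so total input size is bounded only by how finely the root chunks it.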

04
ANSWER FROM VARIABLES

The result is built up across REPL turns and returned via FINAL_VAR() — bypassing every model's generation length cap.
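`FINAL_VAR()` is also named without a signature; a minimal stand-in shows the idea. The answer accrues in a REPL variable across turns and is returned by reference, so its length is bounded by sandbox memory, not by any one generation.

```javascript
// Stand-in for the managed FINAL_VAR() primitive: return a variable by
// reference instead of regenerating its contents in one model output.
const registry = {};
function FINAL_VAR(name) {
  return registry[name];
}

// Turn 1: start accumulating the answer in a variable.
registry.patch = ["--- a/src/auth/login.js"];
// Turns 2..n: each REPL turn appends another piece, so the final artifact
// can exceed any single model's generation length cap.
registry.patch.push("+++ b/src/auth/login.js", "+ // PKCE challenge added");

console.log(FINAL_VAR("patch").join("\n"));
```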

BRIEF 03 // ARCHITECTURE
▸ FULLY MANAGED STACK

Fully managed.
You just write code.

Cloudflare runs the models, orchestration, routing, and scaling on hardened infrastructure. Your source stays local. Only targeted context reaches workers in encrypted, ephemeral channels. Code is never stored or trained on.

▸ LOCAL CLI
Your machine
HOTCOPY.md
MCP servers
Test suite
◀ context · agents ▶
RLM Engine · CLOUDFLARE WORKERS
◀ results · recurse ▶
▸ MANAGED
Workers AI · root orchestrator
Workers AI · sub-call workers
D1 + Vectorize memory
Durable Objects swarm
BRIEF 04 // BENCHMARKS
▸ PAPER RESULTS

Real numbers.
Reproducible.

Recursive Language Models close the gap on long-context benchmarks where base models collapse — at median cost equal to or below the base model.

Benchmark      Input size         Base GPT-5   RLM (GPT-5)   Avg cost
BrowseComp+    6–11M tokens       0%           91.3%         $0.99
OOLONG         131K tokens        44.0%        56.5%         $0.43
OOLONG-Pairs   32K tokens         0.04%        58.0%         $0.33
CodeQA         23K–4.2M tokens    24.0%        62.0%         $0.11

▸ ZHANG, KRASKA, KHATTAB. MIT CSAIL. arXiv:2512.24601, JAN 2026.

FIG. 03

50+ agents.
One command.
One bill.

A live snapshot of the orchestrator at work. Red nodes are active scouts; tan nodes are idle workers waiting for dispatch. The swarm churns through chunks at sub-second cadence.

▸ FIG. 03.1 // SWARM TOPOLOGY ● 60 ACTIVE
ROOT ◉ · 60 ACTIVE SCOUTS DISPATCHED IN PARALLEL ▸
END OF BRIEF // RUN IT

Hot copy.
Cold start.

One npx command. Sign in. The swarm dispatches.

▸ ALPHA · NO CARD REQUIRED · MANAGED INFRA