RLM ARCHITECTURE
▸ DEMO BRIEF #001 / SECTOR: DEV-INFRA / 2026.05
A two-tier, Cloudflare-native agentic CLI built on Recursive Language Models.

SO WE KILLED
THE CONTEXT
WINDOW.

▸ MISSION SUMMARY

Managed recursive AI coding CLI. The root model writes code; agent swarms decompose impossible problems in parallel.

No context limits. No API keys. No config. The prompt is the environment, not the input.

FIG. 01.2 // PARALLEL EXECUTION
α-01 scan auth/ · 45%
α-02 map imports · 31%
γ-01 gen tests · 77%
β-01 audit CVEs · 59%
β-02 fix oauth2 · 33%
δ-02 validate · 80%
FIG. 01.1 // LIVE DISPATCH ● RECORDING
~ hotcopy-term //
$ npx hotcopy login
▸ Opening browser... signing in.
$ hotcopy "refactor auth to use OAuth2 with PKCE"
▸ Scanning project... 2,847 files indexed
▸ Orchestrator dispatching to 12 agents...
▸ [α-01] analyzing auth flow [α-02] mapping dependencies
▸ [α-03] generating tests [β-01] security audit
▸ [β-02] fix oauth2 pkce impl [β-03] type inference
▸ [γ-01] validation checks [γ-02] migration scripts
▸ [δ-01] docs update [δ-02] changelog
✓ Hot copy. 14 files modified. All tests passing.
▸ METRIC 01
CONTEXT DEPTH
unbounded by design
▸ METRIC 02
50+
PARALLEL AGENTS
per single task
▸ METRIC 03
CHEAPER THAN OPUS
on the same task
▸ METRIC 04
0
API KEYS REQUIRED
fully managed
BRIEF 02 // HOW RLM WORKS
▸ ALGORITHM 1 — ZHANG ET AL. 2026

Recursive Language
Model architecture.

A root orchestrator decomposes your task. Worker agents execute in parallel. Results synthesize back. No single model sees everything — but everything gets seen.

01
PROMPT AS ENVIRONMENT

Your codebase loads as a variable in a sandboxed REPL — never into a model's context window. The orchestrator sees only metadata.
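A minimal sketch of the idea, with hypothetical names (`files`, `describeEnvironment` are stand-ins, not the real loader): the codebase sits in a plain REPL variable, and the orchestrator is handed only summary metadata.

```javascript
// Hypothetical sketch: the repo lives in a sandbox variable, never in a prompt.
// `files` stands in for whatever the real loader produces.
const files = {
  "src/auth/login.js": "export function login() { /* ... */ }",
  "src/auth/token.js": "export function refreshToken() { /* ... */ }",
  "README.md": "# demo",
};

// The orchestrator receives only metadata like this -- no file contents.
function describeEnvironment(files) {
  const paths = Object.keys(files);
  return {
    fileCount: paths.length,
    totalBytes: paths.reduce((n, p) => n + files[p].length, 0),
    topLevelDirs: [...new Set(paths.map((p) => p.split("/")[0]))],
  };
}

console.log(describeEnvironment(files));
```

The contents of `files` stay reachable only through code the root model writes against that variable.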

02
ROOT WRITES CODE

The root model generates JavaScript that runs in a Cloudflare V8 isolate. It reads slices of context, regex-filters with priors, chunks by AST.
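A hedged example of the kind of JavaScript the root model might emit inside the isolate (the regexes and variable names here are illustrative, and the function-boundary split is a crude stand-in for real AST chunking):

```javascript
// Illustrative root-generated code: slice and filter, never read everything.
const source = [
  "function login(user) { /* password flow */ }",
  "function logout(user) { /* clear session */ }",
  "function renderFooter() { /* ui */ }",
  "function refreshOAuthToken(t) { /* oauth2 */ }",
].join("\n");

// Regex-filter with a prior: keep only lines that look auth-related.
const authLines = source
  .split("\n")
  .filter((l) => /auth|login|logout|token/i.test(l));

// Chunk at function boundaries (a stand-in for proper AST chunking).
const chunks = source.split(/(?=function )/).filter((c) => c.trim());

console.log(authLines.length, chunks.length); // 3 4
```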

03
SUB-CALLS FAN OUT

llm_batch() dispatches worker agents in parallel. Each worker gets isolated context, returns a 1–2K token summary.
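The brief names `llm_batch()` but not its signature, so this sketch stubs it: each simulated worker sees only its own chunk and hands back a short summary, the way the real fan-out returns 1–2K token digests.

```javascript
// Stub of llm_batch() -- the managed call's real signature is not documented
// here. This version just simulates parallel workers over isolated contexts.
async function llm_batch(prompts) {
  return Promise.all(
    prompts.map(async ({ task, context }) => ({
      task,
      // A real worker would return a 1-2K token summary; we fake a short one.
      summary: `${task}: scanned ${context.length} chars`,
    }))
  );
}

llm_batch(
  ["function login() {}", "function refreshToken() {}"].map((c, i) => ({
    task: `worker-${i + 1}`,
    context: c,
  }))
).then((results) => console.log(results.map((r) => r.summary)));
```

The key property: no worker's context overlaps another's, so total input size is bounded only by how finely the root chunks it.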

04
ANSWER FROM VARIABLES

The result is built up across REPL turns and returned via FINAL_VAR() — bypassing every model's generation length cap.
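`FINAL_VAR()` is also named without a signature; a minimal stand-in shows the idea. The answer accrues in a REPL variable across turns and is returned by reference, so its length is bounded by sandbox memory, not by any one generation.

```javascript
// Stand-in for the managed FINAL_VAR() primitive: return a variable by
// reference instead of regenerating its contents in one model output.
const registry = {};
function FINAL_VAR(name) {
  return registry[name];
}

// Turn 1: start accumulating the answer in a variable.
registry.patch = ["--- a/src/auth/login.js"];
// Turns 2..n: each REPL turn appends another piece, so the final artifact
// can exceed any single model's generation length cap.
registry.patch.push("+++ b/src/auth/login.js", "+ // PKCE challenge added");

console.log(FINAL_VAR("patch").join("\n"));
```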

BRIEF 03 // ARCHITECTURE
▸ FULLY MANAGED STACK

Fully managed.
You just write code.

Cloudflare runs the models, orchestration, routing, and scaling on hardened infrastructure. Your source stays local. Only targeted context reaches workers in encrypted, ephemeral channels. Code is never stored or trained on.

▸ LOCAL CLI
Your machine
HOTCOPY.md
MCP servers
Test suite
◀ context · agents ▶
RLM Engine · CLOUDFLARE WORKERS
◀ results · recurse ▶
▸ MANAGED
Workers AI · root orchestrator
Workers AI · sub-call workers
D1 + Vectorize memory
Durable Objects swarm
BRIEF 04 // BENCHMARKS
▸ PAPER RESULTS

Real numbers.
Reproducible.

Recursive Language Models close the gap on long-context benchmarks where base models collapse — at median cost equal to or below the base model.

Benchmark      Input size         Base GPT-5   RLM (GPT-5)   Avg cost
BrowseComp+    6–11M tokens       0%           91.3%         $0.99
OOLONG         131K tokens        44.0%        56.5%         $0.43
OOLONG-Pairs   32K tokens         0.04%        58.0%         $0.33
CodeQA         23K–4.2M tokens    24.0%        62.0%         $0.11

▸ ZHANG, KRASKA, KHATTAB. MIT CSAIL. arXiv:2512.24601, JAN 2026.

FIG. 03

50+ agents.
One command.
One bill.

A live snapshot of the orchestrator at work. Red nodes are active scouts; tan nodes are idle workers waiting for dispatch. The swarm churns through chunks at sub-second cadence.

▸ FIG. 03.1 // SWARM TOPOLOGY ● 60 ACTIVE
ROOT ◉ · 60 ACTIVE SCOUTS DISPATCHED IN PARALLEL ▸
END OF BRIEF // RUN IT

Hot copy.
Cold start.

One npx command. Sign in. The swarm dispatches.

▸ ALPHA · NO CARD REQUIRED · MANAGED INFRA