WRITEUP 001·THESIS·JUNE 2026

Why Fund Arch

Arch Research

ABSTRACT

The cost of building frontier AI is rising faster than almost any technology in history. The compute cost of the largest training runs is doubling roughly every eight months — about 2.4x per year. The leading labs already run training jobs that cost on the order of a billion dollars, and industry leaders openly project ten- to hundred-billion-dollar models within a few years, with total sector spending approaching a trillion dollars by 2027.

That trajectory is decided by capital, and it is not a race we can win by joining it. Arch is built on a different premise: efficiency per parameter. Rather than buying capability with ever-larger models and ever-larger budgets, we extract more capability from each parameter and each unit of compute. This document presents what we have measured — on a single consumer GPU, for under $50 — and what a grant would let us prove at scale.

We do not claim to rival a frontier model. We claim something more fundable: a measured, reproducible efficiency advantage, with every result produced by a named script and re-runnable by a reviewer. Negative results are retained, not discarded.

The thesis: intelligence per parameter

The frontier developers scale by adding parameters, data, and dollars. We pursue a different lever — getting more out of each parameter that already exists.

Our architecture, RDA (Recurrent Depth-Adaptive), lets a model reason in iterative rounds and decide for itself when to stop, allocating more computation to difficult inputs and less to easy ones. The result is higher capability and lower runtime cost per parameter.

Approach	The lever	The result
The contenders	More parameters + more data + more dollars	Capability
Arch	A better mechanism per parameter	Capability per dollar

What we have proven

Every result below was produced on a consumer RTX 4060 for between $0 and $5, and is reproducible from the underlying scripts.

Claim	Measured result	Method
Adapts computation to difficulty	Spearman correlation of 0.99	Problem difficulty vs. rounds used
Cheaper than fixed-depth, equal accuracy	42% average compute saved (up to 75% on easy inputs), at full accuracy	Adaptive halting, holds 5M–20M params
Outperforms the standard published method	1.000 ± 0.000 vs. the baseline's 0.977 ± 0.058	8 seeds each, identical training
Reasons beyond its training distribution	Solves chains 2.5x longer than trained, by thinking longer	3 seeds
Capability rises with scale	Held-out reasoning 0.33 to 0.43 (2.2M to 14.6M params)	3 seeds each, non-overlapping error bars
Generalizes across task types	One model handles arithmetic and transitive logic, both generalizing	Single model, balanced data
Deterministic and reliable	8 of 8 seeds converge; byte-identical reruns	Determinism stress test

The efficiency advantage, quantified

RDA spends computation in proportion to difficulty. Measured against a fixed-depth model that always runs at full depth, on identical problems at full accuracy:

Easy (length 4)75% compute saved

Medium (length 8)59% compute saved

Hard (length 12)42% compute saved

Harder (length 16)26% compute saved

Hardest (length 20)9% compute saved

Average42% compute saved

On a realistic mixed workload, where most queries are easy, RDA uses roughly half the computation of a fixed-depth model for the same output. In commercial terms, that is a direct reduction in cost per query — and because it compounds at inference for the life of every deployment, it is a margin advantage that scaling parameters alone cannot produce. This matters precisely because the industry's own leaders now warn that inference, not training, is where total spending heads toward a trillion dollars.

Capability scales with size

This is the central result for an investor. We trained the same recipe at increasing model sizes and measured accuracy on reasoning chains longer than any seen during training — a test of genuine generalization, not memorization.

2.2M parameters0.329 ± 0.016

14.6M parameters0.433 ± 0.048

A 6.6x increase in parameters produced a measured gain of +0.10 in held-out reasoning accuracy, with non-overlapping error bars across three seeds — a real effect, not noise. The model's reasoning also became better-calibrated at the larger size. This is the concrete evidence that capital converts to capability: more parameters yield measurably more reasoning, on a task with genuine headroom.

We state the limit plainly: this is a two-point trend, measured locally on a structured reasoning task, and real-world scaling exhibits diminishing returns. We do not extrapolate it to a straight line. Extending this curve with real data at full scale is precisely the experiment a grant funds — and the point at which our per-parameter efficiency becomes the decisive advantage.

Why efficiency is the only viable wedge

Even the small open models released by major labs are products of enormous data budgets — compact models in the one-to-two-billion-parameter range are routinely trained on eleven to eighteen trillion tokens. A new entrant cannot replicate that data and compute scale, and should not try. The opening is not to out-spend the incumbents on scale, but to out-engineer them on efficiency: a mechanism that delivers more capability per parameter is worth more, per dollar, at every size — and that advantage only grows as compute becomes the binding constraint on the entire industry.

How funding would be deployed

We allocate capital the way the thesis demands: for maximum result per dollar.

Allocation	Share	Purpose	Rationale
Training data	35%	Quality tokens at scale	The proven bottleneck is data, not architecture
Compute	35%	GPU hours; scale RDA from 100M to 1–3B across many runs	Each run extends the scaling curve above
Senior research hire	20%	Approximately one researcher-year	Compounds output beyond any single GPU
Runway and operations	10%	Entity, legal, 12–18 month buffer	Removes pressure to raise on poor terms

Funds are spent across many small, measured runs rather than a single large gamble — every result de-risks the next dollar.

Why this is a low-risk grant

- The hard problems are already solved cheaply. Efficiency, generalization, the scaling lift, and reliability were established for near-zero cost on consumer hardware. That capital discipline is how we would steward grant funds. - Every claim is a number against a fair baseline. No hype, no claims of general intelligence, negative results logged. Any reviewer can reproduce any result from the underlying script. - The next milestone is legible and cheap. The decisive scale test costs single-digit dollars, not millions. A grant funds a clearly defined experiment with a measurable pass/fail criterion, not an open-ended bet.

What remains unproven

We state our open questions directly, because credibility depends on it.

Open question	Why it matters	Cost to answer
Does the efficiency edge hold at 100M+ parameters on real language?	The thesis at scale	A funded run on real data
Does a small RDA match a model several times its size?	The core intelligence-per-parameter claim	Scale run plus a fair larger baseline
Does the approach extend to a general assistant?	The product direction	A larger run on real data

The mechanism is proven. Funding proves it at scale. That is the bet — and it is a measured one.

REFERENCES

All Arch figures are reproducible from the project's research scripts and findings. Total spent to date: under $50, on a consumer GPU.

CITE

Arch Research (2026). Why Fund Arch. Arch Research.