ARCH RESEARCH
← RESEARCH
WRITEUP 001·THESIS·JUNE 2026

Why Fund Arch

Arch Research
ABSTRACT

The cost of building frontier AI is rising faster than almost any technology in history. The compute cost of the largest training runs is doubling roughly every eight months — about 2.4x per year. The leading labs already run training jobs that cost on the order of a billion dollars, and industry leaders openly project ten- to hundred-billion-dollar models within a few years, with total sector spending approaching a trillion dollars by 2027.

That trajectory is decided by capital, and it is not a race we can win by joining it. Arch is built on a different premise: efficiency per parameter. Rather than buying capability with ever-larger models and ever-larger budgets, we extract more capability from each parameter and each unit of compute. This document presents what we have measured — on a single consumer GPU, for under $50 — and what a grant would let us prove at scale.

We do not claim to rival a frontier model. We claim something more fundable: a measured, reproducible efficiency advantage, with every result produced by a named script and re-runnable by a reviewer. Negative results are retained, not discarded.


The thesis: intelligence per parameter

The frontier developers scale by adding parameters, data, and dollars. We pursue a different lever — getting more out of each parameter that already exists.

Our architecture, RDA (Recurrent Depth-Adaptive), lets a model reason in iterative rounds and decide for itself when to stop, allocating more computation to difficult inputs and less to easy ones. The result is higher capability and lower runtime cost per parameter.

ApproachThe leverThe result
The contendersMore parameters + more data + more dollarsCapability
ArchA better mechanism per parameterCapability per dollar

What we have proven

Every result below was produced on a consumer RTX 4060 for between $0 and $5, and is reproducible from the underlying scripts.

ClaimMeasured resultMethod
Adapts computation to difficultySpearman correlation of 0.99Problem difficulty vs. rounds used
Cheaper than fixed-depth, equal accuracy42% average compute saved (up to 75% on easy inputs), at full accuracyAdaptive halting, holds 5M–20M params
Outperforms the standard published method1.000 ± 0.000 vs. the baseline's 0.977 ± 0.0588 seeds each, identical training
Reasons beyond its training distributionSolves chains 2.5x longer than trained, by thinking longer3 seeds
Capability rises with scaleHeld-out reasoning 0.33 to 0.43 (2.2M to 14.6M params)3 seeds each, non-overlapping error bars
Generalizes across task typesOne model handles arithmetic and transitive logic, both generalizingSingle model, balanced data
Deterministic and reliable8 of 8 seeds converge; byte-identical rerunsDeterminism stress test

The efficiency advantage, quantified

RDA spends computation in proportion to difficulty. Measured against a fixed-depth model that always runs at full depth, on identical problems at full accuracy:

Easy (length 4)75% compute saved
Medium (length 8)59% compute saved
Hard (length 12)42% compute saved
Harder (length 16)26% compute saved
Hardest (length 20)9% compute saved
Average42% compute saved

On a realistic mixed workload, where most queries are easy, RDA uses roughly half the computation of a fixed-depth model for the same output. In commercial terms, that is a direct reduction in cost per query — and because it compounds at inference for the life of every deployment, it is a margin advantage that scaling parameters alone cannot produce. This matters precisely because the industry's own leaders now warn that inference, not training, is where total spending heads toward a trillion dollars.


Capability scales with size

This is the central result for an investor. We trained the same recipe at increasing model sizes and measured accuracy on reasoning chains longer than any seen during training — a test of genuine generalization, not memorization.

2.2M parameters0.329 ± 0.016
14.6M parameters0.433 ± 0.048

A 6.6x increase in parameters produced a measured gain of +0.10 in held-out reasoning accuracy, with non-overlapping error bars across three seeds — a real effect, not noise. The model's reasoning also became better-calibrated at the larger size. This is the concrete evidence that capital converts to capability: more parameters yield measurably more reasoning, on a task with genuine headroom.

We state the limit plainly: this is a two-point trend, measured locally on a structured reasoning task, and real-world scaling exhibits diminishing returns. We do not extrapolate it to a straight line. Extending this curve with real data at full scale is precisely the experiment a grant funds — and the point at which our per-parameter efficiency becomes the decisive advantage.


Why efficiency is the only viable wedge

Even the small open models released by major labs are products of enormous data budgets — compact models in the one-to-two-billion-parameter range are routinely trained on eleven to eighteen trillion tokens. A new entrant cannot replicate that data and compute scale, and should not try. The opening is not to out-spend the incumbents on scale, but to out-engineer them on efficiency: a mechanism that delivers more capability per parameter is worth more, per dollar, at every size — and that advantage only grows as compute becomes the binding constraint on the entire industry.


How funding would be deployed

We allocate capital the way the thesis demands: for maximum result per dollar.

AllocationSharePurposeRationale
Training data35%Quality tokens at scaleThe proven bottleneck is data, not architecture
Compute35%GPU hours; scale RDA from 100M to 1–3B across many runsEach run extends the scaling curve above
Senior research hire20%Approximately one researcher-yearCompounds output beyond any single GPU
Runway and operations10%Entity, legal, 12–18 month bufferRemoves pressure to raise on poor terms

Funds are spent across many small, measured runs rather than a single large gamble — every result de-risks the next dollar.


Why this is a low-risk grant

- The hard problems are already solved cheaply. Efficiency, generalization, the scaling lift, and reliability were established for near-zero cost on consumer hardware. That capital discipline is how we would steward grant funds. - Every claim is a number against a fair baseline. No hype, no claims of general intelligence, negative results logged. Any reviewer can reproduce any result from the underlying script. - The next milestone is legible and cheap. The decisive scale test costs single-digit dollars, not millions. A grant funds a clearly defined experiment with a measurable pass/fail criterion, not an open-ended bet.


What remains unproven

We state our open questions directly, because credibility depends on it.

Open questionWhy it mattersCost to answer
Does the efficiency edge hold at 100M+ parameters on real language?The thesis at scaleA funded run on real data
Does a small RDA match a model several times its size?The core intelligence-per-parameter claimScale run plus a fair larger baseline
Does the approach extend to a general assistant?The product directionA larger run on real data

The mechanism is proven. Funding proves it at scale. That is the bet — and it is a measured one.

REFERENCES
  1. [1]Epoch AI — "Training compute costs are doubling every eight months for the largest AI models." Frontier training cost growth of ~2.4x per year.
  2. [2]Epoch AI — "The rising costs of training frontier AI models" (arXiv:2405.21015). Largest runs projected to exceed $1B by 2027.
  3. [3]Dario Amodei (Anthropic CEO) — current frontier models cost ~$100M; billion-dollar runs already in development; $10B–$100B models and ~$1T sector inference spend projected by 2027.
  4. [4]SmolLM2 — "When Smol Goes Big: Data-Centric Training of a Small Language Model" (arXiv:2502.02737). A 1.7B model trained on ~11 trillion tokens.
  5. [5]Qwen2.5 Technical Report — compact open models trained on up to 18 trillion tokens.

All Arch figures are reproducible from the project's research scripts and findings. Total spent to date: under $50, on a consumer GPU.

CITE
Arch Research (2026). Why Fund Arch. Arch Research.