POST 004 · EFFICIENCY · JUNE 2026

Spending Computation Only Where It Is Needed

Arch Research

ABSTRACT

A standard AI model does the same amount of work on every input. Whether you ask it something trivial or something genuinely hard, it runs through the same fixed number of steps before answering. It is a little like a person who spends exactly the same amount of time on "what is two plus two" as on a difficult proof. It works, but it wastes an enormous amount of effort on the easy cases.

We built our models to do the obvious thing instead: spend more computation on hard inputs and less on easy ones, and decide for themselves which is which.
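The post does not say how the model decides when it has done enough work. One common family of techniques is early exit, in the spirit of adaptive computation time. After each step the model computes a halting signal and stops as soon as that signal clears a threshold. If the signal never clears it, the model falls back to the full step budget.

What follows is a toy sketch under stated assumptions, not Arch Research's method. `halting_score`, `difficulty`, the 0.95 threshold, and the 10-step budget are all hypothetical.

```python
import math

FIXED_BUDGET = 10  # hypothetical: steps a fixed-effort model always runs


def halting_score(step: int, difficulty: float) -> float:
    """Hypothetical halting signal: confidence that the answer is ready.

    It rises toward 1.0 with each step and rises more slowly when
    difficulty is larger. A real model would compute this from its own state.
    """
    return 1.0 - math.exp(-step / difficulty)


def steps_needed(difficulty: float, max_steps: int = FIXED_BUDGET,
                 threshold: float = 0.95) -> int:
    """Early exit: stop at the first step whose halting score clears threshold."""
    for step in range(1, max_steps + 1):
        if halting_score(step, difficulty) >= threshold:
            return step
    return max_steps  # never confident enough: spend the full budget


for d in (1.0, 2.0, 8.0):
    used = steps_needed(d)
    saved = 1.0 - used / FIXED_BUDGET
    print(f"difficulty={d}: {used} of {FIXED_BUDGET} steps, {saved:.0%} saved")
```

With these made-up settings the toy uses 3, 6, and 10 steps, saving 70%, 40%, and 0%. It illustrates the shape of the behavior, not the post's measurements. The cap keeps a hard input from running indefinitely, which is why savings shrink toward the hardest inputs.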

What we measured

We measured how much computation our approach uses on problems of increasing difficulty, compared with a conventional fixed-effort model at the same accuracy. The savings were large, and they shrank as difficulty grew, exactly as they should.

Easy problems: up to 75% less computation

Typical workload: about 42% less on average

Hardest problems: only 9% less, as these genuinely need the effort

On easy problems, our model used as little as a quarter of the computation. On the hardest ones, it used almost as much as the standard model. On a realistic mix of work, where most questions are easy, it cut computation by about 42%, with no loss in accuracy.
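The workload-level figure is a weighted result. It compares the total computation over the whole mix with what the fixed-effort model would spend on the same inputs. It therefore lands between the easy-case and hard-case savings, and it moves closer to the easy end the more easy inputs the mix contains.

This is a minimal sketch, assuming every input's cost is measured in the same unit and that the baseline spends the same fixed cost on each input. The function name and the sample costs are hypothetical, not the post's data.

```python
from typing import Sequence


def workload_savings(adaptive_costs: Sequence[float], fixed_cost: float) -> float:
    """Fraction of compute saved on a workload versus a fixed-effort baseline.

    adaptive_costs: compute the adaptive model spent on each input.
    fixed_cost: compute the baseline spends on every input (same for all).
    """
    if not adaptive_costs:
        raise ValueError("workload is empty")
    if fixed_cost <= 0:
        raise ValueError("fixed_cost must be positive")
    total_adaptive = sum(adaptive_costs)
    total_fixed = fixed_cost * len(adaptive_costs)
    return 1.0 - total_adaptive / total_fixed


# Made-up mix: three easy inputs, one medium, one hard; baseline cost 100 each.
print(f"{workload_savings([25, 25, 25, 60, 91], 100):.1%}")  # prints 54.8%
```

Changing the mix changes the answer even when the per-input savings stay the same. A workload-level number is only meaningful alongside the mix it was measured on.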

We also checked that the effort it spends is tied to difficulty rather than random: across many problems, the harder the input, the more computation the model spent on it, and the two were almost perfectly correlated.
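The post does not say which correlation statistic it used. This is a minimal sketch of one standard choice, the Pearson coefficient, computed between a per-input difficulty score and the computation spent on that input. The difficulty scores and step counts here are hypothetical.

A value near +1 means effort rises almost linearly with difficulty. If the relationship is monotonic but not linear, a rank-based measure such as Spearman's would fit better.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / (spread_x * spread_y)


# Hypothetical per-input measurements.
difficulty = [1, 2, 3, 4, 5]   # e.g. a difficulty rating per problem
steps_used = [2, 3, 5, 8, 10]  # computation the model spent on each problem
print(round(pearson(difficulty, steps_used), 3))
```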

The model worked hard on hard problems and barely at all on easy ones, and got the same answers as a model that worked hard on everything.

Why it matters

Computation is the single largest cost in running an AI model, and it is paid every single time the model is used. A method that cuts that cost by about 42% on typical workloads, without giving up accuracy, is not a minor optimization. It is a structural advantage that compounds over the entire life of a deployment.

The whole industry is currently worried about the runaway cost of running AI at scale. Our answer is not to buy more hardware. It is to waste less of it, by spending computation only where it actually does something. We measured that this works at small scale. Proving it holds at the scale where the cost really bites is exactly the next experiment we are working toward.

CITE

Arch Research (2026). Spending Computation Only Where It Is Needed. Arch Research.