POST 001·RESULT·JUNE 2026

More Capability per Parameter, Measured

Arch Research

ABSTRACT

The usual way to make an AI model better is to make it bigger. More parameters, more layers, more compute. It works, but it is expensive, and it has quietly become the only strategy the field bothers with. We think that is a mistake, and we ran an experiment to show why.

We built two models and put them head to head on the same reasoning task. One was ours. The other was a standard fixed-depth model, the conventional design, given more than twice as many parameters and roughly four times as much computation per step. By every assumption the industry runs on, the bigger, heavier model should win.

It did not.

The result

Both models were trained on the same problems and then tested on harder ones they had never seen, which is the honest way to tell reasoning apart from memorization. On those held-out problems, our smaller model generalized better.

Our model (smaller, cheaper): 69% correct on held-out problems

Standard model (2.3x bigger): 65% correct on held-out problems

The gap widened as the problems got harder. On the longest, most difficult problems we tested, the larger model's accuracy collapsed while ours held up, reaching nearly three times the larger model's accuracy. To be confident this was real and not a lucky run, we repeated the whole experiment across multiple random seeds. Our model won every time.
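For readers who want to run this kind of check themselves, here is a minimal sketch of the evaluation logic described above. It trains on shorter problems, scores on longer unseen ones, breaks accuracy down by length, and confirms that one model wins on every seed. The post does not publish its evaluation code. The function names, the `length` field, and the length cutoff are hypothetical, and so is the assumption that difficulty is measured by problem length (suggested by the mention of length above).

```python
from collections import defaultdict


def split_by_length(problems, max_train_length):
    """Hypothetical split: train on problems up to a length cutoff, hold out the longer ones."""
    train = [p for p in problems if p["length"] <= max_train_length]
    held_out = [p for p in problems if p["length"] > max_train_length]
    return train, held_out


def accuracy_by_length(records):
    """records: (length, is_correct) pairs for one model on held-out problems.

    Returns {length: accuracy}, so a collapse at the hardest lengths is visible
    instead of being averaged away.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for length, ok in records:
        totals[length] += 1
        correct[length] += int(ok)
    return {length: correct[length] / totals[length] for length in sorted(totals)}


def wins_on_every_seed(ours, baseline):
    """ours, baseline: held-out accuracy per seed, listed in the same seed order.

    True only if our model is strictly ahead on every seed, not just on average.
    """
    if len(ours) != len(baseline) or not ours:
        raise ValueError("need one accuracy per seed for each model")
    return all(a > b for a, b in zip(ours, baseline))
```

Requiring a win on every seed is a stricter test than comparing mean accuracy, because a single lucky seed cannot carry the result.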

A model with less than half the parameters (2.3 times fewer) and about a quarter of the computation per step generalized better than the conventional design. Smaller and cheaper, and still ahead.
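To make the ratios concrete, here is a quick check that uses only the figures reported above. "2.3 times fewer parameters" works out to about 43% of the baseline's parameters, and "about 4 times less computation" to about 25% of its compute per step. The variable names are illustrative, and the compute ratio is approximate, as the post states.

```python
# Figures reported in the post; the standard (baseline) model is the reference.
param_ratio = 2.3    # baseline parameters / our parameters
compute_ratio = 4.0  # baseline compute per step / ours (approximate)
acc_ours, acc_base = 0.69, 0.65  # held-out accuracy

param_fraction = 1 / param_ratio      # our share of the baseline's parameters
compute_fraction = 1 / compute_ratio  # our share of the baseline's compute per step
gap_points = round((acc_ours - acc_base) * 100, 1)  # accuracy gap in percentage points

print(f"params: {param_fraction:.0%} of baseline")    # params: 43% of baseline
print(f"compute: {compute_fraction:.0%} of baseline")  # compute: 25% of baseline
print(f"held-out gap: {gap_points} points")            # held-out gap: 4.0 points
```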

Why this matters

This is the heart of what we are building toward. If a smaller, cheaper model can match or beat a larger one, then capability is not only about size. It is also about design. And design is something a small team can improve without a billion-dollar budget.

We are careful about what this does and does not show. The result is measured on a structured reasoning task at small scale, not on the open-ended messiness of real language. What it confirms is this head-to-head comparison, not a sweeping claim that our approach always scales perfectly. The honest next step is to run the same comparison at larger scale on real data, which is exactly the experiment we are working toward.

But the direction is clear, and it is measured, and anyone technical can check it. The future of AI does not have to be a spending race that only a few companies can run. There is another road, where you get more out of each part of the model instead of just adding more parts. We have taken the first measured steps down it, and a smaller model beating a bigger one is the clearest sign yet that the road is real.

CITE

Arch Research (2026). A Smaller Model That Beats a Bigger One. Arch Research.