ARCH RESEARCH
POST 003·NEGATIVE RESULT·JUNE 2026

A Model That Could Talk but Could Not Add

Arch Research
ABSTRACT

Most companies only show you what worked. We think the failures are at least as informative, and hiding them is how a field fools itself. So here is one of ours, in detail.

We trained a model with about 100 million parameters and asked it to do two things: answer questions about itself, and solve grade-school arithmetic. The first part went well. Ask it its name, ask it who made it, ask the same question reworded in ways it had never seen, and it answered correctly and consistently. That part was genuinely encouraging.

Then we asked it to add two numbers.

What went wrong

It produced something that looked like reasoning and was complete nonsense. It would write out the shape of a worked solution, with steps and numbers in the right places, but the actual arithmetic was wrong and the content was invented. It had learned the form of solving a problem without learning to actually solve one.

This is a known and important failure mode, and seeing it up close taught us exactly where the line is. A model can imitate the surface of reasoning long before it can reason. If we had only tested it on problems similar to its training, it might have looked fine. Testing it honestly, on real arithmetic, showed the truth.
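The gap between form and correctness can be measured directly by scoring the two separately. What follows is a minimal sketch, not the harness we used: the function names, the prompt wording, the operand range, and the `imitator` stand-in are all assumptions made for illustration. It checks whether an output contains a final number at all (form) and whether that number is the true sum (correctness).

```python
import random
import re


def make_problems(n, lo, hi, seed=0):
    """Generate n addition problems with operands drawn from [lo, hi]."""
    rng = random.Random(seed)
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]


def final_number(text):
    """Return the last integer in the output, or None if there is none."""
    matches = re.findall(r"-?\d+", text)
    return int(matches[-1]) if matches else None


def evaluate(model_fn, problems):
    """Score form (a final number exists) and correctness (it is the true sum)."""
    well_formed = 0
    correct = 0
    for a, b in problems:
        output = model_fn(f"What is {a} + {b}?")
        answer = final_number(output)
        if answer is not None:
            well_formed += 1
        if answer == a + b:
            correct += 1
    n = len(problems)
    return {"well_formed": well_formed / n, "correct": correct / n}


def imitator(prompt):
    """Hypothetical stand-in for a model: right shape, wrong arithmetic."""
    a, b = map(int, re.findall(r"\d+", prompt))
    return f"Step 1: take {a}. Step 2: add {b}. Answer: {a + b + 1}"


# Operands are sampled fresh here; in a real run they should also be
# checked against the training set to confirm they are held out.
scores = evaluate(imitator, make_problems(200, 100, 999))
print(scores)
```

A model that has only learned the form scores high on `well_formed` and low on `correct`. Reporting both numbers side by side makes that gap hard to miss.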

It learned what a solution looks like without learning how to produce a correct one. The imitation was good. The reasoning was absent.

What it taught us

The instinct in AI is to blame the architecture when a model fails. We looked carefully and concluded the opposite. This was not a flaw in the design. It was a data problem. The model had not seen nearly enough correct examples to learn real computation. The leading small models in the field are trained on trillions of tokens, and ours saw a tiny fraction of that.

That distinction matters enormously for what we do next. If the ceiling were the architecture, more data would not help and we would be stuck. Because the ceiling was data, the path forward is clear: the mechanism is sound, and it needs to be fed properly at scale. Knowing which of those two it is, for certain, is worth more than a model that happened to work without us understanding why.

We are sharing this because a research company that only ever reports wins is one you should not trust. The failures are where the real understanding comes from, and we would rather show you ours than pretend we do not have any.

CITE
Arch Research (2026). A Model That Could Talk but Could Not Add. Arch Research.