Arch
Models

Every Arch model is scored on ARS-S — one reasoning scale from 0 to 100, plotted against model size.

How models are scored

Questions are created fresh at the moment of evaluation, never drawn from a fixed list. Nothing the model could have seen in training appears on the test, so a score reflects reasoning, not memorization. We grade only the final answer.
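As a rough illustration of that idea, the sketch below builds a question at evaluation time and checks only the final answer. The names `make_question` and `grade` are hypothetical, and the simple addition task is an assumption chosen for brevity; it is not the actual ARS-S question generator.

```python
import random

def make_question(rng):
    """Build a new addition question and its expected answer (hypothetical task)."""
    a = rng.randint(100, 999)
    b = rng.randint(100, 999)
    return f"What is {a} + {b}?", str(a + b)

def grade(final_answer, expected):
    """Compare only the final answer; any intermediate work is ignored."""
    return final_answer.strip() == expected

# An unseeded generator yields different questions on each run,
# so there is no fixed list to memorize.
rng = random.Random()
prompt, expected = make_question(rng)
print(prompt)
```

Because the operands are drawn when the evaluation runs, the only way to pass `grade` consistently is to compute the answer.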

The score spans five tiers of increasing difficulty — recall, arithmetic, symbolic, compositional, and abstract. Harder tiers are weighted more, so a model that only handles the easy ones lands low. The ceiling is set far above current Arch models on purpose.
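One way to picture the weighting is a weighted average over the five tiers, scaled to 0 to 100. The weights below (1 through 5) are assumptions made up for this sketch, as is the function name `weighted_score`; the real ARS-S weights are not stated here.

```python
TIERS = ["recall", "arithmetic", "symbolic", "compositional", "abstract"]

# Hypothetical weights, increasing with difficulty. Not the actual ARS-S values.
WEIGHTS = {"recall": 1, "arithmetic": 2, "symbolic": 3,
           "compositional": 4, "abstract": 5}

def weighted_score(accuracy):
    """Map per-tier accuracy (fractions in [0, 1]) to a 0-100 score."""
    total = sum(WEIGHTS.values())
    earned = sum(WEIGHTS[t] * accuracy.get(t, 0.0) for t in TIERS)
    return 100.0 * earned / total

# A model that is perfect on the two easiest tiers and fails the rest.
easy_only = {"recall": 1.0, "arithmetic": 1.0}
print(weighted_score(easy_only))  # 20.0
```

Under these assumed weights, a model that aces only recall and arithmetic earns 3 of 15 weight points, which is why it lands low on the scale.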

Score against size