A blog post from Percepta AI, published March 11 and bylined by researcher Christos Tzamos, poses a question that cuts to the heart of how AI agents are built: can large language models function as general-purpose computers, executing programs directly inside the transformer's forward pass rather than approximating computation through next-token prediction?
One important caveat up front: the available source material is thin. What exists publicly is a title, a byline, and a publication date. The full methodology, benchmarks, and the basis for the claimed exponential inference speedup have not been released. What follows is an account of what the claim is and why it would matter — not a verdict on whether it's correct.
The theoretical ground the work appears to stand on is well-established. Researchers have shown for years that transformer architectures can, in principle, simulate Turing-complete computation. The interesting question is whether that theoretical property can be made practically useful. If Tzamos and collaborators have found a way to <a href="/news/2026-03-14-percepta-ai-transformers-logarithmic-attention-inference">execute structured programs inside the model</a> rather than alongside it, the payoff would be a hybrid approach that bypasses much of what makes large model inference slow and expensive — the repeated forward passes, the autoregressive decoding, the overhead of routing computation to external tools.
For agent builders, that matters. Multi-step reasoning, tool orchestration, and long-horizon planning all drive up inference costs quickly. A method that offloads structured computation onto internal model representations — rather than spinning up separate inference calls — could meaningfully change the economics of deploying agentic systems at scale. An exponential speedup, if real and reproducible, would be a significant shift.
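To make that cost intuition concrete, here is a toy accounting sketch. Every function name and number below is an illustrative assumption for this article, not anything drawn from the Percepta AI post; it simply contrasts how costs scale when each decoded token requires a forward pass versus a hypothetical regime where structured computation completes within a single pass.

```python
# Toy cost model: standard autoregressive agent inference vs. a hypothetical
# in-model program execution. All figures are illustrative assumptions.

def autoregressive_cost(steps: int, tokens_per_step: int,
                        cost_per_pass: float = 1.0) -> float:
    """Each decoded token costs one forward pass, and a multi-step agent
    repeats decoding for every reasoning or tool-use step."""
    return steps * tokens_per_step * cost_per_pass

def in_model_cost(cost_per_pass: float = 1.0, passes: int = 1) -> float:
    """If structured computation ran inside the forward pass itself, cost
    would scale with the number of passes, not with tokens emitted."""
    return passes * cost_per_pass

agent_steps = 20        # reasoning / tool-call steps in one task (assumed)
tokens_per_step = 200   # tokens decoded per step (assumed)

baseline = autoregressive_cost(agent_steps, tokens_per_step)
hypothetical = in_model_cost()
print(f"autoregressive: {baseline:.0f} forward passes")
print(f"in-model regime: {hypothetical:.0f} forward pass(es)")
print(f"ratio: {baseline / hypothetical:.0f}x")
```

With these assumed numbers the gap is 4,000 forward passes to one, and it widens with every additional agent step — which is why a verified speedup of the kind claimed would matter so much for agentic workloads.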
The honest state of play right now: none of that can be confirmed. Percepta AI is not a widely covered lab, and this post alone does not establish Tzamos's research background. When the paper or technical report surfaces, the exponential speedup claim and the benchmarks supporting it will face scrutiny. Until then, this is a claim worth tracking, not a result worth citing.