Percepta published a technical blog post on March 11 claiming that large language models can function as general-purpose computers — not as a theoretical curiosity, but by executing programs directly inside the transformer's forward pass during inference. The stated result is 30,000 tokens per second. When the post surfaced on Hacker News, the community's working title was "30k Tok/S (Allegedly)." That parenthetical did the editorial work the post itself avoided.
The author is Christos Tzamos, a theoretical computer scientist whose background runs through algorithmic game theory and computational complexity. His credentials are not trivial — this is not a startup engineer gesturing at computer science. The question of whether transformer architectures are Turing-complete in any exploitable sense has drawn serious theoretical attention, and Tzamos is the kind of person who would know where the literature actually stands. What he hasn't provided is a paper, reproducible code, or any independent benchmark. The blog post's claims are stated rather than demonstrated.
The 30,000 tokens per second figure is the headline because it should be startling. Production inference on large models lands between a few hundred and a few thousand tokens per second on current hardware; 30k isn't a marginal improvement over the state of the art but a different category of result entirely. That's not a reason to dismiss it, since genuine step-changes do happen, but the absence of methodology makes it impossible to determine whether the number is robust or the product of specific conditions that wouldn't survive generalization.
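To see why 30k is a different category rather than an increment, a back-of-envelope comparison helps. Every number below is an illustrative assumption for scale (the chain length and the two "production" rates are mine, not Percepta's or any benchmark's); only the 30,000 figure comes from the post.

```python
# Illustrative arithmetic only: chain size and the two baseline rates are
# assumptions chosen for scale, not measurements.
CHAIN_TOKENS = 2_000  # one long reasoning chain (assumed size)

RATES = [
    ("slow production", 200),     # assumed
    ("fast production", 2_000),   # assumed
    ("claimed", 30_000),          # figure from the blog post
]

for label, tok_per_s in RATES:
    seconds = CHAIN_TOKENS / tok_per_s
    print(f"{label:>15}: {tok_per_s:>6} tok/s -> {seconds:6.2f} s per chain")
```

Under these assumptions, the same chain takes ten seconds, one second, or well under a tenth of a second. That last regime is where "different category" stops being rhetoric.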
For the agent ecosystem, faster inference is not a nice-to-have. Long reasoning chains are slow and expensive; tool-call loops compound that cost at every step. If Percepta's approach works as described, the economics of agentic AI shift substantially. That's a significant conditional. Right now, the work isn't shown.
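The compounding effect is easy to make concrete with a toy cost model. Everything here is a hypothetical sketch: step count, tokens per step, and tool latency are assumptions I've picked to illustrate the structure of the cost, not figures from the post.

```python
# Hypothetical agent-loop cost model. All constants are assumptions for
# illustration; only the 30,000 tok/s rate comes from Percepta's claim.
STEPS = 20             # tool-call iterations in one agent run (assumed)
TOKENS_PER_STEP = 800  # reasoning + tool-call tokens per step (assumed)
TOOL_LATENCY_S = 0.3   # time the external tool itself takes (assumed)

def run_seconds(tok_per_s: float) -> float:
    """Total wall-clock time for one agent run at a given decode rate."""
    decode = STEPS * TOKENS_PER_STEP / tok_per_s  # model time, compounds per step
    tools = STEPS * TOOL_LATENCY_S                # fixed tool overhead per step
    return decode + tools

for rate in (500, 30_000):
    print(f"{rate:>6} tok/s -> {run_seconds(rate):6.1f} s total")
```

Under these assumptions, decoding dominates the run at 500 tok/s (32 s of model time against 6 s of tool time), while at 30k tok/s the model contributes about half a second and tool latency becomes the bottleneck. That inversion is what "the economics shift" would mean in practice, if the claim holds.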