Building enterprise RAG systems that don't hallucinate has been the field's defining frustration for two years. An open-source project called RNSR — Recursive Neural-Symbolic Retriever — is claiming it just solved that problem, at least for financial documents: 100% accuracy and zero hallucinations on FinanceBench, the standard benchmark for financial document question-answering. GPT-4 RAG scores around 60% on the same tests. Claude RAG, around 65%.

Those are extraordinary claims. The architecture behind them is a deliberate rejection of chunking.

Most RAG systems slice documents into flat text segments and retrieve the nearest match by vector similarity. RNSR doesn't chunk at all. It reads font hierarchy — via something the developer calls a Font Histogram Algorithm — to reconstruct the actual structure of a document: sections, subsections, tables, the whole skeleton. From there it deploys what it terms Recursive Language Models, which generate navigation code to traverse that structure directly rather than running an approximate similarity search. Less like rummaging through a pile of pages, more like reading a table of contents and knowing exactly which drawer to open.
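To make the font-histogram idea concrete, here is a minimal sketch of how such an algorithm could work in principle. This is an illustration of the general technique, not RNSR's actual implementation: it assumes spans of `(text, font_size)` pairs extracted from a PDF, weights each size by character count so body paragraphs dominate the histogram, and maps larger-than-body sizes to heading levels by rank.

```python
from collections import Counter

def infer_heading_levels(spans):
    # Weight each font size by character count so long body paragraphs
    # dominate the histogram; the modal size is assumed to be body text.
    sizes = Counter()
    for text, size in spans:
        sizes[size] += len(text)
    body_size = sizes.most_common(1)[0][0]
    # Any size larger than body text is treated as a heading; a bigger
    # font means a shallower (higher-level) heading.
    heading_sizes = sorted((s for s in sizes if s > body_size), reverse=True)
    level = {s: i + 1 for i, s in enumerate(heading_sizes)}
    outline = []
    for text, size in spans:
        if size in level:
            outline.append(("#" * level[size], text))  # heading node
        else:
            outline.append(("body", text))             # body node
    return outline

# Toy document: sizes and text are invented for illustration.
doc = [
    ("Annual Report 2024", 24.0),
    ("Revenue", 18.0),
    ("Revenue grew 12% year over year across all segments.", 11.0),
    ("Risk Factors", 18.0),
    ("Rising interest rates may compress margins.", 11.0),
]
outline = infer_heading_levels(doc)
```

The recovered outline is exactly the kind of skeleton a navigation step can then traverse section by section, instead of similarity-searching flat chunks.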

Layered on top are Knowledge Graph extraction for grounding entities and relationships, Tree-of-Thoughts reasoning for multi-step inference, and a SQLite backend storing everything atomically. Provenance matters here: every answer traces back to source text, and when the information simply isn't present, the system returns nothing instead of inventing something plausible.
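A toy sketch of what answer provenance and refusal can look like over a SQLite backend. The table and function names here are invented for illustration, not RNSR's actual schema: every answer row points at the source node it was extracted from, and a question with no supporting text is recorded as an explicit NULL rather than a generated guess.

```python
import sqlite3

# Hypothetical schema: nodes hold document sections; answers reference
# the node they were extracted from. A NULL answer is an explicit
# "not in the document" refusal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes(id INTEGER PRIMARY KEY, section TEXT, body TEXT);
CREATE TABLE answers(id INTEGER PRIMARY KEY, question TEXT,
                     answer TEXT, node_id INTEGER REFERENCES nodes(id));
""")
conn.execute("INSERT INTO nodes(section, body) VALUES (?, ?)",
             ("Revenue", "Net revenue was $4.2B in FY2023."))

def answer_with_provenance(question, keyword):
    # Toy keyword lookup standing in for structure-aware retrieval:
    # answer only if some node actually contains the keyword.
    row = conn.execute("SELECT id, body FROM nodes WHERE body LIKE ?",
                       (f"%{keyword}%",)).fetchone()
    if row is None:
        conn.execute("INSERT INTO answers(question, answer) VALUES (?, NULL)",
                     (question,))
        return None                  # refuse rather than invent
    node_id, body = row
    conn.execute("INSERT INTO answers(question, answer, node_id) "
                 "VALUES (?, ?, ?)", (question, body, node_id))
    return body, node_id

hit = answer_with_provenance("What was FY2023 revenue?", "revenue")
miss = answer_with_provenance("What was FY2023 headcount?", "headcount")
```

The point of the pattern is auditability: every returned answer carries a `node_id` you can join back to source text, and refusals are first-class records rather than silent fabrications.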

The head-to-head numbers on financial and legal documents are striking. RNSR scores 100% on both relevance and correctness; naive RAG scores 75% correctness and hallucinates 50% of the time; long-context LLM approaches also land at 75%. Timeline extraction and contradiction detection across legal and project documents return perfect recall.

The caveats are real. On broader academic benchmarks (TAT-QA, QASPER, DocVQA), RNSR scored 67%. The developer attributes the gap to small sample sizes and formatting inconsistencies, noting that extractive question sub-types within those same tests hit 100%. That's plausible, but it doesn't settle the question.

More importantly, the FinanceBench numbers are self-reported. A perfect score on a well-regarded benchmark, without independent replication, has a way of looking different six months later. That's the bar this project still needs to clear before the field takes the headline claims at face value.

For now, RNSR is worth close attention for anyone building in document-heavy verticals — legal, finance, compliance. It supports OpenAI, Anthropic, and Google Gemini as backend providers, so it drops in as a retrieval layer rather than a platform commitment. And the broader architectural argument it makes — that structure navigation beats chunk retrieval for complex documents — is the direction serious enterprise RAG work has been quietly trending toward regardless. Whether RNSR turns out to be the project that proves that case, or just an early signal that the case can be made, probably comes down to what independent benchmarking finds.
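What "drops in as a retrieval layer rather than a platform commitment" tends to mean in practice is a thin adapter so the pipeline never hard-codes one vendor. A minimal sketch, assuming a uniform prompt-in/text-out interface; the provider names come from the article, but the stub functions and type names here are placeholders, not RNSR's actual integration code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Backend:
    name: str
    complete: Callable[[str], str]  # prompt in, completion text out

def make_backend(provider: str) -> Backend:
    # Stubs stand in for real API clients; each would normally wrap
    # the vendor SDK behind the same one-method interface.
    stubs = {
        "openai":    lambda prompt: f"[openai stub] {prompt}",
        "anthropic": lambda prompt: f"[anthropic stub] {prompt}",
        "gemini":    lambda prompt: f"[gemini stub] {prompt}",
    }
    if provider not in stubs:
        raise ValueError(f"unsupported provider: {provider}")
    return Backend(provider, stubs[provider])

backend = make_backend("anthropic")
reply = backend.complete("Summarize section 3.2")
```

Because the retrieval and navigation logic only ever sees the `Backend` interface, swapping providers is a one-line configuration change rather than a rewrite.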