
PageIndex and the Rise of Agentic RAG: Tree Search for High-Stakes Document Retrieval

As enterprises push retrieval-augmented generation (RAG) into high-stakes workflows, the standard “chunk-and-embed” recipe is running into structural limits. A new open-source framework called PageIndex targets one of the hardest of these: reliably answering questions over very long, highly structured documents where traditional vector search breaks down.

Rather than optimizing chunk sizes, embeddings, or indexes, PageIndex changes the problem formulation entirely. It treats document access as a navigation and reasoning task instead of a pure similarity search problem, and it does so by borrowing a strategy more familiar from game-playing AI than from search engines.

From Chunk-and-Embed to Tree Search: What PageIndex Changes

The classic RAG workflow is by now well understood: split documents into chunks, compute embeddings, store them in a vector database, and at query time retrieve the top-k chunks with the highest semantic similarity to the user’s question. For short or moderately sized documents, this works well for straightforward Q&A.

However, as organizations apply RAG to long annual reports, legal contracts, pharmaceutical protocols, and other dense artifacts, marginal gains from tweaking chunk sizes or similarity thresholds are no longer enough. The problem is less about selecting which paragraph “sounds” similar to a query and more about reasoning through the document’s structure the way a human expert would.

PageIndex abandons chunk-and-embed entirely. Instead of precomputing vectors, it constructs a structural representation of each document—a Global Index—that mirrors human navigation patterns. Sections, subsections, appendices, and other logical units are organized into a tree. At query time, the large language model (LLM) performs a tree search over this structure, explicitly deciding which nodes are relevant to the user’s task.

Mingtian Zhang, co-founder of PageIndex, describes this as bringing an “AlphaGo-style” approach to retrieval. In game-playing AI, tree search is used to explore possible moves and outcomes. In PageIndex, the same principle is applied to document navigation: the model reasons step-by-step about where to look next, given the user’s intent.

This reframes retrieval as an active process. Instead of passively fetching high-similarity chunks, the system behaves like an agent that inspects the table of contents, chooses a promising chapter, drills into a subsection, then follows references or footnotes when needed.

How PageIndex’s “AlphaGo for Documents” Works

To understand what PageIndex is doing under the hood, it’s useful to compare it to how humans read long documents under time pressure. Faced with a dense textbook or 300-page annual report, a human analyst does not read every paragraph linearly. They start with the table of contents, identify relevant chapters, skim headings, then zoom in on specific pages and references. This is, in computer science terms, navigation over a tree-structured representation.

PageIndex formalizes that behavior:

  • Global Index construction: The framework analyzes the document and builds a tree whose nodes correspond to structural units—chapters, sections, subsections, appendices. This is analogous to an enriched table of contents.
  • LLM-guided classification: When a user issues a query, the LLM is asked to classify each node (or a subset of nodes during search) as relevant or not, given the full context of the query.
  • Tree search: The model traverses the tree, pruning irrelevant branches and exploring promising ones, much like a game-playing AI prunes game states that won’t lead to a win.

This approach forces the model to incorporate global structural context into retrieval decisions instead of relying solely on local semantic similarity. It converts a one-shot “find similar text” problem into a multi-step reasoning process: Where should I look next, given what I’m trying to answer?

That shift underlies the “AlphaGo for documents” analogy Zhang draws: the same idea of tree search that powered game AIs is now powering document navigation, but in service of retrieval rather than board games.

Why Semantic Similarity Breaks in Regulated Domains

Traditional RAG systems rely on a core assumption: the most semantically similar text chunks to a query are likely to be the most relevant. In casual or consumer use cases—like finding similar articles or FAQ answers—this is often good enough. But in regulated, high-stakes domains, that assumption frequently fails.

Zhang highlights financial reporting as a clear example. Consider a query about “EBITDA” in a quarterly or annual report. A vector-based retriever will happily return every chunk where “EBITDA” or related phrasing appears. Yet only one or a handful of sections may actually define the precise calculation, adjustments, or reporting scope relevant to that specific question.

The semantic signatures of these mentions can be nearly identical—same acronym, similar surrounding language—so the embedding space has little signal to distinguish a superficial mention from the authoritative definition. The retrieval step, driven purely by similarity, can’t easily tell which occurrence truly addresses the user’s intent.

This highlights what Zhang describes as the “intent vs. content” gap. The user is not merely looking for the occurrence of a term; they are looking for the underlying logic specific to that filing or quarter: how EBITDA is calculated, what adjustments are made, what scope is included or excluded.

A second limitation stems from context truncation. Embedding models typically have strict input-length limits, so production systems often embed only the latest user question, ignoring prior turns in the conversation. That means the retriever sees a short, decontextualized query rather than the fuller history of the problem the user is solving. The retrieval layer is decoupled from the user’s reasoning process.

By contrast, PageIndex’s approach lets the LLM consider richer context when deciding which parts of the document tree to explore. Instead of optimizing for embedding proximity, it can reason: Given everything the user has been asking, which section is logically where such a definition or calculation would live?
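To make the contrast concrete, here is a hedged sketch of how a node-relevance prompt could carry the full conversation, unlike an embedding lookup that encodes only the latest question. The function name and prompt wording are illustrative assumptions, not PageIndex's actual prompts.

```python
def build_relevance_prompt(node_title: str, node_summary: str,
                           history: list[str], question: str) -> str:
    """Assemble an LLM prompt that judges one tree node's relevance.

    Unlike a typical embedding call, which encodes only the latest
    decontextualized question, the prompt can include every prior turn,
    so the model can reason about where the answer would logically live.
    """
    turns = "\n".join(f"- {t}" for t in history)
    return (
        "Conversation so far:\n"
        f"{turns}\n"
        f"Current question: {question}\n\n"
        f"Document section: {node_title}\n"
        f"Section summary: {node_summary}\n\n"
        "Is this section likely to contain the answer? Reply YES or NO."
    )
```

Because the whole exchange travels with each relevance decision, a vague follow-up like "How is it adjusted?" is still grounded in the earlier question about EBITDA.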

Multi-Hop Reasoning and the FinanceBench Results

The payoff from structural, reasoning-based retrieval is most visible in multi-hop queries—questions that require following references, footnotes, or appendices rather than staying within a single contiguous passage.

On the FinanceBench benchmark, a system built on PageIndex, named Mafin 2.5, reached an accuracy of 98.7%. While the benchmark's full methodology is beyond the scope of this article, the reported result underscores the gap that can open between tree-search-based retrieval and traditional vector methods in realistic financial analysis tasks.

One example illustrates why. Imagine a user asks for the total value of deferred assets in a Federal Reserve annual report. The main narrative section might describe only the change in deferred assets over a period, not the total. Buried in the text is a cue: a footnote saying, “See Appendix G of this report … for more detailed information.” Appendix G itself could be a table of numbers with no obvious textual similarity to the user’s natural language question.

A similarity-based system frequently stalls here. The language in Appendix G is not semantically close to the question; it may be column headers and numeric rows. With no strong embedding match, the vector database will likely never surface that appendix.

PageIndex’s reasoning-driven retriever, however, can behave more like an analyst. It can:

  • Recognize that the main text does not contain the total figure but includes a structural reference.
  • Follow the cited link in the Global Index from the main section to Appendix G.
  • Inspect that appendix for the relevant numeric value.

By aligning retrieval with document navigation and internal references, rather than raw semantic proximity, PageIndex can resolve the multi-hop dependencies that are common in financial filings and other formal documents. The FinanceBench result signals how large the resulting performance difference can be in practice.

Latency, Streaming, and Infrastructure Implications

For enterprise architects, a natural concern with any LLM-driven retrieval loop is latency. Vector database lookups typically complete in milliseconds, whereas having an LLM "read" a table of contents and traverse a tree sounds slower, with delays that could be visible to users.

According to Zhang, the way PageIndex integrates retrieval into generation can mitigate that concern. In a classic RAG architecture, retrieval is a blocking pre-step: the system must query the vector store, aggregate results, and only then begin generating an answer. This can delay Time to First Token (TTFT).

In PageIndex, retrieval is part of the model’s ongoing reasoning. The system can start streaming a response immediately while continuing to navigate the document structure in the background. As Zhang puts it, there is no additional “retrieval gate” before the first token. In practice, TTFT can be comparable to a standard LLM call, even though more logic is happening under the hood.

The framework also has implications for data infrastructure:

  • No mandatory vector database: Since PageIndex does not rely on embeddings for its core retrieval, there is no inherent need for a dedicated vector store. The tree-structured Global Index is light enough to be stored in traditional relational systems such as PostgreSQL.
  • Incremental structural updates: PageIndex separates structure indexing from text extraction. If a contract is amended or a policy is updated, only the impacted subtree needs re-indexing rather than the entire corpus. This directly addresses a common operational pain point for vector-based RAG systems: keeping embeddings and vector stores synchronized with evolving source documents.

For teams already maintaining complex pipelines around embedding generation, index refresh, and schema evolution, this can simplify the stack. Retrieval logic shifts upwards into the model’s reasoning behavior, while storage can lean on mature relational infrastructure.

When to Use Tree Search vs. Vector Search

Despite its reported accuracy gains, PageIndex’s tree-search approach is not a universal replacement for vector search. Instead, it adds a new option in an emerging decision matrix for retrieval architectures.

Different regimes emerge:

  • No retrieval needed: For short documents—emails, short memos, brief chat transcripts—the entire text often fits directly in a modern LLM’s context window. In these cases, a retrieval layer may add complexity without improving results.
  • Similarity-centric tasks: For workloads like “find similar products,” “recommend related content,” or clustering items by theme or “vibe,” vector embeddings remain a strong fit. The goal is proximity in a semantic space, not logical navigation or multi-hop reasoning.
  • High-stakes, long, structured documents: This is where PageIndex is designed to shine: technical manuals, FDA filings, merger and acquisition agreements, and similar artifacts. The cost of error is high, internal references and appendices matter, and users often need not only the answer but also a clear explanation of how it was derived.
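The decision matrix above can be condensed into a toy routing function. The token threshold and task labels below are illustrative assumptions, not part of PageIndex or any published guidance.

```python
def choose_retrieval(doc_tokens: int, task: str, high_stakes: bool) -> str:
    """Route a workload to a retrieval strategy per the regimes above.

    The 50k-token cutoff and task labels are illustrative placeholders.
    """
    if doc_tokens < 50_000:
        return "no-retrieval"      # document fits the context window directly
    if task in {"similar-items", "recommendation", "clustering"}:
        return "vector-search"     # goal is proximity in embedding space
    if high_stakes:
        return "tree-search"       # structured navigation with an audit trail
    return "vector-search"
```

A real deployment would of course weigh cost, latency budgets, and document structure quality rather than a single threshold.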

In that last category, auditability becomes a requirement. Systems must be able to show the chain of reasoning—e.g., “The model inspected Section 4.1, followed a reference to Appendix B, and synthesized values from that appendix.” A tree-search approach anchored in the document’s structure naturally supports this kind of explainable traversal.

Designing RAG for Auditability and High-Stakes Use Cases

For architects and ML engineers, PageIndex suggests a design pattern for RAG in regulated or mission-critical environments: treat document retrieval as transparent navigation through a known structure.

Because the Global Index encodes the hierarchy of chapters, sections, and appendices, every retrieval step can be mapped back to a human-readable path. This offers several advantages:

  • Traceable answers: When an AI system provides a number or interpretation, teams can see which sections were consulted and which references were followed.
  • Policy alignment: Internal compliance policies often specify which sections of a document or which official sources must be consulted for certain decisions. A tree-based approach makes it easier to verify that the model’s retrieval steps conform to those policies.
  • Debuggability: When an answer is wrong, teams can diagnose whether the error was in navigation (wrong branch of the tree), interpretation (model misread the right section), or outdated structure (tree index out of sync with the latest version).

In other words, PageIndex is not just about finding the right passage more often; it’s about making the retrieval and reasoning process legible enough that enterprises can trust and govern it.

Agentic RAG and the Shifting Role of Vector Databases

PageIndex also fits into a broader shift sometimes referred to as “Agentic RAG.” As LLMs become more capable at planning and tool use, they can take on more of the responsibility for how to find and assemble information, rather than relegating that entirely to the database layer.

This trend is already visible in code-focused agents such as Claude Code and Cursor, which are moving beyond simple embedding lookups and toward active exploration of a codebase—opening files, following references, and reasoning across project structure. PageIndex applies a similar philosophy to textual documents.

Zhang emphasizes that vector databases still have valid roles, especially for similarity search and recommendation-style workloads. But the assumption that they will be the default storage engine for most LLM and AI applications is increasingly in question. As models handle more of the navigation and reasoning, the underlying index can focus more on faithfully representing structure than on optimizing for nearest-neighbor queries.

For practitioners, the implication is not to abandon vector search, but to expand the design palette. For simple semantic lookups, vector databases remain powerful tools. For long, structured, high-stakes documents, approaches like PageIndex’s tree search and Agentic RAG may offer better accuracy, clearer reasoning paths, and easier compliance.

As the ecosystem matures, the central architectural question will shift from “Which vector database?” to “Which retrieval behavior matches this workload’s risk profile, document structure, and audit requirements?” PageIndex positions tree-search-driven, agentic retrieval as a compelling answer for the most demanding slice of that spectrum.
