What is PageIndex?
PageIndex can transform lengthy documents into semantic tree structures, ready for reasoning-based RAG.
- Hierarchical Tree Structure: an LLM-friendly "table of contents" for efficient document navigation and comprehension.
- Chunk-Free Segmentation: preserves natural document structure without arbitrary chunking.
- Node Summary with Precise Page Referencing: provides exact page references and summaries for precise information extraction.
- Designed for Long Documents: optimized for financial reports, legal documents, and technical manuals beyond LLM context limits.
PageIndex.json
...
{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "The Federal Reserve's monitoring ..."
    },
    {
      "title": "Domestic and International Cooperation and Coordination",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "In 2023, the Federal Reserve collaborated ..."
    }
  ]
},
...
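As a rough illustration of how such a tree can be consumed, the sketch below loads one node shaped like the sample above and prints it as an indented outline. It assumes that `start_index` and `end_index` are page numbers and that children live under `nodes`; the helper names `iter_nodes` and `outline` are made up for this example and are not part of PageIndex.

```python
import json

# One node copied from the sample above; summaries stay elided as in the source.
SAMPLE = json.loads("""
{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {"title": "Monitoring Financial Vulnerabilities", "node_id": "0007",
     "start_index": 22, "end_index": 28,
     "summary": "The Federal Reserve's monitoring ..."},
    {"title": "Domestic and International Cooperation and Coordination",
     "node_id": "0008", "start_index": 28, "end_index": 31,
     "summary": "In 2023, the Federal Reserve collaborated ..."}
  ]
}
""")


def iter_nodes(node, depth=0):
    """Yield (depth, node) pairs in document order (pre-order traversal)."""
    yield depth, node
    for child in node.get("nodes", []):
        yield from iter_nodes(child, depth + 1)


def outline(root):
    """Render the tree as indented lines with node IDs and index ranges."""
    return [
        f"{'  ' * depth}[{n['node_id']}] {n['title']} "
        f"({n['start_index']}-{n['end_index']})"
        for depth, n in iter_nodes(root)
    ]


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
```

Because every node carries its own index range, any node reached during retrieval can be cited back to a location in the source document.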
Beyond Semantic Similarity
Reasoning-Based RAG with PageIndex
Vector-based RAG relies on semantic similarity and often returns results that are loosely related but contextually off-target. It ignores document structure, which can make retrieval unreliable in specialized domains.
Reasoning-based RAG with PageIndex uses tree search algorithms that navigate documents like humans do, finding information based on document structure rather than just semantic similarity.
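The idea of structure-guided retrieval can be sketched as a top-down descent over the tree. This is a minimal sketch, not PageIndex's actual algorithm: the function `tree_search` is hypothetical, and `keyword_judge` is a deliberately crude stand-in for the step where an LLM would read each child's title and summary and decide whether to open it.

```python
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]

# Tree shaped like the PageIndex.json sample (summaries elided as in the source).
TREE: Node = {
    "title": "Financial Stability", "node_id": "0006",
    "start_index": 21, "end_index": 22, "summary": "The Federal Reserve ...",
    "nodes": [
        {"title": "Monitoring Financial Vulnerabilities", "node_id": "0007",
         "start_index": 22, "end_index": 28,
         "summary": "The Federal Reserve's monitoring ..."},
        {"title": "Domestic and International Cooperation and Coordination",
         "node_id": "0008", "start_index": 28, "end_index": 31,
         "summary": "In 2023, the Federal Reserve collaborated ..."},
    ],
}


def keyword_judge(query: str, node: Node) -> bool:
    """Placeholder judge: accept a node if a longer query word appears in its
    title or summary. A real system would ask an LLM here (assumption)."""
    words = {w.lower().strip(".,?") for w in query.split() if len(w) > 4}
    text = f"{node.get('title', '')} {node.get('summary', '')}".lower()
    return any(w in text for w in words)


def tree_search(root: Node, query: str,
                judge: Callable[[str, Node], bool] = keyword_judge) -> List[dict]:
    """Descend from the root, opening only the children the judge accepts.

    Each result records the path of titles that led to it and its index
    range, so the retrieval decision can be traced step by step.
    """
    results: List[dict] = []

    def visit(node: Node, path: List[str]) -> None:
        path = path + [node["title"]]
        accepted = [c for c in node.get("nodes", []) if judge(query, c)]
        if not accepted:
            # No child is a better match, so stop at this node.
            results.append({"node_id": node["node_id"], "path": path,
                            "pages": (node["start_index"], node["end_index"])})
            return
        for child in accepted:
            visit(child, path)

    visit(root, [])
    return results


if __name__ == "__main__":
    for hit in tree_search(TREE, "How does the Fed monitor financial vulnerabilities?"):
        print(hit)
```

With the sample tree, the query about monitoring financial vulnerabilities ends at node 0007, and the returned path shows which sections were opened to get there. Swapping `keyword_judge` for an LLM call changes the quality of each decision but not the shape of the search.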

RAG Comparison
PageIndex vs Vector DB
Choose the right RAG technique for your task.
PageIndex: Logical Reasoning
High Retrieval Accuracy
Relies on logical reasoning, which makes it well suited to domain-specific data where many passages are semantically similar.
Fully Traceable Retrieval Process
Tree search provides a traceable reasoning process, and each retrieved node carries an exact page reference.
Slower Retrieval Due to Tree Search
Tree search is slower, but provides accurate results for complex domain-specific queries.
Efficient Prompt-Level Knowledge Integration
Easily integrates with expert knowledge and user preferences during the tree search process.
Best for Domain-Specific Document Analysis
- Financial reports and SEC filings
- Regulatory and compliance documents
- Healthcare and medical reports
- Legal contracts and case law
- Technical manuals and scientific documentation
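One way to picture prompt-level knowledge integration is that expert guidance is simply written into the prompt used at each step of the tree search, with no model retraining. The sketch below is an assumption about how that could look: `build_selection_prompt`, the prompt wording, and the sample guidance note are all hypothetical and not taken from PageIndex.

```python
from typing import Dict, List


def build_selection_prompt(query: str, children: List[Dict],
                           expert_notes: List[str]) -> str:
    """Assemble the text an LLM would read when choosing which child nodes to open."""
    options = "\n".join(
        f"- {c['node_id']}: {c['title']} | {c.get('summary', '')}"
        for c in children
    )
    notes = "\n".join(f"- {n}" for n in expert_notes) or "- (none)"
    return (
        "You are navigating a document's table of contents.\n"
        f"Question: {query}\n\n"
        f"Domain guidance to follow:\n{notes}\n\n"
        f"Candidate sections:\n{options}\n\n"
        "Reply with the node_id values worth opening, comma-separated."
    )


if __name__ == "__main__":
    children = [
        {"node_id": "0007", "title": "Monitoring Financial Vulnerabilities",
         "summary": "The Federal Reserve's monitoring ..."},
        {"node_id": "0008",
         "title": "Domestic and International Cooperation and Coordination",
         "summary": "In 2023, the Federal Reserve collaborated ..."},
    ]
    # Hypothetical guidance an analyst might supply.
    notes = ["For risk questions, prefer sections about monitoring."]
    print(build_selection_prompt("Where are risks assessed?", children, notes))
```

Changing the guidance list changes the next search immediately, which is the contrast with approaches that need an embedding model to be fine-tuned.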
Vector DB: Semantic Similarity
Low Retrieval Accuracy
Relies on semantic similarity, which is unreliable for domain-specific data where all content has similar semantics.
Black Box Retrieval without Traceability
Often lacks clear traceability to source documents, making it difficult to verify information or understand retrieval decisions.
Faster Retrieval Due to Vector Search
Offers faster retrieval speeds, making it efficient for applications where quick responses are critical.
Knowledge Integration Requires Fine-Tuning
Requires fine-tuning embedding models to incorporate new knowledge or preferences.
Best for Generic & Exploratory Applications
- Semantic recommendation systems
- Creative writing and ideation tools
- Short passage retrieval
- Multi-modal retrieval
- Generic knowledge question answering
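For contrast, similarity-based retrieval reduces to ranking chunks by a vector score. The sketch below uses cosine similarity over hand-made three-dimensional vectors; the chunk labels and numbers are invented purely for illustration and do not come from any real embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks with the highest cosine similarity to the query."""
    scored = [(cosine(query_vec, vec), label) for label, vec in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


# Toy data (hypothetical): two near-identical passages and one unrelated one.
CHUNKS = [
    ("revenue discussion, year A", [0.90, 0.10, 0.00]),
    ("revenue discussion, year B", [0.88, 0.12, 0.00]),
    ("board biographies", [0.00, 0.20, 0.90]),
]
QUERY = [1.00, 0.10, 0.00]

if __name__ == "__main__":
    for score, label in top_k(QUERY, CHUNKS, k=3):
        print(f"{score:.4f}  {label}")
```

In this toy setup the two revenue passages score almost identically, so the ranking alone gives little basis for telling them apart, while the unrelated passage is cleanly separated. That is the situation the comparison above describes for domain-specific documents whose sections all read alike.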
Case Study
PageIndex Powers Leading Industry Models
PageIndex forms the foundation of Mafin 2.5, a leading RAG model for financial report analysis that achieves 98.7% accuracy on FinanceBench, the highest in the market.
FinanceBench accuracy by retrieval setup:
- 30%: RAG with Vector DB (one vector index for all the documents).
- 50%: RAG with Vector DB (one vector index for each document).
- 98.7%: RAG with PageIndex (Query-to-SQL for document-level retrieval, PageIndex for node-level retrieval).
The results of RAG with Vector DB are from the FinanceBench paper.
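The two-stage setup named above (SQL to pick the document, then the tree to pick the node) can be sketched as follows. This is not Mafin 2.5's implementation: the `documents` table, its columns, the document IDs, and the helper names are all hypothetical, the SQL is hand-written where a real system would generate it from the question, and a keyword match stands in for the reasoning-based node search.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical catalog rows: (doc_id, company, fiscal_year, doc_type).
CATALOG_ROWS = [
    ("doc-a", "Federal Reserve", 2023, "annual report"),
    ("doc-b", "Example Corp", 2022, "10-K"),
]

# Hypothetical per-document trees, shaped like the PageIndex.json sample.
TREES: Dict[str, dict] = {
    "doc-a": {
        "title": "Financial Stability", "node_id": "0006",
        "start_index": 21, "end_index": 22,
        "nodes": [
            {"title": "Monitoring Financial Vulnerabilities", "node_id": "0007",
             "start_index": 22, "end_index": 28},
            {"title": "Domestic and International Cooperation and Coordination",
             "node_id": "0008", "start_index": 28, "end_index": 31},
        ],
    },
}


def build_catalog(rows) -> sqlite3.Connection:
    """Create an in-memory table of document metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "doc_id TEXT PRIMARY KEY, company TEXT, fiscal_year INTEGER, doc_type TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", rows)
    return conn


def select_documents(conn: sqlite3.Connection, sql: str, params=()) -> List[str]:
    """Stage 1: run the (normally generated) SQL and return matching doc IDs."""
    return [row[0] for row in conn.execute(sql, params)]


def find_nodes(tree: dict, keyword: str) -> List[Tuple[str, int, int]]:
    """Stage 2 stand-in: return (node_id, start_index, end_index) for every
    node whose title contains the keyword."""
    hits, stack = [], [tree]
    while stack:
        node = stack.pop()
        if keyword.lower() in node["title"].lower():
            hits.append((node["node_id"], node["start_index"], node["end_index"]))
        stack.extend(node.get("nodes", []))
    return hits


if __name__ == "__main__":
    conn = build_catalog(CATALOG_ROWS)
    sql = "SELECT doc_id FROM documents WHERE company = ? AND fiscal_year = ?"
    for doc_id in select_documents(conn, sql, ("Federal Reserve", 2023)):
        print(doc_id, find_nodes(TREES[doc_id], "vulnerabilities"))
```

Separating the stages keeps structured facts such as company and year out of the similarity problem entirely, so the node-level search only has to reason within the right document.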
Ready to integrate Reasoning-based RAG with PageIndex?