Blog Post #45: Advanced RAG Techniques: Routing and Fusion Retrievers in LlamaIndex

Our RAG systems have been powerful, but so far, they’ve been like a library with only one section. We’ve built a single, monolithic index and queried it. But what happens when your knowledge is spread across multiple, distinct sources? You don’t want to search the “engineering team roster” when a user asks about the “Q3 financial report.”

A single, large index can “muddy the waters”: specific, structured information gets lost in a sea of general text. To build truly sophisticated RAG applications, we need to handle multiple knowledge sources intelligently.

LlamaIndex provides advanced patterns for this. In this tutorial, we will explore two of the most powerful:

  1. Router Query Engine: A smart “switchboard” that directs a query to the single most relevant knowledge base.
  2. Fusion Retriever: A “search committee” that queries multiple sources at once and intelligently merges the results for a comprehensive answer.

Part 1: Setting Up Our Multiple Knowledge Bases

To demonstrate these concepts, we need at least two distinct data sources. Let’s create a scenario for a fictional tech company.

1. Create the Data

In your project’s data folder, create two files:

data/team_info.txt: (Structured information about employees)

Team Member: Dr. Aris Thorne
Role: Lead AI Scientist
Team: Research & Development (R&D)
Expertise: Large Language Models, Agentic Architectures

Team Member: Priya Singh
Role: Senior Software Engineer
Team: Product Engineering
Project: ChronosAI Calendar App

Team Member: Kenji Tanaka
Role: Head of Product
Team: Product Management
Focus: User experience and market strategy

data/company_news.txt: (Unstructured news and announcements)

September 2025 Press Release:
Our company today announced the launch of ChronosAI, a revolutionary new calendar application powered by our next-generation language model. The project was led by the Product Engineering team and represents a major milestone in our AI-first strategy.

October 2025 Company Update:
Dr. Aris Thorne from our R&D team has published a new paper on multi-agent collaboration, which is now pending peer review. This groundbreaking research will inform the next wave of our product development.

2. Create and Persist Two Separate Indexes

We will now write a script to create two completely separate vector indexes, one for each data source.

First, install the dependencies: pip install llama-index llama-index-llms-openai python-dotenv. The scripts below call load_dotenv(), so place your OPENAI_API_KEY in a .env file at the project root.

Create setup_indexes.py:

# setup_indexes.py
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from dotenv import load_dotenv

load_dotenv()

# --- Index for Team Info ---
team_docs = SimpleDirectoryReader(input_files=["data/team_info.txt"]).load_data()
team_index = VectorStoreIndex.from_documents(team_docs)
team_index.storage_context.persist(persist_dir="./storage_team")
print("Team index created and persisted.")

# --- Index for Company News ---
news_docs = SimpleDirectoryReader(input_files=["data/company_news.txt"]).load_data()
news_index = VectorStoreIndex.from_documents(news_docs)
news_index.storage_context.persist(persist_dir="./storage_news")
print("News index created and persisted.")

Run this script once: python setup_indexes.py. You will now have two distinct knowledge bases saved to disk.
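
Before moving on, you can optionally sanity-check that both indexes reload cleanly from disk. Here is a minimal sketch, assuming the same .env setup as above:

# check_indexes.py
# Optional sanity check: reload both persisted indexes from disk
from llama_index.core import StorageContext, load_index_from_storage
from dotenv import load_dotenv

load_dotenv()

for persist_dir in ["./storage_team", "./storage_news"]:
    # Rebuild the storage context from disk and reload the index
    index = load_index_from_storage(StorageContext.from_defaults(persist_dir=persist_dir))
    print(f"{persist_dir}: loaded index {index.index_id}")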


Part 2: The Router – The Intelligent Switchboard

A Router is an LLM-powered component that decides which single data source is most appropriate for a given query. It’s like a receptionist at a large company who directs your question to the correct department.

This is perfect when your data sources are distinct and a query is usually about one or the other.

# main_router.py
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from dotenv import load_dotenv

load_dotenv()

# --- Load Indexes ---
team_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage_team"))
news_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage_news"))

# --- Create Query Engines and Tools ---
team_engine = team_index.as_query_engine(verbose=True)
news_engine = news_index.as_query_engine(verbose=True)

team_tool = QueryEngineTool.from_defaults(
    query_engine=team_engine,
    name="team_info",
    description="Useful for questions about team members, their roles, and projects they are on."
)
news_tool = QueryEngineTool.from_defaults(
    query_engine=news_engine,
    name="company_news",
    description="Useful for questions about company news, recent product launches, and official announcements."
)

# --- Create Router Query Engine ---
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[team_tool, news_tool],
    verbose=True
)

# --- Run Queries ---
if __name__ == "__main__":
    print("--- Query 1: Routing to Team Info ---")
    response1 = query_engine.query("Who is the lead AI scientist?")
    print(str(response1))

    print("\n--- Query 2: Routing to Company News ---")
    response2 = query_engine.query("What was the major product launch in September 2025?")
    print(str(response2))

When you run this, the verbose output will show the RouterQueryEngine first making an LLM call to select the correct tool (team_info for the first query, company_news for the second) before executing the query against that single, targeted index.
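
You can also inspect the routing decision programmatically instead of reading the verbose logs. In recent LlamaIndex versions the router attaches the selector's choice to the response metadata under a selector_result key; here is a small sketch (verify the key against your installed version):

# Inspect the router's tool choice from the response metadata
response = query_engine.query("Who is the lead AI scientist?")
print(response.metadata.get("selector_result"))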


Part 3: The Fusion Retriever – The Search Committee

What if a query is broad and could benefit from information from both sources? A Router will only pick one. A Fusion Retriever (QueryFusionRetriever in LlamaIndex) queries multiple sources in parallel, gathers all the results, and then intelligently “fuses” and re-ranks them to provide the most comprehensive context to the LLM.

  • Analogy: A research committee. You ask a question, and experts from different departments bring their relevant documents. A lead researcher then picks the top 3-5 most relevant pages from the combined pile to give you the best possible answer.

# main_fusion.py
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from dotenv import load_dotenv

load_dotenv()

# --- Load Indexes ---
team_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage_team"))
news_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage_news"))

# --- Create Retrievers ---
team_retriever = team_index.as_retriever()
news_retriever = news_index.as_retriever()

# --- Create Fusion Retriever ---
fusion_retriever = QueryFusionRetriever(
    retrievers=[team_retriever, news_retriever],
    similarity_top_k=4,        # how many fused results to keep overall
    num_queries=1,             # 1 disables LLM-based query generation
    mode="reciprocal_rerank",  # merge the ranked lists via reciprocal rank fusion
)

# --- Create Query Engine from the Fusion Retriever ---
query_engine = RetrieverQueryEngine.from_args(fusion_retriever)

# --- Run Query ---
if __name__ == "__main__":
    print("--- Querying with Fusion Retriever ---")
    response = query_engine.query("Tell me about the ChronosAI project team and the recent news.")
    print(str(response))

This query is a perfect case for fusion. The retriever will fetch documents about “Priya Singh” and “ChronosAI” from the team index and documents about the “ChronosAI launch” from the news index. It combines these sources to give the LLM a complete picture, allowing it to synthesize a much richer answer.
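
To watch the fusion step itself, you can call the retriever directly and print the fused scores before any answer is synthesized. A short sketch:

# Peek at the fused, re-ranked nodes before they reach the LLM
nodes = fusion_retriever.retrieve("Tell me about the ChronosAI project team and the recent news.")
for node in nodes:
    # get_score() returns 0.0 when a node has no score
    print(f"{node.get_score():.4f}  {node.text[:80]}")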

When to Use Which?

  • Use a Router when:
    • Your data sources are highly distinct and mutually exclusive (e.g., biology vs. history).
    • Queries are typically about one source OR the other.
    • You want to minimize cost and latency by only querying one index per question.
  • Use a Fusion Retriever when:
    • Your data sources have overlapping or complementary information.
    • Queries are broad and may require synthesizing facts from multiple sources.
    • You want the most comprehensive answer possible, even if it means querying multiple systems.
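
The two patterns also compose. As a sketch (assuming the tools from main_router.py and the retriever from main_fusion.py live in one script, and with company_overview as a purely illustrative tool name), you can wrap the fusion retriever in a query engine and register it as a third tool behind the router, giving the router a “broad questions” option alongside the two targeted indexes:

# Register a fusion-backed engine as an extra routing target
fusion_engine = RetrieverQueryEngine.from_args(fusion_retriever)
fusion_tool = QueryEngineTool.from_defaults(
    query_engine=fusion_engine,
    name="company_overview",
    description="Useful for broad questions that span both team info and company news."
)

router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[team_tool, news_tool, fusion_tool],
    verbose=True
)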

Conclusion

By moving beyond a single, monolithic index, you can build far more sophisticated, accurate, and efficient RAG systems. Routers allow for precision and efficiency, while Fusion Retrievers provide unparalleled comprehensiveness. Mastering these advanced RAG patterns is the key to building enterprise-grade knowledge bases that can handle the complexity and scale of real-world data.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge of Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com.
