Blog Post #43: Query Engines vs. Chat Engines in LlamaIndex

In our last tutorial, we built a powerful knowledge base and used a QueryEngine to ask it questions. It worked flawlessly. You ask, “What is Sodepur known for?” and get a perfect, fact-based answer. But then you try a natural follow-up:

You: “Why is that important?”

Engine: (Confused) “Why is what important?”

Your expert Q&A system has amnesia.

This happens because a QueryEngine is fundamentally stateless. Each .query() call is a brand new, independent event, like a new search in a search engine. It has no memory of what you asked five seconds ago.

To build a truly conversational RAG application, LlamaIndex provides a different, stateful abstraction: the Chat Engine.


The QueryEngine: The Factual Lookup Tool

A QueryEngine is optimized for a single-turn, question-and-answer exchange. It’s the perfect tool for factual lookups.

  • Analogy: A QueryEngine is like a search bar. Each query is a new, self-contained search. The engine doesn’t remember your previous searches.
  • Workflow:
    1. You ask a full, complete question.
    2. The engine retrieves the most relevant context from your knowledge base.
    3. It synthesizes a direct answer.
  • Strengths: Simple, direct, and highly efficient for single-shot factual retrieval.
  • Weaknesses: Has no conversational memory. It cannot handle pronouns (“What about it?”), follow-ups (“Tell me more”), or any question that relies on the context of the previous turn.
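To make the statelessness concrete, here is a toy sketch in plain Python (not LlamaIndex code) that mimics a QueryEngine: retrieval is driven by the current question alone, so a follow-up with no retrievable keyword has nothing to anchor on. The FACTS dictionary and keyword matching are stand-ins for a real index and retriever.

```python
# Toy illustration of statelessness: each call sees ONLY the current
# question -- there is no memory of previous calls.

FACTS = {
    "sodepur": "Sodepur is known for the Shyamsundar Temple and the Ghoshpara fair.",
}

def stateless_query(question: str) -> str:
    """Simulates a QueryEngine: retrieval keys off the question alone."""
    for keyword, answer in FACTS.items():
        if keyword in question.lower():
            return answer
    # No keyword matched -- nothing to anchor retrieval on.
    return "I'm not sure what you are referring to."

print(stateless_query("What is Sodepur known for?"))  # works
print(stateless_query("Why is that important?"))      # fails: no context
```

The second call fails for exactly the reason described above: "that" never appears in the knowledge base, and the engine has no record of the previous turn to resolve it.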

The ChatEngine: The Conversational Partner

A ChatEngine is a stateful interface built on top of your index. It is specifically designed to maintain the history of a conversation and use that context to understand and answer follow-up questions.

  • Analogy: A ChatEngine is like talking to a knowledgeable librarian. The librarian remembers what you’ve already discussed, allowing you to have a natural, back-and-forth dialogue.
  • Workflow:
    1. You ask a question (which can be a follow-up).
    2. The engine looks at both the new question and the past conversation history.
    3. It often rewrites your follow-up into a new, more detailed standalone question (e.g., “Why is that important?” becomes “Why is the Shyamsundar Temple important?”).
    4. This rewritten query is then used to retrieve relevant context.
    5. The history, context, and original question are all used to synthesize a conversational answer.
    6. The new exchange is added to the history for the next turn.

The key difference is that the ChatEngine actively uses the conversation history to interpret your intent.
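The question-rewriting step (step 3 above) can be sketched in plain Python. This is an assumed, simplified prompt structure for illustration, not LlamaIndex's actual internal prompt: the conversation history and the follow-up are folded into a single prompt asking the LLM to produce a standalone question.

```python
# Toy sketch of the "condense question" step: combine history + follow-up
# into a prompt that asks an LLM for a standalone, self-contained question.

history = [
    ("user", "What is Sodepur famous for?"),
    ("assistant", "Sodepur is famous for the Shyamsundar Temple."),
]

def build_condense_prompt(history, follow_up):
    """Builds the rewrite prompt from the transcript and the new message."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Given the conversation below, rewrite the follow-up as a "
        "standalone question.\n\n"
        f"{transcript}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )

prompt = build_condense_prompt(history, "Why is that important?")
print(prompt)
```

Because the prompt carries the full transcript, the LLM can resolve "that" to the Shyamsundar Temple and emit a question the retriever can actually use.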


Hands-On: Upgrading to a Chat Engine

Let’s upgrade the project from our last post to be fully conversational. We will use the same persisted index we already built.

The easiest way to get started is with the .as_chat_engine() method on our index object.

# main.py (updated version)
import os
from dotenv import load_dotenv
from llama_index.core import StorageContext, load_index_from_storage

load_dotenv()
PERSIST_DIR = "./storage"

# Load the existing index
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)

# Create a Chat Engine instead of a Query Engine.
# "condense_plus_context" rewrites each follow-up into a standalone
# question before retrieval; 'verbose=True' prints that rewritten query
# so we can see what's happening under the hood.
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    verbose=True
)

print("--- Starting a fresh conversation ---")
response1 = chat_engine.chat("What is Sodepur famous for?")
print("Bot:", response1)

print("\n--- Asking a follow-up question ---")
response2 = chat_engine.chat("Tell me more about that fair you mentioned.")
print("Bot:", response2)

# To start a new conversation, reset the engine
chat_engine.reset()
print("\n--- Conversation Reset ---")

When you run this, pay attention to the verbose=True output for the second question. You will see that the chat engine includes the history of the first question and answer when formulating its response to the follow-up. It understands that “that fair” refers to the Ghoshpara fair mentioned in the first turn.

Creating an Interactive Chat Loop

To make this feel like a real chatbot, we can wrap our chat_engine in a simple conversational loop.

# main_interactive.py
# (Same setup and index loading as above)

chat_engine = index.as_chat_engine()

print("Chatbot is ready. Type 'exit' to end.")
while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    
    response = chat_engine.chat(user_input)
    print(f"Bot: {response}")

Now you can have a continuous, stateful conversation with your knowledge base.

You: What is Sodepur known for?

Bot: Sodepur is known for its rich religious and cultural heritage, particularly its connection to the Vaishnava tradition through the Shyamsundar Temple, and the annual Ghoshpara fair.

You: Tell me more about its history.

Bot: The history of Sodepur is ancient, with links to the Sena dynasty. It was a significant hub for riverside trade during the medieval period and played a role in the Bengal Renaissance.
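As a counterpart to the earlier stateless sketch, here is a toy stateful engine in plain Python (again, an illustration rather than LlamaIndex internals). Keeping a history lets a vague follow-up be grounded in the previous turn's topic, and reset() shows why clearing that history starts a genuinely fresh conversation.

```python
# Toy stateful engine: history makes follow-ups resolvable; reset() wipes it.

FACTS = {
    "sodepur": "Sodepur is known for the Shyamsundar Temple and the Ghoshpara fair.",
}

class ToyChatEngine:
    def __init__(self):
        self.history = []  # list of (role, text) tuples

    def chat(self, question: str) -> str:
        # Resolve the topic from the current question OR past turns.
        searchable = question.lower() + " " + " ".join(
            text.lower() for _, text in self.history
        )
        answer = next(
            (ans for kw, ans in FACTS.items() if kw in searchable),
            "I'm not sure what you are referring to.",
        )
        self.history.append(("user", question))
        self.history.append(("assistant", answer))
        return answer

    def reset(self):
        self.history.clear()

engine = ToyChatEngine()
engine.chat("What is Sodepur known for?")
print(engine.chat("Tell me more about it."))  # resolved via history
engine.reset()
print(engine.chat("Tell me more about it."))  # history gone -> fails
```

The same input produces different answers before and after reset() purely because of the stored history, which is the essence of the stateful/stateless distinction this post is about.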

Conclusion: Which Engine Should You Use?

| Feature  | QueryEngine                        | ChatEngine                            |
|----------|------------------------------------|---------------------------------------|
| State    | Stateless (forgets each turn)      | Stateful (remembers the conversation) |
| Use Case | Single Q&A, factual lookup, search | Multi-turn conversation, chatbots     |
| Method   | .query()                           | .chat() or .stream_chat()             |
| Analogy  | Using a search bar                 | Talking to a librarian                |

The rule for choosing is simple:

  • Use a QueryEngine when your application needs to answer discrete, self-contained questions based on your data. Think of it as a function you call to get a fact.
  • Use a ChatEngine when you are building a conversational assistant that needs to remember the dialogue and handle natural follow-up questions.

By providing clean, high-level abstractions for both stateless and stateful interactions, LlamaIndex allows you to build applications that are not just knowledgeable, but also naturally conversational.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge of Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com.
