Blog Post #32: Advanced Memory Techniques: Summarization, Knowledge Graphs, and Windowed Buffers

In our last post, we cured our agent’s amnesia with ConversationBufferMemory. This was a huge step forward, but it introduced a new problem. Our agent’s memory is now too perfect. It’s like creating a full, unedited transcript of a meeting. For a short meeting, this is fine. For a three-hour meeting, it becomes an unmanageable, token-guzzling wall of text.

The LLM’s context window is finite. As a conversation with a simple buffer memory grows, it will inevitably exceed the token limit, leading to high API costs and, eventually, application-breaking errors.
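You can watch this growth directly. Here is a minimal sketch that counts the tokens in a growing transcript, assuming the tiktoken package is installed (the turns themselves are invented for illustration):

import tiktoken

# Tokenizer for OpenAI chat models (using tiktoken is our assumption here)
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

transcript = ""
for turn in [
    "Human: Hi, I'm planning a trip.",
    "AI: Great! Where would you like to go?",
    "Human: Somewhere warm in December.",
]:
    transcript += turn + "\n"
    # A plain buffer only ever grows; this count never goes down
    print(len(enc.encode(transcript)), "tokens so far")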

The goal of advanced memory is to distill the conversation, not just store it. Let’s explore three powerful strategies LangChain provides to manage long conversations efficiently.


1. The Simple Fix: ConversationBufferWindowMemory

This is the most straightforward solution to the token limit problem. Instead of keeping the entire conversation, it only keeps a “window” of the last k conversation turns.

  • Concept: A sliding window that remembers the most recent interactions.
  • Analogy: A person who has a great short-term memory but can only recall the last five minutes of a conversation.
  • How it Works: Each time a new turn (a human message plus the AI’s response) comes in, the oldest turn is dropped from the buffer once the window size k is exceeded.
  • Pros: Very simple to implement, and keeps token usage predictable and roughly bounded. Excellent for applications where the most recent context is the most important.
  • Cons: Abruptly and completely forgets potentially crucial information from early in the conversation.

Implementation

The change from our previous post is minimal. You just import ConversationBufferWindowMemory and set the value for k.

from langchain.memory import ConversationBufferWindowMemory

# k=2 means the memory will store the last TWO full conversation turns.
memory = ConversationBufferWindowMemory(
    k=2,
    memory_key="chat_history",
    return_messages=True
)

# The rest of your conversational loop and AgentExecutor remains exactly the same!
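To see the window in action, you can drive the memory directly with save_context and then inspect what it still holds (a quick sketch; the exchanges are invented for illustration):

# Feed three turns into the k=2 window...
memory.save_context({"input": "My name is Priya."}, {"output": "Nice to meet you, Priya!"})
memory.save_context({"input": "I live in Kolkata."}, {"output": "Kolkata is a lovely city."})
memory.save_context({"input": "What's the weather like?"}, {"output": "I'd need a weather tool for that."})

# ...and only the last two survive: the "My name is Priya" turn is gone
print(memory.load_memory_variables({})["chat_history"])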

2. The Efficient Condenser: ConversationSummaryMemory

What if the gist of the entire conversation is important, not just the last few turns? This is where summarization memory comes in. It uses an LLM to condense the conversation into a running summary.

  • Concept: As the conversation grows, it gets progressively summarized, keeping the core ideas without the lengthy transcript.
  • Analogy: Instead of keeping the full meeting transcript, the agent maintains the running “meeting minutes.”
  • Pros: Can handle very long conversations while keeping token usage low and roughly constant. Preserves the overall context and key points of the interaction.
  • Cons:
    • Cost & Latency: Requires additional LLM calls to perform the summarization, which adds to your API costs and slows down the response time.
    • Potential for Information Loss: The summarizer LLM might omit a specific detail (like a name or number) that it deems unimportant at the time, but which becomes critical later on.

Implementation

This memory type requires an LLM to perform the summarization tasks.

from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

# This memory needs an LLM to create the summaries
llm_for_summary = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

memory = ConversationSummaryMemory(
    llm=llm_for_summary,
    memory_key="chat_history",
    return_messages=True
)

# The rest of your conversational loop remains the same.
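You can exercise it the same way to watch the condensation happen. A quick sketch (note that every save_context call below triggers an extra LLM call to update the summary; the exchanges are invented):

memory.save_context(
    {"input": "I'm building a chatbot for my bakery in Pune."},
    {"output": "Sounds fun! What should it handle?"},
)
memory.save_context(
    {"input": "Mostly order status and opening hours."},
    {"output": "A small FAQ-plus-lookup agent would fit well."},
)

# Instead of the raw transcript, you get back an LLM-written running summary
print(memory.load_memory_variables({})["chat_history"])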

3. The Structured Thinker: ConversationKGMemory (Knowledge Graph)

This is the most sophisticated and “intelligent” form of conversational memory. Instead of storing a transcript or a summary, it uses an LLM to extract key entities (people, places, things) and their relationships and stores them in a structured knowledge graph.

  • Concept: Creates a structured database of facts mentioned in the conversation.
  • Analogy: An intelligence analyst who isn’t just taking notes, but is actively building a relationship map on a whiteboard, connecting people, places, and key facts.
  • Pros: Creates a robust, long-term memory of concrete facts. It excels at remembering “who is who” and “what is what” (e.g., (Rohan, is a friend of, Priya)). The extracted knowledge can be used to answer questions with high precision.
  • Cons: Computationally the most expensive, as it requires a very capable LLM to accurately extract entities and relationships. It’s better at remembering facts than capturing the subtle, emotional nuance of a conversation.

Implementation

Like the summary memory, this also requires an LLM.

from langchain.memory import ConversationKGMemory
from langchain_openai import ChatOpenAI

# A powerful LLM is recommended for accurate knowledge extraction
llm_for_kg = ChatOpenAI(model="gpt-4o", temperature=0)

memory = ConversationKGMemory(
    llm=llm_for_kg,
    memory_key="chat_history",
    return_messages=True
)
# The conversational loop remains the same.

With this memory, if you say “My friend Rohan lives in Khardaha,” the agent adds (Rohan, lives in, Khardaha) to its knowledge graph. Later, if you ask, “Where does Rohan live?” the memory will inject this fact into the prompt.
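Here is that exchange as a runnable sketch. One wrinkle worth knowing: ConversationKGMemory decides which facts to inject by looking at the entities in the new input, so load_memory_variables needs to see the question:

memory.save_context(
    {"input": "My friend Rohan lives in Khardaha."},
    {"output": "Good to know!"},
)

# The memory spots the entity "Rohan" in the new question and injects
# only the stored facts about him
print(memory.load_memory_variables({"input": "Where does Rohan live?"}))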


Summary: Which Memory Should You Choose?

The choice of memory is a trade-off between conversational detail, token cost, and latency.

| Memory Type | Core Idea | Solves Token Limit By… | Best For… | Key Weakness |
| --- | --- | --- | --- | --- |
| BufferWindowMemory | Keep last k turns | Dropping old messages | Quick, reactive chats where recent context is key | Forgets important old info |
| SummaryMemory | Condense the history | Summarizing messages | Long, general-purpose conversations | Loses specific details |
| KGMemory | Extract facts/entities | Structuring knowledge | Remembering facts about people, places, and things | Misses conversational nuance |

A Simple Decision Guide:

  • Start with ConversationBufferMemory. If your conversations are short, it’s perfect.
  • If you hit token limits, ask: “Is only the most recent context important?” If yes, switch to ConversationBufferWindowMemory.
  • If the whole conversation’s “gist” is important, use ConversationSummaryMemory.
  • If the agent must remember specific facts about entities over time, use ConversationKGMemory.
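If you want to encode that guide in code, one option is a small factory function. This is a sketch of our own, not a LangChain API; the style names are invented shorthand:

from langchain.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationKGMemory,
    ConversationSummaryMemory,
)

def make_memory(style: str, llm=None):
    """Map the decision guide onto concrete memory classes.

    'style' is our own shorthand: "short", "recent", "gist", or "facts".
    """
    if style == "short":
        return ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    if style == "recent":
        return ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)
    if style == "gist":
        return ConversationSummaryMemory(llm=llm, memory_key="chat_history", return_messages=True)
    if style == "facts":
        return ConversationKGMemory(llm=llm, memory_key="chat_history", return_messages=True)
    raise ValueError(f"Unknown memory style: {style}")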

Conclusion

There is no single “best” memory for every agent. The most advanced systems often use a hybrid approach—a windowed buffer for perfect short-term recall, combined with a knowledge graph or vector store for long-term factual memory.
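As a sketch of that hybrid idea, LangChain’s CombinedMemory can stitch two memories together, provided each one writes to its own prompt variable (the variable names below are our own choice):

from langchain.memory import (
    CombinedMemory,
    ConversationBufferWindowMemory,
    ConversationKGMemory,
)
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

hybrid_memory = CombinedMemory(memories=[
    # Perfect recall of the last few turns...
    ConversationBufferWindowMemory(
        k=3, memory_key="recent_history", input_key="input", return_messages=True
    ),
    # ...plus structured long-term facts
    ConversationKGMemory(
        llm=llm, memory_key="known_facts", input_key="input", return_messages=True
    ),
])
# Your prompt must then expose both {recent_history} and {known_facts}.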

By moving beyond simple buffers and choosing the right memory strategy for your use case, you can build agents that are not only more capable but also more efficient, scalable, and intelligent over time.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on expertise in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge of Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com