Blog Post #28: LangChain Primitives: A Guide to Built-in Output Parsers

We’ve learned how to craft perfect prompts and pipe them into powerful chat models. But the model’s raw output is still just a ChatMessage object—essentially a block of text with a “role.” To build applications that can take programmatic action, we need to convert that text into a reliable, structured format like a JSON object or a Python class.

This is the critical job of Output Parsers. They are the final, essential link in our chain, translating the LLM’s linguistic output into the logical, structured data our code can work with.

This tutorial provides hands-on examples of LangChain’s most important built-in parsers, moving from simple strings to complex, type-safe objects.


The Baseline: StrOutputParser

This is the simplest parser and the one we’ve been using so far. Its only job is to take the AIMessage from a chat model and extract its string content.

  • Use Case: Perfect for simple Q&A, summarization, or any task where the final desired output is a block of natural language.
# Assumes OPENAI_API_KEY is available (e.g. loaded from a .env file)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o")

prompt = ChatPromptTemplate.from_template(
    "Tell me a short joke about the city of {city}."
)

# Chain: Prompt -> LLM -> String Output
chain = prompt | llm | StrOutputParser()

response = chain.invoke({"city": "Kolkata"})
print(type(response))
print(response)

Output:

<class 'str'>
Why did the bookworm move to Kolkata? Because he heard it was the city of "joy" and he wanted to be well-read!

It’s simple and effective for text, but for agents that need to make decisions, we need more structure.
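Under the hood there is no magic here: the parser simply pulls the `.content` string out of the message object. Here is a minimal stdlib sketch of the idea, using a stand-in FakeMessage class so no model call is needed (the real messages come from langchain_core; this is purely illustrative):

```python
# A stand-in for the AIMessage a chat model returns (illustration only).
class FakeMessage:
    def __init__(self, content, role="ai"):
        self.content = content
        self.role = role

def parse_to_str(message):
    """Roughly what StrOutputParser does: extract the text content."""
    return message.content

msg = FakeMessage("Why did the bookworm move to Kolkata? ...")
print(parse_to_str(msg))  # a plain str, ready for display or further processing
```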


Getting Structured Data: JsonOutputParser

This is our first step into truly structured output. Paired with a prompt that tells the model to reply in JSON (or with the instructions from its get_format_instructions() helper), this parser safely loads the model's JSON-formatted string into a Python dictionary.

  • Use Case: When you need to extract multiple, distinct pieces of information from a block of text.
from langchain_core.output_parsers import JsonOutputParser

json_prompt = ChatPromptTemplate.from_template(
    "Extract key information from the following sentence. "
    "Format your response as a JSON object with 'name', 'age', and 'profession' keys.\n\n"
    "Sentence: {sentence}"
)

# The prompt above tells the model to reply in JSON; the parser's job is to
# load that JSON reply into a Python dict.
json_parser = JsonOutputParser()

json_chain = json_prompt | llm | json_parser

sentence = "Priya, who is 32 years old, works as a data scientist in Kolkata."
response = json_chain.invoke({"sentence": sentence})

print(type(response))
print(response)
print(f"The person's name is {response['name']}.")

Output:

<class 'dict'>
{'name': 'Priya', 'age': 32, 'profession': 'data scientist'}
The person's name is Priya.

Now, instead of a single string, we have a dictionary that our program can easily work with.
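In practice, models sometimes wrap their JSON in Markdown code fences. Here is a simplified, stdlib-only sketch of the kind of cleanup a JSON parser performs before calling json.loads (an illustration of the idea, not LangChain's actual implementation):

```python
import json
import re

def parse_json_reply(text: str) -> dict:
    """Strip optional ```json fences, then load the payload as a dict."""
    match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())

raw = '```json\n{"name": "Priya", "age": 32, "profession": "data scientist"}\n```'
print(parse_json_reply(raw)["name"])  # Priya
```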


The Gold Standard: PydanticOutputParser

This is the most powerful and recommended method for any complex data structure. It uses the excellent Pydantic library to define your desired output schema as a Python class.

  • Use Case: Any time you need a reliable, type-safe, and well-documented data structure. This is the preferred method for defining tool inputs and agent responses.

Step 1: Define your data structure with Pydantic.

from pydantic import BaseModel, Field
from typing import List

# Define the data structure we want the LLM to return
class LocalGuide(BaseModel):
    city: str = Field(description="The city being described.")
    famous_for: List[str] = Field(description="A list of 3-5 things the city is famous for.")
    # The Field description is crucial - it's passed to the LLM!
    best_time_to_visit: str = Field(description="The recommended season or months to visit.")

Step 2: Create the parser and the chain.

from langchain_core.output_parsers import PydanticOutputParser

# 1. Create a PydanticOutputParser instance with our class
pydantic_parser = PydanticOutputParser(pydantic_object=LocalGuide)

# 2. Get the formatting instructions that we'll inject into our prompt
format_instructions = pydantic_parser.get_format_instructions()

pydantic_prompt = ChatPromptTemplate.from_template(
    "You are an expert local guide. Answer the user's query.\n"
    "{format_instructions}\n" # This is where the parser's instructions go
    "Query: {query}"
)

pydantic_chain = pydantic_prompt | llm | pydantic_parser

# 3. Invoke the chain
query = "Tell me about visiting Khardaha, West Bengal."
response = pydantic_chain.invoke({
    "query": query,
    "format_instructions": format_instructions
})

print(type(response))
print(response)
print(f"\nThe best time to visit {response.city} is {response.best_time_to_visit}.")

Output:

<class '__main__.LocalGuide'>
city='Khardaha, West Bengal' famous_for=['Ghoshpara Utsab', 'Temples like Shyamsundar Mandir', 'Historical significance in the Bengal Renaissance', 'Proximity to the Hooghly River'] best_time_to_visit='October to March, during the cooler winter months.'

The best time to visit Khardaha, West Bengal is October to March, during the cooler winter months.

The result is not just a dictionary; it’s a true instance of our LocalGuide class. Your IDE will provide autocompletion (response.city), and the data types are validated automatically. This is the key to building reliable, maintainable agentic systems.
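The core mechanic (JSON text in, validated typed object out) can be sketched with nothing but the standard library. A simplified illustration of what the parser does after the LLM responds; LangChain's real parser leans on Pydantic for much richer validation:

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class Guide:
    city: str
    famous_for: List[str]
    best_time_to_visit: str

def parse_typed(text: str, cls):
    """Load JSON and build a typed instance. Unknown keys raise a
    TypeError from the dataclass constructor: a basic schema check."""
    return cls(**json.loads(text))

raw = ('{"city": "Khardaha", "famous_for": ["Ghoshpara Utsab"], '
       '"best_time_to_visit": "October to March"}')
guide = parse_typed(raw, Guide)
print(guide.city)  # attribute access, with IDE autocompletion
```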

Bonus: CommaSeparatedListOutputParser

To round out the picture, LangChain also ships parsers for other lightweight formats, such as comma-separated lists.

from langchain_core.output_parsers import CommaSeparatedListOutputParser

csv_parser = CommaSeparatedListOutputParser()
format_instructions = csv_parser.get_format_instructions()

csv_prompt = ChatPromptTemplate.from_template(
    "List 5 famous Bengali sweet dishes.\n{format_instructions}"
)

csv_chain = csv_prompt | llm | csv_parser

# The template contains a {format_instructions} variable, so we must supply it
response = csv_chain.invoke({"format_instructions": format_instructions})

print(type(response))
print(response)

Output:

<class 'list'>
['Rosogolla', 'Mishti Doi', 'Sandesh', 'Shor Bhaja', 'Ledikeni']
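This parser's logic is refreshingly simple: split on commas and trim whitespace. A stdlib sketch of the idea:

```python
def parse_comma_separated(text: str) -> list:
    """Roughly what CommaSeparatedListOutputParser does with the reply."""
    return [item.strip() for item in text.split(",") if item.strip()]

reply = "Rosogolla, Mishti Doi, Sandesh, Shor Bhaja, Ledikeni"
print(parse_comma_separated(reply))
# ['Rosogolla', 'Mishti Doi', 'Sandesh', 'Shor Bhaja', 'Ledikeni']
```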

Conclusion

Stop thinking of LLMs as just text generators. Start treating them as structured data generators that you can program with prompts and parsers.

  • StrOutputParser for simple text.
  • JsonOutputParser for dictionaries and lists.
  • PydanticOutputParser for robust, type-safe, and self-documenting objects.

For a ReAct agent to function, it must receive a predictable output from the LLM that specifies the next tool and its parameters. Output parsers are the mechanism that makes this possible. You have now mastered the final primitive needed to build a complete agent.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge of Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com
