How to Use Pydantic for Structured LLM Output
The article explains why LLM responses can be inconsistent, introduces Pydantic as a way to define custom output schemas, and walks through concrete examples—both with OpenAI and Ollama models—showing how to build a LangChain pipeline that parses responses into structured data.
Introduction
When using large language models (LLMs) for tasks that require structured output, the variability of the generated text becomes a major obstacle; the same prompt can yield different answers, making downstream processing unreliable. To address this, Pydantic offers a solution that lets developers define custom object types for LLM responses.
Pydantic Overview
Pydantic is a Python library for data validation and settings management. By creating a schema (model) that inherits from BaseModel, developers can enforce field types, default values, and validation rules, ensuring that LLM output conforms to a predefined structure.
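As a minimal sketch of this validation behavior (the model name and fields here are illustrative, not taken from the examples below), a schema can coerce compatible values and reject incompatible ones:

```python
from pydantic import BaseModel, ValidationError


class Character(BaseModel):
    name: str
    age: int  # Pydantic coerces numeric strings like "7" and rejects non-numeric ones


# Valid data passes, and the numeric string is coerced to an int.
ok = Character(name="Fox", age="7")
print(ok.age)  # 7

# Invalid data raises ValidationError instead of silently passing through.
try:
    Character(name="Crow", age="old")
except ValidationError:
    print("rejected")
```

This fail-fast behavior is exactly what makes Pydantic useful for LLM output: malformed responses surface as errors at the parsing boundary rather than deeper in the pipeline.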
In the LangChain ecosystem, the PydanticOutputParser builds on Pydantic to provide JSON‑style parsing of LLM output.
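Conceptually, an output parser does two things: it injects format instructions into the prompt, and it parses the model's raw reply back into typed data. A library-free sketch of that mechanism (the function names and instruction string here are illustrative, not LangChain's actual API):

```python
import json

# Hypothetical instruction string; PydanticOutputParser generates one from the schema.
FORMAT_INSTRUCTIONS = 'Reply with JSON: {"names": [...], "places": [...]}'


def build_prompt(query: str) -> str:
    # The instructions ride along with the user query, steering the model toward JSON.
    return f"Answer the user query.\n{FORMAT_INSTRUCTIONS}\n{query}\n"


def parse_reply(raw: str) -> dict:
    # A real parser would also validate field types against the Pydantic schema.
    data = json.loads(raw)
    for key in ("names", "places"):
        if not isinstance(data.get(key), list):
            raise ValueError(f"missing or malformed field: {key}")
    return data


# Simulated model reply:
reply = '{"names": ["Fox", "Crow"], "places": ["forest"]}'
print(parse_reply(reply)["names"])  # ['Fox', 'Crow']
```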
Example: Structured Output with OpenAI Key
The following code demonstrates the full workflow using ChatOpenAI as the model.
from typing import List
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI

Instantiate the model:

model = ChatOpenAI(temperature=0)

Define the expected output schema:
class Output(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field

Set up the parser and prompt template:
parser = PydanticOutputParser(pydantic_object=Output)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
chain = prompt | model | parser
answer = chain.invoke({"query": "Tell me a joke."})
print(answer)

Example: Structured Output without OpenAI Key (using Ollama)
When an OpenAI key is not available, the same approach works with a local Ollama model.
from langchain_community.llms import Ollama
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from langchain_core.prompts import PromptTemplate

Define the model name and create the Ollama instance (the model must be installed locally):
model_name = "model_name"
model = Ollama(model=model_name)

Define the output schema:
class Output(BaseModel):
    names: list = Field(description="Give me the list of all the names of the characters")
    places: list = Field(description="Give me the list of places if available")

Set up the parser and prompt:
parser = PydanticOutputParser(pydantic_object=Output)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
chain = prompt | model | parser
query = "The Fable of the Fox and the Crow"
result = chain.invoke({"query": query})
print(result.names)
print(result.places)

Key Components Explained
PromptTemplate : Creates a prompt that includes the format instructions generated by the parser and the user query.
Chain composition (prompt | model | parser): The pipe operator composes the three steps; the filled prompt is sent to the LLM, and the raw text is parsed into the Pydantic model. Note that the examples use this pipe syntax rather than the legacy LLMChain class.
Chain Invocation : The invoke method runs the whole pipeline with the provided query.
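The three components above can be made concrete without any API key by stubbing the model step with canned JSON. Everything in this sketch is illustrative; a real chain would swap in ChatOpenAI or Ollama for the stub:

```python
import json


def prompt_step(inputs: dict) -> str:
    # Fill the template with format instructions and the user query.
    instructions = 'Return JSON with keys "names" and "places".'
    return f"Answer the user query.\n{instructions}\n{inputs['query']}\n"


def model_step(prompt: str) -> str:
    # Stub standing in for an LLM call; returns canned JSON.
    return '{"names": ["Fox", "Crow"], "places": ["forest"]}'


def parser_step(raw: str) -> dict:
    # Parse the raw text into structured data.
    return json.loads(raw)


def invoke(inputs: dict) -> dict:
    # Equivalent of (prompt | model | parser).invoke(inputs).
    return parser_step(model_step(prompt_step(inputs)))


result = invoke({"query": "The Fable of the Fox and the Crow"})
print(result["names"])  # ['Fox', 'Crow']
```

Because each step is just a function from one value to the next, the pipe syntax is ordinary function composition, which is why swapping models or parsers requires no changes elsewhere in the chain.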
Conclusion
PydanticOutputParser, a core component of the LangChain toolkit, bridges raw LLM text and structured, JSON‑like data. By defining explicit schemas and integrating them into a LangChain pipeline, developers can reliably extract meaningful information from generative models, turning free‑form text into organized data ready for downstream processing.
AI Algorithm Path
A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.
