How to Use Pydantic for Structured LLM Output

The article explains why LLM responses can be inconsistent, introduces Pydantic as a way to define custom output schemas, and walks through concrete examples—both with OpenAI and Ollama models—showing how to build a LangChain pipeline that parses responses into structured data.


Introduction

When using large language models (LLMs) for tasks that require structured output, the variability of the generated text becomes a major obstacle; the same prompt can yield different answers, making downstream processing unreliable. To address this, Pydantic offers a solution that lets developers define custom object types for LLM responses.

Pydantic Overview

Pydantic is a Python library for data validation and settings management. By creating a schema (model) that inherits from BaseModel, developers can enforce field types, default values, and validation rules, ensuring that LLM output conforms to a predefined structure.
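As a minimal illustration (using the standalone pydantic package directly, independent of LangChain), a schema declares field types and Pydantic coerces or rejects incoming data; the Character model below is a hypothetical example, not part of the article's pipeline:

```python
from pydantic import BaseModel, Field, ValidationError

class Character(BaseModel):
    name: str = Field(description="The character's name")
    age: int = Field(description="Age in years")

# Well-formed data is coerced to the declared types.
fox = Character(name="Fox", age="7")  # the string "7" is coerced to int
print(fox.age)  # 7

# Malformed data raises ValidationError instead of passing through silently.
try:
    Character(name="Crow", age="unknown")
except ValidationError:
    print("validation failed")
```

This fail-fast behavior is exactly what makes Pydantic useful downstream of an LLM: a response that does not fit the schema is rejected immediately rather than silently corrupting later processing.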

In the LangChain ecosystem, the PydanticOutputParser builds on Pydantic to provide JSON‑style parsing of LLM output.
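Conceptually, the parser's job can be sketched with the standard library alone; the regex-based extraction below is an illustrative stand-in, not LangChain's actual implementation:

```python
import json
import re

def extract_json(text: str) -> dict:
    """Pull the first JSON object out of raw LLM text (illustrative only)."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# LLMs often wrap their answer in conversational filler; the parser's first
# task is to recover the structured part.
raw = 'Sure, here is the result:\n{"names": ["Fox", "Crow"], "places": []}'
print(extract_json(raw))  # {'names': ['Fox', 'Crow'], 'places': []}
```

The real PydanticOutputParser goes one step further and validates the extracted object against the schema, raising if a field is missing or has the wrong type.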

Example: Structured Output with OpenAI Key

The following code demonstrates the full workflow using ChatOpenAI as the model.

from typing import List
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI

Instantiate the model:

model = ChatOpenAI(temperature=0)

Define the expected output schema:

class Output(BaseModel):
    setup: str = Field(description="Give me the list of all the names of the characters")
    punchline: str = Field(description="Give me the list of places if it is available")

    # Reject any setup that does not end with a question mark.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if not field.endswith("?"):
            raise ValueError("Badly formed question!")
        return field

Set up the parser and prompt template:

parser = PydanticOutputParser(pydantic_object=Output)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
chain = prompt | model | parser
answer = chain.invoke({"query": "The Fable of the Fox and the Crow"})  # the text to analyze
print(answer)
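The format_instructions partial variable is where the schema reaches the model: parser.get_format_instructions() renders the Pydantic model as a JSON-schema-style block of text spliced into the prompt. A rough standard-library sketch of that idea follows (the exact wording LangChain emits differs):

```python
import json

# Field names and descriptions as they appear in the Output schema above.
fields = {
    "setup": "Give me the list of all the names of the characters",
    "punchline": "Give me the list of places if it is available",
}

def format_instructions(fields: dict) -> str:
    """Render a JSON-schema-like instruction block (illustrative only)."""
    properties = {name: {"type": "string", "description": desc}
                  for name, desc in fields.items()}
    return ("The output should be a JSON object with these properties:\n"
            + json.dumps(properties, indent=2))

print(format_instructions(fields))
```

Because the instructions are generated from the schema itself, the prompt and the parser can never drift out of sync: changing a field changes both at once.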

Example: Structured Output without OpenAI Key (using Ollama)

When an OpenAI key is not available, the same approach works with a local Ollama model.

from langchain_community.llms import Ollama
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from langchain_core.prompts import PromptTemplate

Define the model name and create the Ollama instance (the model must be installed locally):

model_name = "model_name"  # placeholder: any model already pulled with Ollama
model = Ollama(model=model_name)

Define the output schema:

class Output(BaseModel):
    names: list = Field(description="Give me the list of all the names of the characters")
    places: list = Field(description="Give me the list of places if it is available")

Set up the parser and prompt:

parser = PydanticOutputParser(pydantic_object=Output)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
chain = prompt | model | parser
query = "The Fable of the Fox and the Crow"
result = chain.invoke({"query": query})
print(result.names)
print(result.places)
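Local models in particular sometimes return text that is not valid JSON, which makes the parser raise. A simple retry loop is one way to cope; the sketch below is a lightweight, hypothetical stand-in for LangChain's OutputFixingParser, not its actual implementation:

```python
import json

def parse_with_retry(generate, query, attempts=3):
    """Call the model up to `attempts` times until its output parses as JSON."""
    last_error = None
    for _ in range(attempts):
        raw = generate(query)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err  # malformed output: try again
    raise ValueError(f"could not parse model output: {last_error}")

# Stub model that fails once, then returns valid JSON.
outputs = iter(["not json at all", '{"names": ["Fox", "Crow"]}'])
print(parse_with_retry(lambda q: next(outputs), "fable"))  # {'names': ['Fox', 'Crow']}
```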

Key Components Explained

PromptTemplate: Creates a prompt that includes the format instructions generated by the parser and the user query.

Chain (prompt | model | parser): The | operator composes the three components into a runnable sequence: the prompt is filled, sent to the LLM, and the raw text is parsed into the Pydantic model.

Chain Invocation: The invoke method runs the whole pipeline with the provided query.
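The pipe operator is essentially function composition; the whole pipeline can be mimicked with plain functions and a stubbed model. All helpers below are hypothetical, shown only to make the data flow explicit:

```python
import json

def format_prompt(query: str, format_instructions: str) -> str:
    # Step 1: the prompt template is filled with instructions and the query.
    return f"Answer the user query.\n{format_instructions}\n{query}\n"

def fake_model(prompt: str) -> str:
    # Step 2: stand-in for the LLM; a real model would generate this text.
    return '{"names": ["Fox", "Crow"], "places": ["tree"]}'

def parse(text: str) -> dict:
    # Step 3: the parser turns raw text into structured data.
    return json.loads(text)

def invoke(query: str) -> dict:
    # prompt | model | parser, spelled out as nested calls.
    return parse(fake_model(format_prompt(query, "Return a JSON object.")))

print(invoke("The Fable of the Fox and the Crow"))
```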

Conclusion

PydanticOutputParser, a core component of the LangChain toolkit, bridges raw LLM text and structured, JSON‑like data. By defining explicit schemas and integrating them into a LangChain pipeline, developers can reliably extract meaningful information from generative models, turning free‑form text into organized data ready for downstream processing.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Python · LLM · LangChain · Structured Data · Ollama · Pydantic · output-parser
Written by AI Algorithm Path

A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.
