Getting Started with Pydantic v2: Models, Fields, and Validators

This tutorial walks through every core feature of Pydantic v2 on Python 3.10+, showing how to define models with BaseModel, constrain fields using Field, reuse constraints via Annotated, switch between lax and strict validation modes, write field and model validators, customize serialization, work with nested and recursive models, and generate JSON schemas, all with runnable code examples.

DeepHub IMBA

Defining Models with BaseModel

Pydantic centers on BaseModel. Subclass BaseModel and declare fields with type annotations. During class creation Pydantic builds a validation schema; each instantiation validates the data against it.

from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str
    country: str = "US"           # optional, default "US"
    apartment: str | None = None   # optional, default None

addr = Address(
    street="123 Main St", city="Springfield", state="IL", zip_code="62704"
)
print(addr)
# street='123 Main St' city='Springfield' state='IL' zip_code='62704' country='US' apartment=None

Fields without defaults are required; those with defaults or annotated as T | None = None are optional.

Using Field() for Metadata and Constraints

Field() adds metadata, validation constraints, and documentation to a field.

from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str = Field(min_length=1, max_length=200, title="Product Name",
                     description="Display name of the product", examples=["Widget Pro"])
    sku: str = Field(pattern=r"^[A-Z]{2,4}-\d{4,8}$",
                     description="Stock keeping unit, format 'XX-0000'", examples=["WP-12345"])
    price: float = Field(gt=0, le=999_999.99, description="Price in USD, must be positive")
    quantity: int = Field(default=0, ge=0, description="Quantity in stock, cannot be negative")
    category: str = Field(validation_alias="product_category",
                         description="Product category from the catalog system")

product = Product(name="Widget Pro", sku="WP-12345", price=29.99,
                quantity=150, product_category="Electronics")
print(product.category)  # Electronics

If a field uses validation_alias, Pydantic only accepts the alias as input. To also accept the original field name, set model_config = ConfigDict(populate_by_name=True).
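The alias behavior described above can be checked with a small sketch. The two model names (AliasOnly, AliasOrName) are hypothetical and exist only to contrast the default with populate_by_name=True.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class AliasOnly(BaseModel):
    # Default behavior: only the alias is accepted as input.
    category: str = Field(validation_alias="product_category")

class AliasOrName(BaseModel):
    # populate_by_name=True additionally accepts the field name itself.
    model_config = ConfigDict(populate_by_name=True)
    category: str = Field(validation_alias="product_category")

try:
    AliasOnly(category="Electronics")
    accepted_by_name = True
except ValidationError:
    accepted_by_name = False

by_name = AliasOrName(category="Electronics")
by_alias = AliasOrName(product_category="Books")

print(accepted_by_name)   # False
print(by_name.category)   # Electronics
print(by_alias.category)  # Books
```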

Reusing Constraints with Annotated

from typing import Annotated
from pydantic import BaseModel, Field

PositiveInt = Annotated[int, Field(gt=0)]
ShortStr = Annotated[str, Field(min_length=1, max_length=100)]

class Widget(BaseModel):
    quantity: PositiveInt
    name: ShortStr

Both the classic Field() style and the Annotated style produce identical validation behavior; Annotated is handy for sharing constraints across models.
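As a quick check of that equivalence, the sketch below declares the same constraint both ways and compares the error types for an invalid value. The model names and the error_types helper are hypothetical, introduced only for this comparison.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

PositiveInt = Annotated[int, Field(gt=0)]

class ClassicWidget(BaseModel):
    quantity: int = Field(gt=0)   # constraint written inline

class AnnotatedWidget(BaseModel):
    quantity: PositiveInt         # constraint reused through the alias

def error_types(model: type[BaseModel], **data: object) -> list[str]:
    """Return the error type codes raised for the given input, if any."""
    try:
        model(**data)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []

print(error_types(ClassicWidget, quantity=0))    # ['greater_than']
print(error_types(AnnotatedWidget, quantity=0))  # ['greater_than']
```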

Strict vs. Lax Validation Modes

The default is lax mode, which automatically coerces compatible types (useful for inputs such as form data or query parameters, where values arrive as strings).

event = Event(name="PyCon", attendees="500", event_date="2025-05-15")
# "500" is coerced to int, "2025-05-15" to date

Model‑level strict mode: model_config = ConfigDict(strict=True).
Field‑level strict mode: Field(strict=True) or Annotated[int, Strict()].

Use strict mode when the source data is already strongly typed (e.g., internal Python calls or typed database drivers); keep lax mode for JSON or form data.
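The Event model used in the lax example is not defined in this section, so the sketch below assumes a plausible definition (name, attendees, event_date) and adds a hypothetical StrictEvent twin to show how the same input fares under model-level strict mode.

```python
from datetime import date

from pydantic import BaseModel, ConfigDict, ValidationError

class Event(BaseModel):
    # Assumed definition; the field names follow the call shown earlier.
    name: str
    attendees: int
    event_date: date

class StrictEvent(BaseModel):
    # Same fields, but no type coercion is allowed.
    model_config = ConfigDict(strict=True)
    name: str
    attendees: int
    event_date: date

raw = {"name": "PyCon", "attendees": "500", "event_date": "2025-05-15"}

lax = Event(**raw)  # strings are coerced to int and date

strict_locs: list[tuple] = []
try:
    StrictEvent(**raw)
except ValidationError as exc:
    # Both string values are rejected in strict mode.
    strict_locs = [err["loc"] for err in exc.errors()]

print(lax.attendees, lax.event_date)  # 500 2025-05-15
print(strict_locs)                    # [('attendees',), ('event_date',)]
```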

Validators: field_validator and model_validator

Pydantic provides built‑in Field() constraints, but custom validators are available when needed. @field_validator supports four modes:

mode='after' (default): runs after built‑in validation, receives already‑parsed values.
mode='before': runs before built‑in validation, receives raw input.
mode='wrap': wraps built‑in validation, useful for logging or error translation.
mode='plain': completely replaces built‑in validation.

from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    username: str = Field(min_length=3, max_length=30)
    email: str

    @field_validator("username", mode="before")
    @classmethod
    def normalize_username(cls, v: object) -> str:
        if not isinstance(v, str):
            raise ValueError("Username must be a string")
        return v.strip().lower()

    @field_validator("email", mode="after")
    @classmethod
    def validate_email_domain(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Invalid email: missing '@'")
        return v

When mode='before' is used, the validator runs first, stripping whitespace before the min_length=3 check.
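The wrap mode listed among the four modes is not illustrated by the User model, so here is a minimal sketch of it. The Reading model and its fall-back-to-zero policy are hypothetical; the point is only that the validator decides when to call the built-in validation (handler) and what to do if it fails.

```python
from typing import Any

from pydantic import (
    BaseModel,
    ValidationError,
    ValidatorFunctionWrapHandler,
    field_validator,
)

class Reading(BaseModel):
    value: int

    @field_validator("value", mode="wrap")
    @classmethod
    def fallback_to_zero(cls, v: Any, handler: ValidatorFunctionWrapHandler) -> int:
        try:
            # Delegate to Pydantic's built-in int validation.
            return handler(v)
        except ValidationError:
            # Illustrative policy: replace unparseable input with 0.
            return 0

print(Reading(value="7").value)    # 7
print(Reading(value="abc").value)  # 0
```

Swallowing errors like this hides bad data, so in practice a wrap validator is more often used to log the failure or re-raise it with a clearer message.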

For cross‑field validation, use @model_validator:

from datetime import date

from pydantic import BaseModel, model_validator

class DateRange(BaseModel):
    start: date
    end: date
    label: str | None = None

    @model_validator(mode="after")
    def check_start_before_end(self) -> "DateRange":
        if self.start >= self.end:
            raise ValueError(f"'start' ({self.start}) must be before 'end' ({self.end})")
        return self

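In after mode the validator should return self. Forgetting the return statement does not raise a ValidationError by itself: the None that the validator returns takes the place of the validated instance, so a call such as DateRange.model_validate(...) can return None, and recent Pydantic releases may warn when an after validator returns something other than self.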
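A short usage sketch of the cross-field check follows. It repeats a trimmed DateRange (without the label field) so it runs on its own; the sample dates are invented.

```python
from datetime import date

from pydantic import BaseModel, ValidationError, model_validator

class DateRange(BaseModel):
    start: date
    end: date

    @model_validator(mode="after")
    def check_start_before_end(self) -> "DateRange":
        if self.start >= self.end:
            raise ValueError("'start' must be before 'end'")
        return self

ok = DateRange(start="2025-01-01", end="2025-01-31")

err: dict = {}
try:
    DateRange(start="2025-02-01", end="2025-01-01")
except ValidationError as exc:
    err = exc.errors()[0]

print(ok.end - ok.start)         # 30 days, 0:00:00
print(err["type"], err["loc"])   # value_error ()
```

The empty location tuple shows that the error belongs to the model as a whole rather than to a single field.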

Model validators in before mode are class methods that receive the raw data and can reshape it before any field validation:

from pydantic import BaseModel, model_validator

class Coordinate(BaseModel):
    x: float
    y: float

    @model_validator(mode="before")
    @classmethod
    def accept_tuple(cls, data: object) -> object:
        if isinstance(data, (list, tuple)) and len(data) == 2:
            return {"x": data[0], "y": data[1]}
        return data

print(Coordinate.model_validate((3.0, 4.0)))  # x=3.0 y=4.0

Custom Serialization with @field_serializer

from datetime import datetime, timezone

from pydantic import BaseModel, field_serializer

class LogEntry(BaseModel):
    message: str
    timestamp: datetime

    @field_serializer("timestamp")
    def serialize_timestamp(self, v: datetime) -> str:
        if v.tzinfo is None:
            v = v.replace(tzinfo=timezone.utc)
        return v.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
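To see the serializer's effect, the sketch below repeats LogEntry and dumps two entries: one with a naive timestamp and one with a +02:00 offset. The sample messages and timestamps are invented.

```python
from datetime import datetime, timedelta, timezone

from pydantic import BaseModel, field_serializer

class LogEntry(BaseModel):
    message: str
    timestamp: datetime

    @field_serializer("timestamp")
    def serialize_timestamp(self, v: datetime) -> str:
        if v.tzinfo is None:
            # Naive timestamps are treated as UTC.
            v = v.replace(tzinfo=timezone.utc)
        return v.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

naive = LogEntry(message="started", timestamp=datetime(2025, 1, 2, 3, 4, 5))
offset = LogEntry(
    message="stopped",
    timestamp=datetime(2025, 1, 2, 5, 4, 5, tzinfo=timezone(timedelta(hours=2))),
)

print(naive.model_dump())
# {'message': 'started', 'timestamp': '2025-01-02T03:04:05Z'}
print(offset.model_dump()["timestamp"])
# 2025-01-02T03:04:05Z
```

The attribute on the instance stays a datetime; only the dumped output is converted to a string.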

Nested and Recursive Models

Embedding one model inside another creates natural nesting:

from pydantic import BaseModel, Field

class Employee(BaseModel):
    name: str
    title: str
    employee_id: int = Field(gt=0)

class Department(BaseModel):
    name: str
    head: Employee
    members: list[Employee] = []

class Company(BaseModel):
    name: str
    founded: int
    departments: list[Department]

Validation errors pinpoint the exact path, e.g., departments -> 0 -> members -> 0 -> employee_id when an invalid value like "not_a_number" is supplied.
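The error path can be reproduced with the sketch below, which repeats the three models and feeds them an invented payload whose only defect is the non-numeric employee_id.

```python
from pydantic import BaseModel, Field, ValidationError

class Employee(BaseModel):
    name: str
    title: str
    employee_id: int = Field(gt=0)

class Department(BaseModel):
    name: str
    head: Employee
    members: list[Employee] = []

class Company(BaseModel):
    name: str
    founded: int
    departments: list[Department]

payload = {
    "name": "Acme",
    "founded": 1999,
    "departments": [{
        "name": "R&D",
        "head": {"name": "Ann", "title": "Director", "employee_id": 1},
        "members": [
            {"name": "Bob", "title": "Engineer", "employee_id": "not_a_number"},
        ],
    }],
}

error_locs: list[tuple] = []
try:
    Company.model_validate(payload)
except ValidationError as exc:
    # Each error carries its location as a tuple of keys and indexes.
    error_locs = [err["loc"] for err in exc.errors()]

print(error_locs)  # [('departments', 0, 'members', 0, 'employee_id')]
```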

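Self‑referencing models need a forward reference, because the class name is not yet defined while its own body is being evaluated. Either quote the type (list["TreeNode"]) or postpone annotation evaluation with from __future__ import annotations at the top of the module:

from __future__ import annotations

from pydantic import BaseModel

class TreeNode(BaseModel):
    value: str
    children: list[TreeNode] = []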
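A recursive model can be exercised as in the sketch below. It quotes the type name (list["TreeNode"]) as the forward reference so the snippet runs without a module-level __future__ import; the sample tree and the depth helper are invented for illustration.

```python
from pydantic import BaseModel

class TreeNode(BaseModel):
    value: str
    # Quoted name: a forward reference to the class being defined.
    children: list["TreeNode"] = []

def depth(node: TreeNode) -> int:
    """Number of levels in the tree rooted at node."""
    return 1 + max((depth(child) for child in node.children), default=0)

tree = TreeNode.model_validate({
    "value": "root",
    "children": [
        {"value": "a", "children": [{"value": "a1"}]},
        {"value": "b"},
    ],
})

print(tree.children[0].children[0].value)  # a1
print(depth(tree))                         # 3
```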

Exporting Data: model_dump and model_dump_json

model_dump() returns a native Python dict. model_dump(mode='json') returns JSON‑compatible values. model_dump_json() returns a JSON string directly, bypassing json.dumps() for speed.

All three accept filtering parameters such as exclude_unset, exclude_none, include, and exclude_defaults.

Input parsing uses model_validate() for dicts and model_validate_json() for raw JSON strings. The latter parses and validates inside the Rust core in a single step, which is generally faster than calling json.loads() and then model_validate().

Field aliases:

alias: works for both input and output.
validation_alias: input only.
serialization_alias: output only.

On output, aliases are applied when dumping with by_alias=True.

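AliasPath and AliasChoices are both passed to validation_alias: AliasPath reads a value nested inside the input (for example a key within a sub-dictionary), while AliasChoices lists several candidate names for the same field.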
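The alias variants can be combined in one model, as in the sketch below. The Customer model, its field names, and the input payload are all hypothetical.

```python
from pydantic import AliasChoices, AliasPath, BaseModel, Field

class Customer(BaseModel):
    # alias: used on input, and on output when by_alias=True.
    full_name: str = Field(alias="fullName")
    # AliasChoices: accept either key on input.
    email: str = Field(validation_alias=AliasChoices("email", "e_mail"))
    # AliasPath: read input["address"]["city"].
    city: str = Field(validation_alias=AliasPath("address", "city"))
    # serialization_alias: affects output only.
    customer_id: int = Field(serialization_alias="id")

customer = Customer.model_validate({
    "fullName": "Ada",
    "e_mail": "ada@example.com",
    "address": {"city": "London"},
    "customer_id": 7,
})

print(customer.model_dump())
# {'full_name': 'Ada', 'email': 'ada@example.com', 'city': 'London', 'customer_id': 7}
print(customer.model_dump(by_alias=True))
# {'fullName': 'Ada', 'email': 'ada@example.com', 'city': 'London', 'id': 7}
```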

JSON Schema Generation

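Calling model_json_schema() on a model class, for example Item.model_json_schema(), produces a JSON Schema in which the metadata and constraints declared through Field() (title, description, examples, length and range limits, etc.) are included automatically.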
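The Item model is not defined in this section, so the sketch below assumes a small two-field version of it and inspects the parts of the generated schema that come from Field().

```python
from pydantic import BaseModel, Field

class Item(BaseModel):
    # Assumed definition, for illustration only.
    name: str = Field(min_length=1, description="Display name", examples=["Widget"])
    price: float = Field(gt=0)

schema = Item.model_json_schema()
name_schema = schema["properties"]["name"]
price_schema = schema["properties"]["price"]

print(name_schema["minLength"])          # 1
print(name_schema["description"])        # Display name
print(name_schema["examples"])           # ['Widget']
print("exclusiveMinimum" in price_schema)  # True
print(schema["required"])                # ['name', 'price']
```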

Pydantic Dataclasses and TypeAdapter

Pydantic dataclasses share the same validation capabilities as BaseModel but lack methods like model_dump(). Serialization must be performed via TypeAdapter. TypeAdapter can validate stand‑alone types without defining a model, useful for function arguments, collection validation, or generating JSON Schemas for API types:

from pydantic import TypeAdapter

int_list_adapter = TypeAdapter(list[int])
int_list_adapter.validate_python(["1", "2", "3"])  # [1, 2, 3]
int_list_adapter.validate_json('[4, 5, 6]')        # [4, 5, 6]
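The dataclass side of this section can be sketched the same way. The Point dataclass below is hypothetical; it shows validation happening at construction and TypeAdapter standing in for the missing model_dump().

```python
from pydantic import TypeAdapter
from pydantic.dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int = 0

# Validation (with lax coercion) runs when the dataclass is constructed.
p = Point(x="3")

# A dataclass has no model_dump(); TypeAdapter provides serialization.
point_adapter = TypeAdapter(Point)
as_dict = point_adapter.dump_python(p)
as_json = point_adapter.dump_json(p)   # returns bytes
parsed = point_adapter.validate_python({"x": 1, "y": 2})

print(as_dict)  # {'x': 3, 'y': 0}
print(as_json)  # b'{"x":3,"y":0}'
print(parsed)   # Point(x=1, y=2)
```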

FAQ

field_validator or model_validator? Use @field_validator for single‑field checks (fast and precise). Use @model_validator(mode='after') when you need to access multiple fields together.

BaseModel vs. @dataclass? BaseModel provides full Pydantic features. @dataclass offers familiar syntax but lacks model methods; you need TypeAdapter for serialization.

How to make a field optional with a default? Either field: str = "default" or field: str | None = None.

Validate JSON without a model? Use TypeAdapter(list[int]).validate_json('[1,2,3]').

Validator not running when using an alias? By default only the alias is accepted. Add ConfigDict(populate_by_name=True) to accept the original field name as well.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.