Extract Structured Vehicle Data from Images with Pydantic and GPT‑4 Vision

This tutorial shows how to build a LangChain pipeline that uses GPT‑4 Vision to read vehicle images, defines a Pydantic model for type, license, make, model and color, and returns the results as structured JSON for both single and batch inference.

AI Algorithm Path

Problem

Goal: extract vehicle type, license plate, make, model, and color from checkpoint camera images. Traditional computer‑vision methods struggle with pattern variations, and supervised deep‑learning pipelines require multiple specialized models and large labeled datasets. Multimodal large language models such as GPT‑4 Vision can perform zero‑shot extraction, but the raw output must be converted to a structured format.
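To make the last point concrete, here is a minimal sketch of turning a free-form model reply into a dictionary with the standard library alone. The reply text and the helper name `extract_json` are hypothetical and serve only as illustration; the pipeline in this tutorial delegates this step to LangChain's JsonOutputParser.

```python
import json
import re


def extract_json(reply: str) -> dict:
    """Pull the JSON object out of a free-form model reply."""
    # Models often surround JSON with prose, so grab everything
    # from the first opening brace to the last closing brace.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))


# Hypothetical raw reply, for illustration only.
raw_reply = (
    "Here is what I found:\n"
    '{"Type": "Car", "License": "ABC123", "Color": "Red"}\n'
    "Let me know if you need more."
)
vehicle = extract_json(raw_reply)
print(vehicle["License"])
```

A regular expression like this is fragile when the reply contains several JSON objects, which is one reason to prefer a dedicated output parser.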

Tech Stack

GPT‑4 Vision – OpenAI multimodal model accessed via the OpenAI API.

LangChain – Framework for chaining image loading, prompt creation, model invocation, and output parsing.

Pydantic – Python library used to define the structured output schema.

Dataset

Vehicle images are taken from the Kaggle "Car Number Plate" dataset (Apache‑2.0). Link: https://www.kaggle.com/datasets/alihassanml/car-number-plate

Pipeline Components

Image loading

import base64

def image_encoding(inputs):
    """Load an image from disk and convert it to a base64 string."""
    with open(inputs["image_path"], "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode("utf-8")
    return {"image": image_base64}

Output schema

# Older LangChain releases import these from langchain_core.pydantic_v1 instead.
from pydantic import BaseModel, Field

class Vehicle(BaseModel):
    Type: str = Field(..., examples=["Car","Truck","Motorcycle","Bus"], description="Return the type of the vehicle.")
    License: str = Field(..., description="Return the license plate number of the vehicle.")
    Make: str = Field(..., examples=["Toyota","Honda","Ford","Suzuki"], description="Return the Make of the vehicle.")
    Model: str = Field(..., examples=["Corolla","Civic","F-150"], description="Return the Model of the vehicle.")
    Color: str = Field(..., examples=["Red","Blue","Black","White"], description="Return the color of the vehicle.")
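The system prompt used later asks the model to return none for unclear fields, while the schema above declares every field as a required `str`. JsonOutputParser returns a plain dict and, as far as we know, uses the Pydantic class only to build the format instructions, so a None value passes through unchecked. If explicit validation is wanted, one option is sketched below, assuming Pydantic v2. The class name `VehicleOptional` and the sample dictionaries are hypothetical.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class VehicleOptional(BaseModel):
    """Hypothetical variant of Vehicle whose fields may be None."""
    Type: Optional[str] = Field(None, description="Type of the vehicle.")
    License: Optional[str] = Field(None, description="License plate number.")
    Make: Optional[str] = Field(None, description="Make of the vehicle.")
    Model: Optional[str] = Field(None, description="Model of the vehicle.")
    Color: Optional[str] = Field(None, description="Color of the vehicle.")


# Illustrative parser output in which the plate could not be read.
parsed = {"Type": "Car", "License": None, "Make": "Toyota",
          "Model": "Corolla", "Color": "White"}
vehicle = VehicleOptional.model_validate(parsed)
print(vehicle.License)

# A value of the wrong type is rejected instead of slipping through.
try:
    VehicleOptional.model_validate({"Type": ["Car"]})
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Validating after parsing keeps the prompt unchanged while making the None convention explicit in the type annotations.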

Parser

from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser(pydantic_object=Vehicle)
instructions = parser.get_format_instructions()

Prompt generation

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.runnables import chain

@chain
def prompt(inputs):
    """Create the prompt messages from the base64-encoded image."""
    messages = [
        SystemMessage(content="You are an AI assistant whose job is to inspect an image and provide the desired information from the image. If the desired field is not clear or not well detected, return none for this field. Do not try to guess."),
        HumanMessage(content=[
            {"type": "text", "text": "Examine the main vehicle type, make, model, license plate number and color."},
            {"type": "text", "text": instructions},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{inputs['image']}", "detail": "low"}}
        ])
    ]
    return messages
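The prompt above hard-codes `image/jpeg` in the data URL. If the image folder also contains PNG or other formats, the MIME type could be inferred from the file name instead. The following is a minimal sketch using only the standard library; the helper name `build_data_url` and the file names are hypothetical.

```python
import base64
import mimetypes


def build_data_url(image_path: str, image_bytes: bytes) -> str:
    """Build a base64 data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        # Assumption: fall back to JPEG, as the original prompt does.
        mime_type = "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes; a real call would pass the contents of the image file.
png_url = build_data_url("car_001.png", b"\x89PNG")
jpg_url = build_data_url("car_002.jpg", b"")
print(png_url.split(",")[0])
```

To use it, the image loading step would also need to pass the file path along with the encoded bytes.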

MLLM component

from langchain_openai import ChatOpenAI  # expects OPENAI_API_KEY in the environment
model = ChatOpenAI(model="gpt-4-vision-preview", temperature=0, max_tokens=1024)

Pipeline assembly

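All components are linked with the | operator to build a LangChain pipeline: image loading, then prompt generation, then the model, then the parser. The result is the pipeline object used in the inference calls below.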

Single‑image inference

import json

output = pipeline.invoke({"image_path": img_path})  # img_path: path to one vehicle image
json_output = json.dumps(output)

Batch inference

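import pandas as pd

# Prepare a list of dictionaries with image paths (image_paths is a list of file paths)
batch_input = [{"image_path": path} for path in image_paths]
# Perform batch inference; results come back in the same order as the inputs
output = pipeline.batch(batch_input)
df = pd.DataFrame(output)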
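Before building the DataFrame, it can help to attach each result to its source image and to normalize the textual placeholders a model may emit for unknown fields. The sketch below uses only the standard library; the helper names `normalize` and `tidy_batch` and the sample results are hypothetical.

```python
def normalize(value):
    """Map textual placeholders for unknown fields to None."""
    if isinstance(value, str) and value.strip().lower() in {"", "none", "null", "n/a"}:
        return None
    return value


def tidy_batch(image_paths, outputs):
    """Pair each result with its image path and normalize its values."""
    rows = []
    for path, result in zip(image_paths, outputs):
        row = {"image_path": path}
        row.update({key: normalize(val) for key, val in result.items()})
        rows.append(row)
    return rows


# Illustrative data standing in for real batch results.
image_paths = ["img/car_1.jpg", "img/car_2.jpg"]
outputs = [
    {"Type": "Car", "License": "ABC123", "Color": "Red"},
    {"Type": "Truck", "License": "none", "Color": "White"},
]
rows = tidy_batch(image_paths, outputs)
print(rows[1]["License"])
```

The resulting list of rows can be passed to `pd.DataFrame(rows)` in place of the raw output.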

Observations

GPT‑4 Vision reliably returns values for vehicle type, license plate, make, model, and color; fields that cannot be determined are returned as None. Because the pipeline is modular, the model component can be swapped for another multimodal LLM. Users should remain aware of the hallucination risk inherent to large language models.
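One lightweight guard against hallucinated values is a plausibility check on the parsed fields. The sketch below is only an illustration: the set of known types mirrors the schema examples, while the plate pattern and the helper name `flag_suspicious` are assumptions that would need adapting to the plates in a given dataset.

```python
import re

# Vehicle types taken from the examples in the schema.
KNOWN_TYPES = {"car", "truck", "motorcycle", "bus"}
# Assumed plate shape: 2 to 15 characters of letters, digits, spaces or hyphens.
PLATE_PATTERN = re.compile(r"^[A-Z0-9][A-Z0-9 -]{1,14}$")


def flag_suspicious(vehicle: dict) -> list:
    """Return the names of fields whose values look implausible."""
    issues = []
    vehicle_type = vehicle.get("Type")
    if vehicle_type is not None and vehicle_type.lower() not in KNOWN_TYPES:
        issues.append("Type")
    plate = vehicle.get("License")
    if plate is not None and not PLATE_PATTERN.match(plate.upper()):
        issues.append("License")
    return issues


print(flag_suspicious({"Type": "Car", "License": "ABC-123"}))
print(flag_suspicious({"Type": "Spaceship", "License": "???"}))
```

A check like this cannot confirm that a value is correct; it only flags results that deserve a manual look.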
