Master YOLOv12: A Step‑by‑Step Guide to Build, Train, and Deploy Custom Models

This tutorial walks readers through the fundamentals of YOLOv12, covering model variants, dataset preparation with Roboflow, optional FlashAttention acceleration, installation, model selection, training commands, post‑training tasks such as tracking, validation, inference, exporting to ONNX, and benchmarking, all with concrete code snippets and practical tips.


What is YOLOv12?

YOLOv12 (You Only Look Once v12) is a state‑of‑the‑art computer‑vision model that supports six tasks: classification, object detection, instance segmentation, multi‑object tracking, human pose estimation, and oriented bounding boxes (OBB). The model family provides five size variants—Nano (N), Small (S), Medium (M), Large (L) and eXtra‑large (X)—so users can balance accuracy against memory and latency requirements.

YOLOv12 task type diagram

Classify : Assign a single class label to an entire image.

Detect : Locate objects with axis‑aligned bounding boxes.

Segment : Produce pixel‑level masks for each object.

Track : Extend detection to video streams, assigning consistent IDs across frames.

Pose : Estimate human skeletal keypoints.

OBB : Predict rotated bounding boxes for arbitrarily oriented objects.

Creating a Dataset

Collect all images in a single directory and upload the directory to Roboflow. Choose the project type that matches the intended task:

Object Detection → "Object Detection"

Classification → "Classification"

Segmentation → "Instance Segmentation"

Use Roboflow’s annotation tools (bounding‑box selector, polygon drawer, or AI‑assisted helpers) to label the images. After annotation, run the health‑check, create a version, and export the dataset in YOLO format as a zip file. Unzip the archive before proceeding.
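After unzipping, the export contains train/valid/test image folders and a data.yaml file describing them. A hypothetical sketch of that file (paths, class count, and class names below are placeholders; a real export fills them in from the project):

```yaml
# Sketch of a Roboflow YOLO-format data.yaml (values are placeholders)
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 2                    # number of classes in the project
names: ['cat', 'dog']    # class names, in label-index order
```

The path passed to training later should point at this data.yaml (or its parent directory, depending on how the paths inside it are written).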

Roboflow interface

Optional: FlashAttention on NVIDIA GPUs

If an NVIDIA GPU with compute capability ≥ 7.0 is available, compiling FlashAttention can reduce inference latency and accelerate training. The source code and build instructions are provided in the YOLOv12 repository (see URL below). Users without compatible hardware may skip this step.

Repository: https://github.com/sunsmarterjie/yolov12

Installation and Model Selection

Install the YOLOv12 Python package from source and remove any conflicting ultralytics installation:

pip uninstall ultralytics
pip install git+https://github.com/sunsmarterjie/yolov12.git

Verify the installation by running the CLI command yolo. Then select a model size and, for non‑detection tasks, append the appropriate suffix:

Detection/Tracking → no suffix

Segmentation → -seg

Classification → -cls

The resulting model file follows the pattern yolov12<size><suffix>.yaml/.pt. Example filenames:

yolov12x-seg.yaml
yolov12m-cls.pt
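The naming pattern can be captured in a small helper. This function is purely illustrative (it is not part of the package) and assumes the size/suffix convention described above:

```python
# Hypothetical helper illustrating the yolov12<size><suffix> naming convention.
# Detection and tracking use no suffix; other tasks append one.
TASK_SUFFIX = {"detect": "", "track": "", "segment": "-seg", "classify": "-cls"}

def model_name(size: str, task: str = "detect", pretrained: bool = True) -> str:
    """Build a model filename, e.g. model_name('x', 'segment') -> 'yolov12x-seg.pt'."""
    if len(size) != 1 or size not in "nsmlx":
        raise ValueError(f"unknown size: {size!r}")
    ext = ".pt" if pretrained else ".yaml"   # .pt = pretrained weights, .yaml = architecture only
    return f"yolov12{size}{TASK_SUFFIX[task]}{ext}"

print(model_name("x", "segment"))         # yolov12x-seg.pt
print(model_name("m", "classify"))        # yolov12m-cls.pt
print(model_name("n", pretrained=False))  # yolov12n.yaml
```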

Training with Python

from ultralytics import YOLO

# Replace INSERT_MODEL_NAME with the desired pretrained checkpoint, e.g. "yolov12m.pt"
model = YOLO('INSERT_MODEL_NAME')

# PATH_TO_DATASET points to the directory that contains the YOLO‑format data.yaml file
results = model.train(data='PATH_TO_DATASET', epochs=50, imgsz=640)

Training progress, checkpoints and final weights are saved under the runs directory, inside a sub‑folder named after the task (e.g., detect/train) followed by a numeric run identifier.

Post‑Training Options

Tracking

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
results = model.track('YOUTUBE_VIDEO_URL', show=True)

Model Validation

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
metrics = model.val()
# mAP metrics
print('mAP50‑95:', metrics.box.map)
print('mAP50:', metrics.box.map50)

Inference on New Images

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
results = model(["im1.jpg", "im2.jpg"])
for r in results:
    boxes = r.boxes          # bounding boxes
    masks = r.masks          # segmentation masks (if available)
    keypoints = r.keypoints  # pose keypoints (if available)
    probs = r.probs          # classification probabilities
    obb = r.obb              # oriented bounding boxes
    r.show()                # display result
    r.save(filename="result.jpg")

Export to Other Formats (e.g., ONNX)

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
model.export(format="onnx")
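The exported ONNX model is typically consumed outside ultralytics, e.g. with onnxruntime, and expects a float32 NCHW tensor with pixel values scaled to [0, 1]. A minimal NumPy preprocessing sketch under that assumption (the image is taken to be RGB and already resized to the training imgsz):

```python
import numpy as np

def to_onnx_input(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image into a 1x3xHxW float32 tensor in [0, 1]."""
    x = image_hwc.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    x = x.transpose(2, 0, 1)                  # HWC -> CHW
    return x[np.newaxis, ...]                 # add batch dimension -> NCHW

img = np.zeros((640, 640, 3), dtype=np.uint8)
print(to_onnx_input(img).shape)  # (1, 3, 640, 640)
```

The resulting tensor would be fed to the session's input, e.g. session.run(None, {input_name: tensor}) with an onnxruntime.InferenceSession; the exact input name can be read from the session's metadata.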

Benchmarking

from ultralytics.utils.benchmarks import benchmark
benchmark(model='PATH_TO_MODEL', data='PATH_TO_DATASET', imgsz=640, half=False)

Reference Repository

All code and model definitions are hosted at https://github.com/sunsmarterjie/yolov12.
