Master YOLOv12: A Step‑by‑Step Guide to Build, Train, and Deploy Custom Models

This tutorial walks readers through the fundamentals of YOLOv12, covering model variants, dataset preparation with Roboflow, optional FlashAttention acceleration, installation, model selection, training commands, post‑training tasks such as tracking, validation, inference, exporting to ONNX, and benchmarking, all with concrete code snippets and practical tips.


What is YOLOv12?

YOLOv12 (You Only Look Once v12) is a state‑of‑the‑art computer‑vision model that supports six tasks: classification, object detection, instance segmentation, multi‑object tracking, human pose estimation, and oriented bounding boxes (OBB). The model family provides five size variants—Nano (N), Small (S), Medium (M), Large (L) and eXtra‑large (X)—so users can balance accuracy against memory and latency requirements.

YOLOv12 task type diagram

Classify : Assign a single class label to an entire image.

Detect : Locate objects with axis‑aligned bounding boxes.

Segment : Produce pixel‑level masks for each object.

Track : Extend detection to video streams, assigning consistent IDs across frames.

Pose : Estimate human skeletal keypoints.

OBB : Predict rotated bounding boxes for arbitrarily oriented objects.

Creating a Dataset

Collect all images in a single directory and upload the directory to Roboflow. Choose the project type that matches the intended task:

Object Detection → "Object Detection"

Classification → "Classification"

Segmentation → "Instance Segmentation"

Use Roboflow’s annotation tools (bounding‑box selector, polygon drawer, or AI‑assisted helpers) to label the images. After annotation, run the health‑check, create a version, and export the dataset in YOLO format as a zip file. Unzip the archive before proceeding.
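After unzipping, the export contains train/valid/test image folders and a data.yaml file describing them. A hypothetical sketch of that file (paths, class count, and class names below are placeholders; a real export fills them in from the project):

```yaml
# Sketch of a Roboflow YOLO-format data.yaml (values are placeholders)
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 2                    # number of classes in the project
names: ['cat', 'dog']    # class names, in label-index order
```

The path passed to training later should point at this data.yaml (or its parent directory, depending on how the paths inside it are written).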

Roboflow interface

Optional: FlashAttention on NVIDIA GPUs

If an NVIDIA GPU with compute capability ≥ 7.0 is available, compiling FlashAttention can reduce inference latency and accelerate training. The source code and build instructions are provided in the YOLOv12 repository (see URL below). Users without compatible hardware may skip this step.

Repository: https://github.com/sunsmarterjie/yolov12

Installation and Model Selection

Install the YOLOv12 Python package from source and remove any conflicting ultralytics installation:

pip uninstall ultralytics
pip install git+https://github.com/sunsmarterjie/yolov12.git

Verify the installation by running the CLI command yolo. Then select a model size and, for non‑detection tasks, append the appropriate suffix:

Detection/Tracking → no suffix

Segmentation → -seg

Classification → -cls

The resulting model file follows the pattern yolov12<size><suffix>.yaml/.pt. Example filenames:

yolov12x-seg.yaml
yolov12m-cls.pt
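The naming pattern can be captured in a small helper. This function is purely illustrative (it is not part of the package) and assumes the size/suffix convention described above:

```python
# Hypothetical helper illustrating the yolov12<size><suffix> naming convention.
# Detection and tracking use no suffix; other tasks append one.
TASK_SUFFIX = {"detect": "", "track": "", "segment": "-seg", "classify": "-cls"}

def model_name(size: str, task: str = "detect", pretrained: bool = True) -> str:
    """Build a model filename, e.g. model_name('x', 'segment') -> 'yolov12x-seg.pt'."""
    if len(size) != 1 or size not in "nsmlx":
        raise ValueError(f"unknown size: {size!r}")
    ext = ".pt" if pretrained else ".yaml"   # .pt = pretrained weights, .yaml = architecture only
    return f"yolov12{size}{TASK_SUFFIX[task]}{ext}"

print(model_name("x", "segment"))         # yolov12x-seg.pt
print(model_name("m", "classify"))        # yolov12m-cls.pt
print(model_name("n", pretrained=False))  # yolov12n.yaml
```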

Training with Python

from ultralytics import YOLO

# Replace INSERT_MODEL_NAME with the desired pretrained checkpoint, e.g. "yolov12m.pt"
model = YOLO('INSERT_MODEL_NAME')

# PATH_TO_DATASET points to the directory that contains the YOLO‑format data.yaml file
results = model.train(data='PATH_TO_DATASET', epochs=50, imgsz=640)

Training progress, checkpoints and final weights are saved under the runs directory, inside a sub‑folder named after the task (e.g., detect/train) followed by a numeric run identifier.

Post‑Training Options

Tracking

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
results = model.track('YOUTUBE_VIDEO_URL', show=True)

Model Validation

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
metrics = model.val()
# mAP metrics
print('mAP50‑95:', metrics.box.map)
print('mAP50:', metrics.box.map50)

Inference on New Images

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
results = model(["im1.jpg", "im2.jpg"])
for r in results:
    boxes = r.boxes          # bounding boxes
    masks = r.masks          # segmentation masks (if available)
    keypoints = r.keypoints  # pose keypoints (if available)
    probs = r.probs          # classification probabilities
    obb = r.obb              # oriented bounding boxes
    r.show()                # display result
    r.save(filename="result.jpg")

Export to Other Formats (e.g., ONNX)

from ultralytics import YOLO
model = YOLO('PATH_TO_MODEL')
model.export(format="onnx")
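The exported ONNX model is typically consumed outside ultralytics, e.g. with onnxruntime, and expects a float32 NCHW tensor with pixel values scaled to [0, 1]. A minimal NumPy preprocessing sketch under that assumption (the image is taken to be RGB and already resized to the training imgsz):

```python
import numpy as np

def to_onnx_input(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image into a 1x3xHxW float32 tensor in [0, 1]."""
    x = image_hwc.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    x = x.transpose(2, 0, 1)                  # HWC -> CHW
    return x[np.newaxis, ...]                 # add batch dimension -> NCHW

img = np.zeros((640, 640, 3), dtype=np.uint8)
print(to_onnx_input(img).shape)  # (1, 3, 640, 640)
```

The resulting tensor would be fed to the session's input, e.g. session.run(None, {input_name: tensor}) with an onnxruntime.InferenceSession; the exact input name can be read from the session's metadata.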

Benchmarking

from ultralytics.utils.benchmarks import benchmark
benchmark(model='PATH_TO_MODEL', data='PATH_TO_DATASET', imgsz=640, half=False)

Reference Repository

All code and model definitions are hosted at https://github.com/sunsmarterjie/yolov12.
