YOLO Object Detection in Python — Core Concepts

YOLO reframed object detection from a multi-stage search into a single regression problem. Instead of proposing candidate regions first and classifying them later (like Faster R-CNN), YOLO divides the image into a grid and predicts bounding boxes and class probabilities for every cell in one forward pass.

How the grid works

The input image is split into an S×S grid (e.g., 13×13). Each grid cell predicts a fixed number of bounding boxes. For each box, the model outputs five values: x-center, y-center, width, height, and an objectness score. Class probabilities are predicted once per cell in the original YOLO and once per box from YOLOv2 onward.

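A cell is “responsible” for detecting an object if the object’s center falls inside that cell. In the original YOLO, each cell predicts only one set of class probabilities, so a single pass detects at most S×S objects, and two objects whose centers share a cell compete for the same prediction.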
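The cell assignment can be sketched in a few lines. This is a minimal illustration, assuming pixel coordinates and a square 13×13 grid; the function name `responsible_cell` is hypothetical and not part of any YOLO library.

```python
def responsible_cell(x_center, y_center, img_w, img_h, S=13):
    """Return (row, col) of the grid cell that contains the box center.

    Coordinates are in pixels. A center lying exactly on the right or
    bottom image edge is clamped into the last cell.
    """
    col = min(int(x_center / img_w * S), S - 1)
    row = min(int(y_center / img_h * S), S - 1)
    return row, col


# A box centered at (208, 104) in a 416x416 image lands in row 3, column 6.
print(responsible_cell(208, 104, 416, 416))  # (3, 6)
```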

Anchor boxes

Raw grid cells struggle with objects of very different aspect ratios. YOLOv2 introduced anchor boxes: preset width-height templates derived from clustering the training data. Instead of predicting absolute sizes, the model predicts offsets relative to these anchors, which makes learning easier.
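As a rough sketch of what "offsets relative to anchors" means, the snippet below decodes one raw prediction using the YOLOv2/YOLOv3-style parameterization: a sigmoid keeps the center inside its cell, and an exponential scales the anchor size. The function name `decode_box` and the choice to express anchors in grid-cell units are assumptions made for illustration; later versions use different formulas.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, cell_col, cell_row, anchor_w, anchor_h, S=13):
    """Turn raw outputs (tx, ty, tw, th) into a normalized box.

    anchor_w and anchor_h are given in grid-cell units (an assumption of
    this sketch). Returns (x_center, y_center, width, height) in 0-1.
    """
    tx, ty, tw, th = t
    bx = (sigmoid(tx) + cell_col) / S   # center stays inside its cell
    by = (sigmoid(ty) + cell_row) / S
    bw = anchor_w * math.exp(tw) / S    # anchor scaled by a positive factor
    bh = anchor_h * math.exp(th) / S
    return bx, by, bw, bh


# All-zero outputs place the box at the cell center with the anchor's size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell_col=6, cell_row=3,
                 anchor_w=2.0, anchor_h=4.0))
```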

YOLOv5 uses k-means clustering on the training set’s bounding boxes to auto-generate anchors tuned to the dataset. Anchor-free versions such as YOLOv8 drop this step and predict box geometry directly.
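The clustering idea can be sketched in plain Python. This follows the YOLOv2-style recipe of clustering (width, height) pairs with IoU as the similarity measure; it is not the YOLOv5 implementation, and the deterministic initialization and the name `kmeans_anchors` are assumptions chosen to keep the example reproducible.

```python
def wh_iou(a, b):
    """IoU of two (w, h) boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=20):
    """Cluster (w, h) pairs into k anchors; requires len(boxes) >= k."""
    # Deterministic init for this sketch: evenly spaced picks by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the anchor it overlaps most.
            best = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            clusters[best].append(b)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's anchor
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


boxes = [(10, 20), (12, 22), (11, 21), (100, 50), (110, 55), (105, 52)]
print(kmeans_anchors(boxes, k=2))
```

With this toy data the two anchors settle near the small tall boxes and the large wide boxes respectively.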

Non-maximum suppression (NMS)

Multiple grid cells may predict overlapping boxes for the same object. NMS removes duplicates by:

  1. Sorting predictions by confidence.
  2. Keeping the highest-confidence box.
  3. Removing any remaining box whose IoU with it exceeds a threshold (typically 0.45).
  4. Repeating with the next-highest remaining box until every box has been either kept or removed.
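The four steps above can be sketched as a greedy loop. This is a minimal single-class version, assuming boxes in (x1, y1, x2, y2) corner format; production code typically runs it per class and uses a vectorized implementation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Return indices of the boxes to keep, highest confidence first."""
    # Step 1: sort by confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 2: keep the best remaining box.
        best = order.pop(0)
        keep.append(best)
        # Step 3: drop everything that overlaps it too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
        # Step 4: loop until nothing is left to examine.
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box does not overlap anything and survives.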

YOLO versions at a glance

Version   Key change                                          Typical use
YOLOv3    Multi-scale detection across 3 feature maps         Baseline for many projects
YOLOv5    PyTorch-native, Ultralytics ecosystem               Most popular starting point
YOLOv8    Anchor-free head, improved small-object accuracy    Current recommended default
YOLO11    Ultralytics latest, optimized export pipeline       Edge and mobile

Using YOLO in Python

Ultralytics provides a high-level API:

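from ultralytics import YOLO

# Load pretrained weights; "n" is the nano model, the smallest and fastest variant
model = YOLO("yolov8n.pt")

# Run inference; the call returns one Results object per input image
results = model("street.jpg")

# Each box carries corner coordinates, a confidence score, and a class index
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)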

Training on custom data requires a YAML config pointing to images and labels in YOLO format: one text file per image, with one line per object containing the class index followed by x-center, y-center, width, and height, the four box values normalized to 0–1.
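To make the label format concrete, here is a small sketch that converts a pixel-space corner box into one YOLO label line. The helper name `to_yolo_label` and the six-decimal precision are illustrative choices, not requirements of the format.

```python
def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to a YOLO label line.

    Output: "class x_center y_center width height", with the four box
    values normalized by the image size so they fall in 0-1.
    """
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A 200x200 px box in a 640x480 image, class index 0.
print(to_yolo_label(0, 100, 200, 300, 400, 640, 480))
# 0 0.312500 0.625000 0.312500 0.416667
```

Each object in an image contributes one such line to that image's label file.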

Common misconception

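People often assume YOLO sacrifices accuracy for speed. Early versions did lag behind two-stage detectors on benchmarks, but the larger YOLOv8 models report COCO accuracy on par with or above common Faster R-CNN baselines while running several times faster. The exact gap depends on model size, input resolution, and hardware.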

When to use YOLO

  • Real-time video: Surveillance, sports analytics, drone footage.
  • Edge devices: Raspberry Pi, Jetson Nano, and mobile phones with exported TFLite or CoreML models.
  • Industrial inspection: Detecting defects on production lines at 30+ FPS.
  • Retail analytics: Counting products on shelves or people in queues.

Evaluation metrics

mAP (mean Average Precision) is the standard metric. mAP@0.5 counts a detection as correct if its IoU with a ground-truth box of the same class is at least 0.5. mAP@0.5:0.95 averages across IoU thresholds from 0.5 to 0.95 in steps of 0.05, a much stricter measure.
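The sketch below is not a full mAP computation (that also needs confidence ranking and precision-recall curves). It only illustrates why the 0.5:0.95 range is stricter: the same set of matched detections yields fewer true positives as the IoU threshold rises. The function names and the sample IoU values are made up for illustration.

```python
# The ten IoU thresholds used by mAP@0.5:0.95: 0.5, 0.55, ..., 0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def true_positive_flags(ious, threshold):
    """Mark each matched detection as a true positive at this threshold."""
    return [v >= threshold for v in ious]


def hit_rate_per_threshold(ious):
    """Fraction of detections counted as correct at each threshold."""
    return {t: sum(true_positive_flags(ious, t)) / len(ious)
            for t in THRESHOLDS}


# Hypothetical best-match IoUs for four detections of one class.
ious = [0.52, 0.68, 0.81, 0.93]
rates = hit_rate_per_threshold(ious)
print(rates[0.5], rates[0.75], rates[0.95])  # 1.0 0.5 0.0
```

All four detections count at IoU 0.5, only two at 0.75, and none at 0.95, which is why a model's mAP@0.5:0.95 is always at or below its mAP@0.5.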

FPS matters equally for real-time applications. Always report mAP and FPS together; a model with 2% higher mAP but 10× lower FPS may be useless for your use case.
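A simple way to get an FPS figure is to time repeated inference calls after a short warm-up. The sketch below assumes `infer` is any callable that processes one frame; here a dummy function stands in for a real model, and the name `measure_fps` is hypothetical.

```python
import time


def measure_fps(infer, frames, warmup=2):
    """Return frames per second for calling infer() on each frame.

    A few warm-up calls run first so one-time setup cost (model loading,
    caches, lazy initialization) is not counted in the measurement.
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def dummy_infer(frame):
    # Stand-in workload; replace with a real model call when benchmarking.
    return sum(range(1000))


fps = measure_fps(dummy_infer, list(range(50)))
print(f"{fps:.1f} FPS")
```

For a fair comparison, measure on the target hardware with the same input resolution and batch size used when reporting mAP.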

The one thing to remember: YOLO’s power comes from treating detection as a single-pass grid prediction, making it fast enough for live video while maintaining accuracy competitive with slower multi-stage methods.
