YOLO Object Detection in Python — Core Concepts
YOLO reframed object detection from a multi-stage search into a single regression problem. Instead of proposing candidate regions first and classifying them later (like Faster R-CNN), YOLO divides the image into a grid and predicts bounding boxes and class probabilities for every cell in one forward pass.
How the grid works
The input image is split into an S×S grid (e.g., 13×13). Each grid cell predicts a fixed number of bounding boxes. For each box, the model outputs five values — x-center, y-center, width, height, and an objectness score — plus a probability for every class.
A cell is “responsible” for detecting an object if the object’s center falls inside that cell. This means YOLO can detect up to S×S objects with non-overlapping centers in a single pass.
Anchor boxes
Raw grid cells struggle with objects of very different aspect ratios. YOLO v2 introduced anchor boxes — preset width-height templates derived from clustering the training data. Instead of predicting absolute coordinates, the model predicts offsets relative to these anchors, making learning easier.
YOLOv5 and later versions use k-means clustering on the training set’s bounding boxes to auto-generate anchors tuned to the dataset.
Non-maximum suppression (NMS)
Multiple grid cells may predict overlapping boxes for the same object. NMS removes duplicates by:
- Sorting predictions by confidence.
- Keeping the highest-confidence box.
- Removing any remaining box that overlaps it beyond an IoU threshold (typically 0.45).
- Repeating until no overlapping boxes remain.
YOLO versions at a glance
| Version | Key change | Typical use |
|---|---|---|
| YOLOv3 | Multi-scale detection across 3 feature maps | Baseline for many projects |
| YOLOv5 | PyTorch-native, Ultralytics ecosystem | Most popular starting point |
| YOLOv8 | Anchor-free head, improved small-object accuracy | Current recommended default |
| YOLO11 | Ultralytics latest, optimized export pipeline | Edge and mobile |
Using YOLO in Python
Ultralytics provides a high-level API:
from ultralytics import YOLO
model = YOLO("yolov8n.pt") # nano model, fast
results = model("street.jpg")
for box in results[0].boxes:
print(box.xyxy, box.conf, box.cls)
Training on custom data requires a YAML config pointing to images and labels in YOLO format (one text file per image with class, x-center, y-center, width, height — all normalized 0–1).
Common misconception
People often assume YOLO sacrifices accuracy for speed. Early versions did lag behind two-stage detectors on benchmarks, but YOLOv8 matches or exceeds Faster R-CNN accuracy on COCO while running 5–10× faster.
When to use YOLO
- Real-time video: Surveillance, sports analytics, drone footage.
- Edge devices: Raspberry Pi, Jetson Nano, and mobile phones with exported TFLite or CoreML models.
- Industrial inspection: Detecting defects on production lines at 30+ FPS.
- Retail analytics: Counting products on shelves or people in queues.
Evaluation metrics
mAP (mean Average Precision) is the standard. mAP@0.5 counts a detection as correct if IoU with ground truth exceeds 0.5. mAP@0.5:0.95 averages across IoU thresholds from 0.5 to 0.95 in steps of 0.05 — a much stricter measure.
FPS matters equally for real-time applications. Always report mAP and FPS together; a model with 2% higher mAP but 10× lower FPS may be useless for your use case.
The one thing to remember: YOLO’s power comes from treating detection as a single-pass grid prediction, making it fast enough for live video while maintaining accuracy competitive with slower multi-stage methods.
See Also
- Python Adaptive Learning Systems How Python builds learning apps that adjust to each student like a personal tutor who knows exactly what you need next.
- Python Airflow Learn Airflow as a timetable manager that makes sure data tasks run in the right order every day.
- Python Altair Learn Altair through the idea of drawing charts by describing rules, not by hand-placing every visual element.
- Python Automated Grading How Python grades homework and exams automatically, from simple answer keys to understanding written essays.
- Python Batch Vs Stream Processing Batch processing is like doing laundry once a week; stream processing is like a self-cleaning shirt that cleans itself constantly.