intermediate · Computer Vision

# Object Detection

A computer vision task that locates and classifies multiple objects in an image, returning bounding boxes and class labels for each.

Object detection is the computer vision task of identifying and localising all instances of specified object classes in an image or video frame. The output is typically a set of bounding boxes — each with a class label and a confidence score. Detection combines **classification** (what is this?) with **localisation** (where is it?).
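
As a rough illustration of that output format, the sketch below models one detection as a box plus a label and a score, then drops low-confidence results. The `Detection` class, the `filter_by_confidence` helper, the pixel-corner box convention, and the sample values are all assumptions made for this example; real frameworks each define their own result types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    x_min: float  # left edge of the box, in pixels
    y_min: float  # top edge
    x_max: float  # right edge
    y_max: float  # bottom edge
    label: str    # classification: what is this?
    score: float  # confidence in [0, 1]


def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.score >= threshold]


# Made-up detections for a single image.
detections = [
    Detection(34, 50, 120, 210, "person", 0.91),
    Detection(200, 80, 260, 140, "hard_hat", 0.42),
    Detection(300, 60, 380, 220, "person", 0.77),
]

kept = filter_by_confidence(detections, threshold=0.5)
print([(d.label, d.score) for d in kept])
```

The threshold trades missed objects against false alarms, so it is usually tuned per application rather than fixed at 0.5.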

## The detection taxonomy

**Two-stage detectors** first generate region proposals (candidate bounding boxes likely to contain objects), then classify each proposal. The two stages are architecturally separate, and this family has generally been more accurate, at the cost of speed.
- R-CNN (2014) → Fast R-CNN (2015) → Faster R-CNN (2015, with RPN): the foundational two-stage family.

**Single-stage detectors** predict class labels and bounding boxes in a single forward pass. They are faster than two-stage detectors but have historically been less accurate on small objects.
- YOLO (You Only Look Once, 2016) and successive generations (YOLOv5, YOLOv8, YOLOv10): the dominant real-time detection family.
- SSD (Single Shot MultiBox Detector): predicts boxes from several feature-map layers so that objects at different scales are covered.
- RetinaNet: introduced focal loss to address class imbalance in dense detection.
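
To make the focal-loss idea concrete, here is a minimal sketch of the binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), in plain Python. The defaults `gamma=2.0` and `alpha=0.25` follow the values commonly quoted for RetinaNet; the function names and sample probabilities are assumptions for illustration, and a real detector would apply this per anchor and per class on tensors.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one prediction.

    p: predicted probability that the sample is an object (class 1)
    y: ground-truth label, 1 for object and 0 for background
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy, for comparison."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)


easy_background = focal_loss(0.01, 0)  # background correctly given a low score
hard_object = focal_loss(0.10, 1)      # object wrongly given a low score

print(f"easy background: {easy_background:.2e}")
print(f"hard object:     {hard_object:.2e}")
```

Because a dense detector evaluates far more background locations than object locations, down-weighting the easy background samples keeps them from dominating the total loss.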

**Transformer-based detectors** apply attention over image features:
- DETR (2020): end-to-end detection without NMS (non-maximum suppression); reframes detection as a set prediction problem.
- Grounding DINO: open-vocabulary detection from natural language prompts — "find all yellow hard hats in the image."
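
For context on what DETR removes, the post-processing step used by most other detectors can be sketched as greedy NMS: keep the highest-scoring box, discard boxes that overlap it too much, and repeat. This is a simplified, class-agnostic version in plain Python; the function names, the `(x_min, y_min, x_max, y_max)` box format, and the 0.5 threshold are assumptions for the example, and production code typically runs NMS per class with a vectorised library routine.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    # Visit boxes from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes on one object, plus one box on a separate object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]

kept_indices = nms(boxes, scores)
print(kept_indices)
```

In this example the second box overlaps the first heavily and is suppressed, while the distant third box survives.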

## Evaluation metrics

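**mAP** (mean Average Precision) is the standard detection metric. For each class, average precision (AP) is the area under the precision-recall curve, where a detection counts as correct only if its IoU (intersection over union) with a ground-truth box meets a threshold; mAP is the mean of AP across classes. Two conventions are common: mAP@0.5 uses a single IoU threshold of 0.5, while COCO-style mAP@[0.5:0.95] also averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05. Higher mAP means fewer missed objects and fewer false alarms. For latency-sensitive applications, report frames per second (FPS) alongside accuracy, since speed is only meaningful at a given accuracy level.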
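
The calculation can be sketched for a single class and a single image as follows. This is a simplified illustration, not a replacement for an official evaluation toolkit: it matches each detection to its best unmatched ground-truth box and integrates the precision-recall curve after making precision monotonic. The function names and the sample boxes are assumptions for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class. detections: list of (box, score); ground_truth: list of boxes."""
    if not ground_truth:
        return 0.0
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    matched, hits = set(), []
    for box, _score in detections:
        # Find the best still-unmatched ground-truth box for this detection.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if j not in matched and overlap > best_iou:
                best_iou, best_j = overlap, j
        is_true_positive = best_iou >= iou_threshold
        if is_true_positive:
            matched.add(best_j)
        hits.append(is_true_positive)

    # Precision and recall after each detection, in confidence order.
    precisions, recalls, true_positives = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += hit
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Make precision non-increasing, then sum the area under the curve.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    ((0, 0, 10, 10), 0.9),    # exact match
    ((50, 50, 60, 60), 0.8),  # false alarm
    ((20, 20, 30, 29), 0.7),  # slightly loose box, IoU 0.9
]

ap_50 = average_precision(detections, ground_truth, iou_threshold=0.5)
ap_95 = average_precision(detections, ground_truth, iou_threshold=0.95)
# COCO-style: average AP over IoU thresholds 0.50, 0.55, ..., 0.95.
thresholds = [t / 100 for t in range(50, 100, 5)]
ap_coco_style = sum(average_precision(detections, ground_truth, t) for t in thresholds) / len(thresholds)
print(ap_50, ap_95, ap_coco_style)
```

The loosely fitted third box counts as correct at IoU 0.5 but not at 0.95, which is why the averaged COCO-style figure is stricter about localisation quality than mAP@0.5. A full mAP would repeat this per class over the whole dataset and take the mean.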

## Applications in APAC enterprise contexts

Object detection is a production technology across AIMenta's target sectors:

- **Manufacturing**: real-time defect detection on production lines; presence/absence checks for assembly components; PPE compliance monitoring (hard hat, vest, glove detection).
- **Logistics**: package identification at high-speed conveyor belts; vehicle damage detection at fleet inspection; inventory counting in warehouse aisles via camera.
- **Retail**: shelf stock monitoring (out-of-stock detection); customer flow analysis; loss prevention (abandoned item detection at self-checkout).
- **Construction and facilities**: safety compliance monitoring; progress documentation via periodic aerial imagery.

## Deployment considerations

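- **Real-time requirements**: YOLOv8n (nano) can exceed 200 FPS on a consumer GPU, depending on hardware, input resolution, and export format, while scoring about 37 mAP on COCO (mAP@[0.5:0.95]). For 30 FPS inline production monitoring, use a quantised YOLOv8 variant on an embedded GPU or edge AI accelerator (for example NVIDIA Jetson or Hailo-8).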
- **Custom classes**: pre-trained models typically cover the 80 COCO classes (person, car, bottle, etc.). For industrial defect types or specific product SKUs, fine-tune on labelled examples. With transfer learning, 200-500 labelled images per class are often sufficient.
- **Video vs image**: tracking algorithms (SORT, ByteTrack, DeepSORT) extend frame-level detection to persistent object IDs across video — required for counting, dwell-time analysis, and trajectory monitoring.
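
To show how frame-level boxes become persistent IDs, here is a deliberately simplified tracker that links boxes between consecutive frames by greedy IoU matching. It is a stand-in for illustration only, not an implementation of SORT, ByteTrack, or DeepSORT: it has no motion model, no appearance features, and it drops a track as soon as it goes unmatched for one frame. The `IouTracker` class and the sample frames are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy IoU association between consecutive frames (illustrative only)."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track_id -> box seen in the previous frame
        self.next_id = 1

    def update(self, boxes):
        """Assign an ID to every box in the current frame; returns [(id, box), ...]."""
        # Score every (existing track, new box) pair, best overlap first.
        pairs = sorted(
            ((iou(track_box, box), track_id, i)
             for track_id, track_box in self.tracks.items()
             for i, box in enumerate(boxes)),
            reverse=True,
        )
        assigned, used_tracks = {}, set()
        for overlap, track_id, i in pairs:
            if overlap < self.iou_threshold:
                break  # remaining pairs overlap even less
            if track_id not in used_tracks and i not in assigned:
                assigned[i] = track_id
                used_tracks.add(track_id)

        results, new_tracks = [], {}
        for i, box in enumerate(boxes):
            track_id = assigned.get(i)
            if track_id is None:          # unmatched box starts a new track
                track_id = self.next_id
                self.next_id += 1
            new_tracks[track_id] = box
            results.append((track_id, box))
        self.tracks = new_tracks          # unmatched tracks are dropped
        return results


frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],      # two objects appear
    [(52, 50, 62, 60), (1, 0, 11, 10)],      # both move slightly, order swapped
    [(2, 0, 12, 10), (100, 100, 110, 110)],  # one leaves, a new one enters
]

tracker = IouTracker()
ids_per_frame = [[track_id for track_id, _ in tracker.update(boxes)] for boxes in frames]
unique_objects = len({track_id for ids in ids_per_frame for track_id in ids})
print(ids_per_frame, unique_objects)
```

Counting distinct IDs gives an object count, and counting the frames in which an ID appears gives a dwell time. The trackers named above add motion prediction and, in some cases, appearance matching so that IDs survive occlusions and missed detections.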
