Object detection is the computer vision task of identifying and localising all instances of specified object classes in an image or video frame. The output is typically a set of bounding boxes — each with a class label and a confidence score. Detection combines **classification** (what is this?) with **localisation** (where is it?).
## The detection taxonomy
**Two-stage detectors** first generate region proposals (candidate bounding boxes likely to contain objects), then classify each proposal. The two stages are architecturally separate, and these detectors are generally more accurate, at the cost of speed.
- R-CNN (2014) → Fast R-CNN (2015) → Faster R-CNN (2015, with RPN): the foundational two-stage family.
**Single-stage detectors** predict class labels and bounding boxes in a single forward pass. They are faster, but historically less accurate, particularly on small objects.
- YOLO (You Only Look Once, 2016) and successive generations (YOLOv5, YOLOv8, YOLOv10): the dominant real-time detection family.
- SSD (Single Shot Detector): multi-scale detection from multiple feature map layers.
- RetinaNet: introduced focal loss to address class imbalance in dense detection.
**Transformer-based detectors** apply attention across image patches:
- DETR (2020): end-to-end detection without NMS (non-maximum suppression); reframes detection as a set prediction problem.
- Grounding DINO: open-vocabulary detection from natural language prompts — "find all yellow hard hats in the image."
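Dense detectors such as YOLO and SSD emit many overlapping boxes per object and rely on NMS to keep only the best-scoring one per object, which is the post-processing step DETR's set-prediction formulation removes. A minimal greedy NMS sketch in plain Python (illustrative, not production code; real pipelines use vectorised implementations):

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and
    discard any remaining box that overlaps it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

The IoU threshold trades duplicate suppression against recall: too low and adjacent distinct objects get merged, too high and duplicates survive.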
## Evaluation metrics
**mAP** (mean Average Precision) is the standard detection metric. For each class, Average Precision is the area under the precision-recall curve of the ranked detections; mAP averages this across all classes, typically reported either at a single IoU threshold of 0.5 (mAP@50) or averaged over thresholds from 0.50 to 0.95 (the COCO convention). Higher mAP means fewer missed objects and fewer false alarms. For latency-sensitive applications, frames per second (FPS) at a given accuracy level is the relevant metric.
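The per-class AP computation can be sketched as follows. Detections are ranked by confidence; each is a true positive if it matches a ground-truth box above the IoU threshold, and precision-recall points accumulate as the ranking is traversed. This sketch uses the raw step-function area (COCO additionally interpolates the curve at 101 recall points, which this toy version omits):

```python
def average_precision(scores, matches, num_gt):
    """AP for one class: area under the precision-recall curve.

    scores  - confidence score of each detection
    matches - True where the detection matched a ground-truth box
              (IoU above the evaluation threshold)
    num_gt  - total number of ground-truth objects for this class
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:
        if matches[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)  # rectangle under the PR curve
        prev_recall = recall
    return ap
```

mAP is then the mean of this value over classes (and, for COCO-style reporting, over IoU thresholds).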
## Applications in APAC enterprise contexts
Object detection is a production technology across AIMenta's target sectors:
- **Manufacturing**: real-time defect detection on production lines; presence/absence checks for assembly components; PPE compliance monitoring (hard hat, vest, glove detection).
- **Logistics**: package identification at high-speed conveyor belts; vehicle damage detection at fleet inspection; inventory counting in warehouse aisles via camera.
- **Retail**: shelf stock monitoring (out-of-stock detection); customer flow analysis; loss prevention (abandoned item detection at self-checkout).
- **Construction and facilities**: safety compliance monitoring; progress documentation via periodic aerial imagery.
## Deployment considerations
- **Real-time requirements**: YOLOv8n (nano) achieves roughly 37 mAP on COCO while running at 200+ FPS on a consumer GPU. For 30 FPS inline production monitoring, use a quantised YOLOv8 variant on an embedded accelerator (NVIDIA Jetson, Hailo-8).
- **Custom classes**: pre-trained models cover the 80 COCO classes (person, car, bottle, etc.). For industrial defect types or specific product SKUs, fine-tune on labelled examples. 200-500 labelled images per class is often sufficient with transfer learning.
- **Video vs image**: tracking algorithms (SORT, ByteTrack, DeepSORT) extend frame-level detection to persistent object IDs across video — required for counting, dwell-time analysis, and trajectory monitoring.
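The core idea behind trackers like SORT and ByteTrack is a frame-to-frame assignment step: match this frame's detections to existing tracks by box overlap, spawn new IDs for unmatched detections. A toy version of that matching step (no Kalman-filter motion model, no re-identification embeddings; the class name and thresholds are illustrative, and real trackers also keep lost tracks alive for several frames rather than dropping them immediately):

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

class SimpleTracker:
    """Greedy IoU tracker: persistent object IDs across video frames."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track_id -> last seen box
        self.next_id = 0

    def update(self, detections):
        """Match this frame's boxes to tracks; return {track_id: box}."""
        assigned = {}
        unmatched = set(self.tracks)
        for box in detections:
            best_id, best_iou = None, self.iou_threshold
            for tid in unmatched:
                overlap = iou(self.tracks[tid], box)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:            # no track overlaps: new object
                best_id = self.next_id
                self.next_id += 1
            else:
                unmatched.discard(best_id)
            self.tracks[best_id] = box
            assigned[best_id] = box
        for tid in unmatched:              # drop tracks with no detection
            del self.tracks[tid]
        return assigned
```

With stable IDs in hand, counting is the number of IDs ever issued, and dwell time is how many consecutive frames an ID survives.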