
Image Segmentation

Assigning a class label to every pixel in an image — separating regions like road, building, or pedestrian.

Image segmentation is the computer vision task of partitioning an image into meaningful regions by assigning a label to every pixel, or by grouping pixels that belong to the same semantic category or object instance. It goes a step beyond object detection: a detector localises each object with a bounding box, whereas segmentation specifies exactly which pixels belong to it, capturing its precise shape and extent.
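The box-versus-pixel distinction can be made concrete with a toy example. The sketch below assumes a mask stored as a 2D list of integer class IDs (0 for background, 1 for a hypothetical "car" class). It derives the bounding box a detector would report and then measures how much of that box the object actually fills. The function names are illustrative, not taken from any library.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # mask[row][col] holds an integer class ID

def bbox_from_mask(mask: Mask, class_id: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (row_min, col_min, row_max, col_max) around all pixels of class_id, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == class_id]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

def box_fill_ratio(mask: Mask, class_id: int) -> float:
    """Fraction of the bounding box that the object's pixels actually cover."""
    box = bbox_from_mask(mask, class_id)
    if box is None:
        return 0.0
    r0, c0, r1, c1 = box
    box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
    object_area = sum(v == class_id for row in mask for v in row)
    return object_area / box_area

# Toy 5x6 mask: 0 = background, 1 = a hypothetical "car" class.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(bbox_from_mask(mask, 1))            # (1, 1, 3, 4)
print(round(box_fill_ratio(mask, 1), 3))  # 0.583
```

Here the box spans 12 pixels but only 7 belong to the object. A detector reports only the box, whereas the mask records the exact shape.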

## The main segmentation tasks

**Semantic segmentation** assigns every pixel in an image to a class label (road, sky, pedestrian, building) without distinguishing between individual instances. All pixels belonging to "car" receive the same label regardless of how many cars are present.

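**Instance segmentation** detects individual objects of countable classes and produces a separate mask for each one. In a crowd scene, each person gets a distinct mask rather than all pixels sharing a single "person" label. Pixels that belong to no detected object are typically left unlabelled.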

**Panoptic segmentation** combines both. Every pixel receives a class label (like semantic segmentation), and pixels of countable objects ("things" such as people or cars) also receive an instance ID (like instance segmentation). Amorphous regions ("stuff" such as sky or road) receive a class label only. It is the most complete of the three formulations and is widely used as a target for holistic scene understanding.
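The three tasks differ mainly in what each pixel stores. One common convention, assumed here, packs a panoptic label into a single integer as `class_id * LABEL_DIVISOR + instance_id`, with instance ID 0 reserved for "stuff". The sketch below shows how a semantic map falls out of a panoptic one by dropping the instance part, and how instances of each "thing" class can be counted. The class IDs and the divisor of 1000 are illustrative assumptions, not a specific dataset's format.

```python
from typing import Dict, List, Set

# Assumed encoding: panoptic_id = class_id * LABEL_DIVISOR + instance_id.
# "Stuff" pixels (sky, road) use instance_id 0; "thing" pixels use 1, 2, ...
LABEL_DIVISOR = 1000

def encode(class_id: int, instance_id: int) -> int:
    return class_id * LABEL_DIVISOR + instance_id

def to_semantic(panoptic: List[List[int]]) -> List[List[int]]:
    """Drop instance identity so every pixel keeps only its class ID."""
    return [[pid // LABEL_DIVISOR for pid in row] for row in panoptic]

def instances_per_class(panoptic: List[List[int]]) -> Dict[int, int]:
    """Count distinct instances per class, ignoring stuff pixels (instance_id == 0)."""
    seen: Dict[int, Set[int]] = {}
    for row in panoptic:
        for pid in row:
            cls, inst = divmod(pid, LABEL_DIVISOR)
            if inst > 0:
                seen.setdefault(cls, set()).add(inst)
    return {cls: len(ids) for cls, ids in seen.items()}

SKY, PERSON = 1, 2  # hypothetical class IDs
s, p1, p2 = encode(SKY, 0), encode(PERSON, 1), encode(PERSON, 2)
panoptic = [
    [s,  s, s,  s],
    [p1, s, s,  p2],
    [p1, s, p2, p2],
]
print(to_semantic(panoptic))          # [[1, 1, 1, 1], [2, 1, 1, 2], [2, 1, 2, 2]]
print(instances_per_class(panoptic))  # {2: 2}
```

The semantic view sees only "sky" and "person", while the panoptic view still knows there are two separate people.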

## Key models

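- **FCN** (Fully Convolutional Network, 2015): showed that replacing a classification CNN's fully connected layers with convolutional ones yields pixel-wise predictions for inputs of arbitrary size, making end-to-end dense prediction practical.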
- **U-Net** (2015): encoder-decoder with skip connections. The dominant architecture in medical image segmentation due to strong performance on small datasets.
- **Mask R-CNN** (Facebook AI, 2017): extends Faster R-CNN with a segmentation head. Long-time standard for instance segmentation.
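- **Segment Anything Model (SAM)** (Meta, 2023): a foundation model for promptable segmentation. Given point, bounding-box, or mask prompts, it segments arbitrary objects without task-specific training. Its masks are class-agnostic, so class labels must come from elsewhere. Text prompting was explored in the paper but is not part of the public release. SAM was rapidly adopted for interactive annotation workflows.
- **Transformer-based models** (SegFormer, Mask2Former): SegFormer pairs a hierarchical transformer encoder with a lightweight MLP decoder. Mask2Former uses a masked-attention transformer decoder and handles semantic, instance, and panoptic segmentation with a single architecture. Transformer-based models have led accuracy on standard benchmarks (Cityscapes, ADE20K, COCO) in recent years.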
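Leaderboard "accuracy" on Cityscapes and ADE20K semantic segmentation is usually reported as mean intersection-over-union (mIoU). COCO's instance and panoptic tracks use mask AP and panoptic quality (PQ) instead. The sketch below computes per-class IoU and mIoU for two toy label maps. It is a simplification: real benchmark scripts accumulate counts over the whole dataset and skip an ignore/void label, both of which are omitted here.

```python
import math
from typing import List

LabelMap = List[List[int]]

def per_class_iou(pred: LabelMap, gt: LabelMap, num_classes: int) -> List[float]:
    """IoU_c = |pred==c AND gt==c| / |pred==c OR gt==c|; NaN if class c appears in neither."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p == g:
                inter[p] += 1
                union[p] += 1  # pixel is in both sets, count it once
            else:
                union[p] += 1  # predicted as p, wrongly
                union[g] += 1  # true class g, missed
    return [i / u if u else float("nan") for i, u in zip(inter, union)]

def mean_iou(pred: LabelMap, gt: LabelMap, num_classes: int) -> float:
    """Average IoU over classes present in prediction or ground truth."""
    ious = [x for x in per_class_iou(pred, gt, num_classes) if not math.isnan(x)]
    return sum(ious) / len(ious)

gt   = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
pred = [[0, 0, 0, 1],   # one class-1 pixel predicted as class 0
        [0, 0, 1, 1]]
print(per_class_iou(pred, gt, 2))         # [0.8, 0.75]
print(round(mean_iou(pred, gt, 2), 3))    # 0.775
```

A single wrong pixel costs both classes: class 0 gains a false positive and class 1 loses a true positive. This is one reason mIoU is stricter than plain pixel accuracy (7/8 = 0.875 here).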

## Applications in APAC industries

Segmentation is the enabling technology for several high-value use cases across AIMenta's target markets:

- **Manufacturing quality control**: pixel-level defect detection on PCBs, semiconductor wafers, battery cells, and consumer goods. The AIMenta Jakarta engagement on vision-based quality grading is a direct application.
- **Healthcare**: tumour boundary delineation in CT and MRI scans; pathology slide analysis; retinal vessel segmentation.
- **Agriculture and agribusiness**: crop health monitoring from satellite or drone imagery; fruit ripeness grading; pest detection.
- **Retail and logistics**: shelf occupancy monitoring; package damage detection; automated sorting by product type.
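In use cases like PCB or battery-cell inspection, a segmentation mask usually has to be turned into a decision, for example counting how many distinct defects a part has. The sketch below assumes a binary defect mask (1 = defect pixel), groups pixels into 4-connected blobs with a breadth-first flood fill, and applies a hypothetical grading rule. The `min_area` and `max_defects` thresholds are illustrative placeholders, not values from any real engagement.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]

def connected_components(mask: Mask) -> List[int]:
    """Return the pixel area of each 4-connected blob of non-zero pixels, in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas: List[int] = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                area = 0
                while queue:  # BFS flood fill over one blob
                    y, x = queue.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas

def grade(mask: Mask, min_area: int = 2, max_defects: int = 1) -> Tuple[str, List[int]]:
    """Hypothetical rule: ignore blobs smaller than min_area as noise; reject if too many remain."""
    defects = [a for a in connected_components(mask) if a >= min_area]
    return ("REJECT" if len(defects) > max_defects else "PASS"), defects

defect_mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],   # isolated pixel at the right edge: likely noise
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
print(connected_components(defect_mask))  # [3, 1, 2]
print(grade(defect_mask))                 # ('REJECT', [3, 2])
```

In production code the labelling step would normally come from an optimised routine such as `scipy.ndimage.label` or OpenCV's `cv2.connectedComponents`; the explicit loop here only shows what those functions compute.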

## Practical considerations for enterprise deployment

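- **Data labelling**: pixel-level annotation is expensive; a single image can take 30-60 minutes to label. Active learning (sending annotators the images the current model is least certain about) and semi-supervised techniques (learning from unlabelled images alongside the labelled set) reduce how many images need full manual labels.
- **Edge vs cloud**: real-time segmentation (inline quality control at line speed) usually requires edge inference, since a cloud round trip adds network latency and depends on connectivity. Sustaining 30+ fps on edge hardware typically calls for INT8-quantised models and lightweight segmentation architectures, for example a MobileNetV3 backbone with an LR-ASPP head, or BiSeNet.
- **SAM for bootstrapping**: use SAM or its fine-tunable successors to generate pre-annotations automatically, then have human reviewers correct them rather than label from scratch. In practice this can reduce annotation cost by 50-80%.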
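The active-learning suggestion in the data-labelling item can be illustrated with uncertainty sampling, one of several common selection strategies. The sketch below assumes the current model outputs per-pixel class probabilities (for example a softmax map). It scores each unlabelled image by its mean per-pixel entropy and sends the most uncertain images to annotators first. The image IDs and probability values are made up for illustration.

```python
import math
from typing import Dict, List, Sequence

ProbMap = List[List[Sequence[float]]]  # probs[row][col] -> class probabilities summing to 1

def mean_pixel_entropy(probs: ProbMap) -> float:
    """Average Shannon entropy (in nats) of the per-pixel class distributions."""
    total, n = 0.0, 0
    for row in probs:
        for dist in row:
            total -= sum(p * math.log(p) for p in dist if p > 0)
            n += 1
    return total / n

def select_for_labelling(pool: Dict[str, ProbMap], k: int) -> List[str]:
    """Return the k image IDs the model is least certain about (highest mean entropy)."""
    return sorted(pool, key=lambda img: mean_pixel_entropy(pool[img]), reverse=True)[:k]

# Hypothetical 1x2-pixel predictions for three unlabelled images (2 classes).
pool = {
    "img_001": [[(0.98, 0.02), (0.99, 0.01)]],  # model is confident
    "img_002": [[(0.50, 0.50), (0.60, 0.40)]],  # model is unsure everywhere
    "img_003": [[(0.90, 0.10), (0.55, 0.45)]],  # mixed
}
print(select_for_labelling(pool, 2))  # ['img_002', 'img_003']
```

Real pipelines often add diversity constraints so the selected batch is not dominated by near-duplicate frames; pure uncertainty ranking is kept here for brevity.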
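The INT8 quantisation mentioned under edge deployment maps float values to 8-bit integers through a scale factor. The sketch below shows symmetric per-tensor quantisation on a handful of toy values, not a real model. Production toolchains (for example TensorRT, ONNX Runtime, or PyTorch's quantisation utilities) calibrate scales on representative data and often use per-channel scales; this only illustrates the underlying arithmetic and its error bound.

```python
from typing import List, Tuple

def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8: scale = max|x| / 127, q = clamp(round(x / scale), -127, 127)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate float values."""
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.003, 0.9]  # toy values, not from a real model
q, scale = quantize_int8(weights)
print(q)  # [42, -127, 0, 90]
errors = [abs(w - d) for w, d in zip(weights, dequantize(q, scale))]
print(max(errors) <= scale / 2 + 1e-12)  # True: rounding error is at most half a step
```

Small values such as 0.003 collapse to zero, which is why calibration and per-channel scales matter for accuracy. For context on the throughput target, 30 fps leaves about 33 ms per frame for the whole pipeline (capture, preprocessing, inference, and post-processing).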
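Whether SAM pre-annotation actually saves effort on a given dataset can be checked by logging how much reviewers change. The sketch below uses the share of pixels whose label differs between the pre-annotation and the reviewed mask as a crude proxy for correction effort (reviewer time or click counts are better measures if available). It also flags images needing heavy rework. The image IDs and the 0.2 threshold are hypothetical, and no SAM API is called.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]

def corrected_fraction(pre: Mask, reviewed: Mask) -> float:
    """Share of pixels whose label changed between the pre-annotation and the reviewed mask."""
    total = changed = 0
    for pre_row, rev_row in zip(pre, reviewed):
        for a, b in zip(pre_row, rev_row):
            total += 1
            changed += a != b
    return changed / total

def flag_heavy_rework(pairs: Dict[str, Tuple[Mask, Mask]], threshold: float = 0.2) -> List[str]:
    """IDs of images where reviewers changed more than `threshold` of the pixels."""
    return sorted(img for img, (pre, rev) in pairs.items() if corrected_fraction(pre, rev) > threshold)

pairs = {
    # (pre-annotation, reviewed mask)
    "img_001": ([[0, 1, 1, 0], [0, 1, 1, 0]], [[0, 1, 1, 1], [0, 1, 1, 0]]),  # 1 of 8 pixels fixed
    "img_002": ([[0, 0, 0, 0], [0, 0, 0, 0]], [[0, 1, 1, 0], [0, 1, 1, 0]]),  # object missed: 4 of 8
}
print(corrected_fraction(*pairs["img_001"]))  # 0.125
print(flag_heavy_rework(pairs))               # ['img_002']
```

Images that repeatedly land on the heavy-rework list point to object types or conditions where fine-tuning the pre-annotation model is likely to pay off.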
