Lecture #11 Detection and Segmentation | Notion

🟢 1. Semantic Segmentation

Semantic Segmentation aims to predict a category for each pixel in an image.
Example: An image with 2 dogs, 1 cat, and a wolf → output a pixel-wise mask with class labels.
Common in autonomous driving, medical imaging, and scene understanding.

🪟 2. Sliding Window in Segmentation

Idea: Slide a window across the image → classify the central pixel.
✅ Works in theory, but:
- ❌ Super computationally expensive.
- Many redundant computations.
- Doesn’t scale well to high-res images.

🔄 3. Fully Convolutional Networks (FCN)

FCNs solve this by removing fully connected layers.
They apply convolution and upsampling across the entire image.
✅ Output: Dense pixel-wise predictions.
Faster and more efficient than sliding windows.

⬆️ 4. Upsampling & Unpooling

✅ Upsampling

Increases the spatial resolution of feature maps.
Examples: Displaying low-res images on high-res screens.