🧱 1. First Layer in CNNs
- In most architectures (AlexNet, VGG, ResNet), the first layer is a convolutional layer.
- It acts as a bank of learned filters, extracting low-level features: edges, color blobs, gradients, etc.
- These early filters look similar across architectures because they capture simple, generic features (see the sketch after this list).
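A minimal sketch of inspecting those first-layer filters, assuming PyTorch/torchvision and using a pretrained AlexNet purely as an example (the model choice and tensor shapes are illustrative, not taken from the notes above):

```python
# Pull the first convolutional layer out of a pretrained AlexNet and look at
# its learned filters; visually they resemble oriented edges and color blobs.
import torchvision.models as models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
first_conv = model.features[0]            # Conv2d(3, 64, kernel_size=11, stride=4)
filters = first_conv.weight.detach()      # shape: (64, 3, 11, 11)

# Rescale each filter to [0, 1] so it can be viewed as a small RGB patch.
f_min, f_max = filters.min(), filters.max()
filters_img = (filters - f_min) / (f_max - f_min)
print(filters_img.shape)                  # torch.Size([64, 3, 11, 11])
```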
🌗 2. Grayscale Conversion
- Images are sometimes converted to grayscale, collapsing three color channels into one to simplify the input.
- Pixel values range from 0 to 255 and are typically normalized to [0, 1] or standardized to zero mean (see the sketch below).
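A minimal sketch of grayscale conversion plus both scaling schemes, assuming Pillow and NumPy; the file name `example.jpg` is a placeholder:

```python
# Convert an image to single-channel grayscale, then scale its pixel values.
import numpy as np
from PIL import Image

img = Image.open("example.jpg").convert("L")   # "L" = single-channel grayscale
pixels = np.asarray(img, dtype=np.float32)     # values in [0, 255]

normalized = pixels / 255.0                                       # scaled to [0, 1]
standardized = (pixels - pixels.mean()) / (pixels.std() + 1e-8)   # zero mean, unit variance
```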
🧩 3. Nearest Neighbor in Final Layers
- The activations of the final fully connected layers (e.g., the 4096-d fc7 layer in VGG) can serve as feature embeddings for similarity matching.
- You can run a nearest neighbor search in this feature space to find semantically similar images (see the sketch below).
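A minimal sketch of nearest-neighbor retrieval over such features, assuming scikit-learn; the arrays here are random stand-ins for real 4096-d activations extracted beforehand:

```python
# Find the database images whose CNN features are closest to the query's.
import numpy as np
from sklearn.neighbors import NearestNeighbors

database_feats = np.random.rand(1000, 4096)   # one row per database image
query_feat = np.random.rand(1, 4096)          # feature vector of the query image

nn = NearestNeighbors(n_neighbors=5, metric="euclidean")
nn.fit(database_feats)
distances, indices = nn.kneighbors(query_feat)
print(indices)   # indices of the 5 most similar images in feature space
```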
📉 4. t-SNE (t-distributed Stochastic Neighbor Embedding)
- A non-linear dimensionality reduction technique.
- Projects high-dimensional data into 2D or 3D for visualization.
- Preserves local structure (points close together in high-dim remain close in 2D).
- ✅ Great for visualizing clusters (e.g., image classes).
- 🔁 Difference from PCA:
- PCA: linear, preserves global structure, may distort local distances.
  - t-SNE: non-linear, preserves local structure, better for visual insight (a side-by-side sketch follows below).
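A minimal sketch comparing the two projections, assuming scikit-learn; the digits dataset stands in for CNN features here:

```python
# Project the same high-dimensional data to 2D with PCA (linear) and t-SNE (non-linear).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 64-d inputs, 10 classes

X_pca = PCA(n_components=2).fit_transform(X)   # linear, preserves global variance
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X) # non-linear, preserves local neighborhoods

print(X_pca.shape, X_tsne.shape)               # (1797, 2) (1797, 2)
```

Plotting both embeddings colored by class label typically shows tighter, better-separated clusters in the t-SNE view, which is why it is preferred for visual inspection.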