🧱 1. First Layer in CNNs
- In most architectures (AlexNet, VGG, ResNet), the first layer is a convolutional layer.
- It acts as a bank of learned filters, extracting low-level features: edges, color blobs, gradients, etc.
- These early filters look similar across architectures because they capture simple, generic features (see the sketch after this list).
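A minimal sketch of inspecting those first-layer filters, assuming PyTorch/torchvision and using a pretrained AlexNet purely as an example (the model choice and tensor shapes are illustrative, not taken from the notes above):

```python
# Pull the first convolutional layer out of a pretrained AlexNet and look at
# its learned filters; visually they resemble oriented edges and color blobs.
import torchvision.models as models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
first_conv = model.features[0]            # Conv2d(3, 64, kernel_size=11, stride=4)
filters = first_conv.weight.detach()      # shape: (64, 3, 11, 11)

# Rescale each filter to [0, 1] so it can be viewed as a small RGB patch.
f_min, f_max = filters.min(), filters.max()
filters_img = (filters - f_min) / (f_max - f_min)
print(filters_img.shape)                  # torch.Size([64, 3, 11, 11])
```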
🌗 2. Grayscale Conversion
- Images are sometimes converted to grayscale, collapsing three color channels into one to simplify the input.
- Pixel values range from 0 to 255 and are typically normalized to [0, 1] or standardized to zero mean (see the sketch below).
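A minimal sketch of grayscale conversion plus both scaling schemes, assuming Pillow and NumPy; the file name `example.jpg` is a placeholder:

```python
# Convert an image to single-channel grayscale, then scale its pixel values.
import numpy as np
from PIL import Image

img = Image.open("example.jpg").convert("L")   # "L" = single-channel grayscale
pixels = np.asarray(img, dtype=np.float32)     # values in [0, 255]

normalized = pixels / 255.0                                       # scaled to [0, 1]
standardized = (pixels - pixels.mean()) / (pixels.std() + 1e-8)   # zero mean, unit variance
```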
🧩 3. Nearest Neighbor in Final Layers
- The activations of the final fully connected layers (e.g., the 4096-d fc7 layer in VGG) can serve as feature embeddings for similarity matching.
- You can run a nearest neighbor search in this feature space to find semantically similar images (see the sketch below).
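A minimal sketch of nearest-neighbor retrieval over such features, assuming scikit-learn; the arrays here are random stand-ins for real 4096-d activations extracted beforehand:

```python
# Find the database images whose CNN features are closest to the query's.
import numpy as np
from sklearn.neighbors import NearestNeighbors

database_feats = np.random.rand(1000, 4096)   # one row per database image
query_feat = np.random.rand(1, 4096)          # feature vector of the query image

nn = NearestNeighbors(n_neighbors=5, metric="euclidean")
nn.fit(database_feats)
distances, indices = nn.kneighbors(query_feat)
print(indices)   # indices of the 5 most similar images in feature space
```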
📉 4. t-SNE (t-distributed Stochastic Neighbor Embedding)
- A non-linear dimensionality reduction technique.
- Projects high-dimensional data into 2D or 3D for visualization.
- Preserves local structure (points close together in high-dim remain close in 2D).
- ✅ Great for visualizing clusters (e.g., image classes).
- 🔁 Difference from PCA:
- PCA: linear, preserves global structure, may distort local distances.
  - t-SNE: non-linear, preserves local structure, better for visual insight (a side-by-side sketch follows below).
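A minimal sketch comparing the two projections, assuming scikit-learn; the digits dataset stands in for CNN features here:

```python
# Project the same high-dimensional data to 2D with PCA (linear) and t-SNE (non-linear).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 64-d inputs, 10 classes

X_pca = PCA(n_components=2).fit_transform(X)   # linear, preserves global variance
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X) # non-linear, preserves local neighborhoods

print(X_pca.shape, X_tsne.shape)               # (1797, 2) (1797, 2)
```

Plotting both embeddings colored by class label typically shows tighter, better-separated clusters in the t-SNE view, which is why it is preferred for visual inspection.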