CNN Architecture | Machine Learning Guide | Ephizen

What are CNNs?#

Convolutional Neural Networks (CNNs) are specialized for processing grid-like data, especially images.

Key Insight

CNNs learn a hierarchy of visual features. Early layers detect edges, middle layers combine them into shapes, and deeper layers respond to whole objects, loosely analogous to human vision.

Why CNNs for Images?#

Fully connected networks scale poorly to images:

python

# A 224x224 RGB image
pixels = 224 * 224 * 3  # 150,528 input neurons!

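# Fully connected to 1000 neurons
parameters = 150528 * 1000  # ~150.5 million weights in a single layer!

CNNs exploit image structure (nearby pixels are related, and the same pattern can appear anywhere) by sharing small filters across all positions, so they need far fewer parameters.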
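To make the comparison concrete, here is a minimal sketch that counts parameters for the fully connected layer above and for a single convolutional layer. The helper names `linear_params` and `conv2d_params` are illustrative, and the choice of 32 filters of size 3x3 is an assumption made for the example.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights (plus optional biases) in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def conv2d_params(in_channels: int, out_channels: int, kernel_size: int,
                  bias: bool = True) -> int:
    """Weights (plus optional biases) in a 2D convolution with a square kernel."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Fully connected: every value of a 224x224 RGB image feeds each of 1000 units.
fc = linear_params(224 * 224 * 3, 1000)

# Convolutional: 32 filters of size 3x3 over 3 input channels (assumed sizes).
conv = conv2d_params(3, 32, kernel_size=3)

print(f"fully connected:      {fc:,}")    # 150,529,000
print(f"3x3 conv, 32 filters: {conv:,}")  # 896
```

The convolutional count does not depend on the image size, because the same filters are reused at every position.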

Core Concepts#

Convolution

Small filters slide across the image, detecting patterns

Pooling

Reduces spatial dimensions, keeps important features

Feature Maps

Each layer produces maps highlighting detected features

Classification

Final layers flatten features and classify
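As a rough illustration of the pooling concept, the sketch below applies 2x2 max pooling with stride 2 to a small feature map using plain Python lists. The function name `max_pool2d` and the sample values are made up for this example; in practice a library layer such as `nn.MaxPool2d` would do this.

```python
def max_pool2d(grid, size=2, stride=2):
    """Max pooling over a 2D list of numbers (no padding)."""
    rows = (len(grid) - size) // stride + 1
    cols = (len(grid[0]) - size) // stride + 1
    return [
        [
            max(
                grid[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 5]]
```

Each 2x2 window collapses to its largest value, so the map shrinks from 4x4 to 2x2 while the strongest activations survive.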

Convolution Operation#

Input Image      Filter (3x3)      Output
┌─────────┐      ┌───────┐        ┌───────┐
│ 1 0 1 0 │      │ 1 0 1 │        │ 5   0 │
│ 0 1 0 1 │  *   │ 0 1 0 │   =    │ 0   5 │
│ 1 0 1 0 │      │ 1 0 1 │        └───────┘
│ 0 1 0 1 │      └───────┘
└─────────┘

# The filter slides across the image, computing dot products
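The sliding dot product can be sketched in a few lines of plain Python. This version takes the 4x4 input and 3x3 filter from the diagram and assumes stride 1 and no padding; `conv2d_valid` is an illustrative name. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), summing elementwise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Dot product between the kernel and the window whose top-left is (r, c)
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

result = conv2d_valid(image, kernel)
print(result)  # [[5, 0], [0, 5]]
```

The filter responds strongly (5) where the window matches its X-shaped pattern and not at all (0) where the window is the inverse pattern.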

Building a CNN in PyTorch#

python

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected layers
        self.fc1 = nn.Linear(128 * 4 * 4, 512)
        self.fc2 = nn.Linear(512, num_classes)

        # Dropout for regularization
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Expects 32x32 RGB input of shape (N, 3, 32, 32)
        # Conv block 1: 32x32 -> 16x16
        x = self.pool(F.relu(self.conv1(x)))

        # Conv block 2: 16x16 -> 8x8
        x = self.pool(F.relu(self.conv2(x)))

        # Conv block 3: 8x8 -> 4x4
        x = self.pool(F.relu(self.conv3(x)))

        # Flatten everything except the batch dimension
        x = torch.flatten(x, 1)

        # Fully connected
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)

        return x
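The `128 * 4 * 4` input size of `fc1` follows from the spatial sizes noted in the comments. As a quick check, the sketch below traces those sizes with the standard output-size formula, assuming a 32x32 input. The helper `conv_out` is an illustrative name, not a PyTorch function.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32      # assumed input height/width
channels = 3   # RGB

for out_channels in (32, 64, 128):
    size = conv_out(size, kernel=3, padding=1)  # 3x3 conv, padding=1: size unchanged
    size = conv_out(size, kernel=2, stride=2)   # 2x2 max pool, stride 2: size halved
    channels = out_channels
    print(f"after block: {channels} x {size} x {size}")

flat_features = channels * size * size
print(flat_features)  # 2048, i.e. 128 * 4 * 4
```

A different input resolution changes `flat_features`, so the first linear layer would need to be resized to match.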

Layer Types#

Famous CNN Architectures#

Architecture	Year	Key Innovation	Depth
LeNet	1998	Pioneering CNN	7 layers
AlexNet	2012	ReLU, Dropout, GPU training	8 layers
VGG	2014	3x3 filters only	16-19 layers
ResNet	2015	Skip connections	18-152 layers
EfficientNet	2019	Compound scaling	Varies by variant (B0-B7)

Transfer Learning#

Don't train from scratch - use pretrained models:

python

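import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 10  # placeholder: set to the number of classes in your task

# Load ResNet-50 with ImageNet weights
# (torchvision >= 0.13; older releases used pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for your task (new layers are trainable by default)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only train the final layer
optimizer = torch.optim.Adam(model.fc.parameters())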
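It can help to confirm which parameters will actually be updated after freezing. The sketch below uses a tiny, hypothetical stand-in network instead of a real pretrained model, so it runs without downloading weights. The layer sizes and `num_classes = 5` are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: a small backbone plus a 1000-way head.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1000),
)

# Freeze everything, as in the transfer learning recipe.
for param in model.parameters():
    param.requires_grad = False

# Swap in a fresh head; newly created layers have requires_grad=True.
num_classes = 5  # assumed for this example
model[4] = nn.Linear(8, num_classes)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)

print(trainable)              # ['4.weight', '4.bias']
print(n_trainable, n_frozen)  # 45 224

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad])
```

The same check with `named_parameters()` works on a real pretrained model, where only the replaced head should appear in the trainable list.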

Data Augmentation#

Increase dataset diversity:

python

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])
])

Training Tips#

🚀

Start Pretrained

Use transfer learning unless you have millions of images.

🔄

Augment Data

Flip, rotate, crop. More variety = better generalization.

📊

Use BatchNorm

Stabilizes training, allows higher learning rates.

👁️

Monitor Overfitting

If training accuracy is far higher than validation accuracy, the model is overfitting: add dropout, use stronger augmentation, or reduce model size.
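As one way to apply the BatchNorm tip, the sketch below builds a convolutional block with batch normalization placed between the convolution and the activation. The helper `conv_block` and the tensor sizes are assumptions for illustration. The convolution's bias is disabled because `BatchNorm2d` has its own learnable shift.

```python
import torch
import torch.nn as nn


def conv_block(in_channels: int, out_channels: int) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> 2x2 max pool."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


block = conv_block(3, 32)
x = torch.randn(8, 3, 32, 32)  # a random batch of 8 RGB 32x32 images
out = block(x)
print(out.shape)  # torch.Size([8, 32, 16, 16])
```

BatchNorm behaves differently in training and evaluation, so call `model.train()` before training and `model.eval()` before validation or inference.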

What CNNs Learn#

[Chart: Feature Hierarchy in Deep CNNs]

Key Takeaways#

Remember

CNNs are the foundation of computer vision. Start with pretrained models for most tasks. Understand convolution, pooling, and feature maps. Modern architectures like ResNet and EfficientNet handle most use cases.
