Virtual Labs

Autoencoders for Representation Learning

Procedure

Step 1: Import Required Libraries

Import PyTorch for model building, torchvision for datasets, NumPy for numerical operations, and Matplotlib for visualisation.

Step 2: Load Fashion-MNIST Dataset

Load the Fashion-MNIST dataset, which contains 60,000 training images and 10,000 test images. Each image is a 28×28 grayscale image belonging to one of 10 fashion classes. Create data loaders with a batch size of 128 for efficient training and evaluation.

Normalise the pixel values to the range [0, 1]. The dataset is already cleanly structured, so no further cleaning is required.

Fashion-MNIST Dataset

Fashion-MNIST is a dataset of grayscale images representing 10 different fashion categories. Each image is 28×28 pixels. The 10 classes are:

T-shirt/top
Trouser
Pullover
Dress
Coat
Sandal
Shirt
Sneaker
Bag
Ankle boot

This dataset is commonly used for testing machine learning algorithms because it is more challenging than standard MNIST digits while maintaining the same image format.

Step 3: Create a Noise Addition Function

Define a function that adds Gaussian noise to the input image while keeping the clean image unchanged as the target. Use a noise factor of 0.25 when training the denoising autoencoder.
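A minimal sketch of such a function follows. Clamping the noisy result back to [0, 1] is an assumption (it keeps the input in the same range as the clean images); the procedure itself only specifies Gaussian noise and the noise factor.

```python
import torch


def add_noise(images: torch.Tensor, noise_factor: float = 0.25) -> torch.Tensor:
    """Return a noisy copy of `images`; the clean tensor is left untouched."""
    # Zero-mean, unit-variance Gaussian noise scaled by the noise factor.
    noisy = images + noise_factor * torch.randn_like(images)
    # Assumption: clamp so the noisy input stays a valid [0, 1] image.
    return noisy.clamp(0.0, 1.0)
```

During training the noisy tensor is fed to the model, while the original clean tensor is used as the reconstruction target.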

Step 4: Define the Basic Autoencoder Architecture

Build a fully connected autoencoder with:

Encoder: flatten the 28×28 input into 784 features and compress it through multiple dense layers to a 2-dimensional latent vector
Decoder: expand the 2-dimensional latent vector back to 784 features and reshape it to 28×28
Use ReLU in the hidden layers and Sigmoid in the final output layer

The basic autoencoder is trained on clean images only, so it learns to reconstruct its input rather than to remove noise.

Encoder:

Flatten 28×28 input → 784 dimensions
Linear layers: 784 → 512 → 256 → 128 → 64 → 32 → 16 → 8 → 4 → 2 (latent)
Include BatchNorm1d and ReLU after each layer
Add Dropout (rates 0.2, 0.2, 0.1) after the first three layers for regularisation

Decoder (mirror structure):

Linear layers: 2 → 4 → 8 → 16 → 32 → 64 → 128 → 256 → 512 → 784
BatchNorm1d and ReLU between layers
Sigmoid activation at output to produce [0, 1] pixel values
Reshape to 28×28 image

The 2-dimensional latent space allows the learned representations to be visualised directly as a scatter plot, which shows how the autoencoder organises the different fashion categories in the compressed feature space.
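The architecture described above could be sketched as follows. Two details are assumptions: the latent layer is left linear (no BatchNorm1d or ReLU), so that latent coordinates are not restricted to non-negative values, and the three dropout rates are applied after the first three encoder layers. The helper name mlp is hypothetical.

```python
import torch
from torch import nn

ENCODER_DIMS = [784, 512, 256, 128, 64, 32, 16, 8, 4, 2]
DROPOUT_RATES = (0.2, 0.2, 0.1)  # applied after the first three encoder layers


def mlp(dims, dropout=(), final=None):
    """Stack Linear layers; hidden layers get BatchNorm1d + ReLU (+ optional Dropout)."""
    layers = []
    last = len(dims) - 2  # index of the final Linear layer
    for i, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
        layers.append(nn.Linear(d_in, d_out))
        if i < last:
            layers += [nn.BatchNorm1d(d_out), nn.ReLU()]
            if i < len(dropout):
                layers.append(nn.Dropout(dropout[i]))
        elif final is not None:
            layers.append(final)
    return nn.Sequential(*layers)


class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: flatten 28x28 -> 784, then compress down to the 2-D latent vector.
        self.encoder = nn.Sequential(nn.Flatten(), mlp(ENCODER_DIMS, DROPOUT_RATES))
        # Decoder: mirror of the encoder, with Sigmoid so outputs lie in [0, 1].
        self.decoder = mlp(ENCODER_DIMS[::-1], final=nn.Sigmoid())

    def encode(self, x):
        return self.encoder(x)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z).view(-1, 1, 28, 28)
```

The same class can be instantiated twice, once for the basic autoencoder and once for the denoising autoencoder; only the training inputs differ.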

Step 5: Train the Basic Autoencoder

For each epoch:

Pass clean images through the encoder and decoder
Compute MSE loss between the clean input and the reconstruction
Backpropagate gradients and update weights using the AdamW optimiser
Track training and validation loss to monitor convergence

Save the best-performing basic autoencoder model based on the lowest validation loss.
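The loop above might look like the sketch below. The learning rate, weight decay, and epoch count are placeholder assumptions, as is the optional checkpoint_path argument. The procedure does not say where the validation data come from, so val_loader is left as a parameter (for example, the test loader or a held-out part of the training set).

```python
import copy

import torch
from torch import nn


def evaluate(model, loader, criterion, device="cpu"):
    """Average reconstruction loss over a loader, without gradient tracking."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for images, _ in loader:
            images = images.to(device)
            loss = criterion(model(images), images)
            total += loss.item() * images.size(0)
            count += images.size(0)
    return total / count


def train_basic(model, train_loader, val_loader, epochs=20, lr=1e-3,
                weight_decay=1e-4, device="cpu", checkpoint_path=None):
    """Train on clean images with MSE loss; keep the weights with the lowest validation loss."""
    model.to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    history = {"train": [], "val": []}
    best_val, best_state = float("inf"), None

    for _ in range(epochs):
        model.train()
        total, count = 0.0, 0
        for images, _ in train_loader:
            images = images.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), images)  # the clean input is also the target
            loss.backward()
            optimizer.step()
            total += loss.item() * images.size(0)
            count += images.size(0)

        val_loss = evaluate(model, val_loader, criterion, device)
        history["train"].append(total / count)
        history["val"].append(val_loss)

        if val_loss < best_val:
            best_val = val_loss
            best_state = copy.deepcopy(model.state_dict())
            if checkpoint_path is not None:
                torch.save(best_state, checkpoint_path)

    return history, best_state
```

The returned history can be plotted to check convergence, and model.load_state_dict(best_state) restores the best weights.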

Step 6: Training Configuration

Loss Function: Combined MSE and L1 loss for better detail preservation
Optimiser: AdamW with appropriate learning rate and weight decay for regularisation
Learning Rate Scheduler: ReduceLROnPlateau, to automatically reduce the learning rate when training plateaus
Training Duration: Train for sufficient epochs until convergence
Gradient Clipping: Apply gradient clipping for training stability and to prevent exploding gradients
Model Checkpointing: Save the best performing model based on the lowest validation loss
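A sketch of this configuration is given below. The procedure does not fix any numeric values, so the L1 weight, learning rate, weight decay, scheduler factor and patience, and the clipping threshold are all placeholder assumptions to be tuned. The names CombinedLoss, configure_training, and clip_gradients are hypothetical.

```python
import torch
from torch import nn
import torch.nn.functional as F

MAX_GRAD_NORM = 1.0  # assumption: clipping threshold for the global gradient norm


class CombinedLoss(nn.Module):
    """MSE plus a weighted L1 term; the weight is an assumed value."""

    def __init__(self, l1_weight=0.5):
        super().__init__()
        self.l1_weight = l1_weight

    def forward(self, recon, target):
        return F.mse_loss(recon, target) + self.l1_weight * F.l1_loss(recon, target)


def configure_training(model, lr=1e-3, weight_decay=1e-4):
    """Return (criterion, optimizer, scheduler) with placeholder hyperparameters."""
    criterion = CombinedLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    # Halve the learning rate when the monitored loss has not improved for 3 epochs.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=3
    )
    return criterion, optimizer, scheduler


def clip_gradients(model, max_norm=MAX_GRAD_NORM):
    """Rescale gradients in place so that their global L2 norm is at most max_norm."""
    return nn.utils.clip_grad_norm_(model.parameters(), max_norm)
```

In a training loop, clip_gradients(model) is called between loss.backward() and optimizer.step(), and scheduler.step(val_loss) is called once per epoch with the validation loss.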

Step 7: Train the Denoising Autoencoder

For each epoch:

Add Gaussian noise to input images
Forward pass through the encoder and decoder
Compute combined loss (MSE + L1) between clean targets and reconstructions
Backpropagate gradients with clipping
Update weights and adjust learning rate
Track and save the best model
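The steps above can be sketched as one training epoch plus an outer loop. The block is self-contained, so it repeats a small noise and loss definition; the L1 weight, clipping threshold, and epoch count are assumed values, and clamping the noisy input to [0, 1] is also an assumption.

```python
import copy

import torch
from torch import nn
import torch.nn.functional as F


def combined_loss(recon, target, l1_weight=0.5):
    # MSE + weighted L1; the weight is an assumed value.
    return F.mse_loss(recon, target) + l1_weight * F.l1_loss(recon, target)


def corrupt(clean, noise_factor):
    # Gaussian noise, clamped back to the valid pixel range (assumption).
    return (clean + noise_factor * torch.randn_like(clean)).clamp(0.0, 1.0)


def train_denoising_epoch(model, loader, optimizer, noise_factor=0.25,
                          max_grad_norm=1.0, device="cpu"):
    model.train()
    total, count = 0.0, 0
    for clean, _ in loader:
        clean = clean.to(device)
        optimizer.zero_grad()
        recon = model(corrupt(clean, noise_factor))  # noisy input
        loss = combined_loss(recon, clean)           # clean target
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        total += loss.item() * clean.size(0)
        count += clean.size(0)
    return total / count


def validate_denoising(model, loader, noise_factor=0.25, device="cpu"):
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for clean, _ in loader:
            clean = clean.to(device)
            loss = combined_loss(model(corrupt(clean, noise_factor)), clean)
            total += loss.item() * clean.size(0)
            count += clean.size(0)
    return total / count


def fit_denoising(model, train_loader, val_loader, optimizer, scheduler,
                  epochs=30, noise_factor=0.25, device="cpu"):
    """Run all epochs; return the best validation loss and the matching weights."""
    best_val, best_state = float("inf"), None
    for _ in range(epochs):
        train_denoising_epoch(model, train_loader, optimizer, noise_factor, device=device)
        val_loss = validate_denoising(model, val_loader, noise_factor, device)
        scheduler.step(val_loss)  # ReduceLROnPlateau monitors the validation loss
        if val_loss < best_val:
            best_val, best_state = val_loss, copy.deepcopy(model.state_dict())
    return best_val, best_state
```

Here model is assumed to be on the chosen device already, and scheduler is assumed to be a ReduceLROnPlateau instance, since its step method takes the monitored loss.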

Step 8: Visualisation 1 - Basic Reconstruction

Display 8 test samples showing:

Original clean images (row 1)
Noisy input images (row 2)
Reconstructed denoised images (row 3)

This demonstrates the denoising capability side-by-side.
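A possible plotting helper for this three-row figure is sketched below. The function name plot_reconstructions and the figure size are assumptions; the inputs are expected to be batches of shape (N, 1, 28, 28) with N of at least 8.

```python
import matplotlib.pyplot as plt
import torch


def plot_reconstructions(clean, noisy, recon, n=8):
    """Show n samples in three rows: original, noisy input, reconstruction."""
    rows = [("Original", clean), ("Noisy input", noisy), ("Reconstruction", recon)]
    fig, axes = plt.subplots(len(rows), n, figsize=(1.5 * n, 4.5))
    for r, (title, batch) in enumerate(rows):
        for c in range(n):
            image = batch[c].detach().cpu().squeeze().numpy()
            axes[r, c].imshow(image, cmap="gray", vmin=0, vmax=1)
            axes[r, c].axis("off")
        # Label each row once, on its first image.
        axes[r, 0].set_title(title, fontsize=9, loc="left")
    fig.tight_layout()
    return fig
```

Call plt.show() after the function in a script, or let the notebook render the returned figure.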

Step 9: Visualisation 2 - Error Maps

Create heat maps showing pixel-wise reconstruction errors for 6 samples. Brighter regions indicate higher errors; darker regions show better reconstruction. This helps identify which image regions are harder to reconstruct.
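The error maps could be computed and drawn as in the sketch below. Using the absolute difference as the error measure and the "hot" colour map are assumptions; a squared difference would serve equally well.

```python
import matplotlib.pyplot as plt
import torch


def error_maps(clean, recon):
    """Absolute pixel-wise error; (N, 1, 28, 28) inputs give an (N, 28, 28) result."""
    return (recon - clean).abs().squeeze(1)


def plot_error_maps(clean, recon, n=6):
    errors = error_maps(clean[:n], recon[:n]).detach().cpu().numpy()
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2.4))
    for ax, err in zip(axes, errors):
        # With the "hot" colour map, brighter pixels mean larger errors.
        im = ax.imshow(err, cmap="hot", vmin=0, vmax=1)
        ax.axis("off")
    fig.colorbar(im, ax=axes, fraction=0.02)
    return fig
```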

Step 10: Visualisation 3 - Noise Robustness Test

Test model performance at different noise levels (0.1, 0.25, 0.4, 0.6) on the same image. Display original, noisy input, and reconstruction for each noise level to show how well the model handles varying corruption.
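A sketch of the computation behind this test follows; plotting the returned tensors works in the same way as for the reconstruction figure. The function name noise_robustness and the per-level MSE it reports are additions for illustration, and clamping the noisy input to [0, 1] is an assumption.

```python
import torch

NOISE_LEVELS = (0.1, 0.25, 0.4, 0.6)


def noise_robustness(model, image, levels=NOISE_LEVELS):
    """Corrupt one clean image of shape (1, 28, 28) at each level and reconstruct it."""
    model.eval()
    results = []
    with torch.no_grad():
        for level in levels:
            noisy = (image + level * torch.randn_like(image)).clamp(0.0, 1.0)
            # Add a batch dimension for the model, then remove it again.
            recon = model(noisy.unsqueeze(0)).squeeze(0)
            mse = torch.mean((recon - image) ** 2).item()
            results.append({"level": level, "noisy": noisy, "recon": recon, "mse": mse})
    return results
```

Each entry holds the noisy input, the reconstruction, and the MSE against the clean image, so the original, noisy, and reconstructed images can be displayed side by side for every noise level.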

Step 11: Visualisation 4 - Latent Space Projection

Extract 2-D latent representations for all test images. Since the latent dimension is already 2, plot all test samples directly as points in a 2-D scatter plot, with different colours representing the 10 fashion classes.

This visualisation reveals how the autoencoder organises different fashion categories in the learned latent space. Similar items should cluster together, demonstrating that the autoencoder has learned meaningful representations.
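The extraction and plot might be implemented as in the sketch below. The function names are hypothetical, and encoder stands for whatever module maps an image batch to its 2-D latent vectors. The class names follow the standard Fashion-MNIST label order (0 to 9), which matches the list given earlier.

```python
import matplotlib.pyplot as plt
import torch

CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def collect_latents(encoder, loader, device="cpu"):
    """Encode every batch in the loader; return (codes, labels) as NumPy arrays."""
    encoder.eval()
    codes, labels = [], []
    with torch.no_grad():
        for images, targets in loader:
            codes.append(encoder(images.to(device)).cpu())
            labels.append(targets)
    return torch.cat(codes).numpy(), torch.cat(labels).numpy()


def plot_latent_space(codes, labels):
    """Scatter the 2-D codes, one colour per fashion class."""
    fig, ax = plt.subplots(figsize=(8, 7))
    for idx, name in enumerate(CLASS_NAMES):
        mask = labels == idx
        ax.scatter(codes[mask, 0], codes[mask, 1], s=3, alpha=0.6,
                   color=plt.cm.tab10(idx), label=name)
    ax.set_xlabel("Latent dimension 1")
    ax.set_ylabel("Latent dimension 2")
    ax.legend(markerscale=4, fontsize=8)
    return fig
```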

Step 12: Quantitative Evaluation

Calculate performance metrics on the test set:

MSE: Mean Squared Error (lower is better)
PSNR: Peak Signal-to-Noise Ratio in dB (higher is better)
SSIM: Structural Similarity Index (higher is better; typically between 0 and 1, where 1 means the images are identical)
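MSE and PSNR can be computed directly from the tensors, as sketched below, using PSNR = 10 * log10(MAX^2 / MSE) with MAX = 1 for images scaled to [0, 1]. The function names are hypothetical, and the small lower bound on the MSE is an assumption that avoids an infinite PSNR for a perfect reconstruction.

```python
import torch


def mse_per_image(clean, recon):
    """Mean squared error of each image in a batch, shape (N,)."""
    return ((recon - clean) ** 2).flatten(start_dim=1).mean(dim=1)


def psnr_from_mse(mse, max_value=1.0):
    """PSNR in dB: 10 * log10(MAX^2 / MSE), with MAX = 1 for [0, 1] images."""
    return 10.0 * torch.log10(max_value ** 2 / mse)


def evaluate_batch(clean, recon):
    mse = mse_per_image(clean, recon)
    # Assumption: floor the MSE so that identical images do not give infinity.
    psnr = psnr_from_mse(mse.clamp_min(1e-10))
    return {"mse": mse.mean().item(), "psnr": psnr.mean().item()}
```

SSIM involves local windowed statistics and is usually taken from a library rather than written by hand; one option, assuming scikit-image is installed, is skimage.metrics.structural_similarity applied to each image pair with data_range=1.0.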