RAS598 · Fall 2024 · Arizona State University

Fashion MNIST
Classification with
Deep Learning

A comparative study of CNN, ResNet, and traditional ML models on 70,000 fashion images — achieving 91% accuracy with interpretable predictions.

91%
Best Accuracy
7
Models Compared
70K
Images Classified
10
Categories
Fashion MNIST
A modern drop-in replacement for classic MNIST — 70,000 grayscale images of fashion products across 10 balanced categories.
10 Categories
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
Dataset Statistics
Training images 60,000
Test images 10,000
Image dimensions 28 × 28 px
Mean pixel intensity 72.94
Std deviation 90.02
Samples per class 6,000
Data Pipeline
A four-step preprocessing pipeline transforms raw pixel data into model-ready inputs.
01
Load
Import from Keras Fashion MNIST API
02
Normalize
Scale pixels ÷ 255 to [0, 1] range
03
Reshape
(28, 28, 1) for CNN — flatten for ML
04
Split
Train / validation / test partitions
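The four steps above can be sketched in a few lines of NumPy. In the real pipeline, step 1 uses the Keras Fashion MNIST loader; here the arrays are simulated stand-ins with the same dtype and shape so the sketch is self-contained, and the 80/20 split ratio is an illustrative assumption (the report's actual partition may differ).

```python
import numpy as np

# Step 1 — load. The real pipeline calls:
#   (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
# Simulated uint8 arrays of the same shape keep this sketch self-contained.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
y = rng.integers(0, 10, size=1000)

# Step 2 — normalize: scale raw pixel values from [0, 255] to [0, 1].
x = x.astype("float32") / 255.0

# Step 3 — reshape: add a channel axis for the CNN, flatten for traditional ML.
x_cnn = x.reshape(-1, 28, 28, 1)   # (N, 28, 28, 1)
x_ml = x.reshape(-1, 784)          # (N, 784)

# Step 4 — split: hold out the last 20% as a validation set (assumed ratio).
n_val = len(x) // 5
x_train, x_val = x_cnn[:-n_val], x_cnn[-n_val:]
y_train, y_val = y[:-n_val], y[-n_val:]
```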
Seven Models Compared
Deep learning architectures benchmarked against five traditional ML methods to quantify the accuracy-complexity tradeoff.

ResNet

Deep Learning
~91%
Test Accuracy
Residual blocks with skip connections solve vanishing gradients. Batch normalization and global average pooling keep it lightweight at only 38K parameters.

CNN

Deep Learning
~91%
Test Accuracy
Sequential architecture with progressive filter complexity (32→64→128). Three convolutional blocks with max pooling for spatial feature extraction.

Random Forest

Traditional ML
~89%
Test Accuracy
Ensemble of decision trees operating on flattened 784-d vectors. Best traditional method — balances accuracy with training speed.

SVM

Traditional ML
~87%
Test Accuracy
RBF kernel finds optimal hyperplanes for non-linear class separation. Strong accuracy but computationally expensive at scale.

MLP

Traditional ML
~85%
Test Accuracy
Fully connected neural network with hidden layers. Captures non-linear patterns but lacks spatial awareness of image structure.

Logistic Regression

Traditional ML
~83%
Test Accuracy
Linear baseline classifier. Fastest training time and full interpretability — serves as the performance floor for comparison.
Network Design
Layer-by-layer breakdown of both deep learning architectures.

ResNet

Residual Network with skip connections

Input 28×28×1
Conv2D(32, 3×3) + ReLU
MaxPool(2×2) → 13×13
ResBlock — Conv→BN→Conv→BN + Skip
MaxPool(2×2) → 6×6
ResBlock — Conv→BN→Conv→BN + Skip
GlobalAvgPool → Flatten
Dense(10, Softmax)
Total: 38,154 parameters (148 KB)
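The skip connection at the heart of each ResBlock can be illustrated with a toy sketch. This shows the idea only, not the actual layers: the real blocks use Conv→BN→Conv→BN branches as listed above, and `res_block`/`f` are hypothetical names for illustration.

```python
import numpy as np

def res_block(x, f):
    """Minimal residual unit: output = ReLU(f(x) + x).
    The identity term gives gradients a direct path back through
    the block, which is what mitigates vanishing gradients."""
    return np.maximum(f(x) + x, 0.0)

x = np.array([0.5, -1.0, 2.0])

# If the learned branch contributes nothing (all-zero output),
# the block degenerates to ReLU(x) instead of destroying the signal.
identity_out = res_block(x, lambda v: np.zeros_like(v))
```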

CNN

Sequential convolutional network

Input 28×28×1
Conv2D(32, 3×3) → 26×26×32
MaxPool(2×2) → 13×13×32
Conv2D(64, 3×3) → 11×11×64
MaxPool(2×2) → 5×5×64
Conv2D(128, 3×3) → 3×3×128
Flatten → 1,152
Dense(128, ReLU)
Dense(10, Softmax)
Filters: 32 → 64 → 128 (progressive)
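Every size in the table above follows from unpadded ("valid") 3×3 convolutions with stride 1 and 2×2 max pooling with stride 2; a few lines of arithmetic reproduce the progression.

```python
def conv_out(n, k=3):
    """'Valid' convolution: no padding, stride 1."""
    return n - k + 1

def pool_out(n, p=2):
    """2x2 max pooling, stride 2 (floor division)."""
    return n // p

s = 28
s = conv_out(s)     # Conv2D(32):  28 -> 26
s = pool_out(s)     # MaxPool:     26 -> 13
s = conv_out(s)     # Conv2D(64):  13 -> 11
s = pool_out(s)     # MaxPool:     11 -> 5
s = conv_out(s)     # Conv2D(128):  5 -> 3
flat = s * s * 128  # Flatten: 3 * 3 * 128 = 1,152
```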
Performance Analysis
Both deep learning models converge to 91% validation accuracy with healthy training dynamics.

ResNet Training

Train accuracy 95%
Val accuracy 91%
Train loss 0.10
Val loss 0.30
Epochs 20
Overfit gap ~4%

CNN Training

Train accuracy 96%
Val accuracy 91%
Train loss 0.12
Val loss 0.28
Epochs 10
Overfit gap ~5%
Model Accuracy Comparison
ResNet 91%
CNN 91%
RF 89%
SVM 87%
MLP 85%
LogReg 83%
Per-Class F1 Scores (ResNet)
Class         Precision  Recall  F1
Trouser       0.98       0.99    0.98
Bag           0.97       0.99    0.98
Sandal        0.96       0.99    0.97
Sneaker       0.94       0.98    0.96
Ankle boot    1.00       0.92    0.96
Dress         0.95       0.88    0.91
Pullover      0.87       0.88    0.88
Coat          0.88       0.87    0.88
T-shirt/top   0.81       0.91    0.86
Shirt         0.79       0.71    0.74
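F1 is the harmonic mean of precision and recall, so each row's score can be checked directly. Note that recomputing from the rounded two-decimal values can differ from the table by ±0.01 (e.g. the Shirt row), since the report's F1 scores come from unrounded precision and recall.

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```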
Confusion Matrix & Grad-CAM
Understanding what the model gets right, where it fails, and why it makes its predictions.

Confusion Matrix

ResNet predictions across all 10 categories (10,000 test samples)

Top misclassifications (true → predicted):
Shirt → T-shirt/top: 156
Ankle boot → Sneaker: 56
Pullover → Coat: 43
Coat → Pullover: 40
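A confusion matrix is just a count of (true class, predicted class) pairs. This minimal NumPy sketch (the `confusion_matrix` helper is illustrative; the report may use scikit-learn's equivalent) shows the construction on a few hand-made labels using the class indices from the dataset section.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered scatter-add per pair
    return cm

y_true = np.array([6, 6, 0, 9])  # 6 = Shirt, 0 = T-shirt/top, 9 = Ankle boot
y_pred = np.array([0, 6, 0, 7])  # one Shirt->T-shirt error, one Ankle boot->Sneaker
cm = confusion_matrix(y_true, y_pred)
```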

Grad-CAM Visualization

Gradient-weighted Class Activation Mapping — where the model looks to make decisions

Original: true label Ankle boot
Heatmap: yellow/red = high attention
Overlay: predicted Ankle boot ✓
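The heatmap itself is simple arithmetic: average the class-score gradients over each feature map to get per-channel weights, take the weighted sum of the activations, and apply ReLU. In the real model the activations and gradients come from TensorFlow (e.g. via `tf.GradientTape`); random stand-ins keep this sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for one image's last-conv activations A (H, W, K) and the
# gradient of the predicted-class score w.r.t. A (same shape).
A = rng.normal(size=(3, 3, 128))
grads = rng.normal(size=(3, 3, 128))

# Global-average-pool the gradients to one importance weight per channel.
weights = grads.mean(axis=(0, 1))                   # (K,)
# Weighted sum over channels, then ReLU: keep only positive evidence.
cam = np.maximum((A * weights).sum(axis=-1), 0.0)   # (H, W)
# Normalize to [0, 1] for display as a heatmap.
cam = cam / cam.max() if cam.max() > 0 else cam
```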
Limitations & Future Work

Current Limitations

Grayscale images on white backgrounds limit real-world generalization
Shirt vs. T-shirt confusion (F1: 0.74) due to similar silhouettes at 28×28
~4–5% train-val gap indicates mild overfitting in both deep models
No transfer learning explored for potential accuracy gains

Future Directions

Extend to colored datasets with diverse backgrounds (e.g., DeepFashion)
Add attention mechanisms for fine-grained collar/sleeve distinction
Apply dropout and stronger data augmentation to reduce overfitting
Fine-tune pretrained models like MobileNet or EfficientNet
Project Documents
Access the full presentation, detailed report, and source code documentation.
PDF

Presentation Slides

Complete 21-slide deck covering project motivation, methodology, all model architectures, results, confusion matrix analysis, and Grad-CAM interpretability.

21 slides · ~15 min read
PDF

Report — Part 1

Full technical report: abstract, introduction, methodology, preprocessing pipeline, model descriptions, performance analysis, confusion matrices, and classification reports.

3 pages · Results + Code
PDF

Report — Part 2

Complete source code with outputs: data loading, visualization, pixel analysis, ResNet & CNN implementation, model training logs, confusion matrix generation, and classification reports.

6 pages · Source Code
Authors
Arizona State University — RAS598: Robotic and Autonomous Systems, Fall 2024
KA
Karan Athrey
AS
Abhijit Sinha
AC
Anusha Chatterjee