RAS598 · Fall 2024 · Arizona State University

Fashion MNIST
Classification with
Deep Learning

A comparative study of CNN, ResNet, and traditional ML models on 70,000 fashion images — achieving 91% accuracy with interpretable predictions.

91%
Best Accuracy
7
Models Compared
70K
Images Classified
10
Categories
Fashion MNIST
A modern drop-in replacement for classic MNIST — 70,000 grayscale images of fashion products across 10 balanced categories.
10 Categories
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
Dataset Statistics
Training images 60,000
Test images 10,000
Image dimensions 28 × 28 px
Mean pixel intensity 72.94
Std deviation 90.02
Samples per class 6,000
Data Pipeline
A four-step preprocessing pipeline transforms raw pixel data into model-ready inputs.
01
Load
Import from Keras Fashion MNIST API
02
Normalize
Scale pixels ÷ 255 to [0, 1] range
03
Reshape
(28, 28, 1) for CNN — flatten for ML
04
Split
Train / validation / test partitions
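The four steps above can be sketched in a few lines of NumPy. In the real pipeline, step 1 uses the Keras Fashion MNIST loader; here the arrays are simulated stand-ins with the same dtype and shape so the sketch is self-contained, and the 80/20 split ratio is an illustrative assumption (the report's actual partition may differ).

```python
import numpy as np

# Step 1 — load. The real pipeline calls:
#   (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
# Simulated uint8 arrays of the same shape keep this sketch self-contained.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
y = rng.integers(0, 10, size=1000)

# Step 2 — normalize: scale raw pixel values from [0, 255] to [0, 1].
x = x.astype("float32") / 255.0

# Step 3 — reshape: add a channel axis for the CNN, flatten for traditional ML.
x_cnn = x.reshape(-1, 28, 28, 1)   # (N, 28, 28, 1)
x_ml = x.reshape(-1, 784)          # (N, 784)

# Step 4 — split: hold out the last 20% as a validation set (assumed ratio).
n_val = len(x) // 5
x_train, x_val = x_cnn[:-n_val], x_cnn[-n_val:]
y_train, y_val = y[:-n_val], y[-n_val:]
```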
Seven Models Compared
Deep learning architectures benchmarked against five traditional ML methods to quantify the accuracy-complexity tradeoff.

ResNet

Deep Learning
~91%
Test Accuracy
Residual blocks with skip connections solve vanishing gradients. Batch normalization and global average pooling keep it lightweight at only 38K parameters.

CNN

Deep Learning
~91%
Test Accuracy
Sequential architecture with progressive filter complexity (32→64→128). Three convolutional blocks with max pooling for spatial feature extraction.

Random Forest

Traditional ML
~89%
Test Accuracy
Ensemble of decision trees operating on flattened 784-d vectors. Best traditional method — balances accuracy with training speed.

SVM

Traditional ML
~87%
Test Accuracy
RBF kernel finds optimal hyperplanes for non-linear class separation. Strong accuracy but computationally expensive at scale.

MLP

Traditional ML
~85%
Test Accuracy
Fully connected neural network with hidden layers. Captures non-linear patterns but lacks spatial awareness of image structure.

Logistic Regression

Traditional ML
~83%
Test Accuracy
Linear baseline classifier. Fastest training time and full interpretability — serves as the performance floor for comparison.
Network Design
Layer-by-layer breakdown of both deep learning architectures.

ResNet

Residual Network with skip connections

Input 28×28×1
Conv2D(32, 3×3) + ReLU
MaxPool(2×2) → 13×13
ResBlock — Conv→BN→Conv→BN + Skip
MaxPool(2×2) → 6×6
ResBlock — Conv→BN→Conv→BN + Skip
GlobalAvgPool → Flatten
Dense(10, Softmax)
Total: 38,154 parameters (148 KB)
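The skip connection at the heart of each ResBlock can be illustrated with a toy sketch. This shows the idea only, not the actual layers: the real blocks use Conv→BN→Conv→BN branches as listed above, and `res_block`/`f` are hypothetical names for illustration.

```python
import numpy as np

def res_block(x, f):
    """Minimal residual unit: output = ReLU(f(x) + x).
    The identity term gives gradients a direct path back through
    the block, which is what mitigates vanishing gradients."""
    return np.maximum(f(x) + x, 0.0)

x = np.array([0.5, -1.0, 2.0])

# If the learned branch contributes nothing (all-zero output),
# the block degenerates to ReLU(x) instead of destroying the signal.
identity_out = res_block(x, lambda v: np.zeros_like(v))
```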

CNN

Sequential convolutional network

Input 28×28×1
Conv2D(32, 3×3) → 26×26×32
MaxPool(2×2) → 13×13×32
Conv2D(64, 3×3) → 11×11×64
MaxPool(2×2) → 5×5×64
Conv2D(128, 3×3) → 3×3×128
Flatten → 1,152
Dense(128, ReLU)
Dense(10, Softmax)
Filters: 32 → 64 → 128 (progressive)
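Every size in the table above follows from unpadded ("valid") 3×3 convolutions with stride 1 and 2×2 max pooling with stride 2; a few lines of arithmetic reproduce the progression.

```python
def conv_out(n, k=3):
    """'Valid' convolution: no padding, stride 1."""
    return n - k + 1

def pool_out(n, p=2):
    """2x2 max pooling, stride 2 (floor division)."""
    return n // p

s = 28
s = conv_out(s)     # Conv2D(32):  28 -> 26
s = pool_out(s)     # MaxPool:     26 -> 13
s = conv_out(s)     # Conv2D(64):  13 -> 11
s = pool_out(s)     # MaxPool:     11 -> 5
s = conv_out(s)     # Conv2D(128):  5 -> 3
flat = s * s * 128  # Flatten: 3 * 3 * 128 = 1,152
```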
Performance Analysis
Both deep learning models converge to 91% validation accuracy with healthy training dynamics.

ResNet Training

Train accuracy 95%
Val accuracy 91%
Train loss 0.10
Val loss 0.30
Epochs 20
Overfit gap ~4%

CNN Training

Train accuracy 96%
Val accuracy 91%
Train loss 0.12
Val loss 0.28
Epochs 10
Overfit gap ~5%
Model Accuracy Comparison
ResNet 91%
CNN 91%
RF 89%
SVM 87%
MLP 85%
LogReg 83%
Per-Class F1 Scores (ResNet)
Class         Precision  Recall  F1
Trouser       0.98       0.99    0.98
Bag           0.97       0.99    0.98
Sandal        0.96       0.99    0.97
Sneaker       0.94       0.98    0.96
Ankle boot    1.00       0.92    0.96
Dress         0.95       0.88    0.91
Pullover      0.87       0.88    0.88
Coat          0.88       0.87    0.88
T-shirt/top   0.81       0.91    0.86
Shirt         0.79       0.71    0.74
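F1 is the harmonic mean of precision and recall, so each row's score can be checked directly. Note that recomputing from the rounded two-decimal values can differ from the table by ±0.01 (e.g. the Shirt row), since the report's F1 scores come from unrounded precision and recall.

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```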
Confusion Matrix & Grad-CAM
Understanding what the model gets right, where it fails, and why it makes its predictions.

Confusion Matrix

ResNet predictions across all 10 categories (10,000 test samples)

Top misclassifications (true → predicted):
Shirt → T-shirt/top: 156
Ankle boot → Sneaker: 56
Pullover → Coat: 43
Coat → Pullover: 40
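A confusion matrix is just a count of (true class, predicted class) pairs. This minimal NumPy sketch (the `confusion_matrix` helper is illustrative; the report may use scikit-learn's equivalent) shows the construction on a few hand-made labels using the class indices from the dataset section.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered scatter-add per pair
    return cm

y_true = np.array([6, 6, 0, 9])  # 6 = Shirt, 0 = T-shirt/top, 9 = Ankle boot
y_pred = np.array([0, 6, 0, 7])  # one Shirt->T-shirt error, one Ankle boot->Sneaker
cm = confusion_matrix(y_true, y_pred)
```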

Grad-CAM Visualization

Gradient-weighted Class Activation Mapping — where the model looks to make decisions

Original: true label Ankle boot
Heatmap: yellow/red = high attention
Overlay: predicted Ankle boot ✓
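The heatmap itself is simple arithmetic: average the class-score gradients over each feature map to get per-channel weights, take the weighted sum of the activations, and apply ReLU. In the real model the activations and gradients come from TensorFlow (e.g. via `tf.GradientTape`); random stand-ins keep this sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for one image's last-conv activations A (H, W, K) and the
# gradient of the predicted-class score w.r.t. A (same shape).
A = rng.normal(size=(3, 3, 128))
grads = rng.normal(size=(3, 3, 128))

# Global-average-pool the gradients to one importance weight per channel.
weights = grads.mean(axis=(0, 1))                   # (K,)
# Weighted sum over channels, then ReLU: keep only positive evidence.
cam = np.maximum((A * weights).sum(axis=-1), 0.0)   # (H, W)
# Normalize to [0, 1] for display as a heatmap.
cam = cam / cam.max() if cam.max() > 0 else cam
```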
Limitations & Future Work

Current Limitations

Grayscale images on white backgrounds limit real-world generalization
Shirt vs. T-shirt confusion (F1: 0.74) due to similar silhouettes at 28×28
~4–5% train-val gap indicates mild overfitting in both deep models
No transfer learning explored for potential accuracy gains

Future Directions

Extend to colored datasets with diverse backgrounds (e.g., DeepFashion)
Add attention mechanisms for fine-grained collar/sleeve distinction
Apply dropout and stronger data augmentation to reduce overfitting
Fine-tune pretrained models like MobileNet or EfficientNet
Project Documents
Access the full presentation, detailed report, and source code documentation.
PDF

Presentation Slides

Complete 21-slide deck covering project motivation, methodology, all model architectures, results, confusion matrix analysis, and Grad-CAM interpretability.

21 slides · ~15 min read
PDF

Report — Part 1

Full technical report: abstract, introduction, methodology, preprocessing pipeline, model descriptions, performance analysis, confusion matrices, and classification reports.

3 pages · Results + Code
PDF

Report — Part 2

Complete source code with outputs: data loading, visualization, pixel analysis, ResNet & CNN implementation, model training logs, confusion matrix generation, and classification reports.

6 pages · Source Code
Authors
Arizona State University — RAS598: Robotic and Autonomous Systems, Fall 2024
KA
Karan Athrey
AS
Abhijit Sinha
AC
Anusha Chatterjee