End-to-end pipeline to classify Sri Lankan tea leaf diseases from images using PyTorch transfer learning (MobileNetV2, VGG16, ResNet50) with a clean CLI, YAML configs, and reproducible training.
Dataset: TeaLeafBD (Kaggle)
- ✨ Features
- 🏗️ Architecture
- 📁 Folder Structure
- ⚡ Quick Start
- 🐳 Optional: GPU Dev Container
- ▶️ Train & Evaluate
| Capability | Details |
|---|---|
| Dataset Ready | Works with the TeaLeafBD dataset; helper script to split train/val/test. |
| Multiple Backbones | A baseline CNN plus three transfer-learning backbones: VGG16, ResNet50, MobileNetV2. |
| Reproducible Configs | Run from a single YAML file that holds hyperparameters and paths: --config config.yaml. |
| Clean CLI | train.py supports training, fine-tuning, and evaluation, with AMP, backbone freeze/unfreeze, and checkpointing. |
| Reports/Outputs | Saves best checkpoint and artifacts under outputs/ & reports/. |
| Assignment-friendly | Mirrors coursework needs with baseline + TL models and clear structure. |
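
Reproducible training usually starts with fixing the random seeds. The helper below is a minimal sketch of that idea; `seed_everything` and its default seed are hypothetical and not taken from this repository's `train.py`.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy and PyTorch RNGs (hypothetical helper, not from the repo)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Safe to call without a GPU; it is ignored when CUDA is unavailable.
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Calling it once at the top of a training run makes weight initialisation and shuffling repeatable on the same hardware and library versions; some GPU operations can still be non-deterministic.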
flowchart LR
A([Images: TeaLeafBD]) -->|split_dataset.py| D["data/train, val, test"]
D --> T["train.py CLI"]
subgraph Models
direction TB
M1["Baseline CNN"]
M2["VGG16 (TL)"]
M3["ResNet50 (TL)"]
M4["MobileNetV2 (TL)"]
end
T --> M1
T --> M2
T --> M3
T --> M4
T --> L["PyTorch Trainer"]
L --> CKPT[("outputs/best_model.pth")]
L --> RPT["reports/metrics, logs"]
src/
  train.py
  models/
    factory.py
    baseline_cnn.py
scripts/
  split_dataset.py
data/
  raw/ train/ val/ test/
outputs/
reports/
notebooks/
experiments/
# 1) Clone
git clone https://github.com/Dumidu1212/tea-leaf-disease-detection-system.git
cd tea-leaf-disease-detection-system
# 2) Create & activate venv
python -m venv .venv
# Linux/Mac
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# 3) Install dependencies
pip install -r requirements.txt
Get the dataset (TeaLeafBD, Kaggle)
# install kaggle CLI if needed
pip install kaggle
# authenticate (place kaggle.json under ~/.kaggle with chmod 600 on Linux/Mac)
# then download & extract
kaggle datasets download -d bmshahriaalam/tealeafbd-tea-leaf-disease-detection -p datasets
unzip datasets/tealeafbd-tea-leaf-disease-detection.zip -d data/raw
Now split into train/val/test (ImageFolder layout)
python scripts/split_dataset.py --src data/raw --dst data
# Result:
# data/
# train/Healthy/..., BrownBlight/..., GrayBlight/..., RedSpider/...
# val/...
# test/...
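
For readers curious what such a split involves, here is a minimal per-class split sketch. It is not the code of `scripts/split_dataset.py`; the 70/15/15 ratios, the seed, the accepted file extensions, and the assumption that `src` holds one sub-folder per class are all illustrative.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image extensions


def split_dataset(src, dst, ratios=(0.7, 0.15, 0.15), seed=42):
    """Copy images from src/<class>/ into dst/{train,val,test}/<class>/."""
    src, dst = Path(src), Path(dst)
    rng = random.Random(seed)  # local RNG so the split is repeatable
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        images = sorted(f for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        rng.shuffle(images)
        n_train = round(len(images) * ratios[0])
        n_val = round(len(images) * ratios[1])
        parts = {
            "train": images[:n_train],
            "val": images[n_train:n_train + n_val],
            "test": images[n_train + n_val:],  # remainder goes to the test split
        }
        for split, files in parts.items():
            out_dir = dst / split / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out_dir / f.name)
        counts[class_dir.name] = {s: len(fs) for s, fs in parts.items()}
    return counts
```

Splitting inside each class folder keeps the class proportions roughly equal across train, val, and test, and the resulting layout can be read directly by `torchvision.datasets.ImageFolder`.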
docker run --gpus all -it --rm \
-v "$PWD":/work -w /work \
pytorch/pytorch:2.3.1-cuda11.8-cudnn8-runtime
# inside container
pip install -r requirements.txt
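
Once the container is running, a quick check like the one below can confirm that PyTorch sees the GPU. The helper name `describe_device` is made up for this illustration.

```python
import torch


def describe_device() -> str:
    """Return a short description of the device PyTorch would use."""
    if torch.cuda.is_available():
        index = torch.cuda.current_device()
        return f"cuda:{index} ({torch.cuda.get_device_name(index)})"
    return "cpu"


print(describe_device())
```

If this prints `cpu` inside the container, the `--gpus all` flag or the host's NVIDIA container runtime is the likely place to look.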
- Train (transfer learning, freeze backbone first)
python src/train.py \
--data_dir data \
--model mobilenet_v2 \
--epochs 10 \
--freeze_backbone
- Fine-tune (unfreeze last blocks using the best checkpoint)
python src/train.py \
--data_dir data \
--model mobilenet_v2 \
--epochs 10 \
--unfreeze \
--ckpt outputs/best_model.pth
- Evaluate on the held-out test set
python src/train.py \
--data_dir data \
--model mobilenet_v2 \
--evaluate \
--ckpt outputs/best_model.pth
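
Evaluation on a held-out set generally means running the model without gradients and tallying predictions against labels. The sketch below computes accuracy and a confusion matrix; the `evaluate` function is illustrative and may differ from what `--evaluate` actually reports.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model: nn.Module, loader: DataLoader, num_classes: int, device: str = "cpu"):
    """Return (accuracy, confusion) where confusion[true, predicted] counts samples."""
    model.to(device)
    model.eval()  # disable dropout and use running batch-norm statistics
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        for true, pred in zip(labels.tolist(), preds.tolist()):
            confusion[true, pred] += 1
    total = confusion.sum().item()
    accuracy = confusion.diag().sum().item() / max(total, 1)
    return accuracy, confusion
```

The confusion matrix is worth keeping alongside accuracy because it shows which disease classes are mistaken for each other.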
# Edit paths/hparams in config.yaml as needed, then:
python src/train.py --config config.yaml
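
One common way to wire a YAML file into a CLI is to load it as argparse defaults so that explicit flags still win. The sketch below assumes PyYAML is installed and that the YAML keys match the flag names (`data_dir`, `model`, `epochs`, `freeze_backbone`); the actual schema of `config.yaml` is not documented here, so treat the keys and default values as placeholders.

```python
import argparse

import yaml


def parse_args(argv=None) -> argparse.Namespace:
    """Parse CLI flags, using values from --config (if given) as defaults."""
    # First pass: only look for --config so the file can be read before full parsing.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", default=None)
    known, _ = pre.parse_known_args(argv)

    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--data_dir", default="data")
    parser.add_argument("--model", default="mobilenet_v2")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--freeze_backbone", action="store_true")

    if known.config:
        with open(known.config, encoding="utf-8") as fh:
            # An empty file yields None, so fall back to an empty dict.
            parser.set_defaults(**(yaml.safe_load(fh) or {}))
    return parser.parse_args(argv)
```

With this layout, `--config config.yaml --epochs 20` would take every setting from the file except `epochs`, which keeps quick experiments possible without editing the YAML.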