VGG19 Transfer Learning with Keras: Weather Image Classification
Building a Weather Image Classifier with Keras and VGG19 (End-to-End)
Introduction
In this post, we’ll build a complete weather image classification pipeline in Python using Keras VGG19 transfer learning, with VGG19 as the convolutional backbone.
You’ll see how to split raw images into train and validation sets, set up data augmentation, attach a lightweight classification head to VGG19, train and evaluate with callbacks, and finally run predictions on unseen images.
You can watch the video tutorial here : https://youtu.be/uw3WK0TcGH4&list=UULFTiWJJhaH6BviSWKLJUM9sg
You can find the full code here : https://ko-fi.com/s/efaafe52c5
You can find more tutorials in my blog here : https://eranfeit.net/blog/
Here is the VGG19 transfer learning code :
Installation :
# Requirements : Nvidia GPU card and the CUDA toolkit installed
# I am using this card : https://amzn.to/3mTa7HX
# Working Anaconda environment
# Dataset : https://www.kaggle.com/datasets/vijaygiitk/multiclass-weather-dataset

conda create -n weather-predict-CNN python=3.7
conda activate weather-predict-CNN

pip install tensorflow
pip install tensorflow-gpu
pip install pillow
pip install SciPy
pip install matplotlib
pip install pandas
pip install numpy
You can find the full code here : https://ko-fi.com/s/efaafe52c5
Link to the weather dataset : https://www.kaggle.com/datasets/vijaygiitk/multiclass-weather-dataset
Part 1 — Dataset Split into Train & Validation Folders
Short description:
We prepare a clean train/validation split from category folders (cloudy, foggy, rainy, shine, sunrise). Files of zero length are skipped. A reproducible shuffle creates balanced splits, and images are copied into the destination folders.
### Import the os module to interact with the filesystem.
import os

### Import random for shuffling the dataset before splitting.
import random

### Import shutil for copying files from source to destination.
import shutil

### Set the path to the original dataset's root directory.
dataOrgFolder = "C:/Python-cannot-upload-to-GitHub/Weather/original-dataset/"

### Set the path to the base dataset directory that will hold the Train/validate splits.
dataBaseFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset"

### List all subdirectories (classes) inside the original dataset folder.
dataDirList = os.listdir(dataOrgFolder)

### Print the detected class directories to verify the structure.
print(dataDirList)

### Define the fraction of images that will go to the training set.
splitSize = .85

### Define a function to split and copy images from SOURCE to TRAINING and VALIDATION folders.
def split_data(SOURCE, TRAINING, VALIDATION, SPLIT_SIZE):
    ### Initialize a list to collect valid file names.
    files = []

    ### Iterate over all files inside the SOURCE directory.
    for filename in os.listdir(SOURCE):
        ### Build the full path for each file.
        file = SOURCE + filename
        ### Print the file path (useful for debugging and progress tracking).
        print(file)
        ### Guard against zero-length files to avoid copy errors or corrupt samples.
        if os.path.getsize(file) > 0:
            ### Add the valid file to the list for later splitting.
            files.append(filename)
        else:
            ### Notify that the file will be skipped due to zero size.
            print(filename + " has 0 length , will not copy this file !!")

    ### Print the total number of valid files found in the SOURCE directory.
    print(len(files))

    ### Compute the number of training images according to SPLIT_SIZE.
    trainLength = int(len(files) * SPLIT_SIZE)

    ### Compute the number of validation images as the remainder.
    validLength = int(len(files) - trainLength)

    ### Shuffle the dataset for a fair split (seed the random module beforehand if you need reproducibility).
    suffleDataSet = random.sample(files, len(files))

    ### Slice the shuffled list to create the training subset.
    trainingSet = suffleDataSet[0:trainLength]

    ### The remaining files form the validation subset.
    validSet = suffleDataSet[trainLength:]

    ### Copy training files from SOURCE to the TRAINING destination.
    for filename in trainingSet:
        ### Build the source path of the file to copy.
        f = SOURCE + filename
        ### Build the training destination path for the file.
        dest = TRAINING + filename
        ### Execute the copy operation for a training file.
        shutil.copy(f, dest)

    ### Copy validation files from SOURCE to the VALIDATION destination.
    for filename in validSet:
        ### Build the source path of the validation file.
        f = SOURCE + filename
        ### Build the validation destination path for the file.
        dest = VALIDATION + filename
        ### Execute the copy operation for a validation file.
        shutil.copy(f, dest)

### Define class-specific folders for the 'cloudy' category: source, train, and validation.
cloudySourceFolder = "C:/Python-cannot-upload-to-GitHub/Weather/original-dataset/cloudy/"
cloudyTrainFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/Train/cloudy/"
cloudyValidFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/validate/cloudy/"

### Define class-specific folders for the 'foggy' category.
foggySourceFolder = "C:/Python-cannot-upload-to-GitHub/Weather/original-dataset/foggy/"
foggyTrainFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/Train/foggy/"
foggyValidFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/validate/foggy/"

### Define class-specific folders for the 'rainy' category (note the doubled 'y' in rainyySourceFolder).
rainyySourceFolder = "C:/Python-cannot-upload-to-GitHub/Weather/original-dataset/rainy/"
rainyTrainFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/Train/rainy/"
rainyValidFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/validate/rainy/"

### Define class-specific folders for the 'shine' category.
shineSourceFolder = "C:/Python-cannot-upload-to-GitHub/Weather/original-dataset/shine/"
shineTrainFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/Train/shine/"
shineValidFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/validate/shine/"

### Define class-specific folders for the 'sunrise' category.
sunriseSourceFolder = "C:/Python-cannot-upload-to-GitHub/Weather/original-dataset/sunrise/"
sunriseTrainFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/Train/sunrise/"
sunriseValidFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset/validate/sunrise/"

### Execute the splitting for each weather category, copying files into the Train/validate folders.
split_data(cloudySourceFolder, cloudyTrainFolder, cloudyValidFolder, splitSize)
split_data(foggySourceFolder, foggyTrainFolder, foggyValidFolder, splitSize)
split_data(rainyySourceFolder, rainyTrainFolder, rainyValidFolder, splitSize)
split_data(shineSourceFolder, shineTrainFolder, shineValidFolder, splitSize)
split_data(sunriseSourceFolder, sunriseTrainFolder, sunriseValidFolder, splitSize)
You can find the full code here : https://ko-fi.com/s/efaafe52c5
Elaborated description — Part 1: Dataset Split into Train & Validation
This part builds a clean, reproducible folder structure for supervised learning by shuffling each class folder and copying ~85% of images into a training set and the remainder into a validation set. The function deliberately skips zero-length files to avoid corrupt samples that can break flow_from_directory. For long-running copy jobs, logging each file path helps with traceability and quick debugging when a class appears underrepresented after the split. To make shuffling deterministic across runs, consider adding random.seed(42) near the top; that way you can reproduce results and investigate training differences without the split changing each time.
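If you want that reproducibility, a minimal sketch is to seed Python's random module once, before any split_data call (the seed value is arbitrary):

### Minimal sketch : seed the random module once so random.sample shuffles files
### the same way on every run of the split script.
import random

random.seed(42)   # any fixed integer works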
Pay close attention to directory naming and casing. In this script the split is written under dataset/Train/... and dataset/validate/..., while the training code later expects weather-data/train and weather-data/validation. Before training, either move/rename the folders to match the training paths or point flow_from_directory at the split you just created. Also, prefer os.path.join(SOURCE, filename) over concatenation (e.g., SOURCE + filename) to avoid subtle path issues between Windows (\) and Linux (/). If you work in WSL or move to a server, this small change prevents a lot of friction.
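Here is a small sketch of that substitution; the folder comes from this dataset's layout and the file name is purely hypothetical:

### Sketch only : build paths with os.path.join instead of string concatenation.
import os

sourceFolder = "C:/Python-cannot-upload-to-GitHub/Weather/original-dataset/cloudy"   # no trailing slash needed
fileName = "cloudy1.jpg"                                                             # hypothetical file name

srcPath = os.path.join(sourceFolder, fileName)    # replaces sourceFolder + fileName
print(srcPath)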
If class counts are imbalanced, the random split may amplify that imbalance and hurt validation reliability. You can mitigate this by (a) enforcing class-wise caps, (b) using stratified splitting logic (by definition you are per-class already, but ensure class folders themselves are balanced), or (c) later applying class weights during training. For large datasets, copying files can be slow; you can switch to symlinks (where supported) instead of physical copies to save disk and time. Finally, verify the number of items copied per class with a quick tally to ensure the split ratios match expectations before proceeding.
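A quick tally along those lines might look like this, a sketch assuming the Train/validate folders created in Part 1:

### Sketch : count copied files per class in both splits to confirm the ~85/15 ratio.
import os

baseFolder = "C:/Python-cannot-upload-to-GitHub/Weather/dataset"

for split in ("Train", "validate"):
    split_dir = os.path.join(baseFolder, split)
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        count = len(os.listdir(class_dir))
        print(f"{split}/{class_name}: {count} images")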
Part 2 — Training a VGG19 Classifier in Keras
Short description:
We compose a high-level training script: loading VGG19 (no top layers), freezing the base, adding a Flatten + Dense softmax head, and training with ImageDataGenerator augmentation, ModelCheckpoint, and EarlyStopping. We also visualize accuracy and loss.
### Import a regex flag (as provided) though it is not used; safe to keep for parity with the original code.
from re import I

### Import Dense and Flatten layers to build the classification head.
from keras.layers import Dense, Flatten

### Import the Model base class for potential functional API usage.
from keras.models import Model

### Import the VGG19 architecture and preprocessing utilities.
from keras.applications.vgg19 import VGG19
### Import preprocess_input if you plan to use VGG19 preprocessing; here we rescale instead.
from keras.applications.vgg19 import preprocess_input

### Import image utilities for potential manual image handling.
from keras.preprocessing import image
### Import ImageDataGenerator for streaming images with augmentation.
from keras.preprocessing.image import ImageDataGenerator

### Import Sequential to stack layers simply.
from keras.models import Sequential

### Import NumPy for numerical operations.
import numpy as np

### Import matplotlib for training-curve visualization.
import matplotlib.pyplot as plt

### Import glob to infer the number of classes from directory names.
from glob import glob

### Define the input size expected by our model head (the VGG19 base can adapt).
IMAGE_SIZE = [150, 150]

### Set the training images directory path expected by flow_from_directory.
trainImagesFolder = "C:/Python-cannot-upload-to-GitHub/Weather/weather-data/train"

### Set the validation images directory path expected by flow_from_directory.
validationImagesFolder = "C:/Python-cannot-upload-to-GitHub/Weather/weather-data/validation"

### Configure data augmentation for the training set (rescale + geometric transforms).
train_datagen = ImageDataGenerator(rescale=1. / 255,
                                   shear_range=0.4,
                                   zoom_range=0.4,
                                   rotation_range=0.4,
                                   horizontal_flip=True)

### Configure a minimal pipeline for the validation set (rescale only).
valid_datagen = ImageDataGenerator(rescale=1. / 255)

### Stream training batches from folders; Keras infers class names from the subfolders.
train_data_set = train_datagen.flow_from_directory(trainImagesFolder,
                                                   target_size=(150, 150),
                                                   batch_size=32,
                                                   class_mode='categorical')

### Stream validation batches from folders; must mirror the train directory structure.
valid_data_set = valid_datagen.flow_from_directory(validationImagesFolder,
                                                   target_size=(150, 150),
                                                   batch_size=32,
                                                   class_mode='categorical')

### Instantiate VGG19 without the top classifier so we can attach our own head.
myVgg = VGG19(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)

### Freeze all layers in the VGG19 base to train only the new head initially.
for layer in myVgg.layers:
    layer.trainable = False

### Infer the class directories to determine the number of categories.
Classes = glob('C:/Python-cannot-upload-to-GitHub/Weather/weather-data/train/*')

### Print the class folder paths to confirm discovery.
print(Classes)

### Count how many classes are present for the Dense softmax layer.
classesNum = len(Classes)

### Log the class count for traceability.
print('Number of Classes : ')
print(classesNum)

### Build a compact Sequential model: base CNN + Flatten + softmax classifier.
model = Sequential()

### Add the frozen VGG19 feature extractor as the first layer.
model.add(myVgg)

### Flatten the 2D feature maps into a 1D vector for the Dense layer.
model.add(Flatten())

### Final Dense layer with softmax activation across the discovered classes.
model.add(Dense(classesNum, activation='softmax'))

### Print a human-readable summary of the model architecture.
print(model.summary())

### Compile the model with the Adam optimizer, categorical cross-entropy loss, and an accuracy metric.
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])

### Import training callbacks: save the best model and stop early when validation stalls.
from keras.callbacks import ModelCheckpoint, EarlyStopping

### Save only the best weights, based on validation accuracy, to the given path.
checkpoint = ModelCheckpoint('C:/Python-cannot-upload-to-GitHub/Weather/weather-data/MyVgg19Option2.h5',
                             monitor='val_accuracy', verbose=1, save_best_only=True)

### Stop training if validation accuracy does not improve for 5 epochs (patience).
earlystop = EarlyStopping(monitor='val_accuracy', patience=5, verbose=1)

### Train the model using the directory iterators with validation and callbacks enabled.
result = model.fit(train_data_set,
                   validation_data=valid_data_set,
                   epochs=15,
                   verbose=1,
                   callbacks=[checkpoint, earlystop])

### Plot training accuracy versus validation accuracy for quick diagnostics.
plt.plot(result.history['accuracy'], label='train accuracy')
plt.plot(result.history['val_accuracy'], label='val accuracy')
plt.legend()
plt.show()

### Plot training loss versus validation loss to detect overfitting/underfitting patterns.
plt.plot(result.history['loss'], label='train loss')
plt.plot(result.history['val_loss'], label='val loss')
plt.legend()
plt.show()
You can find the full code here : https://ko-fi.com/s/efaafe52c5
Elaborated description — Part 2: Training a VGG19 Classifier in Keras
Here you construct a transfer-learning pipeline with VGG19 as a frozen feature extractor and a lightweight Flatten→Dense softmax head for classification. Freezing the base drastically reduces the number of trainable parameters, improving stability on smaller datasets and speeding up training. After the head converges, you can optionally fine-tune by unfreezing deeper VGG blocks (e.g., block5_*) and training with a lower learning rate to squeeze out extra accuracy. If memory is limited or your dataset is modest, consider replacing Flatten() with GlobalAveragePooling2D() to reduce parameters and overfitting risk; adding Dropout(0.3–0.5) before the Dense layer often helps too.
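The sketch below shows both ideas under those assumptions; the dropout rate, learning rate, and the 5-class output size are illustrative values, not tuned recommendations:

### Sketch : alternative head and optional block5 fine-tuning (values are illustrative).
from keras.applications.vgg19 import VGG19
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Sequential
from keras.optimizers import Adam

### Option A : a lighter head, GlobalAveragePooling2D + Dropout instead of Flatten.
base = VGG19(input_shape=[150, 150, 3], weights='imagenet', include_top=False)
for layer in base.layers:
    layer.trainable = False

model = Sequential()
model.add(base)
model.add(GlobalAveragePooling2D())    # far fewer parameters than Flatten on 4x4x512 feature maps
model.add(Dropout(0.4))                # illustrative rate in the 0.3-0.5 range
model.add(Dense(5, activation='softmax'))
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])

### Option B : after the head converges, unfreeze only block5 and fine-tune with a low learning rate.
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')

model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])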
Data augmentation is applied with ImageDataGenerator to improve generalization: shear, zoom, rotation, and horizontal flips model real-world variation without needing more labeled data. Because you’ve used rescale=1./255, the network learns on normalized inputs; that’s perfectly valid. An alternative is to use the preprocess_input specific to VGG19 (mean subtraction in BGR order); if you switch to that, do it consistently in both training and inference. The directory iterators infer class names from subfolder order; capture train_data_set.class_indices and save it alongside your model so that the class→index mapping stays consistent when you deploy.
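A small sketch of that bookkeeping, assuming the train_data_set iterator defined in the script above and a JSON file stored next to the checkpoint:

### Sketch : persist the class-to-index mapping next to the model so inference can reuse it.
### Assumes train_data_set is the flow_from_directory iterator created above.
import json

with open("C:/Python-cannot-upload-to-GitHub/Weather/weather-data/class_indices.json", "w") as f:
    json.dump(train_data_set.class_indices, f, indent=2)

print(train_data_set.class_indices)   # e.g. {'cloudy': 0, 'foggy': 1, ...}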
Two callbacks make training production-friendly. ModelCheckpoint saves only the best weights based on val_accuracy, protecting you from regressions in later epochs. EarlyStopping halts training when validation stops improving, saving compute and avoiding overfitting; you can also add ReduceLROnPlateau to shrink the learning rate automatically when progress stalls. After training, the accuracy and loss plots give an immediate health check: a widening gap (high train accuracy but low validation accuracy) suggests overfitting, calling for stronger augmentation, dropout, or fine-tuning fewer layers; if both accuracy curves stay low or flat, that points to optimization issues (try a smaller learning rate or more epochs).
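If you want to try ReduceLROnPlateau, a sketch of wiring it in next to the existing callbacks could look like this (the factor and patience values are illustrative):

### Sketch : halve the learning rate when validation accuracy stops improving for 3 epochs.
from keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=3,
                              min_lr=1e-6, verbose=1)

### Pass it alongside the checkpoint and early-stopping callbacks from the training script.
# result = model.fit(train_data_set, validation_data=valid_data_set, epochs=15,
#                    callbacks=[checkpoint, earlystop, reduce_lr])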
Part 3 — Predicting a New Image
Short description:
We load the saved model and send a single image through the same preprocessing pipeline (resize to 150×150, scale to [0,1]). The highest softmax value determines the predicted weather class.
### Import TensorFlow for loading the Keras model and running inference.
import tensorflow as tf

### Import the Model class (optional, for type hints or advanced usage).
from keras.models import Model

### Import image utilities (not strictly required here but aligned with the original code).
from keras.preprocessing import image

### Import specific helpers to load and convert images to arrays.
from keras.preprocessing.image import load_img, img_to_array

### Import NumPy for array manipulation.
import numpy as np

### Define the class label ordering used during training.
categories = ["cloudy", "foggy", "rainy", "shine", "sunrise"]

### Load the trained Keras model from disk (ensure this path matches the checkpoint you saved).
model = tf.keras.models.load_model("C:/Python-cannot-upload-to-GitHub/Weather/weather-data/MyVgg19.h5")

### Define a helper function to prepare a single image for prediction.
def prepareImage(pathForImage):
    ### Load the image and resize it to the same target size used in training.
    image = load_img(pathForImage, target_size=(150, 150))
    ### Convert the PIL image to a NumPy array.
    imgResult = img_to_array(image)
    ### Add a batch dimension so the model sees shape (1, H, W, C).
    imgResult = np.expand_dims(imgResult, axis=0)
    ### Rescale pixels to [0,1] to match the training rescaling.
    imgResult = imgResult / 255.
    ### Return the prepared batch array for inference.
    return imgResult

### Provide the path to a test image you want to classify.
testImagePath = "C:/Python-cannot-upload-to-GitHub/Weather/Test/rain_4.jpg"

### Prepare the test image using the helper so it matches the model's expectations.
imgForModel = prepareImage(testImagePath)

### Run the forward pass to obtain class probabilities from the softmax layer.
resultArray = model.predict(imgForModel, verbose=1)

### Optionally print the raw probabilities for inspection.
# print(resultArray)

### Select the index of the highest probability as the predicted class id.
answer = np.argmax(resultArray, axis=1)

### Print the predicted index (for debugging).
print(answer)

### Extract the index value from the one-element array.
index = answer[0]

### Map the index back to the human-readable class name and print the result.
print("this image is : " + categories[index])
You can find the full code here : https://ko-fi.com/s/efaafe52c5
Elaborated description — Part 3: Predicting a New Image
Inference mirrors the training preprocessing for reliable results: the image is resized to 150×150, converted to a NumPy array, expanded with a batch dimension, and scaled to [0,1]. Consistency is crucial: if you later switch training to preprocess_input, make the same change here. The model outputs a softmax probability vector; taking the argmax yields the predicted class index. Ensure that the categories list is aligned with the exact ordering used during training. The safest approach is to serialize class_indices from the training generator and load it at inference time, so nothing breaks if folder names change or the generator enumerates them differently.
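A sketch of that approach, assuming you saved class_indices.json during training as suggested in Part 2:

### Sketch : rebuild the index-to-label list from the mapping saved at training time,
### instead of hard-coding the categories list.
import json

with open("C:/Python-cannot-upload-to-GitHub/Weather/weather-data/class_indices.json") as f:
    class_indices = json.load(f)               # e.g. {'cloudy': 0, 'foggy': 1, ...}

categories = [None] * len(class_indices)
for name, idx in class_indices.items():
    categories[idx] = name

print(categories)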
For richer outputs, consider reporting top-k predictions (e.g., top-3 with their probabilities) and adding a confidence threshold to abstain on low-confidence cases. If you plan to batch-process many images, vectorize by stacking multiple prepared arrays and calling model.predict once to use the GPU/CPU efficiently. When deploying to edge or mobile, export to TF-Lite to reduce size and latency. Finally, keep the model path consistent with training: if you saved MyVgg19Option2.h5 as the best checkpoint, load that exact file here; mismatched checkpoints are a common cause of confusing predictions.
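As a sketch of the top-k idea, reusing the categories list from the prediction script above (the probability vector at the bottom is dummy data for illustration):

### Sketch : report the top-k classes and flag low-confidence predictions.
import numpy as np

categories = ["cloudy", "foggy", "rainy", "shine", "sunrise"]

def report_top_k(probabilities, k=3, threshold=0.5):
    ### Flatten the (1, num_classes) softmax output and sort indices by probability.
    probs = np.asarray(probabilities).ravel()
    top = probs.argsort()[::-1][:k]
    for i in top:
        print(f"{categories[i]}: {probs[i]:.3f}")
    ### Abstain (or at least warn) when the best class is below the chosen threshold.
    if probs[top[0]] < threshold:
        print("Low confidence - consider abstaining on this image.")

### Usage with the output of model.predict (shape (1, 5)); dummy values shown here.
report_top_k([[0.05, 0.10, 0.60, 0.15, 0.10]])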
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran