This tutorial provides a step-by-step guide to implementing and training a U-Net model for animal segmentation using TensorFlow/Keras.
The tutorial is divided into four parts:
Part 1: Data Preprocessing and Preparation
In this part, you load and preprocess the Oxford-IIIT Pet dataset: you resize the images and masks, normalize the pixel values, and save the training and test sets as NumPy arrays.
Part 2: U-Net Model Architecture
This part defines the U-Net model architecture using Keras. It includes building blocks for convolutional layers, constructing the encoder and decoder parts of the U-Net, and defining the final output layer.
Part 3: Model Training
Here, you load the preprocessed data and train the U-Net model. You compile the model, define training parameters like learning rate and batch size, and use callbacks for model checkpointing, learning rate reduction, and early stopping.
Part 4: Model Evaluation and Inference
The final part demonstrates how to load the trained model, perform inference on test data, and visualize the predicted segmentation masks.
Check out our tutorial here : https://www.youtube.com/watch?v=oHc4yrV64wU
Link for the full code here : https://ko-fi.com/s/a88e66f66b
Link for my blog : https://eranfeit.net/blog/
Here is the code for Animal Segmentation Model
Part 1 : Working with the Oxford-IIIT Pet Dataset for Image Segmentation
Introduction
In this tutorial, we will explore how to load, preprocess, and save the Oxford-IIIT Pet Dataset for image segmentation tasks.
This dataset is widely used in computer vision research because it contains images of 37 categories of pets, along with corresponding masks that define the main object (the pet), the background, and the border around the object.
The goal of this code is to prepare the dataset for training and testing segmentation models, such as U-Net, by resizing images, normalizing pixel values, and formatting the masks for machine learning.
By the end, we will have structured NumPy arrays containing both images and masks for training and evaluation.
Preparing Libraries and Dataset Parameters
We start by importing the necessary Python libraries and defining dataset parameters.
These steps set up the environment for reading and processing images and masks.
### Import pandas for reading the dataset annotation files
import pandas as pd
### Import OpenCV for image loading and resizing
import cv2
### Import NumPy for numerical operations and array handling
import numpy as np

### Define the target image height for resizing
Height = 128
### Define the target image width for resizing
Width = 128
### Define the number of categories in the mask (object, background, border)
NumOfCategories = 3

### The mask images contain three types of values:
# Value = 1 indicates the main object (the animal)
# Value = 2 indicates the background
# Value = 3 indicates the border of the object
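To make the three mask values concrete, here is a minimal sketch that counts how many pixels fall into each category. It assumes a tiny hand-made mask instead of a real dataset file, and the `CLASS_NAMES` dictionary and `toy_mask` array are names introduced here for illustration only.

```python
import numpy as np

# Hypothetical lookup from mask value to a readable name (illustration only)
CLASS_NAMES = {1: "object", 2: "background", 3: "border"}

# A tiny hand-made 4x4 mask standing in for a real trimap file
toy_mask = np.array([
    [2, 2, 2, 2],
    [2, 3, 3, 2],
    [2, 3, 1, 3],
    [2, 2, 3, 2],
], dtype=np.uint8)

# Count how many pixels belong to each category
values, counts = np.unique(toy_mask, return_counts=True)
pixel_counts = {CLASS_NAMES[int(v)]: int(c) for v, c in zip(values, counts)}
print(pixel_counts)
```

Running the same count on a real mask is a quick way to confirm that it contains only the values 1, 2, and 3 before training.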
Initializing Data Structures and Loading Training Images
Next, we create lists for storing training and testing data.
We then load the training images and their corresponding masks, resize them, normalize them, and save them into lists.
### Create lists for training images and masks
allImages = []
maskImages = []

### Create lists for test images and masks
allTestImages = []
maskTestImages = []

### Define dataset path
path = "E:/Data-sets/Unet-Multi-class/"

### Define the path to the training and testing annotation files
trainFile = path + "annotations/trainval.txt"
testFile = path + "annotations/test.txt"

### Load the training annotations
print("Load train data : ")

### Read the training file which contains image names
df = pd.read_csv(trainFile, sep=" ", header=None)

### Extract the list of training file names
names = df[0].values
print("Train data info :")
print(len(names))

### Loop over each training file name
for name in names:
    ### Build the path for the image file
    imageFileName = path + "images/" + name + ".jpg"
    print(imageFileName)

    ### Load the image using OpenCV
    img = cv2.imread(imageFileName, cv2.IMREAD_COLOR)
    ### Resize the image to the defined width and height
    img = cv2.resize(img, (Width, Height))
    ### Normalize the image by dividing pixel values by 255
    img = img / 255.0
    ### Convert the image to float32 format
    img = img.astype(np.float32)
    ### Append the processed image to the list
    allImages.append(img)

    ### Build the path for the mask file
    maskFileName = path + "annotations/trimaps/" + name + ".png"
    ### Load the mask in grayscale mode
    mask = cv2.imread(maskFileName, cv2.IMREAD_GRAYSCALE)
    ### Resize the mask to the same size as the image
    ### (nearest-neighbor keeps the labels 1, 2, 3 intact; the default
    ### bilinear interpolation would blend neighboring labels)
    mask = cv2.resize(mask, (Width, Height), interpolation=cv2.INTER_NEAREST)
    ### Append the processed mask to the list
    maskImages.append(mask)
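Label masks need care when they are resized, because blending two labels produces a third label that was never there. The sketch below illustrates the idea with plain NumPy on a toy mask. Block averaging stands in for an interpolating resize here; that is a simplification and not what `cv2.resize` computes exactly. All array names are illustrative.

```python
import numpy as np

# Toy 4x4 label mask that contains only object (1) and border (3) pixels
labels = np.array([
    [1, 3, 3, 3],
    [1, 3, 3, 3],
    [1, 1, 1, 3],
    [1, 1, 1, 3],
])

# Nearest-neighbor style downsampling: keep every second pixel
nearest = labels[::2, ::2]

# Averaging style downsampling: mean of each 2x2 block
averaged = labels.reshape(2, 2, 2, 2).mean(axis=(1, 3))

print("nearest:\n", nearest)
print("averaged:\n", averaged)

# Averaging invents the value 2 ("background"), which the mask never contained
invented = set(np.unique(averaged).tolist()) - set(np.unique(labels).tolist())
print("values invented by averaging:", invented)
```

In this toy case the averaged mask reports background pixels between the object and the border, even though the original mask had none.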
Converting Training Data to NumPy Arrays and Analyzing Masks
After collecting all training images and masks, we convert them into NumPy arrays for easier handling.
To get a feel for the mask values, we also shrink one mask to 16x16 and replace each category value with an easily recognizable number before printing it.
### Convert training images list into a NumPy array
allImagesNP = np.array(allImages)
### Convert training masks list into a NumPy array
maskImagesNP = np.array(maskImages)
### Convert masks to integer type
maskImagesNP = maskImagesNP.astype(int)

### Print array details for images and masks
print(allImagesNP.shape)
print(allImagesNP.dtype)
print(maskImagesNP.shape)
print(maskImagesNP.dtype)

### Resize one mask to a smaller size for visualization
### (OpenCV cannot resize 64-bit integer arrays, so resize as uint8
### and convert back to int so that the value 333 fits later)
x = cv2.resize(maskImagesNP[0].astype(np.uint8), (16,16), interpolation=cv2.INTER_NEAREST)
x = x.astype(int)
print(x)

### Loop through each row in the reduced mask
for i in range(len(x)):
    ### Loop through each column in the row
    for j in range(len(x[i])):
        ### Get the pixel value
        v = x[i][j]
        ### Replace the values according to the rules
        if v == 1:    # the object
            x[i][j] = 0
        if v == 2:    # the background
            x[i][j] = 22
        if v == 3:    # the border
            x[i][j] = 333

### Print the updated mask values
print(x)
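The same replacement can be written without loops by using a lookup table, which is one common NumPy idiom for remapping labels. This is an optional alternative sketch, not part of the original script; `lookup` and `toy` are names made up for the example.

```python
import numpy as np

# Toy 3x3 mask with the original values 1 (object), 2 (background), 3 (border)
toy = np.array([
    [2, 3, 1],
    [3, 1, 1],
    [2, 2, 3],
])

# Index = original mask value, entry = replacement value (index 0 is unused)
lookup = np.array([0, 0, 22, 333])

# Fancy indexing replaces every pixel in one step
remapped = lookup[toy]
print(remapped)
```

Because the lookup table is indexed by the original value, no pixel can be replaced twice by accident.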
Loading Test Data and Saving Processed Arrays
Finally, we repeat the same process for test data and save all preprocessed arrays into .npy files for future use in training deep learning models.
### Print message before loading test data
print("load test data :")

### Read the test annotation file
df = pd.read_csv(testFile, sep=" ", header=None)

### Extract test file names
names = df[0].values
print("Test data info :")
print(len(names))

### Loop through each test image
for name in names:
    imageFileName = path + "images/" + name + ".jpg"
    print(imageFileName)

    ### Load and preprocess the image
    img = cv2.imread(imageFileName, cv2.IMREAD_COLOR)
    img = cv2.resize(img, (Width, Height))
    img = img / 255.0
    img = img.astype(np.float32)
    allTestImages.append(img)

    ### Load and preprocess the mask
    ### (nearest-neighbor resizing keeps the labels 1, 2, 3 intact)
    maskFileName = path + "annotations/trimaps/" + name + ".png"
    mask = cv2.imread(maskFileName, cv2.IMREAD_GRAYSCALE)
    mask = cv2.resize(mask, (Width, Height), interpolation=cv2.INTER_NEAREST)
    maskTestImages.append(mask)

### Convert test lists into NumPy arrays
allTestImagesNP = np.array(allTestImages)
maskTestImagesNP = np.array(maskTestImages)
maskTestImagesNP = maskTestImagesNP.astype(int)

### Print details of test arrays
print(allTestImagesNP.shape)
print(allTestImagesNP.dtype)
print(maskTestImagesNP.shape)
print(maskTestImagesNP.dtype)

### Save training and test arrays into .npy files
print("Save the Data :")
np.save("e:/temp/Unet-Animals-train-images.npy", allImagesNP)
np.save("e:/temp/Unet-Animals-train-mask.npy", maskImagesNP)
np.save("e:/temp/Unet-Animals-test-images.npy", allTestImagesNP)
np.save("e:/temp/Unet-Animals-test-mask.npy", maskTestImagesNP)
print("Finish save the data !")
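Before relying on the saved files, it can help to confirm that a save and load round trip preserves shapes, dtypes, and values. The following sketch does this with small random arrays in a temporary folder; the file names and array contents are placeholders, not the tutorial's real data.

```python
import os
import tempfile
import numpy as np

# Small random stand-ins for the real image and mask arrays
images = np.random.rand(2, 128, 128, 3).astype(np.float32)
masks = np.random.randint(1, 4, size=(2, 128, 128))

# Save both arrays into a temporary folder and load them back
with tempfile.TemporaryDirectory() as tmp_dir:
    image_path = os.path.join(tmp_dir, "toy-images.npy")
    mask_path = os.path.join(tmp_dir, "toy-masks.npy")
    np.save(image_path, images)
    np.save(mask_path, masks)
    loaded_images = np.load(image_path)
    loaded_masks = np.load(mask_path)

# The loaded arrays should match the originals exactly
round_trip_ok = (
    np.array_equal(images, loaded_images)
    and np.array_equal(masks, loaded_masks)
    and loaded_images.dtype == np.float32
)
print(round_trip_ok)
```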
Part 2 : Building a U-Net Model for Image Segmentation with TensorFlow and Keras
Save the following code parts as one file named "Step02UnetModel.py" in the same folder as the other scripts, so that the training script in Part 3 can import it.
Introduction
In this tutorial, we will build a U-Net model for image segmentation using TensorFlow and Keras.
U-Net is one of the most popular deep learning architectures for segmentation tasks because it can accurately capture spatial features through its encoder-decoder structure with skip connections.
This code defines the building blocks of the U-Net, constructs the encoder, bridge, and decoder, and finally builds the model and prints its summary. The model is compiled later, in Part 3.
By the end, you will understand how each component works and how to implement U-Net for your own projects.
Importing Required Libraries
We start by importing the essential TensorFlow Keras layers and the Model class.
These components allow us to construct the encoder, decoder, and final output layers of the U-Net.
### Import Input for defining the input layer of the model
### Import Conv2D for convolution operations
### Import BatchNormalization to normalize activations
### Import Activation for nonlinear transformations
### Import MaxPool2D for downsampling in the encoder
### Import UpSampling2D for upsampling in the decoder
### Import Concatenate for merging skip connections
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, MaxPool2D, UpSampling2D, Concatenate

### Import Model class to define the complete U-Net architecture
from tensorflow.keras.models import Model
Defining the Convolutional Block
The convolutional block is the core element of U-Net.
It applies two convolutional layers with batch normalization and ReLU activation.
An optional max pooling layer reduces spatial dimensions when used in the encoder.
### Define a function for the convolutional block
def conv_block(inputs, filters, pool=True):
    ### Apply first convolutional layer with 3x3 filter
    x = Conv2D(filters, 3, padding="same")(inputs)
    ### Apply batch normalization for stable training
    x = BatchNormalization()(x)
    ### Apply ReLU activation function
    x = Activation("relu")(x)

    ### Apply second convolutional layer with 3x3 filter
    x = Conv2D(filters, 3, padding="same")(x)
    ### Apply batch normalization
    x = BatchNormalization()(x)
    ### Apply ReLU activation
    x = Activation("relu")(x)

    ### If pooling is enabled, apply max pooling to reduce dimensions
    if pool:
        p = MaxPool2D((2,2))(x)
        return x, p
    else:
        return x
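To get a sense of how large one convolutional block is, the sketch below counts its parameters by hand. It assumes the standard parameter formulas for a 3x3 convolution with bias and for batch normalization with four values per channel (scale, offset, moving mean, moving variance). The helper names are made up for this example.

```python
def conv2d_params(in_channels, filters, kernel_size=3):
    # One kernel_size x kernel_size x in_channels kernel per filter, plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters


def batchnorm_params(channels):
    # Scale, offset, moving mean, and moving variance for every channel
    return 4 * channels


def conv_block_params(in_channels, filters):
    # First convolution sees the block input, second sees the first convolution's output
    first = conv2d_params(in_channels, filters) + batchnorm_params(filters)
    second = conv2d_params(filters, filters) + batchnorm_params(filters)
    return first + second


# First encoder block of the tutorial: RGB input (3 channels), 16 filters
first_block_total = conv_block_params(3, 16)
print(first_block_total)
```

Activation and pooling layers have no parameters, so they do not appear in the count. The result can be compared with the corresponding rows of the Keras model summary.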
Building the U-Net Architecture
Now we define the U-Net model by stacking encoder blocks, a bridge, and decoder blocks with skip connections.
### Define a function to build the U-Net model
def build_unet(shape, num_classes):
    ### Input layer with defined shape
    inputs = Input(shape)

    ### Encoder section: progressively downsample the input
    x1, p1 = conv_block(inputs, 16, pool=True)
    x2, p2 = conv_block(p1, 32, pool=True)
    x3, p3 = conv_block(p2, 48, pool=True)
    x4, p4 = conv_block(p3, 64, pool=True)

    ### Bridge section: bottom of the U-Net without pooling
    b1 = conv_block(p4, 128, pool=False)

    ### Decoder section: upsample and concatenate with encoder features
    u1 = UpSampling2D((2,2), interpolation="bilinear")(b1)
    c1 = Concatenate()([u1, x4])
    x5 = conv_block(c1, 64, pool=False)

    u2 = UpSampling2D((2,2), interpolation="bilinear")(x5)
    c2 = Concatenate()([u2, x3])
    x6 = conv_block(c2, 48, pool=False)

    u3 = UpSampling2D((2,2), interpolation="bilinear")(x6)
    c3 = Concatenate()([u3, x2])
    x7 = conv_block(c3, 32, pool=False)

    u4 = UpSampling2D((2,2), interpolation="bilinear")(x7)
    c4 = Concatenate()([u4, x1])
    x8 = conv_block(c4, 16, pool=False)

    ### Output layer with softmax for multi-class segmentation
    output = Conv2D(num_classes, 1, padding="same", activation="softmax")(x8)

    ### Return the complete U-Net model
    return Model(inputs, output)
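It can be easier to follow the architecture by tracing how the feature map size and channel count change from stage to stage. The sketch below does this with plain arithmetic for the filter sizes used above; it does not build a Keras model, and `trace_unet_shapes` is a helper invented for this illustration.

```python
def trace_unet_shapes(size=128, encoder_filters=(16, 32, 48, 64), bridge_filters=128):
    trace = []
    skips = []

    # Encoder: each block keeps the size, then max pooling halves it
    for filters in encoder_filters:
        trace.append(("encoder", size, filters))
        skips.append((size, filters))
        size //= 2

    # Bridge: no pooling at the bottom of the U
    trace.append(("bridge", size, bridge_filters))
    channels = bridge_filters

    # Decoder: upsampling doubles the size, concatenation adds the skip channels
    for skip_size, skip_filters in reversed(skips):
        size *= 2
        assert size == skip_size  # sizes must match for concatenation
        trace.append(("concat", size, channels + skip_filters))
        channels = skip_filters  # the following conv block reduces the channels
        trace.append(("decoder", size, channels))

    return trace


shape_trace = trace_unet_shapes()
for stage, side, channels in shape_trace:
    print(f"{stage:8s} {side:4d} x {side:<4d} channels={channels}")
```

The trace shows why the skip connections work: each upsampled map has exactly the same height and width as the encoder output it is concatenated with.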
Running the Model
Finally, we create an instance of the U-Net with an input size of 128×128 pixels and 3 output classes.
We then print the summary to visualize the architecture.
### Run the script only if executed directly
if __name__ == "__main__":
    ### Build the U-Net model with input shape (128,128,3) and 3 output classes
    model = build_unet((128,128,3), 3)
    ### Print the model summary to see the architecture
    ### (summary() prints by itself and returns None, so no print() is needed)
    model.summary()
Part 3 : Training a U-Net Model for Animal Segmentation with TensorFlow and Keras
Introduction
In this tutorial, we will train a U-Net model for image segmentation on the Oxford-IIIT Pet Dataset.
We start by loading the preprocessed image and mask arrays, then convert the masks into categorical format for multi-class segmentation, split the dataset into training and validation sets, and train the U-Net model with callbacks for checkpointing, learning rate reduction, and early stopping.
Finally, we will visualize the training and validation accuracy and loss to understand how the model performs over epochs.
Loading Data and Preparing Masks
The first step is to load the preprocessed NumPy arrays containing training images and masks.
Since segmentation masks contain class values, we convert them into categorical format so that they are ready for training a multi-class model.
### Import NumPy for handling arrays
import numpy as np

### Load preprocessed training images from .npy file
allImagesNP = np.load("e:/temp/Unet-Animals-train-images.npy")
### Load preprocessed training masks from .npy file
maskImagesNP = np.load("e:/temp/Unet-Animals-train-mask.npy")

### Print shapes of images and masks arrays
print(allImagesNP.shape)
print(maskImagesNP.shape)

### Define input size and number of categories
Height = 128
Width = 128
numOfCategories = 3

### Import utility to convert labels to categorical format
### (recent Keras releases no longer provide keras.utils.np_utils)
from tensorflow.keras.utils import to_categorical

### Select first mask for testing the conversion
test = maskImagesNP[0]
### Convert values from range 1–3 to 0–2
test = test - 1
### Convert mask into categorical one-hot encoding
test2 = to_categorical(test, num_classes=numOfCategories)

### Print the mask before and after conversion
print(test)
print(test2)

### Apply conversion to all masks
maskImagesNP = maskImagesNP - 1
maskForTheModel = to_categorical(maskImagesNP, num_classes=numOfCategories)

### Print type after conversion
print("print the type after the convert :")
print(maskForTheModel.dtype)

### Convert mask array to integers
maskForTheModel = maskForTheModel.astype(int)
print(maskForTheModel.dtype)
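To see what the categorical conversion produces, here is a small NumPy-only sketch of one-hot encoding on a 2x2 mask. It uses an identity matrix as a lookup table, which gives the same kind of output as the Keras utility for this toy case; the array names are illustrative.

```python
import numpy as np

num_classes = 3

# Toy 2x2 mask with the original values 1 (object), 2 (background), 3 (border)
toy_mask = np.array([
    [1, 2],
    [3, 1],
])

# Shift the labels from 1..3 to 0..2 so they can be used as indices
zero_based = toy_mask - 1

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes, dtype=int)[zero_based]
print(one_hot.shape)
print(one_hot)

# argmax along the last axis reverses the encoding
recovered = np.argmax(one_hot, axis=-1) + 1
```

Each pixel becomes a vector of length 3 with a single 1, so a mask of shape (128, 128) turns into an array of shape (128, 128, 3), matching the three softmax outputs of the model.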
Splitting Data into Training and Validation Sets
We now split the dataset into training and validation sets to evaluate model performance.
### Import train_test_split for dataset splitting
from sklearn.model_selection import train_test_split

### Split into training and validation sets (90% train, 10% validation)
X_train, X_val, y_train, y_val = train_test_split(allImagesNP, maskForTheModel, test_size=0.1, random_state=42)

### Print the shapes of resulting arrays
print("X_train , X_val , y_train , y_val --------->>>> shapes :")
print(X_train.shape)
print(y_train.shape)
print(X_val.shape)
print(y_val.shape)
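The key property of the split is that every image stays paired with its own mask after shuffling. The sketch below shows the idea with a shared random permutation in NumPy. It is a conceptual stand-in, not scikit-learn's implementation, so the exact selection of samples will differ; `split_arrays` and the toy arrays are hypothetical names.

```python
import numpy as np


def split_arrays(images, masks, val_fraction=0.1, seed=42):
    # One shared permutation keeps each image aligned with its mask
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    val_count = int(np.ceil(len(images) * val_fraction))
    val_idx, train_idx = order[:val_count], order[val_count:]
    return images[train_idx], images[val_idx], masks[train_idx], masks[val_idx]


# Toy data: each "mask" is its "image" times 10, so pairs are easy to verify
toy_images = np.arange(20).reshape(20, 1)
toy_masks = toy_images * 10

X_train_toy, X_val_toy, y_train_toy, y_val_toy = split_arrays(toy_images, toy_masks)
print(X_train_toy.shape, X_val_toy.shape)
print(np.array_equal(y_train_toy, X_train_toy * 10))
```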
Building and Training the U-Net Model
Next, we load our U-Net architecture from the previous step, compile the model, and define callbacks to optimize training.
### Import TensorFlow
import tensorflow as tf
### Import the custom U-Net model definition
from Step02UnetModel import build_unet
### Import training callbacks
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping

### Define input shape and number of classes
shape = (128,128,3)
num_classes = 3

### Set learning rate, batch size, and number of epochs
lr = 1e-4
batch_size = 4
epochs = 10

### Build U-Net model
model = build_unet(shape, num_classes)
model.summary()

### Compile the model with categorical crossentropy and Adam optimizer
model.compile(loss="categorical_crossentropy", optimizer=tf.keras.optimizers.Adam(lr), metrics=['accuracy'])

### Define steps per epoch and validation steps (as integers)
stepsPerEpoch = int(np.ceil(len(X_train) / batch_size))
validationSteps = int(np.ceil(len(X_val) / batch_size))

### File path for saving the best model
best_model_file = "e:/temp/Animals-Unet.h5"

### Define training callbacks
callbacks = [
    ModelCheckpoint(best_model_file, verbose=1, save_best_only=True),
    ReduceLROnPlateau(monitor="val_loss", patience=3, factor=0.1, verbose=1, min_lr=1e-6),
    EarlyStopping(monitor='val_loss', patience=5, verbose=1)
]

### Train the U-Net model
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(X_val, y_val),
                    validation_steps=validationSteps,
                    steps_per_epoch=stepsPerEpoch,
                    shuffle=True,
                    callbacks=callbacks)
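To build intuition for how the learning rate reduction and early stopping callbacks interact, here is a simplified pure-Python simulation using the same patience, factor, and minimum learning rate as above. It is a rough sketch of the idea and not the Keras implementation: for example, it shares one "best loss" between both rules and ignores minimum improvement thresholds. The function name and the validation losses are made up.

```python
def simulate_callbacks(val_losses, lr=1e-4, factor=0.1, lr_patience=3,
                       min_lr=1e-6, stop_patience=5):
    best = float("inf")
    lr_wait = 0      # epochs without improvement since the last LR change
    stop_wait = 0    # epochs without improvement since the best epoch
    lr_history = []

    for epoch, val_loss in enumerate(val_losses, start=1):
        lr_history.append(lr)  # learning rate used during this epoch
        if val_loss < best:
            best = val_loss
            lr_wait = 0
            stop_wait = 0
            continue

        lr_wait += 1
        stop_wait += 1
        if lr_wait >= lr_patience:
            lr = max(lr * factor, min_lr)  # reduce, but never below min_lr
            lr_wait = 0
        if stop_wait >= stop_patience:
            return epoch, lr_history       # training stops here

    return len(val_losses), lr_history


# Hypothetical validation losses: improvement for 3 epochs, then a plateau
toy_val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67]
stopped_epoch, lr_history = simulate_callbacks(toy_val_losses)
print("stopped after epoch", stopped_epoch)
print("learning rates:", lr_history)
```

With these toy losses, the learning rate is reduced once after three stagnant epochs, and training stops two epochs later because the loss has not improved for five epochs in a row.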
Visualizing Training Results
Finally, we plot the training and validation accuracy and loss to understand the learning progress of the model.
### Import Matplotlib for plotting graphs
import matplotlib.pyplot as plt

### Extract accuracy and loss from training history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

### Define epoch range
epochs = range(len(acc))

### Plot training and validation accuracy
plt.plot(epochs, acc, 'r', label="Train Accuracy")
plt.plot(epochs, val_acc, 'b', label="Validation Accuracy")
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title("Train and Validation Accuracy")
plt.legend(loc='lower right')
plt.show()

### Plot training and validation loss
plt.plot(epochs, loss, 'r', label="Train Loss")
plt.plot(epochs, val_loss, 'b', label="Validation Loss")
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title("Train and Validation Loss")
plt.legend(loc='upper right')
plt.show()
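Besides plotting, the history values can be inspected directly, for example to find the epoch with the lowest validation loss. The sketch below uses a hand-written dictionary in place of a real `history.history`; the numbers are invented for illustration.

```python
# Hypothetical values standing in for history.history after training
toy_history = {
    "val_loss": [0.82, 0.61, 0.55, 0.58, 0.57],
    "val_accuracy": [0.71, 0.79, 0.83, 0.82, 0.82],
}

# Index of the epoch with the lowest validation loss (0-based, like the plots)
val_losses = toy_history["val_loss"]
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)

print("best epoch:", best_epoch)
print("val_loss:", val_losses[best_epoch])
print("val_accuracy:", toy_history["val_accuracy"][best_epoch])
```

With the checkpoint callback set to save only the best model, this is the epoch whose weights would be expected in the saved file.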
Part 4 – Running Inference with a Trained U-Net Model for Image Segmentation
Introduction
In this tutorial, we will use a trained U-Net model to perform inference on test images from the Oxford-IIIT Pet Dataset.
The goal is to load the saved model, run predictions on unseen test data, and generate segmentation masks that separate the pet from the background and borders.
We will visualize results using OpenCV, extract objects from images using masks, and learn how to apply post-processing to refine the segmentation results.
Loading the Trained U-Net Model and Test Data
We start by loading the saved U-Net model (.h5 file) and the preprocessed test images and masks.
### Import required libraries
import numpy as np
import tensorflow as tf
import cv2

### Define the path to the best trained model
best_model_file = "e:/temp/Animals-Unet.h5"

### Load the trained U-Net model from file
model = tf.keras.models.load_model(best_model_file)

### Print the model summary to confirm successful loading
model.summary()

### Define image size and number of segmentation categories
Height = 128
Width = 128
NumOfCategories = 3

### Load preprocessed test images and masks from .npy files
allTestImagesNP = np.load("e:/temp/Unet-Animals-test-images.npy")
maskTestImagesNP = np.load("e:/temp/Unet-Animals-test-mask.npy")

### Adjust mask values from range 1–3 to 0–2
maskTestImagesNP = maskTestImagesNP - 1

### Import utility for categorical encoding
### (recent Keras releases no longer provide keras.utils.np_utils)
from tensorflow.keras.utils import to_categorical

### Convert test masks (not the images) to categorical format (for consistency)
maskImagesForModel = to_categorical(maskTestImagesNP, num_classes=NumOfCategories)

### Convert data type from float to integer
maskImagesForModel = maskImagesForModel.astype(int)

### Print shapes of image and mask arrays
print("Shapes : ")
print(allTestImagesNP.shape)
print(maskTestImagesNP.shape)
Running Prediction on a Test Image
Next, we select one test image, prepare it for the model, and run a prediction.
The model outputs a probability mask for each pixel, which we then process.
### Select the 5th test image for prediction
img = allTestImagesNP[4]

### Add batch dimension before feeding image to the model
imgForModel = np.expand_dims(img, axis=0)

### Run prediction using the U-Net model
p = model.predict(imgForModel)
print(p)

### Extract the predicted mask for the image
resultMask = p[0]

### Print mask shape to confirm 3 channels (3 categories)
print(resultMask.shape)

### Reduce the mask to a single-channel image by taking the argmax
resultMask = np.argmax(resultMask, axis=-1)

### Print shape after reduction
print("Result after argmax axis -1 :")
print(resultMask.shape)

### Add an extra dimension back to the mask
resultMask = np.expand_dims(resultMask, axis=-1)
print("result after expand dims -1")
print(resultMask.shape)

### Scale values to 0–255 for visualization (classes 0, 1, 2 become 0, 85, 170)
resultMask = resultMask * (255 / NumOfCategories)

### Convert mask to unsigned integer type
resultMask = resultMask.astype(np.uint8)

### Resize mask to 16x16 for quick visualization
x = cv2.resize(resultMask, (16,16), interpolation=cv2.INTER_NEAREST)
print(x)
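The step from per-pixel probabilities to a displayable mask can be tried on a tiny example. The sketch below uses a hand-written 2x2 probability array in place of a real model output, then applies the same argmax and scaling as the code above; the probability values are invented.

```python
import numpy as np

num_classes = 3

# Hypothetical softmax output for a 2x2 image: one probability vector per pixel
probs = np.array([
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]],
])

# Pick the most likely class for every pixel (0 object, 1 background, 2 border)
class_map = np.argmax(probs, axis=-1)
print(class_map)

# Spread the class indices over the gray range so they are visible on screen
gray_levels = (class_map * (255 / num_classes)).astype(np.uint8)
print(gray_levels)
```

The three classes end up as the gray levels 0, 85, and 170, which is why those exact values are used when the mask is post-processed.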
Visualizing the Predicted Mask
We now visualize the results alongside the original image.
We convert the mask into a displayable format and prepare it for further processing.
### Convert single-channel mask into 3-channel for display
predictedMakImg = np.concatenate([resultMask, resultMask, resultMask], axis=2)

### Display the original image and predicted mask
cv2.imshow("original image ", img)
cv2.imshow("Predicted mask ", predictedMakImg)

### Convert predicted mask into grayscale
gray = predictedMakImg.copy()
gray = cv2.cvtColor(gray, cv2.COLOR_BGR2GRAY)
print("Gray Shape", gray.shape)

### Find unique values inside the grayscale mask and print them
unique_vals = np.unique(gray)
print("Unique : ", unique_vals)
Extracting the Object from the Image
Finally, we refine the predicted mask, convert categories into black and white, and apply it to the original image to extract only the object.
### Convert object and border values to white color
### (170 = border, 0 = object after the scaling step)
gray[gray == 170] = 255
gray[gray == 0] = 255

### Convert all other values (85 = background) to black
gray[gray == 85] = 0

### Display the refined grayscale mask
cv2.imshow("Gray", gray)

### Apply the mask to extract the object from the original image
masked_img = cv2.bitwise_and(img, img, mask=gray)

### Resize the masked image for better visualization
masked_img = cv2.resize(masked_img, (256,256))

### Show the masked image
cv2.imshow("masked_img", masked_img)

### Wait for key press, then close all windows
cv2.waitKey(0)
cv2.destroyAllWindows()
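The same extraction can also be expressed directly on the class indices, without going through gray levels. The following NumPy-only sketch keeps the object and border pixels and blacks out the background on a toy 3x3 image. It is an alternative illustration, not the tutorial's OpenCV code, and the arrays are made up.

```python
import numpy as np

# Toy class map: 0 = object, 1 = background, 2 = border
class_map = np.array([
    [1, 1, 1],
    [1, 0, 2],
    [1, 2, 1],
])

# Toy 3x3 color image with every pixel set to mid gray
image = np.full((3, 3, 3), 0.5, dtype=np.float32)

# Boolean mask that is True for object and border pixels
keep = (class_map == 0) | (class_map == 2)

# Broadcast the mask over the color channels; background pixels become 0
extracted = image * keep[..., np.newaxis]
print(keep)
print(extracted[1, 1], extracted[0, 0])
```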
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran