...

How to Train and Detect Objects with PixelLib and TensorFlow

PixelLib

Introduction to PixelLib

PixelLib is a lightweight, open-source Python library that makes image and video segmentation simple, fast, and practical—even if you’re just getting started.
In this tutorial, you’ll learn how to use PixelLib to run state-of-the-art segmentation with just a few lines of code, apply pre-trained models, and adapt workflows for real projects like background removal, object highlighting, and dataset preparation.

We’ll walk through installing PixelLib, loading models, processing images and videos, and sharing tips for reliable results on everyday hardware.
By the end, you’ll have a clean, repeatable pipeline powered by PixelLib that you can reuse in your own computer vision tasks and tutorials.

The link for the video : https://youtu.be/i9MEXrLtFOQ&list=UULFTiWJJhaH6BviSWKLJUM9sg

Link for the full code : https://ko-fi.com/s/992c1e2498

Link for my blog post : https://eranfeit.net/blog/


Here is the code :

Part 1 – Using PixelLib to Load a Custom Dataset and Visualize a Sample

This section shows a minimal, working snippet that validates your dataset before training.
We load the dataset directory prepared from labelme JSON files and display a quick sample to confirm classes and masks look correct.
It is a fast feedback loop that saves time and reduces training mistakes.
It also helps ensure your class labels and folder structure are aligned with PixelLib’s expectations.

Introduction :

Data validation is the first step toward a stable training run.
A quick visualization confirms the annotations, paths, and class mappings without running a full training job.
If anything looks off, you can fix it early and avoid wasted epochs and compute.
This simple check becomes a habit that improves every segmentation project.

Description :

A lean pipeline drives better iteration speed in computer vision projects.
By loading your custom dataset with PixelLib and visualizing a sample image, you gain immediate insight into annotation quality.
You can spot misaligned masks, incorrect class names, or directory errors in seconds.
This early signal helps you keep your data pipeline healthy and repeatable.

Integrating labelme makes annotation straightforward and flexible.
Each image produces a JSON file that describes polygons, labels, and shapes.
PixelLib can interpret those JSON files when they are organized under a consistent dataset directory, as sketched below.
This compatibility reduces glue code and lets you move faster from labeling to training.
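
For reference, here is a sketch of a dataset layout that PixelLib’s load_dataset can consume — a train and a test folder, with each image sitting next to the labelme JSON that describes it — followed by a trimmed example of the fields labelme writes. The file names and point values are illustrative only.

customModel/
    train/
        banana_01.jpg
        banana_01.json
        ...
    test/
        apple_07.jpg
        apple_07.json
        ...

A trimmed labelme JSON (values are placeholders):

{
  "shapes": [
    {
      "label": "Banana",
      "points": [[112.0, 203.5], [118.0, 199.0], [130.5, 210.0]],
      "shape_type": "polygon"
    }
  ],
  "imagePath": "banana_01.jpg",
  "imageHeight": 480,
  "imageWidth": 640
}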

Visualization is also a communication tool for your team or your audience.
A single rendered image with overlays demonstrates that your classes are detected and masks are drawn as intended.
It is easier to share, document, and debug with concrete visuals instead of only logs.
Clear visuals also make tutorials and reports more persuasive and easier to follow.

Finally, this step lays the groundwork for repeatable experiments.
Once the dataset loads correctly and the sample view looks right, you can scale to training with confidence.
You can change classes, add images, or refine labels while keeping the same validation step.
This rhythm keeps your project stable as it grows.

# first we have to label the Banana / Apple / Tomato in the images
# we will use labelme
# pip install pyqt5
# pip install labelme
# after labeling the images, let's test it.
# Each image has a json file

### Import the core PixelLib package to access custom training utilities.
import pixellib

### Pull in the instance_custom_training helper which manages data loading and training configuration.
from pixellib.custom_train import instance_custom_training

### Create a training/visualization object that will handle dataset operations for instance segmentation.
vis_img = instance_custom_training()

### Point to the dataset directory that contains images and their corresponding labelme JSON files.
vis_img.load_dataset("Object-Detection\\Pixellib\\customModel")

### Render one labeled image to quickly confirm classes, masks, and colors look correct.
vis_img.visualize_sample()

Link for the full code : https://ko-fi.com/s/992c1e2498

Part 2 – Training a Custom Mask R-CNN with PixelLib

Introduction :

This part focuses on the essentials of training: configuring the model, loading weights, attaching your dataset, and launching the training loop with augmentation.
You will also learn practical tips for batch sizing and epochs so the same code scales from a modest desktop GPU to cloud notebooks.
With these steps, you can turn raw labelme annotations into a working instance segmentation model.
The structure keeps your pipeline maintainable, fast to iterate, and easy to re-use across projects.

Description :

A strong training setup starts with clear class definitions and a suitable backbone.
PixelLib’s Mask R-CNN wrapper lets you choose ResNet101 for deeper feature extraction or ResNet50 for lighter compute, while num_classes aligns the head with your dataset.
This clarity prevents mismatches and ensures gradients flow to the right heads during training.
It also makes future experiments straightforward when expanding classes or switching architectures.

Pretrained COCO weights accelerate convergence and improve stability on small datasets.
Instead of learning features from scratch, the network begins with rich, general visual representations.
You can keep epochs moderate, monitor validation loss, and only extend training if the curve continues to improve.
This approach saves time and compute while maintaining quality.

Batch size is the most common lever for fitting models on consumer GPUs.
If memory is tight, set batch_size=1; if you have more headroom, try batch_size=2.
The key is to avoid out-of-memory errors while keeping the GPU busy.
Pair this with augmentation to increase data diversity and robustness, especially for small datasets.
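
As a rough sketch, the batch size is just an argument to the same modelConfig call used in the training script below; the pairings here are illustrative starting points, not hard rules.

### Illustrative only: pick a batch size that matches your GPU memory.
from pixellib.custom_train import instance_custom_training

train_maskRcnn = instance_custom_training()

# Around 4 GB of GPU memory: stay at batch_size=1 (resnet50 is the lighter backbone).
train_maskRcnn.modelConfig(network_backbone="resnet50", num_classes=3, batch_size=1)

# More headroom (e.g. a Colab GPU): resnet101 with batch_size=2 is a reasonable starting point.
# train_maskRcnn.modelConfig(network_backbone="resnet101", num_classes=3, batch_size=2)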

Organization pays off when saving trained models.
By writing checkpoints to a stable path, you can compare multiple runs, track validation metrics, and deploy the best artifact later.
This also helps when you revisit a project months later—your models, configs, and results remain tidy and discoverable.
Combined with clear comments and naming, this creates a professional, production-ready workflow.

# first we have to label the Banana / Apple / Tomato in the images
# we will use labelme
# pip install pyqt5
# pip install labelme
# after labeling the images, let's test it.
# Each image has a json file

### Import PixelLib to access its training utilities for instance segmentation.
import pixellib

### Import the custom training helper, which wraps Mask R-CNN configuration and training steps.
from pixellib.custom_train import instance_custom_training

### Initialize a training object that will manage configuration, data loading, and training.
train_maskRcnn = instance_custom_training()

# num_classes=3 since we have 3 classes: Banana, Apple, Tomato
### Configure the model: choose the backbone, number of classes, and batch size to fit your GPU memory.
train_maskRcnn.modelConfig(network_backbone="resnet101", num_classes=3, batch_size=1)

# https://github.com/matterport/Mask_RCNN/releases
# you can download the 2.0 version of the model here
### Load COCO pretrained weights to speed up convergence and stabilize early training.
train_maskRcnn.load_pretrained_model("c:/models/mask_rcnn_coco.h5")

### Attach the labeled dataset (images + labelme JSON files) so the trainer can build loaders and class maps.
train_maskRcnn.load_dataset("Object-Detection/Pixellib/customModel")

# Note: The batch sizes given are samples used for Google Colab.
# If you are using a less powerful GPU, reduce your batch size.
# For example, on a PC with a 4GB GPU you should use a batch size of 1 for both resnet50 and resnet101.
# I used a batch size of 1 to train my model on my PC's GPU,
# trained for less than 100 epochs, and it produced a validation loss of 0.263.
# This is favourable because my dataset is not large.
# On a PC with a more powerful GPU you can use a batch size of 2.
# If you have a large dataset with more classes and many more images, use
# Google Colab, where you have free access to a single 12GB NVIDIA Tesla K80 GPU
# that can be used up to 12 hours continuously.
# Most importantly, try to use a more powerful GPU and train for
# more epochs to produce a custom model that will perform efficiently
# across multiple classes.
# Achieve better results by training with many more images.
# 300 images per class is recommended as the minimum required for training.

### Launch training with augmentation, set the epoch count, and choose where to save trained models.
train_maskRcnn.train_model(num_epochs=100, augmentation=True, path_trained_models="c:/models")

# These are the resulting models.
# Now we have to find the best one.

Link for the full code : https://ko-fi.com/s/992c1e2498

Part 3 – Evaluating and Selecting the Best PixelLib Mask R-CNN Model

Introduction :

The snippet configures training settings, attaches pretrained weights and a labeled dataset, and runs evaluation on checkpoints.
Scores are computed at IoU 0.5 to provide a simple, comparable metric across epochs.
The approach scales from evaluating a single file to scanning an entire directory of saved models.
This makes it easy to automate model selection and keep only the best artifacts.

Description :

Model evaluation ensures that your final checkpoint reflects real performance on your data.
By standardizing on a threshold like IoU 0.5, you get consistent numbers across experiments.
PixelLib’s API returns a score that you can log, compare, and track over time.
This helps you avoid subjective choices when multiple checkpoints look promising.

Using pretrained COCO weights accelerates convergence and stabilizes training.
Once the network has learned general features, your dataset fine-tunes those features for your classes.
Evaluating checkpoints across epochs reveals where the model peaks.
You can select that peak and discard weaker checkpoints to simplify deployment.

Practical configuration details matter for reproducibility.
Declaring the backbone, class count, and batch size keeps your runs consistent across machines.
Pointing evaluation to a dedicated directory of checkpoints avoids mixing in unrelated files.
This organization speeds up experimentation and reduces human error.

The results table is your guide for what to keep and what to retrain.
Low early-epoch scores are expected as the model warms up.
Mid-to-late epochs often deliver the best IoU values on validation data.
Choose the highest score, archive that artifact, and document its configuration.
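
If you want to shortlist checkpoints before (or alongside) an evaluate_model pass, the filenames PixelLib writes already encode the epoch and validation loss (for example mask_rcnn_model.051-0.252276.h5). The helper below is a sketch of my own, not part of PixelLib, that ranks the files in the evaluation directory by that loss.

### A helper sketch (not a PixelLib API): rank saved checkpoints by the validation loss in their filenames.
import glob
import os
import re

def rank_checkpoints(model_dir):
    pattern = re.compile(r"mask_rcnn_model\.(\d+)-([\d.]+)\.h5$")
    ranked = []
    for path in glob.glob(os.path.join(model_dir, "*.h5")):
        match = pattern.search(os.path.basename(path))
        if match:
            epoch, val_loss = int(match.group(1)), float(match.group(2))
            ranked.append((val_loss, epoch, path))
    ### Lowest validation loss first; still confirm the top candidates with evaluate_model.
    return sorted(ranked)

for val_loss, epoch, path in rank_checkpoints("c:/models/eval"):
    print(f"epoch {epoch:3d}  val_loss {val_loss:.6f}  {path}")

Keep in mind that a lower validation loss does not always line up with the highest IoU score, so treat the evaluate_model results as the final judge.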

# first we have to label the Banana / Apple / Tomato in the images
# we will use labelme
# pip install pyqt5
# pip install labelme
# after labeling the images, let's test it.
# Each image has a json file

### Import PixelLib to access instance segmentation training and evaluation utilities.
import pixellib

### Import the custom training helper that wraps Mask R-CNN configuration, training, and evaluation.
from pixellib.custom_train import instance_custom_training

### Initialize a trainer/evaluator object that will hold configuration and dataset references.
train_maskRcnn = instance_custom_training()

# num_classes=3 since we have 3 classes: Banana, Apple, Tomato
### Configure the model with a ResNet101 backbone, declare class count, and select a batch size that fits GPU memory.
train_maskRcnn.modelConfig(network_backbone="resnet101", num_classes=3, batch_size=1)

# https://github.com/matterport/Mask_RCNN/releases
# you can download the 2.0 version of the model here
### Load COCO pretrained weights to improve convergence and stabilize training dynamics.
train_maskRcnn.load_pretrained_model("c:/models/mask_rcnn_coco.h5")

### Attach the dataset created from labelme JSON annotations so evaluation can build class mappings and loaders.
train_maskRcnn.load_dataset("Object-Detection/Pixellib/customModel")

# The model directory has several files in this format: mask_rcnn_model.*
# Each file is saved with its epoch number.
# We would like to evaluate each model and find the best one.

# Let's test a specific model:
# train_maskRcnn.evaluate_model("c:/models/mask_rcnn_model.051-0.252276.h5")
# The evaluation for this epoch is: 0.636364

# We would like to evaluate all the models.
# Since the directory is not empty, I will just copy all the models to a new directory.

# Let's test the results of all models.
### Evaluate every checkpoint in the given directory and print IoU scores at the default threshold (0.5).
train_maskRcnn.evaluate_model("c:/models/eval")

# These are the results:
# c:/models/eval\mask_rcnn_model.001-1.361029.h5 evaluation using iou_threshold 0.5 is 0.000000
# c:/models/eval\mask_rcnn_model.002-0.597196.h5 evaluation using iou_threshold 0.5 is 0.000000
# c:/models/eval\mask_rcnn_model.004-0.463875.h5 evaluation using iou_threshold 0.5 is 0.272727
# c:/models/eval\mask_rcnn_model.006-0.376810.h5 evaluation using iou_threshold 0.5 is 0.272727
# c:/models/eval\mask_rcnn_model.008-0.342451.h5 evaluation using iou_threshold 0.5 is 0.363636
# c:/models/eval\mask_rcnn_model.010-0.301472.h5 evaluation using iou_threshold 0.5 is 0.454545
# c:/models/eval\mask_rcnn_model.015-0.267621.h5 evaluation using iou_threshold 0.5 is 0.590909
# c:/models/eval\mask_rcnn_model.051-0.252276.h5 evaluation using iou_threshold 0.5 is 0.636364

# The last checkpoint is the best model, since it has the highest evaluation score: 0.636
# mask_rcnn_model.051-0.252276.h5

Link for the full code : https://ko-fi.com/s/992c1e2498

Part 4 – Running PixelLib Inference on a Trained Mask R-CNN

Introduction :

This part shows how to configure Pixelib for inference with your custom classes.
You will load the best checkpoint from training and generate segmented outputs for sample images.
Bounding boxes will be drawn on the results for easier inspection.
OpenCV will display the images so you can confirm classes and masks at a glance.

Description:

Inference begins with a consistent class schema.
You must declare the background class first, followed by your object classes.
Matching num_classes with your model head ensures predictions map to the correct labels.
This alignment avoids off-by-one errors and mislabeled results.

Next, you will load the checkpoint that scored best during evaluation.
Using your “winning” model increases precision and stability versus earlier epochs.
PixelLib’s load_model attaches the weights and prepares the network for forward passes.
No retraining or extra configuration is required at this stage.

The segmentImage method performs the full prediction pass and writes the output file.
Turning on show_bboxes helps you validate object localization while masks show precise extents.
Saving images to a known path makes it easy to archive predictions and compare runs.
You can script multiple calls to process folders or batch evaluate images.
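
For example, a short loop like this processes every JPEG in a directory and writes one annotated output per file. It is a sketch that reuses the segment_image object configured in this part's code; the folder paths are illustrative.

### A sketch for batch inference: reuse the configured segment_image object on a folder of images.
import glob
import os

input_dir = "C:/Python-Code/ObjectDetection/PixelLib/test_images"   # illustrative path
output_dir = "C:/Python-Code/ObjectDetection/PixelLib/outputs"      # illustrative path
os.makedirs(output_dir, exist_ok=True)

for image_path in glob.glob(os.path.join(input_dir, "*.jpg")):
    name = os.path.splitext(os.path.basename(image_path))[0]
    segment_image.segmentImage(
        image_path,
        show_bboxes=True,
        output_image_name=os.path.join(output_dir, name + "_out.jpg"),
    )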

Finally, you will visualize results with OpenCV.
Reading the saved file and opening a preview window provides instant feedback.
This tight feedback loop helps you spot misclassifications or missing detections early.
You can iterate on thresholds, augmentation, or data quality and re-check in seconds.

### Import PixelLib core so we can access custom instance segmentation utilities for inference.
import pixellib

### Import OpenCV to read the saved outputs and preview them in display windows.
import cv2

### Import the custom_segmentation class which offers easy Mask R-CNN inference on custom models.
from pixellib.instance import custom_segmentation

### Create a segmentation object that will hold configuration and run inference.
segment_image = custom_segmentation()

### Configure inference: set the number of classes and the ordered class names, including background first.
segment_image.inferConfig(num_classes=3, class_names=["BG", "Banana", "Apple", "Tomato"])

# BG refers to the background class and is basically the default.
# It is the first class and must be declared along with the names of the other classes.
# num_classes = the number of detected classes - we have 3 classes in this demo.
# class_names = the list of the classes.

### Load the best-performing trained checkpoint so we can run accurate predictions.
segment_image.load_model("c:/models/eval/mask_rcnn_model.051-0.252276.h5")

### Run inference on the first test image, request bounding boxes, and save the visualized output to disk.
segment_image.segmentImage(
    "C:/Python-Code/ObjectDetection/PixelLib/AppleTestImage.jpg",
    show_bboxes=True,
    output_image_name="C:/Python-Code/ObjectDetection/PixelLib/AppleTestImageOut.jpg"
)

### Read the saved output with OpenCV to display it in a preview window.
outImage1 = cv2.imread("C:/Python-Code/ObjectDetection/PixelLib/AppleTestImageOut.jpg")
### Show the preview window to verify masks and boxes for the first image.
cv2.imshow('outImage1', outImage1)

### Run inference on the second test image and write the visualized result to disk.
segment_image.segmentImage(
    "C:/Python-Code/ObjectDetection/PixelLib/bananaTestImage.jpg",
    show_bboxes=True,
    output_image_name="C:/Python-Code/ObjectDetection/PixelLib/bananaTestImageOut.jpg"
)

### Read the second output file to prepare for display.
outImage2 = cv2.imread("C:/Python-Code/ObjectDetection/PixelLib/bananaTestImageOut.jpg")
### Show the second preview window so you can compare predictions across images.
cv2.imshow('outImage2', outImage2)

### Keep the OpenCV windows open until a key is pressed.
cv2.waitKey(0)

Link for the full code : https://ko-fi.com/s/992c1e2498

Part 5 – Segment and Extract Multiple Objects from One Image with PixelLib

Introduction :


This part shows how to configure PixelLib for inference, run segmentation on an image with several objects, and export each detected object as its own image.
You will define the background and custom classes, load your trained checkpoint, and enable object extraction during segmentation.
Then you will iterate over the returned array of segmented objects, display them with OpenCV, and confirm the pipeline works end-to-end.
The result is a repeatable object extraction flow you can plug into curation, labeling, and production systems.

Description :


Object extraction with PixelLib starts with a consistent class schema.
Declaring the background class first and listing all object classes keeps inference aligned with the model head.
Matching the num_classes argument to your dataset ensures label mapping is correct.
This consistency prevents mismatches and simplifies debugging.

A strong inference setup builds on your best checkpoint.
Loading a trained Mask R-CNN weights file allows PixelLib to run precise forward passes on custom categories such as Banana, Apple, and Tomato.
Because the model has already learned from your dataset, segmentation quality is high, and object extraction becomes reliable.
This approach saves time compared to ad-hoc cropping or classical image processing.

PixelLib’s segmentImage supports flags to draw bounding boxes, extract segmented objects, and save them automatically.
When extract_segmented_objects=True and save_extracted_objects=True are set, PixelLib returns an in-memory collection of crops and writes them to disk.
This dual output is ideal for both quick previews and downstream pipelines.
You gain reusable assets without writing custom masking and cropping code.
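
If you prefer your own file names over the automatic ones, a small loop like this writes each in-memory crop to disk. It is a sketch: the crops folder is an illustrative path, and segmask comes from the segmentImage call shown below.

### A sketch: save each extracted crop with a custom name instead of relying on the automatic files.
import os
import cv2
import numpy as np

crops_dir = "C:/GitHub/Object-Detection/Pixellib/crops"   # illustrative path
os.makedirs(crops_dir, exist_ok=True)

for i, crop in enumerate(segmask["extracted_objects"], start=1):
    cv2.imwrite(os.path.join(crops_dir, f"object_{i:02d}.jpg"), np.uint8(crop))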

OpenCV provides instant visual validation.
Reading the saved outputs and showing them in display windows helps confirm class coverage and mask quality.
This feedback loop is essential when you iterate on data balance, augmentation, or thresholds.
With a single rerun, you can see whether changes improved object extraction fidelity.

### Import PixelLib to access custom instance segmentation utilities for inference and object extraction.
import pixellib

### Import OpenCV to read and display images for quick visual validation.
import cv2

### Import NumPy to handle arrays returned for extracted objects.
import numpy as np

### Import the custom_segmentation interface that wraps Mask R-CNN inference for custom models.
from pixellib.instance import custom_segmentation

### Create the segmentation object that will hold configuration and run predictions.
segment_image = custom_segmentation()

### Configure inference: choose the backbone, set the number of classes, and order class names with BG first.
segment_image.inferConfig(network_backbone="resnet101", num_classes=3, class_names=["BG", "Banana", "Apple", "Tomato"])

### Load the trained Mask R-CNN checkpoint that performed best during evaluation.
segment_image.load_model("c:/models/eval/mask_rcnn_model.032-0.200773.h5")

### Run segmentation on an image with several objects, draw boxes, and enable automatic object extraction and saving.
segmask, output = segment_image.segmentImage(
    "C:/GitHub/Object-Detection/Pixellib/moreThanOneApple.jpg",
    show_bboxes=True,
    extract_segmented_objects=True,
    save_extracted_objects=True,
    output_image_name="C:/GitHub/Object-Detection/Pixellib/moreThanOneAppleOut.jpg"
)

### Retrieve the list of in-memory cropped objects from the segmentation response.
res = segmask["extracted_objects"]

### Initialize a simple counter to label OpenCV windows for each extracted object.
title = 0

### Loop over all extracted object images, convert to uint8, and display each in its own window.
for img in res:
    title = title + 1
    imageArr = np.uint8(img)
    cv2.imshow(str(title), imageArr)

### Keep the display windows open until a key is pressed, then cleanly close them.
cv2.waitKey(0)
cv2.destroyAllWindows()

Link for the full code : https://ko-fi.com/s/992c1e2498

Part 6 – Running Live Camera Segmentation with PixelLib

Introduction :


This section configures PixelLib for inference, opens a live camera stream with OpenCV, and segments each frame in real time.
You will define class names (including background), load a trained checkpoint, and call segmentFrame inside a loop.
OpenCV handles capture and display so you can watch detections update as you move objects in front of the camera.
Press q to stop the stream and close the windows.

Description :


Real-time pipelines begin with consistent model configuration.
Declaring num_classes and the ordered class names (with background first) ensures predictions map correctly to your labels.
Using the same settings you trained with avoids class index mismatches and mislabeled overlays.
This alignment is crucial when moving from offline images to live streams.

Loading your best checkpoint maximizes accuracy without extra tuning.
PixelLib’s load_model attaches the weights and prepares the network for inference on incoming frames.
Because your model has already been evaluated and selected, you can expect stable detections during live playback.
Any future improvements—more data, longer training, or different backbones—can be swapped in by changing the weights path.

The heart of live inference is the capture-process-display loop.
OpenCV’s VideoCapture(0) opens the default webcam and returns frames continuously.
Each frame is passed into segmentFrame, which returns masks, boxes, and annotated output if requested.
Displaying the frame provides instant visual feedback for data collection, demos, or quick QA.

Finally, usability matters for rapid testing.
A simple keybind to quit keeps your workflow smooth.
You can add FPS counters, confidence thresholds, or saving logic later without changing the overall structure.
This compact loop is a solid foundation for real-world integrations like kiosks, scanners, or robotics.
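
As one example, here is a tiny FPS meter you could drop into the loop below. It is a sketch of my own, not a PixelLib feature: call tick() once per frame and draw the returned value on whichever frame you display.

### A sketch of a minimal FPS meter for the live loop; not part of PixelLib.
import time

class FpsMeter:
    def __init__(self):
        self.prev = time.time()

    def tick(self):
        ### Return the instantaneous frames-per-second since the previous call.
        now = time.time()
        fps = 1.0 / max(now - self.prev, 1e-6)
        self.prev = now
        return fps

meter = FpsMeter()

### Usage inside the while loop, just before cv2.imshow:
#   fps = meter.tick()
#   cv2.putText(frame, f"FPS: {fps:.1f}", (10, 30),
#               cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)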

### Import PixelLib for instance segmentation utilities used during live inference.
import pixellib

### Import OpenCV to capture frames from the webcam and display results in windows.
import cv2

### Import NumPy for array handling (useful for post-processing or debugging).
import numpy as np

### Pull in the PixelLib custom_segmentation interface for Mask R-CNN inference on custom models.
from pixellib.instance import custom_segmentation

### Initialize the segmentation object that will hold configuration and run live predictions.
segment_image = custom_segmentation()

### Configure inference: backbone, number of classes, and ordered class names with background first.
segment_image.inferConfig(network_backbone="resnet101", num_classes=3, class_names=["BG", "Banana", "Apple", "Tomato"])

### Load the trained checkpoint selected during evaluation to ensure accurate live predictions.
segment_image.load_model("c:/models/eval/mask_rcnn_model.032-0.200773.h5")

### Open the default camera (index 0) to start capturing frames for real-time processing.
capture = cv2.VideoCapture(0)

### Read frames in a loop, segment each frame, and show the result until 'q' is pressed.
while True:
    ### Grab a frame from the webcam. 'ret' indicates success, 'frame' is the image.
    ret, frame = capture.read()

    ### Run segmentation on the current frame with bounding boxes and optional object extraction.
    segmask, out = segment_image.segmentFrame(
        frame,
        show_bboxes=True,
        extract_segmented_objects=True,
        save_extracted_objects=True,
        text_thickness=1,
        text_size=0.6,
        box_thickness=2,
        verbose=None
    )

    ### Display the original frame (or switch to 'out' if you want the annotated overlays).
    cv2.imshow("frame", frame)

    ### Exit the loop when 'q' is pressed to stop the live stream cleanly.
    if cv2.waitKey(25) & 0xff == ord('q'):
        break

### Release the camera and close any OpenCV windows that were opened during display.
capture.release()
cv2.destroyAllWindows()

Link for the full code : https://ko-fi.com/s/992c1e2498


Connect :

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran
