Last Updated on 28/10/2025 by Eran Feit
Detectron2 custom dataset training means taking your own images (not COCO), labeling them with polygon masks, registering them in Detectron2, and fine-tuning Mask R-CNN so it can detect and segment your specific objects.
In this tutorial, we’ll walk through that full process using a fruit dataset (apples, bananas, grapes, strawberries, oranges, lemons): annotation, COCO export, dataset registration, training on Windows CPU and Ubuntu/WSL GPU, and finally inference on new test images.
By the end, you’ll have a working instance segmentation model that was trained on your data, not a generic dataset — and you’ll actually see it draw masks around your objects.
In this tutorial, we’ll walk through a complete pipeline using a fruit dataset: apples, bananas, strawberries, grapes, oranges, and lemons.
We’ll annotate the images with polygons, export COCO-style labels, register that dataset in Detectron2 with register_coco_instances, and confirm the masks render correctly.
Then we’ll train a Mask R-CNN model in two setups — Windows on CPU and Ubuntu/WSL on GPU — so you can start simple and then scale up.
Finally, we’ll load the trained weights, run inference on fresh test images, and visualize the predictions with colored instance masks.
By the end, you’ll have a working Detectron2 instance segmentation model that understands your custom dataset and can tell you exactly which fruit is which, pixel by pixel.
This answers the core question most people actually Google: “How do I train Detectron2 on my own data?” — not just “What is Detectron2?”
Along the way, we’re going to reuse high-value concepts like DatasetCatalog, MetadataCatalog, COCO annotations, and the Detectron2 model zoo.
Those are the core building blocks behind custom dataset training, and they’re what make Detectron2 so popular for real-world instance segmentation projects.

Introduction: training a custom fruit detector with Detectron2 the easy way
This guide shows how to build a complete Detectron2 instance segmentation pipeline for fruit images, using Mask R-CNN to detect apples, bananas, grapes, strawberries, oranges, and lemons — and draw masks around them, not just boxes.
We’ll walk through data annotation, dataset registration, training on Windows (CPU), training on Ubuntu/WSL with GPU acceleration, and finally running inference on new test images.
To see how I run real-time object detection on live camera input, check out my Jetson Nano walkthrough: https://eranfeit.net/how-to-classify-objects-in-live-camera-using-jetson-nano/
You can download the code here : https://ko-fi.com/s/2119cfd494
You can find more tutorials in my blog : https://eranfeit.net/blog/
🚀 Want to get started with Computer Vision or take your skills to the next level?
If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow
If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4
Getting our fruit dataset labeled in COCO format
A Detectron2 instance segmentation project always begins with data.
In this part we prepare a fruit dataset (apples, bananas, grapes, etc.), draw polygon masks around each fruit, and export the annotations in COCO format.
COCO format is what Detectron2 expects: it stores images, object categories, masks, and bounding boxes.
We create separate folders for Train, Validate, and Test.
Train and Validate get labeled.
Test can stay unlabeled because we just want to run predictions later.
Before we can train Detectron2 instance segmentation, we need high-quality labels.
We take fruit images (apples, bananas, strawberries, grapes, oranges, lemons) and for every fruit in every image we draw a polygon mask.
Polygon masks give Mask R-CNN the shape of each object so we get pixel-accurate segmentation, not just rectangles.
We export the labels to a COCO-style JSON file.
COCO format is nice because it already matches what Detectron2’s dataset loader expects, including category IDs and segmentation polygons.
Each dataset split (Train and Validate) will have its own JSON file.
The important part for “train detectron2 custom dataset” is consistency:
The category names in your JSON must match the class list you expect at inference time.
If you spelled “Orange” during labeling, keep it “Orange” everywhere.
That’s how Detectron2 will map predicted class IDs back to human names.
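If you want to double-check that the exported categories really match your class list, here is a small optional sketch. It assumes the Train JSON path used later in this tutorial and only relies on standard COCO keys (categories, images, annotations).

### Quick sanity check: print the category names stored in the exported COCO JSON.
import json

with open("Train-custom-Object-Detection-model/Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json") as f:
    coco = json.load(f)

### COCO files store categories as a list of {"id": ..., "name": ...} dicts.
for cat in coco["categories"]:
    print(cat["id"], cat["name"])

### Also report how many images and polygon annotations were exported.
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")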
We keep a third split, Test, with no labels.
We’ll later feed these images to our trained model to visualize predictions and confirm that Detectron2 instance segmentation is working.
Below is the prep workflow in a code-style block so you can copy it into your project notes or README.
This block explains each instruction line-by-line in a “### comment then command” style.
### Prepare COCO-style annotations using an online annotation tool.
# Coco format annotation using an online annotation tool

### The dataset images are fruit photos (apples, bananas, etc.) that we collected.
# dataset from here : images from Google
# Folder under Detectron2 :-> Train-custom-Object-Detection-model\Fruits_for_detectron2

### These are the classes we want Detectron2 to learn.
# Labels : Apple, Strawberry, Orange, Grapes, Banana, Lemon

### Open the annotation tool in your browser.
# open the annotation tool website

### Upload all training images so we can draw polygons around each fruit.
# upload the train images

### Choose "Object detection" / "Instance segmentation" mode.
# choose Object detection

### For each object, draw a polygon around the fruit to create a precise mask.
# select polygon on the right side bottom for each item in each image

### When you finish labeling, export the annotations in COCO format.
# choose "actions" then "export annotations" and choose Coco format

### Copy the exported JSON file into the Train folder next to the images.
# copy the Json file to the train images folder

### Repeat labeling for the validation split so we can evaluate during training.
# Do the same with the validate images !!!

### We do not need labels for the test split because we only want predictions later.
# The test images folder can stay without annotations

Summary of this part:
- We defined our fruit categories for Detectron2 instance segmentation.
- We labeled Train and Validate with polygons and exported COCO JSON per split.
- We kept an unlabeled Test folder for inference.
- Clean, consistent labels at this step are the foundation for every good Mask R-CNN model.
If you’re interested in classic image classification (not masks), I also compare deep models in this guide: https://eranfeit.net/tensorflow-image-classification-tutorial-resnet50-vs-mobilenet/
Loading the fruit dataset into Detectron2 and previewing it
Now that we have COCO-style JSON files, we register them with Detectron2.
Registration basically tells Detectron2:
“This dataset name maps to these images and this annotation file.”
After that, we can iterate through the dataset, inspect the metadata, and visualize the first example with bounding boxes and masks.
Detectron2 exposes two helpers that make this easy: register_coco_instances and MetadataCatalog. register_coco_instances connects a dataset name to an image directory and its COCO annotations, while MetadataCatalog stores extra info like class names so the visualizer can draw correct labels.
We load my_dataset_train and my_dataset_val and immediately query them using DatasetCatalog.get().
That gives us a list of dictionaries, one per image, including file path, image height/width, and a list of annotations.
Each annotation has polygon data for instance segmentation.
We use OpenCV (cv2.imread) to read the first training image.
Then we call Detectron2’s Visualizer to overlay the masks and boxes from the annotations onto the image.
Seeing this preview early is critical:
If the annotations look wrong here, training will also be wrong.
This step is part of “custom object detection with detectron2,” but we are actually doing instance segmentation with Mask R-CNN.
That means the model won’t just say “banana here,” it will outline the banana.
Here’s the full data loading and visualization block.
Each command is documented with a ### explanation line immediately above it.
### Import Detectron2 so we can work with instance segmentation and training utilities.
import detectron2

### Import OpenCV for image reading and window display.
import cv2

### Import the Visualizer to draw annotations (masks, boxes, class labels).
from detectron2.utils.visualizer import Visualizer

### Import catalogs that store dataset metadata and items.
from detectron2.data import MetadataCatalog, DatasetCatalog

### Import the helper that registers COCO-style datasets.
from detectron2.data.datasets import register_coco_instances

### Register our datasets so Detectron2 knows where images and COCO JSON live.
# register datasets

### Register the training dataset with a name, empty metadata dict, the JSON file, and the images folder.
register_coco_instances(
    "my_dataset_train",
    {},
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json",
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Train"
)

### Register the validation dataset the same way.
register_coco_instances(
    "my_dataset_val",
    {},
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate/labels_my-project-name_2023-12-04-07-39-25.json",
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate"
)

### Grab metadata for the train split (class names, colors, etc.).
train_metedata = MetadataCatalog.get("my_dataset_train")

### Grab the list of training image dictionaries.
train_datasets_dicts = DatasetCatalog.get("my_dataset_train")

### Grab metadata for the validation split.
val_metedata = MetadataCatalog.get("my_dataset_val")

### Grab the list of validation image dictionaries.
val_datasets_dicts = DatasetCatalog.get("my_dataset_val")

### Look at the first training sample to understand its structure.
first_dict = train_datasets_dicts[0]

### Print that first sample info (file path, annotations, size).
print(first_dict)

### Extract useful fields from the first training record.
file_name = first_dict['file_name']
height = first_dict['height']
width = first_dict['width']
image_id = first_dict['image_id']
annotations = first_dict['annotations']

### Read the actual image using OpenCV.
img = cv2.imread(file_name)

### Create a Visualizer object, telling it to use our training metadata.
visual = Visualizer(img[:, :, ::-1], metadata=train_metedata, scale=0.5)

### Ask the Visualizer to draw the ground-truth annotations on top of the image.
vis = visual.draw_dataset_dict(first_dict)

### Convert the visualized output to an image array we can view.
img2 = vis.get_image()

### Convert BGR to RGB for correct color when displaying in some viewers.
img_rgb = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)

### Show the preview window so we can confirm masks and boxes look correct.
cv2.imshow("img_rgb", img_rgb)

### Wait for a key press before closing the preview window.
cv2.waitKey(0)

Summary of this part:
- We used register_coco_instances to make Detectron2 aware of our dataset.
- We previewed one training image with true masks and boxes using Visualizer.
- This confirms our COCO JSON is valid and our fruit classes map correctly.
- Visual sanity checks now save hours later when training.
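The block above previews only the first image. If you prefer to spot-check several random samples, here is a small optional sketch; it reuses the train_datasets_dicts and train_metedata variables from the block above and assumes you have at least three training images.

### Optional: preview a few random training samples instead of just the first one.
import random

for d in random.sample(train_datasets_dicts, 3):
    img = cv2.imread(d["file_name"])
    visual = Visualizer(img[:, :, ::-1], metadata=train_metedata, scale=0.5)
    vis = visual.draw_dataset_dict(d)
    cv2.imshow("preview", vis.get_image()[:, :, ::-1])
    cv2.waitKey(0)

cv2.destroyAllWindows()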
Training the model on Windows (CPU-friendly version)
Now we actually train Detectron2 on our fruit dataset.
Here we use a Mask R-CNN backbone (ResNet-50 with FPN) from the Detectron2 model zoo, but we fine-tune it for our 6 fruit classes.
This is classic “detectron2 instance segmentation”: the model predicts bounding boxes, classes, and pixel-accurate instance masks.
We define paths, dataset names, and an output directory (My-Train-Detectron2).
We also set num_classes = 6 to match Apple, Strawberry, Orange, Grapes, Banana, Lemon.
If these numbers don’t match your dataset, training will crash.
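One way to catch that mismatch early is to compare num_classes against the class names stored in the registered dataset. This is a small optional sketch that assumes the registration code from the training script below has already run.

### Optional sanity check: make sure num_classes matches the registered dataset.
from detectron2.data import DatasetCatalog, MetadataCatalog

### Loading the dicts once also fills in thing_classes for COCO-registered datasets.
_ = DatasetCatalog.get(train_dataset_name)
thing_classes = MetadataCatalog.get(train_dataset_name).thing_classes
print(thing_classes)
assert len(thing_classes) == num_classes, "NUM_CLASSES does not match the dataset!"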
We build a config object using get_cfg() and model_zoo.get_config_file(...).
Then we update it for our custom dataset:
- Which dataset to train on
- Which dataset to test/evaluate on
- Batch size (
IMS_PER_BATCH) - Learning rate (
BASE_LR) - Training length (
MAX_ITER) - Number of workers for the data loader
We set cfg.MODEL.DEVICE = "cpu" so this script can run on a normal Windows machine with no GPU.
This is slower, but it’s great for tutorials and debugging.
You can still overfit a few samples just to prove that Detectron2 training works.
We save the config to a pickle file so we can reload it later for inference.
Then we launch DefaultTrainer, which handles the training loop for us.
This is where Detectron2 learns how to segment fruit instances.
Here’s the full Windows training script, with explanations (###) above every instruction:
### Import dataset registration helper.
from detectron2.data.datasets import register_coco_instances

### Import DefaultPredictor (for inference) and DefaultTrainer (for training loops).
from detectron2.engine import DefaultPredictor, DefaultTrainer

### Import OS tools for paths and pickle for saving config.
import os
import pickle

### We'll use a model definition from the Detectron2 model zoo.
# go to the model zoo to pick a config name that matches instance segmentation

### Set which config file we'll use (Mask R-CNN with ResNet-50 and FPN).
config_file_path = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

### Name of the model we want weights from.
model_name = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

### Where to save training outputs (weights, logs, config).
output_dir = "My-Train-Detectron2"

### We have 6 fruit classes (Apple, Strawberry, Orange, Grapes, Banana, Lemon).
num_classes = 6

### On Windows we train on CPU for simplicity.
device = "cpu"

### Give friendly names to our datasets for training and validation.
# Train
train_dataset_name = "LP_train"

### Path to training images.
train_images_path = r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Train"

### Path to COCO annotations for training.
train_json_annot_path = r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json"

### Validation dataset details.
# Validate
val_dataset_name = "LP_Test"

### Path to validation images.
val_images_path = r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate"

### Path to COCO annotations for validation.
val_json_annot_path = r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate/labels_my-project-name_2023-12-04-07-39-25.json"

### Register the training dataset so Detectron2 can load images and annotations.
register_coco_instances(
    name=train_dataset_name,
    metadata={},
    json_file=train_json_annot_path,
    image_root=train_images_path
)

### Register the validation dataset.
register_coco_instances(
    name=val_dataset_name,
    metadata={},
    json_file=val_json_annot_path,
    image_root=val_images_path
)

### Import configuration utilities from Detectron2.
from detectron2.config import get_cfg

### Import model_zoo to load base configs and pretrained weights.
from detectron2 import model_zoo

### Define a helper that builds and customizes the Detectron2 config.
def get_train_cfg(a_config_file_path, a_model_name, a_train_dataset_name, a_test_dataset_name, a_num_classes, device, output_dir):
    ### Start from a default config object.
    cfg = get_cfg()

    ### Load a baseline Mask R-CNN config from the model zoo.
    cfg.merge_from_file(model_zoo.get_config_file(a_config_file_path))

    ### Use pretrained weights for faster convergence.
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(a_model_name)

    ### Tell Detectron2 which dataset names correspond to training and testing.
    cfg.DATASETS.TRAIN = (a_train_dataset_name, )
    cfg.DATASETS.TEST = (a_test_dataset_name, )

    ### Number of data loader workers (higher = faster if you have CPU cores).
    cfg.DATALOADER.NUM_WORKERS = 2

    ### How many images per iteration (batch size at the solver level).
    cfg.SOLVER.IMS_PER_BATCH = 2

    ### Base learning rate for the optimizer.
    cfg.SOLVER.BASE_LR = 0.00025

    ### Training length in iterations (similar idea to epochs).
    cfg.SOLVER.MAX_ITER = 1000

    ### When to drop the learning rate. Empty list = don't schedule drops.
    cfg.SOLVER.STEPS = []

    ### Number of target classes (your custom fruit classes).
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = a_num_classes

    ### Run on CPU or GPU.
    cfg.MODEL.DEVICE = device

    ### Where to save checkpoints and logs.
    cfg.OUTPUT_DIR = output_dir

    ### Return the configured training config.
    return cfg

### main entry point for training on Windows.
def main():
    ### Make sure the output directory exists.
    os.makedirs(output_dir, exist_ok=True)

    ### Build the training config with our dataset names and hyperparameters.
    cfg = get_train_cfg(
        config_file_path,
        model_name,
        train_dataset_name,
        val_dataset_name,
        num_classes,
        device,
        output_dir
    )

    ### Path where we will save the config object for later inference (IS -> Instance Segmentation).
    cfg_save_path = "My-Train-Detectron2/IS_cfg.pickle"

    ### Save the config using pickle so we can reload it later.
    with open(cfg_save_path, 'wb') as f:
        pickle.dump(cfg, f, protocol=pickle.HIGHEST_PROTOCOL)

    ### Create a DefaultTrainer with our config.
    trainer = DefaultTrainer(cfg)

    ### Start training from scratch (resume=False).
    trainer.resume_or_load(resume=False)

    ### Launch the training loop.
    trainer.train()

### Only run main() if this file is executed directly.
if __name__ == '__main__':
    main()

Summary of this part:
- We built a Detectron2 config for Mask R-CNN instance segmentation.
- We trained on CPU for clarity, which is great for demos and “it actually runs on my laptop.”
- We saved the config to IS_cfg.pickle, which we’ll reload later for inference.
- This section directly supports the keyword “train detectron2 custom dataset.”
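While the trainer runs, DefaultTrainer writes a metrics.json file into the output directory. If you want a quick loss curve, here is an optional sketch; it assumes training has already produced My-Train-Detectron2/metrics.json and that matplotlib is installed.

### Optional: plot the training loss from the metrics.json written by DefaultTrainer.
import json

iterations, losses = [], []
with open("My-Train-Detectron2/metrics.json") as f:
    for line in f:                      # one JSON dict per logged step
        entry = json.loads(line)
        if "total_loss" in entry:
            iterations.append(entry["iteration"])
            losses.append(entry["total_loss"])

import matplotlib.pyplot as plt
plt.plot(iterations, losses)
plt.xlabel("iteration")
plt.ylabel("total_loss")
plt.show()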
For another segmentation approach (medical polyps instead of fruit), you can read my U-Net tutorial here: https://eranfeit.net/u-net-medical-segmentation-with-tensorflow-and-keras-polyp-segmentation/
Training on Ubuntu / WSL with GPU acceleration
Training Detectron2 instance segmentation on CPU is fine for testing, but serious training is faster on GPU.
This part shows a GPU-oriented workflow on Ubuntu / WSL, including environment setup with Conda, CUDA-enabled PyTorch, and a training loop very similar to Windows, but with device = "cuda" and a longer training schedule.
We first create and activate a Conda environment, install PyTorch with CUDA support, and build Detectron2 from source.
This gives us GPU acceleration for Mask R-CNN.
Even a mid-range GPU will massively reduce training time compared to CPU.
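Before launching a long run, it is worth confirming that PyTorch actually sees the GPU inside WSL. This short check uses only standard PyTorch calls.

### Quick check that PyTorch sees your GPU before launching training.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))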
We again register the dataset splits (LP_train, LP_test) and point to the COCO JSON and images folders.
Note that the code uses Unix-style paths like Fruits_for_detectron2/Train.
When you run under WSL, you can access Windows files via /mnt/c/..., but here we assume the dataset is local in the Ubuntu environment.
We generate a config using get_train_cfg, similar to Windows, but with two important differences:
cfg.MODEL.DEVICE = "cuda"so Detectron2 uses the GPU.cfg.SOLVER.MAX_ITER = 3500, which means we train longer and (usually) get better accuracy.
We save the config pickle (IS_cfg-Ubunto.pickle) and start training with DefaultTrainer.
After training finishes, we can copy the output folder (weights, logs, pickle file) back to Windows to run inference from a familiar environment.
Here’s the full Ubuntu / GPU training script in one block with explanations above each line:
### Notes for WSL / Ubuntu environment setup:
# WSL - Linux
# create a WSL Ubuntu environment

### Activate Conda and create a clean env for Detectron2 + CUDA.
# open WSL in c:
# conda create -n detectorn99 python=3.9
# conda activate detectorn99

### Install a CUDA-enabled PyTorch build.
# conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia

### Clone Detectron2, enter its folder, and build it in develop mode.
# cd detectron2   (inside the git clone folder)
# cd Train Folder
# python setup.py build develop

### Run the actual training script.
# python Step4-Train-The-Model-Ubunto-WSL.py

### Import dataset registration and trainer utilities.
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

### Import OS and pickle to save config objects.
import os
import pickle

### We'll still use a Mask R-CNN config from the Detectron2 model zoo (instance segmentation capable).
# you can see many models in the "Instance Segmentation" table of the model zoo

### Define which config and model weights to start from.
config_file_path = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
model_name = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

### Output directory for the Ubuntu/GPU run.
output_dir = r"My-Train-DetectRon2-Ubuntu"

### Our dataset still has 6 fruit classes.
num_classes = 6

### We want to train on GPU here.
device = "cuda"

### Training dataset name and paths.
train_dataset_name = "LP_train"

### Example paths for images and annotations when running inside Ubuntu/WSL.
# train_images_path = r"/mnt/c/Python-Code/Best-Object-Detection-models/Detectron2/Fruits_for_detectron2/Train"
# train_json_annot_path = r"/mnt/c/Python-Code/Best-Object-Detection-models/Detectron2/Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json"
train_images_path = r"Fruits_for_detectron2/Train"
train_json_annot_path = r"Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json"

### Validation dataset name and paths.
val_dataset_name = "LP_test"
val_images_path = r"Fruits_for_detectron2/Validate"
val_json_annot_path = r"Fruits_for_detectron2/Validate/labels_my-project-name_2023-12-04-07-39-25.json"

### Register the train split.
register_coco_instances(
    name=train_dataset_name,
    metadata={},
    json_file=train_json_annot_path,
    image_root=train_images_path
)

### Register the validation/test split.
register_coco_instances(
    name=val_dataset_name,
    metadata={},
    json_file=val_json_annot_path,
    image_root=val_images_path
)

### Import config helpers and model zoo from Detectron2.
from detectron2.config import get_cfg
from detectron2 import model_zoo

### Build a training config specialized for GPU training and longer runs.
def get_train_cfg(a_config_file_path, a_model_name, a_train_dataset_name, a_test_dataset_name, a_num_classes, device, output_dir):
    ### Start with a base Detectron2 config.
    cfg = get_cfg()

    ### Merge in a known-good Mask R-CNN config from the model zoo.
    cfg.merge_from_file(model_zoo.get_config_file(a_config_file_path))

    ### Use pretrained weights so training converges faster.
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(a_model_name)

    ### Tell Detectron2 which datasets are train and test.
    cfg.DATASETS.TRAIN = (a_train_dataset_name, )
    cfg.DATASETS.TEST = (a_test_dataset_name, )

    ### Number of worker processes for data loading.
    cfg.DATALOADER.NUM_WORKERS = 2

    ### Images per batch (higher uses more VRAM).
    cfg.SOLVER.IMS_PER_BATCH = 2

    ### Learning rate for optimization.
    cfg.SOLVER.BASE_LR = 0.00025

    ### Total iterations (similar to epochs). Longer than the Windows run.
    cfg.SOLVER.MAX_ITER = 3500

    ### Keep the learning rate constant instead of stepping it down.
    cfg.SOLVER.STEPS = []

    ### Tell the ROI heads how many classes we expect.
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = a_num_classes

    ### Use CUDA/GPU for faster training.
    cfg.MODEL.DEVICE = device

    ### Where to save our checkpoints and logs.
    cfg.OUTPUT_DIR = output_dir

    ### Give the config back to the caller.
    return cfg

### main() will launch training on Ubuntu / WSL with GPU.
def main():
    ### Make sure the output directory exists.
    os.makedirs(output_dir, exist_ok=True)

    ### Build a training config with our dataset names, class count, and GPU device.
    cfg = get_train_cfg(
        config_file_path,
        model_name,
        train_dataset_name,
        val_dataset_name,
        num_classes,
        device,
        output_dir
    )

    ### Save the config so we can reuse it for inference later (IS -> Instance Segmentation).
    cfg_save_path = "My-Train-DetectRon2-Ubuntu/IS_cfg-Ubunto.pickle"

    ### Dump the config object to disk using pickle.
    with open(cfg_save_path, 'wb') as f:
        pickle.dump(cfg, f, protocol=pickle.HIGHEST_PROTOCOL)

    ### Create a Detectron2 DefaultTrainer with our GPU config.
    trainer = DefaultTrainer(cfg)

    ### Load from scratch and start training.
    trainer.resume_or_load(resume=False)
    trainer.train()

    ### (Optional) you can test after training using the final checkpoint.
    # trainer.test(ckpt=None)  # uses the last checkpoint by default

### Standard Python pattern to run main().
if __name__ == '__main__':
    main()

### After training finishes, copy the output directory back to Windows
### so you can run inference there with the saved weights.
# Copy the My-Train-DetectRon2-Ubuntu folder to the Windows environment to continue testing new images

Summary of this part:
- We created a GPU-ready Detectron2 training workflow in Ubuntu / WSL.
- We trained Mask R-CNN for more iterations on cuda, which improves accuracy and speed.
- We saved a config pickle (IS_cfg-Ubunto.pickle) we’ll reuse for inference.
- The trained weights (model_final.pth) are now sitting in the output folder and ready for testing.
- This directly supports “mask r-cnn detectron2 tutorial” and “train detectron2 custom dataset.”
I also cover how to run object detection on recorded video streams using Python and OpenCV here: https://eranfeit.net/how-to-classify-objects-in-videos-using-jetson-nano-opencv-python/
Testing the trained Detectron2 model on new images
This final technical step loads the saved config, points it to the final trained weights, runs inference on new test images, and displays the predictions with colored masks and labels.
This is where Detectron2 proves that instance segmentation is working on fruit it has never seen before.
We reload the pickle file we saved during training (the config).
We update that config so it knows where to find the trained weights (model_final.pth) and we set a confidence threshold.
We create a DefaultPredictor, which is a simple wrapper that:
- Preprocesses the image
- Runs the model
- Returns predicted classes, boxes, masks, and scores
We define the class names list CLASSES.
This list should match the order Detectron2 used internally (the same order as in training).
We’ll print the first predicted class ID and map it to a human-readable label, like “Banana.”
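As an alternative to hard-coding the list, you can read the class names straight from the registered dataset's metadata. This is a small sketch that only works in a session where the training dataset ("LP_train" in this tutorial) has been registered and loaded.

### Alternative to a hard-coded CLASSES list: read the names from the registered dataset.
from detectron2.data import DatasetCatalog, MetadataCatalog

DatasetCatalog.get("LP_train")                        # loading once fills in the metadata
CLASSES = MetadataCatalog.get("LP_train").thing_classes
print(CLASSES)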
We use Visualizer again, but now we draw predicted instance masks instead of ground truth.
We display them with OpenCV so we can see how well the model segments fruit in our test images.
Here is the full inference / testing script, annotated:
### Import DefaultPredictor to run inference with our trained config.
from detectron2.engine import DefaultPredictor

### OS and pickle for loading saved config objects and paths.
import os
import pickle

### Load the saved Detectron2 config from training (IS -> Instance Segmentation).
cfg_saved_path = "My-Train-Detectron2/IS_cfg.pickle"
with open(cfg_saved_path, 'rb') as f:
    ### Read the config back into memory.
    cfg = pickle.load(f)

### Define where our training artifacts (weights) were stored.
output_dir = "My-Train-Detectron2"

### Point the config to the final trained weights.
cfg.MODEL.WEIGHTS = os.path.join(output_dir, "model_final.pth")

### Set the confidence threshold for predictions (only show objects above 0.5).
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

### Force inference to run on CPU in this script (you can set "cuda" if available).
cfg.MODEL.DEVICE = "cpu"

### Build a predictor object from the config. This will run forward passes for us.
predictor = DefaultPredictor(cfg)

### Class names in the same order Detectron2 learned them.
CLASSES = ["Apple", "Strawberry", "Orange", "Grapes", "Banana", "Lemon"]

### Define paths of the (unlabeled) test images.
image_path_1 = "Train-custom-Object-Detection-model/Fruits_for_detectron2/Test/apples-vs-bananas.jpg"
image_path_2 = "Train-custom-Object-Detection-model/Fruits_for_detectron2/Test/pexels-pixabay-70746.jpg"

### Import OpenCV for reading and displaying images, NumPy for array ops,
### and Visualizer to draw model predictions.
import cv2
import numpy as np
from detectron2.utils.visualizer import Visualizer

### Read the first test image.
im = cv2.imread(image_path_1)

### Run the trained model on this image.
outputs = predictor(im)

### Print the raw output dict so we can see boxes, masks, scores, and classes.
print("=========================================")
print(outputs)
print("=========================================")

### Extract predicted class IDs as a tensor and move them to CPU.
pred_classes = outputs['instances'].pred_classes.cpu()
print("Pred Classes : ")
print(pred_classes)

### Convert the tensor to a NumPy array for easier handling.
pred_classes = pred_classes.numpy()

### Flag will tell us if we detected at least one object.
flag = np.size(pred_classes)
print("Flag :")
print(flag)

### If we found at least one object, grab the first prediction.
if flag > 0:
    pred_classes = pred_classes[0]  # grab the first element
    print("pred_classes:")
    print(pred_classes)

    ### Map the class ID (0,1,2,...) back to a human-readable fruit label.
    print(CLASSES[pred_classes])

    ### Convert the image to RGB for visualization.
    img_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

    ### Create a Visualizer for predictions instead of ground truth.
    v = Visualizer(img_rgb, metadata={}, scale=0.6)

    ### Draw the predicted instance masks and boxes.
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))

    ### Convert RGB back to BGR so OpenCV can show it.
    img_bgr = cv2.cvtColor(v.get_image(), cv2.COLOR_RGB2BGR)

    ### Display the prediction window with colored masks.
    cv2.imshow("v", img_bgr)
    cv2.waitKey(0)
else:
    ### If nothing was detected above threshold, say so.
    print("Pred_classes is empty")

### Close any OpenCV windows after we're done.
cv2.destroyAllWindows()

Summary of this part:
- We reloaded the trained Detectron2 config and attached our final weights.
- We used DefaultPredictor to run inference and printed predicted classes.
- We visualized predicted instance segmentation masks for fruit objects.
- Now we have working Detectron2 instance segmentation end-to-end: dataset → training → prediction.
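The script above handles a single test image. If you want to sweep the whole Test folder, here is an optional sketch that reuses the predictor built above and the folder path from this tutorial.

### Optional: run prediction + visualization on every image in the Test folder.
import os
import cv2
from detectron2.utils.visualizer import Visualizer

test_dir = "Train-custom-Object-Detection-model/Fruits_for_detectron2/Test"
for name in os.listdir(test_dir):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    im = cv2.imread(os.path.join(test_dir, name))
    outputs = predictor(im)                               # reuse the predictor from the script above
    v = Visualizer(im[:, :, ::-1], metadata={}, scale=0.6)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2.imshow(name, v.get_image()[:, :, ::-1])
    cv2.waitKey(0)

cv2.destroyAllWindows()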
FAQ :
What is Detectron2 instance segmentation?
Detectron2 instance segmentation predicts a pixel mask and a class label for every object, so you know not only what each object is but also its exact shape.
Why are we using Mask R-CNN?
Mask R-CNN is a proven architecture for instance segmentation. It can detect each fruit and draw an accurate mask around it in the same forward pass.
Do I really need COCO format?
Yes. Detectron2 expects COCO-style JSON for images, categories, boxes, and masks. Using COCO format keeps training smooth and avoids custom loaders.
Can I train Detectron2 on CPU?
Yes, but it will be slower. CPU training is fine for demos and very small datasets, and it helps you debug config issues before moving to GPU.
When should I switch to GPU?
Switch to GPU when you have more data, want faster iteration, or care about accuracy. GPU training with more iterations usually gives better results.
What does MAX_ITER control?
MAX_ITER sets how long Detectron2 will train. A small value is good for testing. A larger value gives the model more time to learn your dataset.
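As a rough back-of-the-envelope check, you can relate iterations to passes over your data; the numbers below are placeholders, not values from this tutorial's dataset.

### Rough rule of thumb: epochs ≈ MAX_ITER * IMS_PER_BATCH / number of training images.
num_train_images = 100          # example value - use your own dataset size
ims_per_batch = 2               # cfg.SOLVER.IMS_PER_BATCH
max_iter = 3500                 # cfg.SOLVER.MAX_ITER

epochs = max_iter * ims_per_batch / num_train_images
print(f"~{epochs:.1f} passes over the training data")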
Why do we save the config as a pickle file?
Saving the config makes inference easy later. You reload the pickle, attach final weights, and you’re instantly ready to run predictions.
How do I map class IDs back to names?
Create a CLASSES list in the same order you used for training. Then use the predicted class ID as an index to print the human-friendly label.
Can this pipeline handle other objects, not just fruit?
Yes. You can point it at any object category — tools, products, lab samples, etc. All you need is consistent labeled data and a matching class list.
Why are masks better than boxes for quality control?
Masks let you measure exact shape, size, and surface area. That’s important in tasks like fruit grading, defect detection, or counting overlapping objects.
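If you want to turn those masks into numbers, here is a short sketch that measures the pixel area of each predicted instance; it assumes the outputs and CLASSES variables from the inference script above.

### Example: measure the pixel area of each predicted mask.
instances = outputs["instances"].to("cpu")
areas = instances.pred_masks.sum(dim=(1, 2))           # pixels per instance mask
for cls_id, area in zip(instances.pred_classes.tolist(), areas.tolist()):
    print(CLASSES[cls_id], "covers", int(area), "pixels")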
Conclusion: what you can do next with Detectron2
We built a full Detectron2 instance segmentation workflow around fruit images.
We started with polygon annotation and COCO export, registered the dataset with register_coco_instances, and visually validated that the masks line up with the fruit.
That alone already protects you from most beginner mistakes.
We trained Mask R-CNN in two modes:
a CPU-friendly Windows flow with a short schedule (good for testing that everything runs), and a GPU-accelerated Ubuntu / WSL flow with longer training.
That second path is what you’ll want for real accuracy, especially if you’re planning to ship results or make a YouTube demo.
We then reloaded the trained config, attached the final weights, and used DefaultPredictor to run inference on brand-new test images.
We visualized predictions with colored instance masks so we can confirm “yes, Detectron2 can see and segment my apples and bananas.”
From here you can:
- Add more classes (for example: “moldy banana” vs “fresh banana”).
- Train longer or tweak learning rate.
- Deploy the predictor into a small web app or a quality-control pipeline.
- Reuse the same structure for any object category, not just fruit.
If you want a simple segmentation technique without deep learning, I also explain K-Means image segmentation in Python here: https://eranfeit.net/python-image-segmentation-made-easy-with-opencv-and-k-means-algorithm/
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
