Last Updated on 15/11/2025 by Eran Feit
Getting started with Detectron2 custom dataset training
When you move beyond simple demos and pre-trained models, detectron2 custom dataset training is what turns object detection into a real-world tool. Instead of relying only on COCO classes, you teach Detectron2 to recognize the exact objects that matter to your project – fruits on a conveyor belt, products on a shelf, medical objects, or anything else you can label. With the right annotations and configuration, Detectron2 becomes a flexible engine you can adapt to almost any domain.
At a high level, the goal is simple: prepare a custom dataset in COCO format, register it inside Detectron2, pick a model from the model zoo (like Faster R-CNN), and fine-tune it until it performs well on your specific objects. The workflow starts with careful annotation – drawing bounding boxes or polygons around your objects and exporting them as a COCO-style JSON file. Once Detectron2 “understands” the dataset structure, it can load your images, match them with labels, and begin learning how to detect each class.
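If you have never worked with COCO JSON before, it helps to peek inside the exported file before handing it to Detectron2. Here is a small, optional sketch – the file name below is only an example, use whatever name your annotation tool exported – that prints how many images, annotations, and categories the file contains:

import json

# Hypothetical path - replace with the JSON file exported from your annotation tool.
json_path = "Train-custom-Object-Detection-model/Fruits_for_detectron2/Train/annotations.json"

with open(json_path, "r") as f:
    coco = json.load(f)

# A COCO detection file has three main lists: images, annotations, and categories.
print("images      :", len(coco["images"]))
print("annotations :", len(coco["annotations"]))
print("categories  :", [c["name"] for c in coco["categories"]])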
During training, Detectron2 uses a backbone network to extract feature maps, a region proposal network to suggest candidate object locations, and ROI heads to classify and refine bounding boxes. You control the key hyperparameters: number of iterations, learning rate, batch size, and the number of classes that match your custom labels. With just a few lines of configuration code, you can switch from CPU training on Windows to GPU-accelerated training on Ubuntu/WSL, dramatically speeding up experiments while keeping the same dataset and model logic.
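To make that concrete, these are the handful of configuration fields you will touch most often. This is only a minimal sketch of the idea – the full, working configuration function appears later in the tutorial:

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 6      # must match the number of custom labels
cfg.SOLVER.IMS_PER_BATCH = 2             # images per training step
cfg.SOLVER.BASE_LR = 0.00025             # learning rate for fine-tuning
cfg.SOLVER.MAX_ITER = 1000               # number of training iterations
cfg.MODEL.DEVICE = "cpu"                 # switch to "cuda" on a GPU machine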
Finally, once the model has been trained on your custom dataset, you can load the saved configuration and weights to run inference on new images. At this stage, you’re no longer just running a generic “object detector” – you’re running a detector that understands your custom problem. This is where detectron2 custom dataset training really shines: you can test the model on real-world photos, visualize predictions, debug misclassifications, and continuously improve your dataset and configuration until the model reaches production-ready performance.
A friendly overview of detectron2 custom dataset training
Detectron2 custom dataset training is all about building a bridge between your raw images and a powerful object detection model that truly understands them. You start with a collection of images that represent your problem domain – for example, apples, bananas, oranges, strawberries, grapes, and lemons spread across different scenes and lighting conditions. By annotating these images and converting them into COCO format, you give Detectron2 a structured description of where each object is and which class it belongs to.
Once the dataset is annotated, you “register” it inside Detectron2 so the framework knows how to load your images and labels. This registration step connects a dataset name (like my_dataset_train or LP_train) to the image folder and COCO JSON file. From that point on, Detectron2 can iterate over your custom dataset just like it does with built-in datasets such as COCO: it can visualize samples, apply augmentations, and feed them into the training loop.

The next part of the journey is configuration. You choose a base model from the Detectron2 model zoo – for example, COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml – and then customize the configuration object. You set the number of classes to match your labels, decide on the device (CPU or GPU), tune the learning rate, batch size, and number of iterations, and define where to save the trained weights. This configuration encapsulates all the training logic, making it easy to reuse or move between environments like Windows and Ubuntu/WSL.
When training starts, Detectron2 repeatedly loops over your registered dataset, computing losses and updating the model weights so it gradually improves at detecting your custom objects. After training, you load the final weights, set an appropriate prediction threshold, and run inference on new test images. Visualizing the predictions – bounding boxes, labels, and confidence scores – helps you see how well the model learned from your custom dataset. If performance is not yet where you want it, you can refine annotations, collect more diverse images, or tweak the configuration to push your detectron2 custom dataset training to the next level.
Link for the video tutorial : https://youtu.be/MhOWCbwhaYo
Code for the tutorial here : https://eranfeit.lemonsqueezy.com/buy/c4e2e778-bd18-434f-bec5-738914090d33
or here : https://ko-fi.com/s/1fe324a978
Link for Medium users : https://medium.com/object-detection-tutorials/how-to-train-detectron2-on-custom-object-detection-data-61f67bf27b77
You can follow my blog here : https://eranfeit.net/blog/
Want to get started with Computer Vision or take your skills to the next level ?
If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow
If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4

Walking through our Detectron2 custom dataset training code
This tutorial’s code is built as a complete, end-to-end pipeline that guides you from raw images all the way to a working Detectron2 model that understands your custom objects. The main target is to show, in a practical and reusable way, how to go from fruit photos (apples, bananas, strawberries, oranges, grapes and lemons) to a trained Faster R-CNN model that can detect these classes in new images. Instead of focusing only on theory, every step is backed by Python code you can copy, adapt, and run on your own machine.
The first part of the code focuses on dataset creation and registration. You start by annotating your images in COCO format using an online tool, then you connect those annotations to Detectron2 with register_coco_instances. This is where your custom dataset officially becomes “known” to Detectron2: the code tells the framework where the images live, where the JSON labels are stored, and what the dataset should be called for training and validation. There’s also a short visualization snippet that loads the first image and overlays the annotations, so you can visually confirm that your COCO labels are correct before wasting time on training.
The next section of the tutorial explains how to configure and launch training. Here, the code builds a configuration object using Detectron2’s get_cfg and the model zoo file COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml. You’ll see clearly where to set the number of classes, choose the device (CPU or GPU), define batch size, learning rate, and maximum iterations. The training loop itself is handled by DefaultTrainer, which takes the configuration and runs the full training process for you, while the code saves the config to disk so you can reuse it later.
Because many developers switch between Windows and Ubuntu/WSL for deep learning, the tutorial includes a dedicated section for training under WSL with GPU acceleration. The code walks through creating a Conda environment, installing the correct PyTorch and CUDA versions, cloning Detectron2, and then running the same training logic with Linux-style paths. The goal is to make it easy to train the very same Detectron2 custom dataset on a more powerful environment without rewriting your whole script.
Finally, the last part of the code demonstrates how to load the trained configuration and weights to test the model on new images. It shows how to restore the saved cfg from a pickle file, point it to the model_final.pth checkpoint, set a prediction threshold, and run inference through DefaultPredictor. You then visualize the predictions using Visualizer, draw the detected fruits on the test image, and check the predicted class names. By the end of this tutorial code, you have a full loop: annotate, register, train, and test a Detectron2 model on your own custom object detection data.
Getting started with Detectron2 custom dataset training
When you want more than COCO’s default classes, detectron2 custom dataset training is the way to go.
Instead of being limited to “person”, “car”, or “dog”, you can teach Detectron2 to understand the exact objects that matter to you, like apples, bananas, or any domain-specific items in your images.
At a high level, the goal is simple.
You prepare a custom dataset in COCO format, register it inside Detectron2, pick a model from the model zoo (for example Faster R-CNN), and fine-tune it so it recognizes your own classes.
In this tutorial we focus on a fruit dataset with six labels: Apple, Strawberry, Orange, Grapes, Banana, Lemon.
You will see how to annotate images, register the COCO JSON files, configure the model, train on both Windows and Ubuntu/WSL, and finally test the trained model on new images.
By the end of this guide, you’ll have a full detectron2 custom dataset training pipeline that you can easily reuse for other projects.
You can simply swap the image folders and class names, and keep the overall workflow exactly the same.
Walking through our Detectron2 custom dataset training code
The code in this post is designed as a complete, end-to-end tutorial.
It starts from raw fruit images and ends with a trained Faster R-CNN model that can detect those fruits in unseen photos.
First, you annotate the dataset in COCO format and verify that Detectron2 can correctly visualize your annotations.
Next, you configure the training loop, choose the number of classes, and decide whether to run on CPU (Windows) or GPU (Ubuntu/WSL).
The training section uses Detectron2’s DefaultTrainer to handle the heavy lifting.
You only need to define a clean configuration object and tell Detectron2 which datasets to use for training and validation.
Finally, you load the saved configuration and weights and run inference on a test image.
That last step closes the loop: you can actually see bounding boxes and class names drawn on a real image, proving that your detectron2 custom dataset training worked.
Getting your fruit images annotated in COCO format
Here are some sample images for annotation. You can find the dataset along with the code here.




Before Detectron2 can learn anything, it needs labeled data.
In this project we annotate fruit images (Apple, Strawberry, Orange, Grapes, Banana, Lemon) in COCO format using an online tool.
- Collect your images (for example, from Google) and place them under Train-custom-Object-Detection-model/Fruits_for_detectron2 with subfolders Train, Validate, and Test.
- Open an annotation tool such as MakeSense.
- Upload the images from the Train folder, choose Object detection, and draw polygons around each fruit.
- Export the annotations as COCO JSON and save the file into the same Train folder.
- Repeat the process for the Validate folder so you have a second COCO JSON file for validation.
- Leave the Test images without annotations so you can use them later for evaluation and demos.
Here is a compact “code-style” version of those steps if you like to keep your dataset instructions next to your scripts:
# Step 1: Annotation of your images
# Coco format annotation using this site:
# https://www.makesense.ai/
# Dataset: images from Google or your own collection
# Folder under Detectron2:
#   Train-custom-Object-Detection-model/Fruits_for_detectron2
# Labels: Apple, Strawberry, Orange, Grapes, Banana, Lemon

# 1. Open https://www.makesense.ai/
# 2. Upload the train images
# 3. Choose "Object detection"
# 4. Select polygon for each fruit in each image
# 5. Use "Actions" -> "Export annotations" -> choose COCO format
# 6. Copy the JSON file into the Train images folder

# Repeat the same flow for the Validate images.
# The Test images folder can stay without annotations.

Loading COCO data into Detectron2 and visualizing annotations
In this part we register the COCO datasets and confirm that Detectron2 reads the annotations correctly.
This is a crucial sanity check in any detectron2 custom dataset training workflow.
### We import Detectron2, NumPy, and OpenCV so we can work with our dataset and image data.
import detectron2
import numpy as np
import cv2

### We bring in Detectron2's Visualizer and catalog helpers to manage annotations and metadata.
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

### We import the function that lets Detectron2 register COCO-style datasets.
from detectron2.data.datasets import register_coco_instances

### We register the training dataset with its COCO JSON file and image folder.
register_coco_instances(
    "my_dataset_train",
    {},
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json",
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Train",
)

### We register the validation dataset so Detectron2 can later evaluate our model.
register_coco_instances(
    "my_dataset_val",
    {},
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate/labels_my-project-name_2023-12-04-07-39-25.json",
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate",
)

### We pull metadata and the list of records for the training dataset from the catalogs.
train_metdata = MetadataCatalog.get("my_dataset_train")
train_dataset_dicts = DatasetCatalog.get("my_dataset_train")

### We do the same for the validation dataset so we can inspect or use it later.
val_metdata = MetadataCatalog.get("my_dataset_val")
val_dataset_dicts = DatasetCatalog.get("my_dataset_val")

### We import Matplotlib's pyplot in case we want additional plotting options.
from matplotlib import pyplot as plt

### We inspect the first training record to understand its structure.
first_dict = train_dataset_dicts[0]
print(first_dict)

### We extract the basic information from the first record.
filename = first_dict["file_name"]
height = first_dict["height"]
width = first_dict["width"]
image_id = first_dict["image_id"]
annotations = first_dict["annotations"]

### We read the image with OpenCV and prepare a Visualizer that can draw the annotations.
img = cv2.imread(filename)
visualizer = Visualizer(img[:, :, ::-1], metadata=train_metdata, scale=0.5)
vis = visualizer.draw_dataset_dict(first_dict)

### We convert the visualized image back to RGB and keep it in a separate variable.
img2 = vis.get_image()
img_rgb = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)

### We display the annotated image in a window and wait for a key press before closing.
cv2.imshow("img_rgb", img_rgb)
cv2.waitKey(0)

Summary.
At this point Detectron2 knows where your images and labels live, and you have visually confirmed that bounding boxes and polygons are drawn correctly.
If something looks wrong here, fix it before moving on to training.
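If you want to check more than the first image, a small optional extension is to sample a few random records and draw each one – a quick way to spot annotation mistakes across the dataset. This sketch assumes the registration code above has already run, so train_dataset_dicts and train_metdata exist:

import random
import cv2
from detectron2.utils.visualizer import Visualizer

# Draw three random training samples with their annotations.
for record in random.sample(train_dataset_dicts, 3):
    img = cv2.imread(record["file_name"])
    vis = Visualizer(img[:, :, ::-1], metadata=train_metdata, scale=0.5)
    out = vis.draw_dataset_dict(record)
    cv2.imshow("sample", out.get_image()[:, :, ::-1])
    cv2.waitKey(0)
cv2.destroyAllWindows()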
Training Detectron2 on Windows using Faster R-CNN
This section shows how to configure and train Faster R-CNN on Windows, using CPU as the device.
It is a straightforward way to start detectron2 custom dataset training even if you do not yet have a GPU available.
### We import the dataset registration helper and the default trainer for Detectron2.
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

### We import os for file operations and pickle for saving our configuration.
import os
import pickle

### We point to the Faster R-CNN configuration and set a descriptive model name.
config_file_path = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
model_name = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"

### We define the output directory, number of classes, and the device (CPU in this case).
output_dir = r"My-Train-OD-Detectron2"
num_classes = 6
device = "cpu"  # or "cuda" on Linux

### We name the training dataset and set the paths to its images and COCO JSON file.
train_dataset_name = "LP_train"
train_images_path = r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Train"
train_json_annot_path = (
    r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json"
)

### We define the validation dataset and its paths so we can monitor performance.
test_dataset_name = "LP_test"
test_images_path = r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate"
test_json_annot_path = (
    r"Train-custom-Object-Detection-model/Fruits_for_detectron2/Validate/labels_my-project-name_2023-12-04-07-39-25.json"
)

### We register the training dataset with Detectron2 using the COCO helper.
register_coco_instances(
    name=train_dataset_name,
    metadata={},
    json_file=train_json_annot_path,
    image_root=train_images_path,
)

### We register the validation dataset in the same way.
register_coco_instances(
    name=test_dataset_name,
    metadata={},
    json_file=test_json_annot_path,
    image_root=test_images_path,
)

### We import the configuration builder and the Detectron2 model zoo helper.
from detectron2.config import get_cfg
from detectron2 import model_zoo

### We wrap the configuration logic in a function so we can easily reuse or tweak it.
def get_train_cfg(
    a_config_file_path,
    a_model_name,
    a_train_dataset_name,
    a_test_dataset_name,
    a_num_classes,
    device,
    output_dir,
):
    ### We start from a fresh Detectron2 configuration object.
    cfg = get_cfg()

    ### We load the base Faster R-CNN configuration from the model zoo.
    cfg.merge_from_file(model_zoo.get_config_file(a_config_file_path))

    ### We set the initial weights to the checkpoint from the model zoo.
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(a_model_name)

    ### We tell Detectron2 which datasets to use for training and testing.
    cfg.DATASETS.TRAIN = (a_train_dataset_name,)
    cfg.DATASETS.TEST = (a_test_dataset_name,)

    ### We set the number of data loader workers to control CPU usage.
    cfg.DATALOADER.NUM_WORKERS = 2

    ### We define images per batch, learning rate, and max iterations for training.
    cfg.SOLVER.IMS_PER_BATCH = 2
    cfg.SOLVER.BASE_LR = 0.00025
    cfg.SOLVER.MAX_ITER = 1000

    ### We keep SOLVER.STEPS empty so the learning rate does not change mid-training.
    cfg.SOLVER.STEPS = []

    ### We set the number of classes to match our fruit labels.
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = a_num_classes

    ### We choose the compute device and the output directory on disk.
    cfg.MODEL.DEVICE = device
    cfg.OUTPUT_DIR = output_dir

    ### We return the fully configured object for use in training.
    return cfg

### We define the main entry point that sets up and launches training.
def main():
    ### We create the output directory if it does not already exist.
    os.makedirs(output_dir, exist_ok=True)

    ### We build the training configuration with all our chosen parameters.
    cfg = get_train_cfg(
        config_file_path,
        model_name,
        train_dataset_name,
        test_dataset_name,
        num_classes,
        device,
        output_dir,
    )

    ### We define where to save the configuration object for reuse later.
    cfg_save_path = "My-Train-OD-Detectron2/OD_cfg.pickle"

    ### We serialize the configuration to disk using pickle.
    with open(cfg_save_path, "wb") as f:
        pickle.dump(cfg, f, protocol=pickle.HIGHEST_PROTOCOL)

    ### We create the default trainer with our configuration.
    trainer = DefaultTrainer(cfg)

    ### We tell the trainer to start from scratch and then start training.
    trainer.resume_or_load(resume=False)
    trainer.train()

### We call main only when this script is run directly.
if __name__ == "__main__":
    main()

Summary.
This block sets up a Faster R-CNN model, registers your fruit datasets, and runs training on Windows.
You also save the configuration to disk so you can reuse it later for inference or further experiments.
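During and after training, DefaultTrainer writes a metrics.json file into the output directory. As an optional sanity check, you can plot the total loss from that file to see whether training converged. This is a minimal sketch, assuming training has already produced My-Train-OD-Detectron2/metrics.json:

import json
from matplotlib import pyplot as plt

# Each line in metrics.json is one JSON dictionary logged during training.
metrics = []
with open("My-Train-OD-Detectron2/metrics.json", "r") as f:
    for line in f:
        metrics.append(json.loads(line))

iterations = [m["iteration"] for m in metrics if "total_loss" in m]
losses = [m["total_loss"] for m in metrics if "total_loss" in m]

plt.plot(iterations, losses)
plt.xlabel("iteration")
plt.ylabel("total_loss")
plt.title("Detectron2 training loss")
plt.show()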
Training Detectron2 on Ubuntu WSL with GPU acceleration
Once the pipeline works on Windows, you may want faster training using a GPU under Ubuntu/WSL.
This part keeps the same detectron2 custom dataset training logic but switches paths and device settings for the Linux environment.
### We import the Detectron2 helpers for registering datasets and training.
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

### We import standard libraries for file handling and configuration persistence.
import os
import pickle

### Notes: these comments show how to create and prepare a WSL environment for training.
# WSL - Linux
# conda create -n detectorn99 python=3.9
# conda activate detectorn99
# conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
# git clone https://github.com/facebookresearch/detectron2.git
# cd detectron2
# python setup.py build develop
# python Step4-Train-custom-Object-Detection-Ubunto-WSLL.py

### We point again to the Faster R-CNN config and model name from the model zoo.
config_file_path = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
model_name = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"

### We choose an output directory and set the number of classes in this environment.
output_dir = r"My-Train-OD-DetectRon2"
num_classes = 6

### We now use the CUDA device so the GPU accelerates training.
device = "cuda"

### We configure the training dataset name and Linux-style relative paths.
train_dataset_name = "LP_train"
train_images_path = r"../Fruits_for_detectron2/Train"
train_json_annot_path = r"../Fruits_for_detectron2/Train/labels_my-project-name_2023-12-04-07-26-09.json"

### We configure the validation dataset using the validate folder.
test_dataset_name = "LP_test"
test_images_path = r"../Fruits_for_detectron2/Validate"
test_json_annot_path = r"../Fruits_for_detectron2/Validate/labels_my-project-name_2023-12-04-07-39-25.json"

### We register the training dataset so Detectron2 can load images and labels.
register_coco_instances(
    name=train_dataset_name,
    metadata={},
    json_file=train_json_annot_path,
    image_root=train_images_path,
)

### We register the validation dataset in exactly the same way.
register_coco_instances(
    name=test_dataset_name,
    metadata={},
    json_file=test_json_annot_path,
    image_root=test_images_path,
)

### We import the configuration utilities and the model zoo again.
from detectron2.config import get_cfg
from detectron2 import model_zoo

### We define a function that builds a configuration tailored for WSL GPU training.
def get_train_cfg(
    a_config_file_path,
    a_model_name,
    a_train_dataset_name,
    a_test_dataset_name,
    a_num_classes,
    device,
    output_dir,
):
    ### We start from a fresh Detectron2 configuration.
    cfg = get_cfg()

    ### We load the Faster R-CNN base config from the model zoo.
    cfg.merge_from_file(model_zoo.get_config_file(a_config_file_path))

    ### We attach the pretrained weights corresponding to that config.
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(a_model_name)

    ### We point the config to our training and validation datasets.
    cfg.DATASETS.TRAIN = (a_train_dataset_name,)
    cfg.DATASETS.TEST = (a_test_dataset_name,)

    ### We control how many loader workers run in parallel.
    cfg.DATALOADER.NUM_WORKERS = 2

    ### We keep the same batch size and learning rate settings as before.
    cfg.SOLVER.IMS_PER_BATCH = 2
    cfg.SOLVER.BASE_LR = 0.00025

    ### We extend the number of iterations because GPU training is usually faster.
    cfg.SOLVER.MAX_ITER = 3500

    ### We keep steps empty so the learning rate stays constant.
    cfg.SOLVER.STEPS = []

    ### We set the number of classes, device, and output directory.
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = a_num_classes
    cfg.MODEL.DEVICE = device
    cfg.OUTPUT_DIR = output_dir

    ### We return the final configuration.
    return cfg

### We define the main entry point for the WSL training script.
def main():
    ### We create the output directory if it does not exist.
    os.makedirs(output_dir, exist_ok=True)

    ### We build the training configuration with the GPU device selected.
    cfg = get_train_cfg(
        config_file_path,
        model_name,
        train_dataset_name,
        test_dataset_name,
        num_classes,
        device,
        output_dir,
    )

    ### We decide where to save the configuration as a pickle file for later use.
    cfg_save_path = "My-Train-OD-DetectRon2/OD_cfg.pickle"

    ### We write the configuration to disk using the highest pickle protocol.
    with open(cfg_save_path, "wb") as f:
        pickle.dump(cfg, f, protocol=pickle.HIGHEST_PROTOCOL)

    ### We create and launch the default trainer just like we did on Windows.
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()

    ### Optionally, we could call trainer.test() here to run evaluation after training.
    # trainer.test(ckpt=None)

### We run the main function if this script is executed directly.
if __name__ == "__main__":
    main()

# After training, you can copy the My-Train-OD-DetectRon2 folder back to Windows.

Summary.
This part mirrors the Windows training script but runs on a CUDA device inside WSL, which greatly speeds up training.
You keep the same dataset and configuration structure, so moving between environments is easy.
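The commented-out trainer.test() hint in the script above can also be done explicitly with Detectron2's evaluation utilities. Here is a hedged sketch of how you could compute COCO-style AP on the validation set after training; it assumes the trained weights already exist in the output directory, reuses get_train_cfg from the script above, and may need small adjustments depending on your Detectron2 version:

import os
from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

# Rebuild the same configuration used for training and point it at the final weights.
cfg = get_train_cfg(
    config_file_path, model_name, train_dataset_name,
    test_dataset_name, num_classes, device, output_dir,
)
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
predictor = DefaultPredictor(cfg)

# Compute COCO-style AP on the registered validation dataset ("LP_test").
evaluator = COCOEvaluator("LP_test", output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "LP_test")
print(inference_on_dataset(predictor.model, val_loader, evaluator))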
Running inference with your trained Detectron2 model
With training done, the last step is to test the model on new images.
Here we load the saved configuration, attach the final weights, and visualize detections on a test photo.
### We import the predictor class and utilities to load our saved configuration.
from detectron2.engine import DefaultPredictor
import os
import pickle

### We load the configuration we saved earlier from the pickle file.
cfg_save_path = "My-Train-OD-Detectron2/OD_cfg.pickle"
with open(cfg_save_path, "rb") as f:
    cfg = pickle.load(f)

### We set the output directory and point the config to the final trained weights.
output_dir = r"My-Train-OD-Detectron2"
cfg.MODEL.WEIGHTS = os.path.join(output_dir, "model_final.pth")

### We choose a detection confidence threshold and run inference on CPU.
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.4
cfg.MODEL.DEVICE = "cpu"

### We create the Detectron2 predictor object using our configured model.
predictor = DefaultPredictor(cfg)

### We define the list of class names in the same order used during training.
CLASSES = ["Apple", "Strawberry", "Orange", "Grapes", "Banana", "Lemon"]

### We point to a test image that was not used for training or validation.
# image_path = "Train-custom-Object-Detection-model/Fruits_for_detectron2/Test/pexels-pixabay-70746.jpg"
image_path = (
    "Train-custom-Object-Detection-model/Fruits_for_detectron2/Test/apples-vs-bananas.jpg"
)

### We import OpenCV, NumPy, and the Visualizer to draw predictions.
import cv2
import numpy as np
from detectron2.utils.visualizer import Visualizer

### We read the test image and run the predictor to obtain outputs.
im = cv2.imread(image_path)
outputs = predictor(im)

### We print the full outputs dictionary to inspect raw predictions.
print("=========")
print(outputs)
print("=========")

### We extract the predicted class indices as a CPU tensor.
pred_classes = outputs["instances"].pred_classes.cpu()
print("pred_classes : ")
print(pred_classes)

### We convert the tensor of class indices to a NumPy array.
pred_classes = pred_classes.numpy()
print("pred_classes as Numpy : ")
print(pred_classes)

### We check how many predicted instances we have.
flag = np.size(pred_classes)
print("Flag : ")
print(flag)

### If there is at least one prediction, we decode and visualize the first class.
if flag > 0:
    ### We take the first predicted class index.
    pred_classes = pred_classes[0]
    print("pred_classes : ")
    print(pred_classes)

    ### We print the human-readable class name using our CLASSES list.
    print(CLASSES[pred_classes])

    ### We convert the original image to RGB for visualization.
    img_RGB = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

    ### We create a Visualizer and draw the instance predictions on top of the image.
    v = Visualizer(img_RGB, metadata={}, scale=0.4)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    img_bgr = cv2.cvtColor(v.get_image(), cv2.COLOR_RGB2BGR)

    ### We show the result in a window and wait for a key press.
    cv2.imshow("v", img_bgr)
    cv2.waitKey(0)
else:
    ### If there are no predictions, we print a simple message.
    print("Pred class is empty")

### We close all OpenCV windows at the end of the script.
cv2.destroyAllWindows()

Summary.
This script proves your model works on real images by displaying bounding boxes and class labels for detected fruits.
If you see no detections, you can revisit the training configuration, data quality, or threshold settings to improve results.
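One optional refinement: in the inference script above, the Visualizer receives an empty metadata object, so the drawn labels show class indices rather than names. A small hedged tweak is to register the class names as metadata and pass that object to the Visualizer, so the boxes read "Apple", "Banana", and so on. This sketch reuses CLASSES, img_RGB, and outputs from the script above and assumes the CLASSES list matches the category order in your COCO JSON:

from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer

# Attach our class names to a throwaway metadata entry used only for drawing.
fruit_metadata = MetadataCatalog.get("fruit_inference").set(thing_classes=CLASSES)

v = Visualizer(img_RGB, metadata=fruit_metadata, scale=0.4)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imshow("labeled", cv2.cvtColor(v.get_image(), cv2.COLOR_RGB2BGR))
cv2.waitKey(0)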
FAQ :
What does Detectron2 custom dataset training mean?
It means registering your own labeled images as a dataset and fine-tuning a Detectron2 model so it can detect your custom object classes.
Do I need a COCO-style JSON file for my dataset?
Using COCO-style JSON files is highly recommended because Detectron2 provides built-in helpers like register_coco_instances that expect this structure.
Which model should I start with for custom object detection?
Faster R-CNN with a ResNet-50 backbone is a solid starting point because it balances accuracy and training time for many custom detection tasks.
Can I train a Detectron2 model on CPU only?
Yes, you can train on CPU, but it will be slower; if possible, move to a GPU-enabled Ubuntu or WSL setup for faster experiments.
Why do my Detectron2 predictions look empty?
Empty predictions usually mean the score threshold is too high, the model is undertrained, or the classes in your CLASSES list do not match the labels in the JSON file.
How do I change the number of classes in my model?
Update the categories in your COCO JSON, set cfg.MODEL.ROI_HEADS.NUM_CLASSES to the new value, and retrain so the classifier head matches your dataset.
What learning rate should I use for Detectron2 training?
A base learning rate around 0.00025 is a good starting point for fine-tuning; you can adjust it up or down based on loss curves and model stability.
How many iterations are enough for training?
For small custom datasets, 1,000–3,500 iterations often work well, but you should watch the validation loss and stop when it stops improving.
Can I reuse the same code for a different dataset?
Yes, you can reuse the same scripts by changing the image paths, JSON files, and class list, keeping the rest of the Detectron2 configuration identical.
How do I deploy a trained Detectron2 model?
You can load the trained weights and config in a lightweight prediction script or wrap them in a web API, then send images to the model for inference in production.
Conclusion — from raw fruit images to a custom Detectron2 model
Training Detectron2 on custom object detection data may seem complex at first, but once you break it into clear steps the process becomes very manageable.
You start by annotating images in COCO format, making sure every object you care about is labeled consistently and stored in clean train, validation, and test splits.
Next, you register those datasets in Detectron2 and visualize a few samples to confirm that your boxes and polygons look correct.
This small sanity check saves a lot of time later, because most training problems come from dataset or label issues rather than from the model itself.
With the data confirmed, you configure Faster R-CNN, choose your number of classes, and decide whether to run on CPU (Windows) or GPU (Ubuntu/WSL).
The same configuration pattern works across both environments, so you can prototype locally and then move to a more powerful machine without rewriting your code.
Finally, you load the saved configuration and weights, run inference on new images, and actually see your custom classes detected in the wild.
At that point you have a complete detectron2 custom dataset training pipeline that you can reuse for new datasets, tune with better hyperparameters, or extend into more advanced tasks like instance or panoptic segmentation.
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
