Last Updated on 26/10/2025 by Eran Feit
Introduction – Detectron2: what it is and why it’s useful

Detectron2 is Facebook AI Research’s modern computer-vision framework built on PyTorch.
It focuses on object detection, instance segmentation, semantic segmentation, panoptic segmentation, and keypoint detection.
Think of it as a toolkit of proven research models plus a clean training and inference engine.
You get state-of-the-art architectures, strong defaults, and a flexible way to customize your own projects.
Detectron2 instance segmentation gives you pixel-accurate masks for every object in an image, not just bounding boxes, so you can measure areas, crop precisely, and visualize scenes clearly. Built on PyTorch, Detectron2 ships with reliable Mask R-CNN configs from the model zoo, letting you go from setup to first results in minutes. You can begin on CPU for quick tests and then switch to CUDA for speed, while keeping the exact same workflow. With sensible defaults, clean APIs (e.g., DefaultPredictor and Visualizer), and easy dataset registration, Detectron2 instance segmentation is a practical choice for both beginners prototyping and teams pushing models into production.
Core ideas in plain language
Detectron2 wraps complex vision models in a consistent, PyTorch-friendly API.
It separates the “what” (model architecture and losses) from the “how” (data loading, training loops, evaluation).
This means you can swap models, datasets, and hyperparameters without rewriting your pipeline.
It’s designed for both research iteration and practical production use.
What tasks can it solve out of the box
- Object detection with models like Faster R-CNN and RetinaNet.
- Instance segmentation with Mask R-CNN, typically built on Feature Pyramid Network (FPN) backbones.
- Keypoint detection for humans and other articulated objects.
- Semantic and panoptic segmentation for scene understanding.
Each task comes with reference configs and pre-trained weights.
You can start from a working baseline and adapt as needed.
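To make this concrete, here is a minimal sketch (assuming a working Detectron2 install) of how switching tasks is mostly a matter of swapping one zoo config path; the paths below are real model-zoo entries:

### Pick one task; only the config path changes.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

config_path = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"              # object detection
# config_path = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"   # instance segmentation
# config_path = "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"          # keypoint detection
# config_path = "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml"    # panoptic segmentation

### Build a predictor from the chosen baseline.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_path))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_path)
cfg.MODEL.DEVICE = "cpu"  # or "cuda" on a GPU machine
predictor = DefaultPredictor(cfg)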
The model zoo and configurations
The “model zoo” is a curated set of YAML or LazyConfig files that define tested pipelines.
A config typically specifies the backbone, head, image sizes, augmentations, solver, and evaluation settings.
You can merge_from_file a zoo config and then override only a few lines to fit your data or hardware.
This pattern helps you stay close to well-known, reproducible setups.
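Here is what that override pattern looks like in practice, as a small sketch built on the Mask R-CNN config used later in this tutorial; the threshold and batch-size values are illustrative, not tuned recommendations:

### Start from a tested zoo config, then override only a few lines.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")

### Overrides to fit your data or hardware (illustrative values).
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # confidence cutoff at test time
cfg.SOLVER.IMS_PER_BATCH = 2                 # smaller batches for modest GPUs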
Data handling and metadata
Detectron2 uses DatasetCatalog to register datasets and MetadataCatalog to store things like class names and colors.
COCO format is first-class, but you can register custom formats with a small adapter function.
Once registered, the same training and evaluation code works across datasets.
This uniformity removes a lot of glue code and reduces mistakes.
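As a sketch of that registration step, assuming a COCO-format dataset with hypothetical paths my_annotations.json and my_images/:

### Register a COCO-format dataset once; both paths below are hypothetical.
from detectron2.data.datasets import register_coco_instances
from detectron2.data import MetadataCatalog

register_coco_instances("my_dataset_train", {}, "my_annotations.json", "my_images/")

### Class names and colors are now reachable everywhere by dataset name.
print(MetadataCatalog.get("my_dataset_train"))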
Training, evaluation, and visualization
The DefaultTrainer provides a strong baseline for training, checkpointing, and validation.
You can extend it when you need custom hooks, schedulers, or losses.
For quick inference, DefaultPredictor bundles preprocessing, model forward, and postprocessing into one call.
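To show how the pieces fit, here is a minimal fine-tuning sketch that reuses the hypothetical dataset registered above; the iteration budget and class count are placeholders, not recommendations:

### Minimal DefaultTrainer setup; values marked below are placeholders.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("my_dataset_train",)   # hypothetical dataset from earlier
cfg.DATASETS.TEST = ()
cfg.SOLVER.MAX_ITER = 1000                   # placeholder training budget
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3          # placeholder: set to your class count

### Train with checkpointing and logging handled for you.
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()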
Prefer starting with detection before masks? Here’s my Jetson Nano real-time detection walkthrough.
Link: https://eranfeit.net/how-to-classify-objects-in-live-camera-using-jetson-nano/
Tutorial Introduction
This post walks you through Detectron2 instance segmentation using Mask R-CNN in Python.
You will learn how to prepare your environment, load an image, configure a pre-trained model from the COCO model zoo, and visualize predictions with overlays and masks.
The flow is clean, beginner-friendly, and focused on getting a correct result quickly, even on CPU.
Because our title promises an effortless and friendly path, each step explains the “why” right before the code and keeps the commands readable and copy-paste ready.
By the end, you will confidently run Detectron2 instance segmentation end-to-end, save the output, and understand how to adapt it to your own images.
The tutorial in four parts:
Introduction to the code
We’ll move from imports → image prep → model config → visualization and saving.
Each block starts with a short explanation and includes small “###” notes above each command.
Keep your image local and adjust the resize percent if you’re on a slower machine.
When you’re ready for speed, swap the device to CUDA.
You can download the code here: https://ko-fi.com/s/ef766d9ce2
You can find more tutorials in my blog: https://eranfeit.net/blog/
🚀 Want to get started with Computer Vision or take your skills to the next level?
If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow
If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4
Getting the essentials in place
Short description: We import PyTorch, Detectron2, OpenCV, and helpers.
This makes the rest of the tutorial predictable and easy to follow.
### We import PyTorch because Detectron2 runs on top of it.
import torch

### We bring in Detectron2 to access configs, models, and helpers.
import detectron2

### NumPy helps with array math used by OpenCV and visuals.
import numpy as np

### OS and JSON can help with paths and simple metadata if needed later.
import os
import json

### OpenCV handles reading, resizing, showing, and saving images.
import cv2

### Random is handy for quick experiments or seeding.
import random

### Pull standard configs and weights from the Detectron2 model zoo.
from detectron2 import model_zoo

### DefaultPredictor bundles preprocessing, model forward, and postprocessing.
from detectron2.engine import DefaultPredictor

### Build and customize a Detectron2 configuration.
from detectron2.config import get_cfg

### Draw masks, boxes, and labels on images.
from detectron2.utils.visualizer import Visualizer

### Dataset metadata for class names and colors.
from detectron2.data import MetadataCatalog, DatasetCatalog

Summary:
We now have everything we need to run Detectron2 instance segmentation with minimal setup.
Next, we’ll bring in an image and make it small enough for fast CPU testing.
Load and gently resize the image
Short description: We read a local image and scale it down to speed up inference on CPU while keeping the aspect ratio.

### Point to an image on disk (replace with your path if needed).
imagePath = "pexels-brett-sayles-1115171.jpg"

### Read the image in BGR (OpenCV default).
myImage = cv2.imread(imagePath)

### Choose a scaling percentage to keep things snappy on CPU.
scale_percent = 30

### Compute the new width from the original width and scale percent.
width = int(myImage.shape[1] * scale_percent / 100)

### Compute the new height from the original height and scale percent.
height = int(myImage.shape[0] * scale_percent / 100)

### Combine into (width, height) as expected by OpenCV.
dim = (width, height)

### Resize with area interpolation for good downscaling quality.
myImage = cv2.resize(myImage, dim, interpolation=cv2.INTER_AREA)

Summary:
A smaller image makes everything feel responsive.
This is perfect for learning and quick experiments.
Curious about classic clustering? I also explain K-means image segmentation with simple Python code.
Link: https://eranfeit.net/python-image-segmentation-made-easy-with-opencv-and-k-means-algorithm/
Configure Detectron2 for Mask R-CNN on COCO
Short description: We load a trusted COCO config, attach its weights, choose CPU, and run a single forward pass.
### Start a fresh configuration object.
cfg_inst = get_cfg()

### Pull a solid, well-known Mask R-CNN config (R50-FPN, 3x schedule).
cfg_inst.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))

### Attach the matching pre-trained weights URL from the model zoo.
cfg_inst.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")

### Keep it accessible: run on CPU (switch to "cuda" when you’re ready).
cfg_inst.MODEL.DEVICE = "cpu"  # remove or set to "cuda" if you have a GPU

### Build an easy one-call predictor for preprocessing → model → postprocessing.
predictor = DefaultPredictor(cfg_inst)

### Run inference to get instances, masks, boxes, classes, and scores.
outputs = predictor(myImage)

Summary:
We now have predictions in outputs.
Next, we’ll draw them and save the result.
Want a transformer-based classifier to compare? Here’s my Vision Transformer image classifier tutorial.
Link: https://eranfeit.net/build-an-image-classifier-with-vision-transformer/
Visualize and save your results
Short description: We draw masks and labels with Visualizer, show the images, and save a PNG for your notes.
### Prepare a visualizer with RGB input and COCO metadata for class names.
v = Visualizer(myImage[:, :, ::-1], MetadataCatalog.get(cfg_inst.DATASETS.TRAIN[0]), scale=1.0)

### Draw predictions (masks, boxes, labels) on CPU.
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))

### Convert back to BGR for OpenCV display and saving.
img = out.get_image()[:, :, ::-1]

### Show original and predictions side by side in windows.
cv2.imshow("img", myImage)
cv2.imshow("predict", img)

### Save the visualized predictions to a file you can share or compare later.
cv2.imwrite("e:/temp/segmented.png", img)

### Keep the windows open until a key is pressed.
cv2.waitKey(0)

Summary:
You’ve completed a clean end-to-end Detectron2 instance segmentation pass.
Tweak thresholds, try other configs, or enable CUDA as your next step.
Here is the result:

FAQ:
What is instance segmentation?
Pixel-level masks for each object, not just boxes—great for measurement and clean overlays.
Why choose Detectron2?
Reliable models, clear configs, and tools that make inference and experiments straightforward.
Can I run this on CPU?
Yes. It’s slower than GPU but fine for learning and small images.
How do I enable CUDA?
Install CUDA-compatible PyTorch/Detectron2 and set the device to “cuda.” Match versions carefully.
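A quick sketch of picking the device automatically, using the cfg_inst from this tutorial:

### Fall back to CPU when no GPU is visible to PyTorch.
import torch
cfg_inst.MODEL.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"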
Which config should I start with?
mask_rcnn_R_50_FPN_3x is a balanced, well-tested starting point.
Should I resize images first?
Yes for quick iterations. Increase size later for more detail.
Where do class names come from?
The COCO metadata provided via MetadataCatalog in Detectron2.
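For example, you can peek at those names directly (a sketch assuming the cfg_inst from this tutorial):

### thing_classes holds the 80 COCO category names in order.
from detectron2.data import MetadataCatalog
coco_metadata = MetadataCatalog.get(cfg_inst.DATASETS.TRAIN[0])
print(coco_metadata.thing_classes[:5])  # first few names, e.g. person, bicycle, ...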
How do I filter low-confidence results?
Set cfg_inst.MODEL.ROI_HEADS.SCORE_THRESH_TEST (e.g., 0.5) before creating the predictor.
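In code that is a one-line override before building the predictor (0.5 is an illustrative cutoff, not a tuned value):

### Keep only detections scoring at least 0.5.
cfg_inst.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg_inst)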
How do I save the overlay?
Use cv2.imwrite() on the visualized frame and confirm the output path exists.
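For instance, creating the folder first avoids a silent failure, since cv2.imwrite() typically returns False rather than raising when the directory is missing (a sketch with a hypothetical output path):

### Make sure the folder exists before writing; out_path is hypothetical.
import os
out_path = "output/segmented.png"
os.makedirs(os.path.dirname(out_path), exist_ok=True)
cv2.imwrite(out_path, img)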
Can I process a folder automatically?
Yes—loop over files, call predictor, save results, and skip on-screen windows for speed.
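A compact sketch of that loop, assuming a hypothetical images/ folder of JPEGs plus the predictor, cfg_inst, Visualizer, and MetadataCatalog from this tutorial:

### Batch-process every JPEG in a folder; the folder names are hypothetical.
import glob
import os

os.makedirs("results", exist_ok=True)
for path in glob.glob("images/*.jpg"):
    image = cv2.imread(path)
    outputs = predictor(image)
    v = Visualizer(image[:, :, ::-1], MetadataCatalog.get(cfg_inst.DATASETS.TRAIN[0]), scale=1.0)
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2.imwrite(os.path.join("results", os.path.basename(path)), out.get_image()[:, :, ::-1])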
Conclusion:
You just ran Detectron2 instance segmentation end-to-end with Mask R-CNN on a single image.
The steps were small and deliberate: import the tools, prep the image, load a proven COCO config, draw the results, and save a clean output.
That rhythm scales well.
You can raise the image size, switch to CUDA, try stricter score thresholds, or test different model-zoo backbones without changing your mental model.
When you’re ready for datasets, this same structure—inputs, config, predict, visualize—keeps your code tidy and your experiments honest.
Connect
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email: feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
