
Detectron2 Panoptic Segmentation Made Easy for Beginners

Panoptic Segmentation

Last Updated on 26/10/2025 by Eran Feit

Why panoptic segmentation matters


Panoptic segmentation sounds like a big research term, but the goal is actually very intuitive: understand the entire scene, pixel by pixel.

Most people start with two related ideas in computer vision:

  • Semantic segmentation: label every pixel with a class.
    For example, “all these pixels are road,” “all those pixels are sky,” “all these pixels are grass.”
    But it doesn’t tell you which car is which or how many people are in the frame.
    Every “person” pixel is just “person.”
  • Instance segmentation: find each object separately.
    Here we don’t just say “person.”
    We say “Person 1,” “Person 2,” “Person 3,” and we draw a separate mask for each of them.
    But instance segmentation usually focuses on “things” (countable objects) and typically ignores background regions like sky or road at the pixel level.

Panoptic segmentation gives you both at the same time.
It gives you full-scene understanding by:

  1. Assigning a semantic label to every single pixel (road, sky, wall, grass, etc.).
  2. Separating out each distinct object instance (Person 1 vs Person 2, Car 1 vs Car 2, etc.).

So instead of just saying “there are cars here,” panoptic segmentation can tell you:

  • exactly which pixels belong to Car 1,
  • which belong to Car 2,
  • which pixels are road,
  • which pixels are sidewalk,
  • which pixels are building,
  • and which pixels are sky.

In other words, panoptic segmentation answers two questions at once:

  • “What is everything in this image?”
  • “Where exactly is each thing in the image?”

That’s why it’s so valuable in robotics, autonomous driving, retail analytics, safety zones, sports analytics, medical imaging, and more.
You don’t just detect objects.
You understand the full layout of the environment.

For another hands-on look at segmentation in real projects, you can also check out my full U-Net medical segmentation walkthrough in TensorFlow and Keras, where I segment polyps in colonoscopy images step by step: https://eranfeit.net/u-net-medical-segmentation-with-tensorflow-and-keras-polyp-segmentation/

Where Detectron2 comes in

Now let’s connect this to Detectron2.

Detectron2 is a computer vision framework from Meta AI (FAIR) that gives you high-quality pretrained models for tasks like object detection, instance segmentation, semantic segmentation — and yes, panoptic segmentation.

What makes Detectron2 powerful for panoptic segmentation is that you don’t have to build or train a giant neural network yourself.
You can load a “panoptic segmentation” model from the Detectron2 model zoo, point it at an image, and it will:

  • find each individual object instance (people, cars, bikes, etc.),
  • label broad background regions (road, sidewalk, sky, building),
  • and output one unified map where every pixel in the image is explained.

This is sometimes called “things and stuff.”
“Things” are countable objects like a person or a car.
“Stuff” is background material like grass, road, or sky.
Detectron2’s panoptic models give you both, merged into a single output.

Why is that a big deal for beginners?

Because normally, to get a result like that, you’d have to:

  • collect a labeled dataset,
  • train a segmentation model,
  • tune it,
  • and write a lot of post-processing code to merge semantic masks and instance masks.

With Detectron2 panoptic segmentation, you skip all of that.
You:

  1. load a pretrained panoptic model config,
  2. load its pretrained weights,
  3. run inference on your own image,
  4. and visualize the result.

That’s it.

Even better, you can force Detectron2 to run on CPU.
So you don’t need a GPU to start learning.
You get full-scene pixel masks on a normal machine.

So when we say “Detectron2 Panoptic Segmentation Made Easy for Beginners,” this is what we mean:
You’re getting research-level scene understanding — every pixel labeled, every object separated — using code that is short, readable, and practical for real projects.

If you’re more into real-time object detection instead of full-scene pixel mapping, I also have a Jetson Nano tutorial where I classify objects directly from a live camera feed using Python and OpenCV: https://eranfeit.net/how-to-classify-objects-in-live-camera-using-jetson-nano/


You can find more tutorials in my blog : https://eranfeit.net/blog/

🚀 Want to get started with Computer Vision or take your skills to the next level?

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4



Getting the image ready so Detectron2 can analyze it

Introduction to this part

Before we run any AI model, we need an image in memory.
We’ll use OpenCV (cv2) to read an image from disk.
Then we’ll scale it down to make inference faster, especially if we’re running on CPU.
High-resolution photos from phones can be 4000+ pixels wide, and that slows everything down.

In this section we also import all the libraries we’ll use.
We pull in PyTorch, Detectron2 utilities, OpenCV, and NumPy.
We prepare a target scale (30% of the original size) and compute the resized width and height.
Then we store the resized image back into the same variable, so downstream code only deals with one image.

Resizing is not just about speed.
It also helps the visualizer produce cleaner overlays at a comfortable display size.
When you’re testing locally, smaller images are easier to view in debug windows, save to disk, and share in reports.

This step is important for beginners because it makes the project run smoothly even without a GPU.
A leaner image → fewer pixels → faster segmentation.

Elaborated description of what happens in the code

We import core libraries.
We define the path to the input image.
We read that image with cv2.imread().
We calculate the new dimensions using the scale_percent value.
We perform cv2.resize() with an interpolation mode that is good for shrinking.
At the end, the variable myNewImage contains the resized frame we’ll feed into Detectron2.

If you want a simpler approach that doesn’t even need deep learning weights, I also wrote about K-Means image segmentation in OpenCV, where we cluster pixel colors to split regions: https://eranfeit.net/python-image-segmentation-made-easy-with-opencv-and-k-means-algorithm/

Here is our test image :

Test Image for segmentation
### Import core PyTorch.
import torch

### Import Detectron2 main package so we can access its vision tools.
import detectron2

### Import NumPy for array operations that support OpenCV and general math.
import numpy as np

### Import OpenCV for reading, resizing, showing, and saving images.
import cv2

### Bring in Detectron2's model zoo for pretrained configs and weights.
from detectron2 import model_zoo

### Import DefaultPredictor, which wraps model loading and inference.
from detectron2.engine import DefaultPredictor

### get_cfg gives us a fresh Detectron2 config we can modify.
from detectron2.config import get_cfg

### Visualizer draws segmentation masks, labels, and colors on top of images.
from detectron2.utils.visualizer import Visualizer

### MetadataCatalog stores dataset metadata (class names, colors) used by Visualizer.
from detectron2.data import MetadataCatalog

### Path to the image we want to segment.
imagePath = "pexels-brett-sayles-1115171.jpg"

### Read the image from disk into memory as a NumPy array (BGR color order).
myNewImage = cv2.imread(imagePath)

### Choose how much to scale down the image (percentage of original size).
scale_percent = 30

### Compute new width based on original width * scale_percent / 100.
width = int(myNewImage.shape[1] * scale_percent / 100)

### Compute new height based on original height * scale_percent / 100.
height = int(myNewImage.shape[0] * scale_percent / 100)

### Create a size tuple in (width, height) order for OpenCV.
dim = (width, height)

### Resize the original image to the new dimensions using INTER_AREA (good for shrinking).
myNewImage = cv2.resize(myNewImage, dim, interpolation=cv2.INTER_AREA)

Summary of this part

We imported everything we need for vision and segmentation.
We loaded an input image.
We resized it down to 30% so it will run faster on CPU.
Now we have a clean image array (myNewImage) ready for Detectron2 to analyze in the next step.


Loading a pretrained panoptic model from Detectron2

Introduction to this part

This section is where Detectron2 really shines for beginners.
Instead of training a neural network from scratch, we load a pretrained panoptic segmentation model from the Detectron2 model zoo.
That model already “knows” common objects and background categories, because it was trained on a large dataset.

We’ll create a config object using get_cfg().
Then we’ll merge in a specific config file for panoptic segmentation: COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml.
That file describes which architecture to use, how the model is structured, and how it was trained.

We also point the config to the matching pretrained weights using model_zoo.get_checkpoint_url().
That gives us a ready-to-run checkpoint.

Finally, we tell Detectron2 to run on "cpu".
That line is very beginner-friendly because it means you do not need CUDA or a dedicated GPU.
It will run slower, but it will run.

Elaborated description of what happens in the code

We create cfg_pan = get_cfg().
We call cfg_pan.merge_from_file(...) to inject all the model details into our config.
We assign cfg_pan.MODEL.WEIGHTS so the predictor knows which trained weights to load.
We set cfg_pan.MODEL.DEVICE = "cpu" so inference stays on the CPU.
Then we create a DefaultPredictor(cfg_pan) object, which bundles preprocessing and inference logic for us.

After this block, we have a predictor that can generate panoptic segmentation with one line: predictor(myNewImage).

### Create a new Detectron2 config object.
cfg_pan = get_cfg()

### Load the predefined panoptic segmentation config from the model zoo.
cfg_pan.merge_from_file(model_zoo.get_config_file("COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml"))

### Set the pretrained weights for that exact config so we don't have to train anything.
cfg_pan.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml")

### Force Detectron2 to run on CPU.
### If you have CUDA available, you can remove this line or set it to "cuda".
cfg_pan.MODEL.DEVICE = "cpu"

### Build a DefaultPredictor.
### This object handles preprocessing, running the model, and postprocessing.
predictor = DefaultPredictor(cfg_pan)

Summary of this part

We prepared a Detectron2 panoptic segmentation model using a known config and pretrained weights.
We locked it to CPU for maximum accessibility.
We wrapped the model in DefaultPredictor, so inference becomes a single function call.

If you want to focus on classification instead of segmentation, you can read my TensorFlow image classification tutorial where I compare ResNet50 and MobileNet, explain training, and walk through evaluation: https://eranfeit.net/tensorflow-image-classification-tutorial-resnet50-vs-mobilenet/


Running panoptic segmentation and drawing the results

Introduction to this part

Now we’ll actually run inference.
When we call predictor(myNewImage), Detectron2 returns a dictionary of outputs.
For panoptic segmentation models, that dictionary includes "panoptic_seg", which itself contains two important pieces.
The first item is a tensor that maps each pixel to a segment id.
The second is segments_info, a list of dictionaries that tells us which class each segment belongs to.

After we get those results, we use Visualizer to create a color overlay.
Visualizer knows the dataset’s class names and colors by reading from MetadataCatalog.
It draws masks, outlines, and labels onto our image so we can understand what the model saw.

Elaborated description of what happens in the code

We call the predictor and unpack:
panoptic_seg, segments_info = predictor(myNewImage)["panoptic_seg"].

We create a Visualizer with:

  • The image in RGB order (OpenCV uses BGR, so we reverse the last channel order [:, :, ::-1]).
  • The training dataset metadata from cfg_pan.
  • A scale parameter for how large to draw elements.

We then ask Visualizer to draw panoptic predictions.
It returns an object whose .get_image() method gives us the final colorized image.
We convert back to BGR for OpenCV display and saving.

### Run the predictor on our resized image to get panoptic segmentation output.
panoptic_seg, segments_info = predictor(myNewImage)["panoptic_seg"]

### Create a Visualizer that can draw masks, labels, and colors.
### Note: We convert BGR (OpenCV) to RGB (Visualizer expects RGB).
v = Visualizer(myNewImage[:, :, ::-1], MetadataCatalog.get(cfg_pan.DATASETS.TRAIN[0]), scale=1.0)

### Ask the Visualizer to draw the full panoptic segmentation (things + stuff).
out = v.draw_panoptic_seg_predictions(panoptic_seg.to("cpu"), segments_info)

### Convert the visualized result back to BGR so OpenCV can display and save it correctly.
img = out.get_image()[:, :, ::-1]

Summary of this part

We ran Detectron2 panoptic segmentation on our image and generated a pixel-accurate scene map.
We visualized every object and every background region in different colors.
The result lives in img, which is now a human-readable overlay.


Previewing and saving your segmented scene

Introduction to this part

Once we have the result, we want to see it and keep it.
In this final step we show two OpenCV windows.
One window displays the original (resized) image.
The other window displays the panoptic segmentation overlay with all the colored masks.
Then we write the overlay to disk as panoptic.png.

This is the “proof moment.”
You can visually confirm that Detectron2 understood your scene: people are separated from background, cars are isolated, sky is identified, etc.
For beginners, this feedback loop is motivating because you immediately see something real from just a few lines of code.

Elaborated description of what happens in the code

cv2.imshow("img", myNewImage) opens a window called “img” with the resized original frame.
cv2.imshow("predict", img) opens a window called “predict” with the panoptic overlay.
cv2.imwrite("e:/temp/panoptic.png", img) saves the overlay to disk so you can reuse it in reports, blog posts, or presentations.
cv2.waitKey(0) pauses the script until you press a key, so the windows don’t disappear immediately.

You can also skip the imshow steps and just keep imwrite if you’re running on a server without a display.

### Show the original resized image in a window called "img".
cv2.imshow("img", myNewImage)

### Show the panoptic segmentation result in a window called "predict".
cv2.imshow("predict", img)

### Save the segmentation overlay to disk so you can use it later.
cv2.imwrite("e:/temp/panoptic.png", img)

### Wait for a key press so the OpenCV windows stay open.
cv2.waitKey(0)

Summary of this part

We previewed both the input and the output.
We saved the final “panoptic view” as an image file.
At this point, you’ve gone from raw photo to full-scene segmentation without training anything, and you did it in a CPU-friendly way.

The result :

panoptic segmentation result

FAQ :

What is panoptic segmentation?

Panoptic segmentation labels every pixel in the image. It combines semantic segmentation (stuff like road or sky) and instance segmentation (separate objects like Person 1 vs Person 2).

Why are we using Detectron2?

Detectron2 gives us pretrained, production-grade vision models. We can run panoptic segmentation with just a few lines of Python, no custom training required.

Can I run this on CPU?

Yes. Setting cfg_pan.MODEL.DEVICE = "cpu" forces inference on CPU so you can test without a GPU. It’s slower, but it works for demos and screenshots.

Which model are we loading?

We’re using a COCO panoptic model from the Detectron2 model zoo (panoptic_fpn_R_101_3x.yaml). It’s already trained to recognize common objects and background classes.

What does segments_info mean?

segments_info tells us what each region in the prediction represents. It includes the object category and whether it’s a ‘thing’ (like a person) or ‘stuff’ (like sky or road).
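Here is a minimal sketch, assuming the predictor, cfg_pan, and myNewImage from earlier in this post, that prints a readable name for every segment. The keys id, isthing, and category_id come from Detectron2’s panoptic output, and the class-name lists come from the dataset metadata.

### Look up the COCO metadata that the panoptic model was trained with.
metadata = MetadataCatalog.get(cfg_pan.DATASETS.TRAIN[0])

### Run inference and unpack the panoptic output.
panoptic_seg, segments_info = predictor(myNewImage)["panoptic_seg"]

### Print what each segment represents.
for seg in segments_info:
    if seg["isthing"]:
        ### "Things" are countable objects, indexed into thing_classes.
        name = metadata.thing_classes[seg["category_id"]]
    else:
        ### "Stuff" is background material, indexed into stuff_classes.
        name = metadata.stuff_classes[seg["category_id"]]
    print(f"segment {seg['id']}: {name} ({'thing' if seg['isthing'] else 'stuff'})")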

Why do we resize the input image?

Resizing to 30% cuts down the number of pixels. With fewer pixels, inference runs faster, especially on CPU, and the visualizer stays responsive.

Can this run on video?

Yes. You can loop through each frame of a video, run predictor(frame), draw the panoptic overlay, and export the processed frames as a new video file.
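For example, here is a minimal sketch of that loop, assuming the predictor and cfg_pan from earlier in this post; the file names input.mp4 and panoptic_video.mp4 and the 25 fps output rate are placeholders.

### Open the source video (placeholder file name).
cap = cv2.VideoCapture("input.mp4")
writer = None

while True:
    ### Read the next frame; stop when the video ends.
    ok, frame = cap.read()
    if not ok:
        break

    ### Run panoptic segmentation on this frame.
    panoptic_seg, segments_info = predictor(frame)["panoptic_seg"]

    ### Draw the overlay (BGR -> RGB for the Visualizer, then back to BGR).
    v = Visualizer(frame[:, :, ::-1], MetadataCatalog.get(cfg_pan.DATASETS.TRAIN[0]), scale=1.0)
    out = v.draw_panoptic_seg_predictions(panoptic_seg.to("cpu"), segments_info)
    result = np.ascontiguousarray(out.get_image()[:, :, ::-1])

    ### Create the video writer once we know the output frame size.
    if writer is None:
        h, w = result.shape[:2]
        writer = cv2.VideoWriter("panoptic_video.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25.0, (w, h))

    ### Append the processed frame to the output video.
    writer.write(result)

cap.release()
if writer is not None:
    writer.release()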

Can I save results without opening windows?

Absolutely. You can skip cv2.imshow and just call cv2.imwrite to store the visualized output as an image file.

How is this different from bounding boxes?

Bounding boxes only draw rectangles around objects. Panoptic segmentation creates pixel-accurate masks for each object and also labels the background areas in the scene.

What should I learn next?

The next step is custom training. You can register your own dataset in Detectron2 and fine-tune panoptic segmentation for your specific classes and environment.
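As a rough starting point, here is a hedged sketch of how dataset registration and fine-tuning look in Detectron2, using register_coco_instances and DefaultTrainer with a Mask R-CNN instance segmentation config. The dataset name, annotation file, image folder, and class count are hypothetical placeholders, and note that this covers instance-style ("things") annotations only; full panoptic training also requires semantic "stuff" annotations.

### Register a COCO-format dataset (all names and paths below are placeholders).
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

register_coco_instances("my_dataset_train", {}, "annotations/train.json", "images/train")

### Start from a pretrained config, the same way we did for the panoptic model.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")

### Point training at the registered dataset and set a small training schedule.
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 1000
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3   ### placeholder: the number of classes in your dataset

### Train (DefaultTrainer handles the data loader, optimizer, and checkpoints).
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()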


Conclusion — how to move from demo to real projects

You now have a working panoptic segmentation pipeline that is friendly for beginners.
You imported Detectron2, loaded a pretrained model from the model zoo, resized your input image, ran inference on CPU, and generated a full-scene segmentation map with colored overlays.

This approach shows why panoptic segmentation is so useful.
Instead of guessing what’s in the frame with bounding boxes, you actually understand every pixel.
You know which areas are road, which pixels belong to a person, where the sky is, and how many separate people appear in the scene.
That is extremely valuable for robotics, video analytics, autonomous systems, retail heatmaps, sports analytics, and safety zones.

You also saw how Detectron2 hides a lot of internal complexity.
The config system, pretrained weights, metadata, and visualizer all work together so you can focus on results instead of boilerplate.
This matches the promise in the title “Detectron2 Panoptic Segmentation Made Easy for Beginners.”
We turned a research-grade capability into a practical script you can run on your own machine.

From here, you can extend this in a few directions.
You can apply the same predictor to each frame of a video and build a panoptic video analyzer.
You can crop only the pixels that belong to certain classes (for example, “person”) and feed those crops to another model.
Or you can move into custom training: registering your own dataset so the model learns your specific environment, like a warehouse floor or a retail aisle.
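For instance, here is a minimal sketch of the class-cropping idea mentioned above, assuming panoptic_seg, segments_info, cfg_pan, and myNewImage from this post; the output file name is a placeholder.

### Look up metadata so we can turn category ids into class names.
metadata = MetadataCatalog.get(cfg_pan.DATASETS.TRAIN[0])

### The panoptic tensor holds one segment id per pixel.
seg_ids = panoptic_seg.to("cpu").numpy()

### Collect the segment ids whose "thing" class is "person".
person_ids = [
    s["id"] for s in segments_info
    if s["isthing"] and metadata.thing_classes[s["category_id"]] == "person"
]

### Keep only person pixels; everything else becomes black.
person_mask = np.isin(seg_ids, person_ids)
person_only = np.where(person_mask[:, :, None], myNewImage, 0)

### Save the extracted person pixels (placeholder path).
cv2.imwrite("person_only.png", person_only)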

The most important part is that you are no longer guessing what panoptic segmentation is.
You’ve seen it run end to end.
You watched it label an entire scene.
You saved the result to disk.
That’s real progress.


Connect

☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🌐 https://eranfeit.net
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy, Eran
