Last Updated on 12/01/2026 by Eran Feit
Introduction
Image segmentation with MediaPipe is a practical way to separate a subject from its surroundings at the pixel level.
Instead of drawing a rectangle around an object, segmentation creates a mask that follows the object’s real outline.
That makes edits like background replacement look much cleaner and more realistic.
In this tutorial, the goal is to take a normal photo and turn it into something you can “edit” like layers.
Once you have a foreground mask, you can keep the subject, change the background, or apply effects only to the selected region.
This is useful for thumbnails, product photos, profile images, and quick visual experiments.
The workflow is simple but powerful.
You load an image, run a segmentation model, convert the model output into a usable mask, then blend pixels using that mask.
OpenCV and NumPy make the blending step fast and flexible.
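To make that pipeline concrete, here is a rough sketch of its overall shape. The file paths are placeholders, and segment_mask is a hypothetical helper standing in for the MediaPipe call shown later in this tutorial:

import cv2
import numpy as np

img = cv2.imread("photo.jpg")                         # placeholder input path
new_bg = cv2.resize(cv2.imread("scene.jpg"),
                    (img.shape[1], img.shape[0]))     # match the input size

mask = segment_mask(img)                              # hypothetical helper: returns an (H, W) soft mask
condition = np.stack((mask,) * 3, axis=-1) > 0.1      # threshold into a 3-channel boolean selection
result = np.where(condition, img, new_bg)             # keep the subject, swap the background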
What makes this approach feel interactive is the ability to guide the model with a single point.
Instead of trying to segment the whole scene, you tell the model “this is the object I care about.”
That keeps the result focused and makes the code easier to reuse across many different images.
Image segmentation with MediaPipe in a simple, hands-on workflow
Image segmentation with MediaPipe becomes very approachable when you think of it as a three-part pipeline.
First you prepare the inputs: read the image, load a replacement background, and make sure both images share the same size.
That way, every pixel in the original image has a matching pixel in the new background.
Next comes the segmentation step, where the model produces a category mask.
A mask is basically a map that says which pixels belong to the selected object and which pixels belong to everything else.
Even if the mask values are soft probabilities, you can turn them into a clean selection by applying a threshold.
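For example, turning a soft mask into a hard selection is a single comparison; the toy values below are made up, and the 0.1 cutoff matches the threshold used later in this tutorial:

import numpy as np

soft_mask = np.array([[0.02, 0.08],
                      [0.45, 0.97]], dtype=np.float32)  # toy probability values
selection = soft_mask > 0.1                             # True only where the object is
# selection is now [[False, False], [True, True]]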
Then you create a visualization overlay so you can instantly verify the selection.
This is where alpha blending helps: you mix the original image with a solid color overlay only where the mask is active.
It’s a fast way to debug and tune the threshold before doing the actual background replacement.
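A minimal sketch of that blend, assuming img is a loaded BGR image; the stand-in mask below selects everything, just to keep the snippet runnable:

import cv2
import numpy as np

img = cv2.imread("photo.jpg")                        # placeholder input path
selection = np.ones(img.shape[:2], dtype=bool)       # stand-in mask; use your real selection here

overlay = np.zeros_like(img)
overlay[:] = (255, 0, 0)                             # solid blue overlay (BGR)
alpha = selection.astype(float)[..., None] * 0.7     # 70% opacity where the mask is active
preview = (img * (1 - alpha) + overlay * alpha).astype(np.uint8)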
Finally, the background replacement becomes a single clean operation.
Where the mask is “true,” you keep the original pixels, and where the mask is “false,” you swap in pixels from the new background.
This creates a result that looks like the subject was photographed in a new scene, without manual editing.
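In NumPy this is exactly what np.where does; the tiny arrays below are illustrative stand-ins for real images:

import numpy as np

img = np.full((2, 2, 3), 200, dtype=np.uint8)       # toy "original" image
new_bg = np.zeros((2, 2, 3), dtype=np.uint8)        # toy "background"
mask = np.array([[1, 0],
                 [1, 0]], dtype=np.float32)         # 1 = subject, 0 = background
condition = np.stack((mask,) * 3, axis=-1) > 0.1    # replicate the mask across the 3 channels
result = np.where(condition, img, new_bg)           # left column keeps img, right column takes new_bg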

Segmenting an Object and Replacing the Background Step by Step
This tutorial focuses on building a complete, practical pipeline that takes an image and replaces its background using image segmentation with MediaPipe.
The code is designed to stay simple and readable while still showing how modern segmentation models can be used in real projects.
Instead of relying on heavy frameworks or complex training steps, everything is done with pre-trained models and a few clear processing stages.
The main goal of the code is to isolate a single object in an image using a user-defined keypoint.
By selecting one point inside the object, the segmentation model understands which region should be treated as the foreground.
This approach avoids segmenting unnecessary parts of the image and keeps the output focused on what actually matters.
Once the segmentation mask is generated, the code demonstrates two important ideas.
First, it visualizes the mask by blending a colored overlay with the original image, making it easy to inspect the quality of the segmentation.
Second, it uses the same mask as a condition to decide which pixels come from the original image and which come from a new background.
The final result is a clean background replacement that looks natural and consistent.
The subject keeps its original shape and boundaries, while the surrounding pixels are swapped with a completely different scene.
This makes the code useful for learning purposes, experimentation, and as a foundation for more advanced image editing workflows.
Link to the video tutorial: https://youtu.be/I08RgncbDJs
You can download the code here or here
Link to the post for Medium.com users can be found here.
Want to get started with Computer Vision or take your skills to the next level?
Great Interactive Course: “Deep Learning for Images with PyTorch” here: https://datacamp.pxf.io/zxWxnm
If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow
If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4

Replace the Background with New Image
Image segmentation with MediaPipe lets you separate a subject from the background at the pixel level.
Instead of working with rectangles or rough selections, you get a mask that follows the object’s real shape.
That mask becomes the key for effects like highlighting a subject, blurring the background, or swapping the entire scene.
In this tutorial, the goal is simple and practical.
You will load an image, pick a single keypoint, segment the object around that point, and replace the background with a new image.
You will also generate an overlay preview so you can visually confirm that the mask is selecting the right region.
Setting up the environment and downloading the models
A clean environment helps you avoid version conflicts and makes your setup reproducible.
This tutorial uses a dedicated conda environment with Python 3.11 and two main packages: OpenCV and MediaPipe.
Keeping the exact versions ensures the MediaPipe Tasks API works as expected.
You also need the segmentation model file used by the code.
The DeepLabV3 TFLite model is loaded from disk, so downloading it once is enough.
After that, you can reuse it for any number of images.
### Create a new conda environment for this tutorial.
conda create -n RemoveBG python=3.11

### Activate the environment so installs go into the right place.
conda activate RemoveBG

### Install OpenCV for image I/O, resizing, and display.
pip install opencv-python==4.10.0.84

### Install MediaPipe for InteractiveSegmenter and image segmentation tasks.
pip install mediapipe==0.10.14

### Download the models.
### Put them in your favorite folder.
https://storage.googleapis.com/mediapipe-models/image_segmenter/deeplab_v3/float32/1/deeplab_v3.tflite
https://storage.googleapis.com/mediapipe-models/interactive_segmenter/magic_touch/float32/1/magic_touch.tflite

Loading the input image and preparing a new background
The first step in the Python script is reading the main image you want to edit.
OpenCV loads images as NumPy arrays, which makes it easy to manipulate pixels later.
This is also where you define the path to your replacement background image.
To replace the background cleanly, both images must have the same width and height.
The code resizes the new background to match the original image shape.
This ensures that every pixel position aligns correctly when you apply the segmentation mask.
Test image (Lilach):
Background image (desert):


### Import OpenCV for reading and processing images.
import cv2

### Define the path to the input image you want to segment.
PathToImage = "Best-Semantic-Segmentation-models/Media Pipe Segmentation/Image Segmentation using Media-pipe - Replace the background with new image/lilach.jpg"

### Read the image from disk into memory.
img = cv2.imread(PathToImage)

### Define the path to the replacement background image.
new_bg_path = "Best-Semantic-Segmentation-models/Media Pipe Segmentation/Image Segmentation using Media-pipe - Replace the background with new image/Desert.jpg"

### Read the new background image from disk.
new_bg = cv2.imread(new_bg_path)

### Resize the new background to match the original image dimensions.
new_bg = cv2.resize(new_bg, (img.shape[1], img.shape[0]))

### Display the original image to confirm it loaded correctly.
cv2.imshow("img", img)

Choosing the keypoint and loading the MediaPipe task components
Interactive segmentation needs a point that tells the model what you want to segment.
In this code, the point is given as normalized coordinates, meaning values between 0 and 1.
A value of (0.5, 0.5) points to the center of the image, which is often a good starting guess.
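If you already know a pixel location inside your object, converting it to normalized coordinates is a simple division by the image size; the pixel values below are made up for illustration:

import cv2

img = cv2.imread("photo.jpg")     # placeholder input path
h, w = img.shape[:2]
px, py = 420, 310                 # hypothetical pixel location inside the object
x, y = px / w, py / h             # normalized coordinates in the 0..1 range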
This section also loads the MediaPipe Tasks modules used later.
You import the task API, vision interfaces, and the keypoint container type.
That setup is what allows you to create a RegionOfInterest and run the InteractiveSegmenter.
### Choose a normalized X coordinate for the keypoint.
x = 0.5

### Choose a normalized Y coordinate for the keypoint.
y = 0.5

### Import NumPy for mask stacking and pixel-level operations.
import numpy as np

### Import MediaPipe for task-based image handling.
import mediapipe as mp

### Import the MediaPipe Tasks Python API.
from mediapipe.tasks import python

### Import the vision task interfaces.
from mediapipe.tasks.python import vision

### Import container types such as NormalizedKeypoint.
from mediapipe.tasks.python.components import containers

### Alias the RegionOfInterest type for interactive segmentation.
RegionOfInterest = vision.InteractiveSegmenterRegionOfInterest

### Alias the NormalizedKeypoint type used inside the ROI.
NormalizedKeypoint = containers.keypoint.NormalizedKeypoint

Configuring the InteractiveSegmenter and defining overlay settings
MediaPipe Tasks use an options object that describes which model to load and what outputs you want.
Here, the code loads the DeepLabV3 model from a local file path.
It also enables output_category_mask so the segmentation result includes a mask you can apply to pixels.
A good tutorial workflow includes visualization, not just the final saved result.
That is why the code defines an overlay color and creates a blended preview image.
This preview helps you confirm mask quality before committing to background replacement.
### Create the options that will be used for the InteractiveSegmenter

### Define the base options and point to the local TFLite model file.
base_options = python.BaseOptions(model_asset_path="D:/Temp/Models/MediaPipe/deeplab_v3.tflite")

### Define interactive segmenter options and request the category mask output.
options = vision.InteractiveSegmenterOptions(base_options=base_options, output_category_mask=True)

### Generate another visualization image where we highlight the selected object

### Define the overlay color used for mask visualization.
OVERLAY_COLOR = (255, 0, 0)  # Blue

Running interactive segmentation and creating a mask overlay preview
This section is where image segmentation with MediaPipe actually happens.
The segmentor is created from the options, and the input file is loaded into a MediaPipe Image object.
Then a RegionOfInterest is built using the keypoint, which tells the model what object you want.
After segmentation, the category mask is converted into a condition using a threshold.
That condition is used to build an alpha mask and blend a colored overlay onto the original image.
The overlay preview is a fast way to see if your keypoint selection worked well.
### Create a segmenter from the options
with vision.InteractiveSegmenter.create_from_options(options) as segmentor:

    ### Create the MediaPipe Image from the input file
    image2 = mp.Image.create_from_file(PathToImage)

    ### Build the region of interest from the keypoint and retrieve the category mask
    roi = RegionOfInterest(format=RegionOfInterest.Format.KEYPOINT, keypoint=NormalizedKeypoint(x, y))
    segmentation_result = segmentor.segment(image2, roi)
    category_mask = segmentation_result.category_mask

    ### Convert the BGR image to RGB
    image_data = cv2.cvtColor(image2.numpy_view(), cv2.COLOR_BGR2RGB)

    ### Create an overlay image with the desired color
    overlay_image = np.zeros(image_data.shape, dtype=np.uint8)
    overlay_image[:] = OVERLAY_COLOR

    ### Create the condition from the category_mask array
    alpha = np.stack((category_mask.numpy_view(),) * 3, axis=-1) > 0.1

    ### Create an alpha channel from the condition with the desired opacity (70%)
    alpha = alpha.astype(float) * 0.7

    ### Blend the original image with the overlay image using the alpha channel
    output_image2 = image_data * (1 - alpha) + overlay_image * alpha
    output_image2 = output_image2.astype(np.uint8)

Replacing the background and saving the final result
Now that you have a mask, background replacement becomes a clean pixel selection problem.
The code creates a boolean condition from the mask and applies np.where to choose pixels.
Foreground pixels come from the original image, and background pixels come from the resized new background.
Finally, the script saves the result to disk and displays both preview windows.
This makes it easy to confirm you got both the overlay visualization and the final composited image.
When you are done, the script waits for a key press and closes the windows cleanly.
### Replace the background with the new image
condition = np.stack((category_mask.numpy_view(),) * 3, axis=-1) > 0.1

### Choose original pixels where the mask is true and new background pixels elsewhere
image_with_new_bg = np.where(condition, img, new_bg)

### Save the final composited image
cv2.imwrite("Best-Semantic-Segmentation-models/Media Pipe Segmentation/Image Segmentation using Media-pipe - Replace the background with new image/image_with_new_bg.jpg", image_with_new_bg)

### Show the overlay image
cv2.imshow("output_image2", output_image2)

### Show the image with the new background
cv2.imshow("image_with_new_bg", image_with_new_bg)

### Wait for a key press, then close all windows
cv2.waitKey(0)
cv2.destroyAllWindows()

The result:

Summary
You set up a clean Python environment and loaded a pretrained segmentation model.
You segmented an object using a single keypoint and generated a category mask.
You visualized the selection with an overlay preview and replaced the image background using the mask.
FAQ
What does image segmentation with mediapipe actually output?
It outputs a mask where each pixel has a value indicating how likely it belongs to the selected region. You can threshold it to create a clean foreground versus background selection.
Why do we use a normalized keypoint (0 to 1) instead of pixels?
Normalized coordinates make the code independent of image resolution. The same x and y values still refer to the same relative position after resizing.
What model is used in this tutorial and why?
The code loads a pretrained DeepLabV3 TFLite model. It’s fast, widely used for segmentation, and works well for foreground separation.
What does output_category_mask=True give you?
It ensures the segmentation result includes a category mask output. That mask is what you use for overlays and background replacement.
Why do we convert BGR to RGB for the overlay preview?
OpenCV reads images as BGR, while many pipelines expect RGB for correct color handling. Converting prevents colors from looking swapped in the preview.
How do I improve segmentation if the selection is wrong?
Try moving the keypoint deeper inside the object you want. Small changes can shift the region the interactive model selects.
What does the 0.1 threshold control?
It decides which pixels are considered part of the object. Lower thresholds include more pixels, while higher thresholds produce tighter masks.
Why do we stack the mask three times with np.stack?
The mask is single-channel, but your image is three-channel. Stacking makes the shapes match so you can apply the condition across channels.
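A quick shape check makes this concrete (the mask size here is illustrative):

import numpy as np

mask = np.zeros((480, 640), dtype=np.float32)       # single-channel mask, shape (480, 640)
condition = np.stack((mask,) * 3, axis=-1) > 0.1    # shape (480, 640, 3), matches a color image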
Why do we resize the new background to img.shape?
Background replacement requires both images to have the same dimensions. This keeps pixel alignment correct when applying the mask.
What is the safest way to reuse this code for many images?
Wrap the segmentation and replacement logic into a function. Then process a folder of images and save outputs with unique filenames.
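As a sketch of that structure, the loop below reuses one segmenter instance for a whole folder; the function name, folder paths, and keypoint default are assumptions, not part of the original script:

import glob
import os
import cv2
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe.tasks.python.components import containers

RegionOfInterest = vision.InteractiveSegmenterRegionOfInterest
NormalizedKeypoint = containers.keypoint.NormalizedKeypoint

def replace_background(image_path, new_bg, segmentor, x=0.5, y=0.5):
    # Segment the object around the keypoint and composite it onto new_bg.
    img = cv2.imread(image_path)
    bg = cv2.resize(new_bg, (img.shape[1], img.shape[0]))
    mp_image = mp.Image.create_from_file(image_path)
    roi = RegionOfInterest(format=RegionOfInterest.Format.KEYPOINT,
                           keypoint=NormalizedKeypoint(x, y))
    mask = segmentor.segment(mp_image, roi).category_mask
    condition = np.stack((mask.numpy_view(),) * 3, axis=-1) > 0.1
    return np.where(condition, img, bg)

base_options = python.BaseOptions(model_asset_path="D:/Temp/Models/MediaPipe/deeplab_v3.tflite")
options = vision.InteractiveSegmenterOptions(base_options=base_options, output_category_mask=True)
new_bg = cv2.imread("Desert.jpg")                         # hypothetical background path
os.makedirs("output_images", exist_ok=True)               # hypothetical output folder

with vision.InteractiveSegmenter.create_from_options(options) as segmentor:
    for path in glob.glob("input_images/*.jpg"):          # hypothetical input folder
        result = replace_background(path, new_bg, segmentor)
        out_name = os.path.splitext(os.path.basename(path))[0] + "_new_bg.jpg"
        cv2.imwrite(os.path.join("output_images", out_name), result)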
Conclusion
Image segmentation with MediaPipe is one of the fastest ways to turn a normal photo into an editable foreground and background.
With a single keypoint, you can guide the model toward the object you care about and avoid messy full-scene segmentation.
That makes the workflow feel interactive, practical, and easy to reuse.
The overlay step is not just a nice visualization, it is a debugging tool.
Seeing the mask blended over the original image helps you tune the threshold and confirm the selection before saving results.
This small preview step can save a lot of time when you process many images.
Once the mask looks right, background replacement becomes a clean pixel operation.
Using a boolean condition and a resized background image, you can create consistent results with just a few NumPy operations.
From here, you can extend the same pipeline to blur backgrounds, apply stylized effects, or batch-process a full folder of images.
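For example, swapping the new background for a blurred copy of the original frame turns the same mask into a portrait-style effect; this sketch assumes img and category_mask already exist from the code above, and the kernel size and output filename are arbitrary:

blurred = cv2.GaussianBlur(img, (35, 35), 0)                             # blur the whole frame
condition = np.stack((category_mask.numpy_view(),) * 3, axis=-1) > 0.1   # same mask as before
portrait = np.where(condition, img, blurred)                             # sharp subject, blurred scene
cv2.imwrite("portrait_blur.jpg", portrait)                               # hypothetical output filename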
Connect:
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email: feitgemel@gmail.com
🤝 Fiverr: https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
