...

Segment Anything tutorial: Generate YOLOv8 Masks Fast

Build Custom Image Segmentation Model Using YOLOv8 and SAM

Last Updated on 07/11/2025 by Eran Feit

Getting started with Segment Anything (SAM)

Segment Anything tutorial — here’s the big idea behind SAM in plain language.
SAM is a promptable segmentation model that turns a simple hint, like a box or a few clicks, into a clean, pixel-accurate mask.
It was designed to generalize to new images without extra training, so you can segment unfamiliar objects quickly and reliably.
If you’ve struggled with manual annotation or heavyweight pipelines, this Segment Anything tutorial shows why SAM’s flexible prompts make segmentation feel fast, intuitive, and repeatable.

What you’ll build in this Segment Anything tutorial

In this Segment Anything tutorial, you’ll pair YOLOv8 for fast object detection with SAM for precise masks—an end-to-end recipe in Python.
You’ll set up a clean environment, run YOLOv8 to find a dog, pass its bounding box as a prompt to SAM, and export a polished visualization.
Everything is copy-paste-ready, works on CPU or GPU, and is structured so you can swap classes, images, or SAM backbones later.
By the end of this Segment Anything tutorial, you’ll have a reusable workflow for custom image segmentation projects that’s both beginner-friendly and production-minded.

You can download the code here : https://ko-fi.com/s/8388c5a7ed

Here is a link for Medium post : https://medium.com/@feitgemel/segment-anything-python-no-training-image-masks-3785b8c4af78

You can find more tutorials in my blog : https://eranfeit.net/blog/

🚀 Want to get started with Computer Vision or take your skills to the next level ?

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4


Setting up a clean Python workspace

A reliable environment prevents version conflicts and strange GPU issues.
We’ll create a dedicated Conda environment so PyTorch, Ultralytics, OpenCV, and SAM stay consistent.
If you have an NVIDIA GPU, we’ll confirm CUDA to unlock faster inference; otherwise, everything still works on CPU.
Finally, we’ll install the Segment Anything package and a small helper called supervision that many vision folks like for utilities.

Modern deep-learning stacks can be sensitive to mismatched versions, especially between PyTorch and CUDA.
By pinning PyTorch/Torchvision/Torchaudio to a known-good combo, you spare yourself debugging time later.
If you’re CPU-only, you can grab the CPU wheel from the official PyTorch site; the rest of this tutorial is unchanged.
We’ll also fix a common OpenCV display conflict by ensuring a single, standard opencv-python install.
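
If you do go the CPU-only route, the commands below are a minimal sketch of installing the CPU wheels from PyTorch's CPU package index and verifying the device; double-check the exact command for your platform on the official install page.

### CPU-only alternative (sketch): install PyTorch wheels from the CPU index and verify the device.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"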

As a practical habit, upgrade pip before installing packages and keep a short README of your environment steps.
That way you can replicate the exact setup on a laptop, workstation, or a cloud VM.
And if you later decide to try a different SAM backbone (like ViT-B), your environment is already compatible.
Let’s get the essentials installed now.

### Create and activate a fresh Conda env for isolation.
conda create --name SAM-Tutorial python=3.9 -y
conda activate SAM-Tutorial

### (Optional) Check your CUDA toolkit version for GPU acceleration.
nvcc --version

### Install a compatible PyTorch build (adjust if your CUDA differs).
### For CPU-only, install the CPU build from the official site.
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia -y

### Core packages for this tutorial.
pip install --upgrade pip
pip install opencv-python matplotlib ultralytics
pip install "git+https://github.com/facebookresearch/segment-anything.git"
pip install supervision

### Clean up potential OpenCV conflicts (optional but helpful).
pip uninstall -y opencv-python-headless || true
pip uninstall -y opencv-python || true
pip install opencv-python

Summary: You’ve built a stable workspace ready for PyTorch, OpenCV, Ultralytics (YOLOv8), and SAM, with optional GPU acceleration.

New to object detection? See my friendly starter: YOLOv8 Object Detection with Jetson Nano & OpenCV.


Handy helpers and loading an image

Before running models, we’ll add three tiny helper functions for tidy visuals: drawing masks, boxes, and optional point prompts.
Clean overlays make your results presentation-ready, whether for a blog, a report, or a slide deck.
We’ll then load an image from disk with OpenCV; remember it returns BGR, not RGB, so we’ll convert later.
If you’re on a headless machine, you can skip cv2.imshow and just use Matplotlib for previews.

The file path is relative; ensure you run from a folder where that path exists, or switch to an absolute path.
For reproducibility, it’s a good idea to keep a small assets/ directory in your repo containing example images.
Each time you rerun the pipeline, you’ll get the same starting point, which helps when comparing parameters.
If you plan to scale up to batches, structure your loop early.
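
A tiny loop is enough to start; the sketch below assumes an assets/ folder of .jpg files and leaves the per-image detection and segmentation steps as a placeholder comment.

### Sketch of a batch loop over an assets/ folder (the folder name and glob pattern are assumptions for illustration).
import glob
import cv2

for image_path in sorted(glob.glob("assets/*.jpg")):
    ### Load each image in BGR order, exactly as in the single-image flow.
    image = cv2.imread(image_path)
    if image is None:
        ### Skip unreadable files instead of stopping the whole batch.
        continue
    ### Run detection + segmentation for this image here, then save outputs under a per-file name.
    print("Loaded:", image_path, "shape:", image.shape)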

For plotting, Matplotlib remains a simple, stable choice.
If you want interactive notebooks, Jupyter or VSCode’s Python extension work great.
Regardless of interface, the helpers below keep your visuals consistent.
Let’s load the image and confirm everything is wired correctly.

### Import core libraries for arrays, deep learning, plotting, and image I/O.
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2

### Helper to display a semi-transparent mask on the current axes.
def show_mask(mask, ax):
    ### Define an RGBA color for the mask overlay.
    color = np.array([30/255, 144/255, 255/255, 0.6])
    ### Get mask spatial dimensions (height and width).
    h, w = mask.shape[-2:]
    ### Reshape to HxWx4 and multiply by the boolean mask.
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ### Plot the overlay on the given axes.
    ax.imshow(mask_image)

### Helper to visualize positive and negative clicks (optional).
def show_points(coords, labels, ax, marker_size=375):
    ### Separate positive and negative points by label.
    pos_points = coords[labels == 1]
    neg_points = coords[labels == 0]
    ### Plot green stars for positive and red for negative prompts.
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)
    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)

### Helper to draw a rectangle given a [x1, y1, x2, y2] box.
def show_box(box, ax):
    ### Unpack the top-left and compute width/height.
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    ### Draw a visible green rectangle with no facecolor.
    ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='green', facecolor=(0, 0, 0, 0), lw=2))

### Point to your image on disk.
image_path = "Best-Semantic-Segmentation-models/Segment-Anything/4. Build Custom Image Segmentation Model Using YOLOv8 and SAM/Dori.jpg"

### Read the image with OpenCV (BGR order by default).
image = cv2.imread(image_path)

### Optional: preview the raw image in a window (skip on headless).
cv2.imshow("image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Summary: You now have helper functions and a loaded image — perfect foundations for detection and segmentation.

Prefer a no-training approach to masks? Check: Segment Anything in Python — No Training Needed.


Detecting your target with YOLOv8

We’ll use YOLOv8n, a lightweight model, to detect objects in the image.
Ultralytics’ API is straightforward: load the model, run inference, then access boxes and classes.
For this example, we’ll filter to COCO class 16 (dog) and draw its bounding box.
The coordinates from this box will become SAM’s prompt for precise masks.

If your image contains multiple dogs, you can choose the largest box (by area) or the highest confidence.
You can also remove the classes=[16] filter to see all objects before deciding which to segment.
For generalization, consider wrapping this in a small function that returns the chosen input_box.
Logging the class name and coordinates helps you debug and document.
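
If you want that selection logic as reusable code, here is a small sketch; the pick_dog_box helper below is an illustration of the idea, not part of the Ultralytics API.

### Sketch: choose one dog box from YOLOv8 results (pick_dog_box is an illustrative helper name).
import numpy as np

def pick_dog_box(results, dog_class=16, by="confidence"):
    candidates = []
    for obj in results:
        boxes = obj.boxes
        for box, cls, conf in zip(boxes.xyxy.cpu().numpy(), boxes.cls.cpu().numpy(), boxes.conf.cpu().numpy()):
            if int(cls) != dog_class:
                continue
            x1, y1, x2, y2 = box
            area = (x2 - x1) * (y2 - y1)
            ### Score each dog detection by confidence or by box area.
            score = conf if by == "confidence" else area
            candidates.append((score, box.astype(int)))
    if not candidates:
        return None
    ### Return the box with the highest score as the SAM prompt.
    return max(candidates, key=lambda item: item[0])[1]

### Usage: input_box = pick_dog_box(model(image), by="confidence")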

Remember OpenCV uses BGR; we’ll convert to RGB after drawing so Matplotlib and SAM see the right colors.
A quick sanity check plot before moving to SAM is worth the few lines of code.
With the box verified, you’re ready for pixel-accurate segmentation.
Let’s detect and capture the box.

Here is the test image :

Test image
### Import the Ultralytics YOLO interface and numpy/cv2 (if not already).
from ultralytics import YOLO
import numpy as np
import cv2

### Reload the image to ensure a fresh BGR copy (in case of prior changes).
image = cv2.imread("Best-Semantic-Segmentation-models/Segment-Anything/4. Build Custom Image Segmentation Model Using YOLOv8 and SAM/Dori.jpg")

### Initialize a lightweight YOLOv8 model (pretrained on COCO).
model = YOLO('yolov8n.pt')

### Map class indices to names for human-readable output.
names = model.names
print(names)

### Run inference on the image to get detections.
objects = model(image)
print(objects)

### Iterate and print class names for transparency.
print("For loop : ")
for obj in objects:
    for c in obj.boxes.cls:
        print(names[int(c)])

### Filter detections to keep only dogs (COCO class 16).
print("======================================")
objects = model(image, classes=[16])

### Draw the dog box and label on the image for confirmation.
for obj in objects:
    ### Access the Boxes object and its predicted classes.
    boxes = obj.boxes
    cls = boxes.cls

    ### Take the first detection (adjust if multiple dogs).
    output_index = int(cls[0]) if len(cls) > 0 else -1
    class_name = names[output_index] if output_index >= 0 else "none"
    print(class_name)

    ### Proceed only if we found a dog.
    if output_index == 16:
        ### Extract [x1, y1, x2, y2] as integers from the first detection.
        xyxy_coordinates = boxes.xyxy.cpu().numpy()
        x1, y1, x2, y2 = map(int, xyxy_coordinates[0])

        ### Draw the rectangle and label for inspection.
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(image, class_name, (x1 + 10, y1 + 30), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255), 2)

        ### Convert BGR to RGB for Matplotlib and SAM.
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        ### Keep the box for the next step (SAM prompt).
        input_box = np.array([x1, y1, x2, y2])
        print("input_box:", input_box)

Summary: YOLOv8 localized your target and produced input_box, the perfect prompt for SAM.

Compare with classification workflows in my ResNet50 Image Classification tutorial for context.


Loading SAM (ViT-H) and predicting a mask

SAM offers several backbones; ViT-H yields high-quality masks at a heavier compute cost.
We’ll load the checkpoint, move the model to GPU if available, and wrap it with SamPredictor.
SAM accepts points, boxes, or both; here we’ll supply the YOLOv8 box for a clean, automatic mask.
The predictor returns masks, scores, and logits; we’ll use the first mask to visualize.

If you’re on CPU, this still works — just expect slower inference.
set_image caches embeddings so subsequent prompts are much faster, which is useful if you plan to add clicks.
If your scene is complex, try multimask_output=True to return candidates, then pick the highest-scoring.
You can also combine positive and negative clicks to refine tricky edges.

Keep your checkpoint path handy and consistent across machines.
Storing it under a config or environment variable makes your code portable.
If you experiment with ViT-B or ViT-L, your calling code remains identical.
Let’s predict the mask now.

### Import the SAM registry and predictor utility.
from segment_anything import sam_model_registry, SamPredictor
import torch
import numpy as np

### Choose the local path to your SAM ViT-H checkpoint file.
path_for_sam_model = "e:/temp/sam_vit_h_4b8939.pth"

### Decide whether to use GPU ("cuda") or CPU ("cpu") based on availability.
device = "cuda" if torch.cuda.is_available() else "cpu"

### Select the SAM model type; "vit_h" provides high-quality masks.
model_type = "vit_h"

### Build the SAM model from the registry and load the checkpoint.
sam = sam_model_registry[model_type](checkpoint=path_for_sam_model)

### Move the model to the selected device for faster inference when possible.
sam.to(device=device)

### Wrap the model with SamPredictor for friendly inference APIs.
predictor = SamPredictor(sam)

### Ensure the image variable is RGB (we converted after drawing) and set it.
predictor.set_image(image)

### Feed the YOLOv8 bounding box as SAM's prompt.
box_input = input_box[None, :]

### Predict a single mask for the guided region (disable multimask for simplicity).
masks, scores, logits = predictor.predict(
    point_coords=None,
    point_labels=None,
    multimask_output=False,
    box=box_input
)

### Extract the primary mask for visualization.
pred_mask = masks[0]
print("Mask shape:", pred_mask.shape)
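
To try the refinements mentioned above, here is a hedged sketch that builds on the predictor just created: it adds example positive and negative clicks to the box and requests several candidate masks, keeping the highest-scoring one. The click coordinates are placeholders; replace them with points from your own image.

### Sketch: refine the box prompt with example clicks and pick the best of several candidate masks.
### The point coordinates below are placeholders for illustration only.
import numpy as np

point_coords = np.array([[600, 400], [50, 50]])   ### one point inside the object, one outside
point_labels = np.array([1, 0])                   ### 1 = positive click, 0 = negative click

masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    box=input_box[None, :],
    multimask_output=True,
)

### Keep the highest-scoring candidate mask.
pred_mask = masks[np.argmax(scores)]
print("Best score:", scores.max())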

Summary: SAM is initialized and returned a high-quality pred_mask for your target region.

If you enjoy foundation models, see my transfer learning guide: DenseNet201 for Sports Image Classification.


Visualizing and saving a polished result

Presentation matters.
We’ll show the base RGB image, overlay the box and semi-transparent mask, and hide axes for a clean look.
Then we’ll save the figure as a tight image you can drop into a post or slide.
This closes the loop from detection to segmentation to shareable output.

If you prefer a transparent background for slides, save a PNG and add alpha.
For bigger images, increase figsize or DPI to maintain clarity without ballooning file size.
If you plan to automate reports, build a small function that returns the output path.
And if you want just the raw mask, you can save pred_mask.astype(np.uint8)*255 as a binary PNG.
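
Here is a minimal sketch of that export; mask_out.png is just an example filename.

### Sketch: save the raw SAM mask as a binary PNG (mask_out.png is an example filename).
import cv2
import numpy as np

### Convert the boolean mask to 0/255 values and write it to disk.
binary_mask = pred_mask.astype(np.uint8) * 255
cv2.imwrite("mask_out.png", binary_mask)
print("Binary mask saved with", int(binary_mask.sum() / 255), "foreground pixels")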

When comparing multiple prompts (boxes or clicks), save each variant with a different suffix.
That way you can create nice side-by-side collages for your readers.
A consistent naming scheme keeps batch runs organized.
Let’s export the final image now.

### Import Matplotlib and rely on our earlier helpers for a polished look.
import matplotlib.pyplot as plt

### Create a big figure for clarity in blogs and slides.
plt.figure(figsize=(10, 10))

### Show the RGB image and add our overlays.
plt.imshow(image)
show_mask(pred_mask, plt.gca())
show_box(input_box, plt.gca())

### Clean the axes for a crisp visual.
plt.axis('off')

### Save the result to a chosen path and display it inline.
out_path = "Best-Semantic-Segmentation-models/Segment-Anything/4. Build Custom Image Segmentation Model Using YOLOv8 and SAM/out.jpg"
plt.savefig(out_path, bbox_inches='tight', pad_inches=0.0)
plt.show()

### Print the path so it’s easy to find programmatically.
print("Saved to:", out_path)

Summary: You’ve produced and saved a clean visualization that overlays SAM’s mask and YOLOv8’s box.

Need sharper outputs? Try Upscale Your Images & Videos with Super-Resolution.

Result :

Result segmented image

FAQ

What is Segment Anything in simple terms?

A promptable segmentation model that turns a box or points into accurate masks without extra training.

Why pair YOLOv8 with SAM?

YOLOv8 localizes quickly; SAM converts that localization into precise pixel-level masks.

Does this require a GPU?

No. CPU works fine; a CUDA GPU simply speeds up inference.

Which SAM backbone should I start with?

Start with ViT-H for best quality or ViT-B for lighter compute—usage is the same.

My mask looks shifted — why?

Check [x1,y1,x2,y2] ordering and convert the image to RGB before passing to SAM.

Can I refine masks with clicks?

Yes—add point_coords and point_labels along with the box to refine boundaries.

How do I handle multiple detections?

Pick the largest or highest-confidence box or loop through all detections.

Can I export the raw mask?

Save a binary PNG or the raw NumPy array for pipelines and training data.

Is batch processing supported?

Absolutely—loop over images, detect with YOLOv8, prompt SAM, and cache embeddings.

How do I change the target class?

Adjust the COCO class filter or remove it to review all detections first.


Conclusion

In this Segment Anything tutorial, you built a practical, end-to-end workflow: detect with YOLOv8, segment with SAM, and export a clean visualization.
This recipe scales from a single image to a folder of photos, and from a single box prompt to richer point-and-box interactions.
Tweak backbones (ViT-H vs ViT-B), try multimask_output, or introduce refinement clicks for finicky edges.
With these foundations, you’re ready to integrate segmentation into datasets, labeling pipelines, or creative image tools.

Connect :

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran

