
Segment Anything Python — No-Training Image Masks

Last Updated on 31/10/2025 by Eran Feit

If you’re looking to get high-quality masks without collecting a dataset, Segment Anything Python is the sweet spot. Built as a vision foundation model, SAM was trained on an enormous corpus (11M images, 1.1B masks) and generalizes impressively to new scenes. With simple prompts—or even fully automatic sampling—it produces clean, object-level masks that often rival task-specific models, no fine-tuning required.

Under the hood, SAM’s transformer backbone and data engine make it promptable and robust. In practice, that means you can point it at a single image and get a comprehensive set of masks you can analyze, filter, and export—perfect for rapid prototyping, labeling bootstraps, or production pre-processing pipelines. The vibrant ecosystem and documentation around SAM keep the workflow approachable even if you’re new to segmentation.


About this tutorial

In this tutorial, you’ll segment your images without training by loading the ViT-H checkpoint of SAM in Python, configuring the Automatic Mask Generator, and visualizing the resulting overlays. You’ll check GPU availability, prepare the image (resize + color space), set high-quality defaults for stability and IoU thresholds, and then export a clean PNG of the colored masks. The end result is a reproducible, copy-paste pipeline that turns any image into a stack of usable masks—fast—without a single epoch of fine-tuning.

This tutorial shows how to run Segment Anything (SAM) in Python to generate high-quality masks automatically.
You will load the ViT-H checkpoint, configure the SAM Automatic Mask Generator, and visualize results.
Everything is presented as clean, copy-paste code with human-friendly explanations above each command.
This directly fulfills the promise in the title: a fast, effortless, and proven path to production-ready segmentation with SAM in Python.

For a hands-on classifier example you can pair with SAM masks, see my DenseNet201 sports tutorial here → https://eranfeit.net/how-to-build-a-densenet201-model-for-sports-image-classification/

You can find more tutorials in my blog : https://eranfeit.net/blog/

🚀 Want to get started with Computer Vision or take your skills to the next level ?

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4


Setting up the environment and preparing the image

In this part, you verify CUDA availability, import your libraries, and load an image.
You’ll also resize and convert color space to make plotting and inference smooth and reproducible.
This preparation ensures the Segment Anything Python experience is stable and predictable across machines.

The foundation of any successful computer-vision workflow is a reliable environment.
We begin by validating CUDA availability so you instantly know if GPU acceleration is active.
While SAM can run on CPU, enabling CUDA dramatically accelerates mask generation.
This step avoids surprises later and aligns your expectations with the hardware you have.
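If you want the same script to run on machines without a GPU, a tiny fallback like the sketch below picks the device automatically instead of hardcoding "cuda". It is a minimal variation on the setup code further down, shown here as an optional convenience rather than part of the tutorial pipeline.

### Pick the device automatically so the script still runs (slowly) on CPU-only machines.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Running SAM on:", device)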

Next, we import essential packages: NumPy, Matplotlib, and OpenCV.
Each library has a focused role.
NumPy handles array operations efficiently.
Matplotlib provides quick, publication-ready plots.
OpenCV loads, resizes, and converts images with robust performance.

Image loading and resizing are practical cornerstones for reproducible workflows.
We parameterize the scale so you can adapt to different resolutions without rewriting code.
Working at a smaller scale minimizes memory usage and speeds up visualization when testing.
For production, you can dial the scale up to maximize detail.

Finally, we convert from BGR (OpenCV’s default) to RGB for correct color rendering in Matplotlib.
Consistent color spaces remove confusion when comparing plots across tools.
From here, you have an image ready for downstream segmentation.
This streamlined setup supports the SAM automatic mask generator you’ll configure next.

Here is the test image :

Test Image
### Check and import core libraries for Segment Anything Python.
import torch

### Print whether CUDA is available for GPU acceleration.
print("Cuda is available : ", torch.cuda.is_available())

### Import NumPy for array ops, Matplotlib for plotting, and OpenCV for image I/O and preprocessing.
import numpy as np
import matplotlib.pyplot as plt
import cv2

### Import SAM registry and the automatic mask generator interface.
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

### Define the path to the ViT-H checkpoint, model type, and target device.
sam_checkpoint = "e:/temp/sam_vit_h_4b8939.pth"
model_type = "vit_h"
device = "cuda"

### Read the image from disk using OpenCV.
image = cv2.imread("Best-Semantic-Segmentation-models/Segment-Anything/2-Segment-your-images-without-training/Rahaf.jpg")

### Inspect the original image shape to guide resizing decisions.
print(image.shape)

### Choose a percentage scaling factor for reproducible resizing.
scale_percent = 40

### Compute new width based on scale percentage.
width = int(image.shape[1] * scale_percent / 100)

### Compute new height based on scale percentage.
height = int(image.shape[0] * scale_percent / 100)

### Bundle width and height into a dimension tuple.
dim = (width, height)

### Resize the image with area interpolation for downscaling quality.
image = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)

### Convert BGR (OpenCV) to RGB (Matplotlib) for correct plot colors.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

### Set up a square figure for consistent visualization.
plt.figure(figsize=(10, 10))

### Show the prepared image to confirm visual correctness.
plt.imshow(image)

### Remove axes for a cleaner, publication-style image.
plt.axis('off')

### Render the plot to screen.
plt.show()

Summary.
You confirmed your environment, imported the right tools, and turned a raw image into a clean, display-ready array.
This sets the baseline for image segmentation without training using SAM.

Want a classic clustering baseline before SAM? Check my K-means segmentation walkthrough → https://eranfeit.net/python-image-segmentation-made-easy-with-opencv-and-k-means-algorithm/


Loading SAM (ViT-H) and configuring the automatic mask generator

Here you load the ViT-H variant via sam_model_registry and move it to your target device.
Then you configure the SamAutomaticMaskGenerator with sensible parameters for quality and speed.
These choices balance performance and accuracy for most images.

SAM provides multiple backbones, and ViT-H offers excellent accuracy for complex scenes.
By selecting model_type = "vit_h" you opt into that capability, ensuring strong mask quality.
The checkpoint path points to the local file, so you can manage versions in a controlled environment.
This aligns with repeatable experiments and production workflows.
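If ViT-H is too heavy for your hardware, the same registry also exposes the smaller ViT-L and ViT-B backbones. The sketch below shows how a swap might look; the local checkpoint paths are placeholders you would replace with your own downloads, and it assumes the imports and device variable from the setup section.

### Map each SAM backbone to a local checkpoint file (paths are placeholders).
checkpoints = {
    "vit_h": "e:/temp/sam_vit_h_4b8939.pth",  # largest, highest quality
    "vit_l": "e:/temp/sam_vit_l_0b3195.pth",  # middle ground
    "vit_b": "e:/temp/sam_vit_b_01ec64.pth",  # smallest, lightest on memory
}

### Switch backbones by changing a single string.
model_type = "vit_b"
sam = sam_model_registry[model_type](checkpoint=checkpoints[model_type])
sam.to(device=device)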

Moving the model to the right device is crucial for performance.
If CUDA is available, GPU inference yields significant speed-ups.
That translates to faster iteration when tuning parameters like pred_iou_thresh or stability_score_thresh.
It also reduces latency when processing batches of images.

The automatic mask generator is the engine that samples points, predicts masks, and filters results.
Parameters such as points_per_side control how densely SAM probes the image.
Quality filters like pred_iou_thresh and stability_score_thresh remove unstable masks early.
Crops and downscaling options help scale to larger images without runaway memory use.

These defaults are a high-quality starting point.
If you need even denser masks, increase points_per_side.
If you see too many small fragments, raise min_mask_region_area.
For speed-critical paths, reduce sampling density or crop layers.

If you prefer transformer-based classification, compare this ViT tutorial → https://eranfeit.net/build-an-image-classifier-with-vision-transformer/

### Build the SAM model from the registry using the specified ViT-H checkpoint.
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)

### Move the model to the selected device for faster inference when CUDA is available.
sam.to(device=device)

### Configure the automatic mask generator for robust quality and reasonable speed.
mask_generator_ = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=32,
    pred_iou_thresh=0.96,
    stability_score_thresh=0.96,
    crop_n_layers=1,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=100,
)
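If speed matters more than maximum recall, you can trade sampling density for throughput. The sketch below is one illustrative alternative configuration, not a benchmarked recommendation; the exact numbers are starting points to tune on your own images.

### A lighter, speed-oriented configuration (illustrative values, tune for your data).
fast_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=16,            # sparser point grid, fewer forward passes
    pred_iou_thresh=0.90,          # accept slightly lower predicted IoU
    stability_score_thresh=0.90,   # looser stability filter
    crop_n_layers=0,               # skip the extra cropped passes
    min_mask_region_area=200,      # drop tiny fragments early
)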

The SAM automatic mask generator is ready.
You’ve chosen parameters that produce stable masks with minimal noise, suitable for most computer vision segmentation Python workflows.


Generating masks, visualizing overlays, and saving results

In the final step, you run the generator, inspect how many masks were discovered, visualize them as transparent overlays, and save the result.
This gives you immediate insight into segmentation quality and a portable artifact for reports or posts.

Calling mask_generator_.generate(image) runs the full sampling-and-masking pipeline over your preprocessed image.
You’ll receive a list of mask dictionaries with fields like segmentation, area, and bbox.
Counting the masks is a fast sanity check that your parameters are neither too strict nor too permissive.
For many photos, dozens to hundreds of masks are expected.
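Once the code block below has produced the masks list, you can peek at a single record to see those fields for yourself. This is just an inspection sketch of the dictionaries the automatic mask generator returns.

### Inspect the structure of one mask record returned by the generator.
first = masks[0]
print(first.keys())                                 # segmentation, area, bbox, and quality scores
print("area (pixels):", first['area'])
print("bbox (x, y, w, h):", first['bbox'])
print("mask shape:", first['segmentation'].shape)   # boolean H x W array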

A compact overlay function makes results instantly interpretable.
Sorting by area shows large regions first, which often correspond to dominant objects in the scene.
Randomized colors keep boundaries recognizable even when masks overlap.
An alpha blend provides context by letting the original image remain visible.

Saving figures is critical for reproducible pipelines.
With one line you export a high-resolution PNG that you can attach to documentation, blog posts, or experiment logs.
This completes the core loop: run SAM, visualize, evaluate, and iterate.
From here, you can integrate post-processing or downstream tasks.

Common refinements include filtering by area, adjusting points per side, or batching multiple images.
Each change is incremental, so it’s easy to track improvements.
This modular approach makes SAM a strong foundation for image segmentation without training.
You now have an end-to-end path from raw image to quality masks.
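As one concrete illustration, the sketch below filters the masks list (produced by the code further down) by area and writes each surviving mask out as a binary PNG. The 2000-pixel cutoff and the output filenames are arbitrary choices for demonstration, and it assumes the NumPy and OpenCV imports from the setup section.

### Keep only reasonably large masks and export each one as a black-and-white PNG.
min_area = 2000  # arbitrary threshold; tune for your image resolution
large_masks = [m for m in masks if m['area'] >= min_area]
print("Masks kept after area filter:", len(large_masks))

for idx, m in enumerate(large_masks):
    binary = m['segmentation'].astype(np.uint8) * 255   # bool mask -> 0/255 image
    cv2.imwrite(f"mask_{idx:03d}.png", binary)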

Curious how CNN backbones compare? See ResNet50 vs MobileNetV2 notes → https://eranfeit.net/tensorflow-image-classification-tutorial-resnet50-vs-mobilenet/

### Run SAM to generate segmentation masks for the prepared image.
masks = mask_generator_.generate(image)

### Print how many masks were found as a quick quality and density check.
print("Total mask discovered : " + str(len(masks)))

### Define a helper to overlay masks as semi-transparent colored regions.
def show_anns(anns):
    ### If no masks, exit early to avoid plotting artifacts.
    if len(anns) == 0:
        return
    ### Sort masks by area so larger regions render first.
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ### Grab the current axes and lock autoscaling for layered overlays.
    ax = plt.gca()
    ax.set_autoscale_on(False)

    ### Render each mask as a colored alpha overlay.
    for ann in sorted_anns:
        m = ann['segmentation']
        img = np.ones((m.shape[0], m.shape[1], 3))
        color_mask = np.random.random((1, 3)).tolist()[0]
        for i in range(3):
            img[:, :, i] = color_mask[i]
        ax.imshow(np.dstack((img, m * 0.35)))

### Plot the base image and overlays, then export a PNG for your records.
plt.figure(figsize=(10, 10))
plt.imshow(image)
show_anns(masks)
plt.axis('off')
plt.savefig("Best-Semantic-Segmentation-models/Segment-Anything/2-Segment-your-images-without-training/output2.png")
plt.show()

Summary.
You generated, visualized, and saved SAM masks in a single, streamlined pass.
This completes a practical ViT-H SAM tutorial you can adapt to any dataset or pipeline.

Here is the Result :

SAM result 2

SAM result 1

Need real-time edge inference ideas? Explore YOLOv8 + Jetson Nano here → https://eranfeit.net/yolov8-object-detection-with-jetson-nano-and-opencv/


FAQ

What is Segment Anything (SAM) used for?

SAM generates object masks from images with minimal prompting. It accelerates labeling and exploration without dataset-specific training.

Do I need CUDA for good performance?

CUDA is not mandatory but strongly recommended. GPU inference cuts runtime and boosts iteration speed.

Which SAM backbone should I choose?

ViT-H offers strong quality with higher memory needs. Smaller backbones fit lighter devices.

How can I reduce tiny noisy masks?

Increase min_mask_region_area and adjust IoU and stability thresholds to filter weak regions.

How do I get more mask coverage?

Raise points_per_side to sample more points and capture finer details.

Why do my colors look wrong in plots?

Convert BGR to RGB before plotting in Matplotlib for accurate colors.

Can I export masks for post-processing?

Yes. Read the ‘segmentation’ arrays from each mask dictionary and save as needed.
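For example, a compact way to export everything at once is to flatten all masks into a single integer label map, as in this sketch (the output filename is a placeholder, and it assumes the masks list and image from the tutorial).

### Collapse all masks into one integer label map: 0 = background, 1..N = mask index.
label_map = np.zeros(image.shape[:2], dtype=np.uint16)
for idx, m in enumerate(masks, start=1):
    label_map[m['segmentation']] = idx
cv2.imwrite("labels.png", label_map)   # 16-bit PNG keeps every mask id intact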

How do I keep results consistent across runs?

Use fixed resize scales, stable parameters, and record seeds and versions.
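One small, concrete step toward that is seeding NumPy's random generator before plotting, so the overlay colors from show_anns come out identical on every run, and printing the library versions you used. A minimal sketch:

### Fix the RNG so the random overlay colors are reproducible across runs.
import numpy as np
np.random.seed(42)

### Record the library versions alongside your results for later comparison.
import torch, cv2
print("torch:", torch.__version__, "| opencv:", cv2.__version__)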

How can I process very large images faster?

Lower sampling density or crop layers, downscale inputs, or batch across GPUs.

Is SAM production-ready?

Yes, with profiling, parameter tuning, and proper exporting of overlays and masks.


Conclusion

You now have a clean, reproducible Segment Anything Python pipeline that loads ViT-H, generates masks automatically, and saves polished overlays.
The three parts intentionally separate environment prep, model configuration, and results so you can tune each piece with clarity.
From here, continue by batching images, exporting binary masks, or integrating downstream tasks like object tracking or dataset bootstrapping.
If you need domain-specific quality, adjust sampling density and thresholds, then profile speed to hit your latency targets.
This practical foundation scales from exploratory notebooks to production services with minimal friction.


Connect

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran
