
One-Click Segment Anything in Python (SAM ViT-H)

Segment Anything with One Mouse Click

Last Updated on 02/11/2025 by Eran Feit

Segment Anything in Python — Fast, One-Click Results

Segment Anything in Python lets you segment any object with a single click using SAM ViT-H, delivering three high-quality masks instantly.
In this tutorial, you’ll set up the environment, load the checkpoint, click a point, and export overlays—clean, practical code included.
Whether you’re labeling datasets or prototyping, this one-click workflow is quick, reliable, and easy to reuse.

Segment Anything in Python builds on a powerful promptable segmentation pipeline: a ViT-H image encoder extracts features once, a lightweight prompt encoder turns your click into guidance, and a mask decoder returns multiple high-quality candidates. This tutorial shows the exact flow—load the checkpoint, set the image, provide a single positive point, and review three masks with scores—so you can pick the cleanest boundary without manual tracing.

Segment Anything in Python is also practical beyond demos: you’ll learn how to avoid OpenCV headless conflicts, run on CPU/GPU/MPS, and export overlays for quick sharing. We also cover adding negative points to suppress spillover, saving binary masks for downstream tasks, and keeping your run reproducible with clear paths and model_type matching. Use it to bootstrap datasets, refine labels, or prototype segmentations in seconds.

For a deeper dive into automatic mask creation from detections, see my post on YOLOv8 object detection with Jetson Nano and OpenCV.

You can find more tutorials on my blog: https://eranfeit.net/blog/

🚀 Want to get started with Computer Vision or take your skills to the next level?

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4


Part 1 — Let’s set up the environment the simple way

What you’ll do here:
Create a conda environment, install PyTorch (CUDA optional), and add the key libraries: opencv-python, matplotlib, and segment-anything.
These steps make your runtime stable and reproducible.

Summary:
You’re creating an isolated Python 3.9 environment, ensuring compatible PyTorch/CUDA, installing OpenCV + Matplotlib, and pulling SAM directly from the official repo.

### Create a fresh Python 3.9 environment for stability.
conda create --name SAM-Tutorial python=3.9

### Activate the environment so all installs go here.
conda activate SAM-Tutorial

### (Optional) Check CUDA version if you plan to use GPU acceleration.
nvcc --version

### Install PyTorch 2.2.2 with CUDA 11.8 (adjust if your CUDA differs or use CPU builds).
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia

### Install core Python libs used in this tutorial.
pip install opencv-python matplotlib

### Install SAM directly from Meta’s official repository.
pip install git+https://github.com/facebookresearch/segment-anything.git

### (Optional) Useful for other CV utilities (not required below but handy).
pip install supervision

### If OpenCV-headless conflicts, force the GUI build to ensure imshow windows work.
pip uninstall -y opencv-python-headless
pip uninstall -y opencv-python
pip install opencv-python

Short recap: after this step, your machine is ready to run SAM and display interactive windows.
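
Before moving on, you can run a quick sanity check (a minimal sketch, assuming the installs above completed) to confirm that PyTorch sees your GPU and that OpenCV imports cleanly:

### Optional sanity check for the fresh environment.
import torch
import cv2

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())   # True only if the GPU build and drivers match
print("OpenCV version:", cv2.__version__)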


Part 2 — Imports and small helper functions to visualize points & masks

What you’ll do here:
Import NumPy, PyTorch, Matplotlib, and OpenCV, then add three tiny helpers to draw masks, points, and boxes.
These functions make SAM’s results easy to see.

Summary:
You’ll visualize the clicked point (green star), optional negatives (red), and overlay semi-transparent masks on the image.

### Work with arrays, tensors, and plotting utilities.
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2

### Overlay a semi-transparent mask on the current Matplotlib Axes.
def show_mask(mask, ax):
    color = np.array([30/255, 144/255, 255/255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

### Draw positive (green) and negative (red) clicks on the image.
def show_points(coords, labels, ax, marker_size=375):
    pos_points = coords[labels == 1]
    neg_points = coords[labels == 0]
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*',
               s=marker_size, edgecolor='white', linewidth=1.25)
    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*',
               s=marker_size, edgecolor='white', linewidth=1.25)

### (Optional) Show a guiding bounding box (not required for single-click flow).
def show_box(box, ax):
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    ax.add_patch(plt.Rectangle((x0, y0), w, h,
                               edgecolor='green', facecolor=(0, 0, 0, 0), lw=2))

Short recap: your visual overlays are ready—clicks and masks will be easy to inspect.

If you prefer a full framework, check out Detectron2 panoptic segmentation made easy for beginners for training-ready pipelines.

Part 3 — Click once on the image to tell SAM what to segment

What you’ll do here:
Load an image, open an OpenCV window, and click the object once.
Press q to confirm and capture the coordinates.

Summary:
You’ll build a tiny helper function that returns the (x, y) coordinates of your click—SAM’s only required input in this flow.

Here is our test image:

Test image
### Choose the image you want to segment (update the path to your file).
image_path = "Best-Semantic-Segmentation-models/Segment-Anything/3-Segment Anything with one mouse click/Dori.jpg"

### Read and convert BGR → RGB for Matplotlib display later.
image_bgr = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

### Let the user click a single point; press 'q' to finalize.
def get_clicked_point(image_path):
    chosenX, chosenY = 0, 0
    img = cv2.imread(image_path)

    cv2.putText(img, "Click on the object, then press 'q' to exit.",
                (20, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2)

    def mouse_callback(event, x, y, flags, param):
        nonlocal chosenX, chosenY
        if event == cv2.EVENT_LBUTTONDOWN:
            cv2.circle(img, (x, y), 10, (0, 255, 0), -1)
            print("Clicked:", x, y)
            chosenX, chosenY = x, y

    cv2.namedWindow("Select Point")
    cv2.setMouseCallback("Select Point", mouse_callback)

    while True:
        cv2.imshow("Select Point", img)
        key = cv2.waitKey(1) & 0xFF
        if key == ord('q'):
            break

    cv2.destroyAllWindows()
    return chosenX, chosenY

### Run the point picker.
x, y = get_clicked_point(image_path)
print("Chosen clicked point:", f"{x},{y}")

Short recap: you now have a single (x, y) pointing to the object—SAM will do the rest.

Want point-based interaction in videos? See Segment Anything in Python — no training, instant masks for more live demos.

Part 4 — Load SAM ViT-H and prepare the predictor (CPU/GPU friendly)

What you’ll do here:
Load the SAM checkpoint (ViT-H), move it to GPU if available, and attach a SamPredictor.
Then set the current image so SAM can compute features.

Summary:
This step binds the model + image together and readies the predictor for your single click.

### Point to your downloaded SAM ViT-H checkpoint (edit path as needed).
path_for_sam_model = "e:/temp/sam_vit_h_4b8939.pth"

### Pick the best available device automatically (CUDA > MPS > CPU).
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
print("Using device:", device)

### Model type must match the checkpoint; 'vit_h' is correct for sam_vit_h_4b8939.pth.
model_type = "vit_h"

### Import SAM registry and predictor.
from segment_anything import sam_model_registry, SamPredictor

### Build and load the model weights.
sam = sam_model_registry[model_type](checkpoint=path_for_sam_model)

### Send the model to the selected device.
sam.to(device=device)

### Create the predictor and set the RGB image (as NumPy HWC).
predictor = SamPredictor(sam)
predictor.set_image(image_rgb)

Short recap: SAM is loaded, on the right device, and primed with your image.
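
One detail worth knowing: predictor.set_image runs the heavy ViT-H encoder once and caches the image embedding, so you can prompt the same image many times at almost no extra cost. Here is a minimal sketch, assuming the predictor above is ready; the second click offset is purely illustrative:

### The embedding from set_image is reused by every predict call on the same image.
### The second point below is a hypothetical extra click, for illustration only.
for px, py in [(x, y), (x + 40, y)]:
    candidate_masks, candidate_scores, _ = predictor.predict(
        point_coords=np.array([[px, py]]),
        point_labels=np.array([1]),
        multimask_output=True
    )
    print(f"Point ({px},{py}) -> best score {candidate_scores.max():.3f}")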

If you’re exploring medical or structured masks, compare with U-Net medical segmentation with TensorFlow & Keras.

Part 5 — Predict masks from your single click and save all results

What you’ll do here:
Turn your (x, y) into SAM inputs, get three candidate masks, show them, and save each result.
You’ll see mask scores to help you pick your favorite.

Summary:
You’ll get three high-quality segmentations, each saved as a PNG on disk for later use.

### Pack the single positive point for SAM: label 1 = foreground.
input_point = np.array([[x, y]])
input_label = np.array([1])  # 1 for foreground, 0 for background

### Visualize your click on the original image.
plt.figure(figsize=(10, 10))
plt.imshow(image_rgb)
show_points(input_point, input_label, plt.gca())
plt.axis('on')
plt.show()

### Ask SAM for up to three masks (multimask_output=True).
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)

### Confirm shapes and quality scores.
print("Masks shape:", masks.shape)  # (3, H, W)
print("Scores:", scores)

### Overlay and save each candidate mask; pick your favorite by score/visual.
for i, (mask, score) in enumerate(zip(masks, scores)):
    plt.figure(figsize=(10, 10))
    plt.imshow(image_rgb)
    show_mask(mask, plt.gca())
    show_points(input_point, input_label, plt.gca())
    plt.title(f"Mask {i+1}, Score: {score:.3f}", fontsize=18)
    plt.axis('off')
    out_path = f"Best-Semantic-Segmentation-models/Segment-Anything/3-Segment Anything with one mouse click/output{i}.png"
    plt.savefig(out_path, bbox_inches='tight', pad_inches=0.0)
    plt.show()

Short recap: you now have three crisp segmentations saved—choose the best and keep creating.
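
If you prefer to pick the winner programmatically rather than by eye, the scores array makes it a one-liner. A small sketch, using the masks and scores returned above:

### Select the candidate with the highest predicted quality score.
best_idx = int(np.argmax(scores))
best_mask = masks[best_idx]    # boolean array of shape (H, W)
print(f"Best mask: {best_idx + 1}, score: {scores[best_idx]:.3f}")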

Next, try improving mask quality with post-processing or super-resolution: upscale your images and videos using super-resolution.

FAQ:

What is Segment Anything (SAM)?

SAM is a general-purpose segmentation model that returns object masks from simple prompts like a single click. It’s ideal for fast labeling and prototyping.

Which SAM model type should I use?

Use ViT-H for best quality. Use ViT-L/B for lower memory. Match model_type to your checkpoint name.
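
As a quick reference, here is a hedged sketch of how the registry call changes per variant; the checkpoint filenames shown are the ones published in the official repository, so verify them against the files you actually downloaded:

### Match model_type to the checkpoint variant (sketch; verify filenames against your download).
checkpoints = {
    "vit_h": "sam_vit_h_4b8939.pth",   # best quality, highest memory use
    "vit_l": "sam_vit_l_0b3195.pth",   # middle ground
    "vit_b": "sam_vit_b_01ec64.pth",   # lightest and fastest
}
model_type = "vit_b"
sam = sam_model_registry[model_type](checkpoint=checkpoints[model_type])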

Do I need a GPU?

No, but GPU or Apple MPS speeds up inference significantly. CPU works, just slower.

How do I pick the best mask?

Compare the three candidates by score and visual quality. Choose the one that cleanly captures your object.

Can I add negative clicks?

Yes. Label 0 for background to suppress unwanted regions. Mix positives and negatives for precision.
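
For example, a minimal sketch that mixes one positive and one negative click; the background coordinates here are placeholders, so pick a point inside the region you want SAM to ignore:

### One positive click on the object plus one negative click on the background.
### (bg_x, bg_y) is a placeholder point in an area you want excluded.
bg_x, bg_y = 50, 50
mixed_points = np.array([[x, y], [bg_x, bg_y]])
mixed_labels = np.array([1, 0])   # 1 = foreground, 0 = background
masks, scores, logits = predictor.predict(
    point_coords=mixed_points,
    point_labels=mixed_labels,
    multimask_output=True
)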

Why does imshow not open?

Use opencv-python (GUI) instead of the headless build. The post includes a cleanup step.

Where should I save the checkpoint?

Anywhere. Update the code’s path_for_sam_model to match your file location.

Can I export masks as PNGs?

Yes. The code saves overlay images. You can also save binary masks by converting to 0/255 and writing with OpenCV.
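
For instance, a short sketch of writing the first candidate as a binary PNG, assuming the masks array from Part 5:

### Convert the boolean mask to 0/255 and write it as a lossless PNG.
binary_mask = masks[0].astype(np.uint8) * 255
cv2.imwrite("mask_binary.png", binary_mask)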

Does SAM support boxes?

Yes. SAM accepts points and bounding boxes. Boxes help guide segmentation when objects are crowded.
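
A hedged sketch of prompting with a box instead of a point; the coordinates are illustrative, and the box is given in XYXY pixel format:

### Prompt SAM with a bounding box in XYXY pixel coordinates (values are placeholders).
input_box = np.array([100, 100, 400, 350])
masks, scores, logits = predictor.predict(
    box=input_box,
    multimask_output=False
)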

Is this approach good for dataset labeling?

Absolutely. One-click masks are a quick way to bootstrap datasets or refine labels with minimal effort.


Conclusion

You’ve just built a complete one-click segmentation tool around SAM ViT-H in Python.
The workflow is intentionally lightweight: create an environment, install SAM, click a point, and export masks.
Because SAM generalizes broadly, it’s excellent for new domains where you don’t have labeled data yet.
From here, you can add negative clicks for refinement, use bounding boxes, or integrate with super-resolution and post-processing to lift mask quality even further.

If you plan to use this in production, consider wrapping the flow in a small GUI, storing your clicks/masks, and adding batch processing for entire image sets.
For research, this pipeline is a fantastic way to prototype and compare segmentations across different scenes quickly.


Connect:

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran
