Last Updated on 31/01/2026 by Eran Feit
YOLO image segmentation is a practical way to move from “where is the object” to “which exact pixels belong to it.”
Instead of stopping at a bounding box, segmentation gives you a mask that traces the real outline of the target region.
That extra detail matters in computer vision tasks where shape, edges, and fine structures carry the information you care about.
For thin patterns like cracks, scratches, and surface defects, a pixel mask is often more useful than a rectangle because the signal is narrow and irregular.
In a typical workflow, YOLO image segmentation starts with a dataset that includes images and segmentation labels that describe the target pixels.
During training, the model learns to associate visual texture and edges with the correct mask region.
At inference time, the model outputs masks that you can post-process into binary images, overlays, or measurements.
This makes it easy to build an end-to-end pipeline where you detect defects, visualize them, and export results for QA or reporting.
A big advantage of YOLO-style segmentation is the speed-to-accuracy balance.
You get a modern deep learning model that can run fast on a GPU while still producing masks that are detailed enough for real-world use.
That makes it a strong fit for automated inspection scenarios such as concrete crack monitoring, road maintenance analytics, and manufacturing defect checks.
You can also scale it to batches of images and keep everything inside a Python workflow for repeatable experiments.
Once you have predicted masks, you can treat them like any other image in OpenCV and NumPy.
You can resize masks back to the original image resolution, merge multiple detections into one final mask, and save results as PNGs.
You can also compare predictions to ground-truth masks for quick sanity checks or evaluation.
This combination of deep learning output plus classic image processing tools is what makes YOLO image segmentation so flexible in practice.
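As a quick illustration of that last point, here is a minimal sketch of comparing a predicted mask against a ground-truth mask using IoU and Dice scores. The file names are hypothetical placeholders; any pair of same-size binary masks will work.

```python
### Minimal sketch: score a predicted mask against a ground-truth mask.
### The file names are placeholders - substitute your own mask paths.
import cv2
import numpy as np

pred = cv2.imread("predicted_mask.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("ground_truth_mask.png", cv2.IMREAD_GRAYSCALE)

### Binarize both masks so we compare pixel membership, not intensity.
pred_bin = pred > 127
gt_bin = gt > 127

### IoU and Dice both measure overlap on a 0-to-1 scale.
intersection = np.logical_and(pred_bin, gt_bin).sum()
union = np.logical_or(pred_bin, gt_bin).sum()
iou = intersection / union if union > 0 else 0.0
dice = 2 * intersection / (pred_bin.sum() + gt_bin.sum() + 1e-9)

print(f"IoU: {iou:.3f}  Dice: {dice:.3f}")
```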
YOLO image segmentation for crack detection projects
YOLO image segmentation for crack detection projects focuses on isolating crack pixels from everything else in the scene.
Cracks are often thin, low-contrast, and fragmented, which makes them challenging for simple thresholding or edge detection alone.
A segmentation model learns the visual patterns of cracks directly from labeled examples, so it can generalize to new surfaces and lighting conditions.
The end goal is a clean mask that highlights only the crack regions, ready for visualization or measurement.
A high-level crack segmentation pipeline usually has three pillars: data, training, and inference.
Data matters because cracks come in many shapes, widths, and textures, and the background can vary from asphalt to concrete to painted surfaces.
Training matters because segmentation needs enough resolution to capture thin structures without losing them during downsampling.
Inference matters because you often want to export masks, combine multiple detections, and keep outputs consistent across many test images.
The target output is typically a binary mask where crack pixels are “on” and background pixels are “off.”
From that mask, you can compute practical signals like crack area coverage, approximate length, connected components, or defect density per region.
Even without complex metrics, having a reliable mask is a huge win because it turns a visual inspection task into a machine-readable result.
That’s the foundation for building automated QA systems and for tracking defects over time.
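To make that concrete, here is a minimal sketch of extracting simple signals from a saved binary mask with OpenCV and NumPy, assuming crack pixels are 255 and background is 0:

```python
### Minimal sketch: turn a binary crack mask into machine-readable numbers.
### Assumes crack pixels are 255 and background pixels are 0.
import cv2
import numpy as np

mask = cv2.imread("final_mask.png", cv2.IMREAD_GRAYSCALE)
binary = (mask > 127).astype(np.uint8)

### Crack area coverage: fraction of all pixels that belong to cracks.
coverage = binary.mean()

### Connected components: count separate crack segments (label 0 is background).
num_labels, _ = cv2.connectedComponents(binary)
num_segments = num_labels - 1

print(f"Crack coverage: {coverage:.2%}, crack segments: {num_segments}")
```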
At a practical level, crack segmentation is also about controlling false positives and making the masks usable.
Good preprocessing, consistent image sizing, and clear labels help the model learn the difference between cracks and similar patterns like seams, stains, or texture lines.
Post-processing can help too, such as combining masks, cleaning small noise blobs, or smoothing boundaries if needed for reporting.
With a solid YOLO image segmentation setup, you get a repeatable workflow that takes raw images and produces exportable crack masks you can trust.
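As one possible cleanup step, the sketch below drops small noise blobs and lightly smooths boundaries with standard OpenCV operations. The min_area threshold is an assumed value you would tune per dataset.

```python
### Minimal sketch: clean a predicted mask before reporting.
### min_area is an assumed threshold - tune it for your image resolution.
import cv2
import numpy as np

mask = cv2.imread("final_mask.png", cv2.IMREAD_GRAYSCALE)
binary = (mask > 127).astype(np.uint8)

### Keep only connected components at least min_area pixels in size.
min_area = 50
num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
cleaned = np.zeros_like(binary)
for label in range(1, num_labels):
    if stats[label, cv2.CC_STAT_AREA] >= min_area:
        cleaned[labels == label] = 1

### Morphological closing bridges tiny gaps and smooths ragged edges.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)

cv2.imwrite("final_mask_clean.png", cleaned * 255)
```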

Training and Testing YOLO Image Segmentation for Crack Detection in Python
This tutorial focuses on building a complete, practical pipeline for crack detection using YOLO image segmentation in Python.
The code is designed to take you from environment setup and dataset preparation all the way to training a custom segmentation model and running inference on new images.
Rather than treating segmentation as a black box, the workflow exposes each step so you understand how the model is configured, trained, and evaluated.
The main goal is to show how a real-world segmentation project is structured in code, not just how to run a single command.
At the training stage, the code loads a pretrained YOLO segmentation model and fine-tunes it on a custom crack dataset.
Key parameters such as image size, batch size, number of epochs, early stopping, and GPU usage are explicitly defined.
This makes the training process reproducible and easy to adjust for different datasets or hardware constraints.
By separating the project directory and experiment name, the code also keeps results organized for future comparison.
The dataset configuration plays a central role in the pipeline.
The YAML file defines where training and validation images are stored, how many classes exist, and how those classes are named.
This clean separation between code and data makes it simple to swap datasets or reuse the same training logic for a different segmentation task.
It also reflects how most production-ready segmentation projects are structured.
On the inference side, the code demonstrates how to load the trained model and apply it to a test image.
Predicted segmentation masks are extracted, resized to match the original image, and combined into a final output mask.
Each mask is saved to disk, making it easy to inspect individual detections or use them in downstream processing.
By the end of the pipeline, you have a clear example of how to go from raw input images to saved segmentation results that can be visualized, evaluated, or integrated into a larger system.
Link to the video tutorial here
Code for the tutorial here or here
My Blog
Link for Medium users here.
Want to get started with Computer Vision or take your skills to the next level?
Great Interactive Course: “Deep Learning for Images with PyTorch” here
If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow
If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4

YOLO Image Segmentation for Crack Detection Projects
YOLO image segmentation is one of the fastest ways to turn an input image into a pixel-accurate mask that outlines the exact shape of an object.
Instead of stopping at bounding boxes, segmentation predicts the pixels that belong to the target class, which is what you want when the “shape” matters more than the “location.”
That is especially true for cracks, because cracks are thin, irregular, and often blend into the background texture.
In this tutorial, the goal is simple and practical.
Train a YOLOv11 segmentation model on a custom crack dataset, then run inference on a test image and export the predicted masks.
The code is written to be copy-paste friendly, so you can reuse the same structure for other defect-detection segmentation projects.
Setting up a clean environment for YOLO image segmentation
A stable environment makes segmentation projects much easier to reproduce.
In this part, you create a dedicated Conda environment, confirm CUDA is available, and install the exact library versions used in the tutorial.
The goal is to avoid version drift and surprise dependency conflicts.
Once these packages are installed, your training and inference scripts will behave consistently across runs.
Install:

```bash
# Create conda environment
conda create --name YoloV11-311 python=3.11
conda activate YoloV11-311

# Check the CUDA version (CUDA 12.4)
nvcc --version

# Install PyTorch with CUDA 12.4 support
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install YOLOv11 (Ultralytics)
pip install ultralytics==8.3.59

# More:
pip install opencv-python==4.10.0.84
```

Short summary.
You now have a ready environment with PyTorch, CUDA, Ultralytics, and OpenCV installed.
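If you want to double-check from Python as well, a short sketch like this confirms PyTorch actually sees the GPU:

```python
### Quick check that PyTorch detects the GPU before training.
import torch

print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU")
```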
Getting the crack dataset and keeping it organized
Segmentation training depends on labeled masks, so the dataset is the foundation of everything that follows.
If you want the same dataset used in this tutorial, send me an email and I will send you the download link.
Once you have the dataset, the most important thing is to keep the folder structure intact.
YOLO segmentation training expects images and labels arranged in a consistent train and validation structure.
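For reference, a typical layout looks like this (assuming a Roboflow-style YOLOv11 export; your exact folder names may differ):

```
crack.v2i.yolov11/
├── train/
│   ├── images/
│   └── labels/
└── valid/
    ├── images/
    └── labels/
```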
Pointing YOLO to the dataset with a simple config file
The dataset config file is what connects your code to your data.
It tells YOLO where the dataset lives on disk, where train and validation images are located, and what class names exist.
Keeping this in a YAML file is a big quality-of-life improvement.
You can reuse the same training script for other datasets just by swapping the YAML content.
Here is the config.yaml:

```yaml
path: 'D:/Data-Sets-Object-Segmentation/crack.v2i.yolov11'
train: 'train/images'
val: 'valid/images'
nc: 1
names: ['crack']
```

Short summary.
Your training script can now reference one file and instantly know where the dataset is and what it contains.
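Before launching a long training run, a small sanity-check sketch like this can confirm the paths in the YAML actually exist. It assumes PyYAML, which is installed alongside Ultralytics:

```python
### Optional sanity check: verify the dataset paths in config.yaml exist.
### Assumes PyYAML (installed with ultralytics) and the config file shown above.
import os
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

for split in ("train", "val"):
    split_dir = os.path.join(cfg["path"], cfg[split])
    print(split, "->", split_dir, "| exists:", os.path.isdir(split_dir))
```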
Loading a pretrained YOLOv11 segmentation model to start training
Starting from a pretrained segmentation model is the fastest path to strong results on a custom dataset.
In this part, the code loads a YOLOv11 segmentation checkpoint and sets up a project folder to keep results organized.
This structure also makes experiments easier to compare later.
You can train multiple runs with different names, batch sizes, or image sizes without overwriting old outputs.
```python
### Import the YOLO class so we can load pretrained segmentation weights and run training.
from ultralytics import YOLO

### Wrap the workflow in a main function so it is easy to run and reuse.
def main():
    ### Load a pretrained YOLOv11 segmentation model as the starting point for fine-tuning.
    model = YOLO("yolo11s-seg.pt")  # Load a pretrained YOLOv11 segmentation model

    ### Define where training outputs will be saved so runs stay organized.
    project = "d:/temp/models/Crack-Segmentation-Using-YOLOv11"  # Define the project directory

    ### Give this run a clear name so you can compare multiple experiments later.
    experiment = "My-model-S"

    ### Set a batch size that fits your GPU memory and training speed needs.
    batch_size = 8
```

Short summary.
You have a pretrained segmentation model loaded and a clean project structure ready for training outputs.
Training YOLOv11 on a crack segmentation dataset
This training call is the heart of the tutorial.
It connects the dataset YAML, defines the number of epochs, sets the image size, and ensures training runs on the GPU if available.
A few parameters here matter a lot in practice.
Image size affects crack detail, patience helps stop early if validation stalls, and batch size controls GPU memory usage and training throughput.
```python
    ### Train the model using the dataset YAML, controlling epochs, image size, and GPU device.
    results = model.train(
        ### Point YOLO to the dataset config that defines paths and class names.
        data="Best-Semantic-Segmentation-models/Yolo-V11/Crack Segmentation Using YOLOv11 - Custom dataset/config.yaml",
        ### Set the number of training epochs for fine-tuning on cracks.
        epochs=50,  # Set the number of training epochs
        ### Save outputs under a consistent project directory.
        project=project,  # Specify the project directory
        ### Name this experiment so the run has its own folder.
        name=experiment,  # Name of the experiment
        ### Use a batch size that balances speed with GPU memory.
        batch=batch_size,  # Set the batch size
        ### Resize images to a fixed size for training stability and speed.
        imgsz=416,  # Set the image size for training
        ### Select the GPU device, using "0" for the first GPU.
        device="0",  # Specify the device to use (0 for GPU)
        ### Stop early if validation does not improve for several checks.
        patience=5,  # Set the patience for early stopping
        ### Print detailed training logs to track progress.
        verbose=True,  # Enable verbose output
        ### Enable validation during training to monitor generalization.
        val=True,
    )

### Standard Python entry point so this file runs training only when executed directly.
if __name__ == "__main__":
    main()
```

Short summary.
You trained a custom crack segmentation model and saved the best weights into a structured run folder.
Running inference and building a final crack mask
After training, the next step is to load the best weights and run prediction on a new image.
This section reads the image, runs the model, and prepares an empty mask that will accumulate all predicted crack regions.
The core idea is simple and robust.
Every predicted mask is resized back to the original image resolution, then combined into one final binary mask you can save, view, or post-process.
Crack test image:

```python
### Import YOLO so we can load the trained segmentation model for inference.
from ultralytics import YOLO
### Import NumPy for mask creation and array operations.
import numpy as np
### Import OpenCV for reading images, resizing masks, saving outputs, and visualization.
import cv2
### Import OS utilities for creating output folders reliably.
import os

### Define the path to the trained weights produced during training.
model_path = "D:/Temp/Models/Crack-Segmentation-Using-YOLOv11/My-model-S/weights/best.pt"  # Path to the trained model

### Define the test image path that we will run segmentation on.
image_path = "Best-Semantic-Segmentation-models/Yolo-V11/Crack Segmentation Using YOLOv11 - Custom dataset/test_image.jpg"

### Create an output folder for masks so saving never fails.
os.makedirs("d:/temp/Fiber-Segment", exist_ok=True)

### Read the input image from disk using OpenCV.
img = cv2.imread(image_path)  # Read the input image

### Extract original image dimensions so we can resize masks back correctly.
H, W, _ = img.shape  # Get the dimensions of the image

### Load the trained YOLOv11 segmentation model from best.pt.
model = YOLO(model_path)  # Load the trained YOLOv11 model

### Run inference on the input image to get predicted boxes and masks.
results = model(img)  # Perform inference on the input image

### Grab the first result item for a single image inference call.
result = results[0]  # Get the first result from the inference

### Read the class-name dictionary from the model for readable outputs.
names = model.names  # Get the class names from the model

### Print the available class names to confirm label mapping.
print("Classes:", names)  # Print the class names

### Create an empty mask that will hold the merged segmentation output.
final_mask = np.zeros((H, W), dtype=np.uint8)
```

Short summary.
You loaded the trained weights, ran inference on a test image, and prepared the final mask canvas for accumulation.
Saving predicted masks and visualizing the results
This final part extracts each predicted mask, rescales it, and writes it to disk as a PNG.
It also merges multiple masks into a single final mask so the output is easy to use in downstream steps.
This is also where you confirm the model output is sensible.
By saving individual masks and the merged final mask, you get fast debugging signals and a clean artifact you can compare against ground truth masks later.
```python
### Extract the predicted class IDs so each mask can be mapped to a class name.
predicted_classes = result.boxes.cls.cpu().numpy()  # Get the predicted classes
print("Predicted classes:", predicted_classes)  # Print the predicted classes

### Loop over every predicted mask instance in the result.
for j, mask in enumerate(result.masks.data):
    ### Convert the mask tensor to a uint8 NumPy array scaled to 0-255 so merging and display behave correctly.
    mask = (mask.cpu().numpy() * 255).astype(np.uint8)  # Convert the mask to a uint8 array scaled to 255

    ### Convert the predicted class value to an integer index.
    classID = int(predicted_classes[j])  # Get the class ID for the mask

    ### Print a readable line showing which object was detected and its class name.
    print("Object " + str(j) + " detected as " + str(classID) + " - " + names[classID])  # Print the class ID and name

    ### Resize the predicted mask back to the original image size.
    mask = cv2.resize(mask, (W, H))  # Resize the mask to match the original image dimensions

    ### Merge this mask into the final output mask using a pixelwise max.
    final_mask = np.maximum(final_mask, mask)

    ### Build a unique output filename for this instance mask.
    file_name = "output" + str(j) + ".png"  # Create a filename for the mask

    ### Save the instance mask to disk for inspection and reuse.
    cv2.imwrite("d:/temp/Fiber-Segment/" + file_name, mask)  # Save the mask to disk

### Save the merged mask that represents all crack pixels detected in the image.
cv2.imwrite("d:/temp/Fiber-Segment/final_mask.png", final_mask)  # Save the final mask

### Display the final merged mask in a window for quick validation.
cv2.imshow("Final Mask", final_mask)  # Show the final mask

### Display the original image side-by-side with the mask window.
cv2.imshow("Input Image", img)  # Show the input image

### Pause until a key is pressed so you can inspect the output.
cv2.waitKey(0)  # Wait for a key press

### Close OpenCV windows cleanly when done.
cv2.destroyAllWindows()  # Close all OpenCV windows
```

The result (final mask):

Short summary.
You exported per-instance masks, saved a merged final mask, and displayed the results for quick visual validation.
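If you also want a visual overlay rather than a raw mask, a small sketch like the one below blends the merged mask onto the input image. It reuses the img and final_mask variables from the inference code above; the red tint and blend weights are arbitrary choices.

```python
### Minimal sketch: overlay the merged crack mask on the original image.
### Reuses img and final_mask from the script above; color and alpha are arbitrary.
overlay = img.copy()
overlay[final_mask > 127] = (0, 0, 255)  # Paint crack pixels red (BGR)

### Blend the tinted copy with the original for a see-through highlight.
blended = cv2.addWeighted(img, 0.6, overlay, 0.4, 0)
cv2.imwrite("d:/temp/Fiber-Segment/overlay.png", blended)
```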
YOLO Image Segmentation – FAQ
What is YOLO image segmentation in simple terms?
YOLO image segmentation predicts pixel masks instead of only bounding boxes. This is ideal when you need the exact shape of a target like thin cracks.
Why is segmentation better than detection for cracks?
Cracks are thin and irregular, so boxes include lots of background. Masks isolate only crack pixels, which is better for visualization and measurement.
Do I need a special dataset format for YOLO segmentation?
Yes. The dataset must follow a YOLO segmentation format and folder layout. The YAML file then points training to the correct train and validation paths.
What does the config.yaml file control?
It defines the dataset root path, train and validation directories, and class names. A wrong YAML usually causes missing files or incorrect labels.
Why start from yolo11s-seg.pt instead of training from scratch?
Pretrained segmentation weights typically converge faster and need fewer labeled images. Fine-tuning helps the model adapt to crack textures quickly.
What does imgsz change during training?
imgsz sets the training image resolution. Higher values can preserve more crack detail but require more GPU memory and can slow training.
What does patience do in model.train?
patience enables early stopping when validation stops improving. It saves time and can reduce overfitting on small segmentation datasets.
Why resize each predicted mask back to the original image size?
Predicted masks may be produced at a different internal resolution. Resizing aligns the mask with the original pixels for saving, overlays, and comparisons.
Why merge masks with np.maximum?
np.maximum merges multiple instance masks into one final mask without losing pixels. This is useful when cracks appear as separate predicted segments.
What is the fastest way to debug poor crack masks?
Save per-instance masks and the merged mask, then compare them to the input image. If alignment is off, recheck labeling consistency and resizing.
Conclusion
YOLO image segmentation is a strong fit for crack detection because it focuses on the pixels that actually matter.
When the target is thin and irregular, masks provide a clearer and more actionable output than bounding boxes.
This tutorial’s code shows a complete workflow that you can reuse, starting from environment setup and ending with saved mask files you can inspect or analyze.
The training section demonstrates a clean pattern that scales well.
A simple YAML file defines the dataset, the training call holds the key hyperparameters, and the output folder structure keeps experiments organized.
Once you have this pattern working, you can iterate by adjusting image size, epochs, or early stopping to match your dataset size and crack detail level.
The inference section turns the model into something usable.
You load best.pt, run prediction on a real test image, extract masks, resize them, and export them as PNGs.
That final step matters because it turns model outputs into artifacts you can share, validate, compare, and integrate into a larger inspection system.
If you want to take the next step, you can expand the same pipeline to batch processing or video.
You can also add evaluation logic, compare predicted masks against ground truth masks, and track improvements across experiments.
The core structure stays the same, which is what makes this tutorial a solid foundation for real crack segmentation projects.
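As a starting point for that batch extension, here is a minimal sketch that loops the same inference logic over a folder of images. The folder paths are placeholders; the per-image mask handling follows the same pattern shown earlier.

```python
### Minimal sketch: run the trained model over a folder of images.
### Folder paths are placeholders - point them at your own data.
import glob
import os
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("path/to/best.pt")
os.makedirs("d:/temp/batch-masks", exist_ok=True)

for image_path in glob.glob("d:/temp/test-images/*.jpg"):
    img = cv2.imread(image_path)
    H, W, _ = img.shape
    result = model(img)[0]

    ### Merge all instance masks into one mask per image, as in the single-image script.
    final_mask = np.zeros((H, W), dtype=np.uint8)
    if result.masks is not None:
        for mask in result.masks.data:
            mask = (mask.cpu().numpy() * 255).astype(np.uint8)
            final_mask = np.maximum(final_mask, cv2.resize(mask, (W, H)))

    out_name = os.path.splitext(os.path.basename(image_path))[0] + "_mask.png"
    cv2.imwrite(os.path.join("d:/temp/batch-masks", out_name), final_mask)
```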
Connect
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
