Object Detection in 15 minutes with YOLOv5 & Python


Last Updated on 19/11/2025 by Eran Feit

Introduction to YOLOv5 object detection in Python

Object detection has become one of the most practical ways to bring computer vision into real-world projects, and YOLOv5 makes it surprisingly accessible. With just a few lines of code, you can go from a raw image to a set of detected objects, each wrapped in a bounding box and labeled with a class name. In this tutorial, you’ll see how YOLOv5 object detection works in Python from end to end: from setting up the Conda environment and installing PyTorch with CUDA, through loading a pretrained model, to visualizing the detections with OpenCV.

At a high level, YOLOv5 is a real-time object detection model implemented in PyTorch that takes an image, divides it into a grid, and predicts bounding boxes and class probabilities in a single forward pass. Instead of running multiple stages or sliding windows over the image, the entire detection task is handled by one neural network, which is why it’s both fast and efficient for many applications like surveillance, sports analytics, and simple proof-of-concept projects.
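To make the grid idea concrete, here is a simplified sketch. It is not YOLOv5's actual target-assignment logic; it only assumes a 640×640 input and the three output strides (8, 16 and 32) used by the standard YOLOv5 models, and shows how many cells each grid has and which cell contains a given object center. The helper names are hypothetical.

```python
def grid_sizes(img_size, strides=(8, 16, 32)):
    """Number of grid cells per side for each output stride."""
    return {stride: img_size // stride for stride in strides}


def cell_for_point(x, y, stride):
    """(column, row) of the grid cell that contains pixel (x, y)."""
    return int(x // stride), int(y // stride)


print(grid_sizes(640))               # {8: 80, 16: 40, 32: 20}
print(cell_for_point(100, 250, 32))  # (3, 7)
```

A 640-pixel input therefore yields 80×80, 40×40 and 20×20 grids: the fine grids help with small objects and the coarse grid with large ones.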

In Python, the workflow feels very natural. You load an image with OpenCV, send it through a YOLOv5 model loaded from the PyTorch Hub, and receive back a rich results object containing labels, confidence scores, and normalized coordinates. From there, it’s just standard NumPy and OpenCV operations: convert coordinates to pixel values, loop over detections that pass a confidence threshold, and draw rectangles and class names directly onto the original image.
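The coordinate conversion is plain arithmetic: each normalized value between 0 and 1 is multiplied by the image width (for x) or height (for y). A minimal sketch with a hypothetical helper name and made-up box values:

```python
def to_pixels(box_norm, width, height):
    """Convert a normalized (x1, y1, x2, y2) box to integer pixel coordinates."""
    x1, y1, x2, y2 = box_norm
    # int() truncates toward zero, the same way the tutorial's drawing loop does
    return int(x1 * width), int(y1 * height), int(x2 * width), int(y2 * height)


print(to_pixels((0.25, 0.5, 0.5, 0.75), width=640, height=480))  # (160, 240, 320, 360)
```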

The code you’re working with focuses on a single image, which is perfect for beginners and for building a clear mental model of the pipeline. You read an image from disk, run inference once, inspect the labels, classes, and coordinates, and then visualize everything in an OpenCV window. This small but complete example demonstrates the core ideas behind YOLOv5 object detection in Python, and lays the groundwork for expanding into videos, webcams, or real-time applications later on.

If you enjoy learning object detection pipelines, you might also like my Detectron2 object detection tutorial for beginners and my SSD MobileNet V3 object detection guide.


Getting comfortable with YOLOv5 object detection in Python

When people talk about “YOLOv5 object detection in Python,” they usually mean exactly the workflow you’re implementing: set up a Conda environment, install PyTorch with CUDA support, clone the YOLOv5 repository, and then write a short script that loads a pretrained model and runs inference on an image. The goal is a minimal yet realistic pipeline that shows how deep learning–based object detection actually works, without needing to train a model from scratch.

The core idea is simple: you provide an image as input, and YOLOv5 returns a list of objects it sees. Under the hood, the model uses a convolutional neural network backbone to extract features from the image, a neck to combine multi-scale information, and a detection head that predicts bounding boxes and class scores at multiple scales. But from the Python side, you mostly interact with a friendly API: model = torch.hub.load(...), results = model(img), and then you explore the results object to pull out labels and coordinates.

At a higher level, this pipeline solves a very practical problem: turning raw pixel data into structured information. Instead of “just an image,” you now have machine-readable data like “person at coordinates (x1, y1, x2, y2)” or “dog with 0.87 confidence.” That structured output is what allows you to build downstream logic: counting people, tracking cars, flagging specific objects, or overlaying analytics on top of videos. Even in this simple example, looping over n = len(labels) and drawing bounding boxes is already a small analytics layer.
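As a small example of that downstream logic, the sketch below counts detections per class name. It assumes the detections have already been reduced to (class_id, confidence) pairs and that `names` maps class indices to names the way `model.names` does; the detections themselves are invented for illustration.

```python
from collections import Counter


def count_by_class(detections, names, min_conf=0.25):
    """Count detections per class name, ignoring those below min_conf."""
    return Counter(names[class_id] for class_id, conf in detections if conf >= min_conf)


names = {0: "person", 2: "car", 16: "dog"}  # a small subset of the COCO class names
detections = [(0, 0.91), (0, 0.64), (16, 0.87), (2, 0.18)]
print(count_by_class(detections, names))  # Counter({'person': 2, 'dog': 1})
```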

The Python ecosystem is what makes this so productive. Conda handles the environment and CUDA-compatible PyTorch build; libraries like NumPy and OpenCV handle image arrays and visualization; YOLOv5 provides the pretrained model and the detection logic; and everything is glued together in a single script you can run in under a minute once the environment is ready. That combination is why “YOLOv5 object detection in Python” has become such a common phrase in tutorials, GitHub repos, and real-world projects.


Building a simple YOLOv5 object detection script in Python

This tutorial walks through a complete, minimal example of YOLOv5 object detection in Python using a single image.
The idea is to take you from a fresh Conda environment to a working script that loads a pretrained YOLOv5 model, runs inference on an image, and displays the detected objects with bounding boxes and labels.
Instead of diving into theory or training, the focus here is on a practical pipeline you can understand, modify, and reuse in your own projects.

The code starts by creating and activating a dedicated Conda environment, then installing PyTorch with CUDA support so you can take advantage of your GPU if you have one available.
After that, the YOLOv5 repository is cloned and all the required Python packages are installed: NumPy for array handling, OpenCV for image operations and display, Matplotlib and TensorBoard for potential visualization, and Albumentations for future data augmentation if you decide to extend the project.
This setup section ensures the rest of the tutorial runs smoothly without version conflicts.

Once the environment is ready, the script loads an image from disk using OpenCV and prepares it for inference.
The YOLOv5 model is pulled directly from the PyTorch Hub, using the small variant (yolov5s) that is fast and lightweight while still providing solid detection quality.
The device selection logic checks whether CUDA is available and moves the model to GPU if possible, falling back to CPU otherwise, so the same code can run on different machines without changes.

The heart of the tutorial is the inference and post-processing stage.
The model receives the image, returns a results object, and the code extracts labels, class names, and normalized coordinates from it.
A loop iterates over all detected objects, filters them by confidence threshold, converts the normalized coordinates to pixel values, and draws rectangles and class names on the original image.
Finally, OpenCV opens a window to show the annotated image, giving you instant visual feedback that the pipeline works.
By the end of the tutorial, you have a clean, readable script that demonstrates end-to-end YOLOv5 object detection in Python and can easily be adapted to process multiple images or video frames.
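When you adapt the script to many images, one common pattern is to group file paths into small batches and pass each batch (a list) to the model in a single call. The helper below is a hypothetical, stdlib-only sketch of that grouping step; image loading and inference are deliberately left out. On Python 3.12 and newer, `itertools.batched` does the same job but yields tuples.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
for batch in batched(paths, 2):
    print(batch)
# ['a.jpg', 'b.jpg']
# ['c.jpg', 'd.jpg']
# ['e.jpg']
```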

To keep practicing OpenCV drawing and contour logic, check out my tutorials on coin detection with Python and OpenCV and text detection with OpenCV and EasyOCR.

Link for the video tutorial : https://youtu.be/f0lu5jNZLdg

Code for the tutorial here : https://eranfeit.lemonsqueezy.com/buy/c9e3e16a-bb09-4ba2-8248-bc7f9a2f09b9

or here : https://ko-fi.com/s/2ab4512e50

Link for Medium users : https://medium.com/@feitgemel/object-detection-in-15-minutes-with-yolov5-python-5d2191bcd71d

You can follow my blog here : https://eranfeit.net/blog/

Want to get started with Computer Vision or take your skills to the next level ?

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4

Introduction to YOLOv5 object detection in Python

Object detection is all about turning raw images into meaningful lists of objects, classes, and locations.
With YOLOv5 and Python, you can do this in just a few lines of code, without training a model from scratch.

In this tutorial, you will build a small but complete YOLOv5 object detection script in Python.
The script loads a pretrained YOLOv5 model, runs inference on a single image, and draws labeled bounding boxes with OpenCV.

The goal is to give you a clear, copy-paste-ready pipeline you can reuse in your own projects.
Once you understand how this single-image example works, you can easily extend it to videos, webcams, or batch processing.

Everything is organized into three parts.
First you prepare the environment, then you load the model and run inference, and finally you inspect the results and visualize them on the image.

Building a simple YOLOv5 object detection script in Python

The code you will see in this post focuses on a minimal, practical pipeline.
You set up a dedicated Conda environment, install PyTorch with CUDA support, and bring in all the Python libraries required by YOLOv5 and OpenCV.

Then you load a test image from disk, send it through a pretrained YOLOv5s model from the PyTorch Hub, and receive a results object back.
From that results object, you extract labels, class names, and bounding box coordinates in a NumPy-friendly format.

Finally, you loop over all detections, filter them by confidence, convert normalized box coordinates into pixel positions, and draw rectangles and labels on top of the original image.
The end result is a simple window showing your input image with YOLOv5 detections overlaid, demonstrating the full YOLOv5 object detection workflow in Python from start to finish.

Getting your YOLOv5 environment ready

In this first part, you will create a clean Conda environment, clone the YOLOv5 repository, install PyTorch with CUDA support, and add all required Python libraries.
This ensures your script runs reproducibly without version conflicts.

### Create a new Conda environment dedicated to YOLOv5 object detection in Python.
conda create --name YoloV5 python=3.8

### Activate the new Conda environment so all next installations stay isolated.
conda activate YoloV5

### Move into the folder where you want to keep your YOLOv5 and other computer vision projects.
cd Cool-Python

### Clone the official YOLOv5 repository from GitHub so you have the latest scripts and configs locally.
git clone https://github.com/ultralytics/yolov5.git

### Optionally check the installed CUDA toolkit version (the conda PyTorch build ships its own CUDA runtime; nvidia-smi shows your driver).
nvcc --version

### Install a compatible PyTorch build with CUDA 11.8 support plus torchvision and torchaudio from the correct channels.
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia

### Quote every version specifier below: an unquoted ">" is treated by the shell as output redirection.
### Install GitPython so YOLOv5 utilities can interact with Git repositories when needed.
pip install "gitpython>=3.1.30"

### Install Matplotlib for plotting and visualizing results or debugging later.
pip install "matplotlib>=3.3"

### Install NumPy for fast numerical operations and array handling throughout your workflow.
pip install "numpy>=1.22.2"

### Install OpenCV to load images, display windows, and draw bounding boxes on frames.
pip install "opencv-python>=4.1.1"

### Install Pillow for additional image file format support and conversions.
pip install "Pillow>=10.0.1"

### Install psutil so you can monitor CPU, RAM, and other system resources if you extend the script.
pip install psutil

### Install PyYAML for reading and writing configuration files commonly used in deep learning projects.
pip install "PyYAML>=5.3.1"

### Install the Requests library so you can easily download files or interact with web APIs if needed.
pip install "requests>=2.23.0"

### Install SciPy to enable more advanced scientific and numerical operations when your project grows.
pip install "scipy>=1.4.1"

### Install thop so you can compute FLOPs and model complexity for performance analysis.
pip install "thop>=0.1.1"

### Install tqdm for clean progress bars during long operations like training or dataset processing.
pip install "tqdm>=4.64.0"

### Install the Ultralytics library which you can use later for YOLOv8 or other utilities in your computer vision projects.
pip install ultralytics==8.0.235

### Install TensorBoard to log and visualize training metrics if you move from inference to training.
pip install tensorboard

### Install Albumentations to perform powerful data augmentation pipelines for images.
pip install albumentations

### Remove the headless OpenCV build to prevent conflicts with the full OpenCV package (-y skips the confirmation prompt).
pip uninstall -y opencv-python-headless

### Uninstall any previous OpenCV installation to ensure a clean reinstall.
pip uninstall -y opencv-python

### Reinstall the correct OpenCV build that includes GUI support for image display.
pip install "opencv-python>=4.1.1"

This block gives you a reproducible environment for YOLOv5 and OpenCV.
Once everything installs successfully, you are ready to write and run the detection script itself.

Once you are comfortable with this basic YOLOv5 script, you can move on to more advanced workflows like pairing YOLOv8 with Segment Anything for fast masks or auto-labeling segmentation datasets with YOLOv8.

Loading your image and running YOLOv5

In the second part, you import the required Python libraries, load an input image, create the YOLOv5 model from the PyTorch Hub, and run inference on a single frame.
This is the heart of the YOLOv5 object detection pipeline.

Here is the test image :

Haverim test image

### Import the PyTorch library to handle tensors, models, and GPU acceleration.
import torch

### Import NumPy for numerical operations and convenient array handling.
import numpy as np

### Import OpenCV for reading images, drawing boxes, and displaying results.
import cv2

### Define the path to the input image you want to run object detection on.
imgPath = "haverim.jpg"

### Read the image from disk using OpenCV and store it as a NumPy array (OpenCV uses BGR channel order).
img = cv2.imread(imgPath)

### cv2.imread returns None instead of raising when the file cannot be read, so fail early with a clear message.
if img is None:
    raise FileNotFoundError(f"Could not read image: {imgPath}")

### Load the YOLOv5s model from the official Ultralytics repository via the PyTorch Hub.
### The 'pretrained=True' flag downloads and uses weights trained on COCO.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

### Choose CUDA if a compatible GPU is available, otherwise fall back to CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

### Move the model to the selected device so inference runs on GPU or CPU accordingly.
model.to(device)

### The hub model expects RGB NumPy arrays, so convert a copy for inference and keep the BGR original for drawing.
### Wrapping it in a list makes an explicit batch of one (a single image is accepted as well).
imgForModel = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB)]

### Print a small header so it is clear when results are being shown in the console.
print("Results : ")

### Run the YOLOv5 model on the image batch to perform object detection.
results = model(imgForModel)

### Print the raw results object for inspection and debugging.
print(results)

At this point, you already have a working YOLOv5 inference step.
The results object contains bounding boxes, confidence scores, and class indices for every detected object in the image.
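Each row of `results.xyxyn[0]` holds six numbers: the normalized box corners x1, y1, x2, y2, then the confidence, then the class index. That layout is why the next code block uses `[:, -1]` for labels and `[:, :-1]` for boxes plus scores. As a sketch, here is one way to give those columns names, using a hypothetical `Detection` type and a made-up row:

```python
from typing import NamedTuple, Sequence


class Detection(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int


def parse_row(row: Sequence[float]) -> Detection:
    """Unpack one xyxyn row: four box values, the confidence, then the class index."""
    x1, y1, x2, y2, confidence, class_id = (float(v) for v in row)
    return Detection(x1, y1, x2, y2, confidence, int(class_id))


det = parse_row([0.10, 0.20, 0.40, 0.90, 0.87, 16.0])
print(det.class_id, det.confidence)  # 16 0.87
```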

Extracting detections and drawing bounding boxes

In the final part, you extract labels and coordinates from the results object, convert normalized values to pixel positions, and visualize everything on the original image with OpenCV.
This is where the console output becomes an actual detection overlay you can see.

### Select the label indices for the first image in the batch from the results object.
### The '-1' index grabs the final column, which stores class IDs for each detection.
### .cpu().numpy() moves the tensor off the GPU (if used) and turns it into a NumPy array.
labels = results.xyxyn[0][:, -1].cpu().numpy()

### Print the raw label indices so you can see which classes were detected.
print(labels)

### Grab the mapping from YOLOv5 class indices to human-readable class names.
classes = model.names

### Print a small header to separate the class list output.
print("Classes : ")

### Print all available class names that this YOLOv5 model knows about.
print(classes)

### Map the first label index to its class name, only if at least one object was detected.
if len(labels) > 0:
    class0 = classes[int(labels[0])]
    print(class0)

### Map the eighth label index to its class name, only if at least eight objects were detected.
if len(labels) > 7:
    class7 = classes[int(labels[7])]
    print(class7)

### Extract all bounding box coordinates and confidence scores, excluding the class index column.
cords = results.xyxyn[0][:, :-1].cpu().numpy()

### Print a header so the coordinate output is easy to spot in the console.
print("Coordinates : ")

### Print the coordinates and scores for every detection.
print(cords)

### Compute how many objects were detected in total.
n = len(labels)

### Extract the width (x_shape) and height (y_shape) of the original image.
x_shape, y_shape = img.shape[1], img.shape[0]

### Loop over all detected objects one by one.
for i in range(n):
    ### Take the row corresponding to the i-th detection, including box and score.
    row = cords[i]

    ### Only keep detections whose confidence score (row[4]) is at least 0.2.
    if row[4] >= 0.2:
        ### Convert the normalized corners to absolute pixel positions.
        x1 = int(row[0] * x_shape)
        y1 = int(row[1] * y_shape)
        x2 = int(row[2] * x_shape)
        y2 = int(row[3] * y_shape)

        ### Define the bounding box color in BGR format (blue in this case).
        box_color = (255, 0, 0)

        ### Draw a rectangle on the image at the computed pixel coordinates.
        cv2.rectangle(img, (x1, y1), (x2, y2), box_color, 2)

        ### Look up the class name for the current detection and prepare it as a label.
        className = classes[int(labels[i])]

        ### Draw the class name text at the top-left corner of the bounding box.
        cv2.putText(img, className, (x1, y1), cv2.FONT_HERSHEY_COMPLEX, 1, box_color, 2)

### Display the final annotated image in a window titled "img".
cv2.imshow("img", img)

### Wait indefinitely (0) for any key press before closing the image window.
cv2.waitKey(0)

### Close all OpenCV windows and clean up GUI resources.
cv2.destroyAllWindows()

This loop walks through each detection, filters low-confidence predictions, and draws clear boxes and labels on the original image.
By the end, you have a working YOLOv5 object detection demo in Python showing exactly what the model sees.

Here is the result :

Object detection result

FAQ :

What is YOLOv5 object detection in Python?

YOLOv5 object detection in Python refers to using the PyTorch-based YOLOv5 model inside a Python script to detect objects, returning bounding boxes and class labels for each detection.

Do I need CUDA to run this YOLOv5 tutorial?

CUDA is not mandatory, but it significantly speeds up inference when you have a compatible NVIDIA GPU. Without CUDA, the same script will run on CPU with slower processing times.

Why do we wrap the image in a list before passing it to the model?

Wrapping a single image in a list makes it an explicit batch of size one and keeps the same interface when you later pass several images. The PyTorch Hub model also accepts a single image directly, so the list is a convenience rather than a requirement.
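If you want the rest of a script to always loop over a list, a tiny normalization step is enough. The helper below is a hypothetical sketch of that idea, not the library's own code:

```python
def as_batch(images):
    """Return images as a list: lists and tuples are copied, anything else is wrapped."""
    if isinstance(images, (list, tuple)):
        return list(images)
    return [images]


print(len(as_batch("frame")))           # 1
print(len(as_batch(["img1", "img2"])))  # 2
```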

What is the purpose of the confidence threshold in the loop?

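The confidence threshold removes low-confidence detections so that only predictions with a reasonable probability are drawn on the image, improving readability. Keep in mind that the PyTorch Hub model already discards boxes below its own NMS confidence threshold, model.conf, which defaults to 0.25. A manual check at 0.2 therefore filters nothing extra; raise it above 0.25 (or change model.conf) to see an effect.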
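A quick illustration with made-up scores, assuming the default 0.25 NMS cutoff has already removed every weaker box, shows when a manual threshold actually changes the output:

```python
def keep_confident(confidences, threshold):
    """Return only the confidences at or above the threshold."""
    return [c for c in confidences if c >= threshold]


# Hypothetical scores that survived NMS with the default cutoff of 0.25
after_nms = [0.91, 0.64, 0.47, 0.31, 0.26]

print(keep_confident(after_nms, 0.2))  # [0.91, 0.64, 0.47, 0.31, 0.26] (nothing removed)
print(keep_confident(after_nms, 0.5))  # [0.91, 0.64]
```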

Can I change the color and thickness of the bounding boxes?

Yes, you can edit the BGR color tuple and the thickness argument in the cv2.rectangle call to customize how the bounding boxes look on the image.

How can I save the output image with detections?

Instead of or in addition to cv2.imshow, you can call cv2.imwrite with a desired file path to save the annotated image with all drawn boxes and labels.
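A small pathlib sketch for choosing the output location; the `_detections` suffix and the helper name are arbitrary, hypothetical choices. You would pass `str(...)` of the returned path to `cv2.imwrite`, which reports many failures by returning `False` rather than raising, so checking its return value is worthwhile.

```python
from pathlib import Path


def annotated_path(image_path, suffix="_detections", out_dir=None):
    """Build an output path like 'haverim_detections.jpg' next to the input (or in out_dir)."""
    src = Path(image_path)
    folder = Path(out_dir) if out_dir is not None else src.parent
    return folder / f"{src.stem}{suffix}{src.suffix}"


print(annotated_path("haverim.jpg"))                    # haverim_detections.jpg
print(annotated_path("images/cat.png", out_dir="out"))  # out/cat_detections.png on POSIX systems
```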

What should I do if the script cannot find my image?

Check that the imgPath matches your actual file location and that you are running the script from the correct working directory, then print the resolved path for debugging if needed.
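Because `cv2.imread` returns `None` instead of raising when a file is missing, the error often surfaces later as a confusing `AttributeError`. A stdlib-only sketch of an up-front check that reports the fully resolved path (the helper name is hypothetical):

```python
from pathlib import Path


def resolve_image_path(img_path):
    """Return the absolute path to img_path, or raise with the location that was searched."""
    resolved = Path(img_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Image not found: {resolved} (current working directory: {Path.cwd()})"
        )
    return resolved
```

In the tutorial script you could then read the image with `cv2.imread(str(resolve_image_path(imgPath)))` and still keep an `img is None` check for files that exist but cannot be decoded.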

Is this YOLOv5 example suitable for production use?

The example is designed as a learning and prototyping script, and for production you would typically refactor it into reusable functions, add logging, and handle errors and performance tuning.
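As one possible first refactoring step, the data part of the drawing loop (filtering and coordinate conversion) can move into its own function with logging and basic input checks, leaving only the OpenCV drawing in the main script. The function name is hypothetical, and the rows are assumed to be six-value xyxyn rows (x1, y1, x2, y2, confidence, class index) already converted to plain Python numbers, for example with `.tolist()`.

```python
import logging

logger = logging.getLogger(__name__)


def rows_to_labeled_boxes(rows, names, width, height, min_conf=0.25):
    """Turn xyxyn rows into (class_name, confidence, (x1, y1, x2, y2)) tuples in pixels."""
    if width <= 0 or height <= 0:
        raise ValueError(f"Invalid image size: {width}x{height}")
    boxes = []
    for row in rows:
        if len(row) != 6:
            raise ValueError(f"Expected 6 values per row, got {len(row)}")
        x1, y1, x2, y2, conf, class_id = row
        if conf < min_conf:
            logger.debug("Skipping detection with confidence %.2f", conf)
            continue
        pixel_box = (int(x1 * width), int(y1 * height), int(x2 * width), int(y2 * height))
        boxes.append((names[int(class_id)], conf, pixel_box))
    logger.info("Kept %d of %d detections", len(boxes), len(rows))
    return boxes


rows = [[0.25, 0.5, 0.5, 0.75, 0.9, 0.0], [0.0, 0.0, 0.5, 0.5, 0.1, 16.0]]
print(rows_to_labeled_boxes(rows, {0: "person", 16: "dog"}, width=640, height=480))
# [('person', 0.9, (160, 240, 320, 360))]
```

Keeping this part free of OpenCV also makes it easy to unit-test without a GPU or a display.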

Can I use a different YOLOv5 model size instead of yolov5s?

Yes, you can swap ‘yolov5s’ with variants like ‘yolov5m’ or ‘yolov5l’ in the torch.hub.load call to trade off speed for accuracy, as long as your hardware can handle the larger models.

How do I extend this script to detect objects in video files?

To process video, open a VideoCapture object on your file, read frames in a loop, run YOLOv5 on each frame, draw detections, and display or save the processed frames as a new video.
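Structurally, the video version is the single-image pipeline inside a frame loop. The sketch below keeps OpenCV and YOLOv5 out of the picture so the loop is easy to see: `capture` is anything whose `read()` returns `(ok, frame)`, as `cv2.VideoCapture.read()` does, while `detect` and `annotate` are hypothetical callables standing in for the model call and the drawing code. `FakeCapture` exists only to make the example runnable.

```python
def iter_frames(capture):
    """Yield frames until read() reports failure (end of file or a read error)."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


def process_video(capture, detect, annotate):
    """Lazily yield one annotated frame per input frame."""
    for frame in iter_frames(capture):
        yield annotate(frame, detect(frame))


class FakeCapture:
    """Stand-in for cv2.VideoCapture that serves frames from a list."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if not self._frames:
            return False, None
        return True, self._frames.pop(0)


out = process_video(FakeCapture(["f0", "f1"]), detect=len, annotate=lambda f, d: f"{f}:{d}")
print(list(out))  # ['f0:2', 'f1:2']
```

With a real video you would write each yielded frame to a `cv2.VideoWriter` (or show it with `cv2.imshow`) and call `capture.release()` when done. Yielding frames one at a time avoids holding the whole video in memory.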

For even more tutorials on image classification, segmentation, and Python computer vision projects, visit my full computer vision blog.

Conclusion

By now you have a complete, working YOLOv5 object detection script in Python that runs end-to-end on a single image.
You set up a clean Conda environment, installed PyTorch with CUDA and OpenCV, and cloned the YOLOv5 repository so everything stays organized and reproducible.

On the Python side, you learned how to load an image, send it through a pretrained YOLOv5 model, inspect the raw results, and convert normalized coordinates into real pixel positions.
You also saw how to filter detections based on confidence, draw bounding boxes, and overlay readable class labels with only a few lines of OpenCV code.

This small tutorial is intentionally simple, but it mirrors the structure of much larger production pipelines.
Once you are comfortable with this code, you can extend it to webcam streams, videos, batches of images, or even custom-trained YOLO models tailored to your own dataset.
From here, the rest of your computer-vision journey is mostly about swapping inputs, changing models, and combining detection with tracking, segmentation, or analytics.

Connect :

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran
