Last Updated on 07/02/2026 by Eran Feit
Train YOLOv8 on Stanford Dogs to build a multi-class dog breed detection model that can find dogs in real photos and label them as one of 120 breeds.
This article walks through the full workflow from raw Stanford Dogs files to a trained YOLOv8 model you can run on new images.
Readers get value here because Stanford Dogs is not “plug and play” for YOLO.
The dataset ships with Pascal VOC XML annotations, breed folder naming quirks, and a structure that doesn’t match what YOLOv8 expects, so many attempts fail silently or produce bad labels.
The value comes from turning that messy starting point into a repeatable, verifiable pipeline.
You’ll see how to create a consistent breed-to-class index mapping, convert XML bounding boxes into YOLO normalized labels, and validate the conversion visually before you ever start training.
The article does this with a practical step-by-step build: set up the environment, generate YOLO train/valid folders, define the data.yaml for 120 classes, train a YOLOv8 nano model, and run inference with bounding boxes and breed names drawn on the image.

Getting comfortable with YOLOv8 dog detection
When people talk about yolov8 dog detection, they usually mean taking the general-purpose YOLOv8 architecture and adapting it to focus on dogs as the main object of interest.
Instead of training on generic datasets like COCO, you feed the model dog-centered images and labels, so it learns to specialize in finding dogs in all kinds of scenes.
This specialization often leads to better accuracy on dog-related tasks, because the model can dedicate more of its capacity to understanding canine shapes, poses, and contexts.
It also simplifies downstream logic, since you care primarily about the “dog” class (and possibly dog breeds), rather than dozens of unrelated categories.
A typical yolov8 dog detection workflow starts with data preparation.
You gather images that contain dogs, extract or convert bounding box annotations into YOLO format, and create train/validation folders.
For datasets like Stanford Dogs, this can involve parsing existing XML files, mapping breed names to numeric IDs, and saving the results as .txt label files next to the images.
Clean data at this stage is crucial, because every mislabeled bounding box or wrong class ID will confuse the model and show up later as strange predictions.
Once the dataset is ready, you define a data configuration file that tells YOLOv8 where the images live and what each class index represents.
For dog detection, that might be a single “dog” class, or a long list of breed names if you’re doing fine-grained detection.
During training, YOLOv8 uses this configuration to load images, read their bounding boxes, and augment them with random flips, scales, and crops.
These augmentations help the model generalize, so it can detect dogs in new backgrounds, camera angles, or lighting conditions it never saw during training.
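If you want to see or adjust those augmentations explicitly, Ultralytics exposes them as training hyperparameters. Here is a minimal sketch with illustrative values and a placeholder data.yaml path, not the tutorial's final training settings:

### Minimal sketch: passing explicit augmentation hyperparameters to YOLOv8 training.
from ultralytics import YOLO

### Load the Nano architecture (illustrative; the tutorial's real training script comes later).
model = YOLO("yolov8n.yaml")

### fliplr is the horizontal-flip probability, scale the random-resize range, degrees the rotation range.
model.train(
    data="data.yaml",  # placeholder path for this sketch
    epochs=10,
    imgsz=640,
    fliplr=0.5,
    scale=0.5,
    degrees=10.0,
)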
After training, yolov8 dog detection turns into a simple inference step: load the best checkpoint, pass an image to the model, and read back the predicted boxes and labels.
You can draw rectangles around each detected dog, print the predicted breed name, and filter predictions by a confidence threshold to remove weak detections.
Because YOLOv8 is designed for real-time performance, this process can run on videos, webcams, or image streams, not just static photos.
That’s where the project really comes alive: you see your model spotting dogs in real-world scenes, frame by frame, powered by everything you built in the earlier steps.
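To make that concrete before diving into the full pipeline, here is the inference step reduced to its core. The weight and image paths are placeholders; the fully commented version appears at the end of this tutorial.

### Minimal inference sketch; "best.pt" and "dog.jpg" are placeholder paths.
from ultralytics import YOLO

model = YOLO("best.pt")
results = model("dog.jpg")[0]
### Each row holds x1, y1, x2, y2, confidence score, and class ID.
for x1, y1, x2, y2, score, class_id in results.boxes.data.tolist():
    if score > 0.3:
        print(results.names[int(class_id)], round(score, 2))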
Walking through the YOLOv8 dog detection tutorial code
This tutorial is built around a complete, practical code pipeline for yolov8 dog detection on the Stanford Dogs dataset.
Instead of jumping straight to model training, the code walks you through everything that needs to happen behind the scenes: creating the environment, preparing the data, converting annotations, organizing folders, training the model, and finally running predictions on real images.
The goal is that by the end of the tutorial, you’ll not only have a working detector, but you’ll also understand why each code block exists and how to adapt it to your own projects.
The first part of the code focuses on setting up a clean and reproducible environment.
You create a dedicated Conda environment, install the correct versions of PyTorch with CUDA support, add Ultralytics YOLOv8, and make sure OpenCV is ready to handle image processing.
This may seem like a small step, but it ensures that everything that comes later—data loading, model training, and visualization—runs smoothly without version conflicts or missing libraries.
Next, the tutorial dives into data preparation, which is the heart of any custom yolov8 dog detection project.
The Stanford Dogs dataset provides Pascal VOC style XML annotations, while YOLOv8 expects labels in its own text-based YOLO format.
The code parses every XML file, reads the bounding boxes and class names, maps each dog breed to a numeric ID, and writes out YOLO-formatted label files.
At the same time, it copies images into train and validation folders with a clear 90/10 split, so the model can be trained and evaluated consistently.
Once the dataset is in the right shape, the code defines a data.yaml file that tells YOLOv8 where the images live and what each class index means.
With that in place, the training script loads the lightweight yolov8n model configuration, points it to the prepared dataset, and starts training for a defined number of epochs.
You control batch size, image size, patience, and logging options directly in the code, giving you a nice balance between simplicity and flexibility.
As training progresses, YOLOv8 automatically saves the best-performing weights, which will later be used for inference.
The final part of the tutorial code shows how to turn the trained model into a practical dog detector.
You load the best.pt weights, read a test image with OpenCV, run the YOLOv8 model on it, and loop over the predicted boxes.
For each detection above a chosen confidence threshold, the code draws a bounding box and writes the predicted breed name on top of the dog in the image.
This is where everything comes together: the environment setup, the XML-to-YOLO conversion, the data split, the training process, and the configuration file all support this simple, satisfying moment where your yolov8 dog detection model accurately finds and labels dogs in real images.
Link to the video tutorial : https://youtu.be/EpLEsL7clbg
Code for the tutorial here : https://eranfeit.lemonsqueezy.com/buy/156a6ac2-5795-413a-b6a9-3c5f91d83365 or here : https://ko-fi.com/s/3021f127ba
Send me an email for the dataset.
Link for Medium users : https://medium.com/@feitgemel/how-to-train-yolov8-dog-detection-on-stanford-dogs-147067cfbd95
You can follow my blog here : https://eranfeit.net/blog/
Want to get started with Computer Vision or take your skills to the next level ?
If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow
If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4
yolov8 dog detection is a great way to turn raw images of dogs into a working, real-time detector that can spot different breeds inside your photos.
In this project, you’ll train YOLOv8 on the Stanford Dogs dataset, convert Pascal VOC XML annotations into YOLO format, and build a complete pipeline from environment setup to final predictions on new images.
Along the way, you’ll see how each code block fits into the bigger picture: preparing data, defining classes, training the model, and drawing bounding boxes with breed names.
By the end of this tutorial, you’ll have a reusable yolov8 dog detection workflow you can adapt to other datasets and custom projects.
Setting up a clean environment for YOLOv8 dog detection
When you’re training a YOLOv8 detector with many classes, small environment problems can snowball into confusing failures.
A clean Conda environment keeps dependencies isolated, so the versions of Python, PyTorch, and Ultralytics stay consistent from setup to inference.
Checking CUDA early is just as important as installing the right packages.
It tells you whether your NVIDIA driver and toolkit are aligned, and whether PyTorch can actually see the GPU you’re planning to train on.
If you don’t have GPU support, you can still run the pipeline on CPU to validate the dataset conversion and label visualization.
But for a 120-class Stanford Dogs model, GPU acceleration is usually what makes training realistic instead of a long, frustrating wait.
### Create a new Conda environment dedicated to YOLOv8 dog detection with Python 3.8.
conda create --name YoloV8 python=3.8

### Activate the environment so that all upcoming packages are installed into this project.
conda activate YoloV8

### Check that the NVIDIA CUDA compiler is available and confirm your CUDA toolkit version.
nvcc --version

### Install PyTorch, Torchvision, and Torchaudio with CUDA 11.8 support from the official PyTorch and NVIDIA channels.
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia

### Install the Ultralytics package that provides the YOLOv8 implementation for dog detection.
pip install ultralytics==8.1.0

### Install the lapx package required by some OpenCV tracking utilities.
pip install "lapx>=0.5.2"

After running these commands you’re ready to focus on data handling, annotation conversion, and the rest of the yolov8 dog detection pipeline.
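Before moving on, it is worth confirming from Python that PyTorch actually sees your GPU. This short check is not part of the original command list, but it is safe to run inside the same environment:

### Quick sanity check that PyTorch was installed with working CUDA support.
import torch

### True means PyTorch can see at least one CUDA-capable GPU.
print("CUDA available:", torch.cuda.is_available())

### Print the detected GPU name when CUDA is available.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))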
Put Stanford Dogs on disk in a way YOLO can learn from
YOLO training is picky about folder structure because it needs to find images and labels consistently.
Your goal is to end up with four folders that match YOLO’s expectations: train/images, train/labels, valid/images, valid/labels.
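For reference, the target layout looks roughly like this (the dataset root name is up to you):

dataset/
  train/
    images/   <- training .jpg files
    labels/   <- matching .txt YOLO label files
  valid/
    images/   <- validation .jpg files
    labels/   <- matching .txt YOLO label files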
Before you convert anything, it helps to understand what you already have.
Stanford Dogs includes image folders per breed, and annotation folders per breed, but the annotations are XML files that must be translated into YOLO label text files.
Send me an email for the dataset.
Turn breed folders into a stable class-index mapping
Multi-class detection only works if you keep a consistent mapping between breed name and class ID.
This mapping becomes the backbone of everything: conversion, visualization, training, and predictions.
The easiest reliable approach is to scan the annotation folder names and build a dictionary where each cleaned breed name maps to an integer.
That way, even if the dataset is big, your training labels stay consistent across the entire pipeline.
### Import os so we can list directories and build file paths for the Stanford Dogs annotations.
import os

### Define the root directory where the original Stanford Dogs XML annotation folders live.
directory = "C:/Data-sets/Stanford Dogs Dataset/annotations/Annotation"

### Create an empty list that will store all breed folder names found inside the annotations directory.
folder_names = []

### Loop over every entry in the annotations directory.
for folder in os.listdir(directory):
    ### Check if the entry is a directory (each directory corresponds to a dog breed).
    if os.path.isdir(os.path.join(directory, folder)):
        ### Add the folder name to the list so we can later turn it into class indices.
        folder_names.append(folder)

### Create an empty dictionary that will map cleaned breed names to integer class IDs.
class_indices = {}

### Enumerate over all folder names so each breed gets a unique integer index.
for i, folder_name in enumerate(folder_names):
    ### Remove the numeric prefix before the dash to keep only the clean breed name.
    name_after_dash = "-".join(folder_name.split("-")[1:]).strip()
    ### Store the mapping from cleaned breed name to the current index.
    class_indices[name_after_dash] = i

### Print the full mapping so we can verify that all 120 dog breeds are included.
print("class_indices =", class_indices)

Convert one XML annotation first so mistakes show up early
A full dataset conversion is the worst place to discover a small bug.
If your bounding boxes are flipped, your normalization is wrong, or your class names don’t match the index dictionary, the conversion will still run—and you’ll only notice after training fails to learn anything.
Converting one example file lets you inspect the output label text and confirm that the numbers make sense.
This is the single highest-leverage debugging step in the whole workflow.
### Import the xml.etree.ElementTree module to parse Pascal VOC XML annotation files.
import xml.etree.ElementTree as ET

### Define a function that converts one XML file into a YOLO-formatted label file.
def convert_xml_to_yolo(xml_path, yolo_path, class_indices):
    ### Parse the XML file into an element tree.
    tree = ET.parse(xml_path)
    ### Get the root node so we can query image size and object elements.
    root = tree.getroot()
    ### Read the image width from the XML size block.
    image_width = int(root.find(".//size/width").text)
    ### Read the image height from the XML size block.
    image_height = int(root.find(".//size/height").text)
    ### Open the target YOLO label file for writing.
    with open(yolo_path, 'w') as yolo_file:
        ### Loop over every object (dog instance) described in the XML file.
        for obj in root.findall(".//object"):
            ### Read the class name (breed) from the object block.
            class_name = obj.find('name').text
            ### Look up the numeric class index for this breed.
            class_index = class_indices[class_name]
            ### Read the bounding box coordinates from the XML file.
            xmin = int(obj.find('bndbox/xmin').text)
            ymin = int(obj.find('bndbox/ymin').text)
            xmax = int(obj.find('bndbox/xmax').text)
            ymax = int(obj.find('bndbox/ymax').text)
            ### Compute the x-center in YOLO's relative coordinate system.
            x_center = (xmin + xmax) / (2.0 * image_width)
            ### Compute the y-center in YOLO's relative coordinate system.
            y_center = (ymin + ymax) / (2.0 * image_height)
            ### Compute the bounding box width as a fraction of image width.
            box_width = (xmax - xmin) / image_width
            ### Compute the bounding box height as a fraction of image height.
            box_height = (ymax - ymin) / image_height
            ### Format a YOLO label line as: class x_center y_center width height.
            yolo_line = f"{class_index} {x_center:.6f} {y_center:.6f} {box_width:.6f} {box_height:.6f}\n"
            ### Write the label line to the YOLO label file.
            yolo_file.write(yolo_line)

### Define the full path to one example XML annotation file (Stanford Dogs annotation files may ship without an extension; adjust the name if yours do).
xml_file_path = "C:/Data-sets/Stanford Dogs Dataset/annotations/Annotation/n02087394-Rhodesian_ridgeback/n02087394_2253.xml"

### Define the path where the YOLO label for this image will be saved.
yolo_output_path = "C:/Data-sets/Stanford Dogs Dataset/annotations/n02087394_2253.txt"

### Define the folder where all YOLO label files for training will eventually be stored.
labels_folder_path = "C:/Data-sets/Stanford Dogs Dataset/training/labels"

### Create the labels folder if it does not already exist.
if not os.path.exists(labels_folder_path):
    ### Make the directory tree so we can write label files into it.
    os.makedirs(labels_folder_path)

### Run the XML-to-YOLO conversion for this single example to verify the pipeline.
convert_xml_to_yolo(xml_file_path, yolo_output_path, class_indices)

Sanity-check the label by drawing the bounding box on the image
Even if the numbers “look fine,” the fastest truth test is visual.
If the rectangle lands on the dog, your conversion is real. If it lands somewhere random, you know exactly where to look.
This step also catches subtle issues, like swapped width/height, wrong normalization, or mismatched file names.
Once you trust this visualization, converting the full dataset becomes much less risky.
### Import the YOLO class from Ultralytics (even though we mainly use OpenCV here, this keeps the environment consistent).
from ultralytics import YOLO
### Import OpenCV to handle image loading, drawing, and display.
import cv2
### Import os for working with file paths if needed later.
import os
### Import yaml in case you want to read configuration files alongside this script.
import yaml

### Define the path to the example dog image you want to visualize.
imgPath = "C:/Data-sets/Stanford Dogs Dataset/images/images/n02087394-Rhodesian_ridgeback/n02087394_2253.jpg"

### Define the path to the YOLO label file corresponding to this image.
imgAnot = "C:/Data-sets/Stanford Dogs Dataset/annotations/n02087394_2253.txt"

### Define a dictionary mapping all 120 dog breed names to integer class indices (the names match the data.yaml used for training).
class_indices = {
    "Chihuahua": 0, "Japanese_spaniel": 1, "Maltese_dog": 2, "Pekinese": 3, "Shih-Tzu": 4,
    "Blenheim_spaniel": 5, "papillon": 6, "toy_terrier": 7, "Rhodesian_ridgeback": 8, "Afghan_hound": 9,
    "basset": 10, "beagle": 11, "bloodhound": 12, "bluetick": 13, "black-and-tan_coonhound": 14,
    "Walker_hound": 15, "English_foxhound": 16, "redbone": 17, "borzoi": 18, "Irish_wolfhound": 19,
    "Italian_greyhound": 20, "whippet": 21, "Ibizan_hound": 22, "Norwegian_elkhound": 23, "otterhound": 24,
    "Saluki": 25, "Scottish_deerhound": 26, "Weimaraner": 27, "Staffordshire_bullterrier": 28,
    "American_Staffordshire_terrier": 29, "Bedlington_terrier": 30, "Border_terrier": 31,
    "Kerry_blue_terrier": 32, "Irish_terrier": 33, "Norfolk_terrier": 34, "Norwich_terrier": 35,
    "Yorkshire_terrier": 36, "wire-haired_fox_terrier": 37, "Lakeland_terrier": 38, "Sealyham_terrier": 39,
    "Airedale": 40, "cairn": 41, "Australian_terrier": 42, "Dandie_Dinmont": 43, "Boston_bull": 44,
    "miniature_schnauzer": 45, "giant_schnauzer": 46, "standard_schnauzer": 47, "Scotch_terrier": 48,
    "Tibetan_terrier": 49, "silky_terrier": 50, "soft-coated_wheaten_terrier": 51,
    "West_Highland_white_terrier": 52, "Lhasa": 53, "flat-coated_retriever": 54, "curly-coated_retriever": 55,
    "golden_retriever": 56, "Labrador_retriever": 57, "Chesapeake_Bay_retriever": 58,
    "German_short-haired_pointer": 59, "vizsla": 60, "English_setter": 61, "Irish_setter": 62,
    "Gordon_setter": 63, "Brittany_spaniel": 64, "clumber": 65, "English_springer": 66,
    "Welsh_springer_spaniel": 67, "cocker_spaniel": 68, "Sussex_spaniel": 69, "Irish_water_spaniel": 70,
    "kuvasz": 71, "schipperke": 72, "groenendael": 73, "malinois": 74, "briard": 75, "kelpie": 76,
    "komondor": 77, "Old_English_sheepdog": 78, "Shetland_sheepdog": 79, "collie": 80, "Border_collie": 81,
    "Bouvier_des_Flandres": 82, "Rottweiler": 83, "German_shepherd": 84, "Doberman": 85,
    "miniature_pinscher": 86, "Greater_Swiss_Mountain_dog": 87, "Bernese_mountain_dog": 88,
    "Appenzeller": 89, "EntleBucher": 90, "boxer": 91, "bull_mastiff": 92, "Tibetan_mastiff": 93,
    "French_bulldog": 94, "Great_Dane": 95, "Saint_Bernard": 96, "Eskimo_dog": 97, "malamute": 98,
    "Siberian_husky": 99, "affenpinscher": 100, "basenji": 101, "pug": 102, "Leonberg": 103,
    "Newfoundland": 104, "Great_Pyrenees": 105, "Samoyed": 106, "Pomeranian": 107, "chow": 108,
    "keeshond": 109, "Brabancon_griffon": 110, "Pembroke": 111, "Cardigan": 112, "toy_poodle": 113,
    "miniature_poodle": 114, "standard_poodle": 115, "Mexican_hairless": 116, "dingo": 117,
    "dhole": 118, "African_hunting_dog": 119
}

### Create a reverse mapping from class index to readable breed name.
number_to_name = {value: key for key, value in class_indices.items()}

### Load the image from disk using OpenCV.
img = cv2.imread(imgPath)

### Read the image height and width so we can convert YOLO coordinates back to pixels.
H, W, _ = img.shape

### Open the YOLO label file and read all lines into memory.
with open(imgAnot, 'r') as file:
    ### Store every line from the label file in a list for later parsing.
    lines = file.readlines()

### Create an empty list that will hold parsed annotations in (label, x, y, w, h) format.
annotations = []

### Loop over each line in the YOLO label file.
for line in lines:
    ### Split the line into separate values (class index followed by normalized coordinates).
    values = line.split()
    ### The first value is the class label index as a string.
    label = values[0]
    ### Convert the remaining four values into float coordinates.
    x, y, w, h = map(float, values[1:])
    ### Append the parsed annotation to our list.
    annotations.append((label, x, y, w, h))

### Loop over each parsed annotation to draw boxes on the image.
for annotation in annotations:
    ### Unpack the label and YOLO-format coordinates.
    label, x, y, w, h = annotation
    ### Convert the numeric label back to a readable breed name.
    label_name = number_to_name[int(label)]
    ### Compute the top-left x coordinate in pixels.
    x1 = int((x - w / 2) * W)
    ### Compute the top-left y coordinate in pixels.
    y1 = int((y - h / 2) * H)
    ### Compute the bottom-right x coordinate in pixels.
    x2 = int((x + w / 2) * W)
    ### Compute the bottom-right y coordinate in pixels.
    y2 = int((y + h / 2) * H)
    ### Draw the bounding box rectangle around the detected dog.
    cv2.rectangle(img, (x1, y1), (x2, y2), (200, 200, 0), 1)
    ### Put the breed name text slightly above the top-left corner of the box.
    cv2.putText(img, label_name, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (200, 200, 0), 2, cv2.LINE_AA)

### Display the annotated image in a window.
cv2.imshow("img", img)
### Wait for a key press so the window does not close immediately.
cv2.waitKey()
### Close all OpenCV windows when you are done.
cv2.destroyAllWindows()

Convert the full dataset and build YOLO train/valid folders in one pass
Once the single-file conversion and visualization look correct, you’re ready to scale.
This script copies images into train/valid folders and writes labels beside them, using a simple 90/10 split rule.
This step is where the project becomes “real training data.”
After it finishes, you’ll have a dataset structure that YOLO can train on directly without special handling.
### Import os for working with file paths and directory listing.
import os
### Import shutil to copy image files into train and validation folders.
import shutil

### Define the root directory where the Pascal VOC XML annotation folders are stored.
directory = "C:/Data-sets/Stanford Dogs Dataset/annotations/Annotation"

### Create an empty list that will hold all annotation folder names.
folder_names = []

### Loop over each item in the annotations directory.
for folder in os.listdir(directory):
    ### If the item is a directory, treat it as a breed folder and store its name.
    if os.path.isdir(os.path.join(directory, folder)):
        folder_names.append(folder)

### Create a dictionary that will map cleaned breed names to numeric class indices.
class_indices = {}

### Enumerate over all folder names to build the class index mapping.
for i, folder_name in enumerate(folder_names):
    ### Strip off the numeric prefix before the dash to keep only the breed name.
    name_after_dash = "-".join(folder_name.split("-")[1:]).strip()
    ### Store the mapping in the dictionary.
    class_indices[name_after_dash] = i

### Print the mapping so you can double-check the class indices if needed.
print("class_indices =", class_indices)

### Import the XML parsing module so we can read Pascal VOC annotations.
import xml.etree.ElementTree as ET

### Define the function that converts a single XML annotation into YOLO label format.
def convert_xml_to_yolo(xml_path, yolo_path, class_indices):
    ### Parse the XML annotation file into an element tree.
    tree = ET.parse(xml_path)
    ### Get the root element to access image size and object tags.
    root = tree.getroot()
    ### Read the width of the image from the XML file.
    image_width = int(root.find(".//size/width").text)
    ### Read the height of the image from the XML file.
    image_height = int(root.find(".//size/height").text)
    ### Open the target YOLO text file for writing label lines.
    with open(yolo_path, 'w') as yolo_file:
        ### Loop over each object instance described in the XML.
        for obj in root.findall(".//object"):
            ### Extract the class name (breed) of the current object.
            class_name = obj.find('name').text
            ### Look up the corresponding numeric class index.
            class_index = class_indices[class_name]
            ### Read the bounding box coordinates from the XML tags.
            xmin = int(obj.find('bndbox/xmin').text)
            ymin = int(obj.find('bndbox/ymin').text)
            xmax = int(obj.find('bndbox/xmax').text)
            ymax = int(obj.find('bndbox/ymax').text)
            ### Compute the YOLO x-center value relative to the image width.
            x_center = (xmin + xmax) / (2.0 * image_width)
            ### Compute the YOLO y-center value relative to the image height.
            y_center = (ymin + ymax) / (2.0 * image_height)
            ### Compute the relative width of the bounding box.
            box_width = (xmax - xmin) / image_width
            ### Compute the relative height of the bounding box.
            box_height = (ymax - ymin) / image_height
            ### Format the label line with class index and normalized coordinates.
            yolo_line = f"{class_index} {x_center:.6f} {y_center:.6f} {box_width:.6f} {box_height:.6f}\n"
            ### Write the label line to the YOLO text file.
            yolo_file.write(yolo_line)

### Define the output folder for training images.
output_train_images_folder = "C:/Data-sets/Stanford Dogs Dataset/dataset/train/images"
### Create the training images directory if it does not already exist.
if not os.path.exists(output_train_images_folder):
    os.makedirs(output_train_images_folder)

### Define the output folder for validation images.
output_valid_images_folder = "C:/Data-sets/Stanford Dogs Dataset/dataset/valid/images"
### Create the validation images directory if it does not already exist.
if not os.path.exists(output_valid_images_folder):
    os.makedirs(output_valid_images_folder)

### Define the output folder for training label text files.
output_train_labels_folder = "C:/Data-sets/Stanford Dogs Dataset/dataset/train/labels"
### Create the training labels directory if it does not already exist.
if not os.path.exists(output_train_labels_folder):
    os.makedirs(output_train_labels_folder)

### Define the output folder for validation label text files.
output_valid_labels_folder = "C:/Data-sets/Stanford Dogs Dataset/dataset/valid/labels"
### Create the validation labels directory if it does not already exist.
if not os.path.exists(output_valid_labels_folder):
    os.makedirs(output_valid_labels_folder)

### Define the root folder containing all original image subfolders for each breed.
source_images_folder = "C:/Data-sets/Stanford Dogs Dataset/images/Images"

### Initialize a counter that will help us create a 90/10 train/valid split.
split_numerator = 0

### List all breed image folders inside the source images directory.
images_folder_names = os.listdir(source_images_folder)

### Print the folder names so you can verify the dataset structure.
print(images_folder_names)

### Loop over each breed folder in the images directory.
for folder in images_folder_names:
    ### Get the list of image file names inside the current breed folder.
    list_of_images = os.listdir(os.path.join(source_images_folder, folder))
    ### Loop over every image file in this breed folder.
    for image in list_of_images:
        ### Compute the residue of the image counter to decide the dataset split.
        residue = split_numerator % 10
        ### Build the full path to the original image file.
        image_full_path = os.path.join(source_images_folder, folder, image)
        ### If residue is zero, send this image to the validation set.
        if residue == 0:
            image_full_path_destination = os.path.join(output_valid_images_folder, image)
        ### Otherwise, send it to the training set.
        else:
            image_full_path_destination = os.path.join(output_train_images_folder, image)
        ### Copy the image file to its destination (train or valid).
        shutil.copyfile(image_full_path, image_full_path_destination)
        ### Strip the file extension from the image file name to get the base name.
        file_name_without_extension = os.path.splitext(image)[0]
        ### Build the full path to the corresponding XML annotation file.
        full_file_path = os.path.join(directory, folder, file_name_without_extension + ".xml")
        ### Decide where the YOLO label file should be stored based on the split.
        if residue == 0:
            yolo_file_path = os.path.join(output_valid_labels_folder, file_name_without_extension + ".txt")
        else:
            yolo_file_path = os.path.join(output_train_labels_folder, file_name_without_extension + ".txt")
        ### Convert the XML annotation for this image into YOLO label format.
        convert_xml_to_yolo(full_file_path, yolo_file_path, class_indices)
        ### Increment the split counter so the next image is routed correctly.
        split_numerator = split_numerator + 1
        ### Print progress so you can track how many files have been processed.
        print("File no. " + str(split_numerator))

Train YOLOv8 Nano on 120 dog breeds and keep results organized
Training on 120 classes can feel intimidating, but the mechanics are the same as training on 1 class.
The difference is that your labels, class mappings, and dataset configuration must be clean—otherwise the model learns noise.
A simple way to stay organized is to set a project folder and experiment name that match what you’re doing.
That keeps your runs separated, makes it easier to compare results, and helps you find best.pt later without digging.
### Import the YOLO class from Ultralytics so we can create and train a YOLOv8 model.
from ultralytics import YOLO

### Define the main entry point of the training script.
def main():
    ### Load the YOLOv8 Nano model architecture from its YAML configuration file.
    model = YOLO("yolov8n.yaml")
    ### Set the path to the data configuration file that points to train and validation folders.
    config_file_path = "Best-Object-Detection-models/Yolo-V8/Stanford Dogs-Convert-Json-2-Yolo/data.yaml"
    ### Define the project directory where YOLOv8 will store experiment results.
    project = "C:/Data-sets/Stanford Dogs Dataset/dataset"
    ### Give a friendly name to this experiment so results are stored in a clear subfolder.
    experiment_name = "Nano-Model"
    ### Choose a batch size that fits into your GPU memory.
    batch_size = 16
    ### Start training the YOLOv8 Nano model on the Stanford Dogs dataset.
    results = model.train(
        data=config_file_path,
        epochs=100,
        project=project,
        name=experiment_name,
        batch=batch_size,
        device=0,
        patience=10,
        imgsz=640,
        verbose=True,
        val=True
    )

### Run the main function only when this script is executed directly.
if __name__ == "__main__":
    main()

Create the data.yaml that defines the whole dataset
YOLOv8 uses a YAML file to connect the dataset paths with the class names.
If this file is wrong, training might still start, but your labels won’t map correctly, and results will look broken.
The most important parts are the train and val image paths and the names mapping.
Once this is set, your pipeline becomes repeatable: you can train other model sizes or rerun experiments without touching the dataset again.
train: C:/Data-sets/Stanford Dogs Dataset/dataset/train/images
val: C:/Data-sets/Stanford Dogs Dataset/dataset/valid/images

names:
  0: 'Chihuahua'
  1: 'Japanese_spaniel'
  2: 'Maltese_dog'
  3: 'Pekinese'
  4: 'Shih-Tzu'
  5: 'Blenheim_spaniel'
  6: 'papillon'
  7: 'toy_terrier'
  8: 'Rhodesian_ridgeback'
  9: 'Afghan_hound'
  10: 'basset'
  11: 'beagle'
  12: 'bloodhound'
  13: 'bluetick'
  14: 'black-and-tan_coonhound'
  15: 'Walker_hound'
  16: 'English_foxhound'
  17: 'redbone'
  18: 'borzoi'
  19: 'Irish_wolfhound'
  20: 'Italian_greyhound'
  21: 'whippet'
  22: 'Ibizan_hound'
  23: 'Norwegian_elkhound'
  24: 'otterhound'
  25: 'Saluki'
  26: 'Scottish_deerhound'
  27: 'Weimaraner'
  28: 'Staffordshire_bullterrier'
  29: 'American_Staffordshire_terrier'
  30: 'Bedlington_terrier'
  31: 'Border_terrier'
  32: 'Kerry_blue_terrier'
  33: 'Irish_terrier'
  34: 'Norfolk_terrier'
  35: 'Norwich_terrier'
  36: 'Yorkshire_terrier'
  37: 'wire-haired_fox_terrier'
  38: 'Lakeland_terrier'
  39: 'Sealyham_terrier'
  40: 'Airedale'
  41: 'cairn'
  42: 'Australian_terrier'
  43: 'Dandie_Dinmont'
  44: 'Boston_bull'
  45: 'miniature_schnauzer'
  46: 'giant_schnauzer'
  47: 'standard_schnauzer'
  48: 'Scotch_terrier'
  49: 'Tibetan_terrier'
  50: 'silky_terrier'
  51: 'soft-coated_wheaten_terrier'
  52: 'West_Highland_white_terrier'
  53: 'Lhasa'
  54: 'flat-coated_retriever'
  55: 'curly-coated_retriever'
  56: 'golden_retriever'
  57: 'Labrador_retriever'
  58: 'Chesapeake_Bay_retriever'
  59: 'German_short-haired_pointer'
  60: 'vizsla'
  61: 'English_setter'
  62: 'Irish_setter'
  63: 'Gordon_setter'
  64: 'Brittany_spaniel'
  65: 'clumber'
  66: 'English_springer'
  67: 'Welsh_springer_spaniel'
  68: 'cocker_spaniel'
  69: 'Sussex_spaniel'
  70: 'Irish_water_spaniel'
  71: 'kuvasz'
  72: 'schipperke'
  73: 'groenendael'
  74: 'malinois'
  75: 'briard'
  76: 'kelpie'
  77: 'komondor'
  78: 'Old_English_sheepdog'
  79: 'Shetland_sheepdog'
  80: 'collie'
  81: 'Border_collie'
  82: 'Bouvier_des_Flandres'
  83: 'Rottweiler'
  84: 'German_shepherd'
  85: 'Doberman'
  86: 'miniature_pinscher'
  87: 'Greater_Swiss_Mountain_dog'
  88: 'Bernese_mountain_dog'
  89: 'Appenzeller'
  90: 'EntleBucher'
  91: 'boxer'
  92: 'bull_mastiff'
  93: 'Tibetan_mastiff'
  94: 'French_bulldog'
  95: 'Great_Dane'
  96: 'Saint_Bernard'
  97: 'Eskimo_dog'
  98: 'malamute'
  99: 'Siberian_husky'
  100: 'affenpinscher'
  101: 'basenji'
  102: 'pug'
  103: 'Leonberg'
  104: 'Newfoundland'
  105: 'Great_Pyrenees'
  106: 'Samoyed'
  107: 'Pomeranian'
  108: 'chow'
  109: 'keeshond'
  110: 'Brabancon_griffon'
  111: 'Pembroke'
  112: 'Cardigan'
  113: 'toy_poodle'
  114: 'miniature_poodle'
  115: 'standard_poodle'
  116: 'Mexican_hairless'
  117: 'dingo'
  118: 'dhole'
  119: 'African_hunting_dog'
Run predictions and visualize your trained YOLOv8 dog detector
This is the payoff section: you load best.pt, run inference on a test image, and draw predicted boxes with breed labels.
Even if the model isn’t perfect yet, this step turns “training output” into something you can evaluate and improve.
A strong habit here is to test on images that are not from the dataset, or at least not from the exact training folders.
That gives you a more honest sense of generalization and helps you spot overfitting early.
Here is the test image :

### Import the YOLO class to load the trained YOLOv8 model.
from ultralytics import YOLO
### Import OpenCV so we can read the test image and draw bounding boxes.
import cv2
### Import os to help build filesystem paths in a portable way.
import os

### Define the path to the test image you want to run predictions on.
imgPath = "Best-Object-Detection-models/Yolo-V8/Stanford Dogs-Convert-Json-2-Yolo/doberman.jpg"
# imgPath = "Best-Object-Detection-models/Yolo-V8/Stanford Dogs-Convert-Json-2-Yolo/Dori.jpg"

### Read the image from disk using OpenCV.
img = cv2.imread(imgPath)

### Get the image height and width to help with any later processing.
H, W, _ = img.shape

### Build the full path to the trained YOLOv8 model weights.
model_path = os.path.join("C:/Data-sets/Stanford Dogs Dataset/dataset", "Nano-Model", "weights", "best.pt")

### Load the trained YOLOv8 model from disk.
model = YOLO(model_path)

### Set a confidence threshold to filter out weak detections.
threshold = 0.3

### Run the model on the image and take the first result.
results = model(img)[0]

### Loop over every detected bounding box returned by YOLOv8.
for result in results.boxes.data.tolist():
    ### Unpack the box coordinates, confidence score, and class ID.
    x1, y1, x2, y2, score, class_id = result
    ### Only draw detections that exceed the chosen confidence threshold.
    if score > threshold:
        ### Draw the bounding box rectangle around the detected dog.
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 0), 3)
        ### Put the predicted breed name above the bounding box.
        cv2.putText(img, results.names[int(class_id)].upper(), (int(x1), int(y1 - 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)

### Show the final image with YOLOv8 dog detection results.
cv2.imshow("img", img)
### Wait for a key press so you can inspect the detections.
cv2.waitKey(0)
### Close all OpenCV windows when you are done.
cv2.destroyAllWindows()

The result :

You now have a complete yolov8 dog detection system: from raw Stanford Dogs annotations through training and all the way to labeled predictions on new images.
Common pitfalls that quietly ruin multi-class detection training
Label paths don’t match image paths.
If your data.yaml points to images but your labels aren’t in the matching labels/ folder beside them, YOLO will train on empty targets.
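A quick way to catch this early is to compare image and label counts per split. This short check assumes the dataset folders created earlier in this tutorial:

### Compare image and label counts so empty-target training is caught before it starts.
import os

### This root matches the dataset folders created earlier in this tutorial.
root = "C:/Data-sets/Stanford Dogs Dataset/dataset"
for split in ("train", "valid"):
    images = os.listdir(os.path.join(root, split, "images"))
    labels = os.listdir(os.path.join(root, split, "labels"))
    print(split, "images:", len(images), "labels:", len(labels))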
Your XML breed names don’t match your mapping keys.
If the XML contains Shih-Tzu but your class_indices expects Shih_Tzu (or vice versa), you’ll get either crashes or mislabels.
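One way to surface these mismatches before converting anything is to scan every XML file and report breed names that are missing from your mapping. A minimal sketch, assuming the annotation path used throughout this tutorial:

### Rebuild the breed-name mapping exactly as in the conversion script, then audit every XML.
import os
import xml.etree.ElementTree as ET

directory = "C:/Data-sets/Stanford Dogs Dataset/annotations/Annotation"
folders = [f for f in os.listdir(directory) if os.path.isdir(os.path.join(directory, f))]
class_indices = {"-".join(f.split("-")[1:]).strip(): i for i, f in enumerate(folders)}

### Collect every <name> value that has no entry in the mapping.
missing = set()
for folder in folders:
    folder_path = os.path.join(directory, folder)
    for xml_file in os.listdir(folder_path):
        root = ET.parse(os.path.join(folder_path, xml_file)).getroot()
        for obj in root.findall(".//object"):
            name = obj.find("name").text
            if name not in class_indices:
                missing.add(name)
print("names missing from class_indices:", missing or "none")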
A bad conversion can still “work.”
Wrong normalization values, swapped width/height, and coordinate mistakes won’t always throw an error.
They just create labels that look valid but teach the model nonsense.
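A cheap guard is to verify that every value in every label file is properly normalized. A minimal sketch over the training labels folder created earlier:

### Flag any YOLO label whose coordinates fall outside the valid [0, 1] range.
import os

labels_dir = "C:/Data-sets/Stanford Dogs Dataset/dataset/train/labels"
for fname in os.listdir(labels_dir):
    with open(os.path.join(labels_dir, fname)) as f:
        for line in f:
            ### Skip the class index; x-center, y-center, width, and height must all be in [0, 1].
            coords = list(map(float, line.split()[1:]))
            if any(v < 0.0 or v > 1.0 for v in coords):
                print("suspicious label:", fname, line.strip())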
Validation split is too small or unbalanced.
A simple split rule is fine for learning, but if you want more stable metrics later, consider making the split more deliberate.
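If you want a more deliberate split, one option is to apply the 90/10 rule per breed folder instead of globally, so every breed is guaranteed validation images. A minimal sketch of the routing logic (file copying is left out for brevity):

### Route every 10th image of each breed to validation, restarting the counter per breed.
import os

source_images_folder = "C:/Data-sets/Stanford Dogs Dataset/images/Images"
for folder in os.listdir(source_images_folder):
    images = sorted(os.listdir(os.path.join(source_images_folder, folder)))
    for i, image in enumerate(images):
        ### A per-breed counter guarantees each breed contributes validation images.
        split = "valid" if i % 10 == 0 else "train"
        print(folder, image, "->", split)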
FAQ: YOLOv8 dog detection on Stanford Dogs
What does this YOLOv8 dog detection code actually do?
The code builds a full pipeline that converts Stanford Dogs annotations to YOLO format, trains a YOLOv8 model on 120 breeds, and runs predictions on new images.
Which dataset is used for training the model?
The tutorial uses the Stanford Dogs dataset, which contains images and bounding-box annotations for 120 different dog breeds.
Why do we split the data into train and validation sets?
Splitting into train and validation sets helps you monitor performance on unseen images and avoid overfitting during training.
Do I need to change the data.yaml file for my own paths?
Yes, you should update the train and val paths in data.yaml so they point to your actual dataset folders on disk.
Can I switch from YOLOv8 Nano to a larger model?
You can switch to models like YOLOv8s or YOLOv8m by changing the YAML file name, as long as your GPU has enough memory.
What does the confidence threshold control in prediction?
The threshold filters out low-confidence detections so only predictions with a score above the chosen value are drawn on the image.
Is it possible to train on fewer dog breeds?
Yes, you can subset the dataset and adjust the class mappings so YOLOv8 only learns the specific breeds you care about.
How do I debug incorrect bounding boxes after conversion?
Check that image width and height are read correctly, verify the XML coordinates, and use the visualization script to see where boxes are drawn.
Can I reuse this code for video input instead of images?
Yes, you can wrap the prediction loop inside a video frame reader and run YOLOv8 on each frame in real time.
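A minimal sketch of that idea, assuming the best.pt path produced by this tutorial and OpenCV's standard VideoCapture API:

### Run the trained detector on a video stream, frame by frame.
from ultralytics import YOLO
import cv2

model = YOLO("C:/Data-sets/Stanford Dogs Dataset/dataset/Nano-Model/weights/best.pt")
cap = cv2.VideoCapture(0)  # 0 = default webcam; pass a file path to read a video instead
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)[0]
    for x1, y1, x2, y2, score, class_id in results.boxes.data.tolist():
        if score > 0.3:
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 0), 2)
            cv2.putText(frame, results.names[int(class_id)], (int(x1), int(y1) - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
    cv2.imshow("dogs", frame)
    ### Press q to stop the stream.
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()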
What are the next steps after this tutorial?
You can experiment with larger models, add data augmentation, or adapt the same pipeline to other custom object detection tasks.
Conclusion
In this tutorial you walked through a complete, hands-on yolov8 dog detection project using the Stanford Dogs dataset.
Starting from a clean Conda environment, you prepared the data by converting Pascal VOC XML annotations into YOLO format, organized the images into train and validation splits, and defined a data configuration that exposes all 120 dog breeds to YOLOv8.
Each code block is designed to be copy-paste friendly so you can reproduce the full pipeline on your own machine with only minor path adjustments.
Once the dataset was ready, you trained the YOLOv8 Nano model, monitored its progress, and saved the best-performing weights.
The final prediction script closed the loop by loading those weights, running inference on a test dog image, and drawing bounding boxes with breed labels.
This workflow turns the abstract idea of “dog detection with deep learning” into something concrete and visual: you can literally see the model’s understanding of each dog breed on screen.
From here, you can extend the project in many directions.
You might switch to a larger YOLOv8 variant for higher accuracy, try more aggressive data augmentations, or adapt the same code structure to different animal or object datasets.
No matter which path you choose, you now have a solid, reusable template for training YOLOv8 on custom detection tasks and scaling your computer vision projects to real-world scenarios.
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
