
How to use YOLOv8 for object detection on YouTube videos


Last Updated on 30/11/2025 by Eran Feit

Object detection on YouTube videos is a practical way to take deep learning models out of static image demos and put them into real-world, moving scenarios.
Instead of working with a fixed dataset of images, you connect your model to dynamic content such as sports games, tutorials, or traffic cameras that are streamed or hosted on YouTube.
This lets you detect objects frame by frame as the video plays, turning ordinary clips into rich, structured data that you can analyze, visualize, or use to build smart applications.

When you focus on object detection on YouTube videos, you’re essentially building a full pipeline.
You start by grabbing the video stream, breaking it into frames, and feeding those frames into a deep learning model like YOLOv8.
The model predicts bounding boxes and class labels for each frame, allowing you to track how objects move, interact, and change over time.
This temporal perspective is especially powerful for use cases like sports analytics, where you want to follow players, referees, or a ball throughout an entire match.
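
To make that pipeline concrete, here is a minimal sketch of the frame-by-frame loop, assuming the Ultralytics and vidgear packages used in this tutorial are installed. The pretrained yolov8n.pt weights and the URL are only placeholders; the full basketball-specific code appears later in the post.

### A minimal sketch of the frame-by-frame idea; yolov8n.pt and the URL are placeholders.
from ultralytics import YOLO
from vidgear.gears import CamGear

### Load a pretrained YOLOv8 model as a stand-in for any detector.
model = YOLO("yolov8n.pt")

### Open a YouTube video in stream mode so frames can be read one by one.
stream = CamGear(source="https://youtu.be/your-video-id", stream_mode=True, logging=True).start()

while True:
    ### Grab the next frame; None means the stream has ended.
    frame = stream.read()
    if frame is None:
        break

    ### Run the detector on the frame and print each class name with its confidence score.
    results = model(frame)[0]
    for x1, y1, x2, y2, score, class_id in results.boxes.data.tolist():
        print(results.names[int(class_id)], round(score, 2))

### Release the stream when the video is finished.
stream.stop()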

Another advantage of object detection on YouTube videos is that YouTube itself becomes a huge source of training and testing material.
You can download or stream public videos, extract frames, label them automatically or manually, and then train a custom model that understands your specific scenario.
For example, you can specialize in basketball, cars on highways, or people in retail stores, without having to build a dataset from scratch.
This approach dramatically reduces the barrier to entry for custom object detection projects.

Finally, object detection on YouTube videos is ideal for showcasing results.
Once your model is trained, you can run it on new YouTube clips and overlay the predictions directly on the video.
The visual effect is clear and intuitive: bounding boxes and labels appear in real time, helping viewers and stakeholders immediately see what the model is doing.
This makes it an excellent topic for tutorials, demos, and portfolio projects that highlight both your deep learning skills and your ability to solve real-world problems.


object detection on YouTube videos

Bringing object detection on YouTube videos into a real project

When you build a real project around object detection on YouTube videos, the first step is usually to connect to the video source.
This can be done by streaming directly from YouTube or by downloading the video file and reading it locally.
Tools like OpenCV, CamGear, or similar libraries help you access the video stream frame by frame, giving you full control over how often you sample frames and how you preprocess them before inference.
At this stage, you decide the resolution, frame rate, and any basic transformations you want to apply.

Once you can reliably read frames from the video, the next part of the pipeline is the detection model itself.
Modern models like YOLOv8 are designed to be fast and accurate, which makes them well-suited for working with video data.
Each frame is passed through the model, which returns coordinates of bounding boxes, confidence scores, and class IDs.
You then draw these boxes and labels back onto the frame, so you can watch the video with detections visualized in real time or save it as a processed output video.
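
If you prefer to keep the processed result instead of only watching it, a small variation writes the annotated frames to an output file. The sketch below assumes a loaded model and an open CamGear stream like the ones shown later in this post; the output filename and the fixed 25 fps are only example values.

### A sketch of saving detections to an output video; `model` and `stream` are assumed to exist.
import cv2

writer = None

while True:
    frame = stream.read()
    if frame is None:
        break

    ### Draw every detection on the frame before writing it out.
    results = model(frame)[0]
    for x1, y1, x2, y2, score, class_id in results.boxes.data.tolist():
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
        cv2.putText(frame, results.names[int(class_id)], (int(x1), int(y1) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    ### Create the writer lazily so it matches the frame size of the stream.
    if writer is None:
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter("detections.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25, (w, h))

    writer.write(frame)

### Release the writer so the output file is finalized.
if writer is not None:
    writer.release()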

The real power of object detection on YouTube videos appears when you go beyond raw predictions and think about targets and goals.
For example, if your goal is sports analytics, you may want to distinguish between teams, track individual players, and identify key events like shots, passes, or fouls.
If your goal is crowd analysis, you might focus on counting people, measuring congestion, or detecting unusual behavior.
By setting a clear target, you can tailor the classes, thresholds, and post-processing logic to deliver insights rather than just bounding boxes.
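
As a tiny illustration of that kind of post-processing, the sketch below tallies detections per class for a single frame. It assumes `results` is one Ultralytics YOLOv8 result object, as produced in the inference code later in this tutorial, and the 0.5 threshold is only an example.

### A small post-processing sketch: count detections per class above a confidence threshold.
from collections import Counter

def count_classes(results, min_score=0.5):
    ### Tally class names for every box whose score clears the threshold.
    counts = Counter()
    for x1, y1, x2, y2, score, class_id in results.boxes.data.tolist():
        if score >= min_score:
            counts[results.names[int(class_id)]] += 1
    return counts

### Example usage once `results` holds the predictions for the current frame:
# print(count_classes(results, min_score=0.5))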

On a higher level, this type of project also teaches important concepts in computer vision engineering.
You learn how to handle video I/O, manage GPU resources, work with batch sizes and inference speed, and design a pipeline that is robust over long clips.
You also get experience with dataset creation, whether you rely on manual labeling or automatic annotation tools to prepare training data from YouTube frames.
Altogether, object detection on YouTube videos becomes a complete, end-to-end example of how deep learning, data engineering, and practical software skills come together in one project.


YouTube object detection

Object detection on YouTube videos becomes much more approachable when you can see every step laid out clearly in code.
Instead of treating the model as a mysterious black box, this tutorial walks through a complete Python pipeline, from preparing your environment all the way to running detections on a fresh basketball game.
You’ll see how each part of the script contributes to the bigger picture: downloading video frames, auto-labeling them, training YOLOv8, and finally watching the model follow players, referees, and the ball in real time.

The goal of this tutorial is not just to show that object detection on YouTube videos is possible, but to make the entire workflow repeatable on your own machine.
You’ll work with real YouTube links, create a local dataset, and use automatic annotation so you don’t need to hand-label hundreds of images.
By the end, the code gives you a complete project that you can adapt to other sports, other classes, or even completely different types of videos.

Throughout the walkthrough we will keep the focus on clarity.
Every major block of code is there for a reason: setting up CUDA and PyTorch, capturing frames with CamGear, preparing YOLO-style labels, or configuring the YOLOv8 training loop.
With this structure, you can understand the logic, tweak parameters, and debug issues without feeling lost in a sea of imports and functions.

Most importantly, this is a real-world example.
The scripts are designed around an actual use case: detecting Maccabi players, Real Madrid players, the basketball, and the referee in a live game pulled from YouTube.
That keeps the tutorial grounded and makes it easy to see how the same approach could be applied to your own projects and channels.


Let’s walk through the code we’ll use in this tutorial

The code you’ll use in this tutorial is organized into clear steps, each one moving you closer to a fully working object detection system for YouTube basketball videos.
It starts with environment setup, where you create a dedicated Conda environment, install the correct PyTorch and CUDA versions, and bring in libraries like Ultralytics YOLOv8, CamGear, and yt_dlp.
This foundation is critical: once CUDA, PyTorch, and your dependencies are aligned, the rest of the pipeline runs smoothly and you can focus on the logic instead of fighting with installs.

Next comes the video-to-images stage.
Using CamGear and OpenCV, the code streams each YouTube game and reads it frame by frame.
For every frame, it resizes the image to 640×640, saves it to disk, and overlays a simple frame counter so you can visually confirm progress while the script runs.
This step turns your YouTube links into a real training corpus of basketball images that YOLOv8 can learn from later.

The third part of the code is all about automatic labeling.
Here you define a CaptionOntology with classes such as “Maccabi player”, “Real Madrid player”, “ball”, and “referee”, then plug that ontology into Grounding DINO through Autodistill.
The script scans all the extracted PNG images, detects objects that match your text prompts, and writes YOLO-format label files into a structured dataset folder.
The result is a ready-to-train dataset created from raw YouTube footage without any manual drawing of bounding boxes.

After the labels are generated, the tutorial includes a verification phase to make sure everything looks right.
The code randomly selects images from the training folder, reads the corresponding label files, converts normalized YOLO coordinates into pixel boxes, and draws them back onto the images with class names.
You can visually inspect several samples in OpenCV windows to confirm that players, referees, and the ball are being detected and labeled correctly before committing time and GPU resources to training.

Once the dataset is validated, the training script takes over.
You initialize a YOLOv8 model from a YAML configuration, point it to your custom data.yaml file that describes the train and validation image folders and the four class names, and then call the training function with parameters such as epochs, batch size, image size, and early stopping patience.
Checkpoints are saved in a dedicated project directory so you can later load the best-performing weights without digging through temporary files.

The final step of the code puts everything together in a live demo.
You load the best.pt weights, open a new YouTube video stream, and run YOLOv8 inference on each frame in real time.
For each detection above the chosen confidence threshold, the script draws bounding boxes and labels directly on the video frames, so you can watch the model follow players and the ball as the game unfolds.
At this point, the original goal of the code is fully realized: a complete, end-to-end pipeline that performs object detection on YouTube videos using a model you trained yourself from automatically labeled data.


Link for the video tutorial : https://youtu.be/i8k8YP0oy00

Code for the tutorial here : https://eranfeit.lemonsqueezy.com/buy/6263e229-5338-4c47-9f8a-e2b24119c6d6 or here : https://ko-fi.com/s/cff94b715a

Link for Medium users : https://medium.com/@feitgemel/how-to-use-yolov8-for-object-detection-on-youtube-videos-4e33b2564c4a

You can follow my blog here : https://eranfeit.net/blog/

Want to get started with Computer Vision or take your skills to the next level?

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4


Object detection on YouTube videos is one of the most practical ways to bring deep learning into real-world scenarios.
Instead of working with static images, you connect a trained model to actual games, tutorials, or broadcasts and detect objects frame by frame as the video plays.
In this tutorial, we’ll focus on basketball games from YouTube and build a full YOLOv8 pipeline that can spot players, the ball, and the referee in real time.

The idea is simple but powerful.
We take raw YouTube links, stream or download the videos, break them into frames, automatically label those frames with a vision-language model, and then train a YOLOv8 detector on the generated dataset.
At the end, we point the model at a new YouTube game and watch it highlight everything on the court.

This post is built around a complete code walkthrough.
We’ll set up a clean Conda environment, extract images with CamGear and OpenCV, auto-label them using Autodistill and Grounding DINO, inspect the annotations, train a YOLOv8 model, and finally test it on a fresh video.
If you follow along, you’ll end up with a reusable template for object detection on YouTube videos that you can adapt to other sports or use cases.


Getting the environment ready for YOLOv8 and YouTube video tools

Before we write any Python code, it’s important to create a dedicated environment.
This keeps PyTorch, CUDA, and your video libraries isolated so you can experiment freely without breaking other projects.
The following commands create a Conda environment, make sure CUDA is installed, and add all the core packages you need for this tutorial.

### Create a new Conda environment named Autodistill with Python 3.8 so we can keep this project isolated.
conda create --name Autodistill python=3.8

### Activate the Autodistill environment so every package we install is scoped to this tutorial.
conda activate Autodistill

### Check that the CUDA toolkit is available and verify its version on your system.
nvcc --version

### Install PyTorch 2.1.1, torchvision 0.16.1, torchaudio 2.1.1, and the CUDA 11.8 runtime from the official channels.
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia

### Install the Ultralytics package which provides YOLOv8 models and convenient training and inference utilities.
pip install ultralytics==8.1.0

### Install the Vidgear library so we can use CamGear to stream frames directly from YouTube videos.
pip install vidgear

### Install yt_dlp in case we want to download YouTube videos instead of streaming them.
pip install yt_dlp

With this environment in place, you’re ready to build a full end-to-end pipeline that connects YouTube videos to YOLOv8 object detection.
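
Before moving on, it can help to confirm that PyTorch actually sees your GPU. This quick check is optional and simply prints the installed version and the detected device.

### Optional sanity check: confirm the PyTorch build and GPU visibility inside the new environment.
import torch

print(torch.__version__)            ### should report 2.1.1 with the install commands above
print(torch.cuda.is_available())    ### expected to be True when the CUDA 11.8 build is working
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))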


Turning YouTube basketball games into training images

The first big step is to convert raw YouTube games into a folder of images.
Instead of manually downloading and cutting videos, we’ll use CamGear to stream each URL and OpenCV to save individual frames at a fixed resolution.
These frames will become the input for our automatic annotation step later.

### Import the OpenCV library so we can read, resize, draw on, and save video frames as images.
import cv2

### Import the os module to work with directories, paths, and file system checks.
import os

### Import CamGear from the vidgear package to stream frames directly from YouTube URLs.
from vidgear.gears import CamGear

### Define a list of YouTube URLs that will act as our training video sources for basketball object detection.
train_URLs = ['https://youtu.be/32bsPfx1kmY?si=JzcaFYrmkp9ubB3O',
              'https://youtu.be/QxzdEivMbvE?si=JuJMv-dzVSvkHouG']

### Start a frame counter so each saved image gets a unique index in its filename.
numerator = 0

### Define the local folder where all extracted training images will be stored.
output_path_images = "c:/data-sets/Mac-Real/images"

### Create the output folder if it does not already exist so the script will not crash.
if not os.path.exists(output_path_images):
    os.makedirs(output_path_images)

### Loop over every YouTube URL in the training list.
for url in train_URLs:
    ### Print the current URL so you can see which video is being processed.
    print(url)

    ### Create a CamGear stream from the YouTube URL in stream_mode with logging enabled.
    stream = CamGear(source=url, stream_mode=True, logging=True).start()

    ### Read and display the video frames one by one in a loop.
    while True:

        ### Grab the next frame from the CamGear stream.
        frame = stream.read()

        ### If no frame is returned, the stream has ended, so we break out of the loop.
        if frame is None:
            break

        ### Print the current frame index to track progress in the console.
        print(numerator)

        ### Increase the frame counter by one after reading a frame.
        numerator = numerator + 1

        ### Build the full output path for the current image using the counter in the filename.
        image_output_path = output_path_images + "/" + "images" + str(numerator) + ".png"

        ### Resize the frame to 640x640 pixels so it matches the expected YOLOv8 input size.
        resized = cv2.resize(frame, (640, 640), interpolation=cv2.INTER_AREA)

        ### Save the resized frame to disk as a PNG file in the images folder.
        cv2.imwrite(image_output_path, resized)

        ### Overlay the frame index on the displayed frame so you can see which frame is being shown.
        cv2.putText(frame, "image no. " + str(numerator), (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 3, (0, 255, 0), 4)

        ### Show the current frame in a window named 'img' so you can monitor the extraction process visually.
        cv2.imshow("img", frame)

        ### Wait for 25 milliseconds and allow the user to quit the loop by pressing the 'q' key.
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break

    ### Stop the current CamGear stream before moving on to the next URL so its resources are released.
    stream.stop()

### Close all OpenCV windows when the extraction loop is finished.
cv2.destroyAllWindows()

After this step, you have a folder full of basketball frames taken directly from YouTube games.
These images are the raw material for building a custom dataset tailored to your object detection on YouTube videos project.


Auto-labeling basketball frames with Autodistill and Grounding DINO

Next we want to transform raw frames into a labeled dataset.
Instead of manually drawing hundreds of boxes, we’ll use Autodistill with Grounding DINO to automatically detect players, referees, and the ball based on text prompts.
This step turns your video frames into YOLO-style labels that are ready for training.

### Import the CaptionOntology helper so we can map text prompts to clean class names.
from autodistill.detection import CaptionOntology

### Import the GroundingDINO autodistill wrapper which uses the ontology to detect objects.
from autodistill_grounding_dino import GroundingDINO

### Define the text based ontology that connects descriptive prompts to short label names.
ontology = CaptionOntology({
    "basketball player with blue shirt": "Maccabi player",
    "basketball player with white shirt": "Real Madrid player",
    "basketball orange ball": "ball",
    "person with black or orange shirt": "referee",
})

### Set the path where the extracted PNG images from the YouTube videos are stored.
IMAGE_DIR_PATH = "C:/Data-sets/Mac-Real/images"

### Set the folder where the labeled YOLO dataset will be written.
DATASET_DIR_PATH = "C:/Data-sets/Mac-Real/dataset"

### Choose the minimum score for bounding boxes to be kept by the detector.
BOX_THRESHOLD = 0.3

### Choose the minimum confidence score for the text grounding step.
TEXT_THRESHOLD = 0.3

### Create the GroundingDINO base model with our ontology and threshold values.
base_model = GroundingDINO(ontology=ontology, box_threshold=BOX_THRESHOLD, text_threshold=TEXT_THRESHOLD)

### Run automatic labeling on all PNG images in the input folder and save YOLO labels into the dataset directory.
dataset = base_model.label(input_folder=IMAGE_DIR_PATH, extension=".png", output_folder=DATASET_DIR_PATH)

By the end of this block, the dataset folder contains your images alongside label files that follow the YOLO format.
Each line corresponds to a detected Maccabi player, Real Madrid player, ball, or referee, with normalized coordinates that YOLOv8 can understand.
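
To make the format concrete, here is what a single label file might contain. The two lines and their values are purely hypothetical, but the conversion back to pixel coordinates is the same one used in the visualization script below.

### A hypothetical label file (for example images123.txt), one detection per line:
### <class_id> <x_center> <y_center> <width> <height>, all normalized to the 0..1 range.
###
### 0 0.512 0.430 0.045 0.120
### 2 0.250 0.610 0.020 0.035

### Converting the first line back to pixel coordinates on a 640x640 frame:
W, H = 640, 640
class_id, x, y, w, h = 0, 0.512, 0.430, 0.045, 0.120
x1, y1 = int((x - w / 2) * W), int((y - h / 2) * H)
x2, y2 = int((x + w / 2) * W), int((y + h / 2) * H)
print(class_id, x1, y1, x2, y2)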


Visualizing random annotations to validate the dataset

Before training any model, it’s smart to visually inspect the labels.
In this section, you’ll pick random images from the training set, read their YOLO label files, convert the coordinates back to pixel values, and draw the boxes on top of the images.
This makes it easy to spot common problems like missing boxes, wrong classes, or misaligned coordinates.

### Import the os module to list files and build full paths for images and labels.
import os

### Import the random module so we can sample a subset of images for quick inspection.
import random

### Import OpenCV to load images and draw rectangles and text on them.
import cv2

### Define the human readable class names in the same index order used in the YOLO label files.
label_names = ["Maccabi player", "Real Madrid player", "ball", "referee"]

### Define a helper function that reads a YOLO label file and returns a list of annotations.
def get_annotations(label_file):

    ### Open the label file in read mode so we can parse every line.
    with open(label_file, 'r') as file:
        lines = file.readlines()

    ### Create an empty list that will store tuples of (label, x, y, w, h).
    annotations = []

    ### Loop through each line in the label file to extract the label and bounding box coordinates.
    for line in lines:
        ### Split the line on whitespace to separate the label index and the four coordinate values.
        values = line.split()
        ### The first value in the line is the class label index as a string.
        label = values[0]
        ### The remaining four values are the normalized x, y, w, and h values which we convert to floats.
        x, y, w, h = map(float, values[1:])
        ### Append the parsed annotation as a tuple to the list.
        annotations.append((label, x, y, w, h))

    ### Return the complete list of annotations read from the label file.
    return annotations

### Define a helper function that draws all annotations on top of an image.
def put_annotations_in_image(image, annotations):

    ### Extract the image height, width, and number of channels from the array shape.
    H, W, _ = image.shape

    ### Loop over every annotation tuple to convert coordinates and draw boxes.
    for annotation in annotations:
        ### Unpack the label index and normalized coordinates from the tuple.
        label, x, y, w, h = annotation
        ### Print the raw values to the console for debugging if needed.
        print(label, x, y, w, h)
        ### Map the numeric label index to a human readable class name.
        label_name = label_names[int(label)]

        ### Convert normalized YOLO coordinates into pixel coordinates for the top left and bottom right corners.
        x1 = int((x - w / 2) * W)
        y1 = int((y - h / 2) * H)
        x2 = int((x + w / 2) * W)
        y2 = int((y + h / 2) * H)

        ### Draw the bounding box rectangle on the image using a visible color and line thickness.
        cv2.rectangle(image, (x1, y1), (x2, y2), (200, 200, 0), 1)

        ### Draw the class name text slightly above the top left corner of the bounding box.
        cv2.putText(image, label_name, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (200, 200, 0), 2, cv2.LINE_AA)

    ### Return the image with all annotations drawn so it can be displayed.
    return image

### Define a function that picks random images and visualizes their annotations.
def display_random_images(folder_path, num_images, label_folder):
    ### Build a list of all image filenames in the given folder.
    image_files = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]

    ### Randomly choose the required number of image filenames from the list.
    selected_images = random.sample(image_files, num_images)

    ### Iterate through each randomly selected image for visualization.
    for i, image_file in enumerate(selected_images):

        ### Read the current image from disk using its filename.
        img = cv2.imread(os.path.join(folder_path, image_file))

        ### Build the matching label filename with a .txt extension.
        label_file = os.path.splitext(image_file)[0] + '.txt'
        ### Build the full path to the label file inside the labels folder.
        label_file_path = os.path.join(label_folder, label_file)

        ### Read annotations from the label file and store them in YOLO format.
        annotations_yolo_format = get_annotations(label_file_path)

        ### Draw bounding boxes and labels on the image using the parsed annotations.
        image_with_annotations = put_annotations_in_image(img, annotations_yolo_format)

        ### Print the resulting image shape to confirm that dimensions are preserved.
        print(image_with_annotations.shape)

        ### Show the annotated image in its own OpenCV window so you can inspect it visually.
        cv2.imshow("img no. " + str(i), image_with_annotations)

        ### Wait for a key press before moving on to the next random image.
        cv2.waitKey(0)

    ### After reviewing all selected images, close the OpenCV windows.
    cv2.destroyAllWindows()

### Set the path to the YOLO training images generated by the automatic labeling step.
images_path = 'C:/Data-sets/Mac-Real/dataset/train/images'

### Set the path to the YOLO label files that match the training images.
label_folder = 'C:/Data-sets/Mac-Real/dataset/train/labels'

### Choose how many random images you want to visualize from the training set.
num_images = 4

### Call the helper function to display random annotated training images.
display_random_images(images_path, num_images, label_folder)

By quickly scanning a few annotated images, you can confirm that the ontology prompts and thresholds are working well before spending time on a long YOLOv8 training run.


Training a YOLOv8 model on the basketball YouTube dataset

Once the dataset looks good, it’s time to train YOLOv8.
Here we start from a small YOLOv8 model, point it to a custom data.yaml, and configure key training arguments such as epochs, batch size, and image size.
This step transforms your auto-labeled frames into a specialized model for basketball-focused object detection on YouTube videos.

### Import the YOLO class from the ultralytics package so we can create and train YOLOv8 models.
from ultralytics import YOLO

### Define the main training function that will configure and run the YOLOv8 experiment.
def main():

    ### Load the YOLOv8 small model architecture from the provided YAML configuration file.
    model = YOLO('yolov8s.yaml')  # load the small model

    ### Define the path to the custom data configuration file that describes the dataset and class names.
    config_file_path = "Best-Object-Detection-models/Yolo-V8/Auto-Annotation-FromYotube-Maccabi/data.yaml"

    ### Set the directory where YOLOv8 will store training runs, logs, and checkpoints.
    project = "C:/Data-sets/Mac-Real/dataset/checkpoints"

    ### Name this particular experiment so you can easily find its results and weights.
    experiment = "small-Model"

    ### Choose the batch size for training and reduce it if you encounter GPU memory issues.
    batch_size = 32  # reduce to 16 if you have memory errors

    ### Start the YOLOv8 training process using the custom dataset configuration and training parameters.
    results = model.train(data=config_file_path,
                          epochs=300,
                          project=project,
                          name=experiment,
                          batch=batch_size,
                          device=0,
                          patience=40,
                          imgsz=640,
                          verbose=True,
                          val=True)

### Run the main function only when this script is executed directly.
if __name__ == "__main__":
    main()

And here is the data.yaml file that describes the dataset and classes for the model:

train: C:/Data-sets/Mac-Real/dataset/train/images
val: C:/Data-sets/Mac-Real/dataset/valid/images

# class names
nc: 4
names:
  - Maccabi player
  - Real Madrid player
  - ball
  - referee

After training, the best model weights are stored inside the checkpoints/small-Model/weights folder, ready to be used for inference on new basketball games.
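
Before wiring the weights into the live demo, you can optionally score them on the validation split. The sketch below reuses the paths from the training step; model.val() reads the splits from data.yaml and reports precision, recall, and mAP.

### Optional sketch: evaluate the trained weights on the validation split before the live demo.
from ultralytics import YOLO

### Load the best checkpoint produced by the training run above.
best_weights = "C:/Data-sets/Mac-Real/dataset/checkpoints/small-Model/weights/best.pt"
model = YOLO(best_weights)

### Run validation using the same data.yaml that described the dataset during training.
metrics = model.val(data="Best-Object-Detection-models/Yolo-V8/Auto-Annotation-FromYotube-Maccabi/data.yaml")

### Print the mAP@0.5 score as a quick summary of detection quality.
print(metrics.box.map50)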


Running object detection on a new YouTube basketball video

The final step is where everything comes together.
Here you load the best YOLOv8 weights and connect to a fresh YouTube video, then draw detections in real time as the game plays.
This gives you a complete demo of custom object detection on YouTube videos using a model you trained from automatically annotated frames.

### Import OpenCV to read frames from the stream and draw bounding boxes and labels on them.
import cv2

### Import the YOLO class so we can load the trained model and run inference on each frame.
from ultralytics import YOLO

### Import the os module to help build file paths in a cross-platform way.
import os

### Import CamGear from vidgear to stream frames directly from a YouTube URL in real time.
from vidgear.gears import CamGear

### Define the YouTube URL for the test basketball video we want to run detection on.
test_url = 'https://youtu.be/7qKU1b2Shr8?si=XWTU1Fbc0XtIs-yv'

### Build the full path to the best model weights produced during training.
model_path = os.path.join("C:/Data-sets/Mac-Real/dataset/checkpoints", "small-Model", "weights", "best.pt")

### Load the YOLOv8 model from the saved weights so it is ready to process new frames.
model = YOLO(model_path)

### Set the confidence threshold so only reliable detections are drawn on the video.
threshold = 0.25

### Initialize a simple frame counter in case you want to log or debug frame numbers.
n = 0

### Create a CamGear stream for the YouTube video in stream mode with logging enabled.
stream = CamGear(source=test_url, stream_mode=True, logging=True).start()

### Loop forever and process frames from the YouTube stream one at a time.
while True:

    ### Read the next frame from the CamGear stream.
    frame = stream.read()

    ### If no frame is returned, it means the stream has ended, so break out of the loop.
    if frame is None:
        break

    ### Run the YOLOv8 model on the current frame and take the first result object.
    results = model(frame)[0]

    ### Loop over all detected boxes returned by YOLOv8 for this frame.
    for result in results.boxes.data.tolist():
        ### Unpack the bounding box coordinates, confidence score, and class index from the result.
        x1, y1, x2, y2, score, class_id = result

        ### Convert the floating point coordinates into integer pixel positions.
        x1 = int(x1)
        y1 = int(y1)
        x2 = int(x2)
        y2 = int(y2)

        ### Only draw the box and label if the detection confidence is above the chosen threshold.
        if score > threshold:
            ### Draw the bounding box rectangle for the detected object in a visible color.
            cv2.rectangle(frame, (x1, y1), (x2, y2), (3, 240, 252), 1)

            ### Draw the class name text just above the bounding box using the names dictionary from YOLOv8.
            cv2.putText(frame, results.names[int(class_id)].upper(), (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (3, 240, 252), 1)

    ### Show the annotated frame in a window named 'img' so you can see detections in real time.
    cv2.imshow('img', frame)

    ### Break out of the loop and stop the stream when the user presses the 'q' key.
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

### Close all OpenCV windows to clean up the display.
cv2.destroyAllWindows()

### Stop the CamGear stream and release the underlying resources.
stream.stop()

At this point, you have a complete pipeline: from YouTube URLs to training images, automatic labeling, YOLOv8 training, and live object detection on new videos.


FAQ — Object detection on YouTube videos with YOLOv8

What does this basketball YouTube tutorial cover?

This tutorial covers a full pipeline for object detection on YouTube videos, from extracting basketball frames and auto-labeling them to training YOLOv8 and running live inference.

Which objects are detected in the final model?

The model is trained to detect Maccabi players, Real Madrid players, the basketball, and the referee in YouTube game footage.

Why do we use Autodistill instead of manual labeling?

Autodistill automatically generates YOLO labels from text prompts, which saves a huge amount of time compared to drawing bounding boxes by hand on every frame.

How does Grounding DINO help in auto-labeling?

Grounding DINO connects your natural-language prompts to specific regions in each image, allowing Autodistill to create accurate bounding boxes for the classes you care about.

What is the role of the data.yaml file in YOLOv8 training?

The data.yaml file tells YOLOv8 where the training and validation images are stored and lists the exact order of class names used during training and inference.

Can I change the classes for a different sport or domain?

Yes, you can edit the ontology prompts, update the class names in data.yaml, and point to new YouTube videos to train a model for any other sport or object type.

Do I need to download YouTube videos to disk?

No, the example uses CamGear to stream frames directly from YouTube, but you can also download videos with yt_dlp if you prefer working with local files.
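
If you do want a local copy, a minimal sketch with the yt_dlp Python API could look like this; the URL and output template are only examples.

### A small sketch of downloading a YouTube video with yt_dlp instead of streaming it.
from yt_dlp import YoutubeDL

url = "https://youtu.be/your-video-id"

### The format selector and output template below are example values; adjust them to your setup.
options = {"format": "mp4", "outtmpl": "c:/data-sets/Mac-Real/videos/%(title)s.%(ext)s"}

with YoutubeDL(options) as ydl:
    ydl.download([url])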

What image size is used for training YOLOv8?

The tutorial resizes all frames to 640×640 pixels, which is a common and efficient resolution for YOLOv8 object detection models.

How can I monitor training progress?

YOLOv8 prints training metrics to the console and saves detailed logs and plots in the project folder so you can track loss and mAP over time.
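
If you prefer to inspect those numbers programmatically, the results.csv file inside the experiment folder holds the per-epoch metrics. The sketch below assumes pandas is installed and reuses the project and experiment names from the training step.

### A minimal sketch for inspecting the per-epoch metrics that YOLOv8 writes during training.
import pandas as pd

results_csv = "C:/Data-sets/Mac-Real/dataset/checkpoints/small-Model/results.csv"
history = pd.read_csv(results_csv)

### Print the available columns and the last few epochs so you can see how loss and mAP evolve.
print(history.columns.tolist())
print(history.tail())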

Is this pipeline suitable for real-time deployment?

With a suitable GPU and the small YOLOv8 model, you can process YouTube frames in real time, making this pipeline a solid starting point for live dashboards or analytic tools.


Conclusion

Building object detection on YouTube videos around YOLOv8 and Autodistill gives you a practical, end-to-end blueprint for real-world computer vision projects.
You start from simple YouTube links, extract frames, and let a modern vision-language model generate high-quality labels that would take hours to draw by hand.
From there, YOLOv8 training becomes a straightforward step, because your dataset already follows a clean, consistent structure.

By validating random annotations, you make sure the model is learning from trustworthy data instead of noisy labels.
Once training is done, the same code structure lets you plug in any new game or broadcast and watch the detections unfold in real time.
This combination of automatic labeling, flexible training, and live inference is powerful because you can reuse it across sports, domains, and even different object detection architectures.

Most importantly, this tutorial shows that advanced workflows don’t have to be mysterious.
Each block of code serves a clear purpose: setting up the environment, preparing data, auto-labeling, validating, training, and finally deploying the model on actual YouTube content.
If you adapt the ontology, paths, and classes to your own ideas, you can quickly spin up new models that understand the things you care about—and share compelling visual demos with your audience or clients.


Connect

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran
