Last Updated on 23/02/2026 by Eran Feit
Master the art of YouTube stream frame extraction for real-time computer vision projects. In this tutorial, we will dive deep into how to efficiently pull live video data from YouTube and process it through a YOLOv8 model. Whether you are building a live sports analytics tool or a traffic monitoring system, high-speed YouTube stream frame extraction is the critical first step to ensuring your model stays synced with the live broadcast.
Why traditional cv2.VideoCapture fails for YouTube Streams

Most developers struggle with “stream lag”—where the video detection falls seconds or minutes behind the live broadcast. This happens because standard OpenCV loops don’t manage the frame buffer. In this guide, I share the specific logic I use to ensure the extraction process always grabs the latest frame, maintaining 30+ FPS and near-zero latency for real-time applications like live sports or traffic monitoring.
Implementing Real-time YouTube Object Detection Python workflows is an essential skill for modern AI developers. Instead of the traditional, slow process of downloading video files, this guide shows you how to stream data directly into your computer vision pipeline. By combining the efficiency of YOLOv8 with the powerful frame extraction capabilities of CamGear and yt-dlp, you can build responsive applications for live traffic monitoring, sports analytics, or security. Below, we provide the complete setup and code to turn any YouTube URL into a real-time data source for your object detection models.
| Component | Recommended Spec | Why it matters for Stream Extraction |
| --- | --- | --- |
| GPU | NVIDIA RTX 3060 (or better) | Required for FP16 inference to keep up with 1080p live streams. |
| Python Libraries | vidgear, yt-dlp, ultralytics | yt-dlp is more stable than pytube for high-resolution extraction. |
| Inference Model | YOLOv8n (Nano) | The fastest variant; ensures the “extraction” doesn’t lag behind the “live” stream. |
| Internet Speed | 20Mbps+ Stable | Real-time extraction requires consistent bandwidth to prevent frame dropping. |
| Python Version | 3.9 or higher | Optimized for the latest ultralytics and vidgear dependencies. |
Essential Libraries for Computer Vision on YouTube Streams: To implement this solution, you will need:
- Python 3.x installed.
- OpenCV library (cv2) for image manipulation.
- A streaming library such as yt-dlp or pafy to handle the YouTube URL extraction.
- A pre-trained model (like YOLO or SSD) if you wish to perform immediate detection on the fetched frames.
If you enjoy this walkthrough of object detection from YouTube video, you can explore more tutorials in my object detection blog category where I cover different models and real-world projects step by step.
Here is the video tutorial: https://youtu.be/ofjR16swp3E
Detailed Breakdown: Real-Time Stream Processing Logic
This tutorial is all about turning a simple Python script into a real-time lab for object detection from YouTube video.
Instead of working with a local MP4 file, the code connects directly to a YouTube stream, reads the frames one by one, and sends each frame into a YOLOv8 model.
The result is a live video window where you can see bounding boxes and class names drawn over people, cars, and other objects as they appear in the stream.
The first part of the code focuses on setting up the environment and importing the right tools.
You create a dedicated Conda environment, install PyTorch with CUDA support, add the ultralytics package for YOLOv8, and bring in vidgear and yt_dlp to handle the YouTube video source.
This keeps the project clean and repeatable, so you can come back later, recreate the environment, and get the same behavior on another machine.
Next, the script opens the YouTube video using CamGear and prepares the YOLOv8 nano model.
Instead of downloading the video, CamGear pulls frames directly from the provided YouTube URL, which is perfect for testing on long clips or live streams.
The YOLO model is loaded once at the start, and a confidence threshold is defined so the script can filter out low-confidence detections and keep the output readable.
The main loop is where the real action happens.
Each iteration reads a frame from the stream, checks that it is valid, and then passes it through the YOLOv8 model.
The detections are unpacked into bounding coordinates, class IDs, and scores, and only boxes with a score above the threshold are drawn on the frame.
OpenCV is then used to show the annotated frame in a window, giving you immediate visual feedback on how well the detector is performing.
Finally, the code handles cleanup and a graceful exit.
Pressing the q key breaks the loop, closes the OpenCV window, and stops the video stream so no resources are left hanging.
This structure makes the script a solid starting point for more advanced projects, such as counting objects, tracking movement over time, or triggering alerts based on the detections you see in the YouTube video.
If you want a complete YOLOv8 YouTube object detection workflow (auto-labeling, training, and live inference), follow this step-by-step guide: https://eranfeit.net/how-to-use-yolov8-for-object-detection-on-youtube-videos/
Important links :
Link for the video tutorial : https://youtu.be/ofjR16swp3E
Code for the tutorial here : https://eranfeit.lemonsqueezy.com/buy/c56cbb3f-3b28-4aea-9fd1-c6ae6fb6c9a9 or here : https://ko-fi.com/s/72718d40e2
Link for Medium users : https://medium.com/object-detection-tutorials/real-time-youtube-video-stream-extraction-and-object-detection-23370fc39e94
You can follow my blog here : https://eranfeit.net/blog/
The Advantages of Live Stream Extraction for Computer Vision
Object detection from youtube video is a powerful way to run deep learning on content that already lives online.
Instead of downloading files manually, you connect straight to a YouTube URL, read the frames in real time, and let a trained model highlight the objects on screen.
This turns any public video into a live playground for experiments, dashboards, and quick prototypes.
In this post we will walk through a complete Python script that does exactly that using YOLOv8, OpenCV, and the CamGear streaming library.
Once the basic pipeline is in place, you can point the same code to different videos and reuse the same object detection logic.
You can monitor traffic, sports games, or demo clips and immediately see bounding boxes and class names appear on each frame.
The tutorial is designed to stay beginner friendly while still being flexible enough for more advanced computer vision projects.
By the end you will be comfortable building and extending your own real time object detection from youtube video setups.
Official references used in this workflow
- Ultralytics YOLO (Python usage)
- VidGear CamGear stream_mode (YouTube URLs)
- yt-dlp (official project)
Configuring Your Python Environment for YOLOv8 and YouTube Streaming
Before we dive into the Python script, it helps to have a clean environment dedicated to YOLOv8 and video streaming.
Using Conda keeps dependencies isolated so you can install PyTorch with CUDA, ultralytics, vidgear, and yt_dlp without breaking other projects.
You can reuse the same environment later for other tutorials that work with object detection or video processing.
Setting up a dedicated environment is crucial for avoiding dependency conflicts between PyTorch and video processing libraries. We recommend using Python 3.8+ and ensuring your CUDA drivers are up to date if you intend to run YOLOv8 on a GPU for maximum frames-per-second (FPS).
Here is one example of how you can create that setup in your terminal.
```bash
# create and activate a dedicated Conda environment
conda create --name YoloV8 python=3.8
conda activate YoloV8

# optional: check that CUDA is visible
nvcc --version

# install PyTorch with CUDA 11.8 support
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia

# install YOLOv8 and video tools
pip install ultralytics==8.1.0
pip install vidgear
pip install yt_dlp
```

Once this is ready, you can open your favorite editor and start building the main script that streams a YouTube video and runs YOLOv8 on every frame.
We will break that script into three parts so you can understand each step clearly and copy the code directly into your own project.
To see another beginner-friendly detector, check out my YOLOv5 object detection in 15 minutes tutorial, where we build a fast pipeline on local videos before moving to streams.

Setting up YOLOv8 and the YouTube video stream
Understanding the Architecture of the Pipeline
Before diving into the script, it is essential to understand the core components of this YouTube stream frame extraction workflow. To achieve real-time performance without significant lag, we utilize a combination of optimized libraries designed for high-speed video processing and deep learning.
Core Dependencies and Tools:
- Ultralytics YOLOv8: We are using the “Nano” version (yolov8n.pt) of the YOLOv8 model. This is the lightest and fastest version available, making it the ideal choice for real-time inference on live video streams where latency is the primary concern.
- CamGear (VidGear): Unlike standard OpenCV VideoCapture, which can struggle with YouTube’s dynamic stream manifests, CamGear is a powerful multi-threaded video-processing wrapper. It handles the network requests and frame buffering in the background, ensuring that our YouTube stream frame extraction remains synchronized with the live event.
- OpenCV: This remains the industry standard for image manipulation. We use it to draw bounding boxes, labels, and to render the final visual output on your screen.
In this first part we import the core libraries and open a live stream from a YouTube URL.
We also load the YOLOv8 nano model and define a confidence threshold so we only draw boxes for reliable detections.
This sets the stage for real time processing before we move into the main loop.
| Technical Metric | Recommended Configuration | Purpose |
| --- | --- | --- |
| Stream Extraction | yt-dlp + CamGear | Handles YouTube’s dynamic URL rotation better than standard libs. |
| Target Resolution | 640px or 1280px | Balancing extraction speed with detection accuracy. |
| Inference Engine | YOLOv8n (Nano) | Optimized for real-time extraction on consumer GPUs. |
| Buffer Strategy | grab() vs read() | Crucial for preventing frames from “piling up” in memory. |
```python
### Import OpenCV so we can work with images and draw on video frames.
import cv2

### Import the YOLO class from the ultralytics package to run a pretrained YOLOv8 model.
from ultralytics import YOLO

### Import the os module in case you want to work with file paths or environment variables later.
import os

### Import CamGear from vidgear to capture frames directly from a YouTube video stream.
from vidgear.gears import CamGear

### Create a video stream from the YouTube URL using CamGear in stream mode.
stream = CamGear(
    source="https://www.youtube.com/watch?v=3sgewysRGZY",
    stream_mode=True,
    logging=True
).start()

### Load the YOLOv8 nano model for fast real time object detection from youtube video.
model = YOLO("yolov8n.pt")

### Set a confidence threshold to filter out low confidence detections.
threshold = 0.25
```

To optimize performance, the script uses a yolov8n.pt (nano) model. In a real-world production environment, you might consider running the inference on a separate thread from the frame capture. This ensures that the video display remains smooth even if the object detection model experiences a slight delay due to complex scenes with multiple objects.
This block connects your script to the online video and prepares the model to analyze each frame.
The yolov8n variant is intentionally lightweight, which makes it a good fit for real time demos even on modest hardware.
Processing each frame and drawing detections in real time
Now we build the main loop that reads frames from the stream, runs YOLOv8 on each one, and overlays bounding boxes and labels.
This is where the live object detection from YouTube video experience really comes to life on your screen.
The core advantage of this specific script is the use of the stream_mode=True parameter in CamGear. This allows the script to buffer only the necessary frames, keeping memory usage low even when processing high-definition 4K streams. By adjusting the threshold variable, you can fine-tune the sensitivity of your detection to suit specific environments, such as outdoor surveillance or indoor retail analytics.
The technical challenge in Real-Time YouTube Object Detection Python is managing the stream buffer. Using CamGear in stream_mode=True is the most efficient method because it uses yt-dlp backend to resolve the URL without the overhead of the official YouTube Data API. This allows for lower latency, which is essential when running heavy models like YOLOv8.
```python
### Start a loop to process frames from the YouTube video stream continuously.
while True:
    ### Read the next frame from the YouTube stream.
    frame = stream.read()

    ### If no frame is returned, break the loop because the stream has ended or failed.
    if frame is None:
        break

    ### Run the YOLOv8 model on the current frame and take the first result object.
    results = model(frame)[0]

    ### Iterate over each detection box in the results as a Python list.
    for result in results.boxes.data.tolist():
        ### Unpack the bounding box coordinates, confidence score, and class id from the detection.
        x1, y1, x2, y2, score, class_id = result

        ### Convert the x and y coordinates to integers so OpenCV can draw precise boxes.
        x1 = int(x1)
        y1 = int(y1)
        x2 = int(x2)
        y2 = int(y2)

        ### Check whether the confidence score is above the chosen detection threshold.
        if score > threshold:
            ### Draw a green rectangle around the detected object on the frame.
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 1)

            ### Put the detected class name text above the bounding box in uppercase letters.
            cv2.putText(
                frame,
                results.names[int(class_id)].upper(),
                (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.5,
                (0, 255, 0),
                1,
                cv2.LINE_AA
            )

    ### Show the annotated frame in a window titled Video.
    cv2.imshow("Video", frame)

    ### Wait 25 milliseconds for a key press and break the loop if the user presses q.
    if cv2.waitKey(25) & 0xFF == ord("q"):
        break
```

Each iteration of this loop turns a raw video frame into a labeled scene with rectangles and class names.
The threshold value lets you tune how strict the detector should be, which is useful when you want fewer false positives or more aggressive coverage.
Optimizing YouTube Stream Frame Extraction for Speed
When performing YouTube stream frame extraction, the biggest challenge is the “lag” that accumulates over time. If your script doesn’t handle the frame buffer correctly, your detection will fall seconds behind. By using the specialized extraction logic provided below, you can ensure that your YouTube stream frame extraction pipeline always prioritizes the most recent frame.
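The core of that “latest frame wins” idea can be sketched with a small helper that drains whatever has piled up in a buffer and keeps only the newest frame. This is a hypothetical illustration (the function and variable names are my own); CamGear’s threaded reader does most of this work for you, but the same logic applies whenever you buffer frames yourself.

```python
from collections import deque

def latest_frame(buffer):
    """Drain a frame buffer and return only the newest frame (or None if empty).

    Stale frames that accumulated while inference was busy are discarded,
    so detection stays synced with the live broadcast instead of lagging.
    """
    frame = None
    while buffer:
        frame = buffer.popleft()
    return frame

# Example: three frames piled up while the detector was busy;
# only the most recent one is processed, the rest are dropped.
stale_buffer = deque(["frame_1", "frame_2", "frame_3"])
print(latest_frame(stale_buffer))  # → frame_3
```

If you ever fall back to a raw cv2.VideoCapture source, the equivalent trick is to call grab() several times and retrieve() only once, which skips decoding the stale frames entirely.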
Once you are comfortable with this YOLOv8 streaming workflow, you can move on to my YOLOX object detection tutorial, which walks through a modern anchor-free detector using a similar step-by-step structure.
Cleaning up the stream and closing the application
After the user quits or the video ends, it is important to close the OpenCV window and stop the stream.
This prevents your application from leaving background processes running or locking system resources.
```python
### Close all OpenCV windows created during the object detection session.
cv2.destroyAllWindows()

### Safely stop the YouTube video stream to release network and system resources.
stream.stop()
```

With this final piece, your script now behaves like a complete mini application.
It starts a stream, performs object detection from youtube video in real time, and exits cleanly when you are done.
For more advanced workflows that combine detection and segmentation, have a look at my Segment Anything and YOLOv8 masks tutorial, where YOLOv8 detections are used to drive precise SAM-based segmentation.
Fixing Common Errors in YouTube Video Stream Extraction
If the stream fails, it’s usually not the YOLO model—it’s the stream extraction layer. YouTube can throttle requests, change stream formats, or block certain clients depending on traffic and region.
If you see freezes or dropped frames, test a different YouTube URL, update yt-dlp, then reduce workload (use a smaller YOLO model, lower resolution, or skip frames).
Once the stream is stable, tune performance: adjust confidence threshold for clean results, and measure FPS to see if you’re CPU-bound or GPU-bound.
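To actually measure FPS, a rolling average over the last few frames is more stable than timing a single iteration. Here is a minimal sketch (the class name and window size are my own choices) that you could tick once per loop iteration and print on the frame:

```python
import time
from collections import deque

class FpsMeter:
    """Rolling FPS estimate over the last `window` frame timestamps."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        # Record one timestamp per processed frame.
        self.stamps.append(time.perf_counter() if now is None else now)

    def fps(self):
        # Need at least two timestamps to measure an interval.
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0
```

In the main loop you would call meter.tick() after each model(frame) call; if the reported FPS rises when you shrink the input resolution, you are GPU-bound on inference, and if it barely moves, the bottleneck is likely stream decoding on the CPU.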
Quick checklist
- Update yt-dlp if extraction fails
- Try a different video URL to rule out video restrictions
- Lower resolution or skip frames if FPS is low
- Use YOLOv8n for speed; move up only if needed
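The “skip frames” tip from the checklist can be implemented with a simple modulo gate (a sketch; the helper name and default skip count are my own). You still display every frame, but only run the detector on every third one:

```python
def should_run_inference(frame_index, skip=2):
    """Return True on every (skip + 1)-th frame.

    With skip=2, inference runs on frames 0, 3, 6, ... while all
    frames are still shown, which roughly triples effective FPS
    when the model is the bottleneck.
    """
    return frame_index % (skip + 1) == 0

# Which of the first six frames would be sent to the model:
print([i for i in range(6) if should_run_inference(i, skip=2)])  # → [0, 3]
```

Inside the main loop you would keep a frame_index counter, call the model only when should_run_inference(frame_index) is true, and otherwise redraw the last set of boxes on the new frame.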
Related tutorials to continue from here
- YOLOv8 YouTube object detection workflow (auto-label + training + live inference)
- Object detection heatmap on video (YOLOv8 + OpenCV)
- YOLO video pipeline: instance segmentation in videos
- Segment and label videos using Ultralytics Annotator

Performance Benchmarks and Expectations

When running this script on a standard CPU, you can expect between 5-15 FPS. For true real-time 30+ FPS performance, we recommend utilizing a CUDA-enabled NVIDIA GPU. Adjusting the threshold variable in the code is also vital; a higher threshold (e.g., 0.5) will reduce “flickering” of boxes but might miss smaller objects in the YouTube stream.
FAQ: Object detection from YouTube video
What is object detection from YouTube video?
Object detection from YouTube video uses a deep learning model to analyze frames streamed from a YouTube URL in real time. It lets you detect objects without downloading the video file first.
Which model is used for detection in this tutorial?
This tutorial uses the YOLOv8 nano model because it offers a strong balance between speed and accuracy. It is lightweight enough for real-time demos on many laptops and desktops.
Why do we need CamGear instead of only OpenCV?
CamGear simplifies streaming from YouTube by handling the network and decoding logic for you. It plugs into OpenCV-style workflows so you still work with familiar frames.
Can I run this code without a GPU?
Yes, the code can run on CPU, although the frame rate may be lower. Reducing input resolution and using the nano model helps keep it usable on non-GPU machines.
How do I change the YouTube video used in the stream?
You can change the source parameter in the CamGear constructor to any other public YouTube URL. The rest of the detection loop stays the same.
What does the confidence threshold control?
The confidence threshold sets the minimum score an object must have before it is drawn on the frame. Increasing it reduces false positives, while lowering it shows more detections.
How can I focus on specific object classes only?
You can filter detections by checking the class_id value returned by YOLOv8 and drawing boxes only for selected classes. This is useful when you care about people, vehicles, or other targeted objects.
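As a concrete sketch of that filtering step, the detection tuples produced by results.boxes.data.tolist() in the main loop can be passed through a small helper (the function name and example values here are my own; class ids 0 and 2 are “person” and “car” in the COCO classes YOLOv8 is pretrained on):

```python
def filter_detections(detections, allowed_ids, threshold=0.25):
    """Keep only detections whose class id is allowed and score passes the threshold.

    detections: list of (x1, y1, x2, y2, score, class_id) tuples,
    the same shape as results.boxes.data.tolist() in the main loop.
    """
    return [d for d in detections if d[5] in allowed_ids and d[4] > threshold]

# Example: keep only people (0) and cars (2) above the confidence threshold.
raw = [
    (10, 10, 50, 50, 0.90, 0.0),   # confident person -> kept
    (5, 5, 20, 20, 0.80, 7.0),     # truck -> class filtered out
    (0, 0, 5, 5, 0.10, 0.0),       # low-confidence person -> score filtered out
]
print(filter_detections(raw, allowed_ids={0, 2}))
```

In the tutorial loop you would simply iterate over filter_detections(results.boxes.data.tolist(), {0, 2}) instead of the raw list, and the drawing code stays unchanged.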
How do I record the processed video with detections?
Create an OpenCV VideoWriter with your desired codec and frame size, then call write on each annotated frame. This lets you save a new video that includes all bounding boxes and labels.
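For context on the codec argument: cv2.VideoWriter_fourcc simply packs four codec characters into one integer. Here is a pure-Python sketch of that packing (my own helper, shown so the commented cv2 usage below is easier to follow):

```python
def fourcc(code):
    """Pack a 4-character codec code into an int, like cv2.VideoWriter_fourcc(*code)."""
    assert len(code) == 4
    value = 0
    for i, ch in enumerate(code):
        value |= ord(ch) << (8 * i)  # little-endian character packing
    return value

# Typical use inside the tutorial loop (assumes frame comes from stream.read()):
#   h, w = frame.shape[:2]
#   writer = cv2.VideoWriter("output.mp4", fourcc("mp4v"), 25, (w, h))
#   ... inside the loop, after drawing boxes:
#   writer.write(frame)
#   ... after the loop:
#   writer.release()
```

Note that the (w, h) size passed to VideoWriter must exactly match every frame you write, or the resulting file will be empty on some platforms.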
What happens when I press the q key in the window?
Pressing the q key triggers a break in the main while loop and closes the live detection window. It is a simple way to stop the streaming process safely.
Can I adapt this script for local files or webcams?
Yes, you can replace CamGear with cv2.VideoCapture pointing to a file path or webcam index. The rest of the object detection loop remains almost identical.
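A small source-dispatch helper makes that swap painless; this is a hypothetical convenience function (name and logic are my own) that turns a digit string into a webcam index and leaves file paths and URLs untouched:

```python
def resolve_source(source):
    """Normalize a capture source for cv2.VideoCapture.

    "0" or 1  -> webcam index (int)
    "demo.mp4" or a URL -> returned unchanged
    """
    if isinstance(source, int):
        return source
    if isinstance(source, str) and source.isdigit():
        return int(source)
    return source

# Typical use (cv2 call shown as a comment since it needs a camera or file):
#   cap = cv2.VideoCapture(resolve_source("0"))      # default webcam
#   cap = cv2.VideoCapture(resolve_source("demo.mp4"))  # local file
```

Everything after cap.read() — the YOLOv8 call, the drawing, the q-key exit — is identical to the YouTube version of the loop.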
🛠️ Key Troubleshooting Tips for Live Streams
During my testing, I identified two common points of failure you should account for:
- Network Throttling: If YouTube detects high-frequency requests for the stream manifest, it may temporarily throttle your IP. I recommend using the yt-dlp cookies parameter if you are running this on a cloud server.
- CPU vs GPU Bottlenecks: If your CPU is busy decoding the stream, your GPU might sit idle. Using a multi-threaded approach (which this code supports) keeps the extraction and the inference on separate tracks.
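To make the capture/inference split concrete, here is a minimal sketch of a capture thread that keeps only the newest frame in a one-slot queue, so the inference side never falls behind on stale frames. This is my own illustration of the pattern, not code from the tutorial; in practice read_frame would wrap stream.read() and the consumer would run the YOLO model.

```python
import queue
import threading

def start_capture(read_frame, out_q, stop):
    """Run a capture thread that always keeps only the most recent frame.

    read_frame: callable returning the next frame, or None when the stream ends.
    out_q: queue.Queue(maxsize=1) shared with the inference side.
    stop: threading.Event used to shut the thread down.
    """
    def loop():
        while not stop.is_set():
            frame = read_frame()
            if frame is None:
                break
            if out_q.full():
                try:
                    out_q.get_nowait()  # drop the stale frame
                except queue.Empty:
                    pass
            out_q.put(frame)

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

The inference loop then just calls out_q.get() and runs the model, while decoding continues in parallel; this is one way to keep the GPU fed even when stream decoding is CPU-heavy.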
If you want to turn video streams into labeled training data, my YOLOv8 auto-label segmentation tutorial shows how to process each frame, generate masks, and build a dataset from raw footage.
Conclusion
Streaming directly from YouTube and running YOLOv8 on every frame gives you a fast, flexible way to explore real time computer vision.
With only a few dozen lines of Python, you built a complete loop that reads frames, detects objects, and overlays clean labels on top of the original video.
Once this pattern is clear, you can adapt it to many other sources such as webcams, RTSP cameras, or pre-recorded videos.
You can filter classes, log detections to disk, record annotated videos, or plug the results into downstream analytics.
Most importantly, the project shows that object detection from youtube video does not need to be complicated or fragile.
With the right tools and a structured environment, you can get from a blank script to a working real time demo in a single focused session.
If you are working in the medical domain, you may also like my YOLOv8 dental object detection tutorial, where we apply similar techniques to X-ray images for real clinical workflows.
Important links :
Link for the video tutorial : https://youtu.be/ofjR16swp3E
Code for the tutorial here : https://eranfeit.lemonsqueezy.com/buy/c56cbb3f-3b28-4aea-9fd1-c6ae6fb6c9a9 or here : https://ko-fi.com/s/72718d40e2
Link for Medium users : https://medium.com/object-detection-tutorials/real-time-youtube-video-stream-extraction-and-object-detection-23370fc39e94
You can follow my blog here : https://eranfeit.net/blog/
Want to get started with Computer Vision or take your skills to the next level ?
If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow
If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
