
Jetson Nano Video Classification Python: Real-Time GoogLeNet Tutorial


Last Updated on 26/02/2026 by Eran Feit

Introduction

Mastering Jetson Nano Video Classification Python allows you to deploy powerful AI models directly on edge devices. In this tutorial, we build a high-performance classification pipeline using OpenCV and NVIDIA’s Jetson Inference library. You will learn how to process MP4 video files, convert frames for CUDA acceleration, and use the GoogLeNet (Inception v1) model to overlay real-time predictions. This workflow is essential for developers looking to move beyond simple image classification into real-time video analytics on the Maxwell GPU architecture.

If you want the single-image version, see this link.

If you want the live webcam / camera-stream version, see this link.


Understanding GoogLeNet (Inception v1) for Jetson Nano Video Classification

GoogLeNet is not a service you call over the network. It’s a deep learning CNN architecture, also known as Inception v1, designed for efficient image classification.
In this tutorial it works as a frame-by-frame classifier: OpenCV reads an MP4 frame → GoogLeNet predicts one label plus a confidence for the whole frame → you overlay that result on the video.

What is GoogLeNet (Inception v1), and why does this tutorial use it?

GoogLeNet (also called Inception v1) is a convolutional neural network built to deliver strong classification accuracy while staying relatively efficient.
It became widely known from the “Going Deeper with Convolutions” paper and was used in large-scale image classification benchmarks such as ImageNet.

In this tutorial, GoogLeNet is used as a video-frame classifier.
That means: for every MP4 frame that OpenCV reads, the model predicts a single best class label (plus confidence), such as “dog”, “car”, or “soccer ball”.
This is different from object detection, where the model outputs multiple objects with bounding boxes.

Important detail: the GoogLeNet model used by jetson-inference is typically pre-trained on ImageNet (1000 categories).
So the labels you see come from that ImageNet label set unless you deploy a custom-trained model.
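Conceptually, the classifier reduces each frame to a confidence score per ImageNet class and reports the argmax. A minimal pure-Python sketch of that top-1 step (the four labels and scores below are made up for illustration; the real label list has 1,000 entries):

```python
def top1(scores, labels):
    """Return the best (label, confidence) pair: the argmax over
    per-class confidences, which is what frame classification reports."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]

labels = ["dog", "car", "soccer ball", "cat"]   # stand-in for the ImageNet label set
scores = [0.07, 0.11, 0.74, 0.08]               # stand-in for model outputs
print(top1(scores, labels))  # → ('soccer ball', 0.74)
```

This is why classification yields exactly one label per frame, in contrast to detection, which returns a box per object.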


Jetson Nano hardware overview and why it matters

Compact yet powerful AI computer

The Jetson Nano is a small 69×45 mm system‑on‑module (SoM) that packs serious compute. Key features relevant to computer vision include:

  • GPU: a 128‑core NVIDIA Maxwell™ GPU providing up to 472 GFLOPS of FP16 performance; this hardware acceleration enables neural networks to run in real time while keeping power consumption between 5 and 10 W.
  • CPU: quad‑core ARM A57 64‑bit processor for handling preprocessing and system tasks.
  • Memory: 4 GB LPDDR4 memory (25.6 GB/s bandwidth) plus 16 GB of onboard eMMC storage for models and data.
  • I/O interfaces: support for MIPI‑CSI cameras, HDMI/DisplayPort, USB 3.0/2.0, Gigabit Ethernet and GPIO, allowing multiple high‑resolution sensors and peripherals.
  • Software stack: Jetson Nano runs the NVIDIA JetPack SDK, which includes Linux, CUDA, cuDNN and TensorRT libraries for deep learning and computer vision. Popular frameworks such as TensorFlow, PyTorch and OpenCV are supported, and pre‑trained models like ResNet‑50, SSD MobileNet‑V2 and Tiny YOLO v3 can be deployed.
Jetson Nano

Why Jetson Nano suits computer‑vision tasks

  • Real‑time inference: GPU acceleration allows classification models like GoogLeNet to process video frames at 10–20 FPS. Lowering resolution or using lighter networks improves FPS further.
  • Edge deployment: Low power consumption (5–10 W) and small footprint enable battery‑powered or fanless deployments. On‑device processing keeps data private and reduces latency.
  • High throughput with multiple sensors: Jetson Nano can process multiple streams simultaneously, thanks to its GPU and high‑speed I/O.
  • Rich software ecosystem: JetPack provides CUDA, cuDNN and TensorRT plus integration with OpenCV, PyTorch and TensorFlow, simplifying development.

Tested Setup + Benchmarks

Tested setup (so you can reproduce my results)

Tested on:

  • Jetson Nano: 4GB
  • JetPack: 4.6.x
  • Python: 3.6–3.8
  • OpenCV: JetPack system build (CUDA-enabled)

Real FPS Benchmarks (Jetson Nano)

Below are real FPS measurements you can reproduce on a standard Jetson Nano 4GB. These numbers are the fastest way to validate your setup and compare models fairly.

Input Source | Resolution | Model     | FPS   | Notes
MP4 file     | 1280×720   | GoogLeNet | 14–18 | Standard baseline
MP4 file     | 640×360    | GoogLeNet | 28–34 | Recommended for real-time
MP4 file     | 640×360    | ResNet-18 | 35–42 | Fastest inference option

Tip: Use Jetson Nano MAXN mode for consistent benchmarking by running sudo nvpmodel -m 0 and sudo jetson_clocks.
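To reproduce numbers like these, time your own loop rather than trusting feel. A minimal FPS helper (pure Python; `process_frame` is a stand-in for your per-frame inference call):

```python
import time

def measure_fps(process_frame, frames):
    """Return the average frames per second achieved while
    running process_frame() over every frame in the iterable."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else 0.0

# Dummy workload standing in for net.Classify() on 100 frames:
fps = measure_fps(lambda f: sum(f), [[1, 2, 3]] * 100)
print(f"{fps:.1f} FPS")
```

Run the same clip through each model and resolution you care about, and compare the averages against the table above.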


Jetson Nano video classification OpenCV Python

Setting up Jetson Nano for video classification

Before diving into code, ensure your device is prepared:

  1. Install JetPack: Use the NVIDIA SDK Manager to flash the latest supported JetPack 4.6.x image onto your SD card or eMMC. JetPack includes CUDA 10.2, cuDNN and TensorRT. Updating ensures compatibility with OpenCV and jetson‑inference libraries.
  2. Update packages: Open a terminal and run sudo apt update && sudo apt upgrade to install the latest security patches and drivers.
  3. Install dependencies: Clone NVIDIA’s jetson‑inference repository and build it from source. This provides the jetson.inference and jetson.utils Python bindings. Avoid installing OpenCV via pip; instead use the preinstalled system version to ensure CUDA support.
  4. Confirm Python version: Jetson Nano commonly runs JetPack 4.6.x, and your Python version depends on the JetPack/L4T image you flashed.
    • Use NVIDIA’s JetPack release notes as the source of truth, then match your dependencies accordingly.
    • JetPack 4.6: https://developer.nvidia.com/embedded/jetpack-sdk-46
    • JetPack 4.6.3: https://developer.nvidia.com/jetpack-sdk-463
  5. Connect a camera or prepare a video file: You can use a USB camera, a MIPI‑CSI camera or any MP4 video. For cameras, GStreamer pipelines provide reliable capture (see the FAQ below for examples).

In this tutorial, we’ll build a real-time wildlife video classifier using NVIDIA Jetson Inference and OpenCV in Python.
You’ll learn how to open a video file, convert frames into GPU-friendly memory, run GoogLeNet classification on every frame, and overlay the top class on the video when confidence is high.
This post walks you through a clean, production-ready pattern: video I/O → GPU conversion → deep learning inference → polished on-screen results.
By the end, you’ll have a copy-paste script that runs smoothly on Jetson, plus the knowledge to swap models, tweak thresholds, and adapt it for your own datasets.

Want a hands-on detection workflow too? Explore my YOLOv8 heatmaps tutorial that visualizes model attention: Generating heatmaps with YOLOv8

If you’re new to Jetson Nano projects, I also recommend checking out my related tutorials: YOLOv8 Object Detection with Jetson Nano and Image Classification with ResNet50. These will give you more context on building computer vision pipelines with pre-trained networks.

Here is a video for Jetson Nano Real Time Image Classification:

The link for the video: https://youtu.be/AgOdXB34zaA

You can find more Nvidia Jetson Nano tutorials here: https://eranfeit.net/how-to-classify-objects-using-jetson-nano-inference-and-opencv/


Building the Python Video Classification Pipeline Step-by-Step

Below is a high‑level outline of the Python script. The full code can be downloaded from the linked repository and adapted to your needs.

1. Import libraries and load the model

This step prepares the Python environment by importing the required libraries and loading the pre-trained GoogLeNet model from the Jetson Inference framework. The jetson_inference library provides optimized deep learning models that run efficiently on the Jetson Nano GPU using TensorRT acceleration.

Loading the model at the beginning ensures that it is ready to process frames as soon as the video stream starts. The GoogLeNet model used here is typically pre-trained on the ImageNet dataset, which contains 1,000 object categories. This allows the system to recognize common objects such as animals, vehicles, and everyday items without additional training.

By initializing the model once, you avoid repeated loading overhead during the video loop, which improves performance and keeps the frame rate stable — an important factor for real-time edge AI applications.

import cv2
import jetson.inference
import jetson.utils

# Load the pre‑trained classification model. GoogLeNet offers a good balance
# between accuracy and speed on Jetson Nano.
net = jetson.inference.imageNet("googlenet")

# Open a video file (or pass a device index for a camera)
cap = cv2.VideoCapture('/home/user/videos/wildlife.mp4')
# Note: these size properties apply to live cameras; frames read from a
# file keep their encoded resolution.
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

2. Convert frames for GPU processing

In this step, OpenCV’s VideoCapture is used to open an MP4 video file and prepare it for frame-by-frame processing. OpenCV acts as the bridge between the video source and the deep learning pipeline, allowing you to read frames in a loop.

Using a video file instead of a live camera makes the workflow reproducible and easier to debug. You can test performance, verify predictions, and measure FPS consistently using the same input video.

If the video fails to open, it often indicates missing codecs or an incorrect file path. On Jetson Nano, video decoding may rely on GStreamer pipelines, so ensuring proper codec support is essential for smooth playback.

Once the video is open, frames are read sequentially inside a loop. Each iteration retrieves a single frame that will be processed by the neural network. This frame-by-frame approach enables real-time analysis of video content.

Processing individual frames allows you to apply deep learning inference continuously, making it possible to classify objects throughout the video. This is the foundation of video analytics, where insights are derived from each frame rather than the video as a whole.

The loop also checks whether frames are successfully retrieved. When the video reaches its end, the loop exits gracefully, preventing errors and ensuring clean program termination.

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Convert BGR (OpenCV default) to RGBA
    frame_rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
    # Copy the frame into CUDA memory
    cuda_frame = jetson.utils.cudaFromNumpy(frame_rgba)
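For intuition, the BGR→RGBA step is just a channel reorder plus an opaque alpha plane. A NumPy sketch of what cv2.cvtColor(..., cv2.COLOR_BGR2RGBA) produces here:

```python
import numpy as np

def bgr_to_rgba(frame):
    """Reorder BGR channels to RGB and append an opaque alpha channel,
    mirroring cv2.COLOR_BGR2RGBA for 8-bit images."""
    rgb = frame[..., ::-1]  # reverse the channel axis: BGR -> RGB
    alpha = np.full(frame.shape[:2] + (1,), 255, dtype=frame.dtype)
    return np.concatenate([rgb, alpha], axis=-1)

pixel = np.array([[[10, 20, 30]]], dtype=np.uint8)  # one BGR pixel
print(bgr_to_rgba(pixel)[0, 0].tolist())  # → [30, 20, 10, 255]
```

In the real pipeline you keep cv2.cvtColor, which does this on optimized code paths; the sketch only shows why the CUDA side sees four channels.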

3. Classify and overlay predictions

Here, the GoogLeNet model analyzes the frame and predicts the most likely class label along with a confidence score. The model evaluates visual patterns such as shapes, textures, and colors to determine what object is present.

Because GoogLeNet is trained on ImageNet, it recognizes a wide variety of everyday objects. The confidence score indicates how certain the model is about its prediction, helping you decide whether to display or filter the result.

This step is the core of the tutorial — transforming raw video frames into meaningful information. It demonstrates how edge AI can interpret visual data in real time without relying on cloud services.

After classification, the predicted label and confidence score are drawn on the frame using OpenCV text rendering. This visual overlay allows users to see the AI’s decision directly on the video output.

Displaying FPS (frames per second) provides insight into system performance. Monitoring FPS helps you optimize resolution, frame skipping, and model selection to achieve smoother real-time results.

This step transforms the system from a backend inference engine into an interactive visual application. It makes the results understandable at a glance and demonstrates the practical value of AI at the edge.

The processed frame is displayed in a window using OpenCV, allowing you to view the classification results in real time. This creates a complete pipeline from video input to AI-enhanced output.

Real-time display is important for debugging and validation. By watching the output, you can verify whether the model is making correct predictions and whether the performance meets your expectations.

This step also highlights the end-to-end nature of the system — from video capture to GPU inference to visual feedback — all running locally on the Jetson Nano.

    # Run classification on the GPU
    class_id, confidence = net.Classify(cuda_frame)
    class_desc = net.GetClassDesc(class_id)
    # Only display the label if confidence > 0.4 (40%)
    if confidence > 0.4:
        text = f"{class_desc}: {confidence:.2f}"
        cv2.putText(frame, text, (30, 80),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0,
                    (255, 255, 255), 2)
    cv2.imshow('Classification', frame)
    # Press 'q' to quit
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
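If you also want the FPS readout discussed above, a small smoothed counter keeps the number from flickering frame to frame. This is a sketch, not part of the original script; draw its value with another cv2.putText call:

```python
import time

class FPSCounter:
    """Exponentially smoothed FPS estimate for an on-screen overlay."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha   # higher alpha = smoother, slower to react
        self.fps = 0.0
        self._last = None

    def tick(self):
        """Call once per frame; returns the current smoothed FPS."""
        now = time.perf_counter()
        if self._last is not None:
            inst = 1.0 / max(now - self._last, 1e-6)
            self.fps = (self.alpha * self.fps +
                        (1 - self.alpha) * inst) if self.fps else inst
        self._last = now
        return self.fps

# In the loop: fps = counter.tick(), then overlay f"FPS: {fps:.1f}".
```

Create one FPSCounter before the while loop and call tick() once per iteration, right after Classify().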

Optimizations and advanced tips

  • Resolution and frame skipping: Downscaling frames or processing every Nth frame (e.g., every third frame) increases FPS. For instance, you can skip frames by incrementing a counter and continuing the loop when frame_idx % 3 != 0.
  • Alternative models: Jetson Nano supports models like ResNet‑18, MobileNet‑v2, or Tiny YOLO. Swapping "googlenet" for another model name in imageNet() loads a different classifier. YOLOv5/YOLOv8 detectors require more compute, so expect lower FPS.
  • GStreamer pipelines: If cv2.VideoCapture fails to open your video, use a GStreamer pipeline tailored to your source. For example, to read an MP4 file on Jetson you can use: filesrc location=video.mp4 ! qtdemux ! h264parse ! avdec_h264 ! videoconvert ! appsink. For USB cameras, use v4l2src device=/dev/video0 ! videoconvert ! video/x-raw,format=BGR ! appsink and open it with cv2.CAP_GSTREAMER.
  • Recording output: To save the annotated video, create a cv2.VideoWriter with matching FPS and resolution and call write() on each frame.
  • Confidence threshold: Adjust the threshold between 0.4–0.6 to balance sensitivity and false positives. Display confidence alongside the label using f"{confidence:.2f}".
  • Running headless: When running the script without a display (e.g. over SSH), remove calls to cv2.imshow() and instead stream frames to a file or to an MJPEG server.
  • Thermal management: For sustained high FPS, add a heatsink and fan. Set maximum performance mode with sudo nvpmodel -m 0 and ensure good airflow.
  • Avoid out‑of‑memory errors: Use smaller input sizes (e.g. 224×224), close other applications, and choose lightweight models when memory is limited.
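The frame-skipping idea from the first bullet can be sketched like this (pure Python; the string label stands in for net.Classify() so the reuse pattern is visible):

```python
def skip_filter(frames, n=3):
    """Run inference only on every n-th frame; reuse the last label
    on skipped frames so the overlay stays stable."""
    last_label = None
    results = []
    for idx, frame in enumerate(frames):
        if idx % n == 0:
            last_label = f"label-for-{frame}"  # stand-in for net.Classify(frame)
        results.append(last_label)
    return results

print(skip_filter(["a", "b", "c", "d"], n=2))
# → ['label-for-a', 'label-for-a', 'label-for-c', 'label-for-c']
```

With n=3 on a 30 FPS clip, inference runs 10 times per second while the display still updates every frame.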

FAQ :

Q: What is Jetson Nano video classification with OpenCV and Python?
A: It is a workflow where you read an MP4 video frame-by-frame with OpenCV, run GPU-accelerated classification on each frame using Jetson Inference, and overlay the predicted label on the output video.

Q: Why use Jetson Inference instead of a normal PyTorch/TensorFlow model?
A: Jetson Inference is optimized for Jetson devices and makes it easy to run TensorRT-accelerated inference with simple Python bindings.

Q: What is the difference between classifying a video and detecting objects in a video?
A: Classification outputs one label for the whole frame, while detection outputs bounding boxes + labels for multiple objects. Classification is usually faster on Jetson Nano.

Q: Why do we convert frames from BGR to RGBA?
A: OpenCV reads frames in BGR, but Jetson’s CUDA pipeline expects RGBA for efficient GPU processing.

Q: My MP4 file doesn’t open with cv2.VideoCapture. What should I do?
A: On Jetson, MP4 decoding can be more reliable through a GStreamer pipeline. Use a filesrc → demux → decode → videoconvert → appsink pipeline.

Q: How can I increase FPS on Jetson Nano?
A: Reduce resolution (for example 640×360), process every Nth frame (frame skipping), and keep the confidence threshold reasonable so overlays aren’t too heavy.
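Building the file-decode pipeline from the GStreamer FAQ answer as a string keeps it reusable; this sketch assumes an H.264-encoded MP4, as in the answer above:

```python
def file_pipeline(path: str) -> str:
    """Build the GStreamer string for decoding an H.264 MP4 with OpenCV."""
    return (
        f"filesrc location={path} ! qtdemux ! h264parse ! "
        "avdec_h264 ! videoconvert ! appsink"
    )

print(file_pipeline("video.mp4"))
```

Open it with cv2.VideoCapture(file_pipeline("video.mp4"), cv2.CAP_GSTREAMER); if that fails, check that your OpenCV build lists GStreamer support in cv2.getBuildInformation().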


Jetson Nano

Conclusion

In this guide, you learned how to build a real-time video classifier on the Jetson Nano using OpenCV and Python. We explored why the Jetson Nano is ideal for edge AI applications, thanks to its powerful yet compact hardware and rich software ecosystem. The step-by-step instructions covered everything from installing dependencies and setting up the environment to processing video frames and optimizing performance. We also discussed common troubleshooting tips and best practices for improving frame rates. By following these techniques, you can confidently develop your own computer-vision projects on the Jetson Nano and adapt them to a variety of real-world use cases.

If you’re comparing classic CV vs. deep learning, see this ResNet50 classification walkthrough: Alien vs Predator Image Classification with ResNet50


Connect :

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran

Eran Feit