Last Updated on 26/02/2026 by Eran Feit
Introduction
Mastering Jetson Nano Video Classification Python allows you to deploy powerful AI models directly on edge devices. In this tutorial, we build a high-performance classification pipeline using OpenCV and NVIDIA’s Jetson Inference library. You will learn how to process MP4 video files, convert frames for CUDA acceleration, and use the GoogLeNet (Inception v1) model to overlay real-time predictions. This workflow is essential for developers looking to move beyond simple image classification into real-time video analytics on the Maxwell GPU architecture.
If you want the single-image version, see this link.
If you want the live webcam / camera-stream version, see this link.
Understanding GoogLeNet (Inception v1) for Jetson Nano Video Classification
GoogLeNet is not a “service”: it’s a deep convolutional neural network architecture, also known as Inception v1, designed for efficient image classification.
In this tutorial it works as a frame-by-frame classifier: OpenCV reads an MP4 frame → GoogLeNet predicts one label + confidence for the whole frame → you overlay that result on the video.
What is GoogLeNet (Inception v1), and why does this tutorial use it?
GoogLeNet (also called Inception v1) is a convolutional neural network built to deliver strong classification accuracy while staying relatively efficient.
It became widely known from the “Going Deeper with Convolutions” paper and was used in large-scale image classification benchmarks such as ImageNet.
In this tutorial, GoogLeNet is used as a video-frame classifier.
That means: for every MP4 frame that OpenCV reads, the model predicts a single best class label (plus confidence), such as “dog”, “car”, or “soccer ball”.
This is different from object detection, where the model outputs multiple objects with bounding boxes.
Important detail: the GoogLeNet model used by jetson-inference is typically pre-trained on ImageNet (1000 categories).
So the labels you see come from that ImageNet label set unless you deploy a custom-trained model.
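To make “a single best label plus confidence” concrete, here is a minimal, framework-free sketch. The label names and scores below are made up for illustration; on the Jetson, `net.Classify()` returns the equivalent (class ID, confidence) pair for you on the GPU.

```python
# Hypothetical top-1 selection over ImageNet-style class scores.
# On the Jetson, net.Classify() performs this step internally.

def top1(scores, labels):
    """Return the single best (label, confidence) pair for one frame."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]

# Illustrative scores for three classes (not real model output).
labels = ["dog", "car", "soccer ball"]
scores = [0.72, 0.08, 0.20]
print(top1(scores, labels))  # -> ('dog', 0.72)
```

Note there is exactly one answer per frame, no matter how many objects are visible — that is the key difference from detection.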
Jetson Nano hardware overview and why it matters
Compact yet powerful AI computer
The Jetson Nano is a small 69×45 mm system‑on‑module (SoM) that packs serious compute. Key features relevant to computer vision include:
- GPU: a 128‑core NVIDIA Maxwell™ GPU providing up to 472 GFLOPS of FP16 performance; this hardware acceleration enables neural networks to run in real time while keeping power consumption between 5–10 W.
- CPU: quad‑core ARM A57 64‑bit processor for handling preprocessing and system tasks.
- Memory: 4 GB LPDDR4 memory (25.6 GB/s bandwidth) plus 16 GB of onboard eMMC storage for models and data.
- I/O interfaces: support for MIPI‑CSI cameras, HDMI/DisplayPort, USB 3.0/2.0, Gigabit Ethernet and GPIO, allowing multiple high‑resolution sensors and peripherals.
- Software stack: Jetson Nano runs the NVIDIA JetPack SDK, which includes Linux, CUDA, cuDNN and TensorRT libraries for deep learning and computer vision. Popular frameworks such as TensorFlow, PyTorch and OpenCV are supported, and pre‑trained models like ResNet‑50, SSD MobileNet‑V2 and Tiny YOLO v3 can be deployed.

Why Jetson Nano suits computer‑vision tasks
- Real‑time inference: GPU acceleration allows classification models like GoogLeNet to process video frames at 10–20 FPS. Lowering resolution or using lighter networks improves FPS further.
- Edge deployment: Low power consumption (5–10 W) and small footprint enable battery‑powered or fanless deployments. On‑device processing keeps data private and reduces latency.
- High throughput with multiple sensors: Jetson Nano can process multiple streams simultaneously, thanks to its GPU and high‑speed I/O.
- Rich software ecosystem: JetPack provides CUDA, cuDNN and TensorRT plus integration with OpenCV, PyTorch and TensorFlow, simplifying development.
Tested Setup + Benchmarks
Tested setup (so you can reproduce my results)
Tested on:
- Jetson Nano: 4GB
- JetPack: 4.6.x
- Python: 3.6–3.8
- OpenCV: JetPack system build (CUDA-enabled)
Real FPS Benchmarks (Jetson Nano)
Below are real FPS measurements you can reproduce on a standard Jetson Nano 4GB. These numbers are the fastest way to validate your setup and compare models fairly.
| Input Source | Resolution | Model | FPS | Notes |
|---|---|---|---|---|
| MP4 file | 1280×720 | GoogLeNet | 14-18 | Standard baseline |
| MP4 file | 640×360 | GoogLeNet | 28-34 | Recommended for real-time |
| MP4 file | 640×360 | ResNet-18 | 35-42 | Fastest inference option |
Tip: Use Jetson Nano MAXN mode for consistent benchmarking by running `sudo nvpmodel -m 0` and `sudo jetson_clocks`.
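If you want to reproduce the FPS numbers above rather than take them on trust, a small sliding-window timer is enough. This helper is a sketch of my measurement approach, not part of jetson-inference; call `tick()` once per processed frame and read `fps()`:

```python
import time

class FPSMeter:
    """Estimate FPS over a sliding window of frame timestamps."""

    def __init__(self, window=30):
        self.window = window
        self.stamps = []

    def tick(self, now=None):
        # Record one frame; `now` can be injected for testing.
        self.stamps.append(time.monotonic() if now is None else now)
        if len(self.stamps) > self.window:
            self.stamps.pop(0)

    def fps(self):
        # Need at least two timestamps to measure an interval.
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0
```

In the video loop, call `meter.tick()` right after classification and overlay `meter.fps()` next to the class label to compare models fairly.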

Setting up Jetson Nano for video classification
Before diving into code, ensure your device is prepared:
- Install JetPack: Use the NVIDIA SDK Manager to flash the latest supported JetPack 4.6.x image onto your SD card or eMMC. JetPack includes CUDA 10.2, cuDNN and TensorRT. Updating ensures compatibility with OpenCV and jetson‑inference libraries.
- Update packages: Open a terminal and run `sudo apt update && sudo apt upgrade` to install the latest security patches and drivers.
- Install dependencies: Clone NVIDIA’s `jetson-inference` repository and build it from source. This provides the `jetson.inference` and `jetson.utils` Python bindings. Avoid installing OpenCV via pip; instead use the preinstalled system version to ensure CUDA support.
- Confirm Python version: Jetson Nano commonly runs JetPack 4.6.x, and your Python version depends on the JetPack/L4T image you flashed. Use NVIDIA’s JetPack release notes as the source of truth, then match your dependencies accordingly.
  - JetPack 4.6: https://developer.nvidia.com/embedded/jetpack-sdk-46
  - JetPack 4.6.3: https://developer.nvidia.com/jetpack-sdk-463
- Connect a camera or prepare a video file: You can use a USB camera, an MIPI‑CSI camera or any MP4 video. For cameras, GStreamer pipelines provide reliable capture (see the FAQ below for examples).
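Because GStreamer pipeline strings are easy to mistype, it can help to build them with a tiny helper. This is only a convenience sketch around the pipelines discussed in the FAQ; the element names (`qtdemux`, `avdec_h264`, `v4l2src`) match those examples:

```python
def mp4_pipeline(path):
    """GStreamer pipeline string for decoding an H.264 MP4 on Jetson."""
    return (
        f"filesrc location={path} ! qtdemux ! h264parse ! "
        "avdec_h264 ! videoconvert ! appsink"
    )

def usb_camera_pipeline(device="/dev/video0"):
    """GStreamer pipeline string for a V4L2 USB camera."""
    return (
        f"v4l2src device={device} ! videoconvert ! "
        "video/x-raw,format=BGR ! appsink"
    )

# On the Jetson you would open these with:
#   cap = cv2.VideoCapture(mp4_pipeline("wildlife.mp4"), cv2.CAP_GSTREAMER)
```

Keeping the strings in one place makes it easy to swap sources without touching the rest of the classification loop.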
In this tutorial, we’ll build a real-time wildlife video classifier using NVIDIA Jetson Inference and OpenCV in Python.
You’ll learn how to open a video file, convert frames into GPU-friendly memory, run GoogLeNet classification on every frame, and overlay the top class on the video when confidence is high.
This post fully answers the title by walking you through a clean, production-ready pattern: video I/O → GPU conversion → deep learning inference → polished on-screen results.
By the end, you’ll have a copy-paste script that runs smoothly on Jetson, plus the knowledge to swap models, tweak thresholds, and adapt it for your own datasets.
Want a hands-on detection workflow too? Explore my YOLOv8 heatmaps tutorial that visualizes model attention: Generating heatmaps with YOLOv8
You can find the full code here : https://ko-fi.com/s/7a72f61abe
If you’re new to Jetson Nano projects, I also recommend checking out my related tutorials: YOLOv8 Object Detection with Jetson Nano and Image Classification with ResNet50. These will give you more context on building computer vision pipelines with pre-trained networks.
Here is a video for Jetson Nano Real Time Image Classification:
The link for the video : https://youtu.be/AgOdXB34zaA
You can find more Nvidia Jetson Nano tutorials here : https://eranfeit.net/how-to-classify-objects-using-jetson-nano-inference-and-opencv/
Building the Python Video Classification Pipeline Step-by-Step
Below is a high‑level outline of the Python script. The full code can be downloaded from the linked repository and adapted to your needs.
1. Import libraries and load the model
This step prepares the Python environment by importing the required libraries and loading the pre-trained GoogLeNet model from the Jetson Inference framework. The jetson_inference library provides optimized deep learning models that run efficiently on the Jetson Nano GPU using TensorRT acceleration.
Loading the model at the beginning ensures that it is ready to process frames as soon as the video stream starts. The GoogLeNet model used here is typically pre-trained on the ImageNet dataset, which contains 1,000 object categories. This allows the system to recognize common objects such as animals, vehicles, and everyday items without additional training.
By initializing the model once, you avoid repeated loading overhead during the video loop, which improves performance and keeps the frame rate stable — an important factor for real-time edge AI applications.
```python
import cv2
import jetson.inference
import jetson.utils

# Load the pre-trained classification model. GoogLeNet offers a good balance
# between accuracy and speed on Jetson Nano.
net = jetson.inference.imageNet("googlenet")

# Open a video file (or set a device index for a camera)
cap = cv2.VideoCapture('/home/user/videos/wildlife.mp4')
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
```
2. Convert frames for GPU processing
In this step, OpenCV’s VideoCapture is used to open an MP4 video file and prepare it for frame-by-frame processing. OpenCV acts as the bridge between the video source and the deep learning pipeline, allowing you to read frames in a loop.
Using a video file instead of a live camera makes the workflow reproducible and easier to debug. You can test performance, verify predictions, and measure FPS consistently using the same input video.
If the video fails to open, it often indicates missing codecs or an incorrect file path. On Jetson Nano, video decoding may rely on GStreamer pipelines, so ensuring proper codec support is essential for smooth playback.
Once the video is open, frames are read sequentially inside a loop. Each iteration retrieves a single frame that will be processed by the neural network. This frame-by-frame approach enables real-time analysis of video content.
Processing individual frames allows you to apply deep learning inference continuously, making it possible to classify objects throughout the video. This is the foundation of video analytics, where insights are derived from each frame rather than the video as a whole.
The loop also checks whether frames are successfully retrieved. When the video reaches its end, the loop exits gracefully, preventing errors and ensuring clean program termination.
```python
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert BGR (OpenCV default) to RGBA
    frame_rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)

    # Copy frame into CUDA memory
    cuda_frame = jetson.utils.cudaFromNumpy(frame_rgba)
```
3. Classify and overlay predictions
Here, the GoogLeNet model analyzes the frame and predicts the most likely class label along with a confidence score. The model evaluates visual patterns such as shapes, textures, and colors to determine what object is present.
Because GoogLeNet is trained on ImageNet, it recognizes a wide variety of everyday objects. The confidence score indicates how certain the model is about its prediction, helping you decide whether to display or filter the result.
This step is the core of the tutorial — transforming raw video frames into meaningful information. It demonstrates how edge AI can interpret visual data in real time without relying on cloud services.
After classification, the predicted label and confidence score are drawn on the frame using OpenCV text rendering. This visual overlay allows users to see the AI’s decision directly on the video output.
Displaying FPS (frames per second) provides insight into system performance. Monitoring FPS helps you optimize resolution, frame skipping, and model selection to achieve smoother real-time results.
This step transforms the system from a backend inference engine into an interactive visual application. It makes the results understandable at a glance and demonstrates the practical value of AI at the edge.
The processed frame is displayed in a window using OpenCV, allowing you to view the classification results in real time. This creates a complete pipeline from video input to AI-enhanced output.
Real-time display is important for debugging and validation. By watching the output, you can verify whether the model is making correct predictions and whether the performance meets your expectations.
This step also highlights the end-to-end nature of the system — from video capture to GPU inference to visual feedback — all running locally on the Jetson Nano.
```python
    # Run classification on the GPU
    class_id, confidence = net.Classify(cuda_frame)
    class_desc = net.GetClassDesc(class_id)

    # Only display the label if confidence > 0.4 (40%)
    if confidence > 0.4:
        text = f"{class_desc}: {confidence:.2f}"
        cv2.putText(frame, text, (30, 80), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (255, 255, 255), 2)

    cv2.imshow('Classification', frame)

    # Press 'q' to quit
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
Optimizations and advanced tips
- Resolution and frame skipping: Downscaling frames or processing every Nth frame (e.g., every third frame) increases FPS. For instance, you can skip frames by incrementing a counter and continuing the loop when `frame_idx % 3 != 0`.
- Alternative models: Jetson Nano supports models like ResNet‑18, MobileNet‑v2, or Tiny YOLO. Swapping `"googlenet"` for another model name in `imageNet()` loads a different classifier. YOLOv5/YOLOv8 detectors require more compute, so expect lower FPS.
- GStreamer pipelines: If `cv2.VideoCapture` fails to open your video, use a GStreamer pipeline tailored to your source. For example, to read an MP4 file on Jetson you can use `filesrc location=video.mp4 ! qtdemux ! h264parse ! avdec_h264 ! videoconvert ! appsink`. For USB cameras, use `v4l2src device=/dev/video0 ! videoconvert ! video/x-raw,format=BGR ! appsink` and open it with `cv2.CAP_GSTREAMER`.
- Recording output: To save the annotated video, create a `cv2.VideoWriter` with matching FPS and resolution and call `write()` on each frame.
- Confidence threshold: Adjust the threshold between 0.4–0.6 to balance sensitivity and false positives. Display confidence alongside the label using `f"{confidence:.2f}"`.
- Running headless: When running the script without a display (e.g. over SSH), remove calls to `cv2.imshow()` and instead stream frames to a file or to an MJPEG server.
- Thermal management: For sustained high FPS, add a heatsink and fan. Set maximum performance mode with `sudo nvpmodel -m 0` and ensure good airflow.
- Avoid out‑of‑memory errors: Use smaller input sizes (e.g. 224×224), close other applications, and choose lightweight models when memory is limited.
FAQ :
Q: What is Jetson Nano video classification with OpenCV and Python?
A: It is a workflow where you read an MP4 video frame-by-frame with OpenCV, run GPU-accelerated classification on each frame using Jetson Inference, and overlay the predicted label on the output video.
Q: Why use Jetson Inference instead of a normal PyTorch/TensorFlow model?
A: Jetson Inference is optimized for Jetson devices and makes it easy to run TensorRT-accelerated inference with simple Python bindings.
Q: What is the difference between classifying a video and detecting objects in a video?
A: Classification outputs one label for the whole frame, while detection outputs bounding boxes + labels for multiple objects. Classification is usually faster on Jetson Nano.
Q: Why do we convert frames from BGR to RGBA?
A: OpenCV reads frames in BGR, but Jetson’s CUDA pipeline expects RGBA for efficient GPU processing.
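For intuition, the same conversion can be reproduced with plain NumPy channel operations. This sketch mirrors what `cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)` does, reversing the channel order and appending an opaque alpha plane:

```python
import numpy as np

def bgr_to_rgba(frame):
    """Reverse BGR -> RGB and append a fully opaque alpha channel."""
    rgb = frame[..., ::-1]
    alpha = np.full(frame.shape[:2] + (1,), 255, dtype=frame.dtype)
    return np.concatenate([rgb, alpha], axis=-1)

# A single blue-ish BGR pixel becomes RGBA with alpha 255.
pixel = np.array([[[200, 30, 10]]], dtype=np.uint8)  # B, G, R
print(bgr_to_rgba(pixel)[0, 0])  # -> [ 10  30 200 255]
```

In the real pipeline you should still use `cv2.cvtColor`, which is much faster; this is only to show what the extra fourth channel contains before `cudaFromNumpy` copies it to the GPU.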
Q: My MP4 file doesn’t open with cv2.VideoCapture. What should I do?
A: On Jetson, MP4 decoding can be more reliable through a GStreamer pipeline. Use a filesrc → demux → decode → videoconvert → appsink pipeline.
Q: How can I increase FPS on Jetson Nano?
A: Reduce resolution (for example 640×360), process every Nth frame (frame skipping), and keep the confidence threshold reasonable so overlays aren’t too heavy.

Conclusion
In this guide, you learned how to build a real-time video classifier on the Jetson Nano using OpenCV and Python. We explored why the Jetson Nano is ideal for edge AI applications, thanks to its powerful yet compact hardware and rich software ecosystem. The step-by-step instructions covered everything from installing dependencies and setting up the environment to processing video frames and optimizing performance. We also discussed common troubleshooting tips and best practices for improving frame rates. By following these techniques, you can confidently develop your own computer-vision projects on the Jetson Nano and adapt them to a variety of real-world use cases.
If you’re comparing classic CV vs. deep learning, see this ResNet50 classification walkthrough: Alien vs Predator Image Classification with ResNet50
You can find the full code here : https://ko-fi.com/s/7a72f61abe
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran
