Real-Time Image Classification with Jetson Nano and OpenCV — Complete Guide

Last Updated on 23/10/2025 by Eran Feit

Introduction to real-time image classification with Jetson Nano and Python

In this tutorial, we will build a real-time image classifier on the Jetson Nano using the Jetson Inference library and OpenCV in Python.
The code connects to a webcam, captures live video, classifies each frame using a deep learning model (GoogLeNet), and overlays the recognized object name directly onto the video feed.

This post shows you how to go from raw camera input to a live prediction on every frame.

This guide is focused on image classification with Jetson Nano and OpenCV.
By following along, you will learn the basics of combining OpenCV with NVIDIA Jetson’s jetson.inference library, which is designed for high-performance AI tasks on edge devices.

We’ll break the code into three easy-to-understand sections:

Setting up the webcam and loading the Jetson neural network model.
Processing frames and converting them for GPU classification.
Running classification, overlaying predictions, and displaying results live.

If you’re curious about deep learning models in practice, explore my tutorial on Vehicle classification with Vision Transformers.

You can find more computer vision tutorials in my blog posts page here : https://eranfeit.net/blog/

👉 You can find the code here : https://ko-fi.com/s/7a72f61abe

The link for the video : https://youtu.be/S3i7yhhw11E

You can find more Nvidia Jetson Nano tutorials here : https://eranfeit.net/how-to-classify-objects-using-jetson-nano-inference-and-opencv/

FAQ

Q: What hardware is this code designed for?
A: It’s designed for NVIDIA Jetson devices, optimized for real-time AI tasks.

Q: Can I run this code on a regular PC?
A: No, the jetson.inference library is specific to Jetson devices.

Q: What model is used here?
A: We use GoogLeNet, a model pre-trained on the ImageNet dataset.

Q: Why convert frames to RGBA?
A: OpenCV delivers frames in BGR order, while RGBA is a layout that jetson.inference accepts for CUDA GPU processing.

Q: What does cudaFromNumpy do?
A: It transfers the frame into CUDA memory for GPU-based classification.

Q: How do I adjust confidence threshold?
A: Change the 0.4 value in the if confidence > 0.4 condition.

Q: Can it detect multiple objects?
A: GoogLeNet classifies the dominant object in the frame, not multiple objects simultaneously.

Q: How do I quit the program?
A: Press the q key while the video window is active.

Q: Can I use another model instead of GoogLeNet?
A: Yes, you can replace "googlenet" with other Jetson-supported models like resnet-18.

Q: Does it work in real-time?
A: Yes, Jetson devices are optimized for high-performance real-time inference.

Setting up the webcam and loading the model

In this first part, we initialize the webcam, configure its resolution, and load a pre-trained GoogLeNet model from Jetson Inference.

### Import OpenCV for video capture and display
import cv2

### Import Jetson Inference and Jetson Utils for deep learning tasks
import jetson.inference
import jetson.utils

### Initialize webcam capture (device 0)
cap = cv2.VideoCapture(0)

### Set resolution of the webcam to 1280x720
cap.set(3,1280)
cap.set(4,720)

### Load the GoogLeNet pre-trained model from Jetson Inference
net = jetson.inference.imageNet("googlenet")

Summary of this part

Here we prepare the tools: OpenCV handles video input, while Jetson Inference loads a powerful pre-trained model.
The model is ready to classify images into the 1,000 object categories of the ImageNet (ILSVRC) dataset.
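Webcams do not always honor the resolution you request, so it can help to read the values back after setting them. The sketch below uses a hypothetical helper, request_resolution, which is not part of OpenCV or of the original code. It assumes cap is an opened cv2.VideoCapture and relies only on its set and get methods. The IDs 3 and 4 are the same values as cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT.

```python
# Property IDs used by cap.set(3, ...) and cap.set(4, ...) in the tutorial.
# OpenCV also exposes them as cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT.
FRAME_WIDTH = 3
FRAME_HEIGHT = 4


def request_resolution(cap, width, height):
    """Ask the camera for a resolution and return what it reports afterwards.

    Hypothetical helper: `cap` is assumed to behave like cv2.VideoCapture.
    """
    cap.set(FRAME_WIDTH, width)
    cap.set(FRAME_HEIGHT, height)
    # cap.get returns floats, so convert to whole pixels.
    return int(cap.get(FRAME_WIDTH)), int(cap.get(FRAME_HEIGHT))
```

On the device you could call request_resolution(cap, 1280, 720) right after creating the capture and print the result to see which size the camera actually selected.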

Processing frames and converting them for GPU classification

In this second part, we continuously read frames from the webcam, convert them into a format that Jetson can use (CUDA memory), and prepare them for classification.

### Loop until the webcam is closed
while cap.isOpened():

    ### Capture the current frame from the webcam
    re, img = cap.read()

    ### Stop if the camera did not return a frame
    if not re:
        break

    ### Convert the frame from BGR (OpenCV) to RGBA (used by Jetson Inference)
    frame_rgba = cv2.cvtColor(img, cv2.COLOR_BGR2RGBA)

    ### Convert the frame into CUDA format for GPU processing
    cude_frame = jetson.utils.cudaFromNumpy(frame_rgba)

Summary of this part

This stage ensures the video frames are compatible with GPU acceleration.
By moving from BGR to RGBA and then into CUDA memory, we unlock real-time deep learning classification.
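To make the color conversion concrete, here is a pure-Python illustration of what cv2.COLOR_BGR2RGBA does to each pixel of an 8-bit frame: it reverses the channel order and appends a fully opaque alpha value. The helper names are hypothetical and exist only for this example. In the real loop, cv2.cvtColor performs the same reordering on the whole NumPy array at once.

```python
def bgr_to_rgba(pixel, alpha=255):
    """Reorder one (B, G, R) pixel to (R, G, B, A) with an opaque alpha."""
    blue, green, red = pixel
    return (red, green, blue, alpha)


def convert_frame(frame):
    """Apply the reordering to a frame given as rows of (B, G, R) tuples."""
    return [[bgr_to_rgba(pixel) for pixel in row] for row in frame]


# A 1x2 "frame": one pure-blue pixel and one pure-red pixel, in BGR order.
tiny_frame = [[(255, 0, 0), (0, 0, 255)]]
print(convert_frame(tiny_frame))
# prints [[(0, 0, 255, 255), (255, 0, 0, 255)]]
```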

💡 Learn how I used another framework in my YOLOv8 object detection tutorial.

Running classification, overlaying predictions, and displaying results

In this last part, we run the classification on each frame, retrieve the predicted label, check the confidence score, and overlay the results on the live video feed.

    ### Run classification on the CUDA frame
    class_id, confidence = net.Classify(cude_frame)

    ### Get the description of the predicted class
    class_desc = net.GetClassDesc(class_id)

    ### Display prediction if confidence is above threshold
    if confidence > 0.4:
        cv2.putText(img, class_desc, (30,80), cv2.FONT_HERSHEY_COMPLEX, 1, (255,0,0), 3)

    ### Show the live video with overlaid text
    cv2.imshow('img', img)
    cv2.moveWindow('img', 0, 0)

    ### Break loop if 'q' key is pressed
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

### Release the camera and close OpenCV windows
cap.release()
cv2.destroyAllWindows()

Summary of this part

Now the system is complete: each frame is classified, the predicted label is drawn onto the video, and results are updated live.
You can instantly recognize objects from your webcam feed, thanks to the efficiency of Jetson hardware.
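The threshold check and the label text can be pulled into one small function, which makes the 0.4 cutoff easier to tune and lets you show the confidence next to the class name. This is a minimal sketch, not part of the original code: overlay_text is a hypothetical helper, and it assumes the confidence returned by net.Classify is a value between 0 and 1, as the 0.4 threshold in the tutorial suggests.

```python
def overlay_text(class_desc, confidence, threshold=0.4):
    """Return the text to draw, or None when the prediction is too uncertain.

    Hypothetical helper; mirrors the tutorial's `if confidence > 0.4` check.
    """
    if confidence <= threshold:
        return None
    # Show the confidence as a whole-number percentage next to the label.
    return f"{class_desc} ({confidence * 100:.0f}%)"


print(overlay_text("coffee mug", 0.87))  # prints coffee mug (87%)
print(overlay_text("coffee mug", 0.25))  # prints None
```

Inside the loop you would call text = overlay_text(class_desc, confidence) and pass text to cv2.putText only when it is not None.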

📊 For a real-world segmentation example, check out Image segmentation with UnetR.

Conclusion

We have built a real-time image classification system using Jetson Inference and OpenCV.
The workflow connects the webcam, processes each frame on the GPU, runs deep learning classification, and overlays predictions live.

This project is a practical example of edge AI, showcasing how small, powerful devices like the NVIDIA Jetson can perform advanced computer vision tasks in real time.
With slight modifications, you can swap models, adjust thresholds, or integrate additional AI pipelines for more complex applications.
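One way to make those modifications without editing the script each time is to read the model name and threshold from the command line. The sketch below uses Python's standard argparse module. The flag names (--network, --threshold, --camera) are assumptions chosen for this example and are not defined by the original code or by Jetson Inference.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for the classification script."""
    parser = argparse.ArgumentParser(
        description="Live image classification on Jetson")
    # Model name that would be passed to jetson.inference.imageNet(...).
    parser.add_argument("--network", default="googlenet")
    # Minimum confidence required before the label is drawn.
    parser.add_argument("--threshold", type=float, default=0.4)
    # Camera index that would be passed to cv2.VideoCapture(...).
    parser.add_argument("--camera", type=int, default=0)
    return parser.parse_args(argv)


args = parse_args(["--network", "resnet-18", "--threshold", "0.6"])
print(args.network, args.threshold, args.camera)  # prints resnet-18 0.6 0
```

In the script you would then load the model with jetson.inference.imageNet(args.network) and compare the confidence against args.threshold instead of the fixed 0.4.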

Live Image Classification on Jetson with OpenCV and GoogLeNet

This section contains a single, copy-paste-ready script.
It opens the default camera, converts each frame to a CUDA image, runs GoogLeNet classification, and draws the class name onto the frame when confidence is above 0.40.

# ### Import OpenCV for video capture and on-screen drawing.
import cv2
# ### Import the Jetson inference module that provides pretrained classification networks like imageNet.
import jetson.inference
# ### Import Jetson utilities for GPU-friendly image conversions (NumPy <-> CUDA).
import jetson.utils

# ### Open the default camera (index 0) for live capture.
cap = cv2.VideoCapture(0)
# ### Set the camera capture width to 1280 pixels for a 720p layout.
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
# ### Set the camera capture height to 720 pixels.
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

# ### Load the pretrained GoogLeNet classifier via jetson.inference.imageNet.
net = jetson.inference.imageNet("googlenet")

# ### Process frames in a loop while the camera is available.
while cap.isOpened():

    # ### Read a frame from the camera as a NumPy array (BGR color order).
    re, img = cap.read()
    # ### Stop if the camera did not deliver a frame (img would be None).
    if not re:
        break

    # ### Convert the BGR frame to RGBA for Jetson utilities and CUDA compatibility.
    frame_rgba = cv2.cvtColor(img, cv2.COLOR_BGR2RGBA)
    # ### Move the RGBA NumPy array into GPU memory as a CUDA image for fast inference.
    cuda_frame = jetson.utils.cudaFromNumpy(frame_rgba)

    # ### Run classification on the CUDA frame and obtain top class ID and confidence score.
    class_id, confidence = net.Classify(cuda_frame)

    # ### Translate the predicted class ID into a human-readable label (e.g., "coffee mug").
    class_desc = net.GetClassDesc(class_id)

    # ### If the prediction is confident enough (greater than 0.40), draw the label on the image.
    if confidence > 0.4:
        cv2.putText(img, class_desc, (30, 80), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 0, 0), 3)

    # ### Show the current frame in a window titled 'img'.
    cv2.imshow('img', img)
    # ### Position the window at the top-left corner of the screen for convenience.
    cv2.moveWindow('img', 0, 0)

    # ### Exit the loop when the user presses the 'q' key.
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

# ### Release the camera resource gracefully.
cap.release()
# ### Close any OpenCV display windows that may be open.
cv2.destroyAllWindows()

You can find the code here : https://ko-fi.com/s/7a72f61abe

The script captures frames from the webcam, prepares an RGBA CUDA image, classifies it with GoogLeNet, and overlays a readable label when the confidence threshold is reached.
This minimal loop is an ideal jumping-off point for adding FPS counters, multi-label overlays, or custom thresholds and filters on Jetson devices.
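As one example of such an extension, here is a minimal sketch of an FPS counter. It is not part of the original script: FpsCounter is a hypothetical class, and the exponential smoothing factor of 0.9 is an arbitrary choice to keep the displayed number from jumping around.

```python
import time


class FpsCounter:
    """Exponentially smoothed frames-per-second estimate (hypothetical helper)."""

    def __init__(self, smoothing=0.9, clock=time.perf_counter):
        self.smoothing = smoothing
        self.clock = clock      # injectable so the counter can be tested
        self.last = None        # timestamp of the previous frame
        self.fps = 0.0

    def tick(self):
        """Call once per processed frame; returns the current FPS estimate."""
        now = self.clock()
        if self.last is not None:
            elapsed = now - self.last
            if elapsed > 0:
                instant = 1.0 / elapsed
                if self.fps == 0.0:
                    # First measurement: start from the instantaneous value.
                    self.fps = instant
                else:
                    self.fps = (self.smoothing * self.fps
                                + (1 - self.smoothing) * instant)
        self.last = now
        return self.fps
```

You would create the counter once before the loop, call fps = counter.tick() after each classification, and draw f"{fps:.1f} FPS" with a second cv2.putText call.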

Connect :

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran