...

FasterViT Image Classification Using Custom Dataset | Star Wars dataset

FasterViT image classification

Last Updated on 02/01/2026 by Eran Feit

🧠 Introduction — FasterViT Image Classification Using Custom Dataset

FasterViT image classification using custom dataset represents a modern, efficient approach to training deep learning models that can recognize and categorize images from your own tailored collection of visual data. In a world where off-the-shelf datasets often don’t match specific application needs, applying models like FasterViT to a custom dataset — such as a curated set of Star Wars character images — allows developers to build computer vision systems that reflect real-world use cases and unique classification requirements. This customization is particularly powerful for niche tasks that require specialized image recognition beyond generic object categories, making FasterViT an exciting choice for advanced computer vision practitioners.

At its core, FasterViT combines the strengths of traditional convolutional neural networks (CNNs), which excel at capturing detailed local features, with the global context modeling of vision transformers. This hybrid architecture enables the model to learn both fine-grained texture patterns and broad relational information across an image, leading to robust feature representations. When applied to a custom dataset, better feature extraction and classification accuracy can be achieved compared to using either a pure CNN or a pure transformer model alone.

Training FasterViT on a custom dataset introduces challenges and opportunities. Developers must carefully prepare the data — organizing training, validation, and testing splits, and ensuring proper image preprocessing — so the model can effectively generalize from limited or imbalanced samples. With proper dataset preparation and training strategies, custom FasterViT models can outperform many conventional deep learning classifiers, especially in tasks that require distinguishing fine differences between similar image classes.

Finally, integrating FasterViT into real-world applications demonstrates its practical value. Whether building a character recognizer from a Star Wars image set, designing a wildlife species classifier, or developing a custom industrial defect detector, FasterViT’s ability to leverage custom datasets makes it a highly adaptable tool in the computer vision ecosystem. As deep learning continues to evolve, models like FasterViT that blend efficiency with performance are key for developers who need both speed and accuracy in specialized image classification tasks.


🌟 What Is FasterViT Image Classification and Why Custom Data Matters?

FasterViT image classification is a deep learning approach that blends convolutional neural networks with vision transformer elements to effectively classify images. Unlike traditional models that focus purely on either local patterns (CNNs) or global relationships (transformers), FasterViT uses a hybrid architecture that takes advantage of the best of both worlds. This makes it especially suitable for image classification tasks where details matter and large-scale context influences predictions, such as distinguishing between visually similar characters or objects.

When working with custom datasets — like a unique Star Wars image collection — conventional models often fall short due to limited examples or insufficient context. Custom datasets reflect real-world problems these models need to solve, such as identifying specific character traits or unusual visual features unseen in benchmark datasets. Training FasterViT on these tailored images gives the model exposure to exactly the visual domains it will encounter during inference, improving accuracy and robustness.

Moreover, FasterViT’s hierarchical attention mechanism enables efficient processing of visual features at different scales. Early layers might focus on small, local details in images — edges, textures, or character accessories — while later transformer-like layers capture broader patterns or global relationships across the entire image. This layered attention system supports the model’s ability to generalize from a custom dataset to unseen test images, making it well-suited for specialized vision tasks that vary significantly from popular benchmark collections.

In practical application, this means that when you feed FasterViT a custom dataset of labeled images, the model can learn both subtle and global cues needed to make reliable classifications. Whether your goal is to recognize specific individuals, object categories, or nuanced visual differences dictated by your dataset’s uniqueness, FasterViT’s flexible architecture enables innovative solutions. By training this model on your curated images, you customize the learning process — and the resulting classifier becomes tailored, higher performing, and more relevant to your specific needs.


The FasterViT architecture

FasterViT architecture (figure)

The FasterViT architecture is built as a hybrid model that combines the strengths of convolutional neural networks and vision transformers in a single unified pipeline. The network begins with traditional convolutional layers, which downsample the input image and extract low-level visual features such as edges, textures, and simple spatial patterns. These convolution stages are efficient and computationally light, making them ideal for early processing where fine-grained image structure matters most. By progressively reducing the spatial resolution while increasing channel depth, the model prepares compact yet informative feature maps for the next stages.

After the initial convolution blocks, FasterViT transitions into deeper stages that incorporate hierarchical attention. This is where the transformer-based components come into play. Hierarchical attention allows the model to capture long-range dependencies in the image — understanding how different regions relate to one another, even when they are far apart spatially. This global reasoning is what makes transformer-based architectures particularly powerful for image understanding, as the model is no longer limited to only local receptive fields like a CNN. FasterViT carefully balances attention computation so it remains efficient while still modeling complex contextual relationships.

In the later stages, the model continues to alternate between downsampling and attention-driven processing, building increasingly abstract feature representations. By the time the data reaches the classification head, the network has learned both detailed local information and high-level contextual structure. This combination enables FasterViT to achieve strong accuracy on image classification tasks while remaining computationally efficient compared to pure transformer models. The architecture is therefore especially useful for real-world applications where both performance and speed matter, making it a compelling evolution in the vision transformer family.
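
To make the flow concrete, here is a minimal sketch (assuming the fastervit package from the setup section below) that builds the smallest FasterViT variant and passes a dummy 224x224 image through it, showing how the network reduces an input tensor to one logit per class.

# Minimal sketch: run a dummy image through FasterViT (assumes the fastervit package is installed)
import torch
from fastervit import create_model

# Build the smallest FasterViT variant without loading pretrained weights
model = create_model('faster_vit_0_224', pretrained=False)
model.eval()

# A dummy batch containing one 3-channel 224x224 image
dummy = torch.randn(1, 3, 224, 224)

# Convolutional stages extract local features, hierarchical attention adds global context,
# and the classification head maps the result to logits
with torch.no_grad():
    logits = model(dummy)

print(logits.shape)  # one row of class logits, e.g. 1000 for the default ImageNet head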


FasterViT classification process (figure)

Building a Practical FasterViT Image Classification Tutorial with a Custom Dataset

This tutorial walks through the full process of using FasterViT image classification on a custom dataset, showing how to install the required libraries, prepare the dataset, train the model, and finally test it on new images. The goal is to give you a complete, working pipeline so you can adapt the same approach to any dataset you choose — whether that’s animals, products, vehicles, or any type of labeled image collection. Everything is designed to be hands-on and code-driven so you can follow along step-by-step.

The core idea behind the tutorial is to demonstrate how FasterViT — a hybrid model that combines convolutional layers with transformer-based attention — can be trained on real-world images rather than relying only on standard benchmark datasets. You’ll see how the dataset is split into training, validation, and test sets, how transformations are applied, and how the model learns to identify each class. By the end, you’ll understand not only how the code works, but also why each stage of the workflow is important.

Another key part of the tutorial is modifying the model’s final classification head to match the number of classes in your dataset. This ensures FasterViT can correctly predict the labels you define. The training loop also tracks accuracy and loss over time, so you can monitor how well the model is learning and automatically keep the best-performing weights. This makes the process robust and beginner-friendly, while still being powerful enough for advanced experimentation.
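
As a preview, swapping the head is a one-line change (a minimal sketch; the full training script later in this post does the same with the real class list):

# Sketch: replace the classification head to match a custom class count
import torch
from fastervit import create_model

num_classes = 10  # hypothetical class count for your own dataset

model = create_model('faster_vit_0_224', pretrained=False)
# The new Linear layer outputs one logit per custom class
model.head = torch.nn.Linear(model.head.in_features, num_classes)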

Finally, the tutorial shows how to load your trained model and run predictions on new images. You’ll see how to preprocess test images, send them through the model, and display the predicted class — even overlaying the result on the image itself. This allows you to move from raw data → trained model → working classifier, creating a complete solution that you can reuse and scale for future projects.

Link to the video tutorial : https://youtu.be/n-SpVoHrzDQ

You can download the code for the tutorial here : https://eranfeit.lemonsqueezy.com/checkout/buy/a6159108-c66c-4e21-80e0-7a6589f0b8b0 or here : https://ko-fi.com/s/28ca45253c

Link to the post for Medium users : https://medium.com/vision-transformers-tutorials/fastervit-image-classification-using-custom-dataset-star-wars-dataset-8e6ce470d566

You can follow my blog here : https://eranfeit.net/blog/

Want to get started with Computer Vision or take your skills to the next level?

Great Interactive Course : “Deep Learning for Images with PyTorch” here : https://datacamp.pxf.io/zxWxnm

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4


FasterViT Image Classification Using Custom Dataset

FasterViT is a powerful hybrid model that combines the feature extraction strengths of convolutional neural networks with the global contextual understanding of vision transformers.
In this post, you’ll build a complete image classification pipeline using FasterViT on your own custom dataset — the Star Wars characters dataset.
Each part breaks down the code into digestible steps with clear explanations so you can follow along and adapt it to any dataset.

The goal is practical mastery: from environment setup, dataset preparation, training, and testing, you will walk through a real-world workflow from top to bottom.
Think of this as a roadmap you can reuse to train FasterViT models on any custom image classification task, using the PyTorch and FasterVit libraries.


Setting Up the Environment

Before training FasterViT on your custom dataset, your system must be ready.
This section creates a dedicated Conda environment and installs PyTorch with GPU support, along with FasterVit and necessary Python packages.
Isolating dependencies in a new environment prevents version conflicts and ensures compatibility for deep learning workflows.

You’ll also check the CUDA version to ensure GPU acceleration is available, which significantly speeds up training.
The specific package versions used here are selected for stability and reproducibility so you can train efficiently without unexpected errors.

# Create a new Conda environment named "fasterVit" with Python 3.11
conda create -n fasterVit python=3.11

# Activate the newly created "fasterVit" environment
conda activate fasterVit

# Check the installed CUDA version on your system
nvcc --version

# Install PyTorch 2.5.0, Torchvision, Torchaudio, and CUDA 12.4 support from the PyTorch and NVIDIA channels
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install the FasterVit library version 0.9.8 from PyPI
pip install fastervit==0.9.8

# Install timm (PyTorch Image Models) version 0.9.12 for model utilities and backbones
pip install timm==0.9.12

# Install matplotlib for plotting and visualization
pip install matplotlib

# Install the OpenCV Python bindings
pip install opencv-python==4.10.0.84

This environment gives you all the tools needed to implement FasterViT training and testing without friction.
Now, you can focus on the dataset and model logic rather than debugging library conflicts.
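
Before moving on, it is worth confirming from Python that the GPU build was installed correctly. A quick check like this sketch (run inside the activated fasterVit environment) saves you from silently training on the CPU later:

# Sanity check that PyTorch sees the GPU (run inside the "fasterVit" environment)
import torch

print(torch.__version__)          # should report 2.5.0
print(torch.cuda.is_available())  # True means the CUDA build is working
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU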


Downloading and Understanding the Custom Dataset

Before we start training the FasterViT image classification model, we first need a labeled image dataset. In this tutorial, we are working with a collection of character images that are already grouped into folders by class. Each folder represents one category, and every image inside belongs to that specific label. This structure is very important because PyTorch relies on the folder layout to automatically map images to class names during training.

The dataset contains multiple characters, and each character has a number of image samples from different angles, lighting conditions, and backgrounds. This variation helps the model generalize and learn what truly defines each class, rather than memorizing a single image pattern. If your dataset includes clear, centered subjects and consistent labeling, you will normally achieve stronger and more reliable classification results.

When you download your dataset, extract it into a directory on your machine where you plan to work. In our case, the raw dataset is stored in a folder called:

D:/Data-Sets-Image-Classification/Star-Wars-Characters 

Inside this folder, each sub-folder represents a different class. For example:

Star-Wars-Characters/
├── Class_1/
├── Class_2/
├── Class_3/
├── ...

This means you do not need a CSV file or manual labels — the folder names themselves act as the class labels. Later, our script automatically splits these images into Train, Validation, and Test folders while preserving the class-based structure. If you ever decide to swap this dataset with your own images, just keep the same folder-per-class layout and the rest of the code will continue to work smoothly.
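
If you want to verify the layout before splitting, a short script like the sketch below (using the same source path as above) prints how many images each class folder contains, which also helps you spot class imbalance early:

# Sketch: count the images in each class folder to verify the dataset layout
import os

source_folder = 'D:/Data-Sets-Image-Classification/Star-Wars-Characters'

# Each sub-folder is one class; count the files inside it
for category in sorted(os.listdir(source_folder)):
    category_path = os.path.join(source_folder, category)
    if os.path.isdir(category_path):
        num_images = len([f for f in os.listdir(category_path)
                          if os.path.isfile(os.path.join(category_path, f))])
        print(f'{category}: {num_images} images')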


Preparing Your Dataset for Image Classification

The next step is organizing your custom dataset into training, validation, and testing splits.
This allows the model to learn from training examples, tune performance on validation data, and evaluate generalization on the test set.

This code creates appropriate folder structures, randomly shuffles images, and distributes them into the right folders.
A balanced dataset structure ensures that training and validation samples represent all classes evenly.

# Dataset : https://www.kaggle.com/datasets/adamridene/star-wars-characters

# Import the os module for filesystem path and directory operations
import os

# Import shutil to copy files and manage file operations
import shutil

# Import random to shuffle image lists before splitting into sets
import random

# Define a helper function to create Train, Val, and Test folders for each category
def create_folders(base_path, categories):
    # Loop over every detected category name
    for category in categories:
        # Create the Train subfolder for the current category (if it doesn't already exist)
        os.makedirs(os.path.join(base_path, 'Train', category), exist_ok=True)
        # Create the Val subfolder for the current category (if it doesn't already exist)
        os.makedirs(os.path.join(base_path, 'Val', category), exist_ok=True)
        # Create the Test subfolder for the current category (if it doesn't already exist)
        os.makedirs(os.path.join(base_path, 'Test', category), exist_ok=True)

# Define a function to split data into train, validation, and test subsets
def split_data(source_folder, dest_folder, train_ratio=0.7, validate_ratio=0.2):
    # Get the list of subfolders (categories) inside the source folder
    categories = [d for d in os.listdir(source_folder) if os.path.isdir(os.path.join(source_folder, d))]
    # Ensure all required destination folders (Train/Val/Test per category) exist
    create_folders(dest_folder, categories)

    # Iterate over each category to process its images
    for category in categories:
        # Build the full path to the current category directory
        category_path = os.path.join(source_folder, category)
        # List all image files within this category directory
        images = [f for f in os.listdir(category_path) if os.path.isfile(os.path.join(category_path, f))]
        # Randomly shuffle the images to avoid ordering bias
        random.shuffle(images)

        # Calculate the index at which the training subset ends
        train_split = int(len(images) * train_ratio)
        # Calculate the index at which the validation subset ends
        validate_split = int(len(images) * (train_ratio + validate_ratio))

        # Select the training images based on the first split
        train_images = images[:train_split]
        # Select the validation images between train_split and validate_split
        validate_images = images[train_split:validate_split]
        # The remaining images belong to the test set
        test_images = images[validate_split:]

        # Copy each training image into the corresponding Train/category folder
        for image in train_images:
            shutil.copy(os.path.join(category_path, image), os.path.join(dest_folder, 'Train', category, image))

        # Copy each validation image into the corresponding Val/category folder
        for image in validate_images:
            shutil.copy(os.path.join(category_path, image), os.path.join(dest_folder, 'Val', category, image))

        # Copy each test image into the corresponding Test/category folder
        for image in test_images:
            shutil.copy(os.path.join(category_path, image), os.path.join(dest_folder, 'Test', category, image))

# Define the original dataset folder containing the class subfolders
source_folder = 'D:/Data-Sets-Image-Classification/Star-Wars-Characters'
# Define the destination folder where the Train/Val/Test structure will be created
dest_folder = 'D:/Data-Sets-Image-Classification/Star-Wars-Characters-For-Classification'
# Call the split_data function to perform the splitting operation
split_data(source_folder, dest_folder)

With your images split into folders, the model can now iterate over them in training and validation loops.
This structure is compatible with PyTorch’s dataset utilities, making the next part seamless.
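
You can double-check the result with a few lines (a sketch using torchvision's ImageFolder, the same utility the training script relies on), which prints the size of each split and the label mapping derived from the folder names:

# Sketch: verify that torchvision can read the split and map folders to labels
import os
from torchvision import datasets

dest_folder = 'D:/Data-Sets-Image-Classification/Star-Wars-Characters-For-Classification'

# Print the number of images and classes found in each split
for split in ['Train', 'Val', 'Test']:
    ds = datasets.ImageFolder(os.path.join(dest_folder, split))
    print(f'{split}: {len(ds)} images across {len(ds.classes)} classes')

# The class-to-index mapping comes directly from the folder names
print(datasets.ImageFolder(os.path.join(dest_folder, 'Train')).class_to_idx)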


Training the FasterViT Model

Now comes the heart of the pipeline: training the FasterViT model on your custom dataset.
The training function handles multiple epochs, computes loss and accuracy, and saves the best model weights based on validation performance.

This code uses standard PyTorch structures like data loaders, optimizers, schedulers, and loss functions.
It ensures your model trains efficiently and tracks progress over time.

# Import os to handle file paths and directory operations
import os

# Import the core PyTorch library
import torch

# Import torchvision datasets and transforms for image loading and preprocessing
from torchvision import datasets, transforms

# Import DataLoader to batch and iterate over datasets
from torch.utils.data import DataLoader

# Import create_model from fastervit to construct the FasterViT architecture
from fastervit import create_model

# Import PyTorch's optimization module
import torch.optim as optim

# Import learning rate scheduler utilities
from torch.optim import lr_scheduler

# Import time to measure training duration
import time

# Import copy to deep copy model weights when tracking the best model
import copy

# Define a training loop function for the model
def train_model(model, criterion, optimizer, scheduler, num_epochs):
    # Record the starting time of the training process
    since = time.time()

    # Make a deep copy of the model's initial weights to store the best version
    best_model_wts = copy.deepcopy(model.state_dict())
    # Initialize the best accuracy with zero
    best_acc = 0.0

    # Loop over each epoch in the training process
    for epoch in range(num_epochs):
        # Print the current epoch index and total epochs
        print(f'Epoch {epoch}/{num_epochs - 1}')
        # Print a visual separator line
        print('-' * 10)

        # Each epoch has both a training phase and a validation phase
        for phase in ['train', 'val']:
            # Set the model to training mode during the train phase
            if phase == 'train':
                model.train()
            # Set the model to evaluation mode during the validation phase
            else:
                model.eval()

            # Initialize running loss for the epoch
            running_loss = 0.0
            # Initialize running correct predictions for the epoch
            running_corrects = 0

            # Iterate over batches from the dataloader of the current phase
            for inputs, labels in dataloaders[phase]:
                # Move input images to the selected device (CPU or GPU)
                inputs = inputs.to(device)
                # Move labels to the selected device
                labels = labels.to(device)

                # Reset gradients of the optimizer at the start of each batch
                optimizer.zero_grad()

                # Enable gradient computation only when in training phase
                with torch.set_grad_enabled(phase == 'train'):
                    # Perform a forward pass through the model to get outputs
                    outputs = model(inputs)
                    # Get the predicted class indices by taking the max logit
                    _, preds = torch.max(outputs, 1)
                    # Compute the loss between model outputs and true labels
                    loss = criterion(outputs, labels)

                    # If in training phase, perform backpropagation and an optimizer step
                    if phase == 'train':
                        # Backpropagate the loss
                        loss.backward()
                        # Update model parameters
                        optimizer.step()

                # Accumulate the batch loss scaled by the batch size
                running_loss += loss.item() * inputs.size(0)
                # Accumulate the number of correct predictions
                running_corrects += torch.sum(preds == labels.data)

            # Step the learning rate scheduler after finishing the training phase
            if phase == 'train':
                scheduler.step()

            # Compute the epoch loss by dividing total loss by dataset size
            epoch_loss = running_loss / dataset_sizes[phase]
            # Compute the epoch accuracy as corrects divided by dataset size
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            # Print the loss and accuracy for this phase
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # If we are in the validation phase and this accuracy is the best so far
            if phase == 'val' and epoch_acc > best_acc:
                # Update the best accuracy value
                best_acc = epoch_acc
                # Save the current model weights as the best model
                best_model_wts = copy.deepcopy(model.state_dict())

        # Print a blank line for better readability between epochs
        print()

    # Compute total training time in seconds
    time_elapsed = time.time() - since
    # Print the total training time in minutes and seconds
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    # Print the best validation accuracy achieved during training
    print(f'Best val Acc: {best_acc:.4f}')

    # Load the best model weights back into the model
    model.load_state_dict(best_model_wts)

    # Return the best model after training
    return model

# Use the main guard to ensure code only runs when this script is executed directly
if __name__ == "__main__":
    # Set the path to the prepared Train/Val dataset directory
    data_dir = "D:/Data-Sets-Image-Classification/Star-Wars-Characters-For-Classification"

    # Define image transformations for training and validation datasets
    data_transforms = {
        # Training data augmentation and normalization pipeline
        'train': transforms.Compose([
            # Resize the shortest side of the image to 256 pixels
            transforms.Resize(256),
            # Randomly crop and resize a 224x224 patch from the image
            transforms.RandomResizedCrop(224),
            # Randomly flip the image horizontally for augmentation
            transforms.RandomHorizontalFlip(),
            # Convert the image to a PyTorch tensor
            transforms.ToTensor(),
            # Normalize the image with ImageNet mean and std
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        # Validation data preprocessing and normalization pipeline
        'val': transforms.Compose([
            # Resize the shortest side of the image to 256 pixels
            transforms.Resize(256),
            # Take a centered 224x224 crop from the image
            transforms.CenterCrop(224),
            # Convert the image to a PyTorch tensor
            transforms.ToTensor(),
            # Normalize the image with ImageNet mean and std
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }

    # Create ImageFolder datasets for train and val from the directory structure
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
    # Wrap the datasets with DataLoaders for batching and shuffling
    dataloaders = {x: DataLoader(image_datasets[x], batch_size=32, shuffle=True, num_workers=4) for x in ['train', 'val']}
    # Get the dataset sizes for calculating loss and accuracy
    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

    # Extract class names from the training dataset folder structure
    class_names = image_datasets['train'].classes

    # Choose GPU if available; otherwise, fall back to CPU
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Create a FasterViT model instance with pretrained weights loaded from the given path
    model = create_model('faster_vit_0_224', pretrained=True, model_path="d:/temp/models/faster_vit_0.pth.tar")

    # Get the number of input features of the model's classification head
    num_ftrs = model.head.in_features
    # Replace the final classification layer with a new Linear layer for our number of classes
    model.head = torch.nn.Linear(num_ftrs, len(class_names))

    # Move the model to the selected device (GPU or CPU)
    model = model.to(device)
    # Define the cross-entropy loss function for multi-class classification
    criterion = torch.nn.CrossEntropyLoss()
    # Create an SGD optimizer with a learning rate and momentum
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    # Set up a StepLR scheduler to reduce the LR every 7 epochs by a factor of 0.1
    scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

    # Train the model using the train_model function for 100 epochs
    model = train_model(model, criterion, optimizer, scheduler, num_epochs=100)
    # Save the trained model weights to disk
    torch.save(model.state_dict(), 'd:/temp/models/star_wars_faster_vit_model.pth')

Training on your custom dataset ensures the model learns distinct visual differences between your classes.
After training, the saved model weights can be reused for inference or further fine-tuning.


Testing Your Trained FasterViT Model

Once training completes, you want to verify that the model works on unseen data.
This part loads the saved model weights, prepares an input image, runs prediction, and displays the result.

Putting the predicted label onto the image makes it easy to visually confirm the model’s performance.

# Import PyTorch for model operations and tensors
import torch

# Import transforms for image preprocessing steps
from torchvision import transforms

# Import the FasterViT model creation utility
from fastervit import create_model

# Import os to work with filesystem paths
import os

# Import OpenCV for image reading and display
import cv2

# Path to the Test folder - the subfolder names provide the class names
testPath = "D:/Data-Sets-Image-Classification/Star-Wars-Characters-For-Classification/Test"

# Get the list of class names by reading the subfolder names inside the Test directory
class_names = [f for f in os.listdir(testPath) if os.path.isdir(os.path.join(testPath, f))]
# Print the detected class names to verify them
print(class_names)

# Define the number of classes based on the detected folder names
# (this must be known before building the model, so the saved weights fit the head)
num_classes = len(class_names)

# Create a FasterViT model instance with the specified configuration
model = create_model('faster_vit_0_224', pretrained=False)

# Adjust the classification head to match the number of classes used during training
model.head = torch.nn.Linear(model.head.in_features, num_classes)

# Select GPU if available; otherwise, use CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Move the model to the chosen device
model = model.to(device)

# Define the path to the file containing the saved model weights
model_path = 'd:/temp/models/star_wars_faster_vit_model.pth'
# Load the saved model weights from disk into the model
model.load_state_dict(torch.load(model_path, map_location=device))
# Put the model into evaluation mode to disable dropout and other training-only layers
model.eval()

# Define preprocessing steps for input images before feeding them into the model
preprocess = transforms.Compose([
    # Convert the input NumPy array to a PIL Image
    transforms.ToPILImage(),
    # Resize the image to 256 pixels on the shortest side
    transforms.Resize(256),
    # Center crop the image to 224x224 pixels
    transforms.CenterCrop(224),
    # Convert the image to a PyTorch tensor
    transforms.ToTensor(),
    # Normalize the image using ImageNet mean and standard deviation
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Define a helper function to load and preprocess a single image
def load_image(image_path):
    # Read the image from disk using OpenCV (BGR format)
    image = cv2.imread(image_path)
    # Convert the image from BGR color space to RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Apply the preprocessing pipeline defined above
    image = preprocess(image)
    # Add a batch dimension to create a 4D tensor
    image = image.unsqueeze(0)
    # Move the image tensor to the selected device (GPU or CPU)
    image = image.to(device)
    # Return the preprocessed image tensor
    return image

# Define a function that loads an image, runs it through the model, and returns the predicted class name
def predict(image_path, model, class_names):
    # Load and preprocess the input image
    image = load_image(image_path)
    # Disable gradient computation for inference
    with torch.no_grad():
        # Forward pass through the model to get class scores
        outputs = model(image)
        # Get the index of the class with the highest score
        _, preds = torch.max(outputs, 1)
        # Map the predicted index to the corresponding class name
        predicted_class = class_names[preds.item()]
    # Return the predicted class label
    return predicted_class

# Define the path to a sample image used for testing the model
imagePath = "Visual-Language-Models-Tutorials/FasterViT - StarWars - Image classification on your Custom Dataset using Fast Vision Transformers/Yoda-Test-Image.jpg"
# Call the predict function to obtain a predicted class label
predicted_class = predict(imagePath, model, class_names)
# Print the predicted class for inspection
print(f"Predicted class : {predicted_class}")

# Define a function that predicts the class and draws the predicted label on the image
def predict_and_draw(image_path, model, class_names):
    # Load the original image using OpenCV
    image = cv2.imread(image_path)
    # Convert the loaded image from BGR to RGB for preprocessing
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Apply the same preprocessing used for training/validation
    input_tensor = preprocess(image_rgb)
    # Add a batch dimension to the input tensor
    input_tensor = input_tensor.unsqueeze(0)
    # Move the input tensor to the selected device
    input_tensor = input_tensor.to(device)

    # Disable gradient computation during inference
    with torch.no_grad():
        # Run the model to get output logits
        outputs = model(input_tensor)
        # Find the index of the highest scoring class
        _, preds = torch.max(outputs, 1)
        # Convert the index to the corresponding class name
        predicted_class = class_names[preds.item()]

    # Prepare the text label to overlay on the image
    text = f"Predicted: {predicted_class}"
    # Choose the font face for the text
    font = cv2.FONT_HERSHEY_SIMPLEX
    # Set the font scale (size of the text)
    font_scale = 1
    # Set the thickness of the text stroke
    font_thickness = 3
    # Choose the position (x, y) for the text on the image
    text_x, text_y = 10, 50

    # Draw the predicted label on the image using the specified font and color
    cv2.putText(image, text, (text_x, text_y), font, font_scale, (0, 100, 100), font_thickness)

    # Display the result image in a window titled "Predicted Image"
    cv2.imshow("Predicted Image", image)
    # Wait for a key press before closing the window
    cv2.waitKey(0)
    # Close all OpenCV windows
    cv2.destroyAllWindows()

    # Set the path where the labeled output image will be saved
    output_image_path = "D:/temp/predicted_image.jpg"
    # Write the modified image with the prediction to disk
    cv2.imwrite(output_image_path, image)
    # Print the path to the saved predicted image
    print(f"Predicted image saved at: {output_image_path}")

# Run the function on a test image to predict and visualize the class
predict_and_draw(imagePath, model, class_names)

Testing confirms that the model can generalize to new images and gives you visual feedback on its performance.
With this working pipeline, you can classify any image into your defined classes.


FAQ

What is FasterViT image classification?

FasterViT image classification uses a hybrid model combining convolution and transformer layers to learn image features and assign class labels efficiently.

Why split the dataset into train, val, and test?

Splitting allows the model to learn patterns (train), tune performance (val), and evaluate generalization (test) for reliable results.

What does the scheduler do in training?

The scheduler reduces the learning rate over time, helping stabilize training and improve final accuracy.
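
With the StepLR settings used in this tutorial (lr=0.001, step_size=7, gamma=0.1), the learning rate stays at 0.001 for epochs 0-6, drops to 0.0001 for epochs 7-13, and so on. This small sketch prints the schedule so you can see it:

# Sketch: print the StepLR schedule used in the training script
import torch
from torch import optim
from torch.optim import lr_scheduler

# A throwaway parameter so the optimizer has something to manage
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.001, momentum=0.9)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

for epoch in range(15):
    print(f'epoch {epoch}: lr = {optimizer.param_groups[0]["lr"]:.6f}')
    scheduler.step()  # the LR is divided by 10 every 7 epochs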

Why normalize images before training?

Normalization ensures images have consistent pixel statistics, which helps the model converge faster.
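
Concretely, Normalize applies pixel' = (pixel - mean) / std per channel, using the ImageNet statistics from this tutorial:

# Sketch: what Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) does to one value
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

pixel = 1.0  # a fully bright red-channel value after ToTensor scales images to [0, 1]
normalized = (pixel - mean[0]) / std[0]
print(normalized)  # ~2.249 - channel values end up roughly centered around zero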

Do I need GPU for this tutorial?

No, but GPU speeds up training significantly, especially with large datasets and transformer blocks.

Can I reuse the trained model for other datasets?

Yes, you can fine-tune it or retrain the head for different classes.
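
Here is a minimal sketch of that reuse, assuming the weights file saved by the training script and hypothetical class counts (replace both with your real numbers):

# Sketch: reload the trained weights and retrain only the head for a new dataset
import torch
from fastervit import create_model

old_num_classes = 13  # hypothetical: the class count used when the model was trained
new_num_classes = 5   # hypothetical: the class count of the new dataset

# Rebuild the architecture with the old head so the saved weights fit
model = create_model('faster_vit_0_224', pretrained=False)
model.head = torch.nn.Linear(model.head.in_features, old_num_classes)
model.load_state_dict(torch.load('d:/temp/models/star_wars_faster_vit_model.pth', map_location='cpu'))

# Freeze the backbone, then attach a fresh head for the new classes
for p in model.parameters():
    p.requires_grad = False
model.head = torch.nn.Linear(model.head.in_features, new_num_classes)
# Pass only model.head.parameters() to the optimizer to train just the new head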

What library provides FasterViT?

The FasterVit library (installed via pip as fastervit) provides a PyTorch-compatible implementation of the FasterViT architecture.

How do I visualize predictions?

Predictions are written onto images using OpenCV’s text overlay and display functions.

What is the main loss function used?

CrossEntropyLoss is used, which is common for multi-class classification tasks.


Summary

In this complete FasterViT image classification tutorial, you learned how to:

✔ Set up a stable Python + PyTorch environment
✔ Prepare a custom dataset for training and evaluation
✔ Fine-tune a pretrained FasterViT model on your own images
✔ Test and visualize predictions on new data

FasterViT’s hybrid architecture gives you the speed of CNNs and the global context power of transformers, perfect for modern image classification tasks.

Conclusion

You now have a complete, end-to-end FasterViT image classification workflow using a custom dataset.
This setup lets you take real images, split them into structured data, train a powerful hybrid model, and test predictions visually and programmatically.

Transformer-based architectures like FasterViT bring the capability to understand global image context while keeping the efficiency of convolutional representations.
By mastering this pipeline, you unlock a flexible pattern you can reuse across different domains, from character recognition to industrial image categorization.


Connect :

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran
