Build a Food Classification Pipeline with Python and TensorFlow
Introduction
Food classification is the process of teaching a model to recognize different food items from images.
This tutorial shows how to prepare image data so a model can classify food reliably.
You will organize labels, standardize image sizes, and serialize arrays for fast training.
By the end, you will have clean training and testing datasets ready to feed a food classification model at scale.
Check out our tutorial here : https://youtu.be/w5T86Z3lod0&list=UULFTiWJJhaH6BviSWKLJUM9sg
Link for the full code : https://ko-fi.com/s/bba2540d4f
You can find more tutorials in my blog : https://eranfeit.net/blog/
Code for Food Classification :
Link for the dataset : https://www.kaggle.com/kritikseth/fruit-and-vegetable-image-recognition
Project Setup and Labels
We set up the project environment, define food labels, and initialize containers for image data and targets.
You will import essential Python libraries that handle files, images, and arrays.
OpenCV reads and resizes images so your dataset is uniform for modeling.
NumPy stores the pixel data efficiently and allows vectorized operations.
Dedicated lists will hold images and labels before they are converted into NumPy arrays for training.
The labels list describes which food item appears in each image.
This list is created once and reused for both training and testing data.
Consistent labels guarantee that your food classification model learns the right mapping.
They also simplify evaluation when you classify food on unseen images.
We also print progress information to the console.
Progress messages help you debug early and confirm that files are discovered.
They also validate that class names are resolved correctly from directory paths.
Small checks like these save time when datasets are large.
### In this tutorial we will build a reproducible pipeline for food classification using Python.
# In this tutorial we will make a fruit and vegetable classification model using TensorFlow and Keras

### Initial dataset discovery is part of the broader workflow to classify food images.
# first, let's look for a dataset :
# https://www.kaggle.com/kritikseth/fruit-and-vegetable-image-recognition

### Import os to work with files and directories during data collection for food classification.
import os
### Import OpenCV to load and resize images before we classify food with a model.
import cv2
### Import NumPy to manage arrays that will store image pixels for food classification.
import numpy as np
### Import the save helper to write NumPy arrays to disk for later training runs.
from numpy import save

### Define the canonical list of class names for food classification.
# Array of all the classes
class_names = ["banana", "apple", "pear", "grapes", "orange", "kiwi", "watermelon", "pomegranate",
               "pineapple", "mango", "cucumber", "carrot", "capsicum", "onion", "potato", "lemon",
               "tomato", "raddish", "beetroot", "cabbage", "lettuce", "spinach", "soy beans",
               "cauliflower", "bell pepper", "chilli pepper", "turnip", "corn", "sweetcorn",
               "sweetpotato", "paprika", "jalepeno", "ginger", "garlic", "peas", "eggplant"]

### Initialize a list to store all training images as arrays to support food classification.
train_data_array = []
### Initialize a list to store numeric labels that correspond to each training image.
train_data_labels_array = []

### Print a status message to confirm the training load process has started.
print("Loading the train data ")
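The class list doubles as the label mapping: the position of each name is the numeric label used for training. As a quick, optional sanity check (this snippet is illustrative and not part of the original script), you can confirm that the list holds 36 unique names and see how a folder name maps to an index and back:

# Optional sanity check (illustrative, not part of the original script).
assert len(class_names) == 36 and len(set(class_names)) == 36  # 36 unique food classes
print(class_names.index("apple"))   # the folder name "apple" maps to label index 1
print(class_names[1])               # and index 1 maps back to "apple"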
Link for the full code : https://ko-fi.com/s/bba2540d4f
Load and Preprocess Training Images
We load images from the training folders, resize them to a fixed shape, and attach the correct label for food classification.
Consistent image size is crucial for food classification models.
Neural networks expect uniform dimensions to process batches efficiently.
Resizing standardizes the input and allows fair comparison across images.
It also reduces memory usage and speeds up training when you classify food.
You iterate through the training directory tree to discover images.
Each file path is combined with its folder to get a full location.
Images that fail to load are skipped to keep the dataset clean.
This prevents corrupt files from degrading the food classification pipeline.
Resizing to 28×28 creates compact inputs for rapid experimentation.
This size is useful for demonstrating the pipeline and debugging.
You can increase the size later to capture more detail when you classify food.
The key is to keep width and height consistent for every image.
Labels are derived from folder names that mirror the class list.
A stable mapping from name to index ensures reproducibility.
Appending both image arrays and their labels keeps data aligned.
This alignment is essential for supervised training in food classification tasks.
Finally, arrays are converted to NumPy ndarrays for efficient storage.
Shape prints confirm correct counts and dimensions.
You can quickly spot anomalies before moving to modeling.
Early validation increases confidence when you classify food at scale.
### Define the root directory that contains training images organized by food class.
rootdir = "C:/Python-cannot-upload-to-GitHub/Fruit-and-Vegetable/train"

### Walk through the directory tree to discover class folders and image files.
for subdir, dirs, files in os.walk(rootdir):
    ### Iterate over files found in the current class folder.
    for file in files:
        ### Read the image from disk using OpenCV so we can classify food later.
        frame = cv2.imread(os.path.join(subdir, file))

        ### Skip invalid files to keep the training data for food classification clean.
        if frame is None:
            ### Log that this item is not a valid image file.
            print("not an image")
        else:
            ### Print the folder and file name for progress visibility.
            print(subdir, file)

            ### Many images have different sizes, so we standardize to a fixed 28x28 shape.
            ### Resizing ensures uniform inputs for food classification models.
            resized = cv2.resize(frame, (28, 28), interpolation=cv2.INTER_AREA)

            ### Confirm that resizing succeeded by checking the height dimension.
            checkSize = resized.shape[0]  # checking that the resize was done successfully

            ### Append the resized image and its numeric label if the size is correct.
            if checkSize == 28:
                ### Store the resized image in the training data list.
                train_data_array.append(resized)
                ### Derive the class index from the folder name for label alignment.
                index = class_names.index(os.path.basename(subdir))
                ### Store the numeric label for supervised learning in food classification.
                train_data_labels_array.append(index)

### Convert the list of images into a NumPy array for efficient tensor operations.
# convert the lists to NumPy arrays
train_data = np.array(train_data_array)

### Convert the list of numeric labels into a NumPy array for modeling.
train_data_labels = np.array(train_data_labels_array)

### Print a completion message to signal that the training data is ready.
print("Finished loading the train data")

### Print the number of training samples loaded for food classification.
print("Number of train records : ", train_data.shape[0])

### Print shapes to verify dimensions of the training images and labels.
print(train_data.shape)
print(train_data_labels.shape)

### The following demo code can visualize a couple of training images if needed.
# while running, let's add more code:
### Display two sample images to sanity-check labels during food classification development.
# let's see 2 examples of the images:
### Select a sample image by index for quick inspection.
# demoImage = train_data[4]  # let's look at image number 4
### Show the image in a window.
# cv2.imshow("Demo image", demoImage)
### Retrieve and print the human-readable class name for the sample.
# index = train_data_labels[4]
# print(class_names[index])
### Repeat the process for a second sample image if desired.
# # let's add another sample image
# demoImage2 = train_data[5]  # let's look at image number 5
# cv2.imshow("Demo image2", demoImage2)
# index = train_data_labels[5]
# print(class_names[index])
### Wait for a key press before closing the display windows.
# cv2.waitKey(0)

### Persist the training arrays to disk so you can reload them quickly when you classify food.
# let's save the data to the disk in NumPy binary format:
save('c:/temp/train_data.npy', train_data)

### Save the label array alongside the images for training.
save('c:/temp/train_data_labels.npy', train_data_labels)
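Beyond the overall shape prints, it can help to verify how many images each class contributed. Here is a minimal sketch, assuming the arrays built above (this check is not part of the original script), that counts samples per class with NumPy:

# Optional check (not in the original script): count training images per class
# to spot empty or sparse class folders before training.
counts = np.bincount(train_data_labels, minlength=len(class_names))
for name, count in zip(class_names, counts):
    print(name, ":", count)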
Link for the full code : https://ko-fi.com/s/bba2540d4f
Prepare Test Set and Persist Datasets
We load the test set, create both small and large resized images, validate shapes, and save all arrays for future food classification experiments.
A separate test set helps you estimate how well your model will classify food on unseen images.
Preparing it with the same preprocessing ensures fair evaluation.
Creating a larger preview size is useful for visualization and reporting.
Saving everything to disk accelerates iteration across experiments.
You define the root directory of the test split and repeat the traversal.
Each image is resized to 28×28 for modeling.
A second 280×280 version is created for inspection and result presentation.
This dual-size approach serves both modeling and communication needs.
Invalid images are again filtered out to maintain data quality.
Labels are mapped identically from folder names to numeric indices.
This keeps the test set aligned with the training set in your food classification pipeline.
Consistent labeling is critical when you classify food across splits.
After processing, arrays for images, large previews, and labels are created.
Printing shapes confirms counts and dimensions before saving.
Finally, arrays are written to disk using NumPy’s binary format.
This makes reloads instant and keeps preprocessing deterministic.
### Announce the start of test data loading to mirror the training process.
# let's continue to the test data
print("Start loading the test data ")

### Define the root directory containing test images by food class.
rootdir = "C:/Python-cannot-upload-to-GitHub/Fruit-and-Vegetable/test"

### Prepare lists to hold resized test images and their labels.
test_data_array = []
### Prepare a list for numeric labels corresponding to test images.
test_data_labels_array = []
### Keep a separate list for larger 280x280 images for visualization.
# let's build another array for the bigger images, so we can view them after building the TensorFlow model
test_data_big_array = []

### Walk through the test directory structure to find images for food classification evaluation.
for subdir, dirs, files in os.walk(rootdir):
    ### Iterate through each file discovered in the current class directory.
    for file in files:
        ### Load the image from disk using OpenCV.
        frame = cv2.imread(os.path.join(subdir, file))

        ### Skip any entries that are not valid images to keep the test set clean.
        # check the validity of the image
        if frame is None:
            ### Log invalid entries to the console.
            print("not an image")
        else:
            ### Print path and filename for progress tracking.
            print(subdir, file)

            ### Create a larger 280x280 version for visualization and reporting.
            # just to have the ability to see a "normal" size image
            resizedBig = cv2.resize(frame, (280, 280), interpolation=cv2.INTER_AREA)

            ### Create a compact 28x28 version for model input during food classification.
            resized = cv2.resize(frame, (28, 28), interpolation=cv2.INTER_AREA)

            ### Verify that the resized image has the expected dimension.
            checkSize = resized.shape[0]  # checking that the resize was done successfully

            ### Append the processed image and label if the size is correct.
            if checkSize == 28:
                ### Store the 28x28 image for modeling.
                test_data_array.append(resized)
                ### Store the 280x280 image for visualization.
                test_data_big_array.append(resizedBig)
                ### Look up the label index from the folder name to align with training classes.
                index = class_names.index(os.path.basename(subdir))
                ### Append the label to the test label list.
                test_data_labels_array.append(index)

### Convert lists into NumPy arrays for efficient evaluation and storage.
test_data = np.array(test_data_array)
### Convert the preview images into a NumPy array.
test_data_big = np.array(test_data_big_array)
### Convert the test labels into a NumPy array for metrics calculation.
test_data_labels = np.array(test_data_labels_array)

### Print a completion message to confirm that the test set is ready.
print("Finished loading the test data ")
### Print how many test records are available for food classification evaluation.
print("Number of test records : ", test_data.shape[0])
### Print shapes to confirm array dimensions match expectations.
print(test_data.shape)
print(test_data_labels.shape)

### Save the processed test arrays to disk to allow instant reuse across runs.
# save the NumPy arrays in NumPy binary format to the disk
save('c:/temp/test_data.npy', test_data)
### Save the large preview images for visualization workflows.
save('c:/temp/test_data_big.npy', test_data_big)
### Save the aligned labels for evaluation consistency in food classification.
save('c:/temp/test_data_labels.npy', test_data_labels)
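To confirm that the test split stays aligned with the training split, a short optional check (not part of the original script) can verify that both label arrays cover all 36 classes:

# Optional check (not in the original script): confirm both splits cover all 36 classes.
print("classes in train :", len(np.unique(train_data_labels)))
print("classes in test  :", len(np.unique(test_data_labels)))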
Link for the full code : https://ko-fi.com/s/bba2540d4f
Data Loading and Normalization for Food Classification
This section loads previously saved arrays for training and testing to enable fast iteration in food classification workflows.
It confirms dataset shapes, aligns labels, and normalizes pixel ranges to stabilize optimization when you classify food.
It prepares inputs for TensorFlow by scaling values from 0–255 to 0–1, which helps gradients behave predictably.
It completes the foundation required to move directly into model building for food classification.
Saved arrays accelerate the workflow by removing repetitive disk scans and per-run decoding.
Using consistent class mappings ensures that each image corresponds to the correct label during training and evaluation.
Printing shapes validates the integrity of the data splits and prevents misalignment between images and labels.
Normalization ensures that features share a comparable scale, improving convergence and classification stability.
When you classify food across many categories, clean data handling is essential for reproducible results.
Memory-efficient arrays and clear logging make it easier to manage larger datasets and track progress.
By standardizing the preprocessing, experiments become comparable across epochs and hyperparameters.
This consistency is crucial for measuring real improvements in food classification performance.
These steps also prepare the ground for deployment in lightweight environments.
Tensors normalized to [0,1] are easy to feed into different backends or exported models.
A stable input convention reduces bugs when switching from training to inference.
This reliability improves confidence when you classify food in production systems.
### Move to the modeling stage of the food classification pipeline.
# Now we will move to the next step of building a model using TensorFlow and Keras

### Import OS utilities for path handling used by the food classification workflow.
import os
### Import OpenCV for optional visualization during food classification.
import cv2
### Import NumPy for array operations with image tensors in food classification.
import numpy as np
### Import the load helper to read saved NumPy arrays that back the classify food steps.
from numpy import load
### Import TensorFlow to build and train the neural network for food classification.
import tensorflow as tf
### Import Keras high-level APIs to define layers and models that classify food.
from tensorflow import keras

### Reduce TensorFlow log verbosity so food classification logs are concise.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # reduce the information messages

### Define the canonical set of class names for multiclass food classification.
# Array of all the classes
class_names = ["banana", "apple", "pear", "grapes", "orange", "kiwi", "watermelon", "pomegranate",
               "pineapple", "mango", "cucumber", "carrot", "capsicum", "onion", "potato", "lemon",
               "tomato", "raddish", "beetroot", "cabbage", "lettuce", "spinach", "soy beans",
               "cauliflower", "bell pepper", "chilli pepper", "turnip", "corn", "sweetcorn",
               "sweetpotato", "paprika", "jalepeno", "ginger", "garlic", "peas", "eggplant"]

### Load the training image tensors for food classification directly from disk.
# load the saved train and test data
train_data = load('c:/temp/train_data.npy')
### Load the aligned training labels that map images to food classes.
train_data_labels = load('c:/temp/train_data_labels.npy')
### Load the test image tensors used to evaluate how well we classify food.
test_data = load('c:/temp/test_data.npy')
### Load the larger test images for visualization of classify food predictions.
test_data_big = load('c:/temp/test_data_big.npy')
### Load the test labels for objective accuracy measurement in food classification.
test_data_labels = load('c:/temp/test_data_labels.npy')

### Confirm that arrays are loaded and ready to classify food.
print("Finish loading the data ")

### Optionally preview a training sample to verify content before you classify food.
# show a sample image - image number 116 in the train data
# demoImage = train_data[116]
# cv2.imshow('demoImage', demoImage)
# index = train_data_labels[116]
# print(class_names[index])
# cv2.waitKey(0)

### Print data shapes to validate dimensions for the food classification model.
# data shapes:
print("train shape : ", train_data.shape)
### Print the label shape to confirm one label per training image in food classification.
print("train labels shape : ", train_data_labels.shape)
### Print the test image shape for the evaluation split when you classify food.
print("test data shape:", test_data.shape)
### Print the test label shape to ensure alignment for accuracy computation.
print("test data labels shape:", test_data_labels.shape)

### Normalize training images from [0,255] to [0,1] for stable food classification training.
# each pixel value is between 0 and 255. We would like to scale it to a value between 0 and 1
train_data = train_data / 255.0
### Normalize test images using the same scale so we can classify food consistently.
test_data = test_data / 255.0
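After normalization it is worth confirming the value range that the model will see. The following optional lines (not part of the original script) print the dtype and range, and show how a normalized sample could be previewed with OpenCV by scaling it back to 8-bit first:

# Optional check (not in the original script): confirm the normalized pixel range.
print(train_data.dtype, train_data.min(), train_data.max())  # expected roughly: float64 0.0 1.0
# To preview a normalized image with OpenCV, convert it back to 8-bit first:
# cv2.imshow('preview', (train_data[116] * 255).astype(np.uint8))
# cv2.waitKey(0)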
Link for the full code : https://ko-fi.com/s/bba2540d4f
Build and Train the Neural Network to Classify Food
This section defines a simple yet effective neural network to perform food classification.
It flattens 28×28×3 inputs, uses a dense hidden layer with ReLU activation, and outputs 36 softmax probabilities.
It compiles the model with Adam and sparse categorical cross-entropy to handle integer labels efficiently.
It trains the network for multiple epochs to learn discriminative patterns that classify food correctly.
Dense architectures can perform strongly when inputs are compact and classes are well defined.
ReLU mitigates vanishing gradients and encourages sparse activations for efficient learning.
Softmax converts logits into a probability distribution across all food classes.
These choices create a balanced baseline for food classification before exploring deeper architectures.
Compilation defines how the model learns and how performance is measured.
Adam provides adaptive learning rates that converge quickly on normalized inputs.
Sparse categorical cross-entropy matches integer labels, keeping memory usage low.
Accuracy reporting offers an intuitive metric for how well the model can classify food.
Training schedules influence the final accuracy of food classification.
More epochs allow the network to refine boundaries between visually similar items.
Batch progression and gradient updates adjust weights to reduce loss steadily.
Careful monitoring prevents overfitting and enables targeted improvements to classify food robustly.
### Create a sequential Keras model to perform food classification on 36 classes.
# build the model
model = keras.Sequential([
    ### Flatten 28x28x3 images into a 1D vector suitable for dense layers in food classification.
    # first we flatten the images: the 28x28x3 shape becomes the input vector of the model
    keras.layers.Flatten(input_shape=(28, 28, 3)),  # this is the input layer

    ### Add a hidden dense layer with ReLU activation to learn non-linear patterns that classify food.
    # let's define the hidden layer.
    # we don't know the exact number of neurons, so we will try 512
    keras.layers.Dense(512, activation='relu'),  # ReLU outputs no negative values

    ### Add the final softmax layer with 36 outputs, one per food class, to support food classification.
    # this is the last layer - the classification over the classes
    # we have 36 classes (apple, banana, orange, ...)
    keras.layers.Dense(36, activation='softmax')  # softmax returns values between 0 and 1
])

### Log completion of the model architecture for food classification.
print('Finish build the model skeleton')

### Compile the model with an optimizer, loss, and metric appropriate for classify food tasks.
# compile the model
model.compile(
    ### Use the Adam optimizer to accelerate convergence for normalized inputs in food classification.
    # optimizer -> drives the gradient descent updates of the network
    optimizer='adam',
    ### Use sparse cross-entropy since labels are integer-encoded for classify food training.
    # loss function
    loss='sparse_categorical_crossentropy',
    ### Track accuracy to monitor how well we classify food over epochs.
    metrics=['accuracy']  # metrics measurement
)

### Log completion of the compile step.
print('Finish compile the model')

### Train the model for multiple epochs to learn how to classify food accurately.
# train the model
# we started with 10 epochs and increased the number to 120 during the sessions
model.fit(train_data, train_data_labels, epochs=120)
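The script above trains on the full training set for a fixed 120 epochs. If you want the monitoring mentioned earlier to guard against overfitting, a hedged variant (an assumption on my part, not part of the original code) holds out part of the training data and stops early once validation accuracy stops improving:

# Optional variant (not in the original script): monitor a validation split and stop early.
early_stop = keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10,
                                           restore_best_weights=True)
model.fit(train_data, train_data_labels, epochs=120,
          validation_split=0.1, callbacks=[early_stop])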
Link for the full code : https://ko-fi.com/s/bba2540d4f
Evaluate, Predict, Visualize, and Compare Food Classification Results
This section evaluates generalization on a held-out test set to measure how well we classify food on unseen data.
It prints accuracy, generates probability predictions, and extracts the top class with argmax.
It overlays the predicted label on a large preview image for human-readable validation.
It compares predicted and ground-truth labels across the test set to assess consistency.
Objective metrics guide improvements to food classification pipelines.
A strong test accuracy indicates effective feature learning and class separation.
Per-image predictions help discover hard cases and visually similar foods.
This visibility supports targeted data augmentation and model refinements that classify food better.
Interactive visualization bridges the gap between numbers and intuition.
Annotating images with predicted labels makes errors obvious and actionable.
Inspecting successes and failures reveals dataset biases and class overlaps.
These insights inform the next iteration of food classification experiments.
Batch comparisons provide quick scans for systemic issues.
Repeated confusions can suggest merging classes, collecting more samples, or tuning architecture.
Clear logging helps document progress across experiments.
This discipline improves reliability when you classify food at scale.
### Evaluate the trained model on the test split to measure food classification accuracy.
# training reached roughly loss: 0.2574 - accuracy: 0.9329
# let's test it on new data - data that the model has never seen
test_loss, test_acc = model.evaluate(test_data, test_data_labels, verbose=1)  # verbose controls how detailed the console log is

### Print the overall test accuracy to summarize how well we classify food.
print("******************* Test accuracy : ", test_acc)
# we got a result of : Test accuracy : 0.941504180431366
# a very good result - near 1

### Generate probability distributions for each test image to classify food.
# predictions
predictions = model.predict(test_data)

### Optionally print raw prediction arrays for detailed inspection during classify food analysis.
# print(predictions)  # this would print the predictions for the whole test data

### Announce which sample index will be inspected in detail.
# let's show the prediction of a specific image, for example image number 100
print('The predicted class index :')

### Extract the top class index for image 100 using argmax to classify food.
# for every test image we get 36 numbers between 0 and 1
# the highest number marks the predicted class, so we extract its index from the list of 36
class_index = np.argmax(predictions[100])  # get the index of the max value
### Print the predicted index to the console for traceability.
print(class_index)

### Map the index to a human-readable class name for food classification reporting.
class_name = class_names[class_index]
### Print the predicted class name.
print('the class name :', class_name)

### Retrieve a large preview image and overlay the predicted label to visualize classify food results.
# let's show image number 100
# we also saved the test data as bigger 280x280 images; we will use them now
demoImage = test_data_big[100]
### Draw the predicted class on the image for quick validation in food classification tasks.
cv2.putText(demoImage, class_name, (20, 20), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 0), 1)
### Display the annotated preview image.
cv2.imshow('demoImage', demoImage)
### Wait for a key press before closing the image window.
cv2.waitKey(0)

### Iterate over all predictions and compare them with ground truth to audit classify food performance.
# let's compare all the results:
for predict, test_label in zip(predictions, test_data_labels):
    ### Extract the predicted class index via argmax for each item.
    class_index = np.argmax(predict)
    ### Convert the predicted index into a food class name.
    class_name_predict = class_names[class_index]
    ### Convert the true label index into its class name for comparison.
    class_name_original = class_names[test_label]
    ### Print predicted versus original class names to review classify food outcomes.
    print('Predicted class :', class_name_predict, ' Original / real class name :', class_name_original)

### You can see that the predictions match the real labels very well.
# Thank you, and bye bye
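The printed comparison is useful for spot checks, but repeated confusions are easier to see in aggregate. Here is a minimal NumPy-only sketch (not part of the original script) that builds a confusion matrix from the predictions above and reports the most frequent mix-up:

# Optional summary (not in the original script): a simple confusion matrix with NumPy.
predicted_indices = np.argmax(predictions, axis=1)
confusion = np.zeros((len(class_names), len(class_names)), dtype=int)
for true_label, predicted_label in zip(test_data_labels, predicted_indices):
    confusion[true_label, predicted_label] += 1

# Report the most frequent off-diagonal confusion, if any.
off_diagonal = confusion.copy()
np.fill_diagonal(off_diagonal, 0)
true_idx, pred_idx = np.unravel_index(np.argmax(off_diagonal), off_diagonal.shape)
print("Most common confusion :", class_names[true_idx], "->", class_names[pred_idx],
      "(", off_diagonal[true_idx, pred_idx], "times )")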
Link for the full code : https://ko-fi.com/s/bba2540d4f
Connect :
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran