Building an Image Classification Model with CNNs

This article explains how to train a convolutional neural network on a remote GPU for image classification, covering convolution, padding, activation, pooling, dropout, flattening, fully‑connected layers, dataset preparation, model definition, training, and prediction using TensorFlow/Keras.


Convolution and Feature Maps

Convolution operations produce feature maps (also called activation maps). A 3×3 kernel slides over the input image, multiplying overlapping elements and summing them to generate each output pixel. For example, a 5×5 input with a 3×3 kernel yields a 3×3 feature map.
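As an illustration, the sliding-window computation can be sketched in plain NumPy (stride 1, no padding; the 5×5 input and 3×3 kernel values are arbitrary examples):

```python
import numpy as np

def conv2d(img, kernel):
    """Sliding-window 'convolution' (technically cross-correlation,
    as deep-learning frameworks implement it), stride 1, no padding."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the overlapping elements and sum them
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 example input
kernel = np.ones((3, 3))                        # 3x3 example kernel
fmap = conv2d(img, kernel)
print(fmap.shape)  # (3, 3)
```

As the article states, the 5×5 input with a 3×3 kernel yields a 3×3 feature map.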

Padding

Without padding, feature maps shrink. To keep the output size equal to the input size, zero‑padding is added around the image before convolution. Padding reduces information loss and preserves edge details. Common padding options are “valid” (no padding) and “same” (output size matches input).
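A quick NumPy illustration of the zero-padding behind "same" mode for a 3×3 kernel (the array values are arbitrary):

```python
import numpy as np

img = np.arange(25, dtype=float).reshape(5, 5)
# "same" padding for a 3x3 kernel: one ring of zeros on every side
padded = np.pad(img, pad_width=1, mode='constant', constant_values=0)
print(img.shape, '->', padded.shape)  # (5, 5) -> (7, 7)
# A 3x3 "valid" convolution over the padded 7x7 input now produces a
# 5x5 feature map, the same size as the original input.
```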

Activation Function

After each convolution, a Rectified Linear Unit (ReLU) is applied to introduce non‑linearity. ReLU sets all negative values to zero while leaving positive values unchanged.
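ReLU is a one-liner; a minimal NumPy sketch with example values:

```python
import numpy as np

def relu(x):
    # zero out negatives, keep positives unchanged
    return np.maximum(x, 0)

out = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(out)  # negatives become 0; 1.5 and 3.0 pass through unchanged
```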

Pooling

Pooling further reduces feature‑map dimensions. Max‑pooling with a 2×2 filter selects the maximum value within each 2×2 window, producing a smaller, more abstract representation that speeds up training.
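A minimal NumPy sketch of 2×2 max-pooling with stride 2 (example values chosen arbitrarily):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max-pooling with stride 2 (assumes even height and width)."""
    h, w = fmap.shape
    # group the map into 2x2 blocks and take the maximum of each block
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 5]], dtype=float)
pooled = max_pool_2x2(fmap)
print(pooled)
# [[6. 4.]
#  [7. 9.]]
```

Each output value is the maximum of one 2×2 window, halving both spatial dimensions.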

Dropout Regularization

Dropout randomly disables a fraction of neurons during training, preventing over‑fitting by forcing the network to rely on multiple pathways.
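The mechanism can be sketched in a few lines of NumPy. This is "inverted" dropout, the variant Keras uses, with a fixed seed added here purely for reproducibility:

```python
import numpy as np

def dropout(x, rate, training=True, seed=0):
    """'Inverted' dropout: zero a fraction `rate` of activations during
    training and scale survivors by 1/(1-rate) so the expected activation
    is unchanged. At inference time the input passes through untouched."""
    if not training:
        return x
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= rate  # keep each unit with probability 1-rate
    return x * mask / (1.0 - rate)

x = np.ones((4, 4))
y = dropout(x, rate=0.25)
# roughly a quarter of y is exactly 0; survivors are scaled to 1/0.75
```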

Flattening and Fully‑Connected Layers

After pooling, the feature map is flattened into a single column and fed into one or more fully‑connected layers. The final layer uses softmax for multi‑class classification (101 food categories in this case) or sigmoid for binary classification.

Why Use CNNs?

Plain fully‑connected networks cannot efficiently extract spatial hierarchies from images and are computationally expensive. CNNs apply learned filters to capture local patterns, making them better suited for visual tasks.

Example Using Keras VGG19 for Feature Extraction

from tensorflow.keras.applications.vgg19 import VGG19
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg19 import preprocess_input
import numpy as np

# VGG19 pre-trained on ImageNet, without its classification head
model = VGG19(weights='imagenet', include_top=False)

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))  # VGG19 expects 224x224 inputs
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)  # add a batch dimension: (1, 224, 224, 3)
x = preprocess_input(x)        # VGG-specific channel mean subtraction
features = model.predict(x)    # feature maps from the last convolutional block

Food‑101 Dataset Preparation

The Food‑101 dataset contains 101,000 images across 101 food categories (1,000 images per class).

!wget --no-check-certificate http://data.vision.ee.ethz.ch/cvl/food-101.tar.gz -O food.tar.gz
!tar xzvf food.tar.gz

Display a sample image:

import matplotlib.pyplot as plt
from PIL import Image

plt.imshow(Image.open('food-101/images/beignets/2802124.jpg'))
plt.axis('off')
plt.show()

Creating the Data Generators

Images are loaded with Keras ImageDataGenerator using a 20% validation split. Data augmentation (shear, zoom, horizontal flips, width/height shifts) is applied to the training set, and both splits are rescaled by 1/255.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

base_dir = 'food-101/images'
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2,
                                   horizontal_flip=True, width_shift_range=0.1,
                                   height_shift_range=0.1, validation_split=0.2)
validation_gen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
image_size = (200, 200)
training_set = train_datagen.flow_from_directory(base_dir, seed=101,
                         target_size=image_size, batch_size=32,
                         subset='training', class_mode='categorical')
validation_set = validation_gen.flow_from_directory(base_dir,
                         target_size=image_size, batch_size=32,
                         subset='validation', class_mode='categorical')

Model Definition

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(filters=32, kernel_size=(3,3), input_shape=(200,200,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(101, activation='softmax')  # one unit per Food-101 class
])

Key Conv2D parameters: 32 filters, 3×3 kernel, input shape 200×200×3, ReLU activation. Max‑pooling uses 2×2 filters. The final dense layer has 101 units matching the number of food categories.
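For reference, the spatial size implied by these choices can be traced with a short sketch (3×3 convolutions with Keras's default "valid" padding, 2×2 pooling with stride 2):

```python
def conv_out(n, k=3):
    return n - k + 1   # a 'valid' 3x3 convolution shrinks each side by 2

def pool_out(n, p=2):
    return n // p      # 2x2 max-pooling with stride 2 halves each side (floor)

n = 200  # input is 200x200x3
for step in (conv_out, pool_out, conv_out, pool_out, conv_out, pool_out):
    n = step(n)
print(n)  # 23: the final feature map is 23x23x64, so Flatten() yields 33856 values
```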

Compilation and Training

from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer='adam',
              loss=keras.losses.CategoricalCrossentropy(),
              metrics=[keras.metrics.CategoricalAccuracy()])
callback = EarlyStopping(monitor='loss', patience=3)  # stop after 3 epochs without improvement
history = model.fit(training_set, validation_data=validation_set,
                    epochs=100, callbacks=[callback])

Training is performed on a remote GPU provided by the Layer platform. The function is decorated with @fabric('f-gpu-small') and @model(name='food-vision') to run in the GPU environment.

from layer.decorators import pip_requirements, fabric, model  # Layer SDK decorators

@pip_requirements(packages=["wget", "tensorflow", "keras"])
@fabric("f-gpu-small")
@model(name="food-vision")
def train():
    # (code that downloads the dataset, creates generators, defines and trains the model)
    return model

Evaluation and Logging

import pandas as pd
import matplotlib.pyplot as plt
import layer

metrics_df = pd.DataFrame(history.history)
layer.log({"Metrics": metrics_df})
loss, accuracy = model.evaluate(validation_set)
layer.log({"Accuracy on test dataset": accuracy})
metrics_df[["loss", "val_loss"]].plot()
layer.log({"Loss plot": plt.gcf()})
metrics_df[["categorical_accuracy", "val_categorical_accuracy"]].plot()
layer.log({"Accuracy plot": plt.gcf()})

Prediction

To predict on a new image, the trained model is retrieved from Layer, and the image is resized to 200×200, rescaled to [0, 1], and given a batch dimension before being passed to model.predict. Since the final dense layer already applies softmax, the prediction is a probability vector; the extra tf.nn.softmax call below re-normalizes it, which leaves the argmax unchanged.

import os
import numpy as np
import tensorflow as tf
import layer
from keras.preprocessing import image

# Fetch the trained model from Layer
image_model = layer.get_model('layer/image-classification/models/food-vision').get_train()
!wget --no-check-certificate https://upload.wikimedia.org/wikipedia/commons/b/b1/Buttermilk_Beignets_%284515741642%29.jpg -O /tmp/Buttermilk_Beignets_.jpg
test_image = image.load_img('/tmp/Buttermilk_Beignets_.jpg', target_size=(200, 200))
test_image = image.img_to_array(test_image) / 255.0  # same 1/255 rescaling as training
test_image = np.expand_dims(test_image, axis=0)      # batch dimension: (1, 200, 200, 3)
prediction = image_model.predict(test_image)
class_names = sorted(os.listdir(base_dir))  # flow_from_directory orders classes alphabetically
scores = tf.nn.softmax(prediction[0]).numpy()
result = f"{class_names[np.argmax(scores)]} with a {(100*np.max(scores)).round(2)} percent confidence."
print(result)

This workflow demonstrates the complete pipeline from data acquisition to model deployment for image classification using CNNs.

Written by Code DAO

We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime.