Building an Image Classification Model with CNNs
This article explains how to train a convolutional neural network on a remote GPU for image classification, covering convolution, padding, activation, pooling, dropout, flattening, fully‑connected layers, dataset preparation, model definition, training, and prediction using TensorFlow/Keras.
Convolution and Feature Maps
Convolution operations produce feature maps (also called activation maps). A 3×3 kernel slides over the input image, multiplying overlapping elements and summing them to generate each output pixel. For example, a 5×5 input with a 3×3 kernel yields a 3×3 feature map.
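The sliding-window computation can be sketched in NumPy. This is a toy "valid" convolution with stride 1; note that deep-learning frameworks actually compute cross-correlation (no kernel flip), which is what this sketch does:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide the kernel over the image with stride 1 and no padding ('valid')."""
    kh, kw = kernel.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # multiply overlapping elements and sum them into one output pixel
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 input
kernel = np.ones((3, 3)) / 9.0                  # 3x3 averaging filter
fmap = conv2d_valid(img, kernel)
print(fmap.shape)  # (3, 3), matching the 5x5 -> 3x3 example above
```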
Padding
Without padding, feature maps shrink. To keep the output size equal to the input size, zero‑padding is added around the image before convolution. Padding reduces information loss and preserves edge details. Common padding options are “valid” (no padding) and “same” (output size matches input).
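A quick NumPy sketch of zero-padding: padding a 5×5 image by one pixel on each side gives a 7×7 array, so a 3×3 kernel again produces a 5×5 output. In Keras this corresponds to passing padding='same' to Conv2D:

```python
import numpy as np

img = np.arange(25, dtype=float).reshape(5, 5)
# add a one-pixel border of zeros around the image
padded = np.pad(img, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)              # (7, 7)
# with a 3x3 kernel: output size = 7 - 3 + 1 = 5, same as the input
print(padded.shape[0] - 3 + 1)   # 5
```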
Activation Function
After each convolution, a Rectified Linear Unit (ReLU) is applied to introduce non‑linearity. ReLU sets all negative values to zero while leaving positive values unchanged.
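ReLU is simple enough to state in one line of NumPy; negatives are clamped to zero and positives pass through unchanged:

```python
import numpy as np

def relu(x):
    # element-wise max(0, x)
    return np.maximum(0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0, 7.0])
print(relu(x))  # [0. 0. 0. 2. 7.]
```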
Pooling
Pooling further reduces feature‑map dimensions. Max‑pooling with a 2×2 filter selects the maximum value within each 2×2 window, producing a smaller, more abstract representation that speeds up training.
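A minimal NumPy sketch of non-overlapping 2×2 max-pooling (stride 2), which halves each spatial dimension:

```python
import numpy as np

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max-pooling (stride 2) on a 2-D feature map."""
    h, w = fmap.shape
    # group the map into 2x2 tiles, then take the max of each tile
    return fmap[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 1],
                 [0, 8, 3, 3]], dtype=float)
pooled = max_pool_2x2(fmap)
print(pooled)
# [[6. 4.]
#  [8. 9.]]
```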
Dropout Regularization
Dropout randomly disables a fraction of neurons during training, preventing over‑fitting by forcing the network to rely on multiple pathways.
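A sketch of "inverted" dropout, the variant Keras implements: during training a fraction `rate` of activations is zeroed and the survivors are scaled by 1/(1 − rate) so the expected activation is unchanged; at inference the layer does nothing:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units, scale the rest."""
    if not training:
        return x  # no-op at inference time
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.ones(10000)
y = dropout(x, rate=0.25)
print((y == 0).mean())  # roughly 0.25 of the units are disabled
```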
Flattening and Fully‑Connected Layers
After pooling, the feature map is flattened into a single column and fed into one or more fully‑connected layers. The final layer uses softmax for multi‑class classification (101 food categories in this case) or sigmoid for binary classification.
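Both operations can be sketched in NumPy: flattening reshapes a (toy) pooled feature map into one vector, and softmax turns an arbitrary 101-dimensional score vector into probabilities that sum to 1:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# flatten a toy 4x4x8 pooled feature map into a single vector
pooled = np.ones((4, 4, 8))
flat = pooled.reshape(-1)
print(flat.shape)  # (128,)

# softmax over 101 scores yields a probability distribution over the classes
logits = np.random.default_rng(0).normal(size=101)
probs = softmax(logits)
print(round(probs.sum(), 6))  # 1.0
```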
Why Use CNNs?
Plain fully‑connected networks cannot efficiently extract spatial hierarchies from images and are computationally expensive. CNNs apply learned filters to capture local patterns, making them better suited for visual tasks.
Example Using Keras VGG19 for Feature Extraction
from tensorflow.keras.applications.vgg19 import VGG19
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg19 import preprocess_input
import numpy as np
model = VGG19(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
Food‑101 Dataset Preparation
The Food‑101 dataset contains 101,000 images, 1,000 for each of its 101 food categories.
!wget --no-check-certificate http://data.vision.ee.ethz.ch/cvl/food-101.tar.gz -O food.tar.gz
!tar xzvf food.tar.gz
Display a sample image:
import matplotlib.pyplot as plt
from PIL import Image

plt.imshow(Image.open('food-101/images/beignets/2802124.jpg'))
plt.axis('off')
plt.show()
Creating the Data Generators
Images are loaded through Keras ImageDataGenerator objects with a 20% validation split. Data augmentation (rescaling, shear, zoom, horizontal flips, and width/height shifts) is applied to the training set.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base_dir = 'food-101/images'
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2,
                                   horizontal_flip=True, width_shift_range=0.1,
                                   height_shift_range=0.1, validation_split=0.2)
validation_gen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
image_size = (200, 200)
training_set = train_datagen.flow_from_directory(base_dir, seed=101,
                                                 target_size=image_size, batch_size=32,
                                                 subset='training', class_mode='categorical')
validation_set = validation_gen.flow_from_directory(base_dir,
                                                    target_size=image_size, batch_size=32,
                                                    subset='validation', class_mode='categorical')
Model Definition
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), input_shape=(200, 200, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(101, activation='softmax')
])
Key Conv2D parameters: 32 filters, a 3×3 kernel, input shape 200×200×3, and ReLU activation. Max‑pooling uses 2×2 windows. The final dense layer has 101 units, matching the number of food categories.
Compilation and Training
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer='adam',
              loss=keras.losses.CategoricalCrossentropy(),
              metrics=[keras.metrics.CategoricalAccuracy()])
callback = EarlyStopping(monitor='loss', patience=3)
history = model.fit(training_set, validation_data=validation_set,
                    epochs=100, callbacks=[callback])
Training runs on a remote GPU provided by the Layer platform: the training function is decorated with @fabric('f-gpu-small') and @model(name='food-vision') so that it executes in the GPU environment.
@pip_requirements(packages=["wget", "tensorflow", "keras"])
@fabric("f-gpu-small")
@model(name="food-vision")
def train():
    # (code that downloads the dataset, creates generators, defines and trains the model)
    return model
Evaluation and Logging
import pandas as pd
import layer

metrics_df = pd.DataFrame(history.history)
layer.log({"Metrics": metrics_df})
loss, accuracy = model.evaluate(validation_set)
layer.log({"Accuracy on test dataset": accuracy})
metrics_df[["loss", "val_loss"]].plot()
layer.log({"Loss plot": plt.gcf()})
metrics_df[["categorical_accuracy", "val_categorical_accuracy"]].plot()
layer.log({"Accuracy plot": plt.gcf()})
Prediction
To classify a new image, the trained model is retrieved from Layer, the image is resized to 200×200, normalized to [0, 1], expanded to a batch dimension, and passed to model.predict. Because the network's final layer is softmax, the output is already a vector of class probabilities.
import os
import numpy as np
import layer
from keras.preprocessing import image

image_model = layer.get_model('layer/image-classification/models/food-vision').get_train()
!wget --no-check-certificate https://upload.wikimedia.org/wikipedia/commons/b/b1/Buttermilk_Beignets_%284515741642%29.jpg -O /tmp/Buttermilk_Beignets_.jpg
test_image = image.load_img('/tmp/Buttermilk_Beignets_.jpg', target_size=(200, 200))
test_image = image.img_to_array(test_image) / 255.0
test_image = np.expand_dims(test_image, axis=0)
prediction = image_model.predict(test_image)
# flow_from_directory assigns class indices in alphabetical order
class_names = sorted(os.listdir(base_dir))
# the final Dense layer already applies softmax, so prediction[0] is a probability vector
scores = prediction[0]
result = f"{class_names[np.argmax(scores)]} with a {(100 * np.max(scores)).round(2)} percent confidence."
print(result)
This workflow demonstrates the complete pipeline from data acquisition to model deployment for image classification using CNNs.
Code DAO
We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
