Understanding Convolutional Neural Networks (CNN) with Keras
The article introduces convolutional neural networks, explains core concepts such as convolution, padding, stride, and pooling, demonstrates how to calculate output dimensions, and provides a step‑by‑step Keras example that builds, compiles, and trains a multi‑layer CNN for image classification.
Convolutional neural networks (CNNs) are widely used in image and video processing. This article explains how CNNs have evolved and why they excel at visual tasks, then demonstrates how to build a CNN with Keras.
What is a Convolutional Neural Network?
CNNs are similar to ordinary neural networks: they consist of neurons with learnable weights and biases. They are most effective on images: the network takes an image as input and applies a mathematical operation called convolution.
Convolution is an operation on two real‑valued functions. In CNN terminology, the first argument is the input, the second is the kernel, and the output is the feature map.
The following illustration shows how the kernel slides over the input matrix:
In the illustration, the green matrix is the input (the pixel matrix) and the yellow matrix is the kernel. The resulting feature map can differ in size from the input, depending on the kernel size, padding, and stride.
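The sliding-window operation can be sketched in a few lines of NumPy. This is a minimal illustration, assuming no padding and a stride of 1; the function name `conv2d_valid` and the sample matrices are made up for the example and are not taken from the article's figure. Like most deep-learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the kernel element-wise with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative 5x5 input and 3x3 kernel (assumed values)
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)
```

A 5×5 input and a 3×3 kernel give a 3×3 feature map, which is why the output of a convolution is usually smaller than its input unless padding is used.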
Pooling Layer
Pooling layers are placed between consecutive convolutional layers. They reduce the spatial dimensions, which lowers the amount of computation and the number of parameters in later layers, and helps prevent over-fitting. A 2×2 max-pool with stride 2 halves the width and height while keeping the depth unchanged, discarding 75% of activations.
Implementation of max pooling is shown in the following image:
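As a rough sketch of the same idea in code, the snippet below applies 2×2 max pooling with stride 2 to a single channel. It assumes an input whose height and width are both even; the function name `max_pool_2x2` and the sample values are illustrative only.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    # Group the pixels into non-overlapping 2x2 blocks, then keep the max of each block
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

pooled = max_pool_2x2(x)
print(pooled)
# [[6 8]
#  [3 4]]
```

The 4×4 input holds 16 values and the 2×2 output holds 4, so three out of every four activations are dropped.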
Output Size Calculation
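The output feature-map dimensions depend on the input size (n), the kernel size (f), the padding (p), and the stride (s). Along each spatial dimension, the output size is ⌊(n + 2p − f) / s⌋ + 1. Padding and stride are described in the sections below.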
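A minimal Python sketch of the output-size calculation follows. It assumes a square input of width n, a square kernel of width f, padding p on each side, and stride s; the function name `conv_output_size` and the sample sizes are illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(5, 3))        # 3  (no padding, stride 1)
print(conv_output_size(5, 3, p=1))   # 5  (padding of 1 keeps the size)
print(conv_output_size(7, 3, s=2))   # 3  (stride 2 shrinks the output)
print(conv_output_size(28, 5))       # 24
```

The same function also covers pooling layers, since a pooling window slides over its input in the same way a kernel does.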
Padding
Padding adds extra pixels around the border of the input matrix, ensuring that corner pixels receive the same amount of attention as central ones. Zero‑padding adds a layer of zeros around the original matrix.
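Zero-padding can be tried out directly with NumPy's `np.pad`. The snippet below is a small illustration with an assumed 3×3 input and a padding width of 1.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Add one ring of zeros around the matrix
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5)
print(padded)
```

In Keras, `Conv2D` exposes this choice through its `padding` argument: `'valid'` applies no padding, while `'same'` pads the input so that, at stride 1, the output keeps the input's width and height.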
Stride
Stride determines how many pixels the kernel moves at each step. Larger strides reduce the size of the output feature map and, in turn, the computation and the number of parameters needed by later fully-connected layers.
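To see the effect of stride, it may help to list where the kernel is placed along one dimension. The sketch below assumes no padding; the helper name `kernel_positions` and the sizes (a 7-pixel row, a 3-wide kernel) are hypothetical.

```python
def kernel_positions(n, f, s):
    """Start indices of an f-wide kernel sliding over n pixels with stride s (no padding)."""
    return list(range(0, n - f + 1, s))

for s in (1, 2, 3):
    print(s, kernel_positions(7, 3, s))
# 1 [0, 1, 2, 3, 4]
# 2 [0, 2, 4]
# 3 [0, 3]
```

Each start index produces one output value, so the output length drops from 5 to 3 to 2 as the stride grows.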
Designing a CNN
A typical CNN consists of convolutional layers, pooling layers, and fully‑connected layers (with a Softmax output for multi‑class problems). The architecture below is implemented with Keras.
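import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# input_shape, num_classes, batch_size, epochs, the x_/y_ arrays, and the
# `history` callback are assumed to be defined earlier.
model = Sequential()
# Block 1: 32 filters of 5x5, stride 1, then 2x2 max pooling with stride 2
model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 2: 64 filters of 5x5, then 2x2 max pooling
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Classifier: flatten, one hidden dense layer, softmax output
model.add(Flatten())
model.add(Dense(1000, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
# Recent Keras versions name the SGD argument learning_rate instead of lr
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
          validation_data=(x_test, y_test), callbacks=[history])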
The first layer uses 32 filters of size 5×5 with stride 1 and ReLU activation, followed by a max-pooling layer. The second convolutional block uses 64 filters of size 5×5, again followed by max-pooling. After flattening, two dense layers are added: one with 1000 units and ReLU, and a final one with Softmax. The network is trained using categorical cross-entropy loss and stochastic gradient descent (SGD).
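To check how the feature-map size changes through this architecture, the shapes can be traced by hand. The sketch below assumes a 28×28 single-channel input, which the article does not specify, and relies on the Keras defaults of no padding for `Conv2D` and a pooling stride equal to the pool size. The helper `conv_out` is illustrative.

```python
def conv_out(n, f, s=1, p=0):
    """Output length along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

size, depth = 28, 1                    # assumed input: 28x28 grayscale image
size, depth = conv_out(size, 5), 32    # Conv2D(32, 5x5, stride 1)   -> 24x24x32
size = conv_out(size, 2, s=2)          # MaxPooling2D(2x2, stride 2) -> 12x12x32
size, depth = conv_out(size, 5), 64    # Conv2D(64, 5x5)             -> 8x8x64
size = conv_out(size, 2, s=2)          # MaxPooling2D(2x2)           -> 4x4x64

flat = size * size * depth             # length of the vector produced by Flatten()
print(size, depth, flat)  # 4 64 1024
```

Under these assumptions, the first dense layer receives a 1024-element vector. With a different input size, the same trace gives the corresponding flattened length.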
Reference: Understanding Convolutional Neural Networks (CNNs) by Subham Tiwari.
Tencent Cloud Developer