Understanding Convolution, Convolutional Neural Networks, and Their Implementation in Image Processing
This article explains the mathematics of 2-D convolution and demonstrates its use for image filtering, with examples such as blurring and Sobel edge detection. It then introduces artificial neural networks and back-propagation, and details the design, training, and performance of convolutional neural networks for two tasks, learning a Sobel filter and MNIST digit recognition, with full Python code examples.
Convolution in two dimensions combines two functions f(x,y) and g(x,y) to produce a new function c(x,y) by integrating the product of f(s,t) and g(x‑s,y‑t) over all s and t.
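Restating the definition above in symbols:

```latex
c(x, y) = (f * g)(x, y)
        = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
          f(s, t)\, g(x - s,\, y - t)\, \mathrm{d}s\, \mathrm{d}t
```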
When applied to digital images, the continuous integral becomes a discrete sum, where a kernel F (e.g., a 3×3 matrix) slides over a grayscale image G, multiplying overlapping values and summing them to obtain the filtered image C.
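The discrete sum can be sketched directly in NumPy; `conv2d` below is an illustrative helper, not part of the article's code, computing a valid-mode convolution of a grayscale image with a kernel:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution: slide the flipped kernel over the image."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]   # true convolution flips the kernel both ways
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the overlapping window by the kernel and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                       # delta kernel: output equals input
print(conv2d(img, identity))               # interior values [[5, 6], [9, 10]]
```

In valid mode the output shrinks by the kernel size minus one in each dimension; library routines offer padding modes ("same") to keep the output size equal to the input.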
Examples illustrate how a 3×3 averaging kernel produces a mild blur, while a Sobel operator detects edges; the effect of kernel size and normalization on output pixel ranges is discussed.
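As a concrete sketch of the two filters discussed, the 3×3 averaging kernel and the horizontal Sobel kernel can be applied with a small sliding-window helper; `filter2d` is illustrative, and real code would call a library routine:

```python
import numpy as np

# 3x3 averaging (box blur) kernel: each output pixel is the mean of its
# 3x3 neighbourhood, so values stay within the input range.
box = np.ones((3, 3)) / 9.0

# Sobel kernel for horizontal gradients; responses can be negative or exceed
# the input range, which is why normalization of the output matters.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def filter2d(image, kernel):
    """Slide the kernel over the image (valid mode, no kernel flip)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical step edge: Sobel responds strongly only at the transition.
img = np.hstack([np.zeros((5, 3)), np.full((5, 3), 255.0)])
print(filter2d(img, sobel_x))   # rows of [0, 1020, 1020, 0]
print(filter2d(img, box))       # blurred values stay within 0..255
```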
The article then introduces artificial neural networks (NNs), describing the historical development of perceptrons, the back-propagation algorithm, and the mathematical model of a neuron: a weighted sum plus a bias, passed through an activation function.
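The neuron model just described, a weighted sum plus bias passed through an activation, fits in a few lines; this sketch assumes a sigmoid activation:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid squashes z into (0, 1)

# z = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1, so the output is sigmoid(0.1).
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))
```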
It shows how a convolutional layer can be viewed as a NN layer with shared weights (the kernel) and no bias, and explains the architecture of a simple CNN that learns a Sobel filter by training on a single input‑output image pair.
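The shared-weights view can be made concrete in one dimension: a valid-mode convolution is exactly a dense layer whose weight matrix repeats the same kernel on every row, with no bias. A small NumPy sketch (the kernel and signal are made-up values):

```python
import numpy as np

kernel = np.array([1.0, -2.0, 1.0])        # one kernel shared by all outputs
x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # hypothetical input signal

# Sliding-window form of the convolution.
conv_out = np.array([np.dot(kernel, x[i:i + 3]) for i in range(len(x) - 2)])

# Equivalent dense-layer form: each output unit connects to 3 inputs with
# the same 3 weights; every other connection is zero, and there is no bias.
W = np.zeros((3, 5))
for row in range(3):
    W[row, row:row + 3] = kernel
dense_out = W @ x

print(conv_out, dense_out)   # identical results
```

The same picture extends to 2-D: each output pixel of a convolutional layer is a dense unit whose nonzero weights are the kernel entries, tied across all positions.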
Further, a more complex CNN for handwritten digit recognition on the MNIST dataset is presented: the network consists of reshaping, two convolution‑pooling blocks, a flatten layer, two fully‑connected layers of 1000 units each, and a linear output layer for ten classes.
Training details include a mean-squared-error loss, stochastic gradient descent with momentum and learning-rate decay, and ten epochs; the resulting model achieves about 96% accuracy, with per-class precision, recall, and F1-score reported.
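The optimizer's update rule can be sketched on a toy problem; the snippet below applies classic (non-Nesterov) momentum with Keras-1-style learning-rate decay to a one-parameter quadratic loss, an illustrative stand-in for the CNN's weights:

```python
# Minimize the toy loss L(w) = (w - 3)^2 with momentum SGD.
def grad(w):
    return 2.0 * (w - 3.0)   # dL/dw

w, velocity = 0.0, 0.0
lr, decay, momentum = 0.1, 1e-6, 0.9
for step in range(200):
    lr_t = lr / (1.0 + decay * step)           # learning-rate decay per step
    velocity = momentum * velocity - lr_t * grad(w)
    w += velocity                              # momentum-smoothed update
print(w)   # converges toward the minimum at w = 3
```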
Finally, the full Python implementation using Keras is provided, including data loading, model definition, training loop, and evaluation code, as well as a separate script that trains a single‑filter CNN to reproduce the Sobel operator on the Lena image.
# NOTE: this script targets the Keras 1.x API (Convolution2D, nb_epoch,
# keras.utils.visualize_util); under Keras 2+ the equivalents are Conv2D,
# epochs, and keras.utils.plot_model.
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape, AveragePooling2D, Convolution2D
from keras.utils.np_utils import to_categorical
from keras.utils.visualize_util import plot
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from keras.callbacks import Callback
from keras.optimizers import SGD
class LossHistory(Callback):
    """Keras callback that records the loss and accuracy of every batch."""

    def __init__(self):
        Callback.__init__(self)
        self.losses = []
        self.accuracies = []

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))
        self.accuracies.append(logs.get('acc'))
history = LossHistory()

# Load the MNIST digits from a Kaggle-style "train.csv" (a label column
# followed by 784 pixel columns) and keep a 10,000-row random sample.
data = pd.read_csv("train.csv")
data = data.sample(n=10000, replace=False)
digits = data[data.columns.values[1:]].values   # pixel intensities, shape (n, 784)
labels = data.label.values

# Hold out a test split and one-hot encode the ten digit classes.
train_digits, test_digits, train_labels, test_labels = train_test_split(digits, labels)
train_labels_one_hot = to_categorical(train_labels)
test_labels_one_hot = to_categorical(test_labels)
model = Sequential()
# 784-vector -> 1x28x28 image (Theano "channels first" ordering).
model.add(Reshape(target_shape=(1, 28, 28), input_shape=(784,)))
# Block 1: 32 3x3 filters (shared weights, no bias) + 2x2 average pooling.
model.add(Convolution2D(nb_filter=32, nb_row=3, nb_col=3, dim_ordering="th", border_mode="same", bias=False, init="uniform"))
model.add(AveragePooling2D(pool_size=(2, 2), dim_ordering="th"))
# Block 2: 64 3x3 filters + 2x2 average pooling.
model.add(Convolution2D(nb_filter=64, nb_row=3, nb_col=3, dim_ordering="th", border_mode="same", bias=False, init="uniform"))
model.add(AveragePooling2D(pool_size=(2, 2), dim_ordering="th"))
# Classifier head: flatten, two sigmoid layers of 1000 units, linear output
# over the ten classes.
model.add(Flatten())
model.add(Dense(output_dim=1000, activation="sigmoid"))
model.add(Dense(output_dim=1000, activation="sigmoid"))
model.add(Dense(output_dim=10, activation="linear"))
# Save the architecture as JSON and render a diagram of it.
with open("digits_model.json", "w") as f:
    f.write(model.to_json())
plot(model, to_file="digits_model.png", show_shapes=True)

# Train with mean-squared-error loss and SGD (Nesterov momentum plus
# learning-rate decay), then persist the learned weights.
opt = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="mse", optimizer=opt, metrics=["accuracy"])
model.fit(train_digits, train_labels_one_hot, batch_size=32, nb_epoch=10, callbacks=[history])
model.save_weights("digits_model_weights.hdf5")

# Evaluate on the held-out split.
predict_labels = model.predict_classes(test_digits)
print(classification_report(test_labels, predict_labels))
print(accuracy_score(test_labels, predict_labels))
print(confusion_matrix(test_labels, predict_labels))

360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.