
Explaining Image Recognition: Logistic Regression and Convolutional Neural Networks

This article introduces the principles of image recognition, compares traditional logistic regression with convolutional neural networks, demonstrates their implementation using Python code, visualizes model weights, and explains key concepts such as padding, convolution, pooling, receptive fields, and multi‑layer feature extraction.

Rare Earth Juejin Tech Community

1. Introduction

There are many image‑recognition methods, including traditional logistic regression, AdaBoost, convolutional neural networks (CNNs), and Transformers. On some benchmarks, modern algorithms have surpassed human accuracy, yet understanding how they achieve recognition remains a challenge. This article examines the interpretability of two algorithms—logistic regression and CNNs—to explain the principles behind image recognition.

2. Logistic Regression

Logistic regression is a simple linear model that is efficient for basic tasks and highly interpretable, making it a good starting point for discussing image‑recognition principles.

2.1 Logistic Regression Principle

Logistic regression adds a sigmoid function on top of linear regression. The model is expressed as:

y = σ(W·X + b),  where σ(z) = 1 / (1 + e^(−z))

Here X and W are vectors, and the output y lies in [0, 1]; values ≥ 0.5 are classified as class 1, otherwise as class 0. Training aims to find the optimal W and b.

For image classification, X is the image flattened into a one‑dimensional vector. This flattening discards spatial information, a limitation addressed by CNNs.
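The cost of flattening can be seen in a tiny sketch (the 8×8 toy image here is illustrative, not from the original):

```python
import numpy as np

# A toy 8x8 "image": flattening turns it into a 64-element vector,
# discarding the 2-D neighborhood structure of the pixels.
image = np.arange(64).reshape(8, 8)
flat = image.flatten()

print(flat.shape)  # (64,)
# Pixels that were vertical neighbors, e.g. image[0, 0] and image[1, 0],
# end up 8 positions apart in the flat vector.
print(flat[0], flat[8])
```

After flattening, the model has no built-in notion that two pixels were adjacent; any spatial reasoning must be learned pixel by pixel.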

2.2 Logistic Regression Implementation

Scikit‑learn provides an implementation. The following code trains a logistic‑regression model on the 8×8 digit dataset (digits 0 and 1):

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits

# Load only digits 0 and 1; each sample is an 8x8 image flattened to 64 values
X, y = load_digits(n_class=2, return_X_y=True)
lr = LogisticRegression()
lr.fit(X, y)

After training, the weight matrix and bias can be inspected:

print(lr.coef_.shape)
print(lr.intercept_.shape)
# Output
# (1, 64)
# (1,)

The number of weights matches the number of image pixels.

2.3 Logistic Regression Image‑Classification Principle

The weight vector W aligns with the pixels one-to-one, so each weight indicates that pixel's contribution to the classification. Pixels that tend to be bright in class‑1 images receive positive weights, while pixels characteristic of class 0 receive negative weights.

An illustrative example uses two synthetic classes with distinct white‑region locations. After training, the weight distribution reflects these spatial patterns, which can be reshaped to the original image shape for visualization.
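A minimal sketch of such a synthetic experiment (the class layout, image size, and noise level are assumptions for illustration, not the article's original data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_image(cls):
    """Class 0 has a bright top-left block, class 1 a bright bottom-right block."""
    img = rng.random((8, 8)) * 0.1  # faint background noise
    if cls == 0:
        img[:3, :3] = 1.0
    else:
        img[5:, 5:] = 1.0
    return img.flatten()

X = np.array([make_image(c) for c in [0, 1] * 50])
y = np.array([0, 1] * 50)

lr = LogisticRegression(max_iter=1000).fit(X, y)
W = lr.coef_.reshape(8, 8)

# Weights are positive where class-1 images are bright and
# negative where class-0 images are bright.
print(W[6, 6] > 0, W[1, 1] < 0)
```

The sign pattern of W directly mirrors where each class places its white region, which is exactly what the heatmap visualization below makes visible.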

Reshaping and displaying the coefficients:

import matplotlib.pyplot as plt

# Reshape the 64 weights back into the 8x8 image layout
img = lr.coef_.reshape((8, 8))
plt.imshow(img)
plt.show()

The resulting heatmap shows higher weights in regions corresponding to digit 1 and lower weights where digit 0 appears.

3. Convolutional Neural Networks

3.1 CNN Overview

Compared with logistic regression, CNNs are more complex but can also be explained. A CNN extracts features through convolutional layers and then classifies them, similar to logistic regression.

A typical convolutional block consists of three steps: padding, convolution, and pooling.

3.1.1 Padding

Padding adds a border of zeros around the image to preserve spatial dimensions after convolution. For example, a 5×5 image becomes 7×7 after padding.
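With NumPy, the zero‑padding step can be written in one call (the 5×5 all‑ones image is just a placeholder):

```python
import numpy as np

img = np.ones((5, 5))
# Zero-pad with a one-pixel border: 5x5 -> 7x7, so a 3x3 kernel
# can be centered on every original pixel.
padded = np.pad(img, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)
```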

3.1.2 Convolution

The convolution kernel (a learned matrix) slides over the image, computing dot products at each position to produce a feature map. Larger dot‑product values indicate higher similarity between the kernel and the local region.
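The sliding dot product can be sketched directly in NumPy (the vertical‑edge kernel and step image are illustrative assumptions; real CNN kernels are learned):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel and take dot products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel responds strongly where intensity jumps left to right.
image = np.zeros((5, 5))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
print(conv2d(image, kernel))
```

The feature map peaks exactly at the column where the dark region meets the bright one, matching the intuition that larger dot products mark regions resembling the kernel's pattern.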

3.1.3 Pooling

Pooling (e.g., MaxPooling) reduces spatial resolution by selecting the maximum value within a region, providing translation invariance and reducing computational cost.
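A non‑overlapping max‑pooling pass is a few lines of NumPy (the 4×4 feature map is made up for illustration):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each block."""
    h, w = feature_map.shape
    out = feature_map[:h - h % size, :w - w % size]
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]])
print(max_pool(fm))
# [[4 2]
#  [2 8]]
```

Halving each spatial dimension keeps the strongest response per region, so small shifts of a feature inside a block leave the output unchanged.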

3.2 Receptive Field

Small kernels (e.g., 3×3) capture low‑level features such as edges. Stacking convolution and pooling layers enlarges the receptive field, enabling the network to recognize larger, more complex patterns.
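The growth of the receptive field under stacking can be computed with the standard recurrence r_out = r_in + (kernel − 1) × jump, where the jump multiplies by each layer's stride (the layer configurations below are illustrative):

```python
def receptive_field(layers):
    """Receptive field of stacked layers, each given as (kernel_size, stride)."""
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= stride            # stride compounds the step size of later layers
    return r

# Two 3x3 convs (stride 1) together see a 5x5 input region...
print(receptive_field([(3, 1), (3, 1)]))  # 5
# ...and inserting a 2x2 stride-2 pooling before another 3x3 conv grows it further.
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # 10
```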

3.3 CNN Digit Recognition Example

An example with digits 1 and 2 demonstrates how multiple convolution kernels generate distinct feature maps. The first‑layer kernels produce five feature maps for each digit; the second layer combines these maps with a multi‑channel kernel to form higher‑level representations.

For digit 1, the second‑layer output might be:

[[2, 6, 3, 4, 4],
 [4, 0, 3, 3, 4]]

For digit 2, the output could be:

[[2, 4, 3, 3, 4],
 [4, 0, 3, 4, 4]]

The sum of elements in each vector indicates similarity to the corresponding digit, and a final fully‑connected layer (akin to logistic regression) can make the final classification.
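Treating the article's toy second‑layer outputs as arrays, the per‑row sums that would feed the final fully connected layer can be computed directly (the interpretation of each row as one detector's response is an assumption):

```python
import numpy as np

# Toy second-layer outputs quoted in the article
digit1_maps = np.array([[2, 6, 3, 4, 4],
                        [4, 0, 3, 3, 4]])
digit2_maps = np.array([[2, 4, 3, 3, 4],
                        [4, 0, 3, 4, 4]])

# Sum each row to get one scalar score per feature map;
# a fully connected layer would weight and combine these scores.
print(digit1_maps.sum(axis=1))  # [19 14]
print(digit2_maps.sum(axis=1))  # [16 15]
```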

By extending the depth of the network, CNNs can recognize far more complex objects such as cats and dogs.


Tags: machine learning, convolutional neural network, image recognition, logistic regression, explainable AI