
Explaining Image Recognition: Logistic Regression and Convolutional Neural Networks

This article introduces the principles of image recognition, compares traditional logistic regression with convolutional neural networks, demonstrates their implementation using Python code, visualizes model weights, and explains key concepts such as padding, convolution, pooling, receptive fields, and multi‑layer feature extraction.

Rare Earth Juejin Tech Community

1. Introduction

There are many image‑recognition methods, including traditional logistic regression, AdaBoost, convolutional neural networks (CNNs), and Transformers. On some benchmarks, modern algorithms have surpassed human accuracy, yet understanding how they achieve recognition remains a challenge. This article examines the interpretability of two algorithms—logistic regression and CNNs—to explain the principles behind image recognition.

2. Logistic Regression

Logistic regression is a simple linear model that is efficient for basic tasks and highly interpretable, making it a good starting point for discussing image‑recognition principles.

2.1 Logistic Regression Principle

Logistic regression adds a sigmoid function on top of linear regression. The model is expressed as:

y = σ(W·X + b),  where σ(z) = 1 / (1 + e^(−z))

Here X and W are vectors, and the output y lies in [0, 1]; values ≥ 0.5 are classified as class 1, otherwise as class 0. Training aims to find the optimal W and b.

For image classification, X is the image flattened into a one‑dimensional vector. This flattening discards spatial information, a limitation addressed by CNNs.
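The cost of flattening can be seen in a tiny sketch (the 8×8 toy image here is illustrative, not from the original):

```python
import numpy as np

# A toy 8x8 "image": flattening turns it into a 64-element vector,
# discarding the 2-D neighborhood structure of the pixels.
image = np.arange(64).reshape(8, 8)
flat = image.flatten()

print(flat.shape)  # (64,)
# Pixels that were vertical neighbors, e.g. image[0, 0] and image[1, 0],
# end up 8 positions apart in the flat vector.
print(flat[0], flat[8])
```

After flattening, the model has no built-in notion that two pixels were adjacent; any spatial reasoning must be learned pixel by pixel.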

2.2 Logistic Regression Implementation

Scikit‑learn provides an implementation. The following code trains a logistic‑regression model on the 8×8 digit dataset (digits 0 and 1):

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits

# Load only digits 0 and 1; each sample is an 8x8 image flattened to 64 values
X, y = load_digits(n_class=2, return_X_y=True)
lr = LogisticRegression()
lr.fit(X, y)

After training, the weight matrix and bias can be inspected:

print(lr.coef_.shape)
print(lr.intercept_.shape)
# Output
# (1, 64)
# (1,)

The number of weights matches the number of image pixels.

2.3 Logistic Regression Image‑Classification Principle

The weight vector W aligns with the pixels one-to-one, so each weight indicates that pixel's contribution to the classification. Pixels that tend to be bright in class‑1 images receive positive weights, while pixels characteristic of class 0 receive negative weights.

An illustrative example uses two synthetic classes with distinct white‑region locations. After training, the weight distribution reflects these spatial patterns, which can be reshaped to the original image shape for visualization.
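A minimal sketch of such a synthetic experiment (the class layout, image size, and noise level are assumptions for illustration, not the article's original data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_image(cls):
    """Class 0 has a bright top-left block, class 1 a bright bottom-right block."""
    img = rng.random((8, 8)) * 0.1  # faint background noise
    if cls == 0:
        img[:3, :3] = 1.0
    else:
        img[5:, 5:] = 1.0
    return img.flatten()

X = np.array([make_image(c) for c in [0, 1] * 50])
y = np.array([0, 1] * 50)

lr = LogisticRegression(max_iter=1000).fit(X, y)
W = lr.coef_.reshape(8, 8)

# Weights are positive where class-1 images are bright and
# negative where class-0 images are bright.
print(W[6, 6] > 0, W[1, 1] < 0)
```

The sign pattern of W directly mirrors where each class places its white region, which is exactly what the heatmap visualization below makes visible.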

Reshaping and displaying the coefficients:

import matplotlib.pyplot as plt

# Reshape the 64 weights back into the 8x8 image layout
img = lr.coef_.reshape((8, 8))
plt.imshow(img)
plt.show()

The resulting heatmap shows higher weights in regions corresponding to digit 1 and lower weights where digit 0 appears.

3. Convolutional Neural Networks

3.1 CNN Overview

Compared with logistic regression, CNNs are more complex but can also be explained. A CNN extracts features through convolutional layers and then classifies them, similar to logistic regression.

A typical convolutional block consists of three steps: padding, convolution, and pooling.

3.1.1 Padding

Padding adds a border of zeros around the image to preserve spatial dimensions after convolution. For example, a 5×5 image becomes 7×7 after padding.
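With NumPy, the zero‑padding step can be written in one call (the 5×5 all‑ones image is just a placeholder):

```python
import numpy as np

img = np.ones((5, 5))
# Zero-pad with a one-pixel border: 5x5 -> 7x7, so a 3x3 kernel
# can be centered on every original pixel.
padded = np.pad(img, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)
```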

3.1.2 Convolution

The convolution kernel (a learned matrix) slides over the image, computing dot products at each position to produce a feature map. Larger dot‑product values indicate higher similarity between the kernel and the local region.
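The sliding dot product can be sketched directly in NumPy (the vertical‑edge kernel and step image are illustrative assumptions; real CNN kernels are learned):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel and take dot products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel responds strongly where intensity jumps left to right.
image = np.zeros((5, 5))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
print(conv2d(image, kernel))
```

The feature map peaks exactly at the column where the dark region meets the bright one, matching the intuition that larger dot products mark regions resembling the kernel's pattern.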

3.1.3 Pooling

Pooling (e.g., MaxPooling) reduces spatial resolution by selecting the maximum value within a region, providing translation invariance and reducing computational cost.
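A non‑overlapping max‑pooling pass is a few lines of NumPy (the 4×4 feature map is made up for illustration):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each block."""
    h, w = feature_map.shape
    out = feature_map[:h - h % size, :w - w % size]
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]])
print(max_pool(fm))
# [[4 2]
#  [2 8]]
```

Halving each spatial dimension keeps the strongest response per region, so small shifts of a feature inside a block leave the output unchanged.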

3.2 Receptive Field

Small kernels (e.g., 3×3) capture low‑level features such as edges. Stacking convolution and pooling layers enlarges the receptive field, enabling the network to recognize larger, more complex patterns.
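The growth of the receptive field under stacking can be computed with the standard recurrence r_out = r_in + (kernel − 1) × jump, where the jump multiplies by each layer's stride (the layer configurations below are illustrative):

```python
def receptive_field(layers):
    """Receptive field of stacked layers, each given as (kernel_size, stride)."""
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= stride            # stride compounds the step size of later layers
    return r

# Two 3x3 convs (stride 1) together see a 5x5 input region...
print(receptive_field([(3, 1), (3, 1)]))  # 5
# ...and inserting a 2x2 stride-2 pooling before another 3x3 conv grows it further.
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # 10
```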

3.3 CNN Digit Recognition Example

An example with digits 1 and 2 demonstrates how multiple convolution kernels generate distinct feature maps. The first‑layer kernels produce five feature maps for each digit; the second layer combines these maps with a multi‑channel kernel to form higher‑level representations.

For digit 1, the second‑layer output might be:

[[2, 6, 3, 4, 4],
 [4, 0, 3, 3, 4]]

For digit 2, the output could be:

[[2, 4, 3, 3, 4],
 [4, 0, 3, 4, 4]]

The sum of elements in each vector indicates similarity to the corresponding digit, and a final fully‑connected layer (akin to logistic regression) can make the final classification.
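Treating the article's toy second‑layer outputs as arrays, the per‑row sums that would feed the final fully connected layer can be computed directly (the interpretation of each row as one detector's response is an assumption):

```python
import numpy as np

# Toy second-layer outputs quoted in the article
digit1_maps = np.array([[2, 6, 3, 4, 4],
                        [4, 0, 3, 3, 4]])
digit2_maps = np.array([[2, 4, 3, 3, 4],
                        [4, 0, 3, 4, 4]])

# Sum each row to get one scalar score per feature map;
# a fully connected layer would weight and combine these scores.
print(digit1_maps.sum(axis=1))  # [19 14]
print(digit2_maps.sum(axis=1))  # [16 15]
```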

By extending the depth of the network, CNNs can recognize far more complex objects such as cats and dogs.


Tags: machine learning, convolutional neural network, image recognition, logistic regression, explainable AI