
Build a Handwritten Digit Recognizer with TensorFlow: Step‑by‑Step MNIST Tutorial

Learn the fundamentals of deep learning by building, training, and evaluating a TensorFlow model that recognizes handwritten digits from the MNIST dataset, covering data preparation, network architecture, activation functions, optimizer choices, model compilation, training loops, evaluation metrics, and visualization of predictions.

Alibaba Cloud Developer

Jul 23, 2025


What is Deep Learning?

Deep learning is a branch of machine learning that uses multi‑layer neural networks to automatically extract hierarchical features from raw data, reducing the need for manual feature engineering. It excels at image, speech, and text tasks.

Key Components of a Neural Network

Node (Neuron): receives inputs, computes a weighted sum plus a bias, applies an activation function, and passes the result to the next layer.

Connection: the weighted link between two neurons; its weight is updated during training.

Layering: neurons are stacked in layers (input → hidden → output) that together form the network architecture.
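To make the node, connection, and layer vocabulary concrete, here is a minimal NumPy sketch of one fully connected layer: each neuron takes a weighted sum of its inputs, adds its bias, and applies an activation (ReLU here). The function name `dense_forward` and all numbers are illustrative assumptions, not part of the tutorial's model.

```python
import numpy as np


def dense_forward(x, weights, bias):
    """Forward pass of one fully connected layer with ReLU activation.

    x:       input vector, shape (n_inputs,)
    weights: connection weights, shape (n_inputs, n_neurons)
    bias:    one bias per neuron, shape (n_neurons,)
    """
    z = x @ weights + bias      # weighted sum for every neuron at once
    return np.maximum(z, 0.0)   # ReLU activation


# Illustrative values: 3 inputs feeding a layer of 2 neurons
x = np.array([1.0, 2.0, 3.0])
weights = np.array([[0.5, -1.0],
                    [0.25, 0.0],
                    [0.0, 0.5]])
bias = np.array([0.1, -2.0])

out = dense_forward(x, weights, bias)
print(out)
```

The second neuron's weighted sum is negative, so ReLU outputs 0 for it; this is exactly the per-node behavior a Keras `Dense` layer with `activation='relu'` applies across a whole batch.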

MNIST Handwritten Digit Dataset

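The MNIST dataset contains 70,000 grayscale images of the handwritten digits 0‑9, each 28×28 pixels: 60,000 for training and 10,000 for testing. Keras can load it directly with keras.datasets.mnist.load_data(), which downloads the file on first use and caches it at ~/.keras/datasets/mnist.npz.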

Data Preparation

Pixel values (integers from 0 to 255) are divided by 255 to normalize them to the range 0 to 1, and 20% of the training set is held out as a validation set.

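import os
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

# Keras caches MNIST at ~/.keras/datasets/mnist.npz; download it if missing
mnist_path = os.path.expanduser('~/.keras/datasets/mnist.npz')
if not os.path.exists(mnist_path):
    keras.datasets.mnist.load_data()
data = np.load(mnist_path)

print("The MNIST dataset contains the following arrays:")
print(data.files)

x_train = data['x_train']
y_train = data['y_train']
x_test = data['x_test']
y_test = data['y_test']

print(f"\nTraining images: {x_train.shape}, dtype: {x_train.dtype}")
print(f"Training labels: {y_train.shape}, dtype: {y_train.dtype}")
print(f"Test images: {x_test.shape}, dtype: {x_test.dtype}")
print(f"Test labels: {y_test.shape}, dtype: {y_test.dtype}")

# Plot the first ten training images with their labels
plt.figure(figsize=(10, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.tight_layout()
plt.savefig('mnist_samples.png')
print("\nSaved 10 sample images to mnist_samples.png")

# Normalize pixel values from 0-255 to 0-1
x_train = x_train.astype('float32') / 255.0
test_images = x_test.astype('float32') / 255.0
test_labels = y_test

# Hold out the last 20% of the training set (12,000 images) for validation
split = int(len(x_train) * 0.8)
train_images, val_images = x_train[:split], x_train[split:]
train_labels, val_labels = y_train[:split], y_train[split:]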

Model Definition

A simple fully‑connected network (MLP) is built with Keras.

from tensorflow import keras

# Multilayer perceptron: flatten each image, then two ReLU hidden layers
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

Explanation of layers:

Input: accepts a 28×28 grayscale image (784 pixel values).

Flatten: reshapes the 2‑D image into a 1‑D vector of 784 features.

Dense(128) and Dense(64): fully connected hidden layers with ReLU activation.

Dense(10): output layer with softmax, producing one probability for each of the ten digit classes.
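The layer sizes above determine how many trainable parameters the network has: each Dense layer stores one weight per input-output connection plus one bias per neuron, while Flatten has none. The short calculation below (the helper name `dense_params` is just for illustration) should match the totals Keras reports in `model.summary()` for this architecture.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per connection) plus biases (one per unit) of a Dense layer."""
    return n_inputs * n_units + n_units


# Flatten turns each 28x28 image into 784 features and has no parameters
layer_params = {
    "Dense(128)": dense_params(28 * 28, 128),
    "Dense(64)": dense_params(128, 64),
    "Dense(10)": dense_params(64, 10),
}
total = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count:,} parameters")
print(f"Total: {total:,} trainable parameters")
```

Almost all of the roughly 109k parameters sit in the first hidden layer, because every one of the 784 input pixels connects to each of its 128 neurons.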

Activation Functions

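ReLU – outputs max(0, x), which lies in [0, ∞); its gradient is 1 for positive inputs, which mitigates vanishing gradients in deep networks.

Sigmoid – outputs in (0, 1); typically used in the output layer for binary classification.

Softmax – turns a vector of scores into a probability distribution (non‑negative values that sum to 1); used in the output layer for multi‑class classification.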
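A NumPy sketch of the three activation functions listed above; these are the textbook definitions written for illustration, not Keras's internal implementations. The softmax subtracts the maximum score before exponentiating, a standard trick that leaves the result unchanged while preventing overflow for large scores.

```python
import numpy as np


def relu(z):
    # max(0, z): passes positive values through, zeroes out negatives
    return np.maximum(z, 0.0)


def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtracting the max does not change the result but avoids overflow in exp
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


scores = np.array([2.0, 1.0, 0.1])
print(relu(np.array([-1.0, 0.0, 2.0])))  # [0. 0. 2.]
print(sigmoid(0.0))                      # 0.5
print(softmax(scores))                   # largest score gets the largest probability
print(softmax(scores).sum())             # 1 up to floating-point rounding
```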

Compilation and Training

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          epochs=5,
          batch_size=32,
          validation_data=(val_images, val_labels))

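Training uses the Adam optimizer, sparse categorical cross‑entropy loss (which accepts integer labels such as 7 directly, rather than one‑hot vectors), and accuracy as the metric. The original 60,000 training images are split into 48,000 training samples and 12,000 validation samples (20% of 60,000), and the model trains for 5 epochs with a batch size of 32.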
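Sparse categorical cross‑entropy, for each sample, is the negative log of the probability the model assigned to the true class, averaged over the batch. Below is a minimal NumPy sketch of that formula with made‑up probabilities; the helper name is hypothetical, and Keras's built‑in loss handles further details (numerical safeguards, reduction options) not reproduced here. The last lines work out how many weight updates one epoch involves with the settings above.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class).

    labels: integer class indices, shape (batch,)
    probs:  softmax outputs, shape (batch, n_classes)
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))


# Two samples, three classes (illustrative numbers)
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
labels = np.array([1, 0])
loss = sparse_categorical_crossentropy(labels, probs)
print(f"loss = {loss:.4f}")

# With 48,000 training samples and batch_size=32, one epoch is 1,500 updates
steps_per_epoch = int(np.ceil(48_000 / 32))
print(steps_per_epoch)  # 1500
```

The loss only looks at the probability of the correct class, which is why integer labels suffice and no one‑hot encoding is needed.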

Evaluation

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc}')

After 5 epochs, the model reaches about 97% accuracy on the test set.
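Accuracy here is the fraction of test images whose highest‑probability class equals the true label. The sketch below computes it by hand from a small, made‑up array of softmax outputs; the function name is hypothetical. Applied to the output of `model.predict(test_images)` together with `test_labels`, it should agree closely with the accuracy reported by `model.evaluate`.

```python
import numpy as np


def accuracy_from_probs(probs, labels):
    """Fraction of samples whose argmax prediction matches the integer label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == labels))


# Four illustrative samples over three classes; the third is misclassified
probs = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])
labels = np.array([0, 1, 1, 2])
print(accuracy_from_probs(probs, labels))  # 0.75
```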

Prediction and Visualization

predictions = model.predict(test_images)
predicted_labels = np.argmax(predictions, axis=1)

for i in range(10):
    print(f"Sample {i}: predicted = {predicted_labels[i]}, true = {test_labels[i]}")

Results can be visualized with Matplotlib (code omitted for brevity).
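One possible way to produce that visualization, as a sketch: plot ten images in a 2×5 grid, titling each with the predicted and true label and coloring mistakes red. The function name `plot_predictions` and the output filename are illustrative choices, not from the original article; the stand‑in random images only make the sketch runnable on its own, and in the tutorial you would pass `test_images`, `predicted_labels`, and `test_labels` instead.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(images, predicted, true, path="mnist_predictions.png"):
    """Save a 2x5 grid of the first ten images titled with predicted vs. true labels."""
    fig, axes = plt.subplots(2, 5, figsize=(10, 5))
    for i, ax in enumerate(axes.flat):
        ax.imshow(images[i], cmap="gray")
        color = "green" if predicted[i] == true[i] else "red"
        ax.set_title(f"pred {predicted[i]} / true {true[i]}", color=color)
        ax.axis("off")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Stand-in data so the sketch runs on its own
rng = np.random.default_rng(0)
images = rng.random((10, 28, 28))
true = np.arange(10)
predicted = true.copy()
predicted[3] = 8  # one deliberate mistake to show the red title
saved = plot_predictions(images, predicted, true)
```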

Alternative CNN Architecture

from tensorflow import keras

# Convolutional network; inputs need an explicit channel axis: (28, 28, 1)
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
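Unlike the MLP, this CNN expects a trailing channel axis, so the (N, 28, 28) image arrays would need reshaping to (N, 28, 28, 1), for example with `np.expand_dims`. The sketch below shows that reshape on a dummy array and traces the spatial size through the layers, assuming the Keras defaults of 'valid' padding and stride 1 for `Conv2D`, and a stride equal to `pool_size` for `MaxPooling2D`. The helper names are illustrative.

```python
import numpy as np


def conv_out(size, kernel):
    # 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1


def pool_out(size, pool):
    # Non-overlapping pooling (stride = pool size) with 'valid' padding
    return (size - pool) // pool + 1


size = 28
size = pool_out(conv_out(size, 3), 2)   # Conv2D(32): 26, pooling: 13
size = pool_out(conv_out(size, 3), 2)   # Conv2D(64): 11, pooling: 5
flat_features = size * size * 64        # Flatten: 5 * 5 * 64
print(size, flat_features)              # 5 1600

# Add the channel axis the CNN's Input layer expects
dummy_batch = np.zeros((4, 28, 28), dtype="float32")
cnn_input = np.expand_dims(dummy_batch, axis=-1)
print(cnn_input.shape)                  # (4, 28, 28, 1)
```

So the Flatten layer hands 1,600 features to Dense(128), about half the 784 × 128 connections' worth of input the MLP's first hidden layer receives, even though the CNN extracts far richer features.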

The article concludes with a brief note on Transformers and their relationship to TensorFlow, and mentions the Kimi K2 large‑scale mixture‑of‑experts (MoE) model as a commercial offering.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

