Build a Handwritten Digit Recognizer with TensorFlow: Step‑by‑Step MNIST Tutorial
Learn the fundamentals of deep learning by building, training, and evaluating a TensorFlow model that recognizes handwritten digits from the MNIST dataset, covering data preparation, network architecture, activation functions, optimizer choices, model compilation, training loops, evaluation metrics, and visualization of predictions.
What is Deep Learning?
Deep learning is a branch of machine learning that uses multi‑layer neural networks to automatically extract hierarchical features from raw data, reducing the need for manual feature engineering. It excels at image, speech, and text tasks.
Key Components of a Neural Network
Node (Neuron) : receives inputs, computes a weighted sum of them plus a bias, applies an activation function, and passes the result to the next layer.
Connection : the weighted link between neurons that carries information and is updated during training.
Layering : stacks of neurons (input → hidden → output) that form the network architecture.
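The three components above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `neuron_output` and `dense_layer` and the example weights are made up for this sketch and do not come from TensorFlow or from the original article.

```python
def neuron_output(inputs, weights, bias):
    """One node: weighted sum over its incoming connections plus a bias, then ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)


def dense_layer(inputs, weight_rows, biases):
    """One layer: every neuron sees the same inputs but has its own weights and bias."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical values: three inputs feeding a layer of two neurons.
inputs = [0.5, -1.0, 2.0]
weight_rows = [[0.4, 0.3, 0.1], [-0.5, 0.2, -0.3]]
biases = [0.1, 0.0]

layer_output = dense_layer(inputs, weight_rows, biases)
print(layer_output)
```

Training changes only the numbers in `weight_rows` and `biases`; the structure of nodes, connections, and layers stays fixed.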
MNIST Handwritten Digit Dataset
The MNIST dataset contains 60,000 training and 10,000 test 28×28 grayscale images of digits 0‑9. It can be loaded directly with Keras, which downloads and caches the data.
Data Preparation
Pixel values, stored as integers from 0 to 255, are scaled to the range 0‑1 by dividing by 255. 20 % of the training set is held out as a validation set.
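A minimal sketch of these two steps follows. It makes several assumptions: the helper name `normalize_and_split` is invented here, the validation set is taken as the last 20 % of the array without shuffling (the article does not say how its split is made), and small random arrays stand in for the real MNIST data. The result names match the `train_images` and `val_images` variables used by the training call later in the article.

```python
import numpy as np


def normalize_and_split(images, labels, val_fraction=0.2):
    """Scale uint8 pixels to [0, 1] and hold out the last val_fraction for validation."""
    images = images.astype("float32") / 255.0
    n_val = int(round(len(images) * val_fraction))
    n_train = len(images) - n_val
    return (images[:n_train], labels[:n_train]), (images[n_train:], labels[n_train:])


# Stand-in data with MNIST's layout; the real training arrays have 60,000 rows.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)
y_demo = rng.integers(0, 10, size=100).astype(np.uint8)

(train_images, train_labels), (val_images, val_labels) = normalize_and_split(x_demo, y_demo)
print(train_images.shape, val_images.shape)
```

Applied to the 60,000 real training images, the same call would yield 48,000 training and 12,000 validation samples. The test images need the same division by 255 but no split.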
import os

import matplotlib.pyplot as plt
import numpy as np

# Keras caches MNIST here after the first download; the file must already exist.
mnist_path = os.path.expanduser('~/.keras/datasets/mnist.npz')
data = np.load(mnist_path)
print("The MNIST archive contains the following arrays:")
print(data.files)

x_train = data['x_train']
y_train = data['y_train']
x_test = data['x_test']
y_test = data['y_test']
print(f"\nTraining images: {x_train.shape}, dtype: {x_train.dtype}")
print(f"Training labels: {y_train.shape}, dtype: {y_train.dtype}")
print(f"Test images: {x_test.shape}, dtype: {x_test.dtype}")
print(f"Test labels: {y_test.shape}, dtype: {y_test.dtype}")

# Plot the first ten training images in a 2x5 grid.
plt.figure(figsize=(10, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.tight_layout()
plt.savefig('mnist_samples.png')
print("\nSaved 10 sample images to mnist_samples.png")
Model Definition
A simple fully‑connected network (MLP) is built with Keras.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
Explanation of layers:
Input layer : accepts one 28×28 pixel image per sample.
Flatten : converts the 2‑D image to a 1‑D vector of 784 features.
Dense(128) and Dense(64) : hidden layers with ReLU activation.
Dense(10) : output layer with softmax for the ten digit classes.
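To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input per unit, plus one bias per unit. The helper `dense_params` below is a name chosen for this sketch, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per input per unit) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Flattened input, two hidden layers, and the output layer.
layer_sizes = [28 * 28, 128, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)  # [100480, 8256, 650] 109386
```

Calling `model.summary()` on the Keras model should report the same total. The first hidden layer accounts for most of the parameters because it is connected to all 784 pixels.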
Activation Functions
ReLU – outputs in [0, ∞), mitigates vanishing gradients.
Sigmoid – outputs in (0, 1), used for binary classification.
Softmax – outputs a probability distribution over classes, used for multi‑class classification.
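The three functions are short enough to write out in NumPy. This is a sketch for illustration, not the implementation Keras uses internally; the softmax subtracts the maximum before exponentiating, a common trick to avoid overflow that does not change the result.

```python
import numpy as np


def relu(z):
    """Zero for negative inputs, identity for positive ones."""
    return np.maximum(z, 0.0)


def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    """Turns a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


scores = np.array([-2.0, 0.0, 3.0])
print(relu(scores))
print(sigmoid(scores))
print(softmax(scores))
```

In the digit recognizer, softmax is applied to the ten output scores, and the largest probability marks the predicted digit.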
Compilation and Training
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train_images/val_images: the normalized training set after the 80/20 split
model.fit(train_images, train_labels,
          epochs=5,
          batch_size=32,
          validation_data=(val_images, val_labels))
Training uses the Adam optimizer, sparse categorical cross‑entropy loss, and accuracy as the metric. The 60,000 training images are split into 48,000 training samples and 12,000 validation samples (20 % of 60,000).
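Two of the numbers behind this training call can be made concrete. The sketch below computes sparse categorical cross‑entropy by hand for a tiny made‑up batch, and derives how many weight updates one epoch performs. The function here is a simplified stand‑in written for this example; the Keras loss additionally clips probabilities to avoid taking the logarithm of zero.

```python
import math

import numpy as np


def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability the model assigned to the true class)."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))


# Hypothetical softmax outputs for two samples over three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])  # integer class ids, hence "sparse"
loss = sparse_categorical_crossentropy(probs, labels)
print(loss)

# 48,000 training samples in batches of 32 means 1,500 updates per epoch.
steps_per_epoch = math.ceil(48_000 / 32)
print(steps_per_epoch)  # 1500
```

The loss is called "sparse" because the labels are plain integers rather than one‑hot vectors, which is exactly how the MNIST labels are stored.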
Evaluation
# test_images: x_test scaled to the range 0-1, like the training images
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc}')
The model reaches about 97 % accuracy on the test set.
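The accuracy metric itself is simple: the fraction of samples whose most probable class equals the true label. A minimal sketch on made‑up probabilities (the `accuracy` helper is written for this example and is not a Keras function):

```python
import numpy as np


def accuracy(probabilities, true_labels):
    """Fraction of rows whose highest-probability class matches the label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == np.asarray(true_labels)))


# Four hypothetical samples over two classes; the third is misclassified.
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
labels = [1, 0, 0, 0]
print(accuracy(probs, labels))  # 0.75
```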
Prediction and Visualization
predictions = model.predict(test_images)
# Each row holds ten class probabilities; the largest one is the predicted digit.
predicted_labels = np.argmax(predictions, axis=1)
for i in range(10):
    print(f"Sample {i}: predicted = {predicted_labels[i]}, true = {test_labels[i]}")
Results can be visualized with Matplotlib (code omitted for brevity).
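Since the plotting code is not included, here is one possible sketch of it, not the original author's version. The function name `plot_predictions`, the output file name, and the green/red title coloring are choices made for this example, and random arrays stand in for `test_images`, `test_labels`, and `predicted_labels`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(images, true_labels, predicted_labels, path, n=10):
    """Save a 2x5 grid of the first n (at most 10) images with predicted and true labels."""
    fig = plt.figure(figsize=(10, 5))
    for i in range(min(n, 10)):
        ax = fig.add_subplot(2, 5, i + 1)
        ax.imshow(images[i], cmap="gray")
        color = "green" if predicted_labels[i] == true_labels[i] else "red"
        ax.set_title(f"pred {predicted_labels[i]} / true {true_labels[i]}", color=color)
        ax.axis("off")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Stand-in data; with a trained model, pass the real test arrays instead.
rng = np.random.default_rng(0)
demo_images = rng.random((10, 28, 28))
demo_true = rng.integers(0, 10, size=10)
demo_pred = rng.integers(0, 10, size=10)
out_path = plot_predictions(
    demo_images, demo_true, demo_pred,
    os.path.join(tempfile.gettempdir(), "mnist_predictions_demo.png"),
)
print(out_path)
```

Coloring wrong predictions red makes the few misclassified digits easy to spot at a glance.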
Alternative CNN Architecture
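The convolutional model in this section expects each image to carry an explicit channel dimension, so arrays shaped (N, 28, 28) need one extra axis first. The sketch below shows that step on a stand‑in array and then traces how the spatial size shrinks through the layers, assuming the Keras defaults of 'valid' padding and stride 1 for Conv2D and a pooling stride equal to the pool size. The helpers `conv_out` and `pool_out` are names invented for this example.

```python
import numpy as np

# Stand-in batch with MNIST's (N, 28, 28) layout.
images = np.zeros((4, 28, 28), dtype="float32")
images_with_channel = np.expand_dims(images, axis=-1)
print(images_with_channel.shape)  # (4, 28, 28, 1)


def conv_out(size, kernel=3):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling (rounds down)."""
    return size // pool


# Two rounds of 3x3 convolution followed by 2x2 pooling: 28 -> 26 -> 13 -> 11 -> 5.
size = 28
for _ in range(2):
    size = pool_out(conv_out(size))

flattened = size * size * 64  # 64 filters in the second convolution
print(size, flattened)  # 5 1600
```

The Flatten layer therefore hands 1,600 features to the Dense(128) layer, about twice the 784 raw pixels the fully‑connected model starts from.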
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),  # grayscale image with an explicit channel axis
    keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
The article concludes with a brief note on Transformers and their relationship to TensorFlow, and mentions the Kimi K2 large‑scale MoE model as a commercial offering.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
