TensorFlow MNIST Tutorial: Environment Setup, Softmax Regression, and CNN Implementation
This article, authored by Chen Yidong, a Tencent mobile client development engineer, provides a comprehensive beginner-friendly guide to machine learning using TensorFlow. It covers the entire workflow from environment preparation to running MNIST demo programs, including both softmax regression and deep convolutional neural network (CNN) models.
Content Outline
Environment Setup
Understanding TensorFlow Execution Mechanism
MNIST Softmax Linear Regression
MNIST Deep CNN
Tool Utilities
CPU, GPU, and Multi‑GPU Usage
1. Environment Setup (Windows)
Install Anaconda (e.g., Anaconda3 4.2) to manage Python packages and isolate environments.
Create an isolated environment for TensorFlow via the Anaconda Prompt.
Install TensorFlow:
pip install tensorflow # install via package manager
pip install tensorflow-cpu-1.2.1-cp35-cp35m-win_amd64.whl # install CPU version from a local .whl file

For GPU support, install compatible CUDA and cuDNN versions (e.g., CUDA 8.0 with cuDNN 6) and add the cuDNN bin directory to the system PATH.
2. Understanding TensorFlow Execution Mechanism
TensorFlow represents the output of every operation as a symbolic Tensor, a handle rather than a value. A computation is first defined as a dataflow graph, which is then executed inside a Session to produce concrete values.
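TensorFlow itself is needed to run a real graph; the build-then-run idea can be sketched in plain Python instead (the `Node` class, `constant`, `add`, and `run` below are illustrative stand-ins, not TensorFlow API):

```python
# Build-then-run: describe a computation first, evaluate it later.
# This mimics TensorFlow 1.x, where tf.add(a, b) returns a symbolic
# Tensor and only sess.run(...) produces a concrete value.

class Node:
    """A symbolic handle to an operation's output (like a tf.Tensor)."""
    def __init__(self, op, inputs):
        self.op = op          # function to apply
        self.inputs = inputs  # upstream Nodes

def constant(value):
    return Node(lambda: value, [])

def add(a, b):
    return Node(lambda x, y: x + y, [a, b])

def run(node):
    """Walk the graph and compute a concrete value (like sess.run)."""
    args = [run(i) for i in node.inputs]
    return node.op(*args)

a = constant(2)
b = constant(3)
c = add(a, b)   # c is a description of an addition, not the number 5
print(run(c))   # only now is the addition actually executed -> 5
```

The key point the analogy preserves: `c` carries no value until the whole graph is explicitly run.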
3. MNIST Softmax Linear Regression
The MNIST dataset consists of 28×28 grayscale images of handwritten digits (0‑9) and their corresponding labels.
Softmax regression models the probability of each digit by applying a linear transformation followed by a softmax normalization:
logits = tf.matmul(X, W) + b
probabilities = tf.nn.softmax(logits)

The loss function is the cross-entropy between the predicted probabilities and the true labels.
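The same computation can be sketched in NumPy to make the math concrete (shapes and names mirror the TensorFlow snippet above; the zero-initialized `W` and `b` and the sample labels are illustrative):

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # labels are one-hot rows; average negative log-likelihood over the batch.
    return -np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 784))       # a batch of 4 flattened 28x28 images
W = np.zeros((784, 10))             # weights initialized to zero
b = np.zeros(10)                    # biases

probs = softmax(X @ W + b)          # with zero weights, every class gets 0.1
labels = np.eye(10)[[3, 1, 4, 1]]   # one-hot encodings of the true digits
loss = cross_entropy(probs, labels) # -log(0.1), about 2.303, before training
```

With untrained (all-zero) weights the predicted distribution is uniform, so the cross-entropy starts at -log(1/10); training drives it down by shifting probability mass onto the correct digits.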
4. MNIST Deep Convolutional Neural Network (CNN)
The CNN architecture adds convolution, pooling, dropout, and fully‑connected layers to capture hierarchical features:
Reshape: [batch, 784] → [batch, 28, 28, 1]
Conv2D: learnable filters (e.g., 32 kernels of size 5×5)
Pooling: max‑pooling reduces spatial dimensions
Dropout: randomly disables neurons to prevent over‑fitting
Fully Connected (FC): dense layers ending with a softmax output
Typical data flow:
Input → Conv → Pool → Conv → Pool → FC → FC → Softmax

Training uses the Adam optimizer (or alternatives such as GradientDescentOptimizer, RMSPropOptimizer, etc.) to minimize the cross-entropy loss.
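A minimal NumPy sketch of the 2×2 max-pooling step, which halves the spatial dimensions between convolution stages (a hand-rolled illustration of the operation, not TensorFlow's own tf.nn.max_pool):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on a [batch, height, width, channels] array."""
    n, h, w, c = x.shape
    # Group pixels into non-overlapping 2x2 windows, then take the max of each.
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

# Two fake 28x28 single-channel "images", as after the Reshape step.
images = np.arange(2 * 28 * 28 * 1, dtype=float).reshape(2, 28, 28, 1)
pooled = max_pool_2x2(images)
print(pooled.shape)  # (2, 14, 14, 1) -- spatial dims halved, channels unchanged
```

Each Pool stage in the data flow above applies exactly this shape transformation, so two Conv→Pool stages take 28×28 down to 7×7 before the fully connected layers.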
5. Tool Utilities
A custom tool.py wraps common TensorFlow operations for easier reuse. Open‑source alternatives include TensorLayer, Keras, and tflearn.
Example checkpoint handling:
# checkpoint
saver = tf.train.Saver(max_to_keep=3, write_version=2)
model_file = tf.train.latest_checkpoint(FLAGS.log_dir)
if model_file:
    saver.restore(sess, model_file)

# npz format
tools.load_and_assign_npz_dict(name=FLAGS.log_dir + '/model.npz', sess=sess)

TensorBoard visualization can be launched with:
tensorboard --logdir=your-log-path

This starts a web server (by default at http://localhost:6006) that displays the computation graph, loss curves, and accuracy trends.
6. CPU, GPU, and Multi‑GPU
TensorFlow defaults to using all available CPUs (/cpu:0) and the first GPU (/gpu:0). Specific devices can be selected with tf.device or through environment variables (e.g., CUDA_VISIBLE_DEVICES).
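For example, GPU visibility can be restricted through the environment variable before TensorFlow is imported (the device numbering follows the /gpu:N convention mentioned above):

```python
import os

# Expose only the first physical GPU; inside TensorFlow it appears as /gpu:0.
# This must be set before `import tensorflow` to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Hiding all GPUs (forcing CPU-only execution) uses an empty value:
# os.environ["CUDA_VISIBLE_DEVICES"] = ""
```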
Multi‑GPU training requires manual aggregation of gradients and losses; TensorFlow provides examples, but the process is more involved than in frameworks like Caffe.
Tencent Cloud Developer