Mastering Inception v3: From Codebase to Rose Recognition with TensorFlow

This article walks through the Inception v3 TensorFlow codebase, explains its design principles, details the training script flags and loss calculations, shows how to fine‑tune the model on a flower dataset, and provides practical tips for building custom datasets and optimizing hyper‑parameters for image classification.


Background Knowledge

Convolutional neural networks are the core of most state‑of‑the‑art computer‑vision solutions. TensorFlow’s Inception code library implements the Inception architecture described in the paper “Rethinking the Inception Architecture for Computer Vision”. The paper proposes design principles such as avoiding early aggressive compression of feature maps, using high‑dimensional feature representations, and balancing network width and depth.

Principle 1: Avoid representational bottlenecks; do not aggressively compress feature maps early in the network. Principle 2: Higher‑dimensional representations are easier to process locally, so wider layers can stand in for extra depth. Principle 3: Spatial aggregation can be performed over lower‑dimensional embeddings without much loss of representational power. Principle 4: Balance network width and depth for optimal performance.

Applying these principles directly is non‑trivial; the Inception architecture only approximates them. For a detailed empirical comparison, see the ICLR 2017 analysis paper, in which Inception achieves the highest accuracy among the models evaluated.

Inception Code Introduction

Google open‑sourced the Models repository alongside TensorFlow. It contains four sub‑directories:

official : example models using high‑level TensorFlow APIs.

research : research models (including Inception), implemented and maintained by their authors.

samples : small snippets and demos.

tutorials : models used in TensorFlow tutorials.

The Inception code lives in the research part. Its directory tree is:

├── README.md  # comprehensive description of the Inception codebase
├── WORKSPACE  # Bazel build file
├── g3doc
│   └── inception_v3_architecture.png  # architecture diagram
└── inception
    ├── BUILD
    ├── data
    │   ├── build_image_data.py
    │   ├── build_imagenet_data.py
    │   ├── download_and_preprocess_flowers.sh
    │   └── …
    ├── dataset.py
    ├── flowers_data.py
    ├── flowers_eval.py
    ├── flowers_train.py
    ├── image_processing.py
    ├── imagenet_data.py
    ├── imagenet_distributed_train.py
    ├── imagenet_eval.py
    ├── imagenet_train.py
    ├── inception_distributed_train.py
    ├── inception_eval.py
    ├── inception_model.py
    ├── inception_train.py
    └── slim  # lightweight library for model design, training and evaluation

This article focuses on inception_train.py, beginning with its flag parameters.

Flag Parameters

train_dir : directory for event logs and checkpoints (default /tmp/imagenet_train).

max_steps : maximum number of training steps (default 10000000).

subset : either train or validation (default train).

num_gpus : number of GPUs to use (default 1).

log_device_placement : whether to log device placement (default False).

fine_tune : if true, randomly initialize the final layer so the model can be trained on a new task (default False).

pretrained_model_checkpoint_path : path to a pretrained checkpoint to restore from.

initial_learning_rate : initial learning rate (default 0.1).

num_epochs_per_decay : number of epochs between learning‑rate decays.

learning_rate_decay_factor : multiplicative factor applied at each decay.
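For orientation, these flags are declared at the top of inception_train.py with TensorFlow's tf.app.flags module (TF 1.x). A condensed sketch showing a few of them; the help strings are abbreviated here:

import tensorflow as tf

# Condensed sketch of the flag declarations in inception_train.py
# (only a subset shown; help strings abbreviated).
tf.app.flags.DEFINE_string('train_dir', '/tmp/imagenet_train',
                           'Directory for event logs and checkpoints.')
tf.app.flags.DEFINE_integer('max_steps', 10000000,
                            'Number of batches to run.')
tf.app.flags.DEFINE_integer('num_gpus', 1, 'How many GPUs to use.')
tf.app.flags.DEFINE_boolean('fine_tune', False,
                            'If true, randomly initialize the final layer.')
tf.app.flags.DEFINE_float('initial_learning_rate', 0.1,
                          'Initial learning rate.')

FLAGS = tf.app.flags.FLAGS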

tower_loss Function Analysis

The _tower_loss function computes the total loss for a single tower (GPU). It builds the inference graph, computes the loss, gathers regularization losses, and adds them together.

# Build the inference graph; variables are shared across towers via the
# enclosing variable scope.
with tf.variable_scope(tf.get_variable_scope(), reuse=reuse_variables):
    logits = inception.inference(images, num_classes, for_training=True,
                                 restore_logits=restore_logits, scope=scope)

# inception.loss builds the cross-entropy ops and registers them in a
# collection rather than returning them.
split_batch_size = images.get_shape().as_list()[0]
inception.loss(logits, labels, batch_size=split_batch_size)

# Gather this tower's losses plus the weight-decay terms and sum them.
losses = tf.get_collection(slim.losses.LOSSES_COLLECTION, scope)
regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
total_loss = tf.add_n(losses + regularization_losses, name='total_loss')

# Maintain an exponential moving average of the individual and total losses.
loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
loss_averages_op = loss_averages.apply(losses + [total_loss])
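The inception.loss call above deserves a closer look: it one‑hot encodes the labels and attaches two label‑smoothed cross‑entropy terms, one for the main softmax and a 0.4‑weighted one for the auxiliary head. A condensed sketch of the version in inception_model.py (minor API details may differ between repo revisions):

def loss(logits, labels, batch_size=None):
    # Convert the sparse integer labels to dense one-hot vectors.
    sparse_labels = tf.reshape(labels, [batch_size, 1])
    indices = tf.reshape(tf.range(batch_size), [batch_size, 1])
    concated = tf.concat(axis=1, values=[indices, sparse_labels])
    num_classes = logits[0].get_shape()[-1].value
    dense_labels = tf.sparse_to_dense(concated,
                                      [batch_size, num_classes], 1.0, 0.0)
    # Main softmax head: cross-entropy with label smoothing.
    slim.losses.cross_entropy_loss(logits[0], dense_labels,
                                   label_smoothing=0.1, weight=1.0)
    # Auxiliary softmax head, down-weighted as in the paper.
    slim.losses.cross_entropy_loss(logits[1], dense_labels,
                                   label_smoothing=0.1, weight=0.4,
                                   scope='aux_loss')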

Training Function Analysis

The train function creates a global_step, computes learning‑rate decay, builds an RMSProp optimizer, splits the batch across GPUs, computes gradients per tower, averages them, adds summaries, and finally groups the update ops into train_op. It also creates a Saver, a summary writer, and runs the training loop, periodically writing checkpoints and summaries.

# Counter for parameter updates; created untrainable so the optimizer
# does not touch it.
global_step = tf.get_variable('global_step', [],
                              initializer=tf.constant_initializer(0),
                              trainable=False)

# Decay the learning rate once every num_epochs_per_decay epochs:
# lr = initial_learning_rate * decay_factor ** (global_step // decay_steps)
num_batches_per_epoch = (dataset.num_examples_per_epoch() / FLAGS.batch_size)
decay_steps = int(num_batches_per_epoch * FLAGS.num_epochs_per_decay)
lr = tf.train.exponential_decay(FLAGS.initial_learning_rate,
                                global_step, decay_steps,
                                FLAGS.learning_rate_decay_factor,
                                staircase=True)

# RMSProp optimizer; the RMSPROP_* constants are module-level values
# defined at the top of inception_train.py.
opt = tf.train.RMSPropOptimizer(lr, RMSPROP_DECAY,
                                momentum=RMSPROP_MOMENTUM,
                                epsilon=RMSPROP_EPSILON)

Gradients are computed per tower, averaged with _average_gradients (sketched below), and applied with opt.apply_gradients. Moving averages of the trainable variables and the batch‑norm update ops are also maintained.
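For reference, _average_gradients reduces the per‑tower gradient lists to a single list of (gradient, variable) pairs. A condensed sketch of the function as it appears in inception_train.py:

def _average_gradients(tower_grads):
    # tower_grads: one list of (gradient, variable) pairs per tower.
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # Stack this variable's gradients from all towers and average them.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, 0), 0)
        # Variables are shared across towers, so the first tower's
        # pointer is as good as any.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads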

Practical Verification

This section demonstrates fine‑tuning Inception v3 on a small flower dataset (five classes). It shows how to download the pretrained model, convert the flower images to TFRecord format with download_and_preprocess_flowers.sh, and run flowers_train with the appropriate flags (--fine_tune=True, a reduced learning rate, and so on). After roughly 2000 steps the validation accuracy reaches about 93.4%.

# Build and run the flower preprocessing script
bazel build //inception:download_and_preprocess_flowers
FLOWERS_DATA_DIR=/tmp/flowers-data/
bazel-bin/inception/download_and_preprocess_flowers "$FLOWERS_DATA_DIR"

# Train on flowers
bazel build //inception:flowers_train
# INCEPTION_MODEL_DIR should point at the directory containing the
# unpacked pretrained checkpoint.
MODEL_PATH="${INCEPTION_MODEL_DIR}/inception-v3/model.ckpt-157585"
TRAIN_DIR=/tmp/flowers_train/
bazel-bin/inception/flowers_train \
  --train_dir="$TRAIN_DIR" \
  --data_dir="$FLOWERS_DATA_DIR" \
  --pretrained_model_checkpoint_path="$MODEL_PATH" \
  --fine_tune=True \
  --initial_learning_rate=0.001 \
  --input_queue_memory_factor=1
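What --fine_tune=True changes under the hood: the inference graph is built with restore_logits=False, so the final layer is excluded from the variables restored from the checkpoint and keeps its random initialization. A condensed sketch of the restore logic in train(), abbreviated from inception_train.py:

# Condensed from train() in inception_train.py: when a pretrained
# checkpoint path is supplied, restore only the restorable variables.
# With --fine_tune=True the logits layer is absent from this collection,
# so it starts from scratch while all other weights are loaded.
if FLAGS.pretrained_model_checkpoint_path:
    variables_to_restore = tf.get_collection(
        slim.variables.VARIABLES_TO_RESTORE)
    restorer = tf.train.Saver(variables_to_restore)
    restorer.restore(sess, FLAGS.pretrained_model_checkpoint_path)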

Evaluation is performed with flowers_eval, reporting precision and recall.

Building a New Dataset

Use build_image_data.py to convert a directory‑structured image dataset into sharded TFRecord files. The script expects a training directory and a validation directory in which each sub‑folder corresponds to one class. Label 0 is reserved for an unused background class, so the real class labels start at 1.

$TRAIN_DIR/dog/image0.jpeg
$TRAIN_DIR/dog/image1.jpg
$TRAIN_DIR/cat/weird-image.jpeg
$VALIDATION_DIR/dog/imageA.jpeg
…

Running the script produces sharded files such as train-00000-of-00128 and validation-00000-of-00024.
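Each record in those shards is a serialized tf.train.Example. A minimal sketch for inspecting one record (TF 1.x; the shard path is illustrative, and the feature keys shown are the ones build_image_data.py serializes):

import tensorflow as tf

# Read one record from a shard produced by build_image_data.py and
# print its label fields. The path below is an example, not a fixed name.
path = '/tmp/flowers-data/train-00000-of-00128'
for serialized in tf.python_io.tf_record_iterator(path):
    example = tf.train.Example()
    example.ParseFromString(serialized)
    feature = example.features.feature
    print(feature['image/class/label'].int64_list.value[0])  # integer label (0 = background)
    print(feature['image/class/text'].bytes_list.value[0])   # human-readable class name
    break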

Training Considerations

Key hyper‑parameters include initial_learning_rate, batch_size, and num_gpus. Larger batch sizes enable higher learning rates but require more GPU memory. If GPU memory is insufficient, reduce --batch_size, or spread the same global batch across more GPUs with --num_gpus (see the sketch below). CPU memory usage can be lowered by decreasing --input_queue_memory_factor, though very low values may slightly hurt accuracy.
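The batch_size/num_gpus relationship is mechanical: train() splits each global batch evenly across the towers, so every GPU processes batch_size / num_gpus examples. A fragment mirroring the split in inception_train.py (it assumes images, labels, num_classes, FLAGS, and _tower_loss from the surrounding train() context):

# Each tower receives batch_size / num_gpus examples, which is why adding
# GPUs at a fixed global batch size lowers per-GPU memory pressure.
images_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=images)
labels_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=labels)
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('tower_%d' % i) as scope:
            # Reuse variables after the first tower has created them.
            loss = _tower_loss(images_splits[i], labels_splits[i],
                               num_classes, scope, reuse_variables=(i > 0))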

“A good rule of thumb is to use the largest batch size that fits in GPU memory.”

References and Notes

Inception code repository: https://github.com/tensorflow/models/tree/master/research/inception

Rethinking the Inception Architecture for Computer Vision: https://arxiv.org/pdf/1512.00567.pdf

Analysis of deep neural network models for practical applications (ICLR 2017): https://arxiv.org/pdf/1605.07678.pdf

ImageNet dataset: http://www.image-net.org/

RMSProp optimizer description.

PS: The author is a beginner in machine learning and welcomes corrections.

