Upgrading HED Edge Detection to TensorFlow 1.7: Refactored Code and New Layer Techniques
This tutorial walks through rewriting the HED edge‑detection network for TensorFlow 1.7, covering deprecated API fixes, migration from TF‑Slim to tf.layers, matrix initialization, batch normalization nuances, and a comprehensive review of convolution variants such as 1×1, depthwise, separable, and dilated convolutions, plus guidance on transposed convolutions and modern architectures like ResNet and Inception.
Preface
This article continues a previous blog about document scanning with TensorFlow and OpenCV, describing the upgrade of the HED edge‑detection model from TensorFlow 1.0 (which used the TF‑Slim API) to TensorFlow 1.7. The author encountered deprecated‑API errors and a non‑converging model when loading old checkpoints, decided to rewrite the code, and used the opportunity to integrate a year’s worth of new knowledge.
TensorFlow Code Style for CNN Nets
Using the low‑level tf.nn.conv2d API results in verbose code, while the TF‑Slim style keeps layer definitions concise. However, TF‑Slim cannot express many of the newer layer structures, so the author switched to tf.layers, which sits at a slightly lower level of abstraction and provides the flexibility those structures need.
input = ...
with tf.name_scope('conv1_1') as scope:
    # kernel shape: [filter_height, filter_width, in_channels, out_channels]
    kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32, stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(input, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32), trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope)

With TF‑Slim the same operation becomes:
input = ...
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')

The author notes that while TF‑Slim is elegant for standard convolutions, tf.layers handles newer layer structures more comfortably.
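For comparison, a rough tf.layers counterpart of the same layer might look like the following minimal sketch (the padding and activation arguments here are assumptions, not taken from the project code):

import tensorflow as tf

input = tf.placeholder(tf.float32, [None, 224, 224, 64])
# tf.layers.conv2d creates the kernel and bias variables internally
net = tf.layers.conv2d(input, filters=128, kernel_size=[3, 3],
                       padding='same', activation=tf.nn.relu, name='conv1_1')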
Matrix Initialization
Common initializers such as tf.truncated_normal or tf.truncated_normal_initializer work, but the author recommends Xavier initialization for deeper networks:
W = tf.get_variable('W', shape=[784, 256], initializer=tf.contrib.layers.xavier_initializer())
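In the tf.layers style, the same initializer is passed in through the kernel_initializer argument. The sketch below shows one way to do this; it presumably mirrors how the filter_initializer referenced in the later snippet is defined, but that definition is an assumption:

import tensorflow as tf

# assumed definition: Xavier initialization for all convolution kernels
filter_initializer = tf.contrib.layers.xavier_initializer()
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
outputs = tf.layers.conv2d(inputs, 64, [3, 3], padding='same',
                           kernel_initializer=filter_initializer)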
Batch Normalization
Batch Normalization (BN) normalizes a layer's inputs during training, accelerating convergence and allowing higher learning rates. When BN is used, the convolution's bias term is unnecessary because BN's own scale and shift parameters take over that role.
def _vgg_conv2d(inputs, filters, kernel_size):
    # filter_initializer, weights_regularizer, is_training and const.use_batch_norm
    # are defined elsewhere in the author's code
    use_bias = True
    if const.use_batch_norm:
        use_bias = False

    outputs = tf.layers.conv2d(inputs, filters, kernel_size,
                               padding='same',
                               activation=None,
                               use_bias=use_bias,
                               kernel_initializer=filter_initializer,
                               kernel_regularizer=weights_regularizer)
    if const.use_batch_norm:
        outputs = tf.layers.batch_normalization(outputs, training=is_training)
    outputs = tf.nn.relu(outputs)
    return outputs

The common convention is to place BN before the activation function, as in the helper above, although some experiments suggest that placing it after ReLU can give slightly better results. Either way, BN normalizes the input of a layer rather than its output, so it should not be appended after the network's final output layer.
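For reference, a minimal sketch of that alternative ordering (a hypothetical variant, not the helper used in the project):

import tensorflow as tf

def _conv2d_relu_then_bn(inputs, filters, kernel_size, is_training):
    ## hypothetical variant: conv -> ReLU -> BN instead of conv -> BN -> ReLU
    outputs = tf.layers.conv2d(inputs, filters, kernel_size, padding='same',
                               activation=None, use_bias=False,
                               kernel_initializer=tf.contrib.layers.xavier_initializer())
    outputs = tf.nn.relu(outputs)
    outputs = tf.layers.batch_normalization(outputs, training=is_training)
    return outputs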
From Convolution Operations to Various Convolution Layers
Standard convolution uses tf.nn.conv2d with a filter of shape (filter_height, filter_width, in_channels, out_channels): every output channel is computed from all in_channels of the input at each position of the sliding window.
1×1 Convolution operates on one spatial position at a time: each output channel is a linear combination of all input channels at that position, which makes it a cheap way to reduce (or expand) channel dimensionality.
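A minimal sketch of a 1×1 convolution projecting 256 channels down to 64 (the concrete sizes are illustrative only):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 32, 32, 256])
## a 1x1 kernel mixes the 256 input channels into 64 output channels
## at every spatial position, without looking at neighboring pixels
reduced = tf.layers.conv2d(inputs, filters=64, kernel_size=[1, 1], padding='same')
## reduced has shape [None, 32, 32, 64]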
Depthwise Convolution processes each input channel independently, producing in_channels * channel_multiplier output channels:

tf.nn.depthwise_conv2d(input, filter, strides, padding)

Separable Convolution combines a depthwise step with a subsequent 1×1 pointwise convolution, reducing computation compared to a full convolution:

tf.nn.separable_conv2d(input, depthwise_filter, pointwise_filter, strides, padding)

Dilated (atrous) convolution introduces a rate parameter that enlarges the receptive field without adding parameters. Although not used in the HED model, it is essential in architectures such as DeepLab.
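A minimal sketch contrasting the three calls with concrete filter shapes (the sizes are illustrative and not taken from the HED code):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 32, 32, 16])

## depthwise: one 3x3 filter per input channel, channel_multiplier = 2,
## so the output has 16 * 2 = 32 channels
depthwise_filter = tf.get_variable('dw', [3, 3, 16, 2])
dw_out = tf.nn.depthwise_conv2d(inputs, depthwise_filter,
                                strides=[1, 1, 1, 1], padding='SAME')

## separable: the same depthwise step followed by a 1x1 pointwise convolution
## that mixes the 32 intermediate channels into 64 output channels
pointwise_filter = tf.get_variable('pw', [1, 1, 16 * 2, 64])
sep_out = tf.nn.separable_conv2d(inputs, depthwise_filter, pointwise_filter,
                                 strides=[1, 1, 1, 1], padding='SAME')

## dilated (atrous): a standard 3x3 filter applied with rate=2, which samples
## the input with gaps and enlarges the receptive field to 5x5
atrous_filter = tf.get_variable('at', [3, 3, 16, 64])
dilated_out = tf.nn.atrous_conv2d(inputs, atrous_filter, rate=2, padding='SAME')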
Transposed Convolution / Deconvolution Initialization
The HED network uses transposed convolutions for up‑sampling. Initially the author tried bilinear kernel initialization, but later switched to Xavier initialization, achieving comparable training results. The kernel size is set inside the helper function _dsn_deconv2d_with_upsample_factor.
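The exact implementation lives in the project repository; below is a minimal sketch of what such a helper might look like, assuming the kernel size is derived as 2 × upsample_factor (a common convention for HED-style side-output upsampling) and using the Xavier initializer discussed earlier. The signature and kernel-size rule are assumptions, not the author's verbatim code:

import tensorflow as tf

def _dsn_deconv2d_with_upsample_factor(inputs, filters, upsample_factor):
    ## assumption: kernel size follows from the upsample factor
    kernel_size = [2 * upsample_factor, 2 * upsample_factor]
    outputs = tf.layers.conv2d_transpose(
        inputs, filters, kernel_size,
        strides=(upsample_factor, upsample_factor),
        padding='same',
        activation=None,  # the side outputs are fused later, so no activation here
        kernel_initializer=tf.contrib.layers.xavier_initializer())
    return outputs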
From VGG to ResNet, Inception, Xception
VGG’s simple stack of standard convolutions suffers from gradient‑vanishing when deepened. Modern architectures address this:
ResNet: adds identity shortcuts (Y = F(X) + X) so that gradients can flow through the skip connection, mitigating vanishing gradients (a sketch of such a block follows this list).
Inception: widens the network by applying multiple kernel sizes in parallel and uses 1×1 convolutions for dimensionality reduction.
Xception: treats each channel as an independent group, effectively using depthwise separable convolutions.
These ideas can be combined, e.g., Inception‑ResNet.
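As a concrete illustration of the residual idea, here is a minimal sketch of an identity-shortcut block in the tf.layers style used earlier. It is not part of the HED code; the layer sizes and the use of batch normalization are illustrative assumptions:

import tensorflow as tf

def residual_block(inputs, filters, is_training):
    ## F(X): two 3x3 convolutions, each followed by batch normalization
    net = tf.layers.conv2d(inputs, filters, [3, 3], padding='same', use_bias=False)
    net = tf.layers.batch_normalization(net, training=is_training)
    net = tf.nn.relu(net)
    net = tf.layers.conv2d(net, filters, [3, 3], padding='same', use_bias=False)
    net = tf.layers.batch_normalization(net, training=is_training)
    ## Y = F(X) + X: the identity shortcut assumes inputs already have `filters` channels
    return tf.nn.relu(net + inputs)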
References and Resources
The full source code is available on GitHub: https://github.com/fengjian0106/hed-tutorial-for-document-scanning . Additional reading includes tutorials on weight initialization, batch normalization, convolution arithmetic, and papers on DeepLab, ResNet, Inception, and Xception.