
Running a CNN on Mobile: TensorFlow & OpenCV Document Detection Guide

This article walks through a real‑world mobile implementation of a convolutional neural network for document detection, covering problem definition, limitations of traditional OpenCV pipelines, the adoption of a HED edge‑detection network, data preparation, model training, TensorFlow library trimming, and deployment tricks for iOS and Android.

Tencent TDS Service

May 25, 2017

1. Introduction

This piece is not a beginner tutorial on neural networks or machine learning; it uses a real product case to demonstrate key techniques for running a CNN on a mobile device.

2. Requirement

The goal is to locate the four corner coordinates of a rectangular document within an image.
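Downstream code is simpler if the four corners always come back in a fixed order. The sketch below is illustrative and not from the original project: the helper name `order_corners` and the top‑left, top‑right, bottom‑right, bottom‑left convention are assumptions. It sorts the points by angle around their centroid, which gives a consistent clockwise ring for any convex quadrilateral.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def order_corners(pts: List[Point]) -> List[Point]:
    """Return four corners as [top-left, top-right, bottom-right, bottom-left].

    Hypothetical helper. Image coordinates: x grows to the right, y grows down.
    Assumes the four points form a convex quadrilateral.
    """
    if len(pts) != 4:
        raise ValueError("expected exactly four corner points")
    cx = sum(x for x, _ in pts) / 4.0
    cy = sum(y for _, y in pts) / 4.0
    # With y pointing down, increasing atan2 angle around the centroid
    # walks the corners clockwise as seen on screen.
    ring = sorted(pts, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Rotate the ring so it starts at the corner with the smallest x + y.
    start = min(range(4), key=lambda i: ring[i][0] + ring[i][1])
    return ring[start:] + ring[:start]
```

For example, `order_corners([(300, 410), (20, 15), (310, 10), (25, 400)])` yields `[(20, 15), (310, 10), (300, 410), (25, 400)]`.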

3. Traditional Technical Solutions

Typical OpenCV tutorials chain cv2.Canny() and cv2.findContours(), but this pipeline works reliably only on idealized demo images. Real‑world photos contain noise, broken edges, and contours that are not clean rectangles, so the pipeline needs extensive parameter tuning.

4. Limitations of Traditional Methods

Edge detection depends on manually set thresholds, reducing robustness.

The geometric rules built on top of the edge map to pick out the document are complex, and they often fail when edges are broken or irregular.
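The threshold sensitivity is easy to see in the hysteresis step that ends Canny edge detection. The pure‑Python sketch below is a simplified stand‑in for that step, not OpenCV's implementation: it keeps pixels whose gradient magnitude is at least `high`, plus pixels of at least `low` that are 8‑connected to them. The toy input (one image row crossing a document edge with a faint middle section) is an assumption for illustration.

```python
from collections import deque
from typing import List


def hysteresis(mag: List[List[float]], low: float, high: float) -> List[List[bool]]:
    """Canny-style double thresholding on a 2-D gradient-magnitude grid."""
    h, w = len(mag), len(mag[0])
    keep = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed with "strong" pixels.
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:
                keep[y][x] = True
                queue.append((y, x))
    # Grow into "weak" pixels that touch an already kept pixel.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not keep[ny][nx] and mag[ny][nx] >= low:
                    keep[ny][nx] = True
                    queue.append((ny, nx))
    return keep


row = [[0.9, 0.5, 0.3, 0.5, 0.9]]
print(hysteresis(row, low=0.4, high=0.8))   # edge broken at the faint pixel
print(hysteresis(row, low=0.25, high=0.8))  # edge continuous
```

One faint pixel splits the edge under one threshold pair and joins it under another; every downstream rule that expects a closed contour inherits this fragility.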

5. Rethinking the Approach

After exhausting traditional tuning, the team turned to machine learning to improve the two critical steps: edge detection and rectangle extraction.

6. Ineffective Neural‑Network Attempts

6.1 End‑to‑End Regression

Training a network to regress the coordinates of the four corners directly did not produce usable results; precise corner localization turned out to be a poor fit for pure coordinate regression.

6.2 YOLO & FCN

YOLO for object detection and FCN for semantic segmentation did not achieve the required precision and were too heavy for real‑time mobile inference.

7. Effective Neural‑Network Solution

The team replaced the Canny step with a neural network that performs edge detection, simplifying the subsequent geometric algorithm.

7.1 Network Input/Output

The network takes an image and outputs an enhanced edge map suitable for rectangle extraction.

7.2 HED (Holistically‑Nested Edge Detection) Network

HED is built on VGG16 and fuses edge predictions made at multiple scales. VGG16's fully‑connected and softmax layers are removed because per‑pixel edge prediction does not need them; only the five convolutional groups are retained.
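The fusion can be sketched for a single pixel. This is an illustrative sketch, not the team's code: it assumes every side output has already been upsampled to the input resolution, lists the usual HED side‑output taps (the last convolution of each group, at strides 1, 2, 4, 8 and 16), and assumes equal initial fusion weights. `SIDE_OUTPUT_STRIDES` and `fuse_side_outputs` are hypothetical names.

```python
import math
from typing import Optional, Sequence

# Side-output taps of a VGG16-based HED and their stride relative to the input.
SIDE_OUTPUT_STRIDES = {"conv1_2": 1, "conv2_2": 2, "conv3_3": 4, "conv4_3": 8, "conv5_3": 16}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fuse_side_outputs(side_logits: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Fuse one pixel's (upsampled) side-output logits into an edge probability.

    Weighted sum of the side-output logits followed by a sigmoid; in a real
    network the weights are a learned 1x1 convolution. Equal weights (1/N)
    are used as an assumed initialization when none are given.
    """
    n = len(side_logits)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("one weight per side output is required")
    fused_logit = sum(w * s for w, s in zip(weights, side_logits))
    return sigmoid(fused_logit)
```

Because the coarse outputs see more context and the fine outputs localize better, a learned weighting lets the fused map draw on both.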

8. Training the Network

8.1 Loss Function

Originally, the side outputs at every scale contributed to the loss. Later, the loss was computed only on the fused final output, which produced thinner edges.
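The article does not give the loss formula. The sketch below assumes the class‑balanced sigmoid cross‑entropy from the original HED paper, where β is the fraction of non‑edge pixels, so the rare edge pixels are not swamped by the background. The `fused_only` flag (a hypothetical name) mirrors the change described above: with it set, the side‑output terms are dropped and only the fused term remains.

```python
import math
from typing import Sequence


def class_balanced_bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """HED-style class-balanced binary cross-entropy over one flattened edge map.

    beta = |Y-| / |Y| (share of non-edge pixels) weights the edge pixels,
    and 1 - beta weights the non-edge pixels.
    """
    if not labels or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    n_pos = sum(1 for y in labels if y == 1)
    beta = (len(labels) - n_pos) / len(labels)
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            loss -= beta * math.log(p)
        else:
            loss -= (1.0 - beta) * math.log(1.0 - p)
    return loss


def total_loss(fused: Sequence[float], side_outputs: Sequence[Sequence[float]],
               labels: Sequence[int], fused_only: bool = True) -> float:
    """Loss on the fused map, optionally plus every side output's loss."""
    loss = class_balanced_bce(fused, labels)
    if not fused_only:
        loss += sum(class_balanced_bce(s, labels) for s in side_outputs)
    return loss
```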

8.2 Transposed Convolution Initialization

The transposed‑convolution (up‑sampling) layers were initialized with bilinear interpolation kernels and trained with a small learning rate, which helped the network converge.
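A bilinear kernel makes an untrained transposed convolution behave like ordinary bilinear up‑sampling, so training starts from a sensible output. The sketch below uses the construction commonly used for FCN‑style networks; `bilinear_kernel` is a hypothetical name, and the kernel size 2 × factor is the usual choice rather than a value stated in the article.

```python
from typing import List


def bilinear_kernel(size: int) -> List[List[float]]:
    """2-D bilinear interpolation kernel of shape (size, size).

    Suitable for a transposed convolution that up-samples by
    factor = (size + 1) // 2, typically with size = 2 * factor.
    """
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    one_d = [1.0 - abs(i - center) / factor for i in range(size)]
    # The 2-D kernel is the outer product of the 1-D triangle filter.
    return [[a * b for b in one_d] for a in one_d]


for row in bilinear_kernel(4):  # kernel for 2x up-sampling
    print(row)
```

In TensorFlow the filter of `tf.nn.conv2d_transpose` has shape `[height, width, output_channels, in_channels]`; for HED's single‑channel side outputs this kernel fills the one channel pair.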

8.3 Cold‑Start Training

Training began with a small sample set (≈2,000 images) for a few thousand iterations; if convergence was not observed, the run was aborted and restarted.
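The abort‑and‑restart procedure can be expressed as a small driver loop. This is a minimal sketch under stated assumptions: the article does not define "convergence", so the criterion here (a relative loss drop of at least `required_drop` over `probe_iters` iterations) is hypothetical, as are all names; `make_run(seed)` stands in for building a freshly initialized model on the small warm‑up set.

```python
from typing import Callable, Optional


def cold_start(make_run: Callable[[int], Callable[[], float]],
               probe_iters: int, required_drop: float,
               max_restarts: int) -> Optional[int]:
    """Try fresh runs until one shows signs of converging.

    make_run(seed) returns a step function that performs one training
    iteration and returns the loss. A run is kept if its loss after
    probe_iters iterations has dropped by at least required_drop
    (relative to the first iteration). Returns the seed, or None.
    """
    for seed in range(max_restarts):
        step = make_run(seed)
        first = last = step()
        for _ in range(probe_iters - 1):
            last = step()
        if first > 0 and (first - last) / first >= required_drop:
            return seed  # continue full training from this run
    return None  # no run converged; revisit data or hyperparameters
```

Usage with fake loss curves: a flat run is rejected and a decaying one is kept.

```python
runs = {0: [1.0] * 10, 1: [0.9 ** i for i in range(10)]}
print(cold_start(lambda s: iter(runs[s]).__next__, probe_iters=10, required_drop=0.5, max_restarts=2))
```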

9. Training Dataset

The training set combined ≈80,000 synthetic images with ≈1,200 manually annotated real photos, covering a wide range of perspectives and backgrounds.
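The article does not describe how the synthetic images were produced. One plausible approach, sketched below as an assumption, is to jitter the corners of a rectangle to mimic perspective and rasterize the quad's four sides into a one‑pixel‑wide edge label, the same format as HED's output. Compositing a document texture onto varied backgrounds is omitted, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[int, int]


def random_quad(rng: random.Random, w: int, h: int, jitter: float = 0.15) -> List[Point]:
    """Hypothetical generator: corners [TL, TR, BR, BL] of a document-like quad.

    Starts from a rectangle inset 20% from each border and moves every corner
    independently by up to `jitter` of the image size to mimic perspective.
    """
    mx, my = int(w * 0.2), int(h * 0.2)
    base = [(mx, my), (w - 1 - mx, my), (w - 1 - mx, h - 1 - my), (mx, h - 1 - my)]
    dx, dy = int(w * jitter), int(h * jitter)
    return [
        (min(max(x + rng.randint(-dx, dx), 0), w - 1),
         min(max(y + rng.randint(-dy, dy), 0), h - 1))
        for x, y in base
    ]


def draw_line(mask: List[List[int]], p0: Point, p1: Point) -> None:
    """Bresenham's line algorithm: set every pixel on segment p0-p1 to 1."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        mask[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def edge_label(corners: List[Point], w: int, h: int) -> List[List[int]]:
    """Binary edge map (h rows, w columns) with the quad's four sides drawn."""
    mask = [[0] * w for _ in range(h)]
    for i in range(4):
        draw_line(mask, corners[i], corners[(i + 1) % 4])
    return mask
```

Seeding `random.Random` per sample keeps the synthetic set reproducible, and the corner list can be stored alongside the mask for evaluating the final corner output.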

10. Running TensorFlow on Mobile Devices

10.1 Using TensorFlow Libraries

TensorFlow ships libraries for both iOS and Android, but if the app also links another copy of protobuf, the version conflict may require adjusting the protobuf namespace or patching the library by hand.

10.2 Deploying the Trained Model

After training, the checkpoint is converted into a frozen .pb graph (variables folded into constants), which the app loads directly through the TensorFlow C++ API on mobile.

11. Debugging a Crash

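A crash reporting a missing Mul operation was traced to ops that the mobile TensorFlow build does not support. The offending ops came from the deconvolution (transposed‑convolution) code, which used tf.shape and tf.pack (renamed tf.stack in TensorFlow 1.0); that code was rewritten to avoid both.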
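One way to avoid run‑time shape ops is to compute the transposed convolution's output shape in plain Python when the graph is built, then pass the resulting list as the `output_shape` argument of `tf.nn.conv2d_transpose`. This sketch assumes the input resolution and batch size are fixed at graph‑construction time (for example, the app always feeds frames resized to one size); the helper names are hypothetical.

```python
from typing import List


def deconv_output_size(in_size: int, kernel: int, stride: int, padding: str = "SAME") -> int:
    """Static output length of a transposed convolution along one axis."""
    if padding == "SAME":
        return in_size * stride
    if padding == "VALID":
        return in_size * stride + max(kernel - stride, 0)
    raise ValueError(f"unknown padding: {padding}")


def static_output_shape(batch: int, height: int, width: int, channels: int,
                        kernel: int, stride: int, padding: str = "SAME") -> List[int]:
    """NHWC output shape as plain Python ints.

    Because these are constants, the graph needs no tf.shape / tf.pack
    nodes (and none of the arithmetic ops that come with them).
    """
    return [batch,
            deconv_output_size(height, kernel, stride, padding),
            deconv_output_size(width, kernel, stride, padding),
            channels]


# A stride-16 side output on a 256x256 input has a 16x16 feature map.
print(static_output_shape(1, 16, 16, 1, kernel=32, stride=16))
```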

12. Trimming TensorFlow

Only the required ops (46 out of 200+) were kept in tf_op_files.txt, reducing the library size dramatically.
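Deciding which ops to keep starts with listing the op types the frozen graph actually contains, then checking them against the trimmed build before shipping. The sketch works on plain `(name, op)` pairs; with a real `GraphDef` protobuf these would come from `graph_def.node`, whose `NodeDef` entries carry `name` and `op` fields. The kept‑op list in the example is hypothetical, and mapping ops to the kernel source files listed in tf_op_files.txt is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Node = Tuple[str, str]  # (node name, op type)


def op_histogram(nodes: Iterable[Node]) -> Counter:
    """Count how many nodes use each op type."""
    return Counter(op for _, op in nodes)


def missing_ops(nodes: Iterable[Node], kept_ops: Iterable[str]) -> List[str]:
    """Op types the graph needs that the trimmed build does not keep."""
    needed = {op for _, op in nodes}
    return sorted(needed - set(kept_ops))


nodes = [("input", "Placeholder"), ("conv1/Conv2D", "Conv2D"),
         ("conv1/BiasAdd", "BiasAdd"), ("conv1/Relu", "Relu"),
         ("conv2/Conv2D", "Conv2D"), ("up/Shape", "Shape")]
print(missing_ops(nodes, ["Placeholder", "Conv2D", "BiasAdd", "Relu"]))  # ['Shape']
```

This checks op types only; a trimmed build can still lack a kernel for a particular data type of an op it keeps, which this comparison would not catch.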

13. Model Pruning

By reducing the number of filters in each VGG group, the model size dropped from 56 MB to 4.2 MB while maintaining ~0.1 s per‑frame inference on an iPhone 7 Plus.
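The 56 MB starting point is roughly what the float32 weights of VGG16's convolutional base occupy, and because each 3×3 layer has about k²·C_in·C_out weights, the parameter count falls roughly with the square of the channel count. The sketch assumes a uniform width multiplier across all groups, which may differ from the per‑group filter counts the team actually chose, and ignores HED's small side‑output and fusion layers; the names are hypothetical.

```python
from typing import List

# Output channels of each 3x3 conv layer in VGG16's five groups (FC layers omitted).
VGG16_GROUPS: List[List[int]] = [[64, 64], [128, 128], [256, 256, 256],
                                 [512, 512, 512], [512, 512, 512]]


def conv_params(width: float = 1.0, in_channels: int = 3, k: int = 3) -> int:
    """Weights + biases of the five conv groups with channels scaled by `width`."""
    total, c_in = 0, in_channels
    for group in VGG16_GROUPS:
        for c in group:
            c_out = max(1, round(c * width))
            total += k * k * c_in * c_out + c_out
            c_in = c_out
    return total


def size_mib(params: int, bytes_per_param: int = 4) -> float:
    """Model size in MiB, assuming float32 storage."""
    return params * bytes_per_param / 2 ** 20


print(conv_params(), round(size_mib(conv_params()), 1))  # 14714688 56.1
print(conv_params(0.5))  # about 4x fewer parameters
```

Under the uniform assumption, which the article does not state, the reported drop from 56 MB to 4.2 MB (about 13×) corresponds to keeping roughly a quarter of the channels in every layer.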

14. Choosing TensorFlow APIs

Higher‑level APIs such as TensorFlow‑Slim were used to improve code readability and reuse.

15. Complementary OpenCV Algorithm

After HED edge detection, HoughLinesP extracts line segments, which are extended, merged, and filtered to compute intersection points and finally select the best rectangle.
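The corner‑finding step can be sketched without OpenCV. `cv2.HoughLinesP` returns each segment as `x1, y1, x2, y2`; the sketch below groups segments into near‑horizontal and near‑vertical sets and intersects the extended lines of every horizontal/vertical pair to get corner candidates. The angle tolerance and helper names are assumptions, and the article's merging, filtering and rectangle‑scoring heuristics are not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float, float, float]  # x1, y1, x2, y2


def line_intersection(a: Segment, b: Segment, eps: float = 1e-9) -> Optional[Tuple[float, float]]:
    """Intersection of the infinite lines through a and b (segments extended),
    or None if they are (nearly) parallel."""
    x1, y1, x2, y2 = a
    x3, y3, x4, y4 = b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < eps:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def angle_deg(s: Segment) -> float:
    """Undirected segment angle in [0, 180)."""
    x1, y1, x2, y2 = s
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180


def candidate_corners(segments: Sequence[Segment], tol: float = 30.0) -> List[Tuple[float, float]]:
    """Intersect every near-horizontal segment with every near-vertical one."""
    horiz = [s for s in segments if min(angle_deg(s), 180 - angle_deg(s)) <= tol]
    vert = [s for s in segments if abs(angle_deg(s) - 90) <= tol]
    pts = []
    for h in horiz:
        for v in vert:
            p = line_intersection(h, v)
            if p is not None:
                pts.append(p)
    return pts
```

Extending the lines before intersecting is what lets broken or partially occluded document edges still yield all four corners.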

16. Summary

Algorithm Perspective

Parameter tuning is largely empirical.

Neural‑network development is an experimental science.

Labeling data is costly and often a bottleneck.

Balancing accuracy, model size, and speed is essential.

Engineering Perspective

When an end‑to‑end network fails, a pipeline that applies a network to one targeted step (here, edge detection) can succeed.

Master at least one deep‑learning framework and maintain high code quality.

Learn core patterns and adapt them across problems.

Bridge academic advances with practical engineering constraints.
