
Building Image Recognition Systems: From Basics to Advanced AI Techniques

This article summarizes a computer‑vision salon in which Dr. Ji Yongnan covers imaging pipelines, traditional feature‑based methods, deep‑learning breakthroughs, Tencent Cloud AI services, and real‑world case studies, then answers audience questions on machine vision versus computer vision and on coping with scarce training data.

Tencent Cloud Developer

Apr 16, 2019

The April 13 computer‑vision salon was led by Dr. Ji Yongnan, who holds a Ph.D. from the University of Nottingham and is a senior researcher at Tencent Cloud AI. He gave an overview of building image‑recognition systems, from fundamental imaging concepts to current AI applications.

Imaging Pipeline Overview

The speaker divided the pipeline into four layers. The first, the imaging layer, covers standard RGB cameras, industrial cameras, 3D structured‑light and time‑of‑flight (ToF) sensors, infrared, CT and other medical imaging, and remote‑sensing modalities. The second layer handles low‑level processing such as denoising and the extraction of geometric features (points, lines, planes). The third layer focuses on mid‑level tasks such as object detection, segmentation, and registration. The fourth layer comprises high‑level applications, including face recognition, autonomous driving, and other AI‑driven services.

Traditional Image‑Processing Techniques

Early methods relied on spatial‑ and frequency‑domain processing (Gaussian filtering, Fourier and wavelet transforms) and on handcrafted features. Classic features such as Haar‑like features, SIFT, and HOG, typically paired with a classifier, were used for classification and localization. Segmentation techniques included watershed, MSER (maximally stable extremal regions), level‑set methods, and ASM (active shape models) for shape‑aware segmentation.
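
As an illustration of the spatial filtering mentioned above, here is a minimal sketch of Gaussian smoothing in plain Python. It is not code from the talk: the function names, the 3x3 kernel size, the sigma value, and the edge‑clamping border rule are all assumptions chosen to keep the example short.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def smooth(image, kernel):
    """Filter a 2D grayscale image (list of rows) with a symmetric kernel.

    Border pixels are handled by clamping coordinates to the image edge
    (an assumption; other border rules are equally valid).
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for kr, krow in enumerate(kernel):
                for kc, weight in enumerate(krow):
                    rr = min(max(r + kr - half, 0), h - 1)
                    cc = min(max(c + kc - half, 0), w - 1)
                    acc += weight * image[rr][cc]
            out[r][c] = acc
    return out

# A flat 5x5 image with one noisy spike in the centre.
noisy = [[10.0] * 5 for _ in range(5)]
noisy[2][2] = 100.0
smoothed = smooth(noisy, gaussian_kernel(3, 1.0))
```

After smoothing, the spike at the centre is pulled toward its neighbours while the flat corners are unchanged. In practice a library routine would replace these loops; the sketch only shows what the filter computes.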

Deep‑Learning Evolution

Convolutional neural networks (CNNs), GPU acceleration, and the availability of large pre‑trained models made it feasible to train deeper networks. A typical classification network consists of convolutional layers followed by fully‑connected layers. Object‑detection networks add region‑proposal modules, while segmentation often employs U‑Net‑style encoder‑decoder architectures. These advances have substantially improved performance across many vision tasks.
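
To make the "convolutional layers followed by fully‑connected layers" structure concrete, the following is a toy forward pass in plain Python: one convolution, a ReLU, a flatten step, and one fully‑connected layer. The layer sizes, random weights, and function names are assumptions for illustration only; a real network would be trained and would use a deep‑learning framework.

```python
import random

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN conv layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]

def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def fully_connected(vector, weights, biases):
    """Dense layer: `weights` holds one row of coefficients per output."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[rng.random() for _ in range(6)] for _ in range(6)]          # 6x6 input
kernel = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]   # 3x3 filter

features = relu(conv2d_valid(image, kernel))   # 4x4 feature map
flat = [v for row in features for v in row]    # flatten to 16 values
num_classes = 3                                # hypothetical class count
weights = [[rng.uniform(-1, 1) for _ in flat] for _ in range(num_classes)]
biases = [0.0] * num_classes
scores = fully_connected(flat, weights, biases)  # one raw score per class
```

The weights here are random, so the scores carry no meaning. The point is the data flow: a 6x6 image becomes a 4x4 feature map, which is flattened and mapped to one score per class.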

Tencent Cloud AI Services

Tencent Cloud now offers high‑level APIs for OCR, video analysis, and image processing, including face‑landmark detection with up to 100 points. The platform provides virtual machines, compute resources, and a suite of tools that enable developers to build applications from coarse‑grained to fine‑grained levels.

Real‑World Case Studies

The case studies included a face‑fusion pipeline (localization → registration → segmentation → rendering) and an industrial defect‑detection system for smartphone‑screen production lines. In the latter, the goal is to separate defect regions (e.g., black spots) from a relatively static background, using both traditional and learning‑based methods.
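
One traditional way to separate defects from a static background is to subtract a reference image, threshold the difference, and group the remaining pixels into connected regions. The sketch below shows that idea in plain Python. It is an assumed approach, not the system described in the talk, and the threshold, image values, and function name are hypothetical.

```python
from collections import deque

def find_defects(image, background, threshold=50):
    """Return bounding boxes (top, left, bottom, right) of connected regions
    whose intensity differs from the background by more than `threshold`."""
    h, w = len(image), len(image[0])
    mask = [[abs(image[r][c] - background[r][c]) > threshold for c in range(w)]
            for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Hypothetical 6x6 screen patch: uniform brightness 200 with one 2x2 black spot.
background = [[200] * 6 for _ in range(6)]
screen = [row[:] for row in background]
for y in (2, 3):
    for x in (3, 4):
        screen[y][x] = 20
boxes = find_defects(screen, background)  # [(2, 3, 3, 4)]
```

This only works when the background really is stable and the images are aligned. Lighting drift or misregistration would need to be corrected first, which is one reason learning‑based methods are used alongside it.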

Audience Q&A Highlights

The Q&A covered three main topics. The first was the distinction between machine vision, which is often industrial and built on traditional methods, and computer vision, which is broader in scope and pursued by major tech firms. The second was the maturity of classification and detection models, which are stable for generic scenarios but limited in niche cases. The third was how to handle scarce training data, with suggested strategies including careful problem definition, data augmentation, and custom synthetic data generation.
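
As a small example of the data augmentation mentioned above, the sketch below expands one image into eight variants using flips and 90‑degree rotations. The function names are hypothetical, and the choice of transforms is an assumption: it only suits tasks where orientation does not change the label.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Return 8 variants: the 4 rotations, each with and without a flip."""
    variants = []
    current = image
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants

sample = [[1, 2],
          [3, 4]]
augmented = augment(sample)  # 8 images derived from a single sample
```

For detection or segmentation, the same transform has to be applied to the bounding boxes or masks so that labels stay aligned with the pixels.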