
How to Build World-Class Visual AI Technology

This presentation outlines the fundamentals of computer vision, discusses key factors such as algorithm research, large‑scale training platforms, intelligent data processing, and hardware optimization, and shares practical experiences from DeepGlint on building a world‑class visual AI system and its real‑world applications.

DataFunTalk

Speaker: Deng Yafeng, CTO of DeepGlint (格灵深瞳)

Source: AI Pioneer Conference – "How to Build World-Class Visual AI Technology"

Community: DataFun

The talk is divided into several parts. It starts with a brief background on computer‑vision technology; then, drawing on DeepGlint’s practice, it explains how to build a top‑tier visual AI system from the perspectives of algorithm research, training platforms, intelligent data processing, and heterogeneous computing; finally, it presents the company’s real‑world deployment cases.

1. Computer Vision and Related Technologies

Computer vision is one of the most important directions in artificial intelligence. Its foundation is machine‑learning algorithms, especially deep learning, which has become the dominant approach thanks to growing compute power and massive datasets. In many scenarios, including mobile internet, autonomous driving, smart cities, medical imaging, robotics, AR/VR, and industrial automation, vision algorithms have already surpassed human performance on specific tasks. Vision also serves as the digital gateway to the physical world and, combined with big‑data techniques, offers vast application potential.

1.1 Overview of Computer Vision

Computer vision aims to enable computers to understand the visual world. It has evolved from basic object detection and classification to richer tasks such as attribute recognition, behavior analysis, and relationship modeling. The field now covers a spectrum of granularity: from image‑level classification to pixel‑level semantic segmentation.

1.2 Goals of Vision Technology

On the technology side, the goals are richer functionality, broader category coverage, and finer‑grained understanding; on the product side, the demands are higher accuracy, faster speed, lower cost, larger scale, and richer features.

2. How to Build a World‑Class Visual AI System

2.1 Basic Workflow of a Vision System

A typical vision application is decomposed into sub‑modules (e.g., face detection, landmark localization, face recognition). Each module follows the same pipeline: define its inputs and outputs, collect data, annotate it, select a training framework, implement the algorithm, train the model, select the best model, and deploy it.
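As a minimal sketch, the decomposition and the "define I/O" step can be made concrete by having each sub‑module declare its inputs and outputs and checking that the chain is wired correctly. The `ModuleSpec` type, the step names, and the signal names are hypothetical illustrations, not part of DeepGlint's system; the pipeline steps here are placeholders that only record that they ran.

```python
from dataclasses import dataclass, field

# Hypothetical per-module steps mirroring the pipeline described in the text.
# Each step is a placeholder that only records that it ran.
STEPS = [
    "define_io", "collect_data", "annotate", "select_framework",
    "implement_algorithm", "train_model", "select_best_model", "deploy",
]


@dataclass
class ModuleSpec:
    """Illustrative description of one vision sub-module."""
    name: str
    inputs: list
    outputs: list
    log: list = field(default_factory=list)


def check_chain(modules, sources=("image",)):
    """Verify every module's inputs are produced upstream; return all signals."""
    available = set(sources)
    for m in modules:
        missing = [i for i in m.inputs if i not in available]
        if missing:
            raise ValueError(f"{m.name} is missing inputs: {missing}")
        available.update(m.outputs)
    return available


def run_pipeline(module):
    """Walk one module through every step in order (placeholders only)."""
    for step in STEPS:
        module.log.append(f"{module.name}:{step}")
    return module


# A face application decomposed into three chained sub-modules.
face_app = [
    ModuleSpec("face_detection", ["image"], ["face_boxes"]),
    ModuleSpec("landmark_localization", ["image", "face_boxes"], ["landmarks"]),
    ModuleSpec("face_recognition", ["image", "landmarks"], ["face_embedding"]),
]
```

Calling `check_chain(face_app)` returns the set of all signals produced; swapping the first two modules raises `ValueError`, because landmark localization would then have no face boxes to work from.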

2.2 Key Factors for Building a Vision System

Core algorithm R&D: develop more accurate, faster, and feature‑rich algorithms.

Automated large‑scale training framework: support massive clusters and automate training workflows.

Intelligent data mining and labeling: achieve high‑quality data at low cost.

Hardware‑aware computation optimization: choose appropriate chips (GPU, ARM, FPGA, ASIC) and optimize inference (e.g., TensorRT, CUDA).

2.2.1 Core Algorithm R&D

Academic research focuses on novel algorithms, while industry emphasizes functional, performance, and stability metrics that meet business constraints. In practice, algorithm improvement involves data handling, model architecture, loss design, acceleration techniques, and system‑level workflow redesign.

2.2.2 Automated Large‑Scale Training Framework

The platform unifies data management, code/parameter versioning, and resource scheduling, enabling engineers to launch training jobs via a web UI, run multiple hyper‑parameter sets, and automatically evaluate and select the best model.
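The "run multiple hyper‑parameter sets, then automatically evaluate and select the best model" loop can be sketched in a few lines. This is a toy stand‑in, not the platform's API: the hypothetical `train` function fits a one‑variable linear model by gradient descent on synthetic data, and `sweep` keeps whichever configuration has the lowest validation error. On a real platform each configuration would be a separately scheduled training job.

```python
import itertools
import random


def make_data(n, seed):
    """Synthetic data y = 2x + 1 + noise (a stand-in for a real dataset)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def train(xs, ys, lr, epochs):
    """Fit y = w*x + b with full-batch gradient descent on mean squared error."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(model, xs, ys):
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sweep(grid, train_data, val_data):
    """Train one model per hyper-parameter combination; keep the best on validation."""
    results = []
    for lr, epochs in itertools.product(grid["lr"], grid["epochs"]):
        model = train(*train_data, lr=lr, epochs=epochs)
        results.append({"lr": lr, "epochs": epochs,
                        "val_mse": mse(model, *val_data), "model": model})
    best = min(results, key=lambda r: r["val_mse"])
    return best, results
```

For example, `sweep({"lr": [0.001, 0.01, 0.1], "epochs": [10, 100]}, make_data(200, 0), make_data(100, 1))` trains six models and returns the one with the lowest validation error alongside the full result table, which is what a platform would display for comparison.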

Although open‑source frameworks (TensorFlow, PyTorch, MXNet) are powerful, they often lack features for extreme‑scale tasks (e.g., hundreds of millions of classes). DeepGlint built a hybrid data‑parallel + model‑parallel system to train a 100‑million‑class face‑recognition model, distributing both the training data and the classification‑layer weight matrix across multiple machines to overcome GPU memory limits.
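The core of the model‑parallel idea is that the classification layer's weight matrix is split by class across devices, so no single GPU has to hold all class weights. Below is a minimal NumPy sketch, assuming the standard "compute local logits, then reduce the max and the sum across shards" formulation of a distributed softmax; the source does not describe DeepGlint's exact implementation. The list of weight shards stands in for GPUs, the reductions that would be collective operations on a cluster are simulated with Python loops, and the function names are hypothetical. The result matches ordinary softmax cross‑entropy over the full matrix.

```python
import numpy as np


def sharded_softmax_ce(features, weight_shards, labels):
    """Mean softmax cross-entropy with the classifier weights split by class.

    features:      (B, D) embeddings, visible to every shard
    weight_shards: list of (D, C_k) arrays; shard k owns a contiguous class range
    labels:        (B,) global class indices
    """
    # Each shard computes logits only for the classes it owns.
    local_logits = [features @ w for w in weight_shards]
    # Simulated all-reduce(max): per-sample global max for numerical stability.
    global_max = np.max([l.max(axis=1) for l in local_logits], axis=0)
    # Simulated all-reduce(sum): the softmax denominator over all classes.
    global_sum = sum(np.exp(l - global_max[:, None]).sum(axis=1) for l in local_logits)
    # The shard that owns each sample's label contributes its target logit.
    offsets = np.cumsum([0] + [w.shape[1] for w in weight_shards])
    target_logit = np.zeros(len(labels))
    for k, l in enumerate(local_logits):
        owned = (labels >= offsets[k]) & (labels < offsets[k + 1])
        target_logit[owned] = l[owned, labels[owned] - offsets[k]]
    # loss = log(sum_j exp(z_j)) - z_target, written in stabilised form.
    return np.mean(np.log(global_sum) + global_max - target_logit)


def full_softmax_ce(features, weight, labels):
    """Single-device reference for comparison."""
    logits = features @ weight
    m = logits.max(axis=1)
    lse = np.log(np.exp(logits - m[:, None]).sum(axis=1)) + m
    return np.mean(lse - logits[np.arange(len(labels)), labels])
```

Apart from the input features, which every shard needs, only per‑sample scalars (the max, the sum, and the target logit) are exchanged, so the memory for the weight matrix divides across devices while the loss‑related communication stays small.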

2.2.3 Intelligent Data Mining and Annotation

DeepGlint developed an automated face‑labeling pipeline that detects faces, extracts quality metrics, clusters similar faces, removes duplicates, and iteratively refines the model, dramatically reducing manual labeling effort. Similar pipelines are used for license‑plate data collection, vehicle attribute labeling, and other tasks.
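A much‑simplified sketch of the quality‑filter, deduplicate, and cluster steps, assuming faces have already been detected and turned into embedding vectors by some recognition model. The quality scores, the thresholds, and the choice of greedy deduplication plus union‑find linking are illustrative assumptions, not DeepGlint's published method; the iterative model‑refinement loop is omitted.

```python
import numpy as np


def auto_label_faces(embeddings, quality, min_quality=0.5,
                     dup_thresh=0.98, link_thresh=0.7):
    """Return {original_index: cluster_id} for faces that survive filtering.

    embeddings: (N, D) face feature vectors; quality: (N,) scores in [0, 1].
    All thresholds are hypothetical and would be tuned on real data.
    """
    # 1. Drop low-quality faces (blurred, occluded, extreme pose, ...).
    idx = [i for i in range(len(embeddings)) if quality[i] >= min_quality]
    x = embeddings[idx] / np.linalg.norm(embeddings[idx], axis=1, keepdims=True)
    sim = x @ x.T  # cosine similarity between surviving faces

    # 2. Greedy deduplication: skip faces nearly identical to one already kept.
    kept = []
    for a in range(len(idx)):
        if all(sim[a, b] < dup_thresh for b in kept):
            kept.append(a)

    # 3. Link sufficiently similar faces; connected components via union-find.
    parent = list(range(len(kept)))

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p in range(len(kept)):
        for q in range(p + 1, len(kept)):
            if sim[kept[p], kept[q]] >= link_thresh:
                parent[find(p)] = find(q)

    # 4. Relabel components as consecutive cluster ids (candidate identities).
    cluster_of_root, labels = {}, {}
    for p, a in enumerate(kept):
        root = find(p)
        labels[idx[a]] = cluster_of_root.setdefault(root, len(cluster_of_root))
    return labels
```

Each resulting cluster is a candidate identity that an annotator only needs to spot‑check, rather than labeling every face individually.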

2.2.4 Hardware‑Platform Optimization

Beyond algorithmic improvements, selecting appropriate hardware (GPU, ARM, FPGA, DSP, ASIC) and applying model‑level tricks (depth‑wise convolutions, ShuffleNet, neural architecture search, knowledge distillation) are essential for achieving low latency and cost‑effective deployment.
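To see why depth‑wise convolutions matter for latency and cost, compare multiply‑accumulate (MAC) counts. A standard k×k convolution over an H×W feature map costs H·W·k²·C_in·C_out MACs, while a depth‑wise separable version (a k×k depth‑wise convolution followed by a 1×1 point‑wise convolution, as popularized by MobileNet) costs H·W·k²·C_in + H·W·C_in·C_out, a ratio of 1/C_out + 1/k². The layer sizes below are arbitrary examples, not figures from the talk.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    """MACs for a k x k convolution with stride 1 and 'same' padding."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h, w, k, c_in, c_out):
    """MACs for a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


# Example layer: 56 x 56 feature map, 3 x 3 kernel, 128 -> 128 channels.
std = standard_conv_macs(56, 56, 3, 128, 128)
sep = depthwise_separable_macs(56, 56, 3, 128, 128)
print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio {sep / std:.3f}")
```

For this layer the separable version needs about 55 million MACs instead of about 462 million, roughly 8.4× fewer, which is the kind of saving that makes real‑time inference feasible on ARM or other low‑power chips.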

3. DeepGlint Company Overview and Application Cases

3.1 Vision

Founded in 2013, DeepGlint aims to let computers understand the world and make AI improve lives. It focuses on core AI algorithms (especially computer vision) and transforms them into low‑cost, large‑scale deployable products for smart city, smart retail, public safety, etc.

Key achievements include:

Face‑recognition rate of 90% at a false‑accept rate of one in a billion.

Vehicle brand/model recognition covering 20,000 categories.

Person re‑identification achieving 98.1% on Market1501 without test‑time training.
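The face‑recognition figure pairs a recognition rate with a fixed false‑accept rate (FAR); for verification this is usually reported as a true‑accept rate (TAR). A minimal sketch of how such a number is computed from similarity scores, assuming higher scores mean "same person": pick the threshold that accepts at most the target fraction of impostor (different‑person) pairs, then measure the fraction of genuine (same‑person) pairs above it. The function name and toy scores are hypothetical.

```python
import numpy as np


def tar_at_far(genuine, impostor, far):
    """True-accept rate at the threshold whose false-accept rate is <= far.

    genuine:  similarity scores for same-person pairs
    impostor: similarity scores for different-person pairs
    A pair is accepted when its score is strictly above the threshold.
    """
    imp = np.sort(np.asarray(impostor, dtype=float))[::-1]  # descending
    k = int(np.floor(far * len(imp)))  # number of impostors we may accept
    threshold = imp[k] if k < len(imp) else -np.inf
    return float(np.mean(np.asarray(genuine, dtype=float) > threshold))
```

With fewer than a billion impostor pairs, `k` is 0 at a FAR of one in a billion and the threshold simply sits at the highest impostor score, so the operating point is not really resolved; this is one reason such benchmarks need an enormous number of impostor comparisons.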

3.2 DeepGlint Brain Architecture

The "DeepGlint Brain" integrates data, algorithms, and training platforms, continuously producing industry‑leading models (full‑target detection, attribute recognition, face clustering, image‑search, behavior analysis, SLAM, etc.) and powers downstream smart devices, perception clouds, and robots.

3.3 Application Scenarios

Examples include:

Public‑security video search: combine body‑image retrieval with face‑recognition to locate suspects across heterogeneous camera networks.

Smart retail: use face ID to link offline purchases to a unified customer profile, and generate heat maps and foot‑traffic analyses.
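The heat‑map part of the retail analysis reduces to counting positions on a grid. A minimal sketch, assuming person detections have already been mapped to floor‑plane coordinates in metres (for example via per‑camera calibration, which is outside this snippet); the function name and cell size are hypothetical.

```python
import math

import numpy as np


def footfall_heatmap(points, width, height, cell=1.0):
    """Count detections per floor-grid cell.

    points: iterable of (x, y) positions in metres, origin at one store corner.
    Returns a (rows, cols) integer array; points outside the area are ignored.
    """
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = np.zeros((rows, cols), dtype=int)
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            # Clamp guards against floating-point edge cases at the far border.
            r = min(int(y // cell), rows - 1)
            c = min(int(x // cell), cols - 1)
            grid[r, c] += 1
    return grid
```

Accumulating one grid per time window and comparing windows gives a simple view of how traffic shifts through the day.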

4. Future Trends

DeepGlint envisions a data‑driven, self‑learning visual AI system that continuously improves as data, algorithms, and applications evolve. The long‑term goal is to create AI with warmth and real value for humanity.

Author: Deng Yafeng, CTO of Beijing DeepGlint Information Technology Co., Ltd.

