Deep Learning Overview and Introduction to the Lightweight Distributed Inference Engine Avior
This article reviews deep learning and AI frameworks, highlights challenges of online model serving, and presents Avior—a lightweight, distributed inference engine designed for high‑performance AI services, detailing its architecture, layer design, benchmark results, and future development plans.
Deep learning (DL) is a new research direction in machine learning (ML) that brings the field closer to artificial intelligence (AI). It has achieved breakthroughs in Go (AlphaGo), gene editing (CGBE‑SMART), protein structure prediction (AlphaFold), and many applications such as search, data mining, machine translation, natural language processing, multimedia, speech, recommendation, and personalization. These achievements rely on AI frameworks such as TensorFlow, PyTorch, and Caffe.
Most AI frameworks focus on offline training, while online inference serving in industry faces several problems: (1) TensorFlow Serving offers no satisfactory load balancing when a model is deployed across multiple nodes; (2) PyTorch ships no built‑in serving component, so models must be converted to ONNX, which can introduce accuracy loss; (3) no lightweight, general‑purpose inference platform exists—LightSeq supports only NLP sequence models, while solutions such as Kubeflow and Knative hide the networking layer, limiting their suitability for production.
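The accuracy‑loss risk in (2) is typically caught by running the original and converted models on the same inputs and comparing outputs within a tolerance. A minimal, framework‑agnostic sketch of that check—the helper names and tolerance are illustrative, not part of Avior or any converter's API:

```python
def max_abs_diff(ref_outputs, test_outputs):
    """Element-wise maximum absolute difference between two output vectors."""
    return max(abs(r - t) for r, t in zip(ref_outputs, test_outputs))

def validate_conversion(ref_outputs, test_outputs, atol=1e-4):
    """Flag a conversion (e.g. PyTorch -> ONNX) whose outputs drift beyond atol."""
    diff = max_abs_diff(ref_outputs, test_outputs)
    return diff <= atol, diff

# Compare logits from the original model against the converted one.
ok, diff = validate_conversion([1.0, 2.0, 3.0], [1.00005, 2.0, 3.0])
```

In practice the reference outputs would come from the PyTorch model and the test outputs from the ONNX runtime on identical inputs; a drift above the tolerance signals the conversion should not be deployed.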
The Cloud Construction Network R&D team has independently developed a lightweight AI distributed inference engine called Avior (patent application No. 202111043005.9). It is already integrated into supplier search services and is planned for use in recommendation, marketing, advertising, vision, and natural language processing.
Avior is a top‑down development framework that enables rapid integration of algorithm modules and high‑performance deployment, reducing code redundancy. It provides a large‑scale distributed computing platform for AI online inference, supporting tens of thousands of service nodes, automatic load balancing, dynamic scaling, and pay‑per‑use billing.
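Automatic load balancing across service nodes can be as simple as rotating requests over the healthy node set. A hypothetical sketch of that idea—the class and node names are illustrative, not Avior's actual scheduler:

```python
import itertools

class RoundRobinBalancer:
    """Cycle incoming requests across a fixed set of inference nodes."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def pick(self):
        """Return the next node to receive a request."""
        return next(self._cycle)

lb = RoundRobinBalancer(["node-1", "node-2", "node-3"])
picks = [lb.pick() for _ in range(6)]
# picks -> ["node-1", "node-2", "node-3", "node-1", "node-2", "node-3"]
```

A production scheduler would also track node health and current load, but the round‑robin core shows how requests spread evenly without any per‑request coordination.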
As shown in Figure 1, Avior uses GPU/CPU compute nodes to provide the basic compute power for online inference. The platform packages trained models in Docker containers or Kubernetes and embeds a Django/FastAPI server to accept external HTTP requests.
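To illustrate the request flow in Figure 1, here is a self‑contained sketch of a containerized model behind an embedded HTTP endpoint, using only Python's stdlib `http.server`; a real deployment would use Django or FastAPI as the article describes, and `run_model` is a placeholder, not Avior's API:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(payload):
    """Placeholder for the containerized model's forward pass."""
    return {"prediction": sum(payload.get("features", []))}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON request body, run inference, return JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_model(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Serve on an ephemeral port in a background thread and issue one request.
server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": [1, 2, 3]}).encode(),
    headers={"Content-Type": "application/json"},
)
result = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
```

The same shape—model loaded once at container start, a thin HTTP handler translating requests into forward passes—carries over directly to a Django or FastAPI server inside a Docker or Kubernetes pod.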
Avior is organized into four layers:
1. API layer: business‑level logic customization, assembling service components to produce specific inputs and outputs.
2. Service layer: defines atomic service assemblies that achieve particular functions; services may run in parallel or in series (e.g., supplier search, OCR, recommendation).
3. Operator layer: implements concrete functions such as image upload/download, image detection, and text recognition.
4. Component layer: common components used by the framework, including RabbitMQ, the Triton inference server, and OSS storage.
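The layering above can be sketched as plain function composition: operators are atomic steps, a service chains them in series, and the API layer fans independent services out in parallel. All names below are illustrative placeholders, not Avior's actual interfaces:

```python
from concurrent.futures import ThreadPoolExecutor

# Operator layer: atomic functions that each transform the request dict.
def download_image(req):
    req["image"] = f"bytes-of-{req['url']}"
    return req

def detect_text_regions(req):
    req["regions"] = ["region-0", "region-1"]
    return req

def recognize_text(req):
    req["text"] = [f"text-from-{r}" for r in req["regions"]]
    return req

# Service layer: assemble operators in series (an OCR-style pipeline).
def ocr_service(request):
    for op in (download_image, detect_text_regions, recognize_text):
        request = op(request)
    return request

# API layer: run the service over independent requests in parallel.
def handle(requests):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(ocr_service, requests))

results = handle([{"url": "a.jpg"}, {"url": "b.jpg"}])
```

Keeping operators as pure request‑in/request‑out steps is what makes series and parallel assembly interchangeable at the service layer.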
Avior was benchmarked with models trained in three frameworks. Table 1 shows inference latency on CPU and GPU for four tasks; the CPU tests were run on an Intel Xeon Platinum 8269CY @ 2.50 GHz.
| Model | Framework | CPU | GPU |
| --- | --- | --- | --- |
| Face Alignment | TensorFlow | 77 ms | 20 ms |
| OCR Recognition | PyTorch | 231 ms | 76 ms |
| Machine Translation | ONNX | 120 ms | 32 ms |
| Face Semantic Segmentation | PyTorch | 47 ms | 11 ms |
Future work includes automating code pulls from Git, integrating Jenkins for CI/CD, adding monitoring metrics, and eventually open‑sourcing Avior to foster continuous innovation.
Technical practice sharing from the YunZhu Net Technology Team