Deep Learning Overview and Introduction to the Lightweight Distributed Inference Engine Avior
This article reviews deep learning and AI frameworks, highlights challenges of online model serving, and presents Avior—a lightweight, distributed inference engine designed for high‑performance AI services, detailing its architecture, layer design, benchmark results, and future development plans.
Deep learning (DL) is a new research direction in machine learning (ML) that brings the field closer to artificial intelligence (AI). It has achieved breakthroughs in Go (AlphaGo), gene editing (CGBE‑SMART), protein structure prediction (AlphaFold), and many applications such as search, data mining, machine translation, natural language processing, multimedia, speech, recommendation, and personalization. These achievements rely on AI frameworks such as TensorFlow, PyTorch, and Caffe.
Most AI frameworks focus on offline training, while online inference serving in industry faces several problems: (1) TensorFlow Serving offers no satisfactory load balancing when a model is deployed across multiple nodes; (2) PyTorch ships no built‑in serving component, so models must be converted to ONNX, which can introduce accuracy loss; (3) no lightweight, general‑purpose inference platform exists—LightSeq supports only NLP sequence models, while solutions such as Kubeflow and Knative hide the networking layer, limiting their suitability for production.
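The accuracy‑loss risk in (2) is typically caught by running the original and converted models on the same inputs and comparing outputs within a tolerance. A minimal, framework‑agnostic sketch of that check—the helper names and tolerance are illustrative, not part of Avior or any converter's API:

```python
def max_abs_diff(ref_outputs, test_outputs):
    """Element-wise maximum absolute difference between two output vectors."""
    return max(abs(r - t) for r, t in zip(ref_outputs, test_outputs))

def validate_conversion(ref_outputs, test_outputs, atol=1e-4):
    """Flag a conversion (e.g. PyTorch -> ONNX) whose outputs drift beyond atol."""
    diff = max_abs_diff(ref_outputs, test_outputs)
    return diff <= atol, diff

# Compare logits from the original model against the converted one.
ok, diff = validate_conversion([1.0, 2.0, 3.0], [1.00005, 2.0, 3.0])
```

In practice the reference outputs would come from the PyTorch model and the test outputs from the ONNX runtime on identical inputs; a drift above the tolerance signals the conversion should not be deployed.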
The Cloud Construction Network R&D team has independently developed a lightweight AI distributed inference engine called Avior (patent application No. 202111043005.9). It is already integrated into supplier search services and is planned for use in recommendation, marketing, advertising, vision, and natural language processing.
Avior is a top‑down development framework that enables rapid integration of algorithm modules and high‑performance deployment, reducing code redundancy. It provides a large‑scale distributed computing platform for AI online inference, supporting tens of thousands of service nodes, automatic load balancing, dynamic scaling, and pay‑per‑use billing.
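Automatic load balancing across service nodes can be as simple as rotating requests over the healthy node set. A hypothetical sketch of that idea—the class and node names are illustrative, not Avior's actual scheduler:

```python
import itertools

class RoundRobinBalancer:
    """Cycle incoming requests across a fixed set of inference nodes."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def pick(self):
        """Return the next node to receive a request."""
        return next(self._cycle)

lb = RoundRobinBalancer(["node-1", "node-2", "node-3"])
picks = [lb.pick() for _ in range(6)]
# picks -> ["node-1", "node-2", "node-3", "node-1", "node-2", "node-3"]
```

A production scheduler would also track node health and current load, but the round‑robin core shows how requests spread evenly without any per‑request coordination.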
As shown in Figure 1, Avior uses GPU/CPU compute nodes to provide the basic compute power for online inference. The platform packages trained models in Docker containers or Kubernetes and embeds a Django/FastAPI server to accept external HTTP requests.
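To illustrate the request flow in Figure 1, here is a self‑contained sketch of a containerized model behind an embedded HTTP endpoint, using only Python's stdlib `http.server`; a real deployment would use Django or FastAPI as the article describes, and `run_model` is a placeholder, not Avior's API:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(payload):
    """Placeholder for the containerized model's forward pass."""
    return {"prediction": sum(payload.get("features", []))}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON request body, run inference, return JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_model(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Serve on an ephemeral port in a background thread and issue one request.
server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": [1, 2, 3]}).encode(),
    headers={"Content-Type": "application/json"},
)
result = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
```

The same shape—model loaded once at container start, a thin HTTP handler translating requests into forward passes—carries over directly to a Django or FastAPI server inside a Docker or Kubernetes pod.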
Avior is organized into four layers:
1. API layer: business‑level logic customization, assembling service components to produce specific inputs and outputs.
2. Service layer: defines atomic service assemblies that achieve particular functions; services may run in parallel or in series (e.g., supplier search, OCR, recommendation).
3. Operator layer: implements concrete functions such as image upload/download, image detection, and text recognition.
4. Component layer: common components used by the framework, including RabbitMQ, the Triton inference server, and OSS storage.
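The layering above can be sketched as plain function composition: operators are atomic steps, a service chains them in series, and the API layer fans independent services out in parallel. All names below are illustrative placeholders, not Avior's actual interfaces:

```python
from concurrent.futures import ThreadPoolExecutor

# Operator layer: atomic functions that each transform the request dict.
def download_image(req):
    req["image"] = f"bytes-of-{req['url']}"
    return req

def detect_text_regions(req):
    req["regions"] = ["region-0", "region-1"]
    return req

def recognize_text(req):
    req["text"] = [f"text-from-{r}" for r in req["regions"]]
    return req

# Service layer: assemble operators in series (an OCR-style pipeline).
def ocr_service(request):
    for op in (download_image, detect_text_regions, recognize_text):
        request = op(request)
    return request

# API layer: run the service over independent requests in parallel.
def handle(requests):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(ocr_service, requests))

results = handle([{"url": "a.jpg"}, {"url": "b.jpg"}])
```

Keeping operators as pure request‑in/request‑out steps is what makes series and parallel assembly interchangeable at the service layer.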
Avior was benchmarked with models trained in three frameworks. Table 1 shows inference latency on CPU and GPU for four tasks; the CPU tests were run on an Intel Xeon Platinum 8269CY @ 2.50 GHz.
| Model | Framework | CPU | GPU |
| --- | --- | --- | --- |
| Face Alignment | TensorFlow | 77 ms | 20 ms |
| OCR Recognition | PyTorch | 231 ms | 76 ms |
| Machine Translation | ONNX | 120 ms | 32 ms |
| Face Semantic Segmentation | PyTorch | 47 ms | 11 ms |
Future work includes automating code pulls from Git, integrating Jenkins for CI/CD, adding monitoring metrics, and eventually open‑sourcing Avior to foster continuous innovation.
Technical practice sharing from the YunZhu Net Technology Team