
How Alibaba Accelerated Tengine Gzip with Intel QAT: A Performance Case Study

This article examines how Alibaba's Tengine access layer tackled the CPU cost of Gzip compression by offloading it to Intel QAT accelerator cards. It covers the analysis, the implementation challenges, the performance gains, and the operational safeguards behind a result of roughly 15% lower CPU usage and reduced system load.

Alibaba Cloud Developer

Background

As general-purpose CPUs run up against the limits of Moore's Law while machine-learning and web-service workloads keep growing rapidly, Alibaba began exploring hardware acceleration for its Tengine access layer. Gzip compression alone consumes 15–20% of Tengine's CPU time, which makes it a natural candidate for hardware offload to improve performance and reduce cost.

Analysis and Research

Hardware acceleration moves work from software running on the CPU to dedicated hardware that can execute it more efficiently. Two main types of accelerator were considered:

FPGA – field‑programmable gate array, customizable for specific algorithms (e.g., smart NICs).

ASIC – application‑specific integrated circuit, such as Intel QAT cards that accelerate SSL, compression, and decompression.

The original comparison tables (omitted here) lay out the trade-offs. Alibaba evaluated three solutions:

Intel QAT Card

QAT (QuickAssist Technology) accelerates the RSA, ECDH, ECDSA, DH, and DSA public-key algorithms and provides a zlib-compatible compression shim, so existing code needs only minimal changes.
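Because the shim preserves zlib's interface, the code that drives compression stays the same whichever backend sits underneath. A minimal sketch of that caller side follows, using Python's standard `zlib` module as a stand-in for the C API: `compressobj` plays the role of `deflateInit2`, `compress` of `deflate`, and `flush` of the final `deflate` call with `Z_FINISH`. The function name `gzip_stream` is hypothetical, and the shim's internals are not modeled.

```python
import zlib


def gzip_stream(chunks, level=6):
    """Compress an iterable of byte chunks into a single gzip stream.

    Follows the init / deflate-per-chunk / finish sequence that a web
    server's gzip filter drives through the zlib API.
    """
    # wbits=31 (15 + 16): 32 KiB window plus a gzip header and trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    out = []
    for chunk in chunks:
        out.append(comp.compress(chunk))  # may return b"" while zlib buffers
    out.append(comp.flush(zlib.Z_FINISH))  # remaining data + gzip trailer
    return b"".join(out)


if __name__ == "__main__":
    body = [b"<html>", b"hello " * 1000, b"</html>"]
    gz = gzip_stream(body)
    # wbits=31 on decompression accepts the gzip framing and checks the CRC.
    assert zlib.decompress(gz, 31) == b"".join(body)
    print(len(b"".join(body)), "->", len(gz), "bytes")
```

Under the shim approach, a loop like this would not need to change when compression moves to QAT; only the library beneath it does.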

Intelligent NIC

An intelligent NIC (INIC) offers two modes: (a) the host calls an API and receives the compressed data back; (b) the host sends uncompressed packets, and the NIC compresses and re-packetizes them. Both require significant integration effort.

FPGA Card

An FPGA solution would require reimplementing the zlib algorithm in hardware and writing a new driver, at high development cost.

After comparison, the QAT ASIC was selected for Tengine Gzip offload.

Implementation

The QAT driver is built on Userspace I/O (UIO) and keeps most of its logic in user space, which simplifies debugging and sidesteps the kernel's restrictions on floating-point use. SR-IOV lets a single PCIe device be shared by up to 32 virtual machines. The acceleration chain runs from the zlib shim through the QAT user-space API to the QAT driver, so upper-level services are barely affected.

Key challenges addressed:

The initial driver caused high kernel-mode CPU usage from ioctl calls and memory allocation. It was replaced with an out-of-tree (OOT) memory manager, USDM (User Space DMA-able Memory), backed by a huge-page pool.

After acceleration was enabled, open, ioctl, and futex system calls spiked; the driver and shim were tuned to reduce how often these calls are made.

Reloading workers could exhaust the limited pool of 64 QAT instances. An updated driver raised the limit to 256 and added automatic fallback to software compression.

Huge-page memory leaks in the shim led to QAT core dumps; correcting the lifecycle management of the deflate and inflate calls eliminated the leaks.
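The article does not show the shim's code. Two ideas from the list above, reserving memory once in a pool instead of allocating per request, and releasing every per-stream resource when the stream ends (including on error paths), can be sketched together in Python. `BufferPool` and `deflate_session` are hypothetical names; an ordinary `bytearray` stands in for USDM's huge pages, and the buffer only models per-stream pinned memory, since Python's `zlib` manages its own.

```python
import zlib
from contextlib import contextmanager


class BufferPool:
    """Fixed-size buffers carved from one up-front allocation (hypothetical).

    Memory is reserved once; requests only move buffers on and off a
    free list, so no per-request allocation is needed.
    """

    def __init__(self, buf_size: int, count: int):
        self._backing = bytearray(buf_size * count)  # the single allocation
        view = memoryview(self._backing)
        self._free = [view[i * buf_size:(i + 1) * buf_size] for i in range(count)]
        self.count = count

    def acquire(self):
        """Return a free buffer, or None if the pool is exhausted."""
        return self._free.pop() if self._free else None

    def release(self, buf) -> None:
        self._free.append(buf)

    def in_use(self) -> int:
        return self.count - len(self._free)


@contextmanager
def deflate_session(pool: BufferPool, level: int = 6):
    """Tie a per-stream buffer to the lifetime of one compression stream.

    The buffer taken at "init" goes back to the pool at "end" on every
    path, including exceptions; skipping that release is what leaks.
    """
    buf = pool.acquire()
    if buf is None:
        raise MemoryError("buffer pool exhausted")
    try:
        # wbits=31 selects gzip framing.
        yield zlib.compressobj(level, zlib.DEFLATED, 31), buf
    finally:
        pool.release(buf)
```

Usage, including a stream that fails partway through:

```python
pool = BufferPool(buf_size=64 * 1024, count=8)
with deflate_session(pool) as (comp, _buf):
    gz = comp.compress(b"hello " * 100) + comp.flush()
try:
    with deflate_session(pool):
        raise RuntimeError("client aborted mid-response")
except RuntimeError:
    pass
print(pool.in_use())  # 0: both buffers were returned
```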

Operational safeguards include automatic detection of QAT availability, deployment of dual binaries (software vs. hardware), and runtime fallback to software compression when resources are insufficient.
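The runtime side of these safeguards, deciding at startup whether QAT is present and falling back to software per request when no instance is free, can be sketched as follows. This is an illustration under assumptions, not the team's code: the device path `/dev/qat_adf_ctl`, the `InstancePool` class, and `offload_gzip` are hypothetical, and `offload_gzip` simply reuses software zlib because real QAT submission is not modeled. Dual-binary deployment happens outside the process and is not shown.

```python
import os
import zlib

# Assumed device node used only for illustration.
QAT_DEVICE = "/dev/qat_adf_ctl"


def qat_present(device: str = QAT_DEVICE) -> bool:
    """Startup check: treat QAT as usable only if its device node exists."""
    return os.path.exists(device)


class InstancePool:
    """Bounded pool of accelerator instances, modeled as integers."""

    def __init__(self, size: int):
        self._free = list(range(size))

    def acquire(self):
        return self._free.pop() if self._free else None

    def release(self, inst) -> None:
        self._free.append(inst)


def software_gzip(data: bytes) -> bytes:
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)
    return comp.compress(data) + comp.flush()


def offload_gzip(inst, data: bytes) -> bytes:
    # Placeholder for submitting work to a QAT instance; software zlib is
    # used so the sketch stays runnable.
    return software_gzip(data)


def compress(pool, data: bytes):
    """Return (gzip_bytes, path), where path is "hw" or "sw"."""
    inst = pool.acquire() if pool is not None else None
    # Compare with None: instance 0 is valid but falsy.
    if inst is None:
        # No QAT found at startup, or every instance is busy: use software.
        return software_gzip(data), "sw"
    try:
        return offload_gzip(inst, data), "hw"
    finally:
        pool.release(inst)  # always hand the instance back
```

A typical wiring decides once at startup with `pool = InstancePool(256) if qat_present() else None`. On a host without the device every call takes the software path; with a pool, calls fall back only while all instances are held, as can happen during worker reloads.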

Performance Results

Test environment: Intel Xeon E5-2650 v2 (32 logical cores), zlib 1.2.8, QAT driver intel-qatOOT40052.

With QAT enabled, average CPU usage fell from about 48% to about 41%, system load dropped from 14.22 to 12.09, and the Gzip hot-spot functions largely disappeared from CPU profiles, indicating that compression was almost entirely offloaded.
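Expressed as relative changes, these figures account for the roughly 15% savings cited earlier. The CPU numbers are the approximate averages quoted above, so that result is approximate too; in absolute terms CPU usage fell by about 7 percentage points.

```latex
\begin{aligned}
\Delta_{\text{CPU}}  &= \frac{48\% - 41\%}{48\%} = \frac{7}{48} \approx 14.6\% \\
\Delta_{\text{load}} &= \frac{14.22 - 12.09}{14.22} = \frac{2.13}{14.22} \approx 15.0\%
\end{aligned}
```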

Conclusion

The collaboration between Alibaba's Tair and Tengine teams and Intel produced a robust hardware-accelerated Gzip solution that improves performance and reduces CPU consumption. It also lays the groundwork for future SSL + Gzip integration, filling a gap in the industry's access-layer acceleration landscape.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
