How Tengine Boosts HTTPS Performance with Intel QAT Async Acceleration

This article explains how Tengine, Alibaba's lightweight web server, uses Intel QuickAssist Technology (QAT) to offload SSL operations through its asynchronous ssl_async mode. It covers the architecture, the integration with OpenSSL, the performance measured across several cipher suites, and the resulting multi‑fold increase in HTTPS throughput.

Alibaba Cloud Developer

Background

Although HTTPS is widely discussed, few Chinese sites adopt it because it slows access and raises CPU cost. Software optimizations (session reuse, OCSP stapling, etc.) cannot keep up with traffic growth, and CPU scaling is limited, prompting the use of dedicated hardware offload. Tengine was the first open‑source server to employ Intel QAT acceleration cards, doubling HTTPS processing capacity.

Acceleration Scheme

Tengine-2.2.2 introduces new features such as ssl_async for asynchronous OpenSSL, TLS 1.3 with 0‑RTT, and upstream include directives. The ssl_async mode can offload SSL‑intensive operations to Intel QAT hardware, effectively doubling HTTPS handling performance.

Architecture

The async acceleration framework consists of three components: the ssl_async directive, the OpenSSL + QAT engine, and the QAT driver. Tengine-2.2.2 adapts OpenSSL‑1.1.0’s async interface, moving private‑key operations to the QAT engine, which communicates with the hardware via the QAT driver.

Principle

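Tengine extends the socket event loop with an async_fd that receives notifications from the QAT engine. When an OpenSSL call reaches a private‑key operation that has been handed to the engine, it returns early and reports that the operation is still pending (SSL_ERROR_WANT_ASYNC in OpenSSL 1.1.0's async interface). Tengine registers the associated eventfd with epoll, releases the CPU to serve other connections, and resumes the handshake once the QAT engine signals completion on that descriptor.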
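The pause-and-resume flow described above can be modelled in a few lines. The following is a minimal sketch, not Tengine's or OpenSSL's code: `FakeOffloadEngine`, `handshake`, and `run` are hypothetical names, a worker thread stands in for the QAT hardware, a pipe stands in for the eventfd, and Python's `selectors` module stands in for epoll.

```python
import os
import selectors
import threading

WANT_ASYNC = "WANT_ASYNC"  # stand-in for OpenSSL's SSL_ERROR_WANT_ASYNC


class FakeOffloadEngine:
    """Hypothetical stand-in for the QAT engine: does the work off-thread
    and signals completion by writing to a notification fd."""

    def submit(self, payload):
        read_fd, write_fd = os.pipe()  # the pipe plays the role of the eventfd
        result = {}

        def work():
            result["value"] = payload[::-1]  # placeholder for the private-key op
            os.write(write_fd, b"\x01")      # wake the event loop

        threading.Thread(target=work, daemon=True).start()
        return read_fd, write_fd, result


def handshake(engine, conn_id):
    """Generator modelling one handshake that pauses on the offloaded step."""
    read_fd, write_fd, result = engine.submit(f"conn-{conn_id}".encode())
    yield WANT_ASYNC, read_fd  # hand the fd to the event loop and pause
    os.read(read_fd, 1)        # drain the completion notification
    os.close(read_fd)
    os.close(write_fd)
    return result["value"]


def run(num_conns=3):
    """Start several handshakes, then resume each one when its fd is readable."""
    sel = selectors.DefaultSelector()
    engine = FakeOffloadEngine()
    done = {}
    for cid in range(num_conns):
        job = handshake(engine, cid)
        status, fd = next(job)  # runs until the async pause
        assert status == WANT_ASYNC
        sel.register(fd, selectors.EVENT_READ, (cid, job))
    while len(done) < num_conns:
        for key, _ in sel.select():
            cid, job = key.data
            sel.unregister(key.fd)  # unregister before the job closes the fd
            try:
                next(job)           # resume the paused handshake
            except StopIteration as stop:
                done[cid] = stop.value
    sel.close()
    return done


if __name__ == "__main__":
    print(run())
```

The point of the sketch is that the loop never blocks on any single handshake: all three are started before any result is collected, which is the property that lets a worker process keep accepting and serving connections while the hardware is busy.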

Performance Data

Test environment: Intel Xeon E5-2650 v2 (32 cores), 10 GbE NIC, OpenSSL 1.1.0f, QAT engine v0.5.30, QAT driver qatmux.l.2.6.0-60. The benchmarks were run locally against a 10‑byte response, using a different certificate and cipher suite for each test.

With the RSA-RSA-AES128-GCM-SHA256 suite, enabling ssl_async with QAT reaches 17.6 k QPS on 8 cores; without QAT, the server uses all 32 cores to reach 29 k QPS.

For ECDHE-RSA-AES128-GCM-SHA256, QAT‑accelerated mode reaches 15 k QPS on 16 cores, while the non‑accelerated server needs 32 fully‑loaded cores to hit 9.4 k QPS.

Using ECDHE-ECDSA-AES128-GCM-SHA256 (secp384r1), QAT delivers 13 k QPS on 8 cores, versus 11 k QPS on 32 cores without acceleration.

With ECDHE-ECDSA-AES128-GCM-SHA256 (prime256v1), QAT reaches its hardware peak of 16 k QPS on 8 cores, while the non‑accelerated version reaches 29 k QPS on 32 cores.
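Because the accelerated and non‑accelerated runs above use different core counts, the raw QPS figures are hard to compare directly. One way to put them on a common footing is to divide each by the number of cores used. The sketch below does only that arithmetic on the figures quoted in this section; the names `RUNS`, `per_core`, and `summarize` are illustrative, and the calculation is not claimed to reproduce the multipliers reported in the Conclusion.

```python
# (suite, QPS with QAT, cores with QAT, QPS without QAT, cores without QAT),
# copied from the figures quoted in this section.
RUNS = [
    ("RSA-RSA-AES128-GCM-SHA256", 17_600, 8, 29_000, 32),
    ("ECDHE-RSA-AES128-GCM-SHA256", 15_000, 16, 9_400, 32),
    ("ECDHE-ECDSA-AES128-GCM-SHA256 (secp384r1)", 13_000, 8, 11_000, 32),
    ("ECDHE-ECDSA-AES128-GCM-SHA256 (prime256v1)", 16_000, 8, 29_000, 32),
]


def per_core(qps, cores):
    """Throughput per core for one benchmark run."""
    return qps / cores


def summarize(runs=RUNS):
    """Return (suite, QPS/core with QAT, QPS/core without QAT) for each run."""
    return [
        (suite, per_core(qat_qps, qat_cores), per_core(sw_qps, sw_cores))
        for suite, qat_qps, qat_cores, sw_qps, sw_cores in runs
    ]


if __name__ == "__main__":
    for suite, qat, sw in summarize():
        print(f"{suite}: {qat:.0f} QPS/core with QAT, {sw:.0f} QPS/core without")
```

Per‑core throughput is only a rough lens here: the accelerated runs may be limited by the card rather than by the CPU, so adding cores would not necessarily scale the QAT figures linearly.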

Conclusion

When ssl_async with QAT is enabled, HTTPS throughput for RSA-RSA ciphers improves by 3.8×, for ECDHE-RSA by 2.65×, and for ECDHE-ECDSA (P-384) by 2× compared with the non‑accelerated Tengine. However, for ECDHE-ECDSA (P-256) the gain is only about 23% because the hardware already operates near its peak.

We thank the Intel QuickAssist Technology team for their support and the Tengine open‑source community for their contributions.
