How Tengine Boosts HTTPS Performance with Intel QAT Async Acceleration
This article explains how Tengine, Alibaba's lightweight web server, uses Intel QuickAssist Technology (QAT) to offload SSL operations through its asynchronous ssl_async mode. It covers the architecture, the integration with OpenSSL, measured results across several cipher suites, and the resulting multi-fold increase in HTTPS throughput.
Background
Although HTTPS is widely discussed, few Chinese sites have adopted it, because it adds latency and raises CPU cost. Software optimizations (session reuse, OCSP stapling, etc.) cannot keep up with traffic growth, and scaling CPUs only goes so far, which makes dedicated hardware offload attractive. Tengine was the first open-source server to employ Intel QAT acceleration cards, doubling HTTPS processing capacity.
Acceleration Scheme
Tengine-2.2.2 introduces several new features: ssl_async for asynchronous OpenSSL, TLS 1.3 with 0-RTT, and upstream include directives. In ssl_async mode, compute-intensive SSL operations can be offloaded to Intel QAT hardware, effectively doubling HTTPS handling performance.
Architecture
The async acceleration framework consists of three components: the ssl_async directive, the OpenSSL + QAT engine, and the QAT driver. Tengine-2.2.2 adapts OpenSSL‑1.1.0’s async interface, moving private‑key operations to the QAT engine, which communicates with the hardware via the QAT driver.
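The hand-off described above can be pictured with a small Python sketch. It is only a model of the idea, not OpenSSL's or Tengine's implementation: the names handshake, fake_engine_sign, drive, and the WANT_ASYNC marker are all hypothetical, and the "signature" is a placeholder rather than real cryptography. A generator plays the role of a handshake that pauses at the private-key step and is resumed once the engine has a result.

```python
from typing import Generator

# Hypothetical marker standing in for an async status code.
WANT_ASYNC = "WANT_ASYNC"


def handshake(client_hello: bytes) -> Generator[str, bytes, bytes]:
    """Model of a handshake that pauses itself at the private-key step."""
    transcript = b"hello:" + client_hello
    # Hand the expensive operation to the engine and give control back.
    signature = yield WANT_ASYNC
    # Execution continues here once the caller resumes the job.
    return transcript + b"|sig:" + signature


def fake_engine_sign(data: bytes) -> bytes:
    """Placeholder for the engine -> driver -> card path (not real crypto)."""
    return data.upper()


def drive(client_hello: bytes) -> bytes:
    """Model of the server side: start the job, offload, then resume it."""
    job = handshake(client_hello)
    status = next(job)  # runs the handshake until it pauses
    if status != WANT_ASYNC:
        raise RuntimeError("unexpected status from handshake")
    # In a real server this result would arrive later, off the CPU.
    result = fake_engine_sign(client_hello)
    try:
        job.send(result)  # resume the paused handshake
    except StopIteration as finished:
        return finished.value
    raise RuntimeError("handshake paused more than once")


if __name__ == "__main__":
    print(drive(b"abc"))
```

The point of the generator is that the caller regains control between the pause and the resume, which is the window in which a server can do other work.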
Principle
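Tengine extends its socket event loop with an async_fd that receives notifications from the QAT engine. When OpenSSL reaches a private-key operation, it submits the work to the engine and returns an async error code (SSL_ERROR_WANT_ASYNC in OpenSSL 1.1.0) instead of blocking. Tengine registers the associated eventfd with epoll and returns to the event loop, so the worker's CPU is free to serve other connections. Once the QAT engine signals completion on that fd, Tengine resumes the paused handshake.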
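The register, wait, and resume cycle can be sketched in Python as follows. This is an illustration under stated assumptions, not Tengine code: a pipe stands in for the eventfd, a thread with a short sleep stands in for the QAT card, and fake_qat_sign and run_event_loop are hypothetical names. The selectors module uses epoll on Linux where available.

```python
import os
import selectors
import threading
import time


def fake_qat_sign(data: bytes, notify_fd: int, results: dict) -> None:
    """Stand-in for the hardware: 'sign' the data, then signal completion."""
    time.sleep(0.05)                   # pretend the card is busy
    results["signature"] = data[::-1]  # placeholder, not real cryptography
    os.write(notify_fd, b"\x01")       # wake the event loop


def run_event_loop(payload: bytes) -> dict:
    sel = selectors.DefaultSelector()
    read_fd, write_fd = os.pipe()      # plays the role of the async fd
    results: dict = {}
    log = []

    # Step 1: the handshake reaches a private-key operation and is paused.
    worker = threading.Thread(
        target=fake_qat_sign, args=(payload, write_fd, results)
    )
    worker.start()
    log.append("paused")

    # Step 2: register the notification fd next to ordinary socket events.
    sel.register(read_fd, selectors.EVENT_READ, data="async_fd")

    # Step 3: keep serving other events until the fd becomes readable.
    other_work = 0
    done = False
    while not done:
        events = sel.select(timeout=0.01)
        if not events:
            other_work += 1            # CPU is free for other connections
            continue
        for key, _mask in events:
            if key.data == "async_fd":
                os.read(key.fd, 1)     # consume the notification
                sel.unregister(key.fd)
                log.append("resumed")
                done = True

    worker.join()
    os.close(read_fd)
    os.close(write_fd)
    sel.close()
    results["log"] = log
    results["other_work"] = other_work
    return results


if __name__ == "__main__":
    print(run_event_loop(b"abc"))
```

The other_work counter shows how many loop iterations were available for unrelated events while the offloaded operation was in flight.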
Performance Data
Test environment: Intel Xeon E5-2650 v2 (32 cores), 10 GbE NIC, OpenSSL 1.1.0f, QAT engine v0.5.30, QAT driver qatmux.l.2.6.0-60. Each benchmark requested a 10-byte response from the local server, repeated with different certificates and cipher suites.
With the RSA-RSA-AES128-GCM-SHA256 suite, enabling ssl_async with QAT reaches 17.6 k QPS on 8 cores; without QAT, 32 cores achieve 29 k QPS.
For ECDHE-RSA-AES128-GCM-SHA256, QAT‑accelerated mode reaches 15 k QPS on 16 cores, while the non‑accelerated server needs 32 fully‑loaded cores to hit 9.4 k QPS.
Using ECDHE-ECDSA-AES128-GCM-SHA256 (secp384r1), QAT delivers 13 k QPS on 8 cores, versus 11 k QPS on 32 cores without acceleration.
With ECDHE-ECDSA-AES128-GCM-SHA256 (prime256v1), QAT reaches its hardware peak of 16 k QPS on 8 cores, while the non‑accelerated version reaches 29 k QPS on 32 cores.
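Because the figures above pair different core counts, one way to put them on a common footing is throughput per core. The sketch below does that arithmetic using only the numbers quoted above. The per-core normalization is an assumption of this sketch, not necessarily the comparison behind the article's stated multipliers, and it implicitly assumes throughput scales linearly with cores, which does not hold once the card reaches its hardware peak.

```python
# (QPS, cores) pairs copied from the figures quoted above.
RESULTS = {
    "RSA-RSA-AES128-GCM-SHA256": {
        "qat": (17_600, 8), "software": (29_000, 32),
    },
    "ECDHE-RSA-AES128-GCM-SHA256": {
        "qat": (15_000, 16), "software": (9_400, 32),
    },
    "ECDHE-ECDSA-AES128-GCM-SHA256 (secp384r1)": {
        "qat": (13_000, 8), "software": (11_000, 32),
    },
    "ECDHE-ECDSA-AES128-GCM-SHA256 (prime256v1)": {
        "qat": (16_000, 8), "software": (29_000, 32),
    },
}


def per_core(qps: float, cores: int) -> float:
    """Throughput normalized by the number of cores used."""
    return qps / cores


def summarize(results: dict) -> dict:
    """Return {suite: (qat_per_core, software_per_core, ratio)}."""
    summary = {}
    for suite, modes in results.items():
        qat = per_core(*modes["qat"])
        software = per_core(*modes["software"])
        summary[suite] = (qat, software, qat / software)
    return summary


if __name__ == "__main__":
    for suite, (qat, software, ratio) in summarize(RESULTS).items():
        print(
            f"{suite}: QAT {qat:.0f} QPS/core, "
            f"software {software:.0f} QPS/core, ratio {ratio:.2f}"
        )
```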
Conclusion
Compared with non-accelerated Tengine, enabling ssl_async with QAT improves HTTPS throughput by 3.8× for RSA-RSA ciphers, by 2.65× for ECDHE-RSA, and by 2× for ECDHE-ECDSA (P-384). For ECDHE-ECDSA (P-256), the gain is only about 23%, because the QAT card is already running at its hardware peak.
We thank the Intel QuickAssist Technology team for their support and the Tengine open‑source community for their contributions.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.