How Intel QAT Accelerates TLS: Multi-Buffer Software vs. Hardware Offload
Intel’s QuickAssist Technology (QAT) offers both Multi‑Buffer SIMD software acceleration and dedicated hardware offload to dramatically speed up TLS handshakes, with up to 5× faster RSA signing and 12× faster ECDH key exchange, while maintaining compatibility via OpenSSL’s async engine.
Background
Transport Layer Security (TLS) is the dominant protocol for securing Internet traffic, but its cryptographic operations, especially during the handshake, impose significant CPU overhead. Traditional software optimizations (session reuse, OCSP stapling, TLS 1.3) are insufficient as traffic grows, prompting the adoption of hardware‑assisted acceleration.
Intel QuickAssist Technology (QAT) Overview
Intel QAT provides a complete asynchronous TLS acceleration solution. It supports the full range of TLS cryptographic algorithms, covering both asymmetric operations (RSA, ECDSA, ECDHE) during the handshake and symmetric AES‑GCM encryption/decryption for data transfer.
QAT Engine Software Stack
The QAT Engine sits between applications and the QAT hardware, handling I/O data transfer. It is delivered as an OpenSSL third‑party engine, allowing applications to use standard OpenSSL APIs without extensive code changes. The engine can operate in two modes:
Software acceleration (qat_sw): Uses Intel Multi‑Buffer SIMD (AVX‑512) to parallelize cryptographic operations.
Hardware acceleration (qat_hw): Offloads cryptographic calculations to a dedicated QAT card.
Software Acceleration – Intel Multi‑Buffer
Multi‑Buffer leverages CPU SIMD instructions (AVX‑512) to process up to eight cryptographic jobs in parallel. It is implemented in two open‑source libraries that the QAT Engine integrates:
intel‑ipsec‑mb – provides SIMD‑optimized AES‑GCM and other symmetric algorithms. Repository: https://github.com/intel/intel-ipsec-mb
ipp‑crypto – provides SIMD‑optimized RSA/ECDSA/ECDHE using Intel’s IFMA instructions. Repository: https://github.com/intel/ipp-crypto/tree/develop/sources/ippcp/crypto_mb
These libraries enable significant performance gains for asymmetric operations (1.6‑2×) and modest gains for symmetric encryption (10‑15%).
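The lockstep idea behind Multi-Buffer can be sketched in a few lines. The example below is only an illustration, not how ipp-crypto implements it: it uses plain Python integers instead of AVX-512/IFMA registers, a simple square-and-multiply loop instead of the library's own algorithm, and it assumes all jobs in a batch share one exponent (as eight signatures with the same server key would). The names LANES and modexp_batch are hypothetical.

```python
from typing import List

LANES = 8  # the article's figure: up to eight jobs per batch


def modexp_batch(bases: List[int], exponent: int, moduli: List[int]) -> List[int]:
    """Compute bases[i] ** exponent % moduli[i] for every lane in lockstep.

    Each loop iteration applies the same operation to all lanes, which is
    the property a SIMD implementation exploits: one instruction stream,
    several independent data streams.
    """
    if len(bases) != len(moduli):
        raise ValueError("bases and moduli must have the same length")
    if len(bases) > LANES:
        raise ValueError(f"a batch holds at most {LANES} jobs")

    acc = [1 % m for m in moduli]
    for bit in bin(exponent)[2:]:  # most significant bit first
        # Square step: identical for every lane.
        acc = [a * a % m for a, m in zip(acc, moduli)]
        if bit == "1":
            # Multiply step: taken by all lanes together because the
            # exponent is shared, so control flow never diverges.
            acc = [a * b % m for a, b, m in zip(acc, bases, moduli)]
    return acc
```

Because every lane follows the same control flow, a batch of eight costs one pass through the loop rather than eight. That is also why the engine has to collect several pending requests before this path pays off.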
Hardware Acceleration – QAT Card Offload
When using a QAT accelerator card, the asymmetric cryptographic workload is transferred from the CPU to the card, freeing CPU cycles and delivering higher throughput. A typical deployment combines Nginx (patched for OpenSSL async mode), the QAT Engine, and the QAT driver.
Nginx (Async Mode): Intel provides a patch for Nginx 1.18 to enable OpenSSL async mode. Patch: https://github.com/intel/asynch_mode_nginx
OpenSSL (Async Mode): OpenSSL 1.1.1 adds native async support, allowing non‑blocking calls.
QAT Engine: OpenSSL engine plugin that forwards crypto jobs to the QAT card. Repository: https://github.com/intel/QAT_Engine
QAT Driver: User‑space and kernel‑space components that expose the card’s API.
OpenSSL Async Mode and QAT Engine Flow
When async mode is enabled, OpenSSL runs each cryptographic operation inside a coroutine‑like job. The job starts the TLS handshake, then pauses while the QAT Engine offloads the RSA signing or ECDHE key‑exchange computation to the card. The engine polls the card for completed requests; once a request completes, it signals an eventfd and the paused job resumes.
Application calls SSL_accept and waits for a client handshake.
OpenSSL creates handshake and I/O jobs, scheduled via ASYNC_start_job().
During RSA signing, the work is offloaded to the QAT card and the main job pauses (ASYNC_pause_job()).
The engine polls the card; upon completion it writes to an eventfd and wakes the main job.
The main job resumes and finishes the handshake, and ASYNC_start_job() returns ASYNC_FINISH.
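The pause/poll/resume cycle in the steps above can be modelled with Python generators. This is a toy simulation under stated assumptions, not OpenSSL or QAT Engine code: FakeAccelerator, handshake_job, and run are hypothetical names, the "card" finishes a request after a fixed number of polls, and the strings "ASYNC_PAUSE" and "ASYNC_FINISH" only mirror the names of OpenSSL's status codes.

```python
class FakeAccelerator:
    """Stand-in for the card: a request completes after a fixed number of polls."""

    def __init__(self, polls_needed=3):
        self.polls_needed = polls_needed
        self.pending = {}  # request id -> polls remaining

    def submit(self, req_id):
        self.pending[req_id] = self.polls_needed

    def poll(self):
        """Advance in-flight requests; return the ids that completed on this poll."""
        finished = []
        for req_id in list(self.pending):
            self.pending[req_id] -= 1
            if self.pending[req_id] == 0:
                del self.pending[req_id]
                finished.append(req_id)
        return finished


def handshake_job(conn_id, card, log):
    """One handshake as a coroutine: offload the signature, pause, resume."""
    log.append((conn_id, "start"))
    card.submit(conn_id)           # hand the RSA sign to the "card"
    log.append((conn_id, "pause"))
    yield "ASYNC_PAUSE"            # plays the role of ASYNC_pause_job()
    log.append((conn_id, "resume"))
    log.append((conn_id, "finish"))
    return "ASYNC_FINISH"


def run(num_conns=2, polls_needed=3):
    """Start every handshake, then poll until all paused jobs have resumed."""
    card = FakeAccelerator(polls_needed)
    log, paused, results = [], {}, {}

    for conn_id in range(num_conns):
        job = handshake_job(conn_id, card, log)
        next(job)                  # runs the job up to its pause point
        paused[conn_id] = job

    polls = 0
    while paused:
        polls += 1
        for conn_id in card.poll():    # completion acts as the wake-up signal
            job = paused.pop(conn_id)
            try:
                next(job)              # resume the paused job
            except StopIteration as stop:
                results[conn_id] = stop.value
    return log, results, polls
```

Calling run(num_conns=2) shows the point of the design: the second handshake starts while the first is still paused, so the CPU is not blocked waiting on the accelerator.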
Performance Evaluation
Tests were run on an Intel Xeon Platinum 8369B server (8 cores) with Linux 4.19, OpenSSL 1.1.1g, and the full QAT software stack. Benchmarks used openssl speed with and without the QAT engine (8 async jobs).
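A comparison of this kind could be scripted roughly as follows. The sketch makes several assumptions that the article does not state: the engine id qatengine, the use of -elapsed (wall-clock timing), and rsa2048 as the algorithm name. The helpers speed_command and speedup are hypothetical. The code only builds the command lines; it does not run them or produce any measurements.

```python
import shlex
from typing import List, Optional


def speed_command(algorithm: str,
                  engine: Optional[str] = None,
                  async_jobs: int = 0) -> List[str]:
    """Build an `openssl speed` argument list for one benchmark run."""
    cmd = ["openssl", "speed", "-elapsed"]  # -elapsed is an assumption
    if engine:
        cmd += ["-engine", engine]
    if async_jobs > 0:
        cmd += ["-async_jobs", str(async_jobs)]
    cmd.append(algorithm)
    return cmd


def speedup(baseline_ops: float, accelerated_ops: float) -> float:
    """Ratio of accelerated to baseline throughput (operations per second)."""
    if baseline_ops <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated_ops / baseline_ops


# Baseline (software only) and offloaded run with 8 async jobs.
baseline_cmd = speed_command("rsa2048")
offload_cmd = speed_command("rsa2048", engine="qatengine", async_jobs=8)
print(shlex.join(baseline_cmd))
print(shlex.join(offload_cmd))
```

Dividing the sign/s figure of the offloaded run by that of the baseline run gives a speedup ratio of the kind reported in this section.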
Key results:
RSA‑2048 sign/verify speedup: 4.9× / 2.9×
ECDH key exchange speedup: ~12×
AES‑256‑CBC performance: roughly unchanged (hardware offload offers little benefit for symmetric encryption in this test).
These numbers demonstrate that QAT’s greatest impact is on the TLS handshake’s asymmetric operations, making it ideal for high‑throughput gateways and long‑lived connections.
Advantages and Disadvantages
Advantages
High performance for compute‑intensive cryptography, reducing CPU load and increasing throughput.
Lower power consumption by offloading work to dedicated hardware.
Disadvantages
Additional hardware cost for the QAT card.
Limited to Intel‑specific platforms; not all processors support the required instructions.
Application code must be adapted to use OpenSSL async mode and the QAT engine, incurring migration effort.
Typical Use Cases
Ingress gateways (e.g., Nginx, long‑connection proxies) that terminate many TLS sessions.
VPN appliances that encrypt/decrypt large volumes of traffic.
Storage systems that benefit from hardware‑accelerated compression/decompression.
Video transcoding pipelines that can offload certain codecs to the QAT card.
Conclusion
Intel QAT provides two complementary acceleration paths—software Multi‑Buffer SIMD and hardware offload—that together deliver substantial TLS handshake performance gains. While the hardware solution offers the highest throughput, the software path delivers meaningful improvements without extra cards, making QAT a versatile option for security‑critical, high‑traffic environments.
vivo Internet Technology
