How Intel QAT Accelerates TLS: Multi-Buffer Software vs. Hardware Offload

Intel’s QuickAssist Technology (QAT) offers both Multi‑Buffer SIMD software acceleration and dedicated hardware offload to dramatically speed up TLS handshakes, with up to 5× faster RSA signing and 12× faster ECDH key exchange, while maintaining compatibility via OpenSSL’s async engine.

vivo Internet Technology

Background

Transport Layer Security (TLS) is the dominant protocol for securing Internet traffic, but its cryptographic operations, especially during the handshake, impose significant CPU overhead. Traditional software optimizations (session reuse, OCSP stapling, TLS 1.3) are insufficient as traffic grows, prompting the adoption of hardware‑assisted acceleration.

Intel QuickAssist Technology (QAT) Overview

Intel QAT provides a complete asynchronous TLS acceleration solution. It supports the full range of TLS cryptographic algorithms, covering both asymmetric operations (RSA, ECDSA, ECDHE) during the handshake and symmetric AES‑GCM encryption/decryption for data transfer.

QAT support for TLS algorithms

QAT Engine Software Stack

The QAT Engine sits between applications and the QAT hardware, handling I/O data transfer. It is delivered as an OpenSSL third‑party engine, allowing applications to use standard OpenSSL APIs without extensive code changes. The engine can operate in two modes:

Software acceleration (qat_sw): Uses Intel Multi‑Buffer SIMD (AVX‑512) to parallelize cryptographic operations.

Hardware acceleration (qat_hw): Offloads cryptographic calculations to a dedicated QAT card.

Intel QAT Engine software stack

Software Acceleration – Intel Multi‑Buffer

Multi‑Buffer leverages CPU SIMD instructions (AVX‑512) to process up to eight cryptographic jobs in parallel. It is implemented in two open‑source libraries that the QAT Engine integrates:

intel‑ipsec‑mb – provides SIMD‑optimized AES‑GCM and other symmetric algorithms. Repository: https://github.com/intel/intel-ipsec-mb

ipp‑crypto – provides SIMD‑optimized RSA/ECDSA/ECDHE using Intel’s IFMA instructions. Repository: https://github.com/intel/ipp-crypto/tree/develop/sources/ippcp/crypto_mb

These libraries enable significant performance gains for asymmetric operations (1.6‑2×) and modest gains for symmetric encryption (10‑15%).
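The batching idea behind Multi‑Buffer can be sketched in a few lines of Python. This is a toy model only, not the intel‑ipsec‑mb or ipp‑crypto API: the class name, the modulus, and the exponent are hypothetical, and the "batch" function runs a plain loop where the real libraries would process all lanes in one SIMD pass.

```python
from typing import Callable, List

LANES = 8                 # the article's figure: up to eight jobs per pass
MODULUS = (1 << 61) - 1   # hypothetical toy modulus, not a real RSA key
EXPONENT = 65537          # hypothetical toy exponent


def modexp_batch(bases: List[int]) -> List[int]:
    """Stand-in for one SIMD pass over up to LANES independent jobs."""
    return [pow(b, EXPONENT, MODULUS) for b in bases]


class MultiBufferQueue:
    """Collects independent jobs and runs them together once a batch is full."""

    def __init__(self, batch_fn: Callable[[List[int]], List[int]], lanes: int = LANES):
        self.batch_fn = batch_fn
        self.lanes = lanes
        self.pending: List[int] = []
        self.results: List[int] = []
        self.batches_run = 0

    def submit(self, job: int) -> None:
        self.pending.append(job)
        if len(self.pending) == self.lanes:
            self.flush()  # a full batch is processed in one pass

    def flush(self) -> None:
        """Process whatever is queued, even if the batch is only partly full."""
        if not self.pending:
            return
        self.results.extend(self.batch_fn(self.pending))
        self.pending = []
        self.batches_run += 1


queue = MultiBufferQueue(modexp_batch)
for base in range(2, 22):   # 20 jobs: two full batches plus a partial one
    queue.submit(base)
queue.flush()
print(queue.batches_run, len(queue.results))
```

The sketch also shows why this approach depends on concurrency: a batch is only efficient when enough independent requests are in flight to fill the lanes, which is one reason the engine is paired with asynchronous job submission.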

Hardware Acceleration – QAT Card Offload

When using a QAT accelerator card, the asymmetric cryptographic workload is transferred from the CPU to the card, freeing CPU cycles and delivering higher throughput. A typical deployment combines Nginx (patched for OpenSSL async mode), the QAT Engine, and the QAT driver.

Nginx + Intel QAT software stack

Nginx (Async Mode): Intel provides a patch for Nginx 1.18 to enable OpenSSL async mode. Patch: https://github.com/intel/asynch_mode_nginx

OpenSSL (Async Mode): OpenSSL 1.1.1 adds native async support, allowing non‑blocking calls.

QAT Engine: OpenSSL engine plugin that forwards crypto jobs to the QAT card. Repository: https://github.com/intel/QAT_Engine

QAT Driver: User‑space and kernel‑space components that expose the card’s API.

OpenSSL Async Mode and QAT Engine Flow

When async mode is enabled, OpenSSL runs each handshake as a coroutine‑like job. The job starts the TLS handshake, then pauses while the QAT Engine offloads the asymmetric operation (RSA signing or the ECDHE key‑exchange computation) to the card. The engine polls the card for completions; once a result is ready, it signals an eventfd and the paused job resumes.

QAT Engine async execution flow

Application calls SSL_accept and waits for a client handshake.

OpenSSL creates handshake and I/O jobs, scheduled via ASYNC_start_job().

During RSA signing, the job is offloaded to the QAT card; the main job pauses (ASYNC_pause_job()).

The engine polls the card; upon completion it writes to an eventfd and wakes the main job.

The main job resumes and finishes the handshake, at which point ASYNC_start_job() returns ASYNC_FINISH.
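The steps above can be approximated with Python generators, which pause and resume in a way loosely analogous to OpenSSL's async jobs. This is a simulation under stated assumptions, not the OpenSSL or QAT Engine API: FakeAccelerator, handshake_job, and run_event_loop are hypothetical names, and the eventfd wake-up is replaced by a direct send() into the paused generator.

```python
ASYNC_PAUSE, ASYNC_FINISH = "ASYNC_PAUSE", "ASYNC_FINISH"


class FakeAccelerator:
    """Hypothetical stand-in for the card: a request completes after N polls."""

    def __init__(self, latency_polls=2):
        self.latency_polls = latency_polls
        self.inflight = {}

    def submit(self, req_id, payload):
        self.inflight[req_id] = [self.latency_polls, payload]

    def poll(self):
        """Return (req_id, result) pairs for requests that have completed."""
        done = []
        for req_id in list(self.inflight):
            self.inflight[req_id][0] -= 1
            if self.inflight[req_id][0] <= 0:
                _, payload = self.inflight.pop(req_id)
                done.append((req_id, "signed:" + payload))
        return done


def handshake_job(conn_id, card, log):
    """One handshake: submit the signing request, pause, resume with the result."""
    log.append((conn_id, "start"))
    card.submit(conn_id, f"hello-{conn_id}")
    signature = yield ASYNC_PAUSE   # plays the role of ASYNC_pause_job()
    log.append((conn_id, "resumed"))
    return signature                # job ends; the caller sees ASYNC_FINISH


def run_event_loop(n_conns, card):
    log, jobs, results, statuses = [], {}, {}, {}
    for conn_id in range(n_conns):
        job = handshake_job(conn_id, card, log)
        statuses[conn_id] = next(job)   # run until the first pause
        jobs[conn_id] = job
    while jobs:
        for req_id, signature in card.poll():   # engine polling the card
            try:
                jobs[req_id].send(signature)    # wake the paused job
            except StopIteration as stop:
                results[req_id] = stop.value
                statuses[req_id] = ASYNC_FINISH
                del jobs[req_id]
    return results, statuses, log


results, statuses, log = run_event_loop(3, FakeAccelerator())
print(results)
print(log)
```

The log makes the benefit visible: all three handshakes start and reach the pause point before any of them resumes, so a single worker thread keeps several requests in flight on the card instead of blocking on each one.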

Performance Evaluation

Tests were run on an Intel Xeon Platinum 8369B server (8 cores) with Linux 4.19, OpenSSL 1.1.1g, and the full QAT software stack. Benchmarks used openssl speed with and without the QAT engine (8 async jobs).
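A comparison of this kind could be scripted roughly as follows. The article does not give the exact command lines, so this is a sketch under assumptions: the engine id "qatengine" is the one used in the QAT_Engine README, -elapsed is added so wall-clock time is measured, and speed_command is a hypothetical helper. The snippet only builds and prints the commands; it does not run them.

```python
import shlex


def speed_command(algorithm, engine=None, async_jobs=0):
    """Build an `openssl speed` argv, optionally with an engine and async jobs."""
    argv = ["openssl", "speed", "-elapsed"]
    if engine:
        argv += ["-engine", engine]            # assumed engine id, see lead-in
    if async_jobs:
        argv += ["-async_jobs", str(async_jobs)]
    argv.append(algorithm)
    return argv


baseline = speed_command("rsa2048")
offloaded = speed_command("rsa2048", engine="qatengine", async_jobs=8)

print(shlex.join(baseline))
print(shlex.join(offloaded))
```

Passing the lists to subprocess.run on a machine with the QAT stack installed would produce the two sets of figures to compare.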

Key results:

RSA‑2048 sign/verify speedup: 4.9× / 2.9×

ECDH key exchange speedup: ~12×

AES‑256‑CBC performance: roughly unchanged (hardware offload offers little benefit for symmetric encryption in this test).

These numbers demonstrate that QAT’s greatest impact is on the TLS handshake’s asymmetric operations, making it ideal for high‑throughput gateways and long‑lived connections.
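A per-operation speedup does not translate directly into the same end-to-end gain, because only part of the handshake cost is accelerated. A rough Amdahl's-law estimate makes this concrete. The 80% share used below is a hypothetical figure for illustration, not a measurement from the article; only the 4.9× RSA signing speedup comes from the results above.

```python
def overall_speedup(accelerated_fraction, op_speedup):
    """Amdahl's law: end-to-end speedup when only a fraction of the work is faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if op_speedup <= 0:
        raise ValueError("op_speedup must be positive")
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / op_speedup
    return 1.0 / remaining


# Hypothetical: RSA signing is 80% of handshake CPU time; 4.9x is the measured speedup.
estimate = overall_speedup(accelerated_fraction=0.8, op_speedup=4.9)
print(f"estimated handshake speedup: {estimate:.2f}x")
```

Measuring the actual share of handshake CPU time spent in asymmetric operations on a given workload is what turns this estimate into a usable capacity-planning number.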

Advantages and Disadvantages

Advantages

High performance for compute‑intensive cryptography, reducing CPU load and increasing throughput.

Lower power consumption by offloading work to dedicated hardware.

Disadvantages

Additional hardware cost for the QAT card.

Limited to Intel‑specific platforms; not all processors support the required instructions.

Application code must be adapted to use OpenSSL async mode and the QAT engine, incurring migration effort.

Typical Use Cases

Ingress gateways (e.g., Nginx, long‑connection proxies) that terminate many TLS sessions.

VPN appliances that encrypt/decrypt large volumes of traffic.

Storage systems that benefit from hardware‑accelerated compression/decompression.

Video transcoding pipelines that can offload certain codecs to the QAT card.

Conclusion

Intel QAT provides two complementary acceleration paths—software Multi‑Buffer SIMD and hardware offload—that together deliver substantial TLS handshake performance gains. While the hardware solution offers the highest throughput, the software path delivers meaningful improvements without extra cards, making QAT a versatile option for security‑critical, high‑traffic environments.
