Asynchronous RSA Proxy Computation and Performance Optimization for SSL Handshake
The article presents a comprehensive engineering solution that separates RSA operations, employs parallel and asynchronous processing, modifies OpenSSL and Nginx event handling, and adds symmetric‑encryption optimizations to boost SSL/TLS handshake throughput by over threefold while reducing CPU load.
RSA Asynchronous Proxy Computation
The most CPU‑intensive part of HTTPS is the RSA calculation during key exchange, so the optimization focuses on separating this step, using parallel hardware acceleration, and making the computation asynchronous.
Algorithm Separation
Only RSA‑based key‑exchange algorithms (ECDHE_RSA, DHE_RSA, RSA) are targeted. The RSA signing step uses a 2048‑bit private key, which dominates CPU usage.
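The private-key operation singled out here is, at its core, one large modular exponentiation. A minimal sketch with textbook-sized parameters illustrates the shape of the computation (the real key is 2048 bits, so the modulus and private exponent are hundreds of digits long and this step costs milliseconds of CPU per handshake):

```python
# Toy RSA private-key operation: the modular exponentiation that dominates
# handshake CPU cost. Textbook-sized parameters (p=61, q=53), not the
# 2048-bit key the article describes.
p, q = 61, 53
n = p * q                      # modulus: 3233
e = 17                         # public exponent
d = 2753                       # private exponent: e*d ≡ 1 (mod lcm(p-1, q-1))

message = 65
ciphertext = pow(message, e, n)    # public operation: cheap even at real key sizes
recovered = pow(ciphertext, d, n)  # private operation: the expensive step
assert recovered == message
```

At 2048 bits the private exponentiation is several orders of magnitude more work than the public one, which is why it is the step worth separating out.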
Parallel Computing
Reduce the time of a single request by using higher‑frequency CPUs or dedicated accelerator cards.
Increase concurrent request handling by employing multiple CPUs or accelerator cards, achieving up to a 4× throughput gain.
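The dispatch pattern behind the concurrency gain can be sketched as follows. This is only an illustration with toy key parameters: a real deployment farms 2048-bit operations out to additional CPUs or accelerator cards, not Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy textbook-RSA parameters standing in for a 2048-bit key.
N, D, E = 3233, 2753, 17

def private_op(ciphertext: int) -> int:
    """The expensive RSA private-key step for one handshake."""
    return pow(ciphertext, D, N)

# Dispatch a batch of concurrent handshakes across 4 workers, mirroring
# the "multiple CPUs / accelerator cards" layout described above.
batch = [pow(m, E, N) for m in range(2, 10)]   # ciphertexts from 8 clients
with ThreadPoolExecutor(max_workers=4) as pool:
    plaintexts = list(pool.map(private_op, batch))

assert plaintexts == list(range(2, 10))
```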
Asynchronous Request Handling
Nginx normally blocks until OpenSSL finishes ServerKeyExchange or the premaster secret decryption. By offloading RSA_sign to another CPU or hardware and returning immediately, Nginx can continue processing other requests.
Nginx receives a request and calls RSA_sign.
RSA_sign invokes RSA_private_encrypt and returns without waiting for the signature result.
Nginx proceeds with other work.
RSA_private_encrypt performs the costly modular exponentiation using the private key.
The heavy computation runs on a separate CPU/accelerator, so the local CPU is not blocked.
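The steps above can be sketched as a submit-and-continue pattern. The function name below is a hypothetical stand-in echoing OpenSSL's, and a thread pool stands in for the remote CPU or accelerator card:

```python
from concurrent.futures import ThreadPoolExecutor

N, D, E = 3233, 2753, 17                 # toy key standing in for the 2048-bit key
offload_pool = ThreadPoolExecutor(max_workers=2)   # stands in for the remote CPU/card

def rsa_private_encrypt_async(value: int):
    """Kick off the costly modular exponentiation and return at once,
    mirroring the RSA_sign -> RSA_private_encrypt split described above."""
    return offload_pool.submit(pow, value, D, N)

fut = rsa_private_encrypt_async(pow(65, E, N))   # steps 1-2: no blocking here
_ = sum(range(100))                              # step 3: keep serving other requests
signature = fut.result()                         # later: collect the finished result
assert signature == 65
offload_pool.shutdown()
```

The key property is that the call site returns before the signature exists; the worker completes the exponentiation while the caller's event loop keeps running.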
Engineering Implementation Challenges
Working with OpenSSL and Nginx core code requires deep knowledge of SSL/TLS (RFC 5246, 5280, 4492), PKI, ECC, and the massive, legacy OpenSSL codebase (over 500 k lines, inconsistent style, extensive macros).
OpenSSL Stack Refactoring
Because OpenSSL only supports synchronous RSA, the stack must be modified to allow asynchronous delegation. After evaluating forks, BoringSSL was rejected due to limited compatibility, and LibreSSL was rejected because its ECDHE performance is ~¼ of OpenSSL. The project ultimately stayed with OpenSSL.
Nginx Event Mechanism Refactoring
Nginx has 11 processing phases, but custom modules can only hook into 7 of them, all after HTTP headers are parsed, preventing intervention in the TLS handshake. Therefore, the core event code (ngx_ssl_handshake in src/event/ngx_event_openssl.c) was extended to invoke the asynchronous RSA proxy without altering existing logic.
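The extended handshake path needs a way to say "not finished yet, come back later" so the event loop can service other connections. A minimal sketch of that retry contract, with hypothetical status names echoing OpenSSL's SSL_ERROR_WANT_* convention and a thread pool standing in for the offload target:

```python
from concurrent.futures import ThreadPoolExecutor

WANT_ASYNC, HANDSHAKE_DONE = "want_async", "done"  # hypothetical status codes
N, D, E = 3233, 2753, 17                           # toy RSA parameters
pool = ThreadPoolExecutor(max_workers=1)

class HandshakeState:
    def __init__(self, ciphertext: int):
        self.ciphertext = ciphertext
        self.future = None

def ssl_handshake_step(state: HandshakeState) -> str:
    """One pass through an ngx_ssl_handshake-style wrapper: start the RSA
    offload on the first call, report completion on later calls."""
    if state.future is None:
        state.future = pool.submit(pow, state.ciphertext, D, N)
        return WANT_ASYNC              # tell the event loop to retry later
    if not state.future.done():
        return WANT_ASYNC
    return HANDSHAKE_DONE

state = HandshakeState(pow(65, E, N))
assert ssl_handshake_step(state) == WANT_ASYNC     # first pass: offload started
while ssl_handshake_step(state) != HANDSHAKE_DONE:
    pass                               # the event loop would service other fds here
assert state.future.result() == 65
pool.shutdown()
```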
Performance Results
RSA asynchronous proxy raised Nginx ECDHE_RSA full‑handshake throughput from ~18 000 qps to ~65 000 qps (≈3.5× improvement).
Symmetric Encryption Optimizations
For large payloads, symmetric ciphers dominate, and asynchronous offloading is unsuitable there. Instead, hardware acceleration is used: Intel AES‑NI instructions provide a ~20% speedup (4.3 W → 5.1 W as reported). The effect can be measured with OpenSSL's speed tool by masking the AES‑NI capability bit, e.g. running OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-gcm (which disables AES‑NI) and comparing against an unmasked run. Additionally, ChaCha20‑Poly1305 offers >30% improvement on platforms without AES‑NI.
Session Resume Enhancements
Reducing full handshakes further improves performance. Two mechanisms are discussed:
Session cache: server stores a session ID (32‑48 bytes) and reuses it on subsequent connections, saving one RTT.
Session ticket (RFC 5077): server issues an encrypted ticket, eliminating server‑side state; requires a shared key across distributed Nginx instances.
The article compares the two mechanisms side by side, weighing the advantages and drawbacks of each.
Conclusions
Increase session‑resume ratio via distributed session cache and tickets to cut full handshakes.
Asynchronous RSA proxy boosts SSL handshake throughput ~3.5×, reducing hardware costs.
Adopt high‑performance symmetric ciphers (AES‑GCM, ChaCha20‑Poly1305) and enable AES‑NI.
Tencent Architect
We share insights on storage, computing, networking and explore leading industry technologies together.