
Cloudflare’s Pingora: A High‑Performance Rust‑Based Reverse Proxy Replacing Nginx

Cloudflare replaced NGINX with Pingora, a reverse proxy written in Rust that delivers higher throughput, better efficiency, and stronger security. By redesigning the worker model around multithreading and optimizing connection reuse, Pingora achieves significant CPU, memory, and latency savings across billions of requests.

Architecture Digest

Introduction

Today we are pleased to introduce Pingora, a new HTTP proxy built in Rust that handles over one trillion requests per day, improves performance, adds many new features for Cloudflare customers, and consumes only one‑third of the CPU and memory of the previous proxy infrastructure.

As Cloudflare grew, we outpaced NGINX’s capabilities. Although NGINX performed well for years, its architectural limits at our scale forced us to build something new because we could no longer achieve the performance or features required for our complex environment.

Many Cloudflare customers use the global network as a proxy between HTTP clients (browsers, apps, IoT devices, etc.) and origin servers. We have invested heavily in protocols such as QUIC and HTTP/2 to make that connection more efficient.

In this article we focus on the other side of the equation: the service that proxies traffic between our network and the servers on the Internet, powering CDN, Workers fetch, Tunnel, Stream, R2, and many other products.

We will examine why we chose to replace the legacy service and describe the development process of Pingora, a system designed specifically for Cloudflare’s use cases and scale.

Why Build a New Proxy

Over the years we encountered limitations with NGINX. Some were mitigated by workarounds, but others proved difficult to overcome.

Architectural limits hurt performance

NGINX’s worker‑process model created operational problems for our workloads, hurting both performance and efficiency.

Each request in NGINX is handled by a single worker process, which leads to load imbalance across CPU cores and degrades overall throughput.

Because a request is pinned to its worker, a CPU‑intensive or I/O‑blocking request can slow down every other request on the same worker. We spent considerable time trying to work around these issues.

The most critical problem for us was poor connection reuse. NGINX’s connection pool is tied to a single worker; when a request arrives at a worker it can only reuse connections owned by that worker. Adding more workers for scaling further fragments the connection pool, reducing reuse rates, increasing TTFB, and consuming more resources and money.
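The fix for this fragmentation is to make one pool visible to every thread. Pingora’s actual pool is far more sophisticated, but the core idea can be sketched in a few lines of std‑only Rust (the `Conn` and `SharedPool` types here are illustrative, not Pingora’s API):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// Toy upstream connection: in reality this would wrap a TCP/TLS stream.
#[derive(Debug)]
struct Conn {
    id: usize,
}

// A single pool shared by every worker thread. NGINX-style per-process
// pools cannot do this: a connection opened by one worker is invisible
// to the others, so each worker must open its own.
#[derive(Clone)]
struct SharedPool {
    idle: Arc<Mutex<VecDeque<Conn>>>,
}

impl SharedPool {
    fn new() -> Self {
        SharedPool { idle: Arc::new(Mutex::new(VecDeque::new())) }
    }
    // Reuse an idle connection if any thread has returned one.
    fn get(&self) -> Option<Conn> {
        self.idle.lock().unwrap().pop_front()
    }
    fn put(&self, conn: Conn) {
        self.idle.lock().unwrap().push_back(conn);
    }
}

fn main() {
    let pool = SharedPool::new();
    // Thread 1 finishes a request and returns its upstream connection.
    pool.put(Conn { id: 1 });

    // Thread 2 can reuse it immediately, avoiding a new TCP/TLS handshake.
    let p = pool.clone();
    let handle = thread::spawn(move || p.get().map(|c| c.id));
    assert_eq!(handle.join().unwrap(), Some(1));
    println!("connection reused across threads");
}
```

Because every thread clones the same `Arc`, a connection returned by any thread is a reuse candidate for all of them; adding threads no longer fragments the pool.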

We have already addressed some of these problems in past blog posts, but solving the fundamental issue—replacing the worker/process model—would naturally resolve all of them.

Some features are hard to add

NGINX is an excellent web server, load balancer, and simple gateway, but Cloudflare’s needs go far beyond that. Extending NGINX without diverging too much from its upstream codebase proved difficult.

For example, retrying failed requests sometimes requires sending them to a different origin with a different set of request headers, which NGINX does not support.

NGINX is written in C, which is not memory‑safe, so even experienced engineers can easily introduce subtle bugs. We supplemented the C code with Lua, which is slower and lacks static typing, and the relatively closed NGINX community made upstream collaboration difficult.

Choosing to build our own

Over the past few years we evaluated three options:

Continue investing in NGINX, paying for customizations to meet 100% of our requirements.

Migrate to another third‑party proxy project such as Envoy, which risks repeating the same cycle in a few years.

Build an internal platform and framework from scratch, requiring the largest upfront engineering investment.

We revisited the decision every quarter, and no option was a clear winner; over time, however, the ROI of a custom proxy looked increasingly compelling, and we decided to design and implement our own.

Pingora Project

Design Decisions

To deliver a proxy capable of handling millions of requests per second while remaining fast, efficient, and secure, we made several key design choices.

We selected Rust as the implementation language because it offers the performance of C with memory‑safety guarantees.

Although excellent third‑party HTTP libraries like hyper exist, we built our own to maximize flexibility in handling HTTP traffic and to innovate at our own pace.

Operating at Internet scale forces us to support many non‑RFC‑compliant HTTP cases. Balancing strict protocol adherence with real‑world client and server quirks required a tolerant, customizable HTTP library.

We needed a robust, extensible HTTP stack that could survive the diverse risk environments of the global Internet.

For workload scheduling we chose a multithreaded model (instead of multiprocess) to share resources easily, especially connection pools, and we implemented work‑stealing to avoid performance pitfalls. Tokio’s async runtime fit our needs perfectly.
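The payoff of the multithreaded model is that work is never stranded behind one busy worker. Tokio’s scheduler is far more sophisticated (per‑thread run queues plus work stealing); the std‑only toy below, with a single shared queue and a hypothetical `run_jobs` helper, only illustrates the effect of letting any idle thread pick up any pending job:

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Toy version of a shared run queue: any idle thread can take any
// pending job, unlike a per-process model where each request is
// pinned to the worker that accepted it.
fn run_jobs(num_threads: usize, jobs: Vec<u64>) -> u64 {
    let queue = Arc::new(Mutex::new(VecDeque::from(jobs)));
    let total = Arc::new(AtomicU64::new(0));
    let mut handles = Vec::new();
    for _ in 0..num_threads {
        let (q, t) = (Arc::clone(&queue), Arc::clone(&total));
        handles.push(thread::spawn(move || loop {
            // Hold the lock only long enough to pop one job.
            let job = q.lock().unwrap().pop_front();
            match job {
                Some(n) => { t.fetch_add(n, Ordering::Relaxed); }
                None => break,
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    total.load(Ordering::Relaxed)
}

fn main() {
    // Four threads drain the same queue; no job is pinned to a thread.
    assert_eq!(run_jobs(4, (1..=100).collect()), 5050);
    println!("all jobs completed");
}
```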

We aimed for an intuitive, developer‑friendly platform that could be extended with additional features, not just a final product.

We implemented a programmable request‑lifecycle API similar to NGINX/OpenResty, allowing developers to run code at stages such as a request‑filter to modify or reject requests, cleanly separating business logic from generic proxy logic.
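The shape of such a phase hook can be sketched as a trait that business logic implements and the proxy core calls. This is a hypothetical, synchronous simplification; Pingora’s real API is async and considerably richer, and the `Request`, `Action`, and `RequestFilter` names here are illustrative only:

```rust
// Hypothetical request-filter phase, loosely modeled on the
// lifecycle API described above.
struct Request {
    path: String,
    headers: Vec<(String, String)>,
}

enum Action {
    Continue,    // pass the request on to the upstream
    Reject(u16), // short-circuit with an HTTP status code
}

trait RequestFilter {
    fn request_filter(&self, req: &mut Request) -> Action;
}

// Example business logic: block an admin path and tag everything else.
// The generic proxy machinery never needs to know about this policy.
struct AdminBlocker;

impl RequestFilter for AdminBlocker {
    fn request_filter(&self, req: &mut Request) -> Action {
        if req.path.starts_with("/admin") {
            return Action::Reject(403);
        }
        req.headers.push(("x-filtered".into(), "1".into()));
        Action::Continue
    }
}

fn main() {
    let filter = AdminBlocker;
    let mut req = Request { path: "/admin/login".into(), headers: vec![] };
    assert!(matches!(filter.request_filter(&mut req), Action::Reject(403)));

    let mut ok = Request { path: "/index.html".into(), headers: vec![] };
    assert!(matches!(filter.request_filter(&mut ok), Action::Continue));
    assert_eq!(ok.headers.len(), 1);
    println!("filters applied");
}
```

Keeping policy behind a trait like this is what separates product‑specific logic from the generic proxy, so teams can ship features without touching the core.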

Pingora Is Faster in Production

Pingora now handles almost all HTTP requests that need to talk to origin servers (e.g., cache misses). Performance data shows a median TTFB reduction of 5 ms and a 95th‑percentile reduction of 80 ms compared to the previous service.

The speed gains come from the new architecture that shares connections across all threads, improving connection reuse and reducing time spent on TCP/TLS handshakes.

Across all customers, Pingora creates only one‑third as many new connections per second as the old service. For a major customer, connection reuse rose from 87.1% to 99.92%, cutting new connections by a factor of 160 and saving the equivalent of 434 years of handshake time per day.
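The factor of 160 follows directly from the reuse rates: the fraction of requests that need a new connection is one minus the reuse rate, and the ratio of those fractions gives the reduction. A quick check:

```rust
// The "160x fewer new connections" figure follows from the reuse
// rates quoted above: new-connection rate = 1 - reuse rate.
fn new_conn_ratio(old_reuse: f64, new_reuse: f64) -> f64 {
    (1.0 - old_reuse) / (1.0 - new_reuse)
}

fn main() {
    let factor = new_conn_ratio(0.871, 0.9992);
    // (1 - 0.871) / (1 - 0.9992) = 0.129 / 0.0008 ≈ 161
    assert!((factor - 161.25).abs() < 0.5);
    println!("reduction factor ≈ {:.0}", factor);
}
```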

New features can be added more quickly because the developer‑friendly interface removes previous limitations. For example, adding HTTP/2 upstream support (enabling gRPC) required far less engineering effort than it would have in NGINX.

We also introduced Cache Reserve, using R2 storage as a caching layer, and continue to add capabilities that were previously infeasible.

More Efficient

In production, Pingora consumes roughly 70% less CPU and 67% less memory than the legacy service under the same traffic load.

Rust code runs more efficiently than the previous Lua code, and architectural differences further improve performance. In NGINX/OpenResty, Lua must read HTTP headers from C structures, allocate a Lua string, copy the data, and later garbage‑collect it. Pingora accesses header strings directly.
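The contrast between the two access patterns can be shown with a toy header map (the `Headers` type and its methods are illustrative, not either system’s real API): one accessor allocates and copies on every read, the way the Lua bridge must, while the other hands back a borrowed view of the bytes already in place.

```rust
// Toy contrast between copying and borrowing header access.
struct Headers {
    entries: Vec<(String, String)>,
}

impl Headers {
    // Lua-bridge style: allocate a new string and copy the value out
    // on every access (the copy must later be garbage-collected).
    fn get_copied(&self, name: &str) -> Option<String> {
        self.entries
            .iter()
            .find(|(k, _)| k == name)
            .map(|(_, v)| v.clone())
    }
    // Direct style: borrow the stored bytes, zero allocation, zero copy.
    fn get_borrowed(&self, name: &str) -> Option<&str> {
        self.entries
            .iter()
            .find(|(k, _)| k == name)
            .map(|(_, v)| v.as_str())
    }
}

fn main() {
    let h = Headers {
        entries: vec![("host".into(), "example.com".into())],
    };
    // Same answer either way; only the borrowed path avoids the copy.
    assert_eq!(h.get_borrowed("host"), Some("example.com"));
    assert_eq!(h.get_copied("host").as_deref(), Some("example.com"));
    println!("borrowed access returned a view into existing bytes");
}
```

The borrow checker makes the zero‑copy version safe: the `&str` cannot outlive the headers it points into, which is exactly the guarantee a C‑to‑Lua bridge cannot express.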

The multithreaded model enables efficient cross‑request data sharing. While NGINX also has shared memory, each access requires a mutex and can only store strings or numbers. Pingora can share most items via atomic reference‑counted pointers.
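In std Rust this pattern is `Arc` plus atomics: each thread gets a cheap reference‑counted handle to one structured value, with no serialization into strings and no global mutex on reads. A minimal sketch (the `SharedState` struct is illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// A structured value (not just a string or number) shared by handing
// each thread an Arc clone: cloning bumps a refcount, copies nothing.
struct SharedState {
    config_version: u32,
    requests_seen: AtomicU64,
}

fn main() {
    let state = Arc::new(SharedState {
        config_version: 42,
        requests_seen: AtomicU64::new(0),
    });

    let mut handles = Vec::new();
    for _ in 0..4 {
        let s = Arc::clone(&state);
        handles.push(thread::spawn(move || {
            // Immutable fields are read lock-free; counters use atomics.
            assert_eq!(s.config_version, 42);
            s.requests_seen.fetch_add(1, Ordering::Relaxed);
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(state.requests_seen.load(Ordering::Relaxed), 4);
    println!("shared by all threads without copies");
}
```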

Another major CPU saving comes from fewer new connections, as TLS handshakes are expensive compared to sending data over existing connections.

More Secure

At our scale, releasing features quickly and safely is challenging. Fuzz testing and static analysis can only mitigate a fraction of edge cases that arise when processing millions of requests per second.

Rust’s memory‑safety semantics protect us from undefined behavior, giving confidence that the service runs correctly.

With these guarantees, engineers can focus on how service changes interact with other services or client sources, developing features at a higher pace without the burden of memory‑safety bugs or hard‑to‑diagnose crashes.

Since Pingora’s inception, we have processed quadrillions of requests without a single crash caused by our service code. When crashes do occur, they are rarely related to our software; recent incidents were traced to kernel bugs or hardware issues.

Conclusion

In summary, we have built a faster, more efficient, and more versatile internal proxy that serves as a platform for our current and future products.

Future posts will dive deeper into the technical challenges, optimization techniques, and lessons learned from building and deploying Pingora at Internet scale, as well as our open‑source plans.

Tags: Performance, Rust, Reverse Proxy, Pingora, Cloudflare, Nginx Replacement
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
