How a Simple PgBouncer Switch Saved Us $10 Million in Cloud Costs
When a sudden 38% jump in the forecast AWS bill exposed the hidden cost of connection storms in a Kubernetes‑based microservice architecture, the team introduced PgBouncer as a transaction‑pooling proxy. Database connections fell from over 14,000 to under 400, monthly cloud spend dropped by more than $300,000, and total savings reached $10.8 million over three years.
Background and the Cost Spike
In a high‑traffic microservice environment running dozens of Spring Boot services, each service maintained its own HikariCP pool of connections to a shared PostgreSQL cluster. Because Kubernetes scaled the services horizontally and automatically, every new pod added another 50 to 100 database connections. With deployments in four regions, the theoretical maximum number of connections exceeded 14,000, overwhelming the database and pushing the AWS cost forecast for a single month up by 38%.
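The 14,000 figure is a product of three factors: pods per region, connections per pod, and regions. The minimal Python sketch below makes that product explicit and works it backwards from the article's numbers. The helper names are illustrative, and the sketch assumes every pod's pool can be fully open at the same moment, which is what "theoretical maximum" implies.

```python
def worst_case_connections(pods_per_region: int, conns_per_pod: int, regions: int) -> int:
    """Theoretical maximum: every pod in every region has its pool fully open."""
    return pods_per_region * conns_per_pod * regions


def pods_per_region_to_reach(target: int, conns_per_pod: int, regions: int) -> int:
    """Smallest pod count per region (all services combined) that reaches `target`."""
    per_pod_all_regions = conns_per_pod * regions
    return -(-target // per_pod_all_regions)  # integer ceiling division


# Article's figures: 50 to 100 connections per pod, four regions, 14,000 connections.
for per_pod in (100, 50):
    pods = pods_per_region_to_reach(14_000, per_pod, regions=4)
    total = worst_case_connections(pods, per_pod, regions=4)
    print(f"{per_pod} connections/pod -> {pods} pods per region ({total:,} connections)")
```

With 100 connections per pod the sketch reports 35 pods per region, and with 50 it reports 70, counted across all services together. Spread over dozens of services, that is only a few replicas each, so routine autoscaling was enough to cross the threshold.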
Root Cause Analysis
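The team first ruled out the usual suspects: no batch jobs, no major releases, no new regions, and only a slight increase in traffic. Yet the infrastructure bill exploded. The hidden culprit was the uncontrolled proliferation of database connections as autoscaling added pods. Because PostgreSQL serves each client connection with its own backend process, thousands of long‑lived connections still consumed memory and CPU on the database servers, and that capacity showed up on the bill.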
Architecture Before the Change
┌─────────────┐
│  Service A  ├───────────┐
└─────────────┘           │
┌─────────────┐           │
│  Service B  ├─────────┐ │
└─────────────┘         │ │
┌─────────────┐         ▼ ▼
│  Service C  ├──────▶ PostgreSQL
└─────────────┘         ▲ ▲
      ...               │ │
┌─────────────┐         │ │
│  Service N  ├─────────┘ │
└─────────────┘           │
         Hundreds of long‑lived connections
This diagram shows each microservice directly opening many persistent connections to PostgreSQL, leading to resource exhaustion.
Decision: Introduce PgBouncer
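Instead of rewriting services or changing the database, the team added a shared connection proxy, PgBouncer, running in transaction‑pool mode inside the Kubernetes cluster. Each service now connects to PgBouncer instead of PostgreSQL. PgBouncer multiplexes those client connections over a small pool of server connections, lending a server connection to a client only for the duration of a transaction and returning it to the pool as soon as the transaction completes.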
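The article does not show PgBouncer's own configuration. As a hedged sketch, the Python below uses the standard library's configparser to render a minimal pgbouncer.ini with transaction pooling on PgBouncer's default port 6432. The upstream host postgres-primary, the auth settings, the pod count of 150, and the server pool of 50 are hypothetical values chosen for illustration, not the team's settings. max_client_conn is sized for the worst case in which every pod's HikariCP pool of 20 (the maximum-pool-size in the Spring Boot configuration later in this article) is fully open.

```python
import configparser
import io


def render_pgbouncer_ini(pods: int, hikari_max_pool: int, server_pool: int) -> str:
    """Render a minimal pgbouncer.ini for transaction pooling (illustrative values)."""
    # Client side worst case: every pod's HikariCP pool is fully open to PgBouncer.
    client_conns = pods * hikari_max_pool

    cfg = configparser.ConfigParser()
    cfg["databases"] = {
        # Clients ask for "mydb"; PgBouncer forwards to this hypothetical upstream host.
        "mydb": "host=postgres-primary port=5432 dbname=mydb",
    }
    cfg["pgbouncer"] = {
        "listen_addr": "0.0.0.0",
        "listen_port": "6432",             # PgBouncer's default port
        "auth_type": "scram-sha-256",
        "auth_file": "/etc/pgbouncer/userlist.txt",
        "pool_mode": "transaction",        # release server connections at COMMIT/ROLLBACK
        "max_client_conn": str(client_conns),
        "default_pool_size": str(server_pool),  # server connections per user/database pair
    }

    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


ini = render_pgbouncer_ini(pods=150, hikari_max_pool=20, server_pool=50)
print(ini)
```

default_pool_size applies per database/user pair and per PgBouncer process. If PgBouncer runs as several replicas, PostgreSQL sees roughly replicas × default_pool_size server connections, while each replica only needs its share of max_client_conn.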
┌─────────────┐
│  Service A  ├─────────┐
└─────────────┘         │
┌─────────────┐         ▼
│  Service B  ├──────▶ PgBouncer ──▶ PostgreSQL
└─────────────┘         ▲
      ...               │
┌─────────────┐         │
│  Service N  ├─────────┘
└─────────────┘
Connection pooling handled outside the app
Why PgBouncer Worked So Well
Early connection release: In transaction mode, a server connection is returned to the pool as soon as the transaction finishes, not when the client disconnects.
Drastic reduction in open connections: From about 14,000 down to fewer than 400 stable connections.
Protection against connection storms: No connection spikes during deployments or failovers.
Faster service startup: New pods no longer wait for a full pool of direct database connections to be established.
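The gap between session and transaction pooling can be shown with a small model. The sketch below assumes a hypothetical workload of 100 clients that stay connected for one second and each run a 5 ms transaction every 100 ms. In session mode each connected client pins one server connection; in transaction mode the pool only needs as many server connections as there are transactions in flight at the same moment. It is an idealized model of PgBouncer's pool modes, not a benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    client: int
    start_ms: int  # transaction begins: a server connection is acquired
    end_ms: int    # COMMIT/ROLLBACK: released (in transaction mode)


def session_mode_servers(txns: list[Txn]) -> int:
    # Session pooling: each connected client holds one server connection for its session.
    return len({t.client for t in txns})


def transaction_mode_servers(txns: list[Txn]) -> int:
    # Transaction pooling: a server connection is held only between start and end,
    # so the pool needs as many connections as the peak number of overlapping transactions.
    events = [(t.start_ms, +1) for t in txns] + [(t.end_ms, -1) for t in txns]
    events.sort()  # at equal times, -1 sorts before +1, so a release frees a slot first
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical workload: 100 clients, 5 ms transactions every 100 ms, staggered by 1 ms.
txns = [Txn(c, c + 100 * k, c + 100 * k + 5) for c in range(100) for k in range(10)]
print("session mode:    ", session_mode_servers(txns), "server connections")
print("transaction mode:", transaction_mode_servers(txns), "server connections")
```

The model reports 100 server connections in session mode and 5 in transaction mode. Short transactions separated by long idle gaps are what make the multiplexing ratio large, the same effect behind the drop from about 14,000 connections to fewer than 400.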
Deployment Details (Spring Boot Example)
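Only the JDBC URL changes, pointing at PgBouncer's default port 6432; no application code changes were required. One caveat: transaction pooling does not carry session state such as SET parameters or server‑side prepared statements across transactions, so PgJDBC setups commonly add prepareThreshold=0 to the URL unless the PgBouncer version in use supports prepared statements through max_prepared_statements.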
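spring:
  datasource:
    # Point the JDBC URL at PgBouncer (default port 6432) instead of PostgreSQL
    url: jdbc:postgresql://pgbouncer-cluster:6432/mydb
    username: myuser
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 20      # per-pod client connections to PgBouncer
      minimum-idle: 5
      idle-timeout: 30000        # ms
      connection-timeout: 20000  # ms
      max-lifetime: 600000       # ms (10 minutes)
Real‑World Benefits (Observed Metrics)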
Database memory usage down 47%
Pod startup time reduced 22%
Database CPU usage under load dropped from 75% to 38%
Cluster size halved (12 nodes → 6 nodes)
Monthly cloud bill reduced by more than $300,000
Over three years, this translated into total savings of $10.8 million.
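The headline numbers can be checked with simple arithmetic. The sketch below multiplies the monthly saving by 36 months and computes relative reductions for the other metrics. Because the article reports "more than $300,000" per month and "under 400" connections, these results are lower bounds; the database CPU figure is a drop of 37 percentage points, or about 49% relative to the starting load.

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


MONTHLY_SAVINGS_USD = 300_000  # article: "more than $300,000" per month
MONTHS = 36                    # three years

print(f"Three-year savings: ${MONTHLY_SAVINGS_USD * MONTHS:,}")
print(f"Connections 14,000 -> 400: {pct_reduction(14_000, 400):.1f}% fewer")
print(f"DB CPU 75% -> 38%: {pct_reduction(75, 38):.1f}% lower")
print(f"Nodes 12 -> 6: {pct_reduction(12, 6):.1f}% fewer")
```

This prints $10,800,000 for three years, matching the reported total, and reductions of 97.1%, 49.3%, and 50.0%.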
When to Use a Connection‑Pool Proxy
Running more than 10 autoscaling microservices.
Database shows high memory or CPU during deployments.
Frequent “max_connections” errors.
Paying for larger database clusters just to avoid random timeouts.
Services hang at startup waiting for the database.
If any of these apply, you may have a silent cost leak.
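Two of these symptoms can be checked directly from PostgreSQL's pg_stat_activity view (the backend_type filter needs PostgreSQL 10 or later). As a rough sketch, assuming the counts from the query below are collected and passed in as a dictionary, the hypothetical helper connection_audit flags a high share of idle client connections, the signature of per‑pod pools holding connections they rarely use, and heavy use of max_connections. The 80% thresholds are arbitrary assumptions, not recommendations from the article.

```python
# Run against PostgreSQL (for example in psql) and feed the counts into connection_audit().
AUDIT_SQL = """
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state;
"""


def connection_audit(counts_by_state: dict[str, int], max_connections: int,
                     idle_share_limit: float = 0.8,
                     slot_usage_limit: float = 0.8) -> list[str]:
    """Flag connection-hoarding symptoms. Thresholds are illustrative assumptions."""
    total = sum(counts_by_state.values())
    idle = counts_by_state.get("idle", 0)
    findings = []
    if total and idle / total >= idle_share_limit:
        findings.append(f"{idle} of {total} client connections are idle")
    if total / max_connections >= slot_usage_limit:
        findings.append(f"{total} of {max_connections} connection slots in use")
    return findings


# Hypothetical snapshot: mostly idle connections held open by application pools.
sample = {"active": 40, "idle": 900, "idle in transaction": 10}
for finding in connection_audit(sample, max_connections=1000):
    print(finding)
```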
Key Lessons Learned
Connection‑pool sizes deserve scrutiny: HikariCP's default maximumPoolSize is 10, and any per‑pod value, default or tuned, is multiplied by every replica the autoscaler adds.
Autoscaling can unintentionally break your database if connections aren’t centrally managed.
In this case, the real cost savings came from architecture, not from code or caching.
Invisible problems often cost the most, and fixing them rarely earns public praise.
Takeaway
You don't need to rewrite your backend to save millions. Check whether you are scaling connections or actual throughput, and consider a pooling proxy such as PgBouncer where it fits.
This article has been distilled and summarized from source material, then republished for learning and reference.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
