Databases 26 min read

How Suning Scaled with PostgreSQL & Citus: Lessons from 3000+ Instances

This article details Suning's journey of replacing commercial databases with PostgreSQL and Citus, covering background, feature and performance comparisons, reliability tests, deployment architecture, real‑world use cases, operational tips, and a comprehensive Q&A on migration and maintenance.

dbaplus Community
dbaplus Community
dbaplus Community
How Suning Scaled with PostgreSQL & Citus: Lessons from 3000+ Instances

Background

Suning began adopting PostgreSQL in 2017. By the 2019 Double‑11 shopping festival more than 3,000 PostgreSQL instances were running across various business lines, replacing many commercial databases to reduce licensing costs and improve performance.

Database Selection

Suning’s 2016 research project evaluated MySQL and PostgreSQL as alternatives to foreign commercial databases. PostgreSQL was chosen because it provides richer SQL features, better performance on large data volumes, and functional parity with Oracle/DB2, while MySQL is limited to simple OLTP workloads.

Feature & Performance Comparison

Sysbench high‑concurrency primary‑key query: PostgreSQL and MySQL performed similarly.

Sysbench high‑concurrency pure‑write test: PostgreSQL achieved roughly 4× the throughput of MySQL.

TPCH benchmark (MySQL 5.6 vs PostgreSQL 9.5): PostgreSQL outperformed MySQL by an order of magnitude.

Reliability Comparison

Physical replication in PostgreSQL delivered higher write TPS (≈10 w/s vs. 1 w/s for MySQL) and eliminated replication lag. PostgreSQL’s checkpoint and WAL design also enabled fast recovery after power loss.

Selection Conclusion

Suning adopted a hybrid strategy: MySQL + MyCAT for simple OLTP scenarios and PostgreSQL + Citus for workloads requiring horizontal scaling, complex queries, or OLAP capabilities.

PostgreSQL & Citus Deployment

Business Scenario

The first PostgreSQL‑based system processed real‑time large‑scale data (>50 k ops/s, 90 % updates) on a wide table, supporting high‑concurrency reporting and detail queries.

Technical Selection

Citus was selected to turn multiple single‑node PostgreSQL instances into a distributed cluster, providing horizontal scalability.

Deployment Architecture

Detail‑table updates bypass the coordinator. The coordinator first determines record locations, then workers execute batch INSERT … ON CONFLICT statements directly on the target shards.

Citus deployment diagram
Citus deployment diagram

Production Results

Replacing a single‑node commercial database with a 1 CN + 8 Worker Citus cluster yielded >10× performance improvement, meeting a ten‑fold capacity expansion goal. The cluster later scaled to 16 workers while maintaining ~10 % CPU utilization.

Performance chart
Performance chart

Representative Use Cases (Citus)

Case 1 – Billing Settlement

Logistics billing platform with hundreds of nodes and tables exceeding hundreds of billions of rows. Citus eliminated custom sharding logic and reduced operational overhead.

Billing settlement
Billing settlement

Case 2 – Precise Marketing

Combined Citus with the pg_roaringbitmap extension to store and query billions of tag records efficiently for real‑time marketing.

Marketing case
Marketing case

Case 3 – Geo‑Based Search & Recommendation

PostgreSQL + PostGIS stored location data and supported high‑performance queries for offline store search, scaling up to 11 nodes during peak traffic.

Geo search
Geo search

Operational Experience

Issue 1 – GIN Index

GIN indexes support flexible multi‑field queries and use little storage for low‑cardinality columns, outperforming B‑tree in many scenarios. Drawbacks include poor range‑query performance, higher write overhead, and lack of support for unique constraints or prefix‑LIKE optimization.

GIN index comparison
GIN index comparison

Issue 2 – Backend Memory Consumption

Each PostgreSQL backend process holds private memory for metadata (e.g., pg_statistic, pg_attribute). Large numbers of tables, partitions, or columns increase this memory. The following command can inspect a backend’s memory usage:

gdb --batch-silent -ex 'call MemoryContextStatsDetail(TopMemoryContext,100)' -p ${backend_pid}
Memory usage diagram
Memory usage diagram

Mitigation strategies:

Reduce application pool size.

Deploy pgbouncer as a connection pooler.

Limit the number of partitions.

Isolate connections that access many partitions.

Adjust ALTER TABLE SET STATISTICS to lower statistics collection.

Issue 3 – Garbage Collection (VACUUM)

PostgreSQL uses MVCC with physical row deletion and a background autovacuum process. Tuning key parameters improves cleanup speed: autovacuum_vacuum_cost_limit – increase to ~10000 on SSDs (default 200). autovacuum_vacuum_scale_factor – lower for very large tables (e.g., from 0.2 to 0.05).

Partition large tables (≤1 billion rows or ≤10 GB) to keep autovacuum work manageable.

Autovacuum tuning
Autovacuum tuning

Q&A (Technical Highlights)

Recovery vs. Oracle: PostgreSQL supports point‑in‑time recovery and integrates with tools such as pg_rman and Barman, providing capabilities comparable to Oracle.

HA Solution: Native streaming replication with a virtual IP (VIP) for read/write traffic; read‑only traffic uses JDBC multi‑host URLs for load balancing.

Table Partitioning: Keep tables under 1 billion rows or 10 GB. Use pg_pathman on PostgreSQL 10 and earlier, or native partitioning from PostgreSQL 12+. For sharding, use Citus.

MySQL → PostgreSQL Migration: Use mysql_fdw for foreign‑data access, ensure UTF‑8 encoding, and strip illegal \0000 Unicode characters.

Citus HA: Two options – multi‑replica sharding (writes to multiple worker replicas) or native PostgreSQL streaming replication for both coordinator and workers.

max_locks_per_transaction: Default value 64 is sufficient for most workloads; increasing it unnecessarily raises memory usage.

Pacemaker‑based HA: Requires fencing devices to avoid split‑brain scenarios because metadata synchronization is not strongly consistent.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performance benchmarkPostgreSQLdatabase migrationCitusOperational Experience
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.