
Performance Optimization of Beike IM: Scaling Group Chat to Over 300 QPS

This article details how the Beike instant‑messaging system was analyzed, bottlenecks identified, and a series of backend optimizations—including business isolation, increased concurrency, computation reduction, and Redis connection‑pool redesign—were applied to boost 300‑person group‑chat throughput from 15 QPS to over 320 QPS, achieving more than a twenty‑fold performance gain.

Beike Product & Technology

Beike IM provides over 70% of the online business opportunities for Beike Real Estate, handling millions of daily conversations with real‑time, ordered, reliable, consistent, and secure messaging requirements. The article explains the optimization process for single‑chat and group‑chat messages.

Background: A requirement from promotional activities in large group chats (300 participants, with a 100 QPS target) revealed that the system stalled at 15 QPS due to Redis queue back‑pressure, prompting a performance‑improvement effort.

System Overview: Messages are sent via an HTTP API, placed into a sending queue, and then processed by a delivery service that writes to each user's inbox (a Redis ZSET), persists to a history store, and notifies recipients via long‑connection or push services. Write amplification is the key bottleneck: each message to a 300‑person group triggers 300 inbox writes, so 300 messages per second implies 90,000 Redis writes per second.
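To make the amplification concrete, here is a minimal Go sketch of the fan‑out step. The `inboxStore` map, the `inbox:<uid>` key scheme, and the `fanOut` helper are hypothetical stand‑ins for the real per‑user ZADD calls, not the production code:

```go
package main

import "fmt"

// inboxStore stands in for Redis: one sorted-set inbox per user key.
type inboxStore map[string][]int64

// fanOut illustrates write amplification: one group message becomes one
// inbox write per member. It returns the number of writes performed.
func fanOut(store inboxStore, groupMembers []string, msgSeq int64) int {
	writes := 0
	for _, uid := range groupMembers {
		key := fmt.Sprintf("inbox:%s", uid)      // assumed key scheme
		store[key] = append(store[key], msgSeq)  // stands in for ZADD
		writes++
	}
	return writes
}

func main() {
	store := inboxStore{}
	members := make([]string, 300)
	for i := range members {
		members[i] = fmt.Sprintf("u%d", i)
	}
	// 1 message × 300 members = 300 writes; at 300 msg/s → 90,000 writes/s.
	fmt.Println(fanOut(store, members, 1))
}
```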

Optimization Measure 1 – Business Isolation: Single‑chat, group‑chat, and public‑account streams were separated, vertically splitting the delivery pipelines so that high‑priority single‑chat traffic is isolated from group‑chat bursts.
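The vertical split can be sketched as simple routing by traffic class, so a group‑chat backlog can never delay single chats. The queue names below are assumptions for illustration, not the production topic names:

```go
package main

import "fmt"

// deliveryQueue routes each traffic class to its own, isolated delivery
// queue. Queue names are hypothetical; the point is the separation.
func deliveryQueue(chatType string) string {
	switch chatType {
	case "single":
		return "im:deliver:single" // highest priority, isolated pipeline
	case "group":
		return "im:deliver:group"
	case "public":
		return "im:deliver:public"
	default:
		return "im:deliver:default"
	}
}

func main() {
	fmt.Println(deliveryQueue("group"))
}
```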

Optimization Measure 2 – Increased Concurrency: The delivery service processes messages in two stages using goroutine pools (256 goroutines in stage 1, 1,000 in stage 2) connected by channels. Moving the relatively heavy unread‑count update from stage 1 to stage 2 yielded a 5× QPS increase (up to 75 QPS) for 300‑person groups.
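A minimal sketch of the two‑stage pipeline, assuming a small stage‑1 pool does the light per‑message work and hands off via a channel to a larger stage‑2 pool that carries the heavier per‑recipient work (such as unread‑count updates). The pool sizes mirror the article; everything else is illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// runPipeline pushes msgs through two goroutine pools connected by a
// channel and returns how many messages completed both stages.
func runPipeline(msgs []int, stage1Workers, stage2Workers int) int {
	work := make(chan int, len(msgs))
	stage2In := make(chan int, len(msgs))
	done := make(chan struct{}, len(msgs))

	for _, m := range msgs {
		work <- m
	}
	close(work)

	var wg1 sync.WaitGroup
	for i := 0; i < stage1Workers; i++ {
		wg1.Add(1)
		go func() {
			defer wg1.Done()
			for m := range work {
				stage2In <- m // light per-message work, then hand off
			}
		}()
	}
	go func() { wg1.Wait(); close(stage2In) }()

	var wg2 sync.WaitGroup
	for i := 0; i < stage2Workers; i++ {
		wg2.Add(1)
		go func() {
			defer wg2.Done()
			for range stage2In {
				done <- struct{}{} // heavy work: unread counts, pushes
			}
		}()
	}
	wg2.Wait()
	close(done)

	n := 0
	for range done {
		n++
	}
	return n
}

func main() {
	msgs := make([]int, 500)
	fmt.Println(runPipeline(msgs, 256, 1000))
}
```

Because the heavy stage has its own, larger pool, a slow unread‑count update only occupies a stage‑2 goroutine and never stalls stage‑1 intake.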

Optimization Measure 3 – Reduced Computation: Repeated data fetches (sender info, group info, do‑not‑disturb settings) were consolidated by caching them once per message batch, and a reverse mapping was added for group‑wide do‑not‑disturb flags, yielding a further 2× boost (150 QPS) and a total 10× improvement over the original baseline.
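The idea can be sketched as a per‑batch context that is loaded once and reused for every recipient, instead of refetching per recipient. The types, field names, and fetch counter below are illustrative assumptions:

```go
package main

import "fmt"

// batchCtx holds data that is identical for every recipient of a message:
// loaded once per batch instead of once per member.
type batchCtx struct {
	senderInfo string
	groupInfo  string
	mutedAll   bool // reverse mapping: group-wide do-not-disturb flag
	fetches    int  // counts backend lookups, for illustration
}

func (b *batchCtx) load() {
	b.senderInfo = "sender-profile" // 1 fetch
	b.groupInfo = "group-meta"      // 1 fetch
	b.mutedAll = false              // 1 fetch via the reverse mapping
	b.fetches = 3
}

// deliverBatch delivers to all members reusing the cached context and
// returns the total number of fetches (3, versus 3×members before).
func deliverBatch(members int) int {
	ctx := &batchCtx{}
	ctx.load()
	for i := 0; i < members; i++ {
		_ = ctx.senderInfo // reused; no extra fetch per recipient
	}
	return ctx.fetches
}

func main() {
	fmt.Println(deliverBatch(300))
}
```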

Optimization Measure 4 – Redis Connection‑Pool Redesign: The legacy imredis pool created a new connection every time the pool was exhausted, causing excessive CPU usage from connection churn. It was replaced with a pool backed by a ten‑times‑larger backup pool and token‑bucket‑limited connection creation. The original Get method was:

// Get retrieves an available redis client. If there are none
// available it will create a new one on the fly.
func (p *Pool) Get() (*redis.Client, error) {
	select {
	case conn := <-p.pool:
		return conn, nil
	default:
		return p.df(p.Network, p.Addr, p.Auth, p.Db, p.Timeout)
	}
}
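A hedged sketch of what the redesigned Get might look like: try the primary pool, then a larger backup pool, and only create a new connection if a token is available, failing fast otherwise instead of churning connections. `Conn`, the field names, and the bucket sizes are stand‑ins, not the production imredis code:

```go
package main

import (
	"errors"
	"fmt"
)

// Conn stands in for a real Redis connection.
type Conn struct{ id int }

// Pool sketches the redesign: primary pool, ~10x backup pool, and a
// token bucket capping how fast new connections may be created.
type Pool struct {
	pool   chan *Conn    // primary pool
	backup chan *Conn    // larger backup pool
	tokens chan struct{} // token bucket for new-connection creation
	nextID int
}

var ErrPoolExhausted = errors.New("pool exhausted and creation rate-limited")

func (p *Pool) Get() (*Conn, error) {
	select {
	case c := <-p.pool:
		return c, nil
	default:
	}
	select {
	case c := <-p.backup:
		return c, nil
	default:
	}
	// Both pools empty: only dial a new connection if a token is
	// available; otherwise fail fast rather than burn CPU on churn.
	select {
	case <-p.tokens:
		p.nextID++
		return &Conn{id: p.nextID}, nil
	default:
		return nil, ErrPoolExhausted
	}
}

func main() {
	p := &Pool{
		pool:   make(chan *Conn, 1),
		backup: make(chan *Conn, 10),
		tokens: make(chan struct{}, 2),
	}
	p.pool <- &Conn{id: 0}
	p.tokens <- struct{}{}

	c1, _ := p.Get()  // served from the primary pool
	c2, _ := p.Get()  // pools empty: consumes the one token
	_, err := p.Get() // no token left: rate-limited error
	fmt.Println(c1 != nil, c2 != nil, err == ErrPoolExhausted)
}
```

The key behavioral change is the final `default` branch: under exhaustion the caller gets a bounded, explicit error instead of the old unbounded on‑the‑fly dialing.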

After the redesign, load tests demonstrated that a 300‑person group could handle around 320 QPS, a 20× improvement compared with the end‑of‑2022 baseline, and single‑chat throughput rose from 2,000 to 12,000 QPS.

Conclusion: Systematic performance tuning—identifying bottlenecks, isolating workloads, increasing parallelism, eliminating redundant computation, and fixing connection‑pool inefficiencies—enabled Beike IM to meet current and near‑future traffic demands, with further scalability achievable by expanding Redis clusters and delivery services.

Tags: backend, performance optimization, scalability, Redis, Go, IM, messaging
Written by

Beike Product & Technology

As Beike's official product and technology account, we are committed to building a platform for sharing Beike's product and technology insights, targeting internet/O2O developers and product professionals. We share high-quality original articles, tech salon events, and recruitment information weekly. Welcome to follow us.
