
Performance Optimization Practices in Meitu XiuXiu Community

The Meitu XiuXiu community tackled the performance bottlenecks brought on by rapid user growth with end‑to‑end monitoring (the Hubble client SDK plus RED‑based server metrics), full‑link load testing, DNS and image‑delivery optimizations, and server‑side tuning such as disabling biased locking and warming up the JIT, with an emphasis on user experience and cross‑team collaboration.

Meitu Technology

This article, derived from a presentation, shares the performance optimization experience of the Meitu XiuXiu community, whose rapid growth over the past year has posed significant challenges for performance engineering.

1. Current Situation and Challenges

Meitu XiuXiu, a popular photo editing and sharing app with over 100 million MAU, transitioned to an image‑centric community in 2018. This rapid user growth brings three main challenges:

Traffic far exceeds that of a newly launched product, especially during holidays and weekends.

Fast product iteration leaves little time for systematic performance tuning.

Existing systems (recommendation, advertising services, etc.) are now dependencies whose behavior directly affects community performance.

2. Performance Optimization Focus

The optimization must consider the entire end‑to‑end call chain, not only server‑side metrics. The core concerns are:

Integrated monitoring (client + server + infrastructure).

Full‑link load testing to reveal bottlenecks under scaled traffic.

2.1 Integrated Monitoring

The monitoring system collects data from the client request start, through service processing, to resource response. It consists of client monitoring (Hubble) and application monitoring, supplemented by basic, resource, container, and link monitoring.

2.1.1 Client Monitoring – Hubble

Hubble is a self‑developed SDK that reports real‑time user experience data. Its benefits include:

Accurate reflection of user‑perceived performance.

Guidance for performance‑related decisions (architecture, service, network).

CDN link quality monitoring and alerting.

Hubble aggregates data by region and carrier, enabling rapid fault isolation. It also tracks request volume, error rate, and latency, and breaks down end‑to‑end latency into four stages: thread‑switch wait, DNS resolution & TCP/SSL handshake, request‑response round‑trip, and data download.
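The four‑stage breakdown reduces to simple timestamp arithmetic once the client records a timestamp at each boundary. A minimal sketch follows; the field names are hypothetical, since Hubble's internal data model is not public:

```java
// Sketch of a four-stage request-latency breakdown, mirroring the stages
// described above. Timestamps (ms) are hypothetical field names, not
// Hubble's actual schema.
public class LatencyBreakdown {
    /**
     * @param enqueued     request handed to the network layer
     * @param dnsStart     worker thread picked it up, DNS begins
     * @param requestSent  connection established, request written
     * @param firstByte    first response byte received
     * @param downloadDone response body fully read
     * @return durations of the four stages, in order
     */
    public static long[] stages(long enqueued, long dnsStart, long requestSent,
                                long firstByte, long downloadDone) {
        return new long[] {
            dnsStart - enqueued,      // 1. thread-switch wait
            requestSent - dnsStart,   // 2. DNS resolution + TCP/SSL handshake
            firstByte - requestSent,  // 3. request-response round-trip
            downloadDone - firstByte  // 4. data download
        };
    }
}
```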

2.1.2 Application Monitoring

Server‑side metrics follow the RED model (Rate, Error, Duration). Monitoring dimensions include:

Visit‑trend data (throughput, slow‑request ratio, latency distribution, error classification).

Resource profile data (per‑instance/service latency).

JVM data (memory usage, GC pause time).

Thread‑pool data (active threads, task backlog).
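The RED bookkeeping behind these dashboards can be sketched as a per‑endpoint recorder; the class shape and the in‑memory percentile computation below are illustrative, not Meitu's actual metrics stack (which would typically stream into a time‑series store):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch of RED (Rate, Error, Duration) bookkeeping for a single
// endpoint. Names and the naive in-memory p99 are illustrative assumptions.
public class RedMetrics {
    private long requests;                 // R: request count over the window
    private long errors;                   // E: failed requests
    private final List<Long> durationsMs = new ArrayList<>(); // D: latencies

    public synchronized void record(long durationMs, boolean isError) {
        requests++;
        if (isError) errors++;
        durationsMs.add(durationMs);
    }

    public synchronized double errorRate() {
        return requests == 0 ? 0.0 : (double) errors / requests;
    }

    public synchronized long p99Ms() {
        if (durationsMs.isEmpty()) return 0;
        List<Long> sorted = new ArrayList<>(durationsMs);
        Collections.sort(sorted);
        int idx = (int) Math.ceil(sorted.size() * 0.99) - 1;
        return sorted.get(idx);
    }
}
```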

2.2 Full‑Link Load Testing

The load‑testing platform copies live traffic, tags it, and isolates it from production services (e.g., ads, recommendation). Mock services replace sensitive components, and test results are stored separately. Traffic is gradually increased while monitoring key metrics; the test stops if latency spikes.
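The tag‑and‑isolate step can be sketched as a routing shim that inspects a stress‑test marker and redirects tagged traffic to shadow stores and mock services. The header name and store/service names below are assumptions for illustration:

```java
import java.util.Map;

// Sketch of shadow-traffic routing for full-link load testing: requests
// carrying a stress tag go to shadow tables and mock services so test data
// never mixes with production. Header and backend names are hypothetical.
public class ShadowRouter {
    static final String STRESS_HEADER = "X-Stress-Test"; // assumed tag header

    private static boolean isStress(Map<String, String> headers) {
        return "1".equals(headers.get(STRESS_HEADER));
    }

    // Tagged writes land in a shadow table, keeping production data clean.
    public static String chooseDataStore(Map<String, String> headers) {
        return isStress(headers) ? "shadow_feed_db" : "feed_db";
    }

    // Sensitive dependencies (ads, recommendation) are replaced by mocks.
    public static String chooseAdsBackend(Map<String, String> headers) {
        return isStress(headers) ? "mock-ads-service" : "ads-service";
    }
}
```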

3. Practical Optimization Cases

3.1 DNS Optimization – FastDNS

Issues with local DNS cache, hijacking, and cross‑carrier routing were mitigated by a client SDK (FastDNS) that pre‑resolves critical domains via LocalDNS or HttpDNS, caches results in an LRU store, and performs health checks. Tests show average DNS latency reduced by ~50% (max <200 ms) and overall HTTP latency reduced by 80–100 ms.
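The LRU store at the heart of such an SDK can be sketched with `LinkedHashMap` in access‑order mode; the capacity and the single‑IP entry shape are simplifying assumptions (FastDNS itself would also track TTLs and health‑check results):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cache for pre-resolved DNS entries, in the spirit of
// FastDNS as described above. Capacity and value shape are assumptions.
public class DnsLruCache {
    private final Map<String, String> cache;

    public DnsLruCache(int capacity) {
        // accessOrder=true moves entries to the tail on access, so the
        // head is always the least-recently-used entry.
        this.cache = new LinkedHashMap<String, String>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict LRU entry past capacity
            }
        };
    }

    public void put(String domain, String ip) { cache.put(domain, ip); }
    public String get(String domain)          { return cache.get(domain); }
    public int size()                         { return cache.size(); }
}
```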

3.2 Image Transmission Optimization

Three aspects were addressed:

Fusion scheduling (Chaos) – multi‑CDN selection with real‑time quality scoring.

Image size reduction – evaluation of JPEG, WebP, and HEIF formats. HEIF saves ~20 % bandwidth at low resolution and up to 50 % at higher resolutions, with better iOS decoding performance. Android uses WebP due to limited HEIF support.

Client‑side pre‑loading strategies.

HEIF encoding is performed asynchronously because its encoding time is ~45× that of JPEG.
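Multi‑CDN selection with real‑time quality scoring, as in the Chaos fusion scheduler above, can be sketched as picking the candidate with the best reliability‑weighted score. The 70/30 weighting and the 1000 ms latency normalization are illustrative assumptions, not Meitu's actual algorithm:

```java
import java.util.List;

// Sketch of quality-scored multi-CDN selection. The scoring formula is
// an assumption: reliability first, then speed.
public class CdnSelector {
    static class Cdn {
        final String host;
        final double successRate;  // 0..1, from recent client probes
        final double avgLatencyMs; // recent average download latency
        Cdn(String host, double successRate, double avgLatencyMs) {
            this.host = host;
            this.successRate = successRate;
            this.avgLatencyMs = avgLatencyMs;
        }
    }

    // Weighted score: 70% reliability, 30% speed (latency capped at 1 s).
    static double score(Cdn c) {
        return 0.7 * c.successRate
             + 0.3 * (1.0 - Math.min(c.avgLatencyMs / 1000.0, 1.0));
    }

    static Cdn pick(List<Cdn> candidates) {
        Cdn best = candidates.get(0);
        for (Cdn c : candidates) {
            if (score(c) > score(best)) best = c;
        }
        return best;
    }
}
```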

3.3 Server‑Side Optimization

A case study from early 2019 showed performance degradation after service restarts (slow‑request ratio up to 10 %). Root‑cause analysis revealed:

Fluctuating response times of Memcached, Redis, MySQL.

CPU usage doubled.

Increased Safepoint (STW) frequency.

Solutions included disabling biased locking to reduce Safepoint pauses and pre‑warming JIT compilation by routing only 10 % of traffic after a restart, allowing hotspot code to compile before full load. The slow‑request ratio dropped from 10 % to <0.5 %.
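On HotSpot, biased locking is disabled with the `-XX:-UseBiasedLocking` flag. The gradual warm‑up routing can be sketched as a traffic‑weight ramp applied to a freshly restarted instance; the two‑minute window and linear schedule below are assumptions, not the actual schedule used:

```java
// Sketch of post-restart JIT warm-up via traffic weighting: a fresh
// instance first receives ~10% of its normal share, ramping to 100% as
// hotspot code gets compiled. Ramp duration and shape are assumptions.
public class WarmupWeight {
    static final long WARMUP_MS = 120_000; // assumed 2-minute ramp

    /**
     * @param sinceStartMs milliseconds since the instance started
     * @return fraction of normal traffic weight, from 0.1 up to 1.0
     */
    public static double trafficWeight(long sinceStartMs) {
        if (sinceStartMs >= WARMUP_MS) return 1.0;
        double progress = (double) sinceStartMs / WARMUP_MS;
        return 0.1 + 0.9 * progress; // linear ramp from 10% to 100%
    }
}
```

A load balancer would multiply each instance's normal weight by this factor, so restarts drain and refill capacity smoothly instead of hitting cold JIT state at full load.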

4. Summary

Performance optimization should be user‑experience driven, leveraging integrated monitoring and full‑link testing, and requires cross‑team collaboration (backend, frontend, architecture, SRE).

Tags: backend, monitoring, performance optimization, image compression, DNS optimization, full-link testing
Written by Meitu Technology

Curating Meitu's technical expertise, valuable case studies, and innovation insights. We deliver quality technical content to foster knowledge sharing between Meitu's tech team and outstanding developers worldwide.