
Stability Governance of Xianyu Messaging System

Since launching a systematic stability-governance program in August 2022, the team behind Xianyu's messaging system has adopted gray releases, dedicated monitoring, daily automated regression, dependency reviews, and drills. Within six months, online incidents fell to nearly zero, which suggests that continuous, context-specific measures and vigilant change management are essential for reliable C2C transactions.

Xianyu Technology

Introduction

Xianyu, a C2C e‑commerce platform, relies on its messaging system to build trust between buyers and sellers. System stability directly impacts user experience and transaction efficiency. In August 2022 the team launched a systematic stability‑governance program.

Problem Definition

The goal is to reduce online incidents. Issues were classified into two groups: high-risk, high-probability problems (e.g., change risk and weak-dependency risk) and deep-seated problems that are costly to remediate (e.g., strong-dependency risk and architectural flaws). The corresponding measures are gray release, monitoring and alerting, automated regression, dependency governance, drills, and refactoring.

Problem Governance

Gray Release

A “safe-production” environment receives 1% of live traffic plus all internal traffic, which gives each change a closed loop for validation before it is rolled out further. MQ topics were isolated via Spring conditional beans so that messages produced in the safe environment are also consumed there.
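The routing rule described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the team's implementation: the class `GrayRouter`, the internal-user allowlist, the CRC32-based bucketing, and the `-safe` topic suffix are all hypothetical names and choices. The article's actual topic isolation uses Spring conditional beans, which this sketch does not reproduce.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.zip.CRC32;

/** Hypothetical router that decides which environment serves a request. */
final class GrayRouter {
    static final String SAFE = "safe-production";
    static final String PROD = "production";

    private final Set<String> internalUsers; // assumed allowlist of internal accounts
    private final int livePermille;          // share of live users in 1/1000 units; 10 = 1%

    GrayRouter(Set<String> internalUsers, int livePermille) {
        this.internalUsers = Set.copyOf(internalUsers);
        this.livePermille = livePermille;
    }

    /** Internal users always go to safe-production; live users are split by a stable hash bucket. */
    String route(String userId) {
        if (internalUsers.contains(userId)) {
            return SAFE;
        }
        return bucket(userId) < livePermille ? SAFE : PROD;
    }

    /** Stable bucket in [0, 1000), so a given user stays in the same environment across requests. */
    static int bucket(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 1000);
    }

    /** Appends an assumed suffix so safe-environment messages use their own topic. */
    static String topicFor(String baseTopic, String environment) {
        return SAFE.equals(environment) ? baseTopic + "-safe" : baseTopic;
    }
}
```

Hashing the user ID instead of sampling randomly per request keeps both sides of a conversation's traffic predictable: a user who lands in the safe environment stays there, which makes a regression easier to attribute to the change under test.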

Monitoring and Alerts

Separate monitoring dashboards, alert thresholds, and offline reports were created for the safe‑production environment, covering request volume, latency, error rates, and message delay. Continuous review of coverage, timeliness, and effectiveness ensures long‑term alert health.
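As a rough illustration of environment-specific thresholds, the sketch below evaluates one metrics window against the four signals named above. The class names, fields, and every threshold value are assumptions made for the example; the article does not say which monitoring system or limits the team uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical thresholds for one environment; all values passed in are illustrative. */
final class AlertThresholds {
    final long minRequests;           // a sudden traffic drop is itself a signal
    final double maxErrorRate;        // fraction of failed requests, 0.01 = 1%
    final long maxLatencyMillis;
    final long maxMessageDelayMillis;

    AlertThresholds(long minRequests, double maxErrorRate,
                    long maxLatencyMillis, long maxMessageDelayMillis) {
        this.minRequests = minRequests;
        this.maxErrorRate = maxErrorRate;
        this.maxLatencyMillis = maxLatencyMillis;
        this.maxMessageDelayMillis = maxMessageDelayMillis;
    }
}

/** Metrics aggregated over one evaluation window for one environment. */
final class MetricsWindow {
    final long requests;
    final long errors;
    final long latencyMillis;
    final long messageDelayMillis;

    MetricsWindow(long requests, long errors, long latencyMillis, long messageDelayMillis) {
        this.requests = requests;
        this.errors = errors;
        this.latencyMillis = latencyMillis;
        this.messageDelayMillis = messageDelayMillis;
    }
}

final class AlertEvaluator {
    /** Returns one message per breached threshold; an empty list means the window is healthy. */
    static List<String> evaluate(String environment, MetricsWindow m, AlertThresholds t) {
        List<String> alerts = new ArrayList<>();
        if (m.requests < t.minRequests) {
            alerts.add(environment + ": request volume " + m.requests + " below " + t.minRequests);
        }
        double errorRate = m.requests == 0 ? 0.0 : (double) m.errors / m.requests;
        if (errorRate > t.maxErrorRate) {
            alerts.add(environment + ": error rate " + errorRate + " above " + t.maxErrorRate);
        }
        if (m.latencyMillis > t.maxLatencyMillis) {
            alerts.add(environment + ": latency " + m.latencyMillis + " ms above " + t.maxLatencyMillis + " ms");
        }
        if (m.messageDelayMillis > t.maxMessageDelayMillis) {
            alerts.add(environment + ": message delay " + m.messageDelayMillis + " ms above " + t.maxMessageDelayMillis + " ms");
        }
        return alerts;
    }
}
```

Because the safe-production environment carries only about 1% of live traffic, it would plausibly need a much lower volume floor than the main environment. Keeping the thresholds in a per-environment object makes that difference explicit.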

Automated Regression

End-to-end regression tests are integrated into CI/CD and run daily. At the interface level, the Phoenix tool (built on JVMTI) records RPC traffic and replays it to verify that behavior remains stable.
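The record-and-replay idea can be shown with a small, simplified sketch. It does not use JVMTI and is not how Phoenix works internally. It wraps a handler at the application level, and the names `TrafficRecorder`, `TrafficReplayer`, and `RecordedCall` are hypothetical. Requests and responses are modeled as strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** One recorded request/response pair. */
final class RecordedCall {
    final String request;
    final String response;

    RecordedCall(String request, String response) {
        this.request = request;
        this.response = response;
    }
}

/** Records the traffic passing through a handler (not thread-safe; illustration only). */
final class TrafficRecorder {
    private final List<RecordedCall> calls = new ArrayList<>();

    /** Wraps a handler so that every call and its response are captured. */
    Function<String, String> record(Function<String, String> handler) {
        return request -> {
            String response = handler.apply(request);
            calls.add(new RecordedCall(request, response));
            return response;
        };
    }

    List<RecordedCall> calls() {
        return List.copyOf(calls);
    }
}

final class TrafficReplayer {
    /** Replays recorded requests against a candidate and returns the requests whose responses differ. */
    static List<String> replay(List<RecordedCall> recorded, Function<String, String> candidate) {
        List<String> mismatches = new ArrayList<>();
        for (RecordedCall call : recorded) {
            String actual;
            try {
                actual = candidate.apply(call.request);
            } catch (RuntimeException e) {
                actual = "exception: " + e; // a thrown exception counts as a mismatch
            }
            if (!Objects.equals(call.response, actual)) {
                mismatches.add(call.request);
            }
        }
        return mismatches;
    }
}
```

In a daily regression run, the recorded calls would come from the current release and the candidate would be the new build. A non-empty mismatch list is the signal to investigate before the change moves forward.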

Dependency Governance

Dependencies are reviewed at the code level, unnecessary strong dependencies are downgraded to weak ones, and dedicated monitoring and rapid-recovery plans are added. Dependency drills then verify that strong and weak dependencies behave as expected when they fail.
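The difference between strong and weak dependencies, and what a drill checks, can be sketched as follows. All names here (`Dependencies`, `FaultInjector`, the fallback values) are hypothetical, and a real system would add timeouts, metrics, and a configuration-driven fault switch.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

final class Dependencies {
    /** Weak dependency: a failure is absorbed and replaced by a fallback value. */
    static <T> T callWeak(Callable<T> call, Supplier<T> fallback) {
        try {
            return call.call();
        } catch (Exception e) {
            return fallback.get();
        }
    }

    /** Strong dependency: a failure fails the whole request. */
    static <T> T callStrong(Callable<T> call) {
        try {
            return call.call();
        } catch (Exception e) {
            throw new IllegalStateException("strong dependency failed", e);
        }
    }
}

/** Drill helper: wraps a dependency and makes it fail while the fault switch is on. */
final class FaultInjector {
    private volatile boolean faultEnabled;

    void setFaultEnabled(boolean enabled) {
        this.faultEnabled = enabled;
    }

    <T> Callable<T> wrap(Callable<T> real) {
        return () -> {
            if (faultEnabled) {
                throw new IllegalStateException("injected fault");
            }
            return real.call();
        };
    }
}
```

A drill in this model turns the switch on for one dependency and asserts the expected outcome: the request still succeeds with degraded content if the dependency is weak, and fails fast with a clear error if it is strong. A dependency that was assumed to be weak but breaks the request under the drill is a candidate for downgrading.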

Conclusion

Six months after implementation, online incidents have approached zero. The experience shows that stability governance requires focused, context‑specific measures, continuous investment, and a vigilant mindset toward every change.
