Key Takeaways from the 58 Group Technical Salon on Monitoring Platforms
This article summarizes a 58 Group technical salon at which engineers from Momo and 58 shared practical experience with monitoring platforms: architecture, coverage, alarm configuration, alarm convergence, custom dimensions, multi‑view dashboards, and future directions in intelligent, automated monitoring.
Background
On November 2, 2018, the 58 Group Technical Salon (Session 2 – "Monitoring Platform") was held at the Beijing headquarters, organized jointly by the Technology Engineering Platform Group and the Human Resources Department's Magic Academy. Speakers from Momo's Technical Assurance Department, 58's Framework Component Department, and 58's System Operations Monitoring Team shared their monitoring practices.
1. Momo Monitoring Platform
The platform aims to effectively monitor online services, quickly locate issues, and provide early warnings of service health through large‑scale data collection and real‑time computation, delivering high availability, accuracy, real‑time performance, high coverage, and good user experience.
System Architecture
(Figure provided by Momo)
Monitoring Coverage
Java services use an SDK provided by the platform, while other languages use agents. Coverage spans client, network, CDN, DNS, Nginx, micro‑services, RPC, middleware, DB, process, container, and hardware layers.
Alarm Configuration
Alarms are configured via strategy templates: default templates for common metrics, guided templates for metric‑heavy clusters, and custom strategy groups for business‑specific needs.
Alarm Strategies
Basic strategies include threshold, segment‑length, and same‑period comparisons. Extended strategies cover alarms that fire only after N consecutive breaches, composite strategies, multi‑metric calculations, and sliding‑window variance detection.
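A few of these strategies can be sketched in a handful of lines. The following is only an illustration, not Momo's implementation: the names `same_period_change`, `ConsecutiveTrigger`, and `SlidingVariance` are hypothetical, and the window size and variance limit are arbitrary assumptions.

```python
from collections import deque
from statistics import pvariance


def same_period_change(current, previous):
    """Relative change against the same period earlier (e.g. the same minute yesterday)."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


class ConsecutiveTrigger:
    """Fires only once a condition has been breached n times in a row."""

    def __init__(self, n):
        self.n = n
        self.streak = 0

    def observe(self, breached):
        # Any healthy sample resets the streak, which filters out one-off spikes.
        self.streak = self.streak + 1 if breached else 0
        return self.streak >= self.n


class SlidingVariance:
    """Flags a metric whose variance over the last `size` samples exceeds a limit."""

    def __init__(self, size, max_variance):
        self.window = deque(maxlen=size)
        self.max_variance = max_variance

    def observe(self, value):
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # not enough samples yet
        return pvariance(self.window) > self.max_variance
```

A composite strategy could then combine these, for example alarming only when a threshold check and a same‑period check both pass through a `ConsecutiveTrigger`.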
Alarm Convergence
To avoid alarm storms, Momo converges alarms, achieving a compression rate of about 20%. Convergence is applied by granularity (service, machine, cluster, event) and by level (critical, warning, notice, email), and supports custom tags and metric aggregation.
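As a rough sketch of convergence by granularity and level, the function below folds raw alerts that share a key into one summary. The `Alert` fields and the `converge` function are hypothetical names chosen for illustration. The talk does not describe Momo's actual data model or time windows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    machine: str
    level: str  # "critical", "warning", "notice", or "email"
    message: str


def converge(alerts, key_fields=("service", "level")):
    """Group alerts that share the chosen fields and emit one summary per group."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(getattr(alert, field) for field in key_fields)
        groups[key].append(alert)
    return [
        {
            "key": key,
            "count": len(members),
            "machines": sorted({a.machine for a in members}),
        }
        for key, members in groups.items()
    ]
```

Changing `key_fields` to `("machine", "level")` would converge at machine granularity instead of service granularity.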
Momo’s platform has undergone three architectural upgrades and now meets full‑link monitoring needs, with future goals in intelligent monitoring and operational automation.
2. 58 Business Monitoring System – WMonitor
WMonitor is a self‑developed, generic business monitoring platform that abstracts monitoring requirements into high‑level standards, separating business logic from monitoring logic to address data aggregation, storage, visualization, and alerting.
System Architecture
Custom Monitoring Dimensions
WMonitor abstracts business monitoring into "attributes" with unique IDs, decoupling custom dimensions from the platform. Data is collected via SDK aggregation interfaces, enabling flexible monitoring of any custom dimension.
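The idea of reporting against an attribute ID through an aggregating SDK can be illustrated as follows. This is a minimal sketch and not the WMonitor SDK: the class `AttributeAggregator`, its methods, and the example IDs are all assumptions made for illustration.

```python
import threading
from collections import defaultdict


class AttributeAggregator:
    """Hypothetical client-side aggregator: sums values per attribute ID between flushes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sums = defaultdict(int)

    def add(self, attribute_id, value=1):
        # Business code only needs the attribute ID; what the ID means
        # (orders placed, payment failures, ...) is defined on the platform side.
        with self._lock:
            self._sums[attribute_id] += value

    def flush(self):
        """Return the aggregated values and reset the counters for the next interval."""
        with self._lock:
            snapshot = dict(self._sums)
            self._sums.clear()
        return snapshot


aggregator = AttributeAggregator()
aggregator.add(1001)       # e.g. one order placed
aggregator.add(1001, 4)    # four more orders
aggregator.add(2002, 2)    # e.g. two payment failures
report = aggregator.flush()
```

Aggregating locally and shipping one value per attribute per interval keeps the reporting cost independent of request volume, which is presumably one reason for an aggregation interface.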
Multi‑View Dashboard
Views allow users to combine multiple attributes in a single visual panel; an attribute can belong to multiple views, supporting diverse monitoring needs.
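The many‑to‑many relationship between views and attributes can be pictured with a plain mapping. The view names, attribute IDs, and helper functions below are hypothetical and only illustrate the relationship.

```python
# Hypothetical views: each view lists the attribute IDs it displays.
VIEWS = {
    "order_overview": [1001, 1002],
    "payment_health": [1002, 2002],
}


def views_containing(attribute_id, views):
    """List every view that displays the given attribute."""
    return sorted(name for name, attrs in views.items() if attribute_id in attrs)


def render_view(view_name, views, latest_values):
    """Collect the latest value of each attribute in a view (None if not yet reported)."""
    return {attr: latest_values.get(attr) for attr in views.get(view_name, [])}
```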
3. 58 Monitoring System
This system, like Momo’s, provides a flexible, multi‑dimensional monitoring service across all business lines.
Key Features
Automatic baseline monitoring via agents that detect server downtime and resource overuse, synchronized with CMDB for cluster ownership.
Page and interface monitoring through periodic active probing of status codes, response times, and keywords.
Cluster availability monitoring by aggregating Nginx logs for traffic, error codes, and latency.
Intelligent traffic monitoring using machine‑learning models to forecast daily volume and detect anomalies.
Custom monitoring allowing users to develop bespoke data collection programs.
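To make the page and interface probing item concrete, here is a sketch of how one probe result might be checked against the three criteria mentioned (status code, response time, keyword). The `ProbeResult` type, the `evaluate_probe` function, and the default limits are assumptions. A real prober would also perform the periodic HTTP request, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    status_code: int
    elapsed_ms: float
    body: str


def evaluate_probe(result, expected_status=200, max_elapsed_ms=1000.0, keyword=None):
    """Return a list of human-readable failures; an empty list means the probe passed."""
    failures = []
    if result.status_code != expected_status:
        failures.append(f"status {result.status_code}, expected {expected_status}")
    if result.elapsed_ms > max_elapsed_ms:
        failures.append(f"took {result.elapsed_ms:.0f} ms, limit {max_elapsed_ms:.0f} ms")
    if keyword is not None and keyword not in result.body:
        failures.append(f"keyword {keyword!r} not found in response body")
    return failures
```

The keyword check catches pages that return status 200 but render an error or an empty template.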
Alarm Practices
Accuracy is ensured by filtering out noise with continuous‑N‑times checks.
Time‑based policies allow different thresholds for day and night.
Synchronization propagates configuration changes (servers, ports, processes) to suppress alerts during deployments.
Real‑time pipelines (Kafka + Storm) guarantee timely detection and notification.
Convergence techniques limit repeated alerts, control intervals, cap repeat counts, provide recovery notices, and assign severity levels (p0‑p6).
Escalation mechanisms upgrade alerts (e.g., SMS → voice) and notify leaders after 30 minutes of persistence.
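Two of these practices, time‑based thresholds and escalation, can be sketched as below. Only the 30‑minute leader notification comes from the talk. The function names, the 08:00 to 22:00 day window, and the 10‑minute point at which SMS is upgraded to voice are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta


def threshold_for(now, day_limit, night_limit,
                  day_start=time(8, 0), day_end=time(22, 0)):
    """Pick the alert threshold by time of day (the day window is an assumed default)."""
    return day_limit if day_start <= now.time() < day_end else night_limit


def notification_plan(first_fired, now,
                      voice_after=timedelta(minutes=10),    # assumed value
                      leader_after=timedelta(minutes=30)):  # from the talk
    """Decide channel and recipients from how long the alert has persisted."""
    persisted = now - first_fired
    return {
        "channel": "voice" if persisted >= voice_after else "sms",
        "notify_leader": persisted >= leader_after,
    }
```

For example, an alert first fired at 03:00 would be judged against the night limit, sent by SMS at first, and escalated to the owner's leader if it is still firing at 03:30.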
Conclusion
Monitoring platforms are foundational services within the company; both Momo and 58 have achieved extensive coverage, multi‑dimensional support, and user‑friendly alert displays. Future work will focus on further reducing alert noise while delivering complete information and advancing intelligent, automated monitoring.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.
