Key Takeaways from the 58 Group Technical Salon on Monitoring Platforms

The article summarizes the 58 Group technical salon where experts from Momo and 58 shared practical experiences on monitoring platform architectures, coverage, alarm configurations, convergence techniques, custom dimensions, multi‑view dashboards, and future directions for intelligent and automated monitoring across the company.

58 Tech

Background

On November 2, 2018, the 58 Group Technical Salon (Session 2 – "Monitoring Platform") was held at the Beijing headquarters, organized jointly by the Technology Engineering Platform Group and the Human Resources Department's Magic Academy. Speakers from Momo's Technical Assurance Department, 58's Framework Component Department, and 58's System Operations Monitoring Team shared their monitoring practices.

1. Momo Monitoring Platform

The platform aims to monitor online services effectively, locate issues quickly, and give early warning of service‑health problems through large‑scale data collection and real‑time computation. Its design goals are high availability, accuracy, real‑time performance, broad coverage, and a good user experience.

System Architecture

(Figure provided by Momo)

Monitoring Coverage

Java services use an SDK provided by the platform, while other languages use agents. Coverage spans client, network, CDN, DNS, Nginx, micro‑services, RPC, middleware, DB, process, container, and hardware layers.

Alarm Configuration

Alarms are configured via strategy templates: default templates for common metrics, guided templates for metric‑heavy clusters, and custom strategy groups for business‑specific needs.
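One way to picture the layering of templates is as an overlay: a default template applies everywhere, and a custom strategy group replaces or extends it for a specific business. The sketch below is only an illustration under that assumption. The metric names, thresholds, and the `effective_rules` helper are hypothetical and are not taken from Momo's platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str
    threshold: float
    level: str = "warning"


# Hypothetical default template applied to every service.
DEFAULT_TEMPLATE = {
    "cpu_usage": Rule("cpu_usage", 90.0),
    "error_rate": Rule("error_rate", 0.05, "critical"),
}


def effective_rules(default, custom_group=None):
    """Overlay a business-specific strategy group on the default template.

    Custom rules replace the default for the same metric and may add new
    metrics; the default template itself is left unchanged.
    """
    merged = dict(default)
    for rule in (custom_group or []):
        merged[rule.metric] = rule
    return merged


# Hypothetical custom strategy group for one business line.
payment_rules = effective_rules(
    DEFAULT_TEMPLATE,
    [Rule("error_rate", 0.01, "critical"), Rule("order_latency_ms", 500.0)],
)
```

Here `payment_rules` keeps the default CPU rule, tightens the error-rate threshold, and adds a latency rule that the default template does not know about.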

Alarm Strategies

Basic strategies include threshold, segment‑length, and same‑period comparisons. Extended strategies cover continuous N‑times alarms, composite strategies, multi‑metric calculations, and sliding‑window variance detection.
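Several of these strategies can be sketched in a few lines. The following is a minimal illustration, not Momo's implementation: the function and class names, the relative-change formula for the same-period comparison, and the use of population variance are all assumptions made for the example.

```python
from collections import deque
from statistics import pvariance


def threshold_breached(value, limit):
    """Threshold strategy: fire when the value exceeds the limit."""
    return value > limit


def same_period_breached(current, previous_period, max_change_ratio):
    """Same-period comparison: compare against the value at the same time in
    an earlier period (for example, yesterday) and fire on a large relative
    change."""
    if previous_period == 0:
        return current != 0
    change = abs(current - previous_period) / abs(previous_period)
    return change > max_change_ratio


class ConsecutiveBreach:
    """Continuous N-times strategy: fire only after N breaches in a row."""

    def __init__(self, n):
        self.n = n
        self.streak = 0

    def observe(self, breached):
        self.streak = self.streak + 1 if breached else 0
        return self.streak >= self.n


class SlidingVariance:
    """Sliding-window variance detection over the last `size` samples."""

    def __init__(self, size, max_variance):
        self.window = deque(maxlen=size)
        self.max_variance = max_variance

    def observe(self, value):
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # not enough samples yet
        return pvariance(self.window) > self.max_variance
```

A composite strategy could then be built by combining the boolean results, for example feeding `threshold_breached(...)` into `ConsecutiveBreach.observe(...)` so that only sustained breaches raise an alarm.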

Alarm Convergence

To avoid alarm storms, Momo converges related alarms before sending them, achieving about 20% compression. Convergence works along two axes: granularity (service, machine, cluster, event) and level (critical, warning, notice, email). Custom tags and metric aggregation are also supported.
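As a rough illustration of convergence by granularity and level, the sketch below merges alerts that share the same key within a fixed time bucket and emits one summary per group. The `Alert` fields, the 60-second window, and the bucketing approach are assumptions chosen for the example, not details given in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: int  # seconds since epoch
    service: str
    machine: str
    level: str      # e.g. "critical", "warning"
    message: str


def converge(alerts, window_seconds=60, key_fields=("service", "level")):
    """Merge alerts that share the same key within the same time bucket."""
    buckets = defaultdict(list)
    for alert in alerts:
        key = tuple(getattr(alert, field) for field in key_fields)
        bucket = alert.timestamp // window_seconds
        buckets[(bucket, key)].append(alert)

    merged = []
    for (bucket, key), group in sorted(buckets.items(), key=lambda kv: kv[0]):
        merged.append({
            "window_start": bucket * window_seconds,
            "key": dict(zip(key_fields, key)),
            "count": len(group),
            "machines": sorted({a.machine for a in group}),
            "first_message": group[0].message,
        })
    return merged
```

Changing `key_fields` to `("machine", "level")` would converge at machine granularity instead of service granularity, which is the kind of switch the granularity axis describes.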

Momo’s platform has undergone three architectural upgrades and now meets full‑link monitoring needs, with future goals in intelligent monitoring and operational automation.

2. 58 Business Monitoring System – WMonitor

WMonitor is a self‑developed, generic business monitoring platform that abstracts monitoring requirements into high‑level standards, separating business logic from monitoring logic to address data aggregation, storage, visualization, and alerting.

System Architecture

Custom Monitoring Dimensions

WMonitor abstracts business monitoring into "attributes" with unique IDs, decoupling custom dimensions from the platform. Data is collected via SDK aggregation interfaces, enabling flexible monitoring of any custom dimension.
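The idea of reporting against opaque attribute IDs can be sketched as a small client-side aggregator. This is a hypothetical illustration, not the WMonitor SDK: the class name, method names, and the example IDs are invented. It assumes the SDK sums values locally and ships them in batches.

```python
import threading
from collections import Counter


class AttributeAggregator:
    """Hypothetical client-side aggregator.

    Values are summed locally per attribute ID and flushed as one batch, so
    the platform only sees IDs and numbers and needs no knowledge of what
    each attribute means to the business.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._sums = Counter()

    def add(self, attribute_id, value=1):
        """Accumulate a value for the given attribute ID."""
        with self._lock:
            self._sums[attribute_id] += value

    def flush(self):
        """Return the accumulated sums and reset the local state."""
        with self._lock:
            batch, self._sums = dict(self._sums), Counter()
        return batch


# Hypothetical attribute IDs registered by a business team.
ORDER_CREATED = 1001
ORDER_FAILED = 1002

aggregator = AttributeAggregator()
aggregator.add(ORDER_CREATED)
aggregator.add(ORDER_CREATED)
aggregator.add(ORDER_FAILED, 5)
batch = aggregator.flush()
```

In a real deployment the batch would be sent to the platform on a timer. The sketch stops at producing the batch, since the transport is not described in the article.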

Multi‑View Dashboard

Views allow users to combine multiple attributes in a single visual panel; an attribute can belong to multiple views, supporting diverse monitoring needs.
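The relationship between views and attributes is many-to-many, which can be modeled with two indexes. The sketch below is an assumption about the data model only; the `ViewRegistry` class and the view names are hypothetical.

```python
from collections import defaultdict


class ViewRegistry:
    """Hypothetical many-to-many mapping between views and attribute IDs."""

    def __init__(self):
        self._view_to_attrs = defaultdict(list)  # keeps display order
        self._attr_to_views = defaultdict(set)

    def add_to_view(self, view, attribute_id):
        """Attach an attribute to a view; duplicates are ignored."""
        if attribute_id not in self._view_to_attrs[view]:
            self._view_to_attrs[view].append(attribute_id)
            self._attr_to_views[attribute_id].add(view)

    def attributes_of(self, view):
        """Attribute IDs shown in a view, in the order they were added."""
        return list(self._view_to_attrs.get(view, []))

    def views_of(self, attribute_id):
        """Names of all views that include the attribute."""
        return sorted(self._attr_to_views.get(attribute_id, set()))


registry = ViewRegistry()
registry.add_to_view("order-overview", 1001)
registry.add_to_view("order-overview", 1002)
registry.add_to_view("failure-drilldown", 1002)
```

Attribute `1002` now appears in both views, while each view keeps its own ordered list of attributes.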

3. 58 Monitoring System

This system, like Momo’s, provides a flexible, multi‑dimensional monitoring service across all business lines.

Key Features

Automatic baseline monitoring via agents that detect server downtime and resource overuse, synchronized with CMDB for cluster ownership.

Page and interface monitoring through periodic active probing of status codes, response times, and keywords.

Cluster availability monitoring by aggregating Nginx logs for traffic, error codes, and latency.

Intelligent traffic monitoring using machine‑learning models to forecast daily volume and detect anomalies.

Custom monitoring allowing users to develop bespoke data collection programs.
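The cluster availability feature above, which aggregates Nginx logs for traffic, error codes, and latency, can be sketched as a per-minute roll-up. The line layout used here is a simplifying assumption, since real Nginx access logs depend on the configured `log_format`. The function name and output fields are also hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_access_log(lines):
    """Aggregate request count, 5xx ratio, and mean latency per minute and cluster.

    Assumed line layout (hypothetical):
        "<minute> <cluster> <status> <request_time_seconds>"
    """
    stats = defaultdict(lambda: {"requests": 0, "errors_5xx": 0,
                                 "latency_sum": 0.0, "status": Counter()})
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        minute, cluster, status, request_time = parts
        try:
            code = int(status)
            latency = float(request_time)
        except ValueError:
            continue  # skip lines with non-numeric fields
        entry = stats[(minute, cluster)]
        entry["requests"] += 1
        entry["latency_sum"] += latency
        entry["status"][code] += 1
        if 500 <= code <= 599:
            entry["errors_5xx"] += 1

    return {
        key: {
            "requests": e["requests"],
            "error_ratio": e["errors_5xx"] / e["requests"],
            "avg_latency_s": e["latency_sum"] / e["requests"],
            "status_counts": dict(e["status"]),
        }
        for key, e in stats.items()
    }
```

The resulting per-minute figures are the kind of series that threshold or same-period alarm rules could then be evaluated against.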

Alarm Practices

Accuracy is ensured by filtering out noise with continuous‑N‑times checks.

Time‑based policies allow different thresholds for day and night.

Synchronization propagates configuration changes (servers, ports, processes) to the monitoring system so that alerts are suppressed during deployments.

Real‑time pipelines (Kafka + Storm) keep detection and notification timely.

Convergence techniques limit repeated alerts by controlling the interval between them and capping the repeat count; they also provide recovery notices and assign severity levels (p0‑p6).

Escalation mechanisms upgrade alerts that persist (e.g., SMS → voice) and notify leaders after 30 minutes of persistence.
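Three of these practices (time-based thresholds, repeat control, and escalation) can be sketched together. The 30-minute escalation delay comes from the talk. Everything else here is an assumption for illustration, including the day and night boundaries, the repeat interval, the repeat cap, and the action names.

```python
from dataclasses import dataclass
from datetime import datetime, time


def threshold_for(now, day_limit, night_limit,
                  day_start=time(8, 0), day_end=time(22, 0)):
    """Time-based policy: pick the daytime or night-time threshold."""
    return day_limit if day_start <= now.time() < day_end else night_limit


@dataclass
class NotificationPolicy:
    repeat_interval_s: int = 300   # assumed minimum gap between repeats
    max_repeats: int = 3           # assumed cap on notifications per incident
    escalate_after_s: int = 1800   # 30 minutes, as mentioned in the talk

    def decide(self, started_at, now, sent_count,
               last_sent_at=None, escalated=False):
        """Return the actions to take for an ongoing incident.

        All times are in seconds; `sent_count` is how many notifications
        have already gone out for this incident.
        """
        actions = []
        if not escalated and now - started_at >= self.escalate_after_s:
            actions.append("escalate:voice+leader")
        due = (last_sent_at is None
               or now - last_sent_at >= self.repeat_interval_s)
        if due and sent_count < self.max_repeats:
            actions.append("notify:sms")
        return actions


policy = NotificationPolicy()
night_limit = threshold_for(datetime(2018, 11, 2, 3, 0), 100, 50)
```

A recovery notice would be sent when the incident clears. It is left out here to keep the sketch short.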

Conclusion

Monitoring platforms are foundational services within the company; both Momo and 58 have achieved extensive coverage, multi‑dimensional support, and user‑friendly alert displays. Future work will focus on further reducing alert noise while delivering complete information and advancing intelligent, automated monitoring.
