Tagged articles
17 articles
Pan Zhi's Tech Notes
Mar 14, 2025 · Cloud Native

Integrating Hystrix Dashboard with Spring Cloud for Visual Service Circuit Breaker Monitoring – A Hands‑On Guide

This article walks through the background of Hystrix, shows how to add Hystrix Dashboard and Turbine dependencies to a Spring Cloud consumer project, configure the necessary annotations and properties, test the services, and use the dashboard and Turbine streams to visualize circuit‑breaker metrics for both single instances and clusters.

Hystrix · Hystrix Dashboard · Spring Cloud
0 likes · 12 min read
vivo Internet Technology
Feb 19, 2025 · Backend Development

vivo HTTPDNS End-to-End Integrated Solution: Architecture, Optimizations, and Business Impact

vivo’s end‑to‑end HTTPDNS solution integrates a client SDK, a high‑performance service, a unified scheduling gateway, and full‑link monitoring. It cuts resolution latency by 36%, raises the DNS resolution success rate to 99.85%, handles 1.5 billion queries daily, and prevents hijacking and misblocking across its ecosystem.

Backend Development · DNS Resolution · HTTPDNS
0 likes · 19 min read
NetEase Cloud Music Tech Team
Nov 7, 2023 · Operations

How NetEase Cloud Music Built Pylon APM: A Deep Dive into Tracing, Metrics, and Automated Diagnosis

This article details the design and implementation of the Pylon APM monitoring platform for NetEase Cloud Music, covering background challenges, the choice of Pinpoint, extensions to trace models, tail‑based exception sampling, Prometheus integration, automated JStack collection, and the resulting APM product features.

APM · Backend · Java Agent
0 likes · 12 min read
Java High-Performance Architecture
Jan 25, 2022 · Cloud Native

Why Is Debugging Microservices on Kubernetes So Hard? Proven Strategies to Overcome It

Debugging microservices in a Kubernetes environment is challenging due to the abstraction of pods, network complexities, infrastructure issues, and application-level faults, but by monitoring at the service layer, aggregating data, and applying machine‑learning‑based anomaly detection, teams can effectively identify and resolve problems.

Kubernetes · Microservices · machine learning
0 likes · 6 min read
Ops Development Stories
May 25, 2021 · Backend Development

Mastering Sentinel Console: Real‑World Guide to Machine Monitoring, Flow Control, and Rule Configuration

This comprehensive tutorial walks you through Sentinel's console features, from environment setup and machine health monitoring to real‑time service metrics, flow‑control, degrade, hotspot, authorization, system, and cluster rules, including practical code examples and configuration tips for Spring Cloud Alibaba applications.

flow-control · sentinel · service monitoring
0 likes · 31 min read
vivo Internet Technology
Apr 21, 2021 · Operations

System Health Check: Principles and Implementation

System health checks, akin to medical exams, are vital for modern IT infrastructure. They combine active and passive monitoring, failover strategies, and tools like Spring Boot Actuator to detect hardware, network, load, or software issues, prevent single points of failure, and keep services highly available.

Network Reliability · RocketMQ · Spring Boot Actuator
0 likes · 12 min read
NetEase Smart Enterprise Tech+
Feb 4, 2021 · Operations

How NetEase Cloud Communication Builds a Real-Time Service Monitoring Platform

NetEase Cloud Communication’s service monitoring platform leverages data collection, preprocessing, alerting, and visualization pipelines—using HTTP APIs, Kafka, custom scripts, and NTSDB—to provide real-time insights, ensure stability, and support scalable, high‑throughput audio‑video services.

Operations · cloud communication · data pipeline
0 likes · 11 min read
IT Architects Alliance
Sep 14, 2020 · Operations

Implementation of Service Chain Monitoring and End-to-End Process Monitoring

This article explains how to design and implement service‑chain (APM) monitoring and end‑to‑end process monitoring in distributed systems, covering concepts such as spans and traces, TRACE_ID generation, logging practices, visualisation techniques, and a practical expense‑report use case with code examples.

APM · Distributed Tracing · Microservices
0 likes · 15 min read
Yanxuan Tech Team
May 25, 2020 · Operations

How NetEase Cloud Music Built a Scalable Full‑Link Tracing System for Real‑Time Service Diagnosis

This article details the design, implementation, and evolution of NetEase Cloud Music's full‑link tracing platform, covering its motivations, architecture, low‑overhead data collection, multi‑dimensional analysis, service grooming, automated diagnosis, and future plans for AI‑driven anomaly detection and big‑data processing.

Distributed Systems · Observability · service monitoring
0 likes · 19 min read
Efficient Ops
Apr 1, 2020 · Operations

How to Use Nagios for Business-Level Service Monitoring: A Step-by-Step Guide

This article explains why traditional server and service monitoring (e.g., Zabbix) may miss business outages, then walks through setting up Nagios on Debian to monitor web page URLs, API health checks, and related services, including configuration files, plugins, and a desktop alert tool, Nagstamon.

Nagios · Ops · business availability
0 likes · 18 min read
Qunar Tech Salon
Sep 5, 2018 · Operations

Tencent SNG Operations: Business Profiling for Capacity Planning, Activity Modeling, and Multi‑Region Deployment

The article explains how Tencent's SNG operations team uses business profiling—including capacity, activity, core‑link, and SET models—to address performance testing across device types, forecast activity‑driven resource needs, identify core versus peripheral services, and plan reliable multi‑region deployments.

Operations · business profiling · capacity planning
0 likes · 9 min read
Meituan Technology Team
May 31, 2018 · Operations

High‑Availability Practices for Account Services at Meituan/Dianping

Meituan/Dianping ensures its critical account service stays online by combining real‑time business monitoring, circuit‑breaker‑driven graceful degradation, and active‑active cross‑region deployment with isolated dependencies, versioned data sync, and automated cache updates, dramatically extending MTBF while cutting MTTR and latency.

data synchronization · fault tolerance · high availability
0 likes · 13 min read
dbaplus Community
Oct 10, 2017 · Operations

How to Build Effective Service Monitoring: Principles, Practices, and Technical Implementation

This article explains why service monitoring is essential for large‑scale microservice environments, outlines design principles, core monitoring components, dependency mapping, call‑chain analysis, capacity planning, root‑cause analysis, and presents a practical technical architecture for implementing robust monitoring solutions.

Distributed Tracing · Operations · capacity planning
0 likes · 12 min read
dbaplus Community
Nov 28, 2016 · Backend Development

Adaptive Service Monitoring and Self‑Healing Calls for Microservices

This article explains how to implement service context monitoring and runtime awareness, capture performance metrics via automatic discovery, transmit data through heartbeats or message queues, and apply adaptive mechanisms such as load balancing, circuit breaking, retry and isolation to achieve resilient microservice communication.

Adaptive routing · Circuit Breaking · load balancing
0 likes · 17 min read