Tagged articles

2179 articles

Jan 6, 2021 · Mobile Development

How Zhenkun's Mobile Team Built a Scalable Componentized Architecture with Flutter

This article outlines Zhenkun's mobile engineering journey from 2020 to 2021, detailing the shift to componentized architecture, the adoption of Flutter hybrid development, CI automation, and real‑time monitoring to boost development efficiency and product quality.

Componentization · Flutter · Mobile Development

0 likes · 12 min read

macrozheng

Jan 6, 2021 · Backend Development

Essential Spring Boot Practices for Building Robust Microservices

This article outlines the golden rules for constructing Spring Boot microservices, covering monitoring with Spring Boot Admin and Grafana, exposing metrics via Actuator, centralized logging with ELK, clear API documentation using Swagger, YApi or smart‑doc, transparent build info, and keeping dependencies up‑to‑date.

API documentation · Microservices · Spring Boot

0 likes · 8 min read

Liangxu Linux

Jan 5, 2021 · Operations

How to Install and Use nmon Monitoring Tool on CentOS 7

This guide shows how to download, extract, and run the lightweight nmon performance monitoring tool on CentOS 7, including the exact commands to fetch the package, choose the correct binary, start the utility, and view CPU and memory statistics using interactive keys.

Linux · centos7 · monitoring

0 likes · 3 min read

Ops Development Stories

Jan 4, 2021 · Cloud Native

Integrate SkyWalking Monitoring into Nginx Ingress on Kubernetes

This guide walks through installing SkyWalking‑nginx‑lua, renaming conflicting scripts, modifying the nginx‑ingress controller’s template to inject SkyWalking environment variables and tracing buffer, building a custom Docker image, and deploying it with the required environment variables so that request traces appear in the SkyWalking UI.

Docker · Ingress · Kubernetes

0 likes · 7 min read

Architect

Jan 2, 2021 · Operations

Layered Architecture of Microservice Monitoring and Key Practices

This article explains the layered architecture of microservice monitoring, detailing five monitoring levels—from infrastructure to end-user experience—along with essential monitoring points such as logs, metrics, tracing, alerts, and health checks, and presents a typical monitoring stack using agents, Kafka, ELK, and InfluxDB.

Metrics · Operations · logging

0 likes · 6 min read

MaGe Linux Operations

Jan 1, 2021 · Operations

How to Deploy Nightingale: A Step‑by‑Step Docker Guide for High‑Availability Monitoring

This article provides a comprehensive, step‑by‑step tutorial for installing the open‑source Nightingale monitoring platform using Docker, covering code retrieval, Docker‑compose setup, node configuration, service startup, Grafana integration, and essential UI features, enabling a high‑availability, hybrid‑cloud monitoring solution.

Docker · Grafana · Kubernetes

0 likes · 7 min read

Youzan Coder

Dec 30, 2020 · Operations

ERROR Log Governance and Monitoring Alerting Practice at Youzan

Youzan’s log‑governance guide uses a car‑dashboard analogy to show why precise ERROR logs and sensible alerts matter, defines INFO/WARN/ERROR levels, sets daily reduction targets, leverages top‑error analysis and water‑level monitoring, and ultimately cut daily ERROR entries from thousands to about one hundred while catching issues before incidents.

Alerting · Error Handling · Log Management

0 likes · 9 min read

Architecture Digest

Dec 30, 2020 · Databases

Redis Latency Analysis and Mitigation Strategies

This article examines common causes of increased latency in Redis—including high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation—and provides practical monitoring and configuration techniques to diagnose and reduce delays.

Latency · monitoring · optimization

0 likes · 17 min read

Programmer DD

Dec 27, 2020 · Databases

Build a Powerful MySQL Monitoring Platform with Prometheus and Grafana

This guide walks through building a comprehensive MySQL monitoring platform using Prometheus and Grafana, covering exporter installation, configuration, and key performance metrics such as replication health, query throughput, slow queries, connection limits, and buffer pool usage; it also provides ready‑made Grafana dashboards and alerting rules.

Exporter · Grafana · Metrics

0 likes · 17 min read

Youzan Coder

Dec 25, 2020 · Big Data

Metadata Governance and Collection in a Data Asset Platform

The platform implements comprehensive metadata governance by extracting, standardizing, and ingesting basic, trend, resource, lineage, and task metadata from offline and real‑time systems via a Kafka‑based SDK, enabling unified storage, monitoring, alerts, and future automation to improve data asset visibility and quality.

Big Data · Data Governance · SDK

0 likes · 18 min read

Architecture Digest

Dec 24, 2020 · Backend Development

WeChat Architecture: Strategies, Agile Practices, and Large‑Scale System Design

The article details WeChat’s three‑in‑one strategy of precise product, agile projects, and robust technical support, explaining how the team achieves massive scalability, high availability, extensible protocols, resilient disaster recovery, and embedded monitoring through practices like small‑system‑big‑scale, gray‑release, and foundational components.

Backend · Operations · WeChat

0 likes · 17 min read

JD Tech Talk

Dec 18, 2020 · Artificial Intelligence

Model Online Inference System: Architecture, Components, and Deployment Strategies

This article examines the challenges of moving machine‑learning models from offline training to online serving and proposes a modular architecture (model gateway, data source gateway, business service center, monitoring, and RPC components) that enables rapid model deployment, version management, traffic mirroring, gray‑release, and real‑time monitoring.

Model Serving · machine learning · monitoring

0 likes · 10 min read

Continuous Delivery 2.0

Dec 18, 2020 · Operations

Applying the VALET Model for SRE Transformation at Home Depot (THD)

The article explains how Home Depot (THD) adopted the VALET model—a five‑dimensional SLO language covering Volume, Availability, Latency, Error, and Ticket—to unify communication, automate data collection, and improve reliability across its massive retail and e‑commerce infrastructure.

Operations · Reliability · SLO

0 likes · 9 min read

Full-Stack Internet Architecture

Dec 17, 2020 · Operations

How to Set Up Prometheus, Grafana, and Node Exporter for Monitoring

This guide explains the core concepts of logging, metrics, and tracing in micro‑service monitoring, introduces Prometheus, exporters, and Grafana, and provides step‑by‑step instructions for downloading, installing, configuring, and visualizing metrics using these tools.

Exporter · Grafana · Installation

0 likes · 8 min read

Big Data Technology & Architecture

Dec 16, 2020 · Big Data

Designing a Real‑Time Data Processing Platform with Flink: Architecture, Deployment, and Operations

This article explains how to build a real‑time data processing platform using Flink, covering the Lambda architecture, design approaches, SQL and custom‑Jar task definitions, UI drag‑and‑drop, cluster resource management on Yarn and Kubernetes, submission modes, scheduling, permission and metadata handling, logging, and monitoring with Prometheus and Grafana.

Cluster Management · Flink · Lambda architecture

0 likes · 19 min read

58 Tech

Dec 16, 2020 · Big Data

Building a High‑Performance ClickHouse Data Analytics Platform: Architecture, Operations, and Optimization

This article describes how 58.com designed and optimized a ClickHouse‑based OLAP platform for massive user‑behavior data, covering the reasons for choosing ClickHouse, its key features, multi‑layer architecture, configuration management, automation scripts, monitoring, performance benchmarks, and future improvement plans.

OLAP · clickhouse · data-warehouse

0 likes · 20 min read

Practical DevOps Architecture

Dec 16, 2020 · Operations

Monitoring CPU, Memory, and Disk Usage with Prometheus and Grafana

This guide explains how to use Prometheus' irate function and Grafana to calculate and visualize per‑host CPU, memory, and disk usage percentages, including the necessary PromQL queries and formulas.

CPU · Grafana · Operations

0 likes · 5 min read

Top Architect

Dec 15, 2020 · Backend Development

From Monolith to Service Mesh: A Comprehensive Guide to Microservice Architecture Evolution

This article walks through the transformation of a simple online supermarket from a monolithic application to a fully fledged microservice architecture, covering design principles, common pitfalls, monitoring, tracing, logging, service discovery, circuit breaking, testing strategies, and the role of service meshes.

Backend · Microservices · architecture

0 likes · 22 min read

Efficient Ops

Dec 14, 2020 · Operations

Decoding iostat: How to Interpret Linux I/O Metrics Correctly

This article explains the meaning of iostat fields, the limitations of svctm and await, how /proc/diskstats provides raw counters, and offers formulas and examples for accurately analyzing Linux disk performance.

I/O performance · Linux · diskstats

0 likes · 12 min read

Yanxuan Tech Team

Dec 14, 2020 · Operations

Mastering Stability Governance: Practical Strategies for Reliable Supply‑Chain Systems

This article examines the critical role of stability governance in evolving systems, outlines a three‑stage framework (availability, monitoring alerts, and online emergency handling) illustrated with a case study of an electronic waybill service, and shares concrete strategies for prevention, detection, response, and post‑mortem to achieve predictable, observable, and fast‑acting reliability.

Operations · governance · incident response

0 likes · 11 min read

Practical DevOps Architecture

Dec 14, 2020 · Operations

Step-by-Step Guide to Install and Configure Alertmanager with Prometheus on Kubernetes

This tutorial walks through installing Alertmanager on a Kubernetes node, configuring its SMTP settings, integrating it with Prometheus for alerting, defining alert rules, and verifying that email notifications are correctly sent when a monitored node fails.

Alerting · Alertmanager · DevOps

0 likes · 6 min read

Practical DevOps Architecture

Dec 12, 2020 · Cloud Native

Step-by-Step Installation and Configuration of Prometheus, Node Exporter, and Grafana on a Kubernetes Cluster

This guide walks through installing Prometheus on the master node, deploying node_exporter on both master and worker nodes, and setting up Grafana on a third node, including service files, systemd registration, and verification of monitoring endpoints within a Kubernetes environment.

Grafana · Installation · Kubernetes

0 likes · 7 min read

NetEase Yanxuan Technology Product Team

Dec 11, 2020 · Operations

How to Build Effective Stability Governance for E‑commerce Logistics Services

This article analyzes the concept of stability governance, outlines its five fault‑management sub‑domains (prevention, perception, reach, mitigation, and post‑mortem), examines the pain points of an electronic waybill service, and presents a comprehensive strategy backed by concrete implementation steps in three phases: availability, monitoring, and online emergency handling.

Logistics · Operations · incident response

0 likes · 12 min read

iQIYI Technical Product Team

Dec 11, 2020 · Cloud Native

iQIYI Microservice Standard Architecture: Design Principles, Components, and Practices

iQIYI’s middleware team introduced a unified microservice standard architecture—combining a single SDK, centralized infrastructure (Nacos registry, Kong gateway, Apollo config, Prometheus‑SkyWalking monitoring, ChaosBlade), the QDAS platform, and extensible open‑source practices—to eliminate redundant builds, ensure high availability, streamline governance, and pave the way for cloud‑native service‑mesh evolution.

Nacos · Service Mesh · cloud-native

0 likes · 17 min read

Practical DevOps Architecture

Dec 11, 2020 · Operations

Installing Prometheus on Linux via Binary and Docker

This guide explains how to install Prometheus on a Linux server using both the binary method and a Docker container, covering downloading, extracting, configuring the prometheus.yml file, running the service, and verifying the installation.

Docker · Installation · Linux

0 likes · 3 min read

21CTO

Dec 10, 2020 · Operations

How Netflix’s Telltale Transforms Application Monitoring and Incident Response

This article explains how Netflix built the Telltale monitoring system to consolidate data sources, provide multidimensional health assessments, deliver intelligent alerts, and streamline incident management for over 100 production applications, reducing on‑call fatigue and improving service reliability.

Netflix · incident response · monitoring

0 likes · 14 min read

Java Backend Technology

Dec 10, 2020 · Operations

Deploy and Explore the All‑in‑One Open‑Source Monitoring System Xrkmonitor

This article introduces Xrkmonitor, a Chinese open‑source monitoring platform that combines point monitoring, log collection, data visualization, and alerting, and describes its advantages, featured functions, online and offline deployment guides, and underlying technology stack.

Apache · Deployment · Linux

0 likes · 7 min read

Programmer DD

Dec 9, 2020 · Operations

Step-by-Step Guide to Installing Apache SkyWalking with Elasticsearch and InfluxDB

This tutorial walks through installing and configuring Apache SkyWalking, an open‑source APM system for micro‑services and cloud‑native environments, covering its architecture, Elasticsearch and InfluxDB storage setup, agent deployment, service startup, alarm integration, and essential documentation links.

APM · Docker · Elasticsearch

0 likes · 12 min read

Alibaba Cloud Developer

Dec 8, 2020 · Operations

From Ops Engineer to Cloud Leader: 10 Years of Growth at Alibaba

This article chronicles a senior Alibaba technologist’s decade‑long journey through operations, monitoring, resource management, and product development, sharing practical insights on system automation, team leadership, career promotion, and the mindset needed to evolve from a junior engineer to a cloud‑native solutions architect.

Career Development · Operations · automation

0 likes · 21 min read

Alibaba Cloud Native

Dec 8, 2020 · Operations

Boost Microservice Resilience with ChaosBlade and SkyWalking: A Hands‑On Guide

This article explains how to use ChaosBlade for fault injection and SkyWalking for monitoring to improve the high‑availability of distributed microservice systems, covering tool installation, experiment design, step‑by‑step execution, and real‑world case studies with detailed commands and metrics.

ChaosBlade · Distributed Systems · Fault Injection

0 likes · 15 min read

Ops Development Stories

Dec 8, 2020 · Cloud Native

Deploy a StatefulSet Prometheus & Alertmanager Cluster with Persistent Storage on Kubernetes

This guide walks through manually deploying a highly available Prometheus and Alertmanager stack on Kubernetes using StatefulSets, StorageClasses, and persistent volumes, covering environment setup, RBAC, ConfigMaps, services, node exporters, kube‑state‑metrics, and verification steps.

Alertmanager · Kubernetes · Prometheus

0 likes · 23 min read

dbaplus Community

Dec 7, 2020 · Databases

Why InfluxDB’s max‑value‑per‑tag Error Occurs and How to Resolve It

This article explains the cause of InfluxDB’s max‑value‑per‑tag error when Prometheus remote‑writes high‑cardinality tags, analyzes why the built‑in memory index triggers OOM protection, and presents three practical solutions—including moving indexes to disk, storing high‑cardinality tags as fields, and filtering them before write—to ensure stable monitoring data persistence.

Database Configuration · InfluxDB · Time Series

0 likes · 11 min read

MaGe Linux Operations

Dec 3, 2020 · Cloud Native

Essential Kubernetes Tools: Deploy, Monitor, and Develop with Ease

This article introduces a curated list of Kubernetes tools—including cluster deployment solutions, monitoring utilities, CLI helpers, and development aids—explaining how each simplifies container orchestration, enhances DevOps workflows, and empowers engineers to manage, observe, and extend their Kubernetes environments efficiently.

CLI tools · Cluster Management · DevOps

0 likes · 7 min read

Programmer DD

Dec 3, 2020 · Operations

Mastering Prometheus in Kubernetes: Practical Tips, Exporter Guide, and Common Pitfalls

This article shares practical experiences with Prometheus in Kubernetes, covering core principles, limitations, common exporters, metric selection, capacity planning, high‑availability strategies, query optimization, and integration with Grafana, offering actionable guidance for building reliable, scalable monitoring solutions.

Exporters · Grafana · Kubernetes

0 likes · 31 min read

IT Architects Alliance

Dec 2, 2020 · Operations

How to Diagnose and Optimize Business System Performance Issues

This article outlines a comprehensive process for identifying root causes of performance bottlenecks in production business systems, covering hardware, database, middleware, JVM settings, code inefficiencies, and monitoring tools, and provides practical optimization techniques for each layer.

JVM · database · diagnostics

0 likes · 16 min read

Efficient Ops

Dec 1, 2020 · Operations

Zero‑Downtime Ops: Inside Tencent’s Panshi High‑Availability Platform

At the 2020 GOPS Global Operations Conference, Tencent’s senior operations engineer Xie Hailin detailed the design and implementation of the Panshi platform—a comprehensive, high‑availability solution that unifies change management, fault handling, continuous operation, and disaster recovery to ensure uninterrupted payment services for billions of daily transactions.

Operations · aiops · change management

0 likes · 24 min read

Open Source Linux

Nov 30, 2020 · Operations

Essential Linux Shell Commands for System Monitoring and Maintenance

This guide compiles a comprehensive set of Linux shell commands for deleting zero‑byte files, inspecting processes, checking CPU, memory, disk usage, network load, and other system metrics, plus a collection of useful regular expressions for text processing and validation.

Linux · System Administration · monitoring

0 likes · 13 min read

Code Ape Tech Column

Nov 27, 2020 · Operations

From Monolith to Microservices: Real‑World Lessons and Practical Strategies

This article walks through the evolution of an online supermarket from a simple monolithic website to a fully split microservice architecture, highlighting the pitfalls of ad‑hoc growth, the need for service abstraction, monitoring, tracing, fault tolerance, testing, and the trade‑offs of frameworks versus service mesh.

Microservices · Service Mesh · architecture

0 likes · 24 min read

JD Cloud Developers

Nov 27, 2020 · Operations

How JD Cloud’s Log Service Powered the Record‑Breaking 11.11 Sale

During JD.com’s 11.11 Global Shopping Festival, the JD Cloud Log Service handled petabyte‑scale log data, delivering real‑time monitoring, cost‑effective storage, high‑availability architecture, circuit‑breaking, rate‑limiting, auto‑scaling and comprehensive dashboards to ensure stable operation of the massive traffic surge.

Log Service · cloud computing · monitoring

0 likes · 10 min read

Ops Development Stories

Nov 27, 2020 · Operations

How to Monitor Redis with Zabbix Agent2: A Complete Guide

This article explains how to use Zabbix Agent2 to monitor Redis, covering the plugin's architecture, configuration priority, methods for retrieving INFO, CONFIG, health status, and slow‑query logs, as well as practical steps to set up the Redis template in Zabbix.

Agent2 · DevOps · Operations

0 likes · 9 min read

HaoDF Tech Team

Nov 25, 2020 · Operations

Microservice Governance and Stability Platform at Haodf.com: Architecture, Monitoring, and SLO Design

The article presents a comprehensive case study of Haodf.com's transition to a micro‑service architecture, detailing the challenges of service stability and observability, the design of a unified governance platform with log‑holographic analysis, real‑time alerts, application profiling, SLO/SLA definition, and future roadmap for capacity and reliability improvements.

Microservices · SLO · logging

0 likes · 16 min read

Taobao Frontend Technology

Nov 23, 2020 · Operations

Achieving 1‑5‑10 Front‑End Monitoring with JSTracker for Double‑11

This article explains how the JSTracker platform was used to build a comprehensive end‑to‑end front‑end monitoring and data analysis solution that meets the 1‑5‑10 safety production goal—detecting issues within one minute, locating them in five, and fixing them in ten—by improving coverage, subscription, metrics, and gray‑release monitoring for Alibaba’s Double‑11 promotion.

Operations · gray release · incident response

0 likes · 15 min read

dbaplus Community

Nov 22, 2020 · Operations

Building a Closed‑Loop ‘Monitor‑Manage‑Control’ System for Bank IT Operations

This article outlines how a city‑commercial bank redesigned its monitoring architecture using a closed‑loop “monitor‑manage‑control” strategy, detailing the current challenges, the three‑tier solution, its advantages, and future directions for automated, AI‑enhanced operations.

CMDB · IT infrastructure · Operations

0 likes · 12 min read

MaGe Linux Operations

Nov 20, 2020 · Operations

How to Install and Configure Prometheus, Grafana, and Alertmanager for Full‑Stack Monitoring

This guide walks you through installing Prometheus, Grafana, Alertmanager, node_exporter, cadvisor, and blackbox_exporter on CentOS 7, configuring them with Docker or binaries, setting up Prometheus scrape jobs and alert rules, and using a script to add or remove monitored targets.

Alertmanager · Docker · Grafana

0 likes · 51 min read

DeWu Technology

Nov 19, 2020 · Operations

HBase Operations and Use Cases for High‑Concurrency E‑commerce

In this talk, Yun Jin explains how HBase’s petabyte‑scale, horizontally‑scalable architecture—built on Hadoop, HMaster, RegionServers, and Zookeeper—enables e‑commerce platforms to handle extreme promotion‑day traffic by supporting high‑throughput reads/writes, time‑series monitoring, massive order storage, and robust HA, while covering essential table operations, monitoring, and troubleshooting techniques.

Big Data · HBase · Operations

0 likes · 6 min read

Java Backend Technology

Nov 19, 2020 · Backend Development

Why Long Database Transactions Crash Services and How to Prevent Them

The article explains how long‑running database transactions can exhaust connection pools, block threads, and cause widespread service failures, then offers practical strategies—including keeping transactions short, removing RPC calls, enhancing monitoring, and reviewing code—to detect and prevent these high‑risk issues.

Backend Performance · Database Connection Pool · long transactions

0 likes · 7 min read

Practical DevOps Architecture

Nov 19, 2020 · Databases

Elasticsearch Index Basics, Custom Shard and Replica Settings, and Monitoring Commands

This article explains fundamental Elasticsearch concepts such as indexes, index types, and the immutable nature of shard counts, demonstrates how to define custom shard and replica settings when creating or updating indices, and outlines essential monitoring considerations and commands for cluster health.

Elasticsearch · Replicas · Shards

0 likes · 3 min read

JD Cloud Developers

Nov 10, 2020 · Cloud Computing

How JD Cloud Powers the 11.11 Mega Sale: Scaling, High Availability, and Monitoring Strategies

This article reveals how JD's Zhilian Cloud prepares for the massive 11.11 shopping festival by rapidly mobilizing teams, defining protection scopes, estimating resources, implementing high‑availability across regions and AZs, applying business degradation and elastic scaling, and establishing comprehensive monitoring and rehearsal practices to ensure a smooth, resilient promotion.

Operations · cloud computing · monitoring

0 likes · 13 min read

Alibaba Terminal Technology

Nov 6, 2020 · Frontend Development

Designing a Robust Front‑End Monitoring SDK: Principles, Architecture & Implementation

This article explores the design and implementation of the Yueying front‑end monitoring SDK, covering its purpose, core design principles, module architecture, reference formats, semantic versioning, key interfaces, testing strategy, and user‑experience enhancements such as quick integration and dynamic sampling.

Design · SDK · frontend

0 likes · 10 min read

High Availability Architecture

Nov 6, 2020 · Operations

My Philosophy on Alerting: Principles for Effective Monitoring and Incident Management

This article translates and expands on the author’s seven‑year experience with monitoring and alerting, presenting symptom‑based principles, practical guidelines for rule design, incident handling, and operational processes to create a robust, low‑noise alerting system.

Operations · monitoring · observability

0 likes · 16 min read

IT Architects Alliance

Nov 3, 2020 · Backend Development

How to Learn Microservices: Learning Pyramid, Path, and Six Core Components

This article presents a structured approach to mastering microservices, covering the learning pyramid concept, a detailed learning path with resource collection, and an overview of the six essential components—service description, registry, framework, monitoring, tracing, and governance—along with practical tips and visual diagrams.

Backend · Learning Path · Microservices

0 likes · 9 min read

Alibaba Cloud Developer

Nov 3, 2020 · Frontend Development

How to Build a Robust Front‑End Monitoring SDK: Design Principles & Implementation

This article explains what an SDK is, outlines key design principles such as minimalism, stability, and extensibility, and walks through the practical implementation of Alibaba’s Yueying front‑end monitoring SDK, covering architecture, module division, versioning, core interfaces, testing, and deployment options.

Design · SDK · frontend

0 likes · 12 min read

Efficient Ops

Nov 1, 2020 · Databases

Why Is Redis Slowing Down? Diagnose and Fix Common Latency Issues

This article explains the typical reasons behind Redis latency spikes—such as complex commands, big keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network overload—and provides practical steps and monitoring techniques to identify and resolve each problem.

BigKey · Latency · Slowlog

0 likes · 18 min read

Zhongtong Tech

Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase

0 likes · 16 min read

Java Backend Technology

Oct 27, 2020 · Backend Development

Master JVM Performance: Essential Tools and Real-World Usage Guide

This article explains common JVM problems such as OutOfMemoryError, memory leaks, and thread deadlocks, then introduces core monitoring tools—jps, jstack, jmap/jhat, jstat, and hprof—detailing their syntax, options, and practical examples to help Java developers diagnose and tune production applications.

Hprof · JVM · jmap

0 likes · 15 min read

Aikesheng Open Source Community

Oct 26, 2020 · Operations

Debugging Persistent Active Alerts in Thanos Ruler: Queue Bottleneck Analysis and maxBatchSize Tuning

The article analyzes a persistent active alert observed via Thanos Ruler's HTTP interface, identifies the buffering queue bottleneck as the root cause, and proposes adjusting the maxBatchSize parameter to prevent alert delay and automatic resolution failures.

Alerting · Alertmanager · BufferQueue

0 likes · 8 min read

Ops Development Stories

Oct 26, 2020 · Operations

How to Use Zabbix Agent2 to Monitor Docker Containers – Step-by-Step Guide

This guide walks through the inner workings of Zabbix Agent2’s Docker monitoring plugin, detailing how it communicates via the Docker API, the key configuration files, the query mechanism, and how to apply the provided templates to automatically discover and display container and image metrics.

Agent2 · Containers · Docker

0 likes · 5 min read

dbaplus Community

Oct 22, 2020 · Operations

Choosing the Right Open‑Source Monitoring System: Zabbix, Open‑Falcon, and Prometheus Compared

This article systematically explains monitoring fundamentals, the seven core functions of a monitoring system, proper usage practices, common monitoring objects and metrics, the basic data flow, and provides detailed comparisons of three popular open‑source solutions—Zabbix, Open‑Falcon, and Prometheus—to guide informed selection decisions.

Open-Falcon · Operations · System Design

0 likes · 20 min read

Programmer DD

Oct 22, 2020 · Operations

Mastering Prometheus: Principles, Pitfalls, and Scaling Strategies

This article explores Prometheus as a cloud‑native monitoring solution, covering core principles, limitations, metric selection, exporter consolidation, Kubernetes deployment nuances, memory and storage planning, high‑availability designs, and advanced features like rate calculations, cardinality management, and predictive alerts.

HA · Kubernetes · Metrics

0 likes · 33 min read

Java Architecture Diary

Oct 20, 2020 · Backend Development

How to Integrate Druid Monitor with Spring Cloud: A Step‑by‑Step Guide

This article explains what Druid Monitor and Druid Admin are, why cluster‑level monitoring is needed in microservice architectures, and provides a complete Spring Cloud starter implementation with configuration examples, code snippets, and usage limitations.

Druid · Microservices · Spring Boot

0 likes · 6 min read

Efficient Ops

Oct 18, 2020 · Operations

Unlocking Prometheus: How TSDB Powers Scalable Monitoring and Real-Time Analytics

This article explains how Prometheus uses a time‑series database (TSDB) to handle massive monitoring data, detailing its concepts, query examples, storage engine design, indexing mechanisms, and the benefits of pre‑computing expressions for efficient real‑time analysis.

Metrics · Prometheus · TSDB

0 likes · 7 min read

iQIYI Technical Product Team

Oct 16, 2020 · Cloud Native

Service Maturity Model and Optimization Practices for Microservices

The article presents iQIYI’s service‑maturity model for micro‑services, outlines how scores across development, deployment and operation stages reveal common deficiencies such as code style, testing, gray‑release and alert handling, and recommends concrete optimization practices—including unified coding standards, automated testing, robust rollback, circuit‑breaking, monitoring, and emergency procedures—to raise services to mature, high‑scoring levels.

Availability · monitoring · service maturity

0 likes · 15 min read

Architecture Digest

Oct 16, 2020 · Backend Development

Root Cause Analysis of High Latency in a Java HTTP Service: QPS Surge, GC Overhead, and Memory Pressure

The article details a real‑world investigation of a Java HTTP service that experienced a sudden QPS increase and response‑time spikes, tracing the issue through database queries, local method latency, CPU load, frequent ParNew GCs, and large response payloads, and presents concrete remediation steps.

Backend · JVM · gc

0 likes · 8 min read

dbaplus Community

Oct 15, 2020 · Backend Development

Essential 2020 Backend Tech Stack: 14 Categories of Tools and Frameworks

This guide surveys over a hundred modern frameworks and tools across fourteen critical backend domains—message queues, caching, sharding, data sync, communication, micro‑services, distributed utilities, monitoring, scheduling, entry proxies, storage, CI/CD, debugging, and local utilities—offering concise recommendations and practical insights for architects and engineers.

Backend · Technology Selection · architecture

0 likes · 14 min read

Meituan Technology Team

Oct 15, 2020 · Artificial Intelligence

AIOps at Meituan: Architecture and Practice of Time‑Series Anomaly Detection (Part 1)

Meituan’s AIOps initiative replaces manual rule‑based monitoring with the Horae platform, which automatically classifies time‑series metrics, applies CNN and XGBoost models to detect periodic anomalies, achieves over 90% precision in production, and paves the way for broader metric types, forecasting, and advanced fault‑localization.

Horae · Meituan · Operations

0 likes · 33 min read

Cloud Native Technology Community

Oct 12, 2020 · Operations

How to Monitor etcd in Kubernetes: Metrics, Prometheus, and Sysdig

This article explains what etcd is, outlines common failure points, and provides step‑by‑step instructions for collecting etcd metrics via curl, configuring Prometheus scraping, creating alerts, and using Sysdig Monitor to observe key health indicators in a Kubernetes environment.

etcd · monitoring · sysdig

0 likes · 13 min read

Liangxu Linux

Oct 11, 2020 · Operations

Essential Linux Commands for Database Monitoring and System Management

A concise collection of Linux command‑line snippets helps you query Oracle client IPs, kill specific processes, count connections, summarize traffic, find large files, measure copy time, and monitor CPU and memory usage, all useful for DB and system administrators.

Sysadmin · commands · database

0 likes · 6 min read

ITPUB

Oct 9, 2020 · Operations

How to Streamline Call Center Incident Management: Practical Steps and Best Practices

This guide walks through a real‑world call‑center slowdown incident, outlines common fault‑handling techniques, proposes monitoring enhancements, details a comprehensive emergency‑response plan, and introduces intelligent event‑processing concepts to help operations teams resolve outages faster and more reliably.

Operations · automation · call center

0 likes · 15 min read

Youzan Coder

Oct 9, 2020 · Backend Development

Performance Optimization: Concepts, Metrics, and a Real‑World Case Study from Youzan Live Streaming

Performance optimization is a continuous, data‑driven practice that monitors response time and concurrency, applies techniques such as indexing, caching, parallelism, and asynchronous processing, and in Youzan’s live‑streaming product‑detail case reduced bottlenecks by adding multi‑level caches, circuit‑breaker fallbacks, and parallel sub‑task aggregation.

Load Testing · caching · monitoring

0 likes · 16 min read

Liangxu Linux

Oct 7, 2020 · Operations

Turn Shell Commands into Real‑Time Visual Dashboards with Sampler

Sampler is a lightweight tool that runs shell commands, visualizes their output, and can trigger alerts; configured via simple YAML, it works on macOS, Linux and Windows, supports various components such as runcharts, sparklines, gauges, and interactive shells for monitoring databases, queues and system metrics.

DevOps · Shell · YAML

0 likes · 15 min read

MaGe Linux Operations

Oct 6, 2020 · Cloud Native

Essential Prometheus Operator Metrics for Kubernetes: Prevent Alert Overload

This guide explains the most common Prometheus Operator metrics for Kubernetes, detailing each metric's purpose, the PromQL expression to monitor it, and the related underlying metrics, helping you fine‑tune alerts and avoid unnecessary noise in your cluster monitoring.

Cloud Native · Kubernetes · PromQL

0 likes · 24 min read

Top Architect

Oct 2, 2020 · Databases

Redis Performance Degradation: Common Latency Issues, Diagnosis, and Optimization

This article explains why Redis can become slow, covering typical latency causes such as high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation, and provides practical troubleshooting steps and best‑practice recommendations.

Latency · best-practices · monitoring

0 likes · 24 min read

Aikesheng Open Source Community

Sep 29, 2020 · Databases

Understanding Prometheus Local Storage (TSDB) and Its Architecture

This article explains Prometheus's built‑in time‑series database (TSDB), covering its concepts, storage configuration, block structure, write‑ahead log, mmap reads, inverted indexing, data compression, and remote storage integration for scalable monitoring.

Prometheus · RemoteStorage · TSDB

0 likes · 8 min read

Aikesheng Open Source Community

Sep 28, 2020 · Backend Development

DTLE 3.20.09.0 Release Notes – New Monitoring Features, Docker Support, and Bug Fixes

Version 3.20.09.0 of the open‑source DTLE data‑transfer component for MySQL has been released, introducing replication‑delay and memory‑usage monitoring with Prometheus, providing configuration examples and Docker commands, and fixing incremental serialization, CPU usage, and uppercase‑where clause handling.

DTLE · Data Transfer · Docker

0 likes · 5 min read

Xianyu Technology

Sep 27, 2020 · Backend Development

Design of an Asynchronous Component with Monitoring, Fault Tolerance, and Zero‑Cost Integration

The article presents a design for an asynchronous component that is monitorable, fault‑tolerant, and integrates with zero overhead, compares Akka, RxJava, and a custom JUC‑based implementation, and selects the latter—using extended Callables and a CountDownLatch—to track business units, handle timeouts, and provide fallback behavior.

Asynchronous · JUC · concurrency

0 likes · 8 min read

Efficient Ops

Sep 20, 2020 · Operations

How to Build Docker Container Monitoring with CAdvisor, InfluxDB & Grafana

This article explains how to design and implement a Docker container monitoring system using CAdvisor for metric collection, InfluxDB for time‑series storage, and Grafana for visualization, covering deployment, integration, common issues, and practical configuration details.

Container · Docker · Grafana

0 likes · 15 min read

dbaplus Community

Sep 20, 2020 · Operations

Zabbix vs Prometheus: Choosing the Right Monitoring Tool for Large‑Scale Environments

A comprehensive Q&A with SRE experts explores how Zabbix and Prometheus compare across scalability, storage, alert handling, intelligent monitoring, dashboard design, automation, migration strategies, and performance‑cost trade‑offs for modern infrastructure.

Alerting · Scalability · Zabbix

0 likes · 33 min read

iQIYI Technical Product Team

Sep 18, 2020 · Operations

Full-Chain Load Testing Practices for iQIYI Payment System

iQIYI’s payment team built a full‑chain load‑testing framework that isolates data, mocks dependencies, constructs realistic multi‑service traffic, and executes protected tests to expose bottlenecks, guide scaling and optimizations, and ultimately ensure reliable payment services during traffic spikes, while planning a unified automation platform.

Load Testing · capacity planning · full-chain testing

0 likes · 13 min read

Top Architect

Sep 18, 2020 · Backend Development

Microservice Architecture Evolution: From Monolith to Service Mesh

This article walks through the evolution of an online supermarket from a simple monolithic web application to a fully decomposed microservice architecture, highlighting the challenges of scaling, the need for monitoring, tracing, service discovery, fault tolerance, and the eventual adoption of a service mesh.

Backend · Microservices · Service Mesh

0 likes · 23 min read

Zhuanzhuan Tech

Sep 18, 2020 · Operations

Testing Environment Characteristics, Common Issues, and Troubleshooting Practices

The article outlines the complex nature of testing environments, enumerates typical problems such as resource constraints, external dependencies, and service bugs, and presents systematic troubleshooting methods, useful tools, and real‑world case studies to improve reliability and efficiency.

Environment · case study · monitoring

0 likes · 11 min read

Zhuanzhuan QA

Sep 18, 2020 · Operations

Testing Environment Troubleshooting: Characteristics, Common Issues, and Practical Solutions

This article examines the complexities of testing environments, outlines typical causes of failures such as resource constraints, external dependencies, and service bugs, and provides systematic troubleshooting methods, useful tools, and real‑world case studies to improve reliability and efficiency.

Environment · Operations · debugging

0 likes · 11 min read

Big Data Technology & Architecture

Sep 17, 2020 · Big Data

Monitoring Kafka Consumer Groups with kafka-consumer-groups and Kafka Manager

This article explains how to monitor Kafka consumer groups using the built‑in kafka‑consumer‑groups tool and the Kafka Manager UI, providing commands, field explanations, and setup steps to ensure real‑time data availability for downstream services such as MongoDB or Elasticsearch.

Big Data · Kafka · Kafka Manager

0 likes · 4 min read

vivo Internet Technology

Sep 16, 2020 · Fundamentals

Shared Memory Principles and a Practical VCS Data Collection Implementation

The article explains Linux shared‑memory fundamentals, why it outperforms file‑based IPC, demonstrates the mmap() system call, and walks through a complete Go implementation that creates, synchronizes, reads, and protobuf‑serializes advertising‑tracking metrics in the VCS monitoring platform.

Go · IPC · mmap

0 likes · 19 min read

JD Cloud Developers

Sep 15, 2020 · Databases

How JD’s HoraeDB Tackles Massive Time‑Series Data at Scale

This article introduces JD Cloud’s self‑built time‑series database HoraeDB, explaining its core concepts, typical use cases, architectural layers, high‑performance features, down‑sampling strategies, compression techniques, and stability measures for handling massive, 24‑hour monitoring data at scale.

Downsampling · Time Series Database · compression

0 likes · 18 min read

Big Data Technology & Architecture

Sep 13, 2020 · Big Data

ClickHouse Deployment, Management, and Monitoring Practices in Production

This article explains ClickHouse's strengths as a high‑performance MPP database, details hardware selection, read/write separation, shard expansion steps, batch‑size tuning, and presents a three‑layer monitoring model, while also describing its practical application in Tencent's game analytics platform.

Big Data · Deployment · Game Analytics

0 likes · 19 min read

DataFunTalk

Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps—including stream union, state‑timer processing, and output formatting—covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation

0 likes · 11 min read

Java Backend Technology

Sep 12, 2020 · Databases

Why Redis Gets Slow: Common Latency Causes and How to Diagnose Them

This article explains the typical reasons Redis latency spikes—such as high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation—and provides practical steps to monitor, identify, and mitigate each issue.

Slowlog · memory · monitoring

0 likes · 18 min read

ITPUB

Sep 11, 2020 · Blockchain

How Red Pulse Secured Its Blockchain Platform: Real‑World Attack Lessons

This article details Red Pulse's journey of integrating the NEO blockchain, the security vulnerabilities it faced—from token theft and credential‑stuffing attacks to sophisticated social‑engineering exploits—and the comprehensive technical measures, monitoring tools, and mitigation strategies it implemented to protect its platform and users.

Attack Mitigation · Blockchain · NEO

0 likes · 21 min read

Aikesheng Open Source Community

Sep 10, 2020 · Databases

Setting Up ClickHouse Monitoring with clickhouse-exporter, Prometheus, and Grafana

This guide walks through deploying clickhouse-exporter, configuring Prometheus to scrape its metrics, and importing a Grafana dashboard to monitor ClickHouse single‑node or cluster performance, providing a practical monitoring solution for the database.

Exporter · Go · Grafana

0 likes · 4 min read

Aikesheng Open Source Community

Sep 9, 2020 · Databases

How to Monitor MySQL Compressed Tables and Their Suitable Use Cases

This article explains the scenarios where MySQL compressed tables are appropriate, describes how to monitor their health using InnoDB CMP tables in information_schema, and provides practical examples of creation, performance comparison, and update/delete operations to illustrate best‑practice usage.

Compressed Table · InnoDB · database

0 likes · 10 min read

Ops Development Stories

Sep 9, 2020 · Operations

How to Deploy MQTT with Mosquitto and Monitor It Using Zabbix Agent2

This guide explains the MQTT protocol, shows how to install and run a Mosquitto broker on CentOS, and demonstrates how to collect MQTT messages with a custom Zabbix Agent2 plugin for real‑time monitoring in Zabbix.

Agent2 · IoT · MQTT

0 likes · 7 min read

HaoDF Tech Team

Sep 7, 2020 · Operations

Analyzing Latency and Slow Interface Detection in a Full‑Chain Monitoring System

This article explains how latency is used as a key indicator for application risk identification, defines slow interfaces, describes why percentile‑based thresholds are preferred over averages, and outlines the architecture, task workflow, and practical optimization strategies for a full‑chain monitoring system in a microservice environment.

Latency · Microservices · SRE

0 likes · 14 min read

New Oriental Technology

Sep 7, 2020 · Operations

Performance Optimization and Stability Enhancement of the Continuation Enrollment System

This article details the background, performance and stability requirements, strategic approach, and concrete initiatives (including full‑chain load testing, chaos engineering, monitoring, and targeted optimization projects) undertaken to boost performance by more than 300% and improve the availability of the continuation enrollment platform.

Load Testing · backend optimization · chaos testing

0 likes · 7 min read

dbaplus Community

Sep 6, 2020 · Operations

Building a High‑Performance Monitoring Alert System with Akka, Dubbo, and Ignite

The article outlines G Bank’s transition from a single‑threaded commercial monitoring solution to a self‑developed, open‑source based alert system that leverages Akka for parallel collection, Apache Dubbo for distributed processing, and Apache Ignite for in‑memory storage, achieving million‑level alert capacity, sub‑100 ms latency, and linear scalability.

Akka · Apache Dubbo · Apache Ignite

0 likes · 17 min read

MaGe Linux Operations

Sep 4, 2020 · Operations

Master Prometheus: From Basics to Full-Scale Monitoring Deployment

This guide walks through Prometheus fundamentals, architecture, components, service discovery, Docker-based deployment, exporter integration, Alertmanager configuration, Grafana visualization, PromQL queries, and Consul service discovery, providing a complete end‑to‑end monitoring solution for cloud‑native environments.

Alertmanager · Consul · Docker

0 likes · 32 min read

Suning Technology

Sep 4, 2020 · Big Data

How ClickHouse Powers Real-Time OLAP Monitoring at Suning Big Data Platform

This article explains how Suning's big‑data center leverages ClickHouse’s columnar OLAP engine and a full‑chain monitoring platform to achieve real‑time query tracing, slow‑query analysis, cluster health checks, and resource‑level alerts across diverse business scenarios.

Cluster · OLAP · clickhouse

0 likes · 14 min read

Alibaba Cloud Native

Sep 1, 2020 · Cloud Native

CTrip’s CDubbo Journey: Scaling 10k Services with Registration, Monitoring, and Service Mesh

From early .NET ESB attempts to a Java‑based CDubbo framework, CTrip details its migration to Dubbo, covering registration, health checks, CAT monitoring, dynamic configuration, SOA compatibility, testing tools, thread‑less execution, performance gains, extensibility, ecosystem integration, and future service‑mesh standardization.

Microservices · Registration · cloud-native

0 likes · 15 min read

Liangxu Linux

Aug 29, 2020 · Operations

Enforcing Clear Git Commit Messages with a Webhook‑Based Monitoring Service

This article explains why consistent Git commit messages matter, presents a detailed commit‑message format with type, scope and subject, shows how to enforce the standard using a webhook that validates messages, monitors large commits, and provides useful statistics for the development team.

code-quality · commit message · monitoring

0 likes · 11 min read

Amap Tech

Aug 28, 2020 · Fundamentals

Git Commit Message Standardization and Monitoring Service

The team introduced an Angular‑style Git commit‑message standard—type(scope): subject in Chinese—and built a webhook‑based monitoring service that validates pushes, alerts violations, tracks diff size and deletions, stores metrics, and visualizes compliance, improving traceability, readability, and automated changelog generation.

DevOps · Git · best-practices

0 likes · 10 min read

Java Architecture Diary

Aug 27, 2020 · Operations

Visualizing Redis in Grafana: Quick Start with the Redis Data Source Plugin

Grafana’s new Redis Data Source plugin lets DevOps engineers and DBAs seamlessly connect to Redis instances—whether open‑source, Enterprise, or Cloud—visualize time‑series and core data types, run management commands, and build interactive dashboards using Grafana’s transformations and built‑in panels.

Dashboard · Data Source · DevOps

0 likes · 7 min read
