Tagged articles
2179 articles
macrozheng
Jan 6, 2021 · Backend Development

Essential Spring Boot Practices for Building Robust Microservices

This article outlines the golden rules for constructing Spring Boot microservices, covering monitoring with Spring Boot Admin and Grafana, exposing metrics via Actuator, centralized logging with ELK, clear API documentation using Swagger, YApi or smart‑doc, transparent build info, and keeping dependencies up‑to‑date.

API documentation · Microservices · Spring Boot
0 likes · 8 min read
Liangxu Linux
Jan 5, 2021 · Operations

How to Install and Use nmon Monitoring Tool on CentOS 7

This guide shows how to download, extract, and run the lightweight nmon performance monitoring tool on CentOS 7, including the exact commands to fetch the package, choose the correct binary, start the utility, and view CPU and memory statistics using interactive keys.

Linux · centos7 · monitoring
0 likes · 3 min read
Ops Development Stories
Jan 4, 2021 · Cloud Native

Integrate SkyWalking Monitoring into Nginx Ingress on Kubernetes

This guide walks through installing SkyWalking‑nginx‑lua, renaming conflicting scripts, modifying the nginx‑ingress controller’s template to inject SkyWalking environment variables and tracing buffer, building a custom Docker image, and deploying it with the required environment variables so that request traces appear in the SkyWalking UI.

Docker · Ingress · Kubernetes
0 likes · 7 min read
Architect
Jan 2, 2021 · Operations

Layered Architecture of Microservice Monitoring and Key Practices

This article explains the layered architecture of microservice monitoring, detailing five monitoring levels—from infrastructure to end-user experience—along with essential monitoring points such as logs, metrics, tracing, alerts, and health checks, and presents a typical monitoring stack using agents, Kafka, ELK, and InfluxDB.

Metrics · Operations · logging
0 likes · 6 min read
MaGe Linux Operations
Jan 1, 2021 · Operations

How to Deploy Nightingale: A Step‑by‑Step Docker Guide for High‑Availability Monitoring

This article provides a comprehensive, step‑by‑step tutorial for installing the open‑source Nightingale monitoring platform using Docker, covering code retrieval, Docker‑compose setup, node configuration, service startup, Grafana integration, and essential UI features, enabling a high‑availability, hybrid‑cloud monitoring solution.

Docker · Grafana · Kubernetes
0 likes · 7 min read
Youzan Coder
Dec 30, 2020 · Operations

ERROR Log Governance and Monitoring Alerting Practice at Youzan

Youzan’s log‑governance guide uses a car‑dashboard analogy to show why precise ERROR logs and sensible alerts matter, defines INFO/WARN/ERROR levels, sets daily reduction targets, leverages top‑error analysis and water‑level monitoring, and ultimately cut daily ERROR entries from thousands to about one hundred while catching issues before incidents.

Alerting · Error Handling · Log Management
0 likes · 9 min read
Architecture Digest
Dec 30, 2020 · Databases

Redis Latency Analysis and Mitigation Strategies

This article examines common causes of increased latency in Redis—including high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation—and provides practical monitoring and configuration techniques to diagnose and reduce delays.

Latency · monitoring · optimization
0 likes · 17 min read
Programmer DD
Dec 27, 2020 · Databases

Build a Powerful MySQL Monitoring Platform with Prometheus and Grafana

This guide walks through building a comprehensive MySQL monitoring platform using Prometheus and Grafana. It covers exporter installation and configuration, explains key performance metrics such as replication health, query throughput, slow queries, connection limits, and buffer pool usage, and provides ready‑made Grafana dashboards and alerting rules.

Exporter · Grafana · Metrics
0 likes · 17 min read
Youzan Coder
Dec 25, 2020 · Big Data

Metadata Governance and Collection in a Data Asset Platform

The platform implements comprehensive metadata governance by extracting, standardizing, and ingesting basic, trend, resource, lineage, and task metadata from offline and real‑time systems via a Kafka‑based SDK, enabling unified storage, monitoring, alerts, and future automation to improve data asset visibility and quality.

Big Data · Data Governance · SDK
0 likes · 18 min read
Architecture Digest
Dec 24, 2020 · Backend Development

WeChat Architecture: Strategies, Agile Practices, and Large‑Scale System Design

The article details WeChat’s three‑in‑one strategy of precise product, agile projects, and robust technical support, explaining how the team achieves massive scalability, high availability, extensible protocols, resilient disaster recovery, and embedded monitoring through practices like small‑system‑big‑scale, gray‑release, and foundational components.

Backend · Operations · WeChat
0 likes · 17 min read
JD Tech Talk
Dec 18, 2020 · Artificial Intelligence

Model Online Inference System: Architecture, Components, and Deployment Strategies

This article examines the challenges of moving machine‑learning models from offline training to online serving and proposes a modular architecture (model gateway, data source gateway, business service center, monitoring, and RPC components) that enables rapid model deployment, version management, traffic mirroring, gray‑release, and real‑time monitoring.

Model Serving · machine learning · monitoring
0 likes · 10 min read
Continuous Delivery 2.0
Dec 18, 2020 · Operations

Applying the VALET Model for SRE Transformation at Home Depot (THD)

The article explains how Home Depot (THD) adopted the VALET model—a five‑dimensional SLO language covering Volume, Availability, Latency, Error, and Ticket—to unify communication, automate data collection, and improve reliability across its massive retail and e‑commerce infrastructure.

Operations · Reliability · SLO
0 likes · 9 min read
Big Data Technology & Architecture
Dec 16, 2020 · Big Data

Designing a Real‑Time Data Processing Platform with Flink: Architecture, Deployment, and Operations

This article explains how to build a real‑time data processing platform using Flink, covering the Lambda architecture, design approaches, SQL and custom‑Jar task definitions, UI drag‑and‑drop, cluster resource management on Yarn and Kubernetes, submission modes, scheduling, permission and metadata handling, logging, and monitoring with Prometheus and Grafana.

Cluster Management · Flink · Lambda architecture
0 likes · 19 min read
58 Tech
Dec 16, 2020 · Big Data

Building a High‑Performance ClickHouse Data Analytics Platform: Architecture, Operations, and Optimization

This article describes how 58.com designed and optimized a ClickHouse‑based OLAP platform for massive user‑behavior data, covering the reasons for choosing ClickHouse, its key features, multi‑layer architecture, configuration management, automation scripts, monitoring, performance benchmarks, and future improvement plans.

OLAP · clickhouse · data-warehouse
0 likes · 20 min read
Top Architect
Dec 15, 2020 · Backend Development

From Monolith to Service Mesh: A Comprehensive Guide to Microservice Architecture Evolution

This article walks through the transformation of a simple online supermarket from a monolithic application to a fully fledged microservice architecture, covering design principles, common pitfalls, monitoring, tracing, logging, service discovery, circuit breaking, testing strategies, and the role of service meshes.

Backend · Microservices · architecture
0 likes · 22 min read
Yanxuan Tech Team
Dec 14, 2020 · Operations

Mastering Stability Governance: Practical Strategies for Reliable Supply‑Chain Systems

This article examines the critical role of stability governance in evolving systems, outlines a three‑stage framework—usability, monitoring alerts, and online emergency—illustrated with a case study of an electronic waybill service, and shares concrete strategies for prevention, detection, response, and post‑mortem to achieve predictable, observable, and fast‑acting reliability.

Operations · governance · incident response
0 likes · 11 min read
NetEase Yanxuan Technology Product Team
Dec 11, 2020 · Operations

How to Build Effective Stability Governance for E‑commerce Logistics Services

This article analyzes the concept of stability governance, outlines its five fault‑management sub‑domains (prevention, perception, reach, mitigation, and post‑mortem), examines the pain points of an electronic waybill service, and presents a three‑phase strategy backed by concrete implementation steps in availability, monitoring, and online emergency handling.

Logistics · Operations · incident response
0 likes · 12 min read
iQIYI Technical Product Team
Dec 11, 2020 · Cloud Native

iQIYI Microservice Standard Architecture: Design Principles, Components, and Practices

iQIYI’s middleware team introduced a unified microservice standard architecture—combining a single SDK, centralized infrastructure (Nacos registry, Kong gateway, Apollo config, Prometheus‑SkyWalking monitoring, ChaosBlade), the QDAS platform, and extensible open‑source practices—to eliminate redundant builds, ensure high availability, streamline governance, and pave the way for cloud‑native service‑mesh evolution.

Nacos · Service Mesh · cloud-native
0 likes · 17 min read
21CTO
Dec 10, 2020 · Operations

How Netflix’s Telltale Transforms Application Monitoring and Incident Response

This article explains how Netflix built the Telltale monitoring system to consolidate data sources, provide multidimensional health assessments, deliver intelligent alerts, and streamline incident management for over 100 production applications, reducing on‑call fatigue and improving service reliability.

Netflix · incident response · monitoring
0 likes · 14 min read
Programmer DD
Dec 9, 2020 · Operations

Step-by-Step Guide to Installing Apache SkyWalking with Elasticsearch and InfluxDB

This tutorial walks through installing and configuring Apache SkyWalking, an open‑source APM system for micro‑services and cloud‑native environments, covering its architecture, Elasticsearch and InfluxDB storage setup, agent deployment, service startup, alarm integration, and essential documentation links.

APM · Docker · Elasticsearch
0 likes · 12 min read
Alibaba Cloud Developer
Dec 8, 2020 · Operations

From Ops Engineer to Cloud Leader: 10 Years of Growth at Alibaba

This article chronicles a senior Alibaba technologist’s decade‑long journey through operations, monitoring, resource management, and product development, sharing practical insights on system automation, team leadership, career promotion, and the mindset needed to evolve from a junior engineer to a cloud‑native solutions architect.

Career Development · Operations · automation
0 likes · 21 min read
dbaplus Community
Dec 7, 2020 · Databases

Why InfluxDB’s max‑values‑per‑tag Error Occurs and How to Resolve It

This article explains the cause of InfluxDB’s max‑values‑per‑tag error when Prometheus remote‑writes high‑cardinality tags, analyzes why the built‑in in‑memory index triggers OOM protection, and presents three practical solutions (moving indexes to disk, storing high‑cardinality tags as fields, and filtering them before write) to ensure stable monitoring data persistence.

Database Configuration · InfluxDB · Time Series
0 likes · 11 min read
MaGe Linux Operations
Dec 3, 2020 · Cloud Native

Essential Kubernetes Tools: Deploy, Monitor, and Develop with Ease

This article introduces a curated list of Kubernetes tools—including cluster deployment solutions, monitoring utilities, CLI helpers, and development aids—explaining how each simplifies container orchestration, enhances DevOps workflows, and empowers engineers to manage, observe, and extend their Kubernetes environments efficiently.

CLI tools · Cluster Management · DevOps
0 likes · 7 min read
Programmer DD
Dec 3, 2020 · Operations

Mastering Prometheus in Kubernetes: Practical Tips, Exporter Guide, and Common Pitfalls

This article shares practical experiences with Prometheus in Kubernetes, covering core principles, limitations, common exporters, metric selection, capacity planning, high‑availability strategies, query optimization, and integration with Grafana, offering actionable guidance for building reliable, scalable monitoring solutions.

Exporters · Grafana · Kubernetes
0 likes · 31 min read
IT Architects Alliance
Dec 2, 2020 · Operations

How to Diagnose and Optimize Business System Performance Issues

This article outlines a comprehensive process for identifying root causes of performance bottlenecks in production business systems, covering hardware, database, middleware, JVM settings, code inefficiencies, and monitoring tools, and provides practical optimization techniques for each layer.

JVM · database · diagnostics
0 likes · 16 min read
Efficient Ops
Dec 1, 2020 · Operations

Zero‑Downtime Ops: Inside Tencent’s Panshi High‑Availability Platform

At the 2020 GOPS Global Operations Conference, Tencent’s senior operations engineer Xie Hailin detailed the design and implementation of the Panshi platform—a comprehensive, high‑availability solution that unifies change management, fault handling, continuous operation, and disaster recovery to ensure uninterrupted payment services for billions of daily transactions.

Operations · aiops · change management
0 likes · 24 min read
Open Source Linux
Nov 30, 2020 · Operations

Essential Linux Shell Commands for System Monitoring and Maintenance

This guide compiles a comprehensive set of Linux shell commands for deleting zero‑byte files, inspecting processes, checking CPU, memory, disk usage, network load, and other system metrics, plus a collection of useful regular expressions for text processing and validation.

Linux · System Administration · monitoring
0 likes · 13 min read
Code Ape Tech Column
Nov 27, 2020 · Operations

From Monolith to Microservices: Real‑World Lessons and Practical Strategies

This article walks through the evolution of an online supermarket from a simple monolithic website to a fully split microservice architecture, highlighting the pitfalls of ad‑hoc growth, the need for service abstraction, monitoring, tracing, fault tolerance, testing, and the trade‑offs of frameworks versus service mesh.

Microservices · Service Mesh · architecture
0 likes · 24 min read
JD Cloud Developers
Nov 27, 2020 · Operations

How JD Cloud’s Log Service Powered the Record‑Breaking 11.11 Sale

During JD.com’s 11.11 Global Shopping Festival, the JD Cloud Log Service handled petabyte‑scale log data, delivering real‑time monitoring, cost‑effective storage, high‑availability architecture, circuit‑breaking, rate‑limiting, auto‑scaling and comprehensive dashboards to ensure stable operation of the massive traffic surge.

Log Service · cloud computing · monitoring
0 likes · 10 min read
Ops Development Stories
Nov 27, 2020 · Operations

How to Monitor Redis with Zabbix Agent2: A Complete Guide

This article explains how to use Zabbix Agent2 to monitor Redis, covering the plugin's architecture, configuration priority, methods for retrieving INFO, CONFIG, health status, and slow‑query logs, as well as practical steps to set up the Redis template in Zabbix.

Agent2 · DevOps · Operations
0 likes · 9 min read
HaoDF Tech Team
Nov 25, 2020 · Operations

Microservice Governance and Stability Platform at Haodf.com: Architecture, Monitoring, and SLO Design

The article presents a comprehensive case study of Haodf.com's transition to a micro‑service architecture, detailing the challenges of service stability and observability, the design of a unified governance platform with log‑holographic analysis, real‑time alerts, application profiling, SLO/SLA definition, and future roadmap for capacity and reliability improvements.

Microservices · SLO · logging
0 likes · 16 min read
Taobao Frontend Technology
Nov 23, 2020 · Operations

Achieving 1‑5‑10 Front‑End Monitoring with JSTracker for Double‑11

This article explains how the JSTracker platform was used to build a comprehensive end‑to‑end front‑end monitoring and data analysis solution that meets the 1‑5‑10 safety production goal—detecting issues within one minute, locating them in five, and fixing them in ten—by improving coverage, subscription, metrics, and gray‑release monitoring for Alibaba’s Double‑11 promotion.

Operations · gray release · incident response
0 likes · 15 min read
DeWu Technology
Nov 19, 2020 · Operations

HBase Operations and Use Cases for High‑Concurrency E‑commerce

In this talk, Yun Jin explains how HBase’s petabyte‑scale, horizontally‑scalable architecture—built on Hadoop, HMaster, RegionServers, and Zookeeper—enables e‑commerce platforms to handle extreme promotion‑day traffic by supporting high‑throughput reads/writes, time‑series monitoring, massive order storage, and robust HA, while covering essential table operations, monitoring, and troubleshooting techniques.

Big Data · HBase · Operations
0 likes · 6 min read
Java Backend Technology
Nov 19, 2020 · Backend Development

Why Long Database Transactions Crash Services and How to Prevent Them

The article explains how long‑running database transactions can exhaust connection pools, block threads, and cause widespread service failures, then offers practical strategies—including keeping transactions short, removing RPC calls, enhancing monitoring, and reviewing code—to detect and prevent these high‑risk issues.

Backend Performance · Database Connection Pool · long transactions
0 likes · 7 min read
JD Cloud Developers
Nov 10, 2020 · Cloud Computing

How JD Cloud Powers the 11.11 Mega Sale: Scaling, High Availability, and Monitoring Strategies

This article reveals how JD's Zhilian Cloud prepares for the massive 11.11 shopping festival by rapidly mobilizing teams, defining protection scopes, estimating resources, implementing high‑availability across regions and AZs, applying business degradation and elastic scaling, and establishing comprehensive monitoring and rehearsal practices to ensure a smooth, resilient promotion.

Operations · cloud computing · monitoring
0 likes · 13 min read
Alibaba Terminal Technology
Nov 6, 2020 · Frontend Development

Designing a Robust Front‑End Monitoring SDK: Principles, Architecture & Implementation

This article explores the design and implementation of the Yueying front‑end monitoring SDK, covering its purpose, core design principles, module architecture, reference formats, semantic versioning, key interfaces, testing strategy, and user‑experience enhancements such as quick integration and dynamic sampling.

Design · SDK · frontend
0 likes · 10 min read
IT Architects Alliance
Nov 3, 2020 · Backend Development

How to Learn Microservices: Learning Pyramid, Path, and Six Core Components

This article presents a structured approach to mastering microservices, covering the learning pyramid concept, a detailed learning path with resource collection, and an overview of the six essential components—service description, registry, framework, monitoring, tracing, and governance—along with practical tips and visual diagrams.

Backend · Learning Path · Microservices
0 likes · 9 min read
Efficient Ops
Nov 1, 2020 · Databases

Why Is Redis Slowing Down? Diagnose and Fix Common Latency Issues

This article explains the typical reasons behind Redis latency spikes—such as complex commands, big keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network overload—and provides practical steps and monitoring techniques to identify and resolve each problem.

BigKey · Latency · Slowlog
0 likes · 18 min read
Zhongtong Tech
Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase
0 likes · 16 min read
Java Backend Technology
Oct 27, 2020 · Backend Development

Master JVM Performance: Essential Tools and Real-World Usage Guide

This article explains common JVM problems such as OutOfMemoryError, memory leaks, and thread deadlocks, then introduces core monitoring tools—jps, jstack, jmap/jhat, jstat, and hprof—detailing their syntax, options, and practical examples to help Java developers diagnose and tune production applications.

Hprof · JVM · jmap
0 likes · 15 min read
dbaplus Community
Oct 22, 2020 · Operations

Choosing the Right Open‑Source Monitoring System: Zabbix, Open‑Falcon, and Prometheus Compared

This article systematically explains monitoring fundamentals, the seven core functions of a monitoring system, proper usage practices, common monitoring objects and metrics, the basic data flow, and provides detailed comparisons of three popular open‑source solutions—Zabbix, Open‑Falcon, and Prometheus—to guide informed selection decisions.

Open-Falcon · Operations · System Design
0 likes · 20 min read
Programmer DD
Oct 22, 2020 · Operations

Mastering Prometheus: Principles, Pitfalls, and Scaling Strategies

This article explores Prometheus as a cloud‑native monitoring solution, covering core principles, limitations, metric selection, exporter consolidation, Kubernetes deployment nuances, memory and storage planning, high‑availability designs, and advanced features like rate calculations, cardinality management, and predictive alerts.

HA · Kubernetes · Metrics
0 likes · 33 min read
iQIYI Technical Product Team
Oct 16, 2020 · Cloud Native

Service Maturity Model and Optimization Practices for Microservices

The article presents iQIYI’s service‑maturity model for micro‑services, outlines how scores across development, deployment and operation stages reveal common deficiencies such as code style, testing, gray‑release and alert handling, and recommends concrete optimization practices—including unified coding standards, automated testing, robust rollback, circuit‑breaking, monitoring, and emergency procedures—to raise services to mature, high‑scoring levels.

Availability · monitoring · service maturity
0 likes · 15 min read
dbaplus Community
Oct 15, 2020 · Backend Development

Essential 2020 Backend Tech Stack: 14 Categories of Tools and Frameworks

This guide surveys over a hundred modern frameworks and tools across fourteen critical backend domains—message queues, caching, sharding, data sync, communication, micro‑services, distributed utilities, monitoring, scheduling, entry proxies, storage, CI/CD, debugging, and local utilities—offering concise recommendations and practical insights for architects and engineers.

Backend · Technology Selection · architecture
0 likes · 14 min read
Meituan Technology Team
Oct 15, 2020 · Artificial Intelligence

AIOps at Meituan: Architecture and Practice of Time‑Series Anomaly Detection (Part 1)

Meituan’s AIOps initiative replaces manual rule‑based monitoring with the Horae platform, which automatically classifies time‑series metrics, applies CNN and XGBoost models to detect periodic anomalies, achieves over 90% precision in production, and paves the way for broader metric types, forecasting, and advanced fault‑localization.

Horae · Meituan · Operations
0 likes · 33 min read
Liangxu Linux
Oct 11, 2020 · Operations

Essential Linux Commands for Database Monitoring and System Management

A concise collection of Linux command‑line snippets helps you query Oracle client IPs, kill specific processes, count connections, summarize traffic, find large files, measure copy time, and monitor CPU and memory usage, all useful for DB and system administrators.

Sysadmin · commands · database
0 likes · 6 min read
ITPUB
Oct 9, 2020 · Operations

How to Streamline Call Center Incident Management: Practical Steps and Best Practices

This guide walks through a real‑world call‑center slowdown incident, outlines common fault‑handling techniques, proposes monitoring enhancements, details a comprehensive emergency‑response plan, and introduces intelligent event‑processing concepts to help operations teams resolve outages faster and more reliably.

Operations · automation · call center
0 likes · 15 min read
Youzan Coder
Oct 9, 2020 · Backend Development

Performance Optimization: Concepts, Metrics, and a Real‑World Case Study from Youzan Live Streaming

Performance optimization is a continuous, data‑driven practice that monitors response time and concurrency, applies techniques such as indexing, caching, parallelism, and asynchronous processing, and in Youzan’s live‑streaming product‑detail case reduced bottlenecks by adding multi‑level caches, circuit‑breaker fallbacks, and parallel sub‑task aggregation.

Load Testing · caching · monitoring
0 likes · 16 min read
Liangxu Linux
Oct 7, 2020 · Operations

Turn Shell Commands into Real‑Time Visual Dashboards with Sampler

Sampler is a lightweight tool that runs shell commands, visualizes their output, and can trigger alerts; configured via simple YAML, it works on macOS, Linux and Windows, supports various components such as runcharts, sparklines, gauges, and interactive shells for monitoring databases, queues and system metrics.

DevOps · Shell · YAML
0 likes · 15 min read
Top Architect
Oct 2, 2020 · Databases

Redis Performance Degradation: Common Latency Issues, Diagnosis, and Optimization

This article explains why Redis can become slow, covering typical latency causes such as high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation, and provides practical troubleshooting steps and best‑practice recommendations.

Latency · best-practices · monitoring
0 likes · 24 min read
Aikesheng Open Source Community
Sep 28, 2020 · Backend Development

DTLE 3.20.09.0 Release Notes – New Monitoring Features, Docker Support, and Bug Fixes

Version 3.20.09.0 of the open‑source DTLE data‑transfer component for MySQL has been released, introducing replication‑delay and memory‑usage monitoring with Prometheus, providing configuration examples and Docker commands, and fixing incremental serialization, CPU usage, and uppercase‑where clause handling.

DTLE · Data Transfer · Docker
0 likes · 5 min read
Xianyu Technology
Sep 27, 2020 · Backend Development

Design of an Asynchronous Component with Monitoring, Fault Tolerance, and Zero‑Cost Integration

The article presents a design for an asynchronous component that is monitorable, fault‑tolerant, and integrates with zero overhead, compares Akka, RxJava, and a custom JUC‑based implementation, and selects the latter—using extended Callables and a CountDownLatch—to track business units, handle timeouts, and provide fallback behavior.

Asynchronous · JUC · concurrency
0 likes · 8 min read
iQIYI Technical Product Team
Sep 18, 2020 · Operations

Full-Chain Load Testing Practices for iQIYI Payment System

iQIYI’s payment team built a full‑chain load‑testing framework that isolates data, mocks dependencies, constructs realistic multi‑service traffic, and executes protected tests to expose bottlenecks, guide scaling and optimizations, and ultimately ensure reliable payment services during traffic spikes, while planning a unified automation platform.

Load Testing · capacity planning · full-chain testing
0 likes · 13 min read
Top Architect
Sep 18, 2020 · Backend Development

Microservice Architecture Evolution: From Monolith to Service Mesh

This article walks through the evolution of an online supermarket from a simple monolithic web application to a fully decomposed microservice architecture, highlighting the challenges of scaling, the need for monitoring, tracing, service discovery, fault tolerance, and the eventual adoption of a service mesh.

Backend · Microservices · Service Mesh
0 likes · 23 min read
JD Cloud Developers
Sep 15, 2020 · Databases

How JD’s HoraeDB Tackles Massive Time‑Series Data at Scale

This article introduces JD Cloud’s self‑built time‑series database HoraeDB, explaining its core concepts, typical use cases, architectural layers, high‑performance features, down‑sampling strategies, compression techniques, and stability measures for handling massive volumes of around‑the‑clock monitoring data.

Downsampling · Time Series Database · compression
0 likes · 18 min read
DataFunTalk
Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps (stream union, state‑timer processing, and output formatting), and covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation
0 likes · 11 min read
Java Backend Technology
Sep 12, 2020 · Databases

Why Redis Gets Slow: Common Latency Causes and How to Diagnose Them

This article explains the typical reasons Redis latency spikes—such as high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation—and provides practical steps to monitor, identify, and mitigate each issue.

Slowlog · memory · monitoring
0 likes · 18 min read
ITPUB
Sep 11, 2020 · Blockchain

How Red Pulse Secured Its Blockchain Platform: Real‑World Attack Lessons

This article details Red Pulse's journey of integrating the NEO blockchain, the security vulnerabilities it faced—from token theft and credential‑stuffing attacks to sophisticated social‑engineering exploits—and the comprehensive technical measures, monitoring tools, and mitigation strategies it implemented to protect its platform and users.

Attack Mitigation · Blockchain · NEO
0 likes · 21 min read
HaoDF Tech Team
Sep 7, 2020 · Operations

Analyzing Latency and Slow Interface Detection in a Full‑Chain Monitoring System

This article explains how latency is used as a key indicator for application risk identification, defines slow interfaces, describes why percentile‑based thresholds are preferred over averages, and outlines the architecture, task workflow, and practical optimization strategies for a full‑chain monitoring system in a microservice environment.

Latency · Microservices · SRE
0 likes · 14 min read
New Oriental Technology
Sep 7, 2020 · Operations

Performance Optimization and Stability Enhancement of the Continuation Enrollment System

This article details the background, performance and stability requirements, strategic approach, and concrete initiatives (full‑chain load testing, chaos engineering, monitoring, and targeted optimization projects) undertaken to boost performance by over 300% and improve the high availability of the continuation enrollment platform.

Load Testing · backend optimization · chaos testing
0 likes · 7 min read
dbaplus Community
Sep 6, 2020 · Operations

Building a High‑Performance Monitoring Alert System with Akka, Dubbo, and Ignite

The article outlines G Bank’s transition from a single‑threaded commercial monitoring solution to a self‑developed, open‑source based alert system that leverages Akka for parallel collection, Apache Dubbo for distributed processing, and Apache Ignite for in‑memory storage, achieving million‑level alert capacity, sub‑100 ms latency, and linear scalability.

Akka · Apache Dubbo · Apache Ignite
0 likes · 17 min read
MaGe Linux Operations
Sep 4, 2020 · Operations

Master Prometheus: From Basics to Full-Scale Monitoring Deployment

This guide walks through Prometheus fundamentals, architecture, components, service discovery, Docker-based deployment, exporter integration, Alertmanager configuration, Grafana visualization, PromQL queries, and Consul service discovery, providing a complete end‑to‑end monitoring solution for cloud‑native environments.

Alertmanager · Consul · Docker
0 likes · 32 min read
Alibaba Cloud Native
Sep 1, 2020 · Cloud Native

CTrip’s CDubbo Journey: Scaling 10k Services with Registration, Monitoring, and Service Mesh

From early .NET ESB attempts to a Java‑based CDubbo framework, CTrip details its migration to Dubbo, covering registration, health checks, CAT monitoring, dynamic configuration, SOA compatibility, testing tools, thread‑less execution, performance gains, extensibility, ecosystem integration, and future service‑mesh standardization.

Microservices · Registration · cloud-native
0 likes · 15 min read
Liangxu Linux
Aug 29, 2020 · Operations

Enforcing Clear Git Commit Messages with a Webhook‑Based Monitoring Service

This article explains why consistent Git commit messages matter, presents a detailed commit‑message format with type, scope and subject, shows how to enforce the standard using a webhook that validates messages, monitors large commits, and provides useful statistics for the development team.

code-quality · commit message · monitoring
0 likes · 11 min read
Amap Tech
Aug 28, 2020 · Fundamentals

Git Commit Message Standardization and Monitoring Service

The team introduced an Angular‑style Git commit‑message standard, type(scope): subject, with the subject written in Chinese, and built a webhook‑based monitoring service that validates pushes, alerts on violations, tracks diff size and deletions, stores metrics, and visualizes compliance, improving traceability, readability, and automated changelog generation.

DevOps · Git · best-practices
0 likes · 10 min read