Topic

monitoring

Collection size
1794 articles
HaoDF Tech Team
Nov 25, 2020 · Operations

Microservice Governance and Stability Platform at Haodf.com: Architecture, Monitoring, and SLO Design

This article is a case study of Haodf.com's transition to a microservice architecture. It covers the challenges of service stability and observability, the design of a unified governance platform with log‑holographic analysis, real‑time alerts, and application profiling, the definition of SLOs and SLAs, and the roadmap for capacity and reliability improvements.

SLO, logging, monitoring
0 likes · 16 min read
HaoDF Tech Team
Sep 7, 2020 · Operations

Analyzing Latency and Slow Interface Detection in a Full‑Chain Monitoring System

This article explains how latency is used as a key indicator for application risk identification, defines slow interfaces, describes why percentile‑based thresholds are preferred over averages, and outlines the architecture, task workflow, and practical optimization strategies for a full‑chain monitoring system in a microservice environment.

Microservices, Performance Optimization, SRE
0 likes · 14 min read
Liulishuo Tech Team
Aug 31, 2022 · Databases

Design and Implementation of a Distributed Time‑Series Database Based on Mimir

The article describes the motivation, requirements, and architectural design of a highly available, scalable, low‑cost distributed time‑series database built on Mimir, detailing write and read paths, multi‑tenant isolation, compaction, and the performance and cost improvements achieved after deployment.

Mimir, Multi-tenant, Prometheus
0 likes · 8 min read
Liulishuo Tech Team
May 26, 2021 · Operations

Custom Prometheus Monitoring Architecture and GitOps Practices at Liulishuo

This article details Liulishuo's customized Prometheus monitoring architecture, including data backup to Aliyun SLS, ECS service discovery, advanced alerting with PagerDuty and Goalert, GitOps-driven config management, cloud resource exporters, SLA monitoring, and future plans for storage and alert pipelines.

GitOps, Prometheus, alerting
0 likes · 9 min read
Liulishuo Tech Team
Feb 19, 2019 · Backend Development

My Journey as a New Backend Engineer: Project Setup, Testing Approaches, and Monitoring at FlowingTalk

Joining a new team and project as a fresh graduate at FlowingTalk, I describe the supportive environment, codebase initialization, various HTTP testing strategies using Go and Gin, the adoption of OpenCensus, Prometheus, and Sentry for monitoring, and how iterative development accelerates my growth as a backend engineer.

Go, Microservices, Testing
0 likes · 9 min read
Snowball Engineer Team
Dec 22, 2022 · Frontend Development

Cross‑Platform Frontend High‑Availability, Performance Optimization and Migration at Snowball

This article details the end‑to‑end cross‑platform architecture built by Snowball's front‑end team, covering high‑availability monitoring, performance measurement, bundle hot‑update and splitting strategies, Hermes engine migration, stability fixes, and a systematic migration plan for RN/H5 pages. It also outlines the future roadmap and lessons learned.

Performance, React Native, cross‑platform
0 likes · 31 min read
Zhengcaiyun Technology (政采云技术)
Nov 24, 2022 · Databases

Is Redis Really Slowing Down? A Comprehensive Diagnosis and Optimization Guide

This article explains how to determine whether Redis is truly experiencing latency issues, outlines benchmark testing methods, identifies common causes such as network problems, high‑complexity commands, big keys, slow logs, memory limits, fork overhead, AOF configuration, swap usage, fragmentation, and provides practical troubleshooting and optimization steps.

Optimization, Performance, database
0 likes · 26 min read
Zhengcaiyun Technology (政采云技术)
Feb 23, 2021 · Frontend Development

Capturing and Handling Frontend Exceptions

This article explains common frontend exceptions such as UI glitches, script errors, and network failures, classifies JavaScript error types, and demonstrates handling techniques using try‑catch, finally, window.onerror, event listeners, Promise rejection handling, and framework‑specific solutions such as React error boundaries, Vue errorHandler, and Axios interceptors.

JavaScript, React, Vue
0 likes · 18 min read
Yang Money Pot Technology Team
Aug 4, 2021 · Backend Development

Design and Implementation of ylock: A Distributed ReentrantReadWriteLock Framework

This article explains the challenges of distributed locking, compares existing lock services, and details the design, implementation, and monitoring features of the ylock framework, which provides reentrant read‑write locks over Redis and Zookeeper with unified APIs and Spring Boot integration.

Distributed Lock, Java, Spring Boot
0 likes · 24 min read
YunZhu Net Technology Team
Apr 15, 2022 · Operations

Design and Architecture of a Cloud‑Native Monitoring Platform for Business Systems

The document outlines the background, vision, current status, technical research, value, product and technical architecture, and functional design of a cloud‑native monitoring platform that integrates SkyWalking and Prometheus to provide comprehensive APM, resource utilization, alerting, and rapid fault localization for business and technical middle‑platform services.

APM, Metrics, cloud native
0 likes · 10 min read
YunZhu Net Technology Team
Nov 5, 2021 · Backend Development

Practical Java Performance Optimization: Metrics, Bottleneck Identification, and Governance Strategies

This article shares practical Java performance‑optimization techniques, covering UI and non‑UI latency metrics, baseline data collection, bottleneck discovery with tools like Arthas, chronic issue handling, and a comprehensive set of governance measures ranging from network‑level caching to code‑level refactoring, asynchronous processing, and service splitting to achieve stable sub‑200 ms response times.

Arthas, Caching, Java
0 likes · 19 min read
JD Tech
Mar 13, 2019 · Operations

Evolution of JD Digital Technology’s Host Monitoring System “DiTing”: From V1 to V3

The article chronicles the design, evolution, and lessons learned of JD Digital Technology’s self‑built host monitoring platform “DiTing”, detailing its initial requirements, V1 architecture, subsequent V2 and V3 redesigns, encountered challenges, and future directions toward intelligent operations.

big data, cloud native, distributed systems
0 likes · 12 min read
JD Tech
Nov 5, 2018 · Operations

Practical Guide to Elasticsearch Monitoring and Operations

This article provides a comprehensive, operations‑focused overview of Elasticsearch monitoring, covering tool selection, key metrics for black‑box and white‑box monitoring, common issues discovered through alerts, and practical optimization recommendations to ensure high availability of ES clusters.

Elasticsearch, Metrics, SRE
0 likes · 8 min read
JD Tech
Aug 13, 2018 · Backend Development

Building Scalable High‑Concurrency Backend Systems: Guarding the Baseline, Raising Throughput, and Horizontal Expansion

This article shares practical guidance on designing, protecting, and continuously improving high‑concurrency backend services—covering baseline capacity, rate limiting, data‑structure optimization, stateless architecture, and horizontal scaling—to help engineers evolve small systems into robust, production‑grade platforms.

Microservices, backend, high concurrency
0 likes · 8 min read
JD Tech
Jul 5, 2018 · Backend Development

Design and Optimization of JD's High‑Availability Open Gateway System

This article describes how JD's open gateway handles billions of requests during major sales events by employing a multi‑layer architecture, Nginx + Lua unified access, NIO asynchronous processing, service isolation, dynamic routing, degradation, rate‑limiting, circuit‑breaking, fast‑fail mechanisms, and comprehensive monitoring to ensure high performance and reliability.

Circuit Breaking, Nginx, asynchronous processing
0 likes · 16 min read
JD Tech
Jun 14, 2018 · Operations

Design and Implementation of a Lightweight Service Monitoring and Traffic Management System

This article shares the design and implementation of a lightweight, robust, and low‑intrusion monitoring management system for microservice traffic, detailing data collection via client filters, Redis‑based structured storage, alerting, rate‑limiting, degradation, and authorization mechanisms, and discusses performance optimizations and future improvements.

Microservices, monitoring, operations
0 likes · 11 min read
JD Tech
Feb 28, 2018 · Operations

CallGraph: JD.com's Distributed Tracing and Service Governance Platform

CallGraph is JD.com's internally developed distributed tracing and service governance platform that addresses the challenges of monitoring complex microservice architectures by providing low‑intrusion, low‑latency tracing, real‑time analytics, configurable sampling, and integration with JMQ, Storm, Spark, HBase, and JimDB for both operational insight and performance optimization.

Distributed Tracing, Microservices, big data
0 likes · 12 min read