Tagged articles

2179 articles

Oct 15, 2021 · Operations

How Unified Observability Transforms Quality Management in Cloud‑Native Environments

This article explores the challenges of quality monitoring in cloud‑native DevOps pipelines, outlines pain points of massive heterogeneous logs and alerts, and presents a unified observability platform that enables data consolidation, AI‑driven intelligent inspection, and smart alert management to improve system reliability.

AI · Alerting · Data Unification

0 likes · 17 min read

IT Architects Alliance

Oct 13, 2021 · Backend Development

Understanding Microservices Architecture: Core Concepts, Benefits, and Implementation Practices

This article provides a comprehensive overview of microservices architecture, covering its definition, key characteristics, advantages and drawbacks, and suitable organizational contexts, along with core components such as service discovery, gateways, configuration centers, monitoring, and circuit breaking, as well as containerization and orchestration technologies.

Backend Architecture · Cloud Native · Microservices

0 likes · 16 min read

Rare Earth Juejin Tech Community

Oct 12, 2021 · Frontend Development

Frontend Monitoring Platform: Data Collection and Reporting Techniques

This article explains the data collection and reporting component of a complete frontend monitoring platform. It details performance metrics such as FP, FCP, LCP, and CLS, provides practical JavaScript code examples for measuring, observing, and reporting them, and covers error and behavior monitoring techniques.

error tracking · frontend · monitoring

0 likes · 28 min read

Open Source Linux

Oct 11, 2021 · Operations

10 Essential Ops Principles Every Engineer Should Follow

This article shares ten practical operations guidelines—from avoiding duplicated work and embracing mistakes to emphasizing monitoring, backup roles, clear division of labor, and continuous improvement—aimed at boosting reliability, efficiency, and team cohesion for both engineers and managers.

Operations · Reliability · best practices

0 likes · 10 min read

Java Architect Essentials

Oct 10, 2021 · Operations

Guide to Using Nginx‑GUI for Visual Configuration, Performance Monitoring and Log Management

This article introduces Nginx‑GUI, explains its requirements and current implementation for configuration and performance monitoring, provides step‑by‑step installation and configuration instructions with code snippets, and lists the features already realized and the remaining challenges such as log analysis and traffic statistics.

Configuration · GUI · Linux

0 likes · 4 min read

DevOps Cloud Academy

Oct 9, 2021 · Cloud Native

Serverless Application DevOps: Latest Practices and Implementation Guide

This article presents a comprehensive overview of serverless application DevOps, covering the definition, benefits, common use cases, development workflow, container image deployment, CI/CD pipelines with AWS SAM, security strategies, monitoring tools, and real‑world examples such as Coca‑Cola.

AWS Lambda · Cloud Native · DevOps

0 likes · 13 min read

Tencent Cloud Developer

Oct 8, 2021 · Operations

Unveiling Kafka’s Controller: Architecture, Election, and Monitoring Deep Dive

This article provides a comprehensive technical analysis of Kafka’s Controller component, covering its background, core responsibilities, data storage, election process, version‑specific improvements, monitoring techniques, and key source‑code excerpts to help engineers understand and manage Kafka clusters effectively.

Cluster Management · Controller · Distributed Systems

0 likes · 27 min read

MaGe Linux Operations

Oct 6, 2021 · Operations

How to Accelerate Call Center Incident Resolution with Smart Monitoring and Automation

This article outlines a comprehensive approach to handling call‑center incidents, covering common troubleshooting steps, proactive monitoring enhancements, well‑structured emergency plans, and intelligent event‑driven automation to reduce downtime and improve operational efficiency.

Operations · automation · call center

0 likes · 12 min read

Programmer DD

Oct 5, 2021 · Operations

Essential DevOps Toolchain: 13 Must‑Have Tool Categories Explained

This article outlines the core technology categories and specific tools—planning, issue tracking, source control, build, testing, CI/CD, configuration management, cloud platforms, container orchestration, monitoring, communication, and knowledge sharing—that together enable teams to implement DevOps practices effectively and deliver value sustainably.

Configuration Management · DevOps · continuous integration

0 likes · 30 min read

ByteFE

Sep 30, 2021 · Frontend Development

A Practical Guide to Chrome Performance Tools and the Performance API

This article introduces Chrome's built‑in Performance panel, explains how to use the W3C Performance API for custom metric collection, compares third‑party auditing tools, and demonstrates a real‑world optimization case to help front‑end developers diagnose and improve page load speed.

API · Chrome · monitoring

0 likes · 16 min read

Top Architect

Sep 30, 2021 · Backend Development

Spring Boot Actuator: Quick Start, Endpoint Overview, and Security Integration

This article introduces Spring Boot Actuator, explains how to create a demo project with Maven or Gradle, details the most important built‑in endpoints such as /health, /metrics, /loggers, /info, /beans, /heapdump, /threaddump and /shutdown, and shows how to secure them with Spring Security, providing configuration snippets and code examples.

Actuator · Endpoints · Spring Boot

0 likes · 14 min read

Aikesheng Open Source Community

Sep 28, 2021 · Operations

Common DBLE Operational Commands for Monitoring, Diagnosis, and Maintenance

This article provides a comprehensive guide to DBLE's built‑in commands for viewing system information, diagnosing faults, and performing maintenance tasks such as killing connections, reloading configurations, and managing sharding nodes, helping MySQL DBAs efficiently operate distributed database clusters.

DBLE · Operations · diagnostics

0 likes · 8 min read

Open Source Linux

Sep 27, 2021 · Operations

Step-by-Step Guide to Installing Zabbix 5 on CentOS 7

This article provides a comprehensive, hands‑on tutorial for installing and configuring Zabbix 5 on CentOS 7, covering system overview, key terminology, disabling SELinux and firewalls, setting up repositories, installing server, agent, frontend, MariaDB, database initialization, configuration tweaks, and final web‑UI setup.

CentOS · Installation · Operations

0 likes · 9 min read

DevOps Cloud Academy

Sep 27, 2021 · Operations

Understanding Prometheus Relabeling: Rules, Actions, and Practical Use Cases

This article explains how Prometheus relabeling works, covering the purpose of relabeling, hidden and meta labels, the various actions such as replace, keep, drop, labelmap, labelkeep, labeldrop, and hashmod, and provides concrete configuration examples for common monitoring scenarios.

Configuration · Kubernetes · Metrics

0 likes · 15 min read

dbaplus Community

Sep 27, 2021 · Operations

6 Powerful Alternatives to Prometheus for Kubernetes Monitoring

Monitoring ensures Kubernetes applications run smoothly, and while Prometheus is a popular open‑source solution, this article examines six viable alternatives—Grafana, cAdvisor, Fluentd, Jaeger, Telepresence, and Zabbix—detailing their key features, strengths, and use‑cases for effective cluster observability.

Fluentd · Grafana · Kubernetes

0 likes · 10 min read

Programmer DD

Sep 27, 2021 · Operations

Mastering Redis Sentinel: Setup, Failover, and Multi‑Sentinel Configuration

This guide explains what Redis Sentinel is and how it is architected, shows how to configure it in a master‑replica environment, demonstrates handling both replica and master failures with detailed log examples, and offers tips for running multiple Sentinel instances for high availability.

failover · high availability · monitoring

0 likes · 11 min read

Efficient Ops

Sep 26, 2021 · Cloud Native

How to Stabilize Your Kubernetes Clusters: CI/CD, Monitoring, Logging, and Docs

This article analyzes why our Kubernetes clusters were constantly unstable—citing an erratic release process, missing monitoring, logging, documentation, and unclear request routing—and presents a comprehensive solution that includes a Kubernetes‑centric CI/CD pipeline, federated monitoring, centralized logging, a documentation hub, and integrated traffic management.

Cloud Native · DevOps · ci/cd

0 likes · 8 min read

ITFLY8 Architecture Home

Sep 26, 2021 · Backend Development

Designing a High‑Performance API Gateway for Microservices: Architecture & Key Features

This article details the architecture, core functions, and implementation techniques of a reactive, RxNetty‑based API gateway that handles request dispatch, conditional routing, API management, rate limiting, circuit breaking, security policies, and monitoring within a microservices ecosystem.

Microservices · api-gateway · monitoring

0 likes · 12 min read

dbaplus Community

Sep 22, 2021 · Databases

Mastering Redis Monitoring: Key Metrics, Commands, and Performance Testing

Learn the essential Redis monitoring metrics—including performance, memory, activity, persistence, and error indicators—along with the commands and tools such as redis-cli, INFO, SLOWLOG, and redis-benchmark to collect, interpret, and act on these metrics for effective database operations.

Error · Persistence · memory

0 likes · 8 min read

Liangxu Linux

Sep 19, 2021 · Operations

Master Linux System Info with Inxi: Install, Configure, and Use

This guide explains what the lightweight inxi utility does, how to install it via package managers or source, and demonstrates its various options for displaying system, hardware, network, disk, memory, weather, and color‑customized information on Linux.

Linux · System Information · inxi

0 likes · 6 min read

Baidu Intelligent Testing

Sep 16, 2021 · Operations

Baidu Game Microservice Monitoring Practice: System Design and Evolution

This article describes Baidu's game microservice monitoring practice, detailing the initial challenges, system design, risk control, intelligent monitoring, multi‑dimensional visualization, smart alerting, and efficient fault localization, illustrating how a systematic approach improves detection speed, coverage, and issue resolution for large‑scale online games.

Alerting · Game Development · monitoring

0 likes · 12 min read

Ops Development Stories

Sep 16, 2021 · Cloud Native

Master Kubernetes: A Step‑by‑Step Learning Roadmap for Beginners

This guide walks beginners through a structured learning path for Kubernetes, covering fundamentals, core components, key objects, controllers, storage, networking, resource management, security, cluster operations, backup, logging, monitoring, DevOps practices, and deeper topics like architecture, source code, and operator development.

Backup · Cloud Native · DevOps

0 likes · 16 min read

Efficient Ops

Sep 14, 2021 · Cloud Native

Master Kubernetes: A Step‑by‑Step Learning Roadmap for Beginners

This comprehensive guide walks beginners through Kubernetes fundamentals, core components, key objects, storage, networking, resource management, security, cluster operations, backup, logging, monitoring, DevOps practices, and deep‑dive techniques, providing a clear learning path and practical tips for effective use.

Cloud Native · DevOps · container orchestration

0 likes · 16 min read

dbaplus Community

Sep 13, 2021 · Operations

How to Stabilize a Failing Kubernetes Cluster: CI/CD, Monitoring, Logging, and Docs

This article analyzes why a company's Kubernetes clusters were constantly on the brink of failure and presents a comprehensive solution covering CI/CD pipeline reconstruction, federated monitoring with Prometheus, centralized logging via Elasticsearch, documentation centralization, and clarified request routing to achieve high reliability.

Kubernetes · ci/cd · cluster stability

0 likes · 9 min read

Architect's Tech Stack

Sep 12, 2021 · Backend Development

Spring Boot Actuator: Quick Start, Endpoint Overview, and Monitoring Configuration

This article provides a comprehensive guide to using Spring Boot Actuator for microservice monitoring, covering its purpose, quick setup with Maven/Gradle, detailed explanations of key endpoints such as /health, /metrics, /loggers, and integration with Spring Security for secure access.

Actuator · backend-development · java

0 likes · 13 min read

WeChat Client Technology Team

Sep 8, 2021 · Mobile Development

Uncovering Hidden Android Thread Pitfalls: Memory Leaks, Monitoring, and Hook Solutions

This article explores obscure Android thread issues—including uncontrolled thread creation, stack memory leaks, and the impact of thread‑priority settings—while presenting monitoring techniques, a pthread hook implementation, and performance considerations to help developers detect and resolve thread‑related crashes.

Android · Hook · Memory Management

0 likes · 15 min read

Open Source Linux

Sep 6, 2021 · Operations

How to Diagnose Linux Server Issues in the First 60 Seconds with 10 Essential Commands

This article explains how Netflix's performance team uses ten standard Linux commands (uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, two sar invocations, and top) to quickly assess system health, resource saturation, and errors within the first minute of a performance incident.

Server · Sysadmin · command-line

0 likes · 18 min read

Efficient Ops

Sep 5, 2021 · Operations

Why Prometheus’s TSDB Makes Monitoring Scalable: A Deep Dive

This article explains how Prometheus’s time‑series database handles massive monitoring data, illustrates practical query examples, and shows why its storage engine and pre‑computation features enable efficient, high‑performance observability for large‑scale services.

Prometheus · TSDB · Time Series Database

0 likes · 8 min read

ITFLY8 Architecture Home

Sep 5, 2021 · Cloud Native

From Rookie to Cloud‑Native Architect: Building an Enterprise Kubernetes Cluster

Over the past year, the author chronicles a hands‑on journey from a fresh graduate to a cloud‑native specialist, detailing the design and implementation of an enterprise‑grade Kubernetes architecture—including multi‑cluster logging, CI/CD pipelines, Istio service mesh, monitoring, and private‑deployment strategies—while sharing practical lessons learned.

Cloud Native · Kubernetes · Service Mesh

0 likes · 13 min read

DeWu Technology

Sep 3, 2021 · Operations

Live Streaming Service Monitoring and Alert Attribution Practice

The article outlines a systematic approach for quickly attributing live‑streaming service alerts—combining consolidated knowledge, log and trace analysis, and a decision‑tree workflow—to pinpoint root causes such as resource limits or mesh overload, illustrated by a real RT‑jitter case and emphasizing deep architectural understanding.

alert attribution · monitoring · troubleshooting

0 likes · 8 min read

Top Architect

Sep 2, 2021 · Cloud Native

Designing a Stable Backend Architecture: CI/CD, Federated Monitoring, Logging, Documentation, and Traffic Management on Kubernetes

The article analyzes why a company's clusters were unstable—unstable release process, missing monitoring and logging, insufficient documentation, and unclear request routing—and proposes a comprehensive solution built around Kubernetes‑centric CI/CD, a federated Prometheus monitoring platform, Elasticsearch logging, centralized documentation, and Kong/Istio traffic management.

Backend Architecture · Cloud Native · Documentation

0 likes · 9 min read

Ops Development Stories

Sep 1, 2021 · Operations

How to Build High‑Quality Prometheus Exporters: From Basics to Custom Go Implementations

This article explains the concept of Prometheus exporters, details the four metric types, provides step‑by‑step Go code for creating custom exporters and collectors, and outlines best practices for designing robust, production‑ready monitoring exporters.

Exporter · Go · Prometheus

0 likes · 20 min read

Aikesheng Open Source Community

Aug 31, 2021 · Operations

Building a DTLE Monitoring System with Prometheus and Grafana (DTLE 3.21.07.0)

This tutorial walks through setting up a DTLE 3.21.07.0 monitoring environment by configuring DTLE and Nomad metrics, deploying Prometheus and Grafana via Docker, and creating common monitoring panels such as CPU, memory, bandwidth, latency, and TPS using PromQL.

DTLE · Docker · Grafana

0 likes · 7 min read

Java Architect Essentials

Aug 30, 2021 · Databases

How to Monitor and Optimize Redis Performance

This article explains how to use Redis INFO commands to track memory usage, command processing, latency, key eviction and fragmentation, and provides practical tips such as adjusting maxmemory, using hash structures, pipelines, and slowlog to diagnose and improve Redis performance.

Latency · Ops · memory

0 likes · 23 min read

High Availability Architecture

Aug 30, 2021 · Backend Development

Hulk: A Go‑Based Web Service Framework for Short‑Video Backend Development

The article introduces Hulk, a Go service development framework created by the short‑video R&D team to replace PHP monoliths, outlines its background, design principles, component hierarchy, comparison with GDP2, and demonstrates how its built‑in monitoring, configuration, and tooling improve code quality, development speed, and SRE efficiency across Baidu’s short‑video services.

Backend · DevOps · Framework

0 likes · 17 min read

Ops Development Stories

Aug 27, 2021 · Operations

Inside Prometheus Alerting Rules: How They’re Managed and Executed

This article explains Prometheus' custom Rule system, detailing the structure and components of alerting rules, the rule manager's loading and updating process, group scheduling, evaluation cycles, and the logic for generating, updating, and sending alerts, enabling advanced monitoring extensions.

Alerting Rules · Go · Prometheus

0 likes · 21 min read

Open Source Linux

Aug 24, 2021 · Operations

Why Prometheus Became the Leading Cloud‑Native Monitoring Solution

This article explains how Prometheus evolved from a project started at SoundCloud, modeled on Google's internal Borgmon system, into a CNCF‑graduated, top‑ranked time‑series database and full‑stack monitoring ecosystem, detailing its history, core features, architecture, and the roles of its components such as Exporters, Pushgateway, Service Discovery, and Alertmanager.

Prometheus · Time Series Database · cloud-native

0 likes · 19 min read

Full-Stack DevOps & Kubernetes

Aug 23, 2021 · Operations

Mastering Prometheus on Kubernetes: From Basics to Advanced Alerting

This guide provides a comprehensive walkthrough of Prometheus fundamentals, component architecture, deployment patterns, node‑exporter setup, Grafana integration, kube‑state‑metrics, and detailed Alertmanager configuration for Kubernetes monitoring.

HA deployment · Kubernetes · kube-state-metrics

0 likes · 37 min read

Java Architect Essentials

Aug 22, 2021 · Operations

Exposing Spring Boot Metrics with Actuator and Monitoring via Prometheus and Grafana

This tutorial demonstrates how to add Actuator dependencies to a Spring Boot 1.5.7 application, expose Prometheus‑compatible metrics, collect them with a Dockerized Prometheus instance, and visualize the data using Grafana, including all required configuration files and code snippets.

Actuator · Grafana · Prometheus

0 likes · 5 min read

dbaplus Community

Aug 22, 2021 · Operations

Master Elasticsearch Performance: Memory, CPU, Shards, and Cluster Tuning

This guide presents practical best‑practice configurations for Elasticsearch clusters in production, covering JVM heap sizing, CPU thread‑pool tuning, optimal shard counts, replica strategies, hot‑warm node architecture, node role settings, common troubleshooting tips, cache handling, refresh intervals, and essential monitoring APIs.

Cluster · Elasticsearch · Shards

0 likes · 14 min read

Youzan Coder

Aug 19, 2021 · Mobile Development

Thread Pool Isolation and Monitoring Design for Mobile Applications

The design separates the original I/O pool into dedicated network, I/O, and polling thread pools, adds comprehensive monitoring of task duration and frequency, enforces unified polling rules, and automatically tunes pool parameters, resulting in a 76 % reduction in UI lag and easier troubleshooting.

Polling · RxJava · mobile performance

0 likes · 12 min read

Baidu Geek Talk

Aug 18, 2021 · Backend Development

Hulk: A Go Web Service Framework for Short‑Video Backend Development

Hulk is a Go‑based web service framework created by the short‑video R&D team to replace a PHP monolith, extending the unreleased GDP2 platform with business‑specific wrappers, a four‑layer architecture, and integrated monitoring, tracing, and deployment tools that dramatically boost development speed, runtime performance, and SRE efficiency for high‑traffic short‑video services.

Backend · DevOps · Framework

0 likes · 18 min read

Java Architecture Diary

Aug 18, 2021 · Frontend Development

Explore Grafana’s New Geomap Panel: Advanced Mapping Features & Customizations

This article introduces Grafana v8.1’s Geomap panel, detailing its multiple base‑layer options, enhanced custom markers, heatmap layer, flexible data‑layer mappings, and a new sharing feature that lets several map panels synchronize on a single dashboard.

Dashboard · Grafana · Mapping

0 likes · 6 min read

Tencent Cloud Developer

Aug 17, 2021 · Backend Development

Design and Implementation of a Calculation DSL and Engine

The article presents a domain‑specific language that mimics Excel formulas, a stack‑based parser and recursive engine for evaluating calculations, and a multi‑layer architecture—including a dynamic priority scheduler—to efficiently resolve field dependencies, improve maintainability, and enable monitoring across large data systems.

Calculation Engine · DSL · backend-development

0 likes · 11 min read

dbaplus Community

Aug 16, 2021 · Operations

Essential Kubernetes Ops: Node, Pod, Logging, and Monitoring Best Practices

This guide outlines practical steps for maintaining Kubernetes nodes, configuring pods, standardizing logging, and implementing effective monitoring and alerting to ensure stable, secure, and observable workloads in production environments.

Kubernetes · Node Management · Pod Configuration

0 likes · 19 min read

MaGe Linux Operations

Aug 14, 2021 · Operations

Boost System Reliability: 4 Proven Practices to Master Observability

This article explains why observability is essential for DevOps, outlines four key practices—including production‑environment monitoring, structured logging, a DevOps‑focused culture, and pre‑deployment observability with remote debugging—to help teams detect, diagnose, and prevent issues throughout the software lifecycle.

Culture · DevOps · ci/cd

0 likes · 9 min read

Cloud Native Technology Community

Aug 13, 2021 · Cloud Native

Sysdig 2021 Container Security and Usage Report – Top Open‑Source Solutions, Metrics, and Kubernetes Trends

The Sysdig 2021 report analyzes container usage across thousands of customers, highlighting the most popular open‑source services, the rise of Go and Prometheus, container density and image size trends, alert strategies, and detailed Kubernetes adoption patterns in cloud‑native environments.

Containers · Kubernetes · cloud-native

0 likes · 12 min read

ByteDance SE Lab

Aug 13, 2021 · Operations

Designing an Effective Full‑Link Load‑Testing Strategy for High‑Traffic Systems

This guide explains how to plan, configure, and execute full‑link performance testing—including network architecture, testing objectives, environment isolation, platform setup, various load‑generation methods, monitoring, and post‑test analysis—to ensure reliable, scalable services under heavy traffic.

Analysis · Load Testing · full‑link

0 likes · 15 min read

Volcano Engine Developer Services

Aug 11, 2021 · Big Data

How Volcengine Solves Big Data Quality Challenges with a Unified Stream‑Batch Platform

Volcengine’s Data Quality Platform bridges the gap between data validation and resource‑intensive computation in large‑scale environments, offering unified stream‑batch monitoring, data exploration, comparison, and alerting across Hive, ClickHouse, Kafka, and more, while addressing scalability, latency, and resource optimization challenges.

Big Data · Data Quality · monitoring

0 likes · 19 min read

Top Architect

Aug 10, 2021 · Operations

Building and Using an ELK Real‑Time Log Analysis Platform

This tutorial explains how to set up a real‑time ELK log analysis platform, covering the architecture of Elasticsearch, Logstash and Kibana, detailed installation commands, configuration for Spring Boot and Nginx logs, and how to run the stack continuously with Supervisor.

ELK · Elasticsearch · Kibana

0 likes · 18 min read

ITFLY8 Architecture Home

Aug 10, 2021 · R&D Management

How to Build a High‑Performance, Rapid‑Response Tech Team: Integrated R&D System

This article outlines a comprehensive approach for technology leaders to create a fast‑responding, high‑efficiency engineering team by integrating business, technology, monitoring, operations, and management practices, detailing goals, frameworks, and measurable outcomes.

R&D management · Tech Integration · monitoring

0 likes · 14 min read

Full-Stack DevOps & Kubernetes

Aug 6, 2021 · Cloud Native

Master Kubernetes CPU & Memory: Requests, Limits, Quotas, and LimitRanges

This guide explains how to allocate CPU and memory to containers and pods in Kubernetes, clarifies resource units, demonstrates practical Deployment, ResourceQuota, and LimitRange configurations with YAML examples, and lists useful tools for monitoring and enforcing resource constraints.

CPU · Cloud Native · Kubernetes

0 likes · 8 min read

Programmer DD

Aug 3, 2021 · Databases

Integrate InfluxDB with Spring Boot: A Step-by-Step Guide to Time‑Series Data

Learn how to configure and use the open‑source time‑series database InfluxDB in a Spring Boot application, covering essential concepts, dependency setup, property configuration, scheduled data writing, and verification via InfluxDB CLI, with complete code examples.

InfluxDB · Spring Boot · Time Series Database

0 likes · 8 min read

Xianyu Technology

Jul 29, 2021 · Mobile Development

How Xianyu Tackles Android ANR: Monitoring, Diagnosis, and Optimization Strategies

This article explains how Xianyu identifies, monitors, and resolves Android ANR issues by analyzing root causes, implementing SIGQUIT‑based detection, inspecting thread stacks, and applying concrete optimizations such as SharedPreferences replacement, network broadcast caching, and delayed component registration, ultimately cutting ANR rates by more than half.

ANR · Android · Mobile Development

0 likes · 11 min read

Full-Stack Internet Architecture

Jul 28, 2021 · Operations

Common Open‑Source Tools for MySQL Operations and Maintenance

This article introduces a curated list of open‑source MySQL operational tools—including online DDL changers, backup and restore utilities, load‑testing frameworks, flashback solutions, slow‑query analyzers, replication consistency checkers, audit platforms, and graphical clients—explaining their principles, usage scenarios, and visual references.

Backup · Operations · Replication

0 likes · 8 min read

Open Source Linux

Jul 27, 2021 · Operations

How to Effectively Locate and Debug Production Issues Using Logs and Remote Debugging

This guide walks beginners through understanding logs, using them for error tracing, applying monitoring and alerts, and performing remote debugging to quickly pinpoint and resolve production problems, emphasizing practical steps and best practices for reliable system maintenance.

Operations · debugging · monitoring

0 likes · 7 min read

dbaplus Community

Jul 26, 2021 · Operations

Top Open‑Source Tools Every SRE Should Know for Monitoring, Chaos Engineering, and Reliability

This article introduces a curated list of popular open‑source projects for SRE and DevOps, covering monitoring, deployment, chaos engineering, and reliability tools such as Cloudprober, Istio, Checkov, Litmus, Locust, Prometheus, and more, highlighting their key features and practical use cases.

Kubernetes, SRE, monitoring

0 likes · 10 min read

IT Architects Alliance

Jul 25, 2021 · Backend Development

Comprehensive Guide to Building a Backend Technology Stack for Startups

This article outlines a complete backend technology stack for startups, covering language choices, core components, processes, systemization, and detailed selections for project management, DNS, load balancing, CDN, RPC frameworks, service discovery, databases, messaging, logging, monitoring, configuration, deployment, and operational best practices.

Backend, CICD, DevOps

0 likes · 28 min read

Architects' Tech Alliance

Jul 24, 2021 · Backend Development

How to Build a Scalable Backend Stack for Startups: Languages, Components, and Best Practices

This guide outlines a comprehensive backend technology stack for startups, covering language choices, core components, development processes, infrastructure services, database options, monitoring, CI/CD, and operational best practices to help teams design, select, and implement a reliable server-side architecture.

Backend, Operations, Technology Stack

0 likes · 31 min read

GrowingIO Tech Team

Jul 22, 2021 · Databases

How to Diagnose and Fix Common HBase RegionServer Crashes

This article examines frequent HBase RegionServer failures caused by long GC pauses, oversized scans, and HDFS decommissioning, outlines step‑by‑step troubleshooting procedures—including log searches, GC tuning, scan size limits, and monitoring strategies—and provides practical solutions to prevent and resolve these issues.

HBase, RegionServer, gc

0 likes · 14 min read

Tencent Cloud Developer

Jul 22, 2021 · Operations

Observability in Serverless Environments: Monitoring, Logging, Distributed Tracing, and Best Practices

In this talk, Gal Bashan explains how serverless architectures complicate observability and why metrics, logs, and especially distributed tracing with tools like OpenTelemetry, Jaeger, or commercial platforms are essential for gaining end-to-end visibility, automating instrumentation, and maintaining reliable, business-focused services across cloud providers.

Cloud Native, Distributed Tracing, Serverless

0 likes · 12 min read

Ops Development Stories

Jul 22, 2021 · Operations

How to Diagnose Linux Server Performance Issues in 60 Seconds with 10 Essential Commands

Learn to quickly pinpoint Linux server bottlenecks by running ten commands within a minute (uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, two sar invocations, and top) and interpreting their output with the USE method to assess utilization, saturation, and errors across CPU, memory, disk, and network resources.

Linux, System Administration, USE method

0 likes · 20 min read

21CTO

Jul 21, 2021 · Backend Development

How Our Reactive API Gateway Powers Microservices with RxNetty

This article outlines the design and implementation of a high‑performance, reactive API gateway built on RxNetty, detailing its overall architecture, request routing, conditional routing, API management, rate‑limiting, circuit breaking, security policies, monitoring, tracing, and future enhancements within a microservices ecosystem.

Microservices, RxNetty, api-gateway

0 likes · 12 min read

ITPUB

Jul 21, 2021 · Backend Development

How Our Reactive API Gateway Handles Routing, Rate Limiting, and Security in Microservices

This article explains the overall architecture of a reactive API gateway built on RxNetty, detailing its request dispatch, conditional routing for gray releases, API management, rate‑limiting and circuit‑breaking, security policies, and integrated monitoring and tracing within a microservices ecosystem.

Microservices, api-gateway, backend-development

0 likes · 13 min read

Youzan Coder

Jul 19, 2021 · Operations

How We Built a Robust Search Middle Platform: From Pain Points to Full‑Scale Quality Assurance

This article examines the challenges faced by a search middle platform—such as inaccurate impact assessment, unstable underlying clusters, and missing process standards—and details a comprehensive quality‑assurance strategy that includes baseline test suites, stability practices, performance testing, emergency drills, and systematic monitoring to ensure reliable search services.

Backend, Operations, Performance Testing

0 likes · 13 min read

Yuewen Technology

Jul 16, 2021 · Operations

Mastering Log Aggregation: From LogID Generation to Powerful Analysis Tools

This article explores the challenges of log aggregation in micro‑service architectures, introduces a globally unique log identifier (logid) with its required properties, compares various logid generation schemes, and presents end‑to‑end solutions for log distribution, aggregation, and analysis using custom tools such as ylog and watcher.

Distributed Systems, log aggregation, log analysis

0 likes · 26 min read

MaGe Linux Operations

Jul 15, 2021 · Operations

Deploy a Ready‑to‑Use ELK Logging & Monitoring Stack for Private Environments

This article presents a practical, out‑of‑the‑box ELK‑based solution for private deployments, detailing design principles, rapid one‑click setup with Jenkins, component choices, log and metric collection using Beats, alerting with ElastAlert, and automated Kibana dashboard configuration.

Ansible, ELK, Elasticsearch

0 likes · 11 min read

High Availability Architecture

Jul 15, 2021 · Operations

Baidu Game Microservice Monitoring Practice and System Design

This article describes Baidu's comprehensive approach to monitoring game microservices, covering the background, initial monitoring tools, evolution of the monitoring system, systematic design for risk control, intelligent detection, alarm optimization, efficient fault localization, and future outlook for high‑availability architecture.

Baidu, Game Development, Microservices

0 likes · 13 min read

Baidu Geek Talk

Jul 14, 2021 · Operations

How Baidu Built a Robust Microservice Monitoring System for Game Services

This article details Baidu's comprehensive microservice monitoring practice for its game platform, covering the initial fragmented setup, systematic redesign across risk control, intelligent monitoring, smart alerting, and rapid fault localization, and presents the resulting monitoring architecture, visualizations, and future improvement goals.

Alerting, Baidu, Microservices

0 likes · 14 min read

Code Ape Tech Column

Jul 14, 2021 · Operations

How to Build a Real-Time ELK Log Analysis Platform for Spring Boot and Nginx

This guide walks you through installing and configuring the ELK stack—Elasticsearch, Logstash, and Kibana—on Ubuntu, setting up Logstash shipper and indexer pipelines, integrating Spring Boot and Nginx logs, and managing the services with Supervisor for reliable, real‑time log analysis.

ELK, Elasticsearch, Kibana

0 likes · 19 min read

Senior Brother's Insights

Jul 13, 2021 · Backend Development

Mastering Spring Boot Actuator: A Complete Guide to All Native Endpoints

This article provides a thorough walkthrough of Spring Boot Actuator's native endpoints, explaining their purpose, configuration, and usage with practical code examples and screenshots, enabling developers to monitor, diagnose, and manage Spring Boot microservices effectively.

Actuator, Endpoints, Microservices

0 likes · 20 min read

DevOps

Jul 12, 2021 · Operations

The First Four Chaos Experiments to Run on Apache Kafka

This article explains how to use chaos engineering with Gremlin to design, execute, and analyze four experiments that test Kafka broker load, message loss, split‑brain scenarios, and ZooKeeper outages, helping improve the reliability and resilience of Kafka deployments.

Distributed Systems, Gremlin, Kafka

0 likes · 18 min read

Senior Brother's Insights

Jul 11, 2021 · Backend Development

Master Spring Boot Actuator: Integration, Secure Shutdown, and Custom Endpoints

This guide walks through adding Spring Boot Actuator to a project, configuring default and custom endpoints, securing shutdown operations, and demonstrates practical code snippets and curl commands for monitoring and managing a Spring Boot application.

Actuator, Backend, Custom Endpoint

0 likes · 9 min read

Yuewen Technology

Jul 9, 2021 · Operations

Mastering Efficient Log Utilization: Best Practices for Logging and Collection

This article outlines how to design, print, collect, and manage online service logs efficiently—covering log levels, key information, formatting, rolling, local vs. remote storage, real‑time collection, and tool selection—to turn logs into a valuable debugging and analytics asset.

Elastic Stack, Filebeat, Log Management

0 likes · 16 min read

Xianyu Technology

Jul 9, 2021 · Backend Development

Backend Architecture and Stability for Xianyu Local Services

The article describes Xianyu’s local services architecture, tackling rapid supplier onboarding, heterogeneous quality, and stability by reusing core platform capabilities, defining merchant, audit, and independent business domains, employing high‑concurrency rate limiting, idempotent retries, unified exception handling, status‑change logging, and proactive monitoring with alerts and reporting.

Data Consistency, System Design, monitoring

0 likes · 7 min read

Selected Java Interview Questions

Jul 7, 2021 · Operations

Redis Monitoring Metrics and Commands Guide

This article provides a comprehensive overview of Redis monitoring metrics—including performance, memory, basic activity, persistence, and error indicators—along with recommended monitoring tools, configuration settings, and command-line examples for gathering and interpreting these metrics in production environments.

Metrics, Operations, database

0 likes · 7 min read

Efficient Ops

Jul 5, 2021 · Operations

10 Essential Practices to Prevent DBA and Ops Disasters

Learn ten practical strategies—from safe change rollbacks and cautious destructive commands to robust backups, clear prompts, vigilant monitoring, and disciplined handovers—that help DBAs and operations engineers avoid costly system failures and maintain reliable production environments.

Backup, Operations, Oracle

0 likes · 6 min read

Full-Stack Internet Architecture

Jul 5, 2021 · Databases

Integrating Alibaba Druid Connection Pool with Spring Boot: Configuration and Monitoring Guide

This article provides a comprehensive guide on integrating the Alibaba Druid JDBC connection pool into a Spring Boot application, covering its components, powerful monitoring features, password encryption, SQL parsing, Maven and YAML configuration, filter setup, and how to access the Druid monitoring console.

Configuration, Database Connection Pool, Druid

0 likes · 11 min read

37 Mobile Game Tech Team

Jul 2, 2021 · Operations

How to Build a Flink Monitoring System with Prometheus, Pushgateway, and Grafana

This guide walks you through configuring Flink metrics, installing and linking Pushgateway, Node_exporter, Prometheus, and Grafana, and finally visualizing and alerting on Flink metrics, providing a complete end‑to‑end monitoring solution for Flink clusters.

Flink, Grafana, Metrics

0 likes · 7 min read

37 Mobile Game Tech Team

Jul 2, 2021 · Big Data

Inside Flink Metrics: Adding, Retrieving, and Exposing Metrics in TaskManager

This article walks through Flink's metric system by explaining the core interfaces such as MetricReporter and MetricRegistry, showing how metrics are added, registered, and queried during TaskManager startup, and detailing both REST and Prometheus approaches for retrieving metric values.

Big Data, Flink, Metrics

0 likes · 16 min read

Ops Development Stories

Jun 30, 2021 · Cloud Native

Mastering Kubernetes: Essential Node & Pod Practices for Stable, Secure Deployments

This article outlines essential Kubernetes operational practices—including node maintenance, kernel upgrades, Docker and kubelet tuning, pod resource limits, scheduling strategies, health probes, logging standards, and monitoring setups—to ensure applications run reliably, securely, and efficiently in production environments.

Cloud Native, Kubernetes, Node Management

0 likes · 18 min read

Architects' Tech Alliance

Jun 28, 2021 · Backend Development

Understanding the Essence of Architecture and Weibo's Large‑Scale System Design

This article explores the fundamental concepts of software architecture, illustrates scaling challenges with examples like Uber and Weibo, and details multi‑tier designs, caching strategies, service decomposition, monitoring, and operational practices for building and maintaining high‑performance, billion‑user backend systems.

Backend, Scalability, caching

0 likes · 20 min read

DataFunTalk

Jun 27, 2021 · Big Data

Practical Experience in Operating NetEase's Big Data Platform: Architecture, EasyOps, Monitoring, and Optimization

This presentation by NetEase senior SRE Jin Chuan details the current state of NetEase's big data platform, introduces the internally built EasyOps management system, explains a generic Ansible‑based operation framework, describes Prometheus/Grafana monitoring and alerting, and shares practical lessons on network, storage, and cloud migration for large‑scale Hadoop services.

Ansible, Prometheus, SRE

0 likes · 10 min read

Java Interview Crash Guide

Jun 26, 2021 · Backend Development

Essential Linux and Java Tools for Fast Troubleshooting and Performance Tuning

This guide compiles a comprehensive set of Linux commands and Java diagnostic utilities—including tail, grep, awk, find, tsar, btrace, Greys, Arthas, and JProfiler—offering practical examples and code snippets to help developers quickly identify and resolve performance and stability issues in production environments.

java, monitoring, tools

0 likes · 16 min read

Ops Development Stories

Jun 25, 2021 · Operations

How to Build Custom Zabbix Webhook Alerts with JavaScript (DingTalk Example)

This guide explains how Zabbix 4.4+ lets you use custom JavaScript in webhook media types to send alert notifications, details the built‑in Zabbix objects, shows configuration steps, data validation, logging rules, and provides a complete DingTalk webhook script with testing instructions.

Alert, DingTalk, JavaScript

0 likes · 11 min read

Java High-Performance Architecture

Jun 24, 2021 · Operations

How Netflix’s Telltale Transforms Application Monitoring and Smart Alerting

Netflix’s in‑house Telltale system consolidates diverse monitoring data, reduces alert noise, provides multidimensional health assessments, and delivers intelligent, context‑rich notifications, enabling engineers to quickly diagnose and resolve issues across more than 100 production services.

Alerting, Netflix, monitoring

0 likes · 11 min read

Architecture Digest

Jun 22, 2021 · Operations

Netflix’s Telltale: An Intelligent Monitoring and Alerting System for Application Health

The article details Netflix’s internally built Telltale monitoring platform, explaining its motivation, key features such as multi‑dimensional health assessment, smart alerting, event management, deployment monitoring, and continuous optimization, and how it improves operational efficiency for over a hundred production services.

Alerting, Netflix, Telltale

0 likes · 12 min read

Spring Full-Stack Practical Cases

Jun 19, 2021 · Backend Development

Unlock Spring Boot Actuator: All Endpoints, Configurations, and Code Samples

Spring Boot Actuator provides extensive monitoring and management features via HTTP and JMX; this guide shows how to enable it, lists each endpoint such as /auditevents, /beans, /caches, /conditions, /env, /health, and demonstrates configuration, custom endpoints, and code examples for full utilization.

Actuator, Endpoints, Spring Boot

0 likes · 8 min read

Code Ape Tech Column

Jun 19, 2021 · Operations

Master Prometheus: From Installation to Advanced Monitoring with Grafana

This comprehensive guide walks you through Prometheus' origins, core features, installation methods, configuration files, PromQL basics, exporter setup, Grafana integration, alerting with Alertmanager, and advanced topics like service discovery, providing a complete roadmap for building a production‑grade monitoring system.

Alertmanager, Docker, Grafana

0 likes · 34 min read

Spring Full-Stack Practical Cases

Jun 18, 2021 · Operations

How to Integrate Prometheus Monitoring into Spring Boot with Grafana

This guide walks through setting up Prometheus and Grafana to monitor a Spring Boot 2.3.11 application, covering environment preparation, Maven dependencies, Spring configuration, custom metrics, Prometheus server setup, and visualizing data in Grafana.

Grafana, Micrometer, Prometheus

0 likes · 4 min read

Programmer DD

Jun 18, 2021 · Operations

What’s New in Grafana 8.0? A Deep Dive into Alerts, Panels, and Real‑Time Streams

Grafana 8.0 introduces a major overhaul of its alerting system, new visualizations such as state timeline, history and histogram panels, reusable library panels, fine‑grained access control, real‑time streaming, enhanced Loki log navigation, and expanded tracing support, all aimed at faster, more flexible observability.

Dashboard, Grafana, Real-Time

0 likes · 9 min read

Alibaba Cloud Native

Jun 16, 2021 · Backend Development

How to Build a Scalable Distributed Message Governance Platform for High Availability

This article shares Haro's practical experience in designing and operating a distributed message governance platform that unifies RocketMQ, Kafka, and other middleware, covering metrics, monitoring, alerting, scenario‑based controls, and high‑availability strategies to keep microservices reliable under sudden traffic spikes.

Microservices, RocketMQ, monitoring

0 likes · 14 min read

Efficient Ops

Jun 15, 2021 · Operations

Mastering IT Monitoring: Strategies, Challenges, and Best Practices

This article explores the fundamentals of IT monitoring, examines common challenges such as scalability, reliability, and alert fatigue, compares four implementation approaches—from open‑source to fully custom solutions—and presents practical techniques like alert convergence, suppression, and automation to build a robust, adaptable monitoring platform.

Alert Management, Operations, Scalability

0 likes · 19 min read

Liangxu Linux

Jun 14, 2021 · Operations

7 Essential Everyday Shell Scripts for Linux System Administration

This article presents seven practical Bash scripts that help Linux administrators quickly gather system status, back up MySQL databases, monitor services, scan network hosts, manage user passwords, and verify MySQL replication, each accompanied by clear code examples and usage instructions.

Backup, Shell, Sysadmin

0 likes · 10 min read

Top Architect

Jun 13, 2021 · Operations

Troubleshooting a JVM Memory Leak and Network Timeout Issue in a Monitoring Service

The article recounts a weekend on‑call incident where a Java monitoring service suffered network packet loss and a severe memory leak, leading to massive timeouts, high CPU usage, and frequent GC, and explains how the problem was diagnosed and resolved using tools such as top, jstat, jstack, jmap, and MAT.

JVM, java, jstack

0 likes · 9 min read

58 Tech

Jun 11, 2021 · Frontend Development

Beidou Frontend Monitoring System: Architecture, Challenges, and Solutions

The article details the design, architecture, and operational challenges of the Beidou frontend monitoring platform at 58 Group, covering SDK management, behavior trace logging, front‑back link integration, performance optimizations, minute‑level alerting, and permission management.

Alerting, architecture, frontend

0 likes · 22 min read

IT Architects Alliance

Jun 9, 2021 · Backend Development

The Essence of Architecture and Scaling Strategies in Large‑Scale Web Systems – A Weibo Case Study

This article explores the fundamental concepts of system architecture, scaling challenges, multi‑level caching, service decomposition, and monitoring techniques in massive web platforms, using Weibo’s evolution from LAMP to service‑oriented architecture as a detailed example.

Backend, Distributed Systems, Scalability

0 likes · 20 min read

Top Architect

Jun 9, 2021 · Operations

Configuring a Perfect JVM GC Log Printing Strategy

This guide explains how to configure comprehensive JVM garbage-collection logging—including basic GC details, object age distribution, heap snapshots, pause times, safepoint statistics, and reference processing—while using timestamped filenames and JVM log rotation to avoid overwriting and manage file size effectively.

JVM, gc, java

0 likes · 12 min read

Aikesheng Open Source Community

Jun 9, 2021 · Databases

Monitoring MySQL Full-Text Indexes: Parameters, Metadata Tables, and Practical Demonstrations

This article explains how to monitor MySQL full-text indexes by describing relevant InnoDB parameters, the metadata tables that expose index activity, and step‑by‑step examples that create a sample table, configure monitoring, observe cache behavior, and manage index maintenance operations.

Full-Text Index, InnoDB, monitoring

0 likes · 13 min read

Efficient Ops

Jun 8, 2021 · Operations

How Red‑Blue Drills Boost Securities Ops: From Capacity Testing to Full‑Scale Automation

Lin Ying, a senior test manager at Guoxin Securities, shares insights from his GOPS 2021 talk on the securities industry's digital transformation, current IT challenges, and a comprehensive red‑blue exercise strategy that combines full‑link load testing, automated workflows, and proactive monitoring to ensure system stability during market peaks.

DevOps, Operations, capacity testing

0 likes · 13 min read
