monitoring | BestHub

Collection size

1674 articles

JD Tech Talk

Dec 4, 2024 · Operations

Gray Release, Verification, and Rollback Strategies in Software Deployment

The article outlines a comprehensive release management framework that emphasizes gray (canary) deployments, detailed verification steps, monitoring practices, and rollback procedures to mitigate risks and ensure system stability for production rollouts.

Deployment · Software Operations · Verification

0 likes · 13 min read

JD Tech Talk

Oct 21, 2024 · Operations

Observability and Quality Assurance: Strategies for Test Teams

This article examines how test teams can enhance application observability and quality assurance by distinguishing observability from traditional monitoring, defining goals, outlining a monitoring foundation, and proposing module‑level and system‑level strategies for proactive fault detection, data analysis, and alerting.

Testing · monitoring · observability

0 likes · 12 min read

Beike Product & Technology

Jul 31, 2020 · Mobile Development

Design and Evolution of a Mobile Live‑Streaming Platform at Beike

This article describes how Beike built, refined, and scaled a mobile live‑streaming platform—detailing early challenges, architectural pain points of version 1.0, and the systematic improvements introduced in version 2.0 such as clear boundaries, functional aggregation, layered platform design, dynamic configuration, monitoring, and zero‑cost integration to support diverse business scenarios.

Mobile Architecture · Platform Engineering · SDK

0 likes · 11 min read

Test Development Learning Exchange

Jan 11, 2025 · Operations

Python System Administration Scripts for DevOps Engineers

This article provides comprehensive Python scripts for system administration tasks including CPU monitoring, memory usage tracking, log analysis, file backup, system updates, network monitoring, service management, user administration, and system information collection.

DevOps · Network Monitoring · Python

0 likes · 6 min read

Refining Core Development Skills

Oct 19, 2020 · Operations

Linux Network Packet Monitoring and Tuning: Tools, RingBuffer, Interrupts, and SoftIRQ Optimization

This article explains how to monitor and tune Linux network packet reception using tools such as ethtool, ifconfig, and procfs, covering RingBuffer inspection, hardware and soft interrupt analysis, multi‑queue configuration, interrupt coalescing, and GRO settings to improve throughput and reduce packet loss.

Kernel · Linux · Network

0 likes · 17 min read

Watermelon Video Tech Team

Jan 31, 2024 · Mobile Development

Optimizing Android Process Startup in Xigua Video: Strategies, Implementation, and Benefits

This article details how Xigua Video analyzed and optimized the startup of multiple Android subprocesses—including push, mini‑app, sandboxed, and exec processes—by applying on‑demand loading, SDK integration, and monitoring techniques, resulting in measurable performance and quality improvements.

Android · Mobile Development · Multi-process

0 likes · 23 min read

Zhengcaiyun Tech (政采云技术)

Oct 28, 2021 · Backend Development

HikariCP Overview (Part 1): Initialization, Core Components, Monitoring and Configuration

This article provides a detailed analysis of HikariCP’s initialization, core components, startup flow, connection acquisition logic, monitoring metrics, and key configuration parameters, illustrating how Spring Boot 2.x leverages this high‑performance JDBC connection pool and offering guidance for tuning and extending it.

Connection Pool · HikariCP · Spring Boot

0 likes · 14 min read

JD Tech

Mar 13, 2025 · Operations

Ensuring Stability of the Double 11 Supply‑Chain Dashboard: Full‑Link Process, Risk Points, and Technical Safeguards

This article details how JD Logistics guarantees the stability of its Double 11 supply‑chain dashboard by mapping the entire data‑flow, identifying risk points across ingestion, processing, storage, service, and monitoring layers, and applying targeted technical and organizational safeguards.

Dashboard · Stability · big data

0 likes · 10 min read

JD Tech

Oct 10, 2023 · Operations

Technical Case Study of JDV Visual Dashboard Platform for the 618 Promotion

This article details how JDV, JD.com’s internal visual dashboard platform, tackled the massive data‑intensive 618 promotion by implementing real‑time updates, cross‑midnight count stops, request‑state control, heartbeat monitoring, proxy data sources, and a suite of developer tools to ensure stability, performance, and rapid feature delivery.

data platform · large-scale · monitoring

0 likes · 18 min read

Ctrip Technology

Sep 23, 2024 · Frontend Development

Intelligent Alert Attribution System for Ctrip Hotel Frontend: Design, Implementation, and Outcomes

This article details the design and deployment of an intelligent alert attribution system for Ctrip Hotel's front‑end, describing the background challenges, the unified data pool, weighted alert rules, three attribution algorithms, achieved improvements in accuracy and troubleshooting speed, and future enhancement plans.

Alert · Attribution · Frontend

0 likes · 18 min read

Ctrip Technology

Oct 10, 2018 · Operations

Design and Implementation of Ctrip's Fourth-Generation Full-Link Performance Testing System

This article outlines the evolution of Ctrip’s performance testing approaches across three generations, analyzes their limitations, and presents the design, architecture, data construction, request tracing, monitoring, and operational considerations of the fourth-generation full‑link testing platform, including case studies and future outlook.

Capacity Planning · full-link testing · load testing

0 likes · 14 min read

Ctrip Technology

Aug 17, 2017 · Operations

Design, Evolution, and Future of Ctrip's Operations Workflow Platform

This article details the challenges, architectural evolution, key components, implementation experiences, and future directions of Ctrip's operations workflow platform, illustrating how a multi‑stage, layered design and standardized services have transformed manual IT operations into an automated, observable, and scalable system.

Platform Architecture · monitoring · operations automation

0 likes · 16 min read

360 Tech Engineering

Sep 6, 2019 · Operations

StackStorm-Based ChatOps Solution for Automated Monitoring Alert Self‑Healing

This article introduces a StackStorm‑driven ChatOps framework that consolidates monitoring alerts, applies rule‑based root‑cause analysis, and automatically executes self‑healing actions, outlining its architecture, components, workflow definitions, and practical deployment results within an enterprise operations environment.

ChatOps · StackStorm · monitoring

0 likes · 6 min read

360 Smart Cloud

Jul 3, 2024 · Operations

Practical Practices for Enhancing Kafka Cluster Stability at 360

This article details 360's comprehensive approach to improving Apache Kafka cluster stability through proactive operations, capacity assessment, parameter tuning, monitoring, version upgrades, and traffic control, offering concrete guidelines and best‑practice recommendations for large‑scale message‑queue deployments.

Cluster · Kafka · Stability

0 likes · 33 min read

58 Tech

Nov 27, 2024 · Operations

Building an Observability System for Cloud Authentication: Practices, Metrics, and Lessons Learned

This article details how 58 Group’s cloud authentication service introduced an observability framework—optimizing logs, employing distributed tracing, defining SLO/SLA metrics, and implementing burn‑rate alerts—to improve fault detection, reduce false alarms, and achieve faster root‑cause analysis across the system.

Distributed Tracing · Error Budget · SLO

0 likes · 16 min read

58 Tech

Apr 21, 2022 · Frontend Development

Interview with Li Yi on Building 58 Group’s Large Front‑End Technology Service System

In this interview, Li Yi, head of 58 Group’s Front‑End Technology Department, explains how the company built its large‑scale front‑end service system—including a Hybrid permission platform, a React Native hot‑update platform, and the Beidou monitoring system—while discussing cross‑platform frameworks, performance challenges, low‑code adoption, and advice for newcomers.

Frontend · React Native · cross‑platform

0 likes · 11 min read

Aikesheng Open Source Community

May 13, 2024 · Databases

Profiling Memory Usage in MySQL Queries

This article explains how to use MySQL's performance_schema to monitor and analyze per‑connection memory consumption, provides SQL queries to list memory instruments, shows Python scripts for sampling and visualizing memory usage over time, and demonstrates practical usage with example commands and output.

Memory Profiling · MySQL · Performance Schema

0 likes · 14 min read

Aikesheng Open Source Community

Jun 9, 2021 · Databases

Monitoring MySQL Full-Text Indexes: Parameters, Metadata Tables, and Practical Demonstrations

This article explains how to monitor MySQL full-text indexes by describing relevant InnoDB parameters, the metadata tables that expose index activity, and step‑by‑step examples that create a sample table, configure monitoring, observe cache behavior, and manage index maintenance operations.

Database performance · Full-Text Index · InnoDB

0 likes · 13 min read

Architecture and Beyond

Jul 28, 2024 · Frontend Development

Comprehensive Guide to Front‑End Stability: Observability, Full‑Chain Monitoring, High‑Availability Architecture, Performance Management, Risk Governance, Process Mechanisms, and Engineering Practices

This extensive article presents a systematic approach to front‑end stability, covering observability systems, full‑chain monitoring, high‑availability design, performance management, risk governance, process mechanisms, and engineering practices to ensure reliable user experiences and business continuity.

Frontend · High Availability · Stability

0 likes · 44 min read

IT Architects Alliance

Feb 5, 2025 · Cloud Native

Performance Optimization Strategies for Cloud‑Native Applications

This article examines the rapid adoption of cloud‑native architectures and presents a comprehensive guide to identifying performance bottlenecks and applying architectural, resource‑management, caching, networking, and tooling techniques—such as Kubernetes, Prometheus, Grafana, and JMeter—to achieve high‑performance, scalable cloud‑native systems.

CI/CD · Kubernetes · Performance optimization

0 likes · 22 min read
