monitoring | BestHub

WeiLi Technology Team

Jun 28, 2024 · Big Data

How to Build a Robust Big Data Monitoring and Alerting System

This article explains why high‑availability design and comprehensive monitoring are essential for modern big‑data platforms, outlines a layered architecture, and provides practical guidance on health checks, alerting, and data‑quality monitoring across storage, compute, scheduling, and service layers.

Alerting · Big Data · Flink

0 likes · 14 min read

Xianyu Technology

Jun 17, 2020 · Backend Development

Lottery System Risk Management and SDK Integration

Xianyu mitigated lottery‑related financial loss by centralizing rights management, decoupling UI from business logic, and providing a unified SDK with simple draw APIs, while adding real‑time log backflow, comprehensive accounting and monitoring, cutting configuration time by over 50% and eliminating UI‑only risk.

Backend · Monitoring · SDK

0 likes · 10 min read

DeWu Technology

Apr 21, 2025 · Backend Development

Design and Evolution of a Unified Exchange Mall Middleware Platform

The unified exchange mall middleware platform consolidates disparate points‑redemption and lottery flows into a four‑layer architecture—business, gameplay templates, domain models, and downstream services—offering standardized APIs, dynamic RPC routing, Redis‑based inventory control, anti‑fraud safeguards, and built‑in monitoring, thereby cutting development costs, enhancing maintainability, and ensuring system stability.

Backend · Inventory · Monitoring

0 likes · 18 min read

DeWu Technology

Aug 28, 2023 · Operations

Real-time Data Warehouse Business-Side Chaos Engineering Practice

The article describes how a real‑time data warehouse supporting ad‑delivery metrics adopts both technical and business‑side chaos‑engineering, using red‑blue team drills to inject faults, monitor indicator anomalies, and refine response procedures, thereby enhancing early risk detection, system resilience, and overall data stability for the advertising platform.

Backend Development · Chaos Engineering · Data Warehousing

0 likes · 16 min read

DeWu Technology

Aug 14, 2023 · Operations

Capital Loss Prevention Practices and Technical System

Dewu’s capital‑loss prevention framework embeds risk assessment and technical safeguards—such as idempotency, distributed consistency, and active‑active multi‑region design—into architecture, organizes three defensive lines (development, QA, SRE), and employs real‑time, near‑real‑time, and offline verification plus regular drills, while advancing automated analysis and intelligent scaling.

Monitoring · SRE · data consistency

0 likes · 10 min read

DeWu Technology

Apr 26, 2023 · Operations

Stability and Alerting Practices for E‑commerce Order Submission Service

The article details how a high‑throughput e‑commerce checkout pipeline achieves stability by combining fine‑grained metrics, custom trace logs, version‑based data validation, and targeted alert rules that detect latency spikes, error‑code surges, and downstream service failures, enabling rapid incident localization and reliable order processing.

Alerting · Backend · Monitoring

0 likes · 12 min read

DeWu Technology

Feb 27, 2023 · Operations

Message Push Monitoring and SLA Practices

The team implemented SLA‑based, node‑level monitoring for mobile push messages—splitting the workflow, measuring latency, blocking volume, and success rates, isolating metrics with Spring AOP, and tracking third‑party vendors—resulting in clear latency standards, doubled peak throughput, faster issue resolution, and improved overall reliability.

Backend · Monitoring · Operations

0 likes · 11 min read

DeWu Technology

Dec 5, 2022 · Operations

Evolution of Application Monitoring at Dewu: From CAT to OpenTelemetry

After rebuilding its transaction system in 2020, Dewu progressed from the basic CAT monitoring tool to OpenTracing with Prometheus, and finally adopted OpenTelemetry to unify metrics, traces, and logs via a custom vmagent‑Kafka‑Flink pipeline, dynamic sampling, and extensible javaagents, positioning the platform for a performance‑analysis‑driven future.

Monitoring · Observability · OpenTelemetry

0 likes · 18 min read

DeWu Technology

May 16, 2022 · Operations

NOC SLA Implementation for Consumer Trading Platform

To tackle growing production complexity and past incident delays, the consumer trading platform introduced a three‑tier NOC‑SLA with intelligent baselines powered by Facebook Prophet, streamlined alert rules, and an SOS‑linked workflow, boosting detection frequency, cutting critical response times to under five minutes, and improving overall system reliability while emphasizing ongoing baseline and rule maintenance.

Monitoring · NOC · Operations

0 likes · 13 min read

DeWu Technology

Mar 28, 2022 · Backend Development

Loss Prevention Architecture and Real-Time Data Reconciliation for E‑commerce Platforms

The e‑commerce platform’s loss‑prevention architecture combines domain‑modeled scenario identification, pre‑emptive checks, automated testing, and a real‑time data‑reconciliation pipeline using Dcheck and rule factories to detect anomalies, trigger alerts, and execute emergency response plans, thereby minimizing financial risk and ensuring transaction stability.

Backend Development · Monitoring · Rule Engine

0 likes · 13 min read

Java Tech Enthusiast

Jul 21, 2024 · Backend Development

Interface Performance Optimization Techniques for Backend Development

The article outlines practical backend interface performance optimizations—including proper indexing, SQL tuning, parallel remote calls, batch queries, asynchronous processing, scoped transactions, fine-grained locking, pagination batching, multi-level caching, sharding, and monitoring tools—to dramatically reduce latency and improve throughput.

Backend · Caching · Indexing

0 likes · 25 min read

Java Tech Enthusiast

May 5, 2024 · Information Security

Preventing Malicious API Abuse: Security Measures and Best Practices

To prevent malicious API abuse, implement layered defenses such as firewalls to block unwanted traffic, robust captchas and SMS verification, mandatory authentication with permission controls, IP whitelisting for critical endpoints, HTTPS encryption, strict rate‑limiting via Redis, continuous monitoring with alerts, and an API gateway that centralizes filtering, authentication and throttling.

API Security · IP whitelist · Monitoring

0 likes · 9 min read
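
The rate‑limiting measure named in the summary above could look roughly like the following minimal sketch. The summary only says that Redis is used; the fixed‑window algorithm, the class name `FixedWindowRateLimiter`, and the in‑memory dictionary that stands in for Redis counters (in a real deployment, something like `INCR` plus `EXPIRE` on a per‑client key) are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Hypothetical fixed-window counter; a dict stands in for Redis counters."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # key -> (window index, number of requests seen in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        """Return True if the request identified by `key` is within the limit."""
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self._counters.get(key, (window, 0))
        if stored_window != window:
            count = 0  # a new window started, so the old count has expired
        if count >= self.limit:
            return False
        self._counters[key] = (window, count + 1)
        return True
```

A caller would typically key the limiter by client IP or account ID and reject the request (for example with HTTP 429) when `allow` returns `False`. A fixed window is the simplest variant; sliding windows or token buckets smooth out bursts at window boundaries.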

DaTaobao Tech

Jul 29, 2024 · Operations

Testing Environment Reliability, Routing Isolation, Monitoring, and Efficient Deployment Practices

Alibaba Taotian’s testing platform now lets business owners self‑service reliable environments by binding accounts to isolated routes, monitoring lightweight health metrics with automated self‑healing, accelerating deployments via code caching and JVM tricks, and enabling rapid “time‑travel” scenario testing, while planning tighter observability and production alignment.

Deployment Efficiency · Monitoring · Observability

0 likes · 11 min read

DaTaobao Tech

May 22, 2024 · Cloud Native

AONE Serverless Quality Assurance: Design, Testing, and Monitoring

The article explains how AONE Serverless separates development and operations domains to enable independent iteration and lower costs, details a QA workflow (functional regression, performance testing, monitoring verification, reverse‑engineered interfaces, automated API traffic replay, and isolated pressure testing), and reports a 17% cut in deployment build time and overall deployment reductions of up to 44%, while outlining challenges and future plans for layered automation and plugin‑based extensions.

Cloud Native · Deployment Efficiency · Monitoring

0 likes · 9 min read

DaTaobao Tech

Apr 20, 2022 · Operations

Understanding Wireless Operations and Maintenance: Origins, Challenges, and Future Directions

Wireless operations and maintenance (O&M) evolved from backend‑focused practices to address stability and performance of mobile‑device services, tackling low issue detection rates and delayed responses through improved monitoring, gray‑release tagging, phased rollouts, AI‑driven diagnostics, and automated release gates, while inviting collaborative development.

Monitoring · gray release · incident response

0 likes · 13 min read

DaTaobao Tech

Feb 21, 2022 · Frontend Development

Focused Gray Release Monitoring and Alert Configuration for Frontend Quality

To raise front‑end quality, the team implements gray‑release monitoring that triggers log analysis at a 5% rollout, automatically generates reports within ten minutes, and uses dynamic thresholds and noise‑reduction tactics to detect errors early, enabling rapid rollback or expansion and markedly improving stability and release efficiency.

Alerting · Metrics · Monitoring

0 likes · 9 min read
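
One way to read "dynamic thresholds and noise‑reduction" in the summary above is sketched below. The article's actual rules are not given here, so everything in this block is an assumption: the threshold is taken as the baseline mean plus `k` standard deviations, and noise is reduced by alerting only when several consecutive gray‑release samples exceed it. The function names and default values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def dynamic_threshold(baseline: Sequence[float], k: float = 3.0,
                      floor: float = 0.001) -> float:
    """Threshold = baseline mean + k standard deviations, never below `floor`."""
    return max(mean(baseline) + k * pstdev(baseline), floor)


def should_alert(baseline: Sequence[float], gray_rates: Sequence[float],
                 min_consecutive: int = 2, k: float = 3.0) -> bool:
    """Alert only if the last `min_consecutive` gray samples all exceed the threshold."""
    if len(gray_rates) < min_consecutive:
        return False  # too few samples to rule out a one-off spike
    threshold = dynamic_threshold(baseline, k)
    return all(rate > threshold for rate in gray_rates[-min_consecutive:])
```

Here `baseline` would hold recent error rates from the stable release and `gray_rates` the error rates sampled from the gray cohort. The `floor` keeps a perfectly quiet baseline from producing a zero threshold that fires on any single error.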

Xianyu Technology

May 13, 2021 · Frontend Development

Front-End Disaster Recovery for Page Stability

To prevent page failures and white‑screen errors, the team built a front‑end SDK that fetches fallback data from OSS + CDN, offers configurable black/white‑list rules, lightweight validation, and a visual backend, cutting error rates from over 8% to 0.55% and dramatically improving interface stability.

CDN · Monitoring · SDK

0 likes · 9 min read

Xianyu Technology

Sep 27, 2020 · Backend Development

Design of an Asynchronous Component with Monitoring, Fault Tolerance, and Zero‑Cost Integration

The article presents a design for an asynchronous component that is monitorable, fault‑tolerant, and integrates with zero overhead, compares Akka, RxJava, and a custom JUC‑based implementation, and selects the latter—using extended Callables and a CountDownLatch—to track business units, handle timeouts, and provide fallback behavior.

Concurrency · JUC · Java

0 likes · 8 min read
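
The pattern described in the summary above (run several business units in parallel, wait on a shared deadline, and fall back for anything that is late or fails) can be sketched as follows. This is not the article's implementation: the original is built on Java's JUC with extended Callables and a `CountDownLatch`, whereas this sketch uses Python's `concurrent.futures`, with `wait(..., timeout=...)` playing the role of the latch. The function name `run_units` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, Dict


def run_units(units: Dict[str, Callable[[], Any]], timeout: float,
              fallback: Any = None) -> Dict[str, Any]:
    """Run named business units in parallel; use `fallback` for any that time out or fail."""
    executor = ThreadPoolExecutor(max_workers=max(len(units), 1))
    futures = {name: executor.submit(fn) for name, fn in units.items()}
    # Block until every unit finishes or the shared deadline passes (the latch-like step).
    done, _ = wait(list(futures.values()), timeout=timeout)
    results: Dict[str, Any] = {}
    for name, future in futures.items():
        if future in done and future.exception() is None:
            results[name] = future.result()
        else:
            results[name] = fallback  # unit timed out or raised an exception
    executor.shutdown(wait=False)  # return without blocking on stragglers
    return results
```

Keying the results by unit name is what makes the component monitorable in this sketch: a caller can log or count exactly which units fell back, which is the kind of per‑unit tracking the summary attributes to the original design.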

Xianyu Technology

Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that quickly (under five seconds) pinpoints server‑side issues with developer‑level accuracy by aggregating real‑time metrics, applying a decision‑tree model enriched by expert knowledge and dynamic thresholds, and presenting results through an integrated alert and visualization system, while planning broader endpoint coverage and multi‑tenant support.

Big Data · Monitoring · Operations

0 likes · 12 min read

Xianyu Technology

Mar 14, 2019 · Operations

Ensuring High Availability of Search Engine Services: A Case Study of Xianyu's Search System

The article explains how Xianyu guarantees high‑availability of its core Ha3‑based search engine through independent gateway deployment, multi‑datacenter disaster recovery, traffic isolation, comprehensive monitoring, pressure testing, gray releases, and automated/manual failover, enabling rapid issue detection, recovery, and continuous service stability.

Emergency Response · High Availability · Monitoring

0 likes · 19 min read