Topic: monitoring

Collection size: 1767 articles
WeiLi Technology Team
Jun 28, 2024 · Big Data

How to Build a Robust Big Data Monitoring and Alerting System

This article explains why high‑availability design and comprehensive monitoring are essential for modern big‑data platforms, outlines a layered architecture, and provides practical guidance on health checks, alerting, and data‑quality monitoring across storage, compute, scheduling, and service layers.

Alerting, Big Data, Flink
14 min read
Xianyu Technology
Jun 17, 2020 · Backend Development

Lottery System Risk Management and SDK Integration

Xianyu mitigated lottery‑related financial loss by centralizing rights management, decoupling UI from business logic, and providing a unified SDK with simple draw APIs. It also added real‑time log backflow and comprehensive accounting and monitoring, which cut configuration time by over 50% and eliminated UI‑only risk.

Backend, Monitoring, SDK
10 min read
DeWu Technology
Apr 21, 2025 · Backend Development

Design and Evolution of a Unified Exchange Mall Middleware Platform

The unified exchange mall middleware platform consolidates disparate points‑redemption and lottery flows into a four‑layer architecture—business, gameplay templates, domain models, and downstream services—offering standardized APIs, dynamic RPC routing, Redis‑based inventory control, anti‑fraud safeguards, and built‑in monitoring, thereby cutting development costs, enhancing maintainability, and ensuring system stability.

Backend, Inventory, Monitoring
18 min read
DeWu Technology
Aug 28, 2023 · Operations

Real-time Data Warehouse Business-Side Chaos Engineering Practice

The article describes how a real‑time data warehouse supporting ad‑delivery metrics adopts both technical and business‑side chaos‑engineering, using red‑blue team drills to inject faults, monitor indicator anomalies, and refine response procedures, thereby enhancing early risk detection, system resilience, and overall data stability for the advertising platform.

Backend Development, Chaos Engineering, Data Warehousing
16 min read
DeWu Technology
Aug 14, 2023 · Operations

Capital Loss Prevention Practices and Technical System

Dewu’s capital‑loss prevention framework embeds risk assessment and technical safeguards—such as idempotency, distributed consistency, and active‑active multi‑region design—into architecture, organizes three defensive lines (development, QA, SRE), and employs real‑time, near‑real‑time, and offline verification plus regular drills, while advancing automated analysis and intelligent scaling.

Monitoring, SRE, data consistency
10 min read
DeWu Technology
Apr 26, 2023 · Operations

Stability and Alerting Practices for E‑commerce Order Submission Service

The article details how a high‑throughput e‑commerce checkout pipeline achieves stability by combining fine‑grained metrics, custom trace logs, version‑based data validation, and targeted alert rules that detect latency spikes, error‑code surges, and downstream service failures, enabling rapid incident localization and reliable order processing.

Alerting, Backend, Monitoring
12 min read
DeWu Technology
Feb 27, 2023 · Operations

Message Push Monitoring and SLA Practices

The team implemented SLA‑based, node‑level monitoring for mobile push messages—splitting the workflow, measuring latency, blocking volume, and success rates, isolating metrics with Spring AOP, and tracking third‑party vendors—resulting in clear latency standards, doubled peak throughput, faster issue resolution, and improved overall reliability.

Backend, Monitoring, Operations
11 min read
DeWu Technology
Dec 5, 2022 · Operations

Evolution of Application Monitoring at Dewu (得物): From CAT to OpenTelemetry

After rebuilding its transaction system in 2020, Dewu (得物) progressed from the basic CAT monitoring tool to OpenTracing with Prometheus, and finally adopted OpenTelemetry to unify metrics, traces, and logs via a custom vmagent‑Kafka‑Flink pipeline, dynamic sampling, and extensible javaagents, positioning the platform for a performance‑analysis‑driven future.

Monitoring, Observability, OpenTelemetry
18 min read
DeWu Technology
May 16, 2022 · Operations

NOC SLA Implementation for Consumer Trading Platform

To tackle growing production complexity and past incident delays, the consumer trading platform introduced a three‑tier NOC‑SLA with intelligent baselines powered by Facebook Prophet, streamlined alert rules, and an SOS‑linked workflow, boosting detection frequency, cutting critical response times to under five minutes, and improving overall system reliability while emphasizing ongoing baseline and rule maintenance.

Monitoring, NOC, Operations
13 min read
DeWu Technology
Mar 28, 2022 · Backend Development

Loss Prevention Architecture and Real-Time Data Reconciliation for E‑commerce Platforms

The e‑commerce platform’s loss‑prevention architecture combines domain‑modeled scenario identification, pre‑emptive checks, automated testing, and a real‑time data‑reconciliation pipeline using Dcheck and rule factories to detect anomalies, trigger alerts, and execute emergency response plans, thereby minimizing financial risk and ensuring transaction stability.

Backend Development, Monitoring, Rule Engine
13 min read
Java Tech Enthusiast
Jul 21, 2024 · Backend Development

Interface Performance Optimization Techniques for Backend Development

The article outlines practical backend interface performance optimizations—including proper indexing, SQL tuning, parallel remote calls, batch queries, asynchronous processing, scoped transactions, fine-grained locking, pagination batching, multi-level caching, sharding, and monitoring tools—to dramatically reduce latency and improve throughput.

Backend, Caching, Indexing
25 min read
Java Tech Enthusiast
May 5, 2024 · Information Security

Preventing Malicious API Abuse: Security Measures and Best Practices

To prevent malicious API abuse, implement layered defenses such as firewalls to block unwanted traffic, robust captchas and SMS verification, mandatory authentication with permission controls, IP whitelisting for critical endpoints, HTTPS encryption, strict rate‑limiting via Redis, continuous monitoring with alerts, and an API gateway that centralizes filtering, authentication and throttling.

API Security, IP whitelist, Monitoring
9 min read
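
The rate‑limiting idea mentioned in the summary above can be illustrated with a minimal fixed‑window counter. This is only a sketch, not the article's implementation: the class name `FixedWindowRateLimiter`, its parameters, and the in‑memory dictionary are assumptions made for illustration. A production setup would more likely keep the counter in Redis (for example with `INCR` and `EXPIRE`) so that all gateway instances share it.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Hypothetical sketch: allow at most `limit` requests per client per window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        # client_id -> (window_start, request_count).
        # Assumption: a plain dict stands in for a shared store such as Redis.
        self._counters: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        window_start, count = self._counters.get(client_id, (now, 0))
        # Start a fresh window once the previous one has expired.
        if now - window_start >= self.window_seconds:
            window_start, count = now, 0
        if count >= self.limit:
            self._counters[client_id] = (window_start, count)
            return False  # over the limit: the caller should reject or throttle
        self._counters[client_id] = (window_start, count + 1)
        return True
```

A caller would invoke `allow(client_id)` once per incoming request and reject the request when it returns `False`. A fixed window is the simplest variant; it permits short bursts at window boundaries, which sliding‑window or token‑bucket schemes smooth out.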
DaTaobao Tech
Jul 29, 2024 · Operations

Testing Environment Reliability, Routing Isolation, Monitoring, and Efficient Deployment Practices

Alibaba Taotian’s testing platform now lets business owners self‑service reliable environments by binding accounts to isolated routes, monitoring lightweight health metrics with automated self‑healing, accelerating deployments via code caching and JVM tricks, and enabling rapid “time‑travel” scenario testing, while planning tighter observability and production alignment.

Deployment Efficiency, Monitoring, Observability
11 min read
DaTaobao Tech
May 22, 2024 · Cloud Native

AONE Serverless Quality Assurance: Design, Testing, and Monitoring

The article explains how AONE Serverless separates development and operations domains to enable independent iteration and lower costs. It details a QA workflow (functional regression, performance testing, monitoring verification, reverse‑engineered interfaces, automated API traffic replay, and isolated pressure testing), reports a 17% cut in deployment build time and overall deployment reductions of up to 44%, and outlines challenges and future plans for layered automation and plugin‑based extensions.

Cloud Native, Deployment Efficiency, Monitoring
9 min read
DaTaobao Tech
Apr 20, 2022 · Operations

Understanding Wireless Operations and Maintenance: Origins, Challenges, and Future Directions

Wireless operations and maintenance (O&M) evolved from backend‑focused practices to address stability and performance of mobile‑device services, tackling low issue detection rates and delayed responses through improved monitoring, gray‑release tagging, phased rollouts, AI‑driven diagnostics, and automated release gates, while inviting collaborative development.

Monitoring, gray release, incident response
13 min read
DaTaobao Tech
Feb 21, 2022 · Frontend Development

Focused Gray Release Monitoring and Alert Configuration for Frontend Quality

To raise front‑end quality, the team implements gray‑release monitoring that triggers log analysis at a 5% rollout, automatically generates reports within ten minutes, and uses dynamic thresholds and noise‑reduction tactics to detect errors early, enabling rapid rollback or expansion and markedly improving stability and release efficiency.

Alerting, Metrics, Monitoring
9 min read
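
One way to read "dynamic thresholds and noise reduction" in the summary above is sketched below. The article's actual rules are not given here, so everything in this block is an assumption: the threshold is taken as the baseline mean error rate plus `k` standard deviations, and noise is reduced by ignoring gray‑release samples with too few requests. The function names, `k`, `floor`, and `min_requests` are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def dynamic_threshold(baseline_rates: Sequence[float],
                      k: float = 3.0, floor: float = 0.001) -> float:
    """Threshold = baseline mean + k standard deviations, never below `floor`."""
    return max(mean(baseline_rates) + k * pstdev(baseline_rates), floor)


def should_alert(baseline_rates: Sequence[float], gray_errors: int,
                 gray_requests: int, min_requests: int = 200,
                 k: float = 3.0) -> bool:
    """Alert only when the gray-release error rate exceeds the dynamic threshold."""
    # Noise reduction (assumed tactic): skip samples too small to be meaningful.
    if gray_requests < min_requests:
        return False
    gray_rate = gray_errors / gray_requests
    return gray_rate > dynamic_threshold(baseline_rates, k)
```

Here `baseline_rates` would hold recent error rates from the stable version, while `gray_errors` and `gray_requests` come from the gray‑release cohort. Because the threshold follows the baseline, a noisy page tolerates more variance than a quiet one before an alert fires.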
Xianyu Technology
May 13, 2021 · Frontend Development

Front-End Disaster Recovery for Page Stability

To prevent page failures and white‑screen errors, the team built a front‑end SDK that fetches fallback data from OSS + CDN, offers configurable black/white‑list rules, lightweight validation, and a visual backend, cutting error rates from over 8% to 0.55% and dramatically improving interface stability.

CDN, Monitoring, SDK
9 min read
Xianyu Technology
Sep 27, 2020 · Backend Development

Design of an Asynchronous Component with Monitoring, Fault Tolerance, and Zero‑Cost Integration

The article presents a design for an asynchronous component that is monitorable, fault‑tolerant, and integrates with zero overhead, compares Akka, RxJava, and a custom JUC‑based implementation, and selects the latter—using extended Callables and a CountDownLatch—to track business units, handle timeouts, and provide fallback behavior.

Concurrency, JUC, Java
8 min read
Xianyu Technology
Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that pinpoints server‑side issues in under five seconds with developer‑level accuracy. It aggregates real‑time metrics, applies a decision‑tree model enriched by expert knowledge and dynamic thresholds, and presents results through an integrated alert and visualization system. Broader endpoint coverage and multi‑tenant support are planned.

Big Data, Monitoring, Operations
12 min read
Xianyu Technology
Mar 14, 2019 · Operations

Ensuring High Availability of Search Engine Services: A Case Study of Xianyu's Search System

The article explains how Xianyu keeps its core Ha3‑based search engine highly available through independent gateway deployment, multi‑datacenter disaster recovery, traffic isolation, comprehensive monitoring, pressure testing, gray releases, and automated and manual failover, enabling rapid issue detection, recovery, and continuous service stability.

Emergency Response, High Availability, Monitoring
19 min read