Topic: monitoring

Collection size: 1794 articles
Sohu Tech Products
Mar 31, 2021 · Operations

Improving Cluster Stability: CI/CD, Monitoring, Logging, Documentation, and Traffic Management Solutions

The article analyzes the instability of a company's Kubernetes clusters, identifies root causes such as unstable release processes, lack of monitoring, logging, and documentation, and proposes comprehensive solutions including a Kubernetes‑centric CI/CD pipeline, federated Prometheus monitoring, Elasticsearch logging, centralized documentation, and integrated traffic management with Kong and Istio.

CI/CD · DevOps · Kubernetes
0 likes · 10 min read
Sohu Tech Products
Mar 24, 2021 · Backend Development

The Essence of Architecture: Insights from Large‑Scale Systems like Weibo

This article explores the fundamental principles of system architecture, illustrating how large‑scale services such as Uber and Weibo handle massive traffic through strategic abstraction, modularization, performance optimization, multi‑level caching, distributed tracing, and operational best practices to achieve scalability and reliability.

Distributed Systems · Monitoring · Performance
0 likes · 21 min read
Sohu Tech Products
Feb 24, 2021 · Operations

Redis Monitoring and Alerting Practices: Metrics, Thresholds, and Troubleshooting

This article presents a comprehensive guide to Redis monitoring and alerting, covering metric classification, threshold settings, client traffic collection, host resource usage, instance health checks, cluster failover diagnostics, and detailed explanations of Redis INFO sections with practical code examples.

Alerting · Database · Metrics
0 likes · 23 min read
Sohu Tech Products
Jan 13, 2021 · Operations

Introducing MQCloud: A One‑Stop Service Platform for RocketMQ Monitoring, Management, and Operations

This article explains RocketMQ’s core features, the operational challenges of using its console, and how the MQCloud platform was designed to separate business and admin roles, provide comprehensive monitoring, automated deployment, security hardening, and a customized client, ultimately turning operational pain into a scalable, open‑source solution.

Distributed Messaging · MQCloud · Monitoring
0 likes · 15 min read
Sohu Tech Products
Dec 18, 2019 · Backend Development

Node.js Performance Optimization: Common Techniques, Key Metrics, and Bottlenecks

This article answers a developer's question about Node.js performance optimization by outlining major optimization areas, listing practical techniques such as using streams, clustering, and load balancing, and describing typical bottlenecks and essential performance metrics to monitor.

Monitoring · Performance · backend
0 likes · 3 min read
Ctrip Technology
Dec 9, 2021 · Databases

TiDB Operational Practices at Ctrip: Architecture, Use Cases, Performance Tuning, Monitoring, and Tooling

This article details Ctrip's migration from MySQL to TiDB, describing the multi‑data‑center architecture, real‑world use cases such as the international CDP platform and hotel settlement, performance tuning measures, comprehensive monitoring and alerting, auxiliary tools, and future roadmap for the distributed NewSQL database.

Distributed Database · HTAP · Monitoring
0 likes · 16 min read
Ctrip Technology
Mar 4, 2021 · Cloud Native

Ctrip International Ticketing Cloud‑Native Migration: Infrastructure as Code, Logging, Monitoring, and Cost Optimization

This article shares Ctrip International Ticketing’s cloud‑native migration experience, covering infrastructure‑as‑code with Terraform, managed Kubernetes, centralized logging and monitoring using Elasticsearch, Prometheus, Grafana, and Thanos, and practical cost‑optimization techniques such as auto‑scaling, spot instances, storage tiering, and network proxying.

Cloud Native · Cost Optimization · Infrastructure as Code
0 likes · 13 min read
Ctrip Technology
Dec 5, 2019 · Backend Development

Node.js Engineering Practices at Ctrip: From Zero to One, Best Practices and Operations

This article details how Ctrip builds, deploys, tests, releases, and operates Node.js applications—including engineering processes, core middleware, Docker-based deployment, multi‑process communication, monitoring, and full‑link tracing—while sharing practical lessons learned from real‑world production use.

DevOps · Docker · Monitoring
0 likes · 14 min read
Ctrip Technology
Oct 17, 2019 · Backend Development

CDubbo: Ctrip’s Customized Dubbo Framework – Architecture, Governance, Monitoring, and Extensions

This article describes how Ctrip introduced a customized Dubbo framework called CDubbo, covering the motivations for adopting Dubbo, the initial implementation of service governance and monitoring, and subsequent extensions such as callback enhancement, serialization support, circuit‑breaking, testing tools, and a bastion testing gateway.

Monitoring · RPC · backend development
0 likes · 13 min read
Ctrip Technology
May 14, 2019 · Backend Development

Ctrip’s Node.js Platform: Architecture, Deployment, Monitoring, and Public Services

The article details Ctrip’s Node.js technology stack, covering its deployment pipeline, version management, build principles, Docker‑based operations and monitoring, logging models, public services such as SOA, storage, cache, disaster recovery, and real‑world application scenarios like data aggregation and server‑side rendering.

Deployment · Docker · Monitoring
0 likes · 10 min read
Ctrip Technology
Dec 26, 2018 · Operations

Evolution of Ctrip's Hickwall Monitoring and Alerting Platform: Architecture, InfluxDB Cluster, Data Aggregation, and Stream Alerting

This article details the architectural evolution of Ctrip's Hickwall monitoring and alerting platform, describing the transition from an Elasticsearch‑based first generation to an InfluxDB‑driven second generation, the design of the Incluster storage layer, data aggregation strategies, and the implementation of high‑performance stream‑based alerting.

Alerting · Distributed Systems · InfluxDB
0 likes · 12 min read
Ctrip Technology
Sep 12, 2018 · Operations

Intelligent Monitoring System for Ctrip Hotels: Design, Implementation, and Lessons Learned

This article describes the design and implementation of Ctrip's hotel intelligent monitoring platform, detailing its architecture, key components such as Smart, Mdata, Artemis, and Clog monitoring, the challenges of massive log data, and the achieved improvements in real‑time alerting and testing efficiency.

Automation · Big Data · Monitoring
0 likes · 10 min read
Ctrip Technology
Jun 11, 2018 · Operations

Design and Implementation of a Production Traffic Replay System for Functional and Performance Testing

The article describes a production traffic replay system that records real user traffic, creates scalable pressure sources, supports both 4‑layer and 7‑layer protocols, and provides automated fail‑over and monitoring features to enable realistic functional and performance testing at large scale.

Monitoring · Performance Testing · load testing
0 likes · 8 min read
Ctrip Technology
Aug 17, 2017 · Backend Development

Design and Implementation of Vipshop's Message Gateway

This article presents a comprehensive overview of Vipshop's message gateway redesign, covering its architectural positioning, internal modules, technical stack, monitoring, degradation strategies, and practical lessons learned to handle massive messaging traffic in a large‑scale e‑commerce environment.

Kafka · Monitoring · Venus RPC
0 likes · 13 min read
Ctrip Technology
Oct 14, 2016 · Operations

Qunar Network Device Operations Platform: Architecture, Features, and Continuous Optimization

This article presents the design, implementation, and ongoing improvements of Qunar's network device operations platform, detailing its background, optimization strategies, permission model, automated tasks, monitoring capabilities, and how it enhances operational efficiency while reducing risk.

Access Control · Audit · Automation
0 likes · 7 min read
Ctrip Technology
Sep 2, 2016 · Operations

Design and Implementation of Ctrip's Cloud Desktop System Based on OpenStack

This article details Ctrip's deployment of a large‑scale virtual cloud desktop solution for its call center, covering the motivations, original OpenStack architecture, its limitations, the redesigned decoupled architecture, and the operational practices such as resource over‑commit, network tuning, monitoring, and automated testing that ensure stability and scalability.

Automation · Cloud Desktop · Monitoring
0 likes · 13 min read
360 Tech Engineering
Sep 9, 2021 · Databases

PostgreSQL High‑Availability Cluster Deployment with Patroni and Etcd

This article details the design, deployment, configuration, operation, monitoring, and backup of a PostgreSQL high‑availability cluster built on Patroni, Etcd, and LVS at 360, covering hardware layout, software versions, installation steps, parameter tuning, fail‑over testing, and future outlook.

Backup · Cluster · High Availability
0 likes · 16 min read
360 Tech Engineering
Jul 17, 2020 · Big Data

Qbus Service Overview: Architecture, Use Cases, and Implementation Details

This article introduces Qbus, a cloud‑based queue service built on Kafka, covering its architecture, core components such as log collection, SDKs, HDFS persistence, monitoring with Prometheus, business integration methods, use‑case scenarios, and future development directions.

Big Data · Cloud Queue · HDFS
0 likes · 6 min read
360 Tech Engineering
Jan 7, 2020 · Operations

Introduction to Prometheus and Grafana for Monitoring and Alerting

This article provides a comprehensive overview of using Prometheus and Grafana for metric collection, storage, querying with PromQL, visualization, and alerting, including exporter integration, metric types, high‑availability setups, and practical examples for modern microservice architectures.

Alerting · Grafana · Metrics
0 likes · 10 min read