Tagged articles

969 articles

Jun 10, 2022 · Mobile Development

MDAP Stack Symbolization Service: Architecture, Implementation, and Optimization

The MDAP Stack Symbolization Service unifies high‑throughput address‑ and symbol‑based stack resolution for iOS, Android native, Android Java, Web, and React Native. It parses dSYM/ELF files and source‑map or ProGuard mappings, caches results in Redis (with RocksDB fallback), and exposes a gRPC API for fast, scalable de‑obfuscation.

DWARF · Observability · SourceMap

0 likes · 49 min read

Top Architect

Jun 9, 2022 · Backend Development

Microservice Architecture and Design Patterns Overview

This article provides a comprehensive overview of microservice architecture, covering its core goals, design principles, various decomposition and integration patterns, database strategies, observability, resilience, deployment, and operational concerns, offering practical guidance for building scalable, maintainable services.

Architecture · Backend · Deployment

0 likes · 18 min read

IT Architects Alliance

Jun 8, 2022 · Backend Development

Mastering Microservice Patterns: From Decomposition to Resilience

This article provides a comprehensive overview of common microservice patterns and design principles, covering goals such as cost reduction, faster releases, resilience, visibility, and detailing decomposition, integration, database, CQRS, observability, health‑check, and deployment strategies for building robust backend systems.

Architecture · Blue‑Green deployment · CQRS

0 likes · 20 min read

Alibaba Cloud Developer

Jun 8, 2022 · Fundamentals

eBPF Explained: Core Concepts, Use Cases, and Best Practices

eBPF is a kernel‑level sandbox technology that enables safe, high‑performance, programmable instrumentation for networking, security, and observability, and this article answers seven key questions covering its definition, applications, origins, usage steps, implementation details, best practices, and current ecosystem.

Kernel Instrumentation · Linux · Observability

0 likes · 21 min read

Architect

Jun 6, 2022 · Backend Development

Microservice Architecture and Design Patterns

This article provides a comprehensive overview of microservice architecture, detailing its core objectives, design principles, various decomposition and integration patterns, database strategies, consistency mechanisms, observability techniques, and deployment practices for building resilient, scalable backend systems.

Architecture · Microservices · Observability

0 likes · 18 min read

Java Captain

Jun 1, 2022 · Operations

Migrating from Prometheus to Thanos for Scalable, Cost‑Effective Monitoring on Kubernetes

This article explains the limitations of a traditional Prometheus monitoring stack, demonstrates how Thanos provides unlimited long‑term storage and lower infrastructure costs, and walks through a complete multi‑cluster deployment on Kubernetes using Terraform and AWS.

Kubernetes · Observability · Prometheus

0 likes · 16 min read

Efficient Ops

May 24, 2022 · Cloud Native

How AutoTagging and MultistageCodec Transform Cloud‑Native Observability

This article explores the challenges of building a unified observability data platform for hybrid‑cloud microservices, examines six common data‑island scenarios, and presents DeepFlow's AutoTagging and MultistageCodec techniques that dramatically reduce tagging overhead and storage costs while enabling seamless cross‑data correlation.

ClickHouse · Microservices · Observability

0 likes · 11 min read

MaGe Linux Operations

May 24, 2022 · Operations

Unlocking PromQL: How Nested Functional Queries Are Structured and Evaluated

This article explains the functional, nested nature of PromQL, its expression types, how queries are parsed and evaluated over time, and the differences between instant and range queries, providing code examples and visual insights for better monitoring with Prometheus.

Observability · PromQL · Prometheus

0 likes · 11 min read

Snowball Engineer Team

May 24, 2022 · Cloud Native

How Snowball Used Apache APISIX to Build a Dual‑Active Architecture and Streamline Authentication

This article details Snowball's transition from a single‑datacenter setup to a dual‑active, cloud‑native architecture using Apache APISIX, covering background challenges, problem analysis, gateway selection, architectural adjustments, authentication unification, observability enhancements, ZooKeeper integration, and future plans.

Apache APISIX · Authentication · Cloud Native

0 likes · 11 min read

Architecture Digest

May 20, 2022 · Cloud Native

Introduction, Architecture, Deployment and Usage of Grafana Loki Log Aggregation System

This article introduces Grafana Loki, an open‑source, horizontally scalable, highly available log aggregation system optimized for Kubernetes and Prometheus, covering its core concepts, architecture, component roles, deployment steps, configuration examples, and practical usage within Grafana.

Grafana · Kubernetes · Loki

0 likes · 18 min read

Programmer DD

May 16, 2022 · Cloud Native

Master Loki: Scalable Log Aggregation for Kubernetes and Prometheus

This guide introduces Loki, the open‑source, horizontally scalable log aggregation system optimized for Prometheus and Kubernetes, covering its core concepts, architecture, components, deployment steps, Grafana integration, label‑based indexing, and best practices for handling dynamic and high‑cardinality tags.

Grafana · Kubernetes · Loki

0 likes · 19 min read

Alibaba Cloud Native

May 11, 2022 · Cloud Native

How Zuoyebang Cut 22% Costs with Kubernetes Serverless Virtual Nodes

Zuoyebang’s shift to cloud‑native architecture leveraged Alibaba Cloud’s Kubernetes Serverless virtual nodes, achieving a 22.5% cost reduction during peak traffic by dynamically scaling workloads, while addressing scheduling, observability, and performance challenges through custom schedulers, enhanced monitoring, and careful testing.

Cloud Native · Cost Optimization · Kubernetes

0 likes · 11 min read

Tencent Cloud Developer

May 7, 2022 · Cloud Native

Fourth Techo TVP Developer Conference: Cloud Native Trends and Best Practices

The Fourth Techo TVP Developer Conference highlighted current cloud‑native adoption, FinOps cost‑optimization, distributed‑cloud strategies, and maturity models on Day 1, then showcased practical best‑practice case studies—from automotive edge computing to service‑mesh migration, hybrid‑cloud PaaS evolution, observability standards, and high‑performance API‑gateway deployments—on Day 2.

APISIX · Cloud Native · DevOps

0 likes · 33 min read

MaGe Linux Operations

May 2, 2022 · Operations

How to Build a Scalable, Highly‑Available Monitoring Stack with Thanos, Prometheus & Grafana

Learn how to design a resilient, scalable monitoring solution for multi‑cluster Kubernetes environments using Thanos, Prometheus, and Grafana, covering architecture, data ingestion, querying, long‑term storage on S3, cost savings, and practical deployment tips.

Observability · Thanos · cloud-native

0 likes · 10 min read

Efficient Ops

Apr 27, 2022 · Operations

Why Choose Loki Over ELK? A Practical Guide to Scalable Log Aggregation

This article explains the motivations for selecting Grafana Loki instead of traditional ELK/EFK stacks, introduces Loki's core concepts and architecture, details component roles, provides step‑by‑step deployment of Promtail and Loki, and demonstrates how to configure and query logs in Grafana while addressing label indexing, dynamic tags, high‑cardinality challenges, and query performance.

Grafana · Kubernetes · Loki

0 likes · 18 min read

JD Retail Technology

Apr 27, 2022 · Industry Insights

How JD Achieves Seamless Stability During Massive Sales Events

The article reviews the Global Information System Stability Summit and JD's technical architect Li Junliang's detailed case study on the engineering practices, observability, chaos engineering, and resource‑scheduling innovations that enable JD’s e‑commerce platform to handle sales‑peak traffic that spikes hundreds of times over normal load.

Industry Insights · Observability · chaos engineering

0 likes · 7 min read

Top Architect

Apr 27, 2022 · Backend Development

Comprehensive Guide to Backend Architecture: Microservices, Service Mesh, Observability, and Messaging

This article provides an in‑depth overview of modern backend architecture, covering microservice fundamentals, service mesh concepts, observability pillars, messaging queue choices, and practical design considerations such as service registration, configuration centers, and security mechanisms.

Messaging · Microservices · Observability

0 likes · 28 min read

Volcano Engine Developer Services

Apr 26, 2022 · Operations

How Volcano Engine’s TLS Transforms Log Management for Kubernetes at Scale

This article explains the challenges of traditional open‑source log collection in cloud‑native environments, describes Volcano Engine’s unified TLS architecture, its centralized configuration, CRD‑based deployment, and showcases real‑world case studies that demonstrate improved availability, efficiency, and scalability.

Cloud Native · Distributed Systems · Kubernetes

0 likes · 15 min read

dbaplus Community

Apr 25, 2022 · Operations

From Monitoring to Observability: Expert Insights on Evolving Cloud‑Native Operations

In this interview series, three industry experts explain how monitoring differs from observability, the shifts required for ops, developers, and architects, the core methodologies and technologies behind metrics, traces, and logs, and practical guidance for selecting and integrating observability tools in cloud‑native environments.

Metrics · Observability · Operations

0 likes · 16 min read

MaGe Linux Operations

Apr 22, 2022 · Backend Development

Essential Microservice Patterns: Decomposition, Integration & Observability

This article outlines the key microservice design patterns—including decomposition, integration, event‑driven, saga, and observability techniques—while explaining their goals, principles, and practical considerations such as database per service, CQRS, and cross‑cutting concerns like health checks and circuit breakers.

Backend Architecture · Design Patterns · Microservices

0 likes · 19 min read

Ops Development Stories

Apr 21, 2022 · Cloud Native

Essential Kubernetes Production Checklist for Web Services

A comprehensive, step‑by‑step checklist guides teams through documentation, application design, security, CI/CD, Kubernetes configuration, monitoring, testing, and 24/7 support to reliably run web services with HTTP APIs in production on Kubernetes.

DevOps · Kubernetes · Observability

0 likes · 9 min read

政采云技术

Apr 19, 2022 · Cloud Native

A Practical Guide to Dapr Core Features: Pub/Sub, Resource Bindings, Actors, Observability, Secrets, and Configuration

This comprehensive technical tutorial demonstrates how to implement and configure core Dapr features, including publish/subscribe messaging, resource bindings, virtual actors, distributed tracing, secrets management, and dynamic configuration, using Java applications deployed on Kubernetes with practical code examples and command-line instructions.

Cloud Native · Dapr · Kubernetes

0 likes · 21 min read

YunZhu Net Technology Team

Apr 15, 2022 · Operations

Design and Architecture of a Cloud‑Native Monitoring Platform for Business Systems

The document outlines the background, vision, current status, technical research, value, product and technical architecture, and functional design of a cloud‑native monitoring platform that integrates SkyWalking and Prometheus to provide comprehensive APM, resource utilization, alerting, and rapid fault localization for business and technical middle‑platform services.

APM · Metrics · Observability

0 likes · 10 min read

NetEase Smart Enterprise Tech+

Apr 14, 2022 · Operations

How to Build Precise Alerting with Prometheus to Eliminate Alert Storms

This article explains how to use Prometheus to create a precise, end‑to‑end alerting system that shortens detection and diagnosis time, integrates logs and metrics, routes alerts to the right owners, and prevents overwhelming alert storms in production environments.

Alerting · DevOps · Metrics

0 likes · 10 min read

Alibaba Cloud Native

Apr 13, 2022 · Cloud Native

From Dapper to OpenTelemetry: A Practical Guide to Distributed Tracing and Observability

This article explains the challenges of long request chains in micro‑service architectures, reviews Google’s Dapper tracing requirements, introduces OpenTracing and OpenCensus standards, compares their strengths, and details how OpenTelemetry unifies tracing, metrics and logs with practical integration steps and best‑practice guidance.

Cloud Native · Distributed Tracing · Metrics

0 likes · 24 min read

DevOps

Apr 12, 2022 · Operations

Understanding Observability: Core Concepts, SRE Methodology, AIOps, and Business Architecture

The article explains the rising importance of observability in modern operations, defines its control‑theory roots, breaks it down into metrics, traces and logs, and argues that successful implementation requires three pillars—SRE practices, AIOps algorithms, and deep business‑architecture knowledge—together with well‑designed SLOs and critical‑path mapping.

Observability · SRE · aiops

0 likes · 10 min read

Alibaba Cloud Native

Apr 3, 2022 · Cloud Native

How to Achieve Full Observability for Performance Testing with Prometheus

This guide explains the essential observability concepts—metrics, logs, and traces—for performance testing, compares Zabbix and Prometheus, shows how to extend JMeter with a Prometheus exporter, and details step‑by‑step integration of Alibaba Cloud PTS and Grafana dashboards for comprehensive monitoring.

Cloud Native · Observability · Prometheus

0 likes · 9 min read

SQB Blog

Apr 2, 2022 · Operations

Designing a Next‑Gen Observability Platform: From Zipkin to Hera

This article chronicles the evolution of a company's monitoring system from a Zipkin‑based tracing solution to a cloud‑native observability platform called Hera, detailing design goals, technology choices, challenges with MySQL storage, and the adoption of Prometheus‑compatible metrics, Jaeger tracing, and Kubernetes operators.

Distributed Tracing · Observability · Prometheus

0 likes · 22 min read

Laravel Tech Community

Mar 29, 2022 · Backend Development

Apache APISIX 2.13.0 LTS Release: New Features, Observability, Multi‑language Support, and Bug Fixes

The Apache APISIX community announced the 2.13.0 LTS release, enhancing stability, adding observability plugins, a new OpenTelemetry tracing plugin, multi‑language (Wasm, Python, Go) support, and a comprehensive list of bug fixes and improvements.

Apache APISIX · LTS Release · Observability

0 likes · 7 min read

Aikesheng Open Source Community

Mar 29, 2022 · Databases

Performance Tuning and Observation Techniques for dble Using BenchmarkSQL

This article shares practical configuration recommendations, system‑resource monitoring methods, and thread‑adjustment strategies for optimizing dble performance during BenchmarkSQL TPC‑C style load testing, highlighting how observable metrics guide effective tuning of the middleware and underlying MySQL nodes.

BenchmarkSQL · Observability · thread optimization

0 likes · 10 min read

StarRocks

Mar 28, 2022 · Backend Development

Scaling Microservice Tracing with Zipkin and StarRocks: A Practical Guide

This article explains how Sohu Smart Media built a high‑performance tracing system for microservices by integrating Zipkin for data collection with StarRocks for storage and analytics, covering architecture, data models, SQL queries, Flink processing, and real‑world results that boost observability and engineering efficiency.

Flink · Microservices · Observability

0 likes · 31 min read

Architects Research Society

Mar 25, 2022 · Operations

Understanding Observability: Importance, Benefits, Challenges, and Best Practices

Observability measures a system’s current state using telemetry such as logs, metrics, and traces, enabling IT, DevOps, and SRE teams to detect, diagnose, and resolve issues in complex multi‑cloud environments while delivering better performance, reliability, and business outcomes.

Cloud Native · DevOps · IT Operations

0 likes · 19 min read

Sohu Tech Products

Mar 23, 2022 · Big Data

Microservice Tracing with Zipkin and StarRocks: Architecture and Practice

This article describes how Sohu Intelligent Media built a microservice tracing system using Zipkin for data collection and StarRocks for storage and analysis, covering architecture, data model, ingestion pipeline, SQL analytics, performance monitoring, and future improvements.

Microservice · Observability · StarRocks

0 likes · 27 min read

Open Source Linux

Mar 18, 2022 · Operations

Evolution of Open‑Source Monitoring Tools: From Nagios to Prometheus

This article traces the development of open‑source monitoring solutions from early tools like Nagios and Cacti through modern platforms such as Prometheus and Nightingale, comparing their strengths, weaknesses, and typical use cases while also looking ahead to emerging observability trends in cloud‑native environments.

Nagios · Observability · Operations

0 likes · 14 min read

Efficient Ops

Mar 16, 2022 · Operations

Why Traditional Monitoring Fails and Observability Is the Future for Ops Teams

Drawing from years of ops experience, the author recounts the decline of traditional monitoring, the rise of automated dashboards, the challenges of AIOps and observability, and proposes a shift toward data‑driven, business‑focused capability building to make alerts truly useful.

Observability · SRE · aiops

0 likes · 13 min read

DataFunTalk

Mar 11, 2022 · Cloud Native

Operator‑Based Log Collection and the Evolution of Loggie in Cloud‑Native Environments

This article recounts NetEase's journey from early host‑based log collection to operator‑driven Kubernetes logging, discusses the challenges of large‑scale log ingestion, evaluates existing agents, and introduces the open‑source Loggie project with its architecture, features, performance gains, and roadmap.

Kubernetes · Loggie · Observability

0 likes · 12 min read

Open Source Linux

Mar 8, 2022 · Operations

Master Kubernetes Troubleshooting: The Three Pillars Every Engineer Needs

This article breaks down Kubernetes troubleshooting into three essential steps—understanding the failure, managing the response, and preventing recurrence—while mapping key monitoring, observability, and incident‑response tools to each phase for reliable cloud‑native operations.

Kubernetes · Observability · Operations

0 likes · 8 min read

Ops Development Stories

Mar 3, 2022 · Operations

What Exactly Does an SRE Do? Unpacking Roles, Skills, and Practices

This article explains the SRE role, which originated at Google, outlines its core responsibilities such as automation, observability, incident response, testing, capacity planning, and SLI/SLO/SLA management, and highlights the skills and cultural practices needed for reliable service operations.

Observability · SLA · SLI

0 likes · 29 min read

Alibaba Cloud Native

Mar 1, 2022 · Cloud Native

How Alibaba’s KubeProbe Tackles Large‑Scale Kubernetes Stability Challenges

This article explains how Alibaba Cloud's self‑built KubeProbe combines universal link probing and targeted inspections to detect, diagnose, and remediate issues in massive multi‑cluster Kubernetes environments, improving reliability and reducing on‑call overhead.

ChatOps · Cloud Native · Infrastructure

0 likes · 19 min read

政采云技术

Mar 1, 2022 · Cloud Native

Introduction to Dapr: Features, Architecture, and Installation Guide

This article introduces Dapr, a cloud‑native sidecar runtime for building resilient microservices, explains its core features such as service invocation, state management, pub/sub, bindings, actors, observability, and secrets, and provides step‑by‑step installation instructions for CLI, binaries, Kubernetes, and Helm.

Cloud Native · Dapr · Installation

0 likes · 10 min read

Alibaba Cloud Native

Feb 28, 2022 · Cloud Native

How to Observe and Diagnose DNS Failures in Kubernetes Clusters

This article explains how DNS operates inside Kubernetes, enumerates common failure causes, describes CoreDNS's built‑in observability plugins, introduces BPF‑based client‑side diagnostics, and provides a step‑by‑step troubleshooting workflow to identify and resolve DNS issues in cloud‑native environments.

BPF · CoreDNS · DNS

0 likes · 18 min read

21CTO

Feb 24, 2022 · Backend Development

42 Hard‑Earned Lessons for Building Reliable Production Databases

This article translates Mahesh Balakrishnan’s 42‑point guide on building production databases, covering customer focus, project management, design principles, code review practices, strategy, observability, and research, offering concrete advice for engineers and teams creating robust backend systems.

Code review · Observability · Production Systems

0 likes · 12 min read

Laravel Tech Community

Feb 20, 2022 · Backend Development

Highlights of .NET 7 Preview 1: Nullable Annotations, Observability, Code Generation, and New APIs

The article outlines the major features of .NET 7 Preview 1, including nullable annotations for Microsoft.Extensions libraries, enhancements to tracing APIs, code‑generation improvements, dynamic PGO and Arm64 support, p/invoke source generation, new System.Text.Json APIs, and expanded hot‑reload capabilities.

Nullable Annotations · Observability · code-generation

0 likes · 5 min read

MaGe Linux Operations

Feb 19, 2022 · Cloud Native

Kubernetes Hits Mainstream: Key Insights from CNCF’s 2021 Cloud‑Native Survey

According to CNCF’s 2021 Cloud‑Native Survey, 96% of organizations are using or evaluating Kubernetes, marking its transition to mainstream, with rapid growth in developer adoption, container runtimes, and related projects, while highlighting emerging trends in edge, observability, and security for 2022.

CNCF Survey · Cloud Native · Kubernetes

0 likes · 8 min read

Ctrip Technology

Feb 17, 2022 · Operations

Evolution and Architecture of the Hickwall Enterprise Monitoring Platform

The article details the background, challenges, multi‑year evolution, current architecture, and future roadmap of Hickwall, Ctrip's enterprise‑grade monitoring and observability platform, covering metrics, logs, traces, high‑cardinality handling, cloud‑native integration, alert governance, and storage engine migrations.

Alerting · Observability · Operations

0 likes · 15 min read

Alibaba Cloud Native

Feb 11, 2022 · Cloud Native

What New Features and ACK Enhancements Arrive with Kubernetes 1.22?

This FAQ outlines the new Kubernetes 1.22 capabilities, the components Alibaba Cloud ACK upgrades for this version, added observability, stability and performance improvements, and key upgrade considerations such as deprecated APIs and runtime changes.

ACK · Cloud Native · Kubernetes

0 likes · 6 min read

Efficient Ops

Feb 7, 2022 · Operations

Mastering Application Monitoring with Prometheus: Practical Metrics and Grafana Tips

This article explains how to design effective Prometheus metrics for various application types, choose appropriate vectors, labels, and buckets, and offers Grafana tricks for visualizing dimensions and linking tooltips, providing a comprehensive guide for robust observability.

Grafana · Metrics · Observability

0 likes · 10 min read

MaGe Linux Operations

Feb 2, 2022 · Operations

Master Prometheus Metrics: Best Practices for Effective Monitoring

This article outlines practical Prometheus monitoring techniques, covering how to choose metrics, define labels, select vectors and buckets, and use Grafana tips to build reliable observability for various application types.

Grafana · Metrics · Observability

0 likes · 8 min read

Baidu Tech Salon

Jan 27, 2022 · Cloud Native

How China Unicom’s Service Mesh Evolved: From SDKs to Sidecars and Beyond

This article details China Unicom Software Research Institute's multi‑year journey of adopting Kubernetes‑based service mesh, outlining the evolution from SDK‑driven microservices to sidecar‑based architectures, migration strategies with Baidu, performance optimizations, observability enhancements, and future product roadmaps.

Cloud Native · Istio · Kubernetes

0 likes · 13 min read

ITFLY8 Architecture Home

Jan 26, 2022 · Operations

Mastering Microservice Monitoring, Fault Tolerance, and Security: A Complete Guide

This article explains how to monitor microservice architectures, describes log, tracing, and metric monitoring, compares open‑source tracing tools, outlines fault‑tolerance strategies such as timeout, rate‑limiting, degradation, async buffering and circuit breaking, and details access‑security mechanisms including gateway authentication, service‑side auth, and OAuth2.0 token flows, while also introducing container technology and its role in microservice deployment.

Containers · Microservices · Observability

0 likes · 43 min read

MaGe Linux Operations

Jan 22, 2022 · Cloud Native

Boost Kubernetes Monitoring: Migrate from Prometheus to Thanos for Scalable Low‑Cost Metrics

This article examines the limitations of a standard Prometheus‑based monitoring stack on Kubernetes, explains how adopting Thanos improves metric retention and reduces infrastructure costs, and provides a detailed multi‑cluster deployment guide with Terraform, TLS configuration, and Grafana visualization.

Kubernetes · Observability · Prometheus

0 likes · 16 min read

Efficient Ops

Jan 20, 2022 · Operations

Mastering Prometheus Metrics: Best Practices for Effective Monitoring

This article outlines practical guidelines for designing Prometheus metrics, covering how to define monitoring targets, choose appropriate vectors and labels, name metrics and labels correctly, select histogram buckets, and leverage Grafana features to visualize and troubleshoot data effectively.

Grafana · Metrics · Observability

0 likes · 11 min read

Baidu Geek Talk

Jan 12, 2022 · Backend Development

Serverless Architecture Evolution: Baidu Search Content Platform's FaaS and Intelligent Transformation

Baidu’s search content platform transitioned to a serverless, FaaS‑based architecture with intelligent scheduling and automated control, cutting resource waste by 87%, boosting automatic recovery to 96.7%, and delivering roughly tenfold productivity gains across development, deployment, and maintenance while simplifying scalability and high‑availability concerns.

FaaS · Intelligent Scheduling · Observability

0 likes · 27 min read

Java High-Performance Architecture

Jan 12, 2022 · Cloud Native

Mastering Service Mesh with Istio: A Hands‑On Guide to Traffic, Security, and Observability

This tutorial explains the fundamentals of service mesh, explores Istio’s architecture and core components, and provides step‑by‑step instructions for installing Istio on Kubernetes, deploying a sample microservice application, and leveraging traffic management, mutual TLS, observability, and advanced use cases such as routing, circuit breaking, and JWT‑based access control.

Istio · Kubernetes · Observability

0 likes · 22 min read

HaoDF Tech Team

Jan 11, 2022 · Big Data

Using ClickHouse for Real‑Time Log Analytics and Data Storage in Microservice Governance at Haodf

The article describes how Haodf's SRE team replaced Elasticsearch with ClickHouse to handle massive microservice logs, achieve low‑latency queries, reduce storage costs, and support real‑time monitoring, tracing, and metric analysis through columnar OLAP features, sharding, TTL, and materialized views.

Analytics · Big Data · ClickHouse

0 likes · 25 min read

Architecture Digest

Jan 9, 2022 · Cloud Native

Introduction to Service Mesh and Istio: Concepts, Architecture, and Practical Usage

This tutorial explains the fundamentals of service mesh, details Istio's architecture and core components, and provides step‑by‑step instructions for installing Istio on Kubernetes, deploying a sample microservice application, and leveraging traffic management, security, and observability features.

Istio · Kubernetes · Observability

0 likes · 18 min read

IT Architects Alliance

Jan 7, 2022 · Cloud Native

Introduction to Service Mesh and Istio: Concepts, Architecture, and Practical Deployment

This tutorial explains the fundamentals of service mesh, outlines Istio’s architecture and core components, demonstrates how to install and configure Istio on Kubernetes, showcases common use cases such as traffic management, security, and observability, and surveys alternatives to Istio, providing a comprehensive guide for modern micro‑service deployments.

Istio · Microservices · Observability

0 likes · 18 min read

Architect

Jan 5, 2022 · Cloud Native

Introduction to Service Mesh and Istio: Concepts, Architecture, and Hands‑On Guide

This tutorial explains the fundamentals of service mesh, outlines Istio’s architecture and core components, demonstrates how to install Istio on Kubernetes, and walks through practical examples such as traffic routing, security policies, observability, and common use‑cases, while also comparing alternative solutions.

Istio · Kubernetes · Microservices

0 likes · 20 min read

Tencent Cloud Developer

Dec 23, 2021 · Cloud Native

An Overview of OpenTelemetry: Origins, Architecture, and Instrumentation

OpenTelemetry unifies tracing, metrics, and logs by merging OpenTracing and OpenCensus into a cross‑language specification, collector, language SDKs, and instrumentation libraries, offering vendor‑agnostic, low‑maintenance telemetry collection that separates data gathering from business logic while requiring external back‑ends for storage and analysis.

Cloud Native · Collector · Instrumentation

0 likes · 10 min read

Qingyun Technology Community

Dec 22, 2021 · Cloud Native

What’s New in KubeSphere 3.2.1? Key Features, Fixes, and Upgrade Guide

Version 3.2.1 of the open‑source KubeSphere platform introduces a series of enhancements—including container group status filtering, improved image builder dialogs, expanded quota visibility, numerous UI bug fixes, and updated DevOps pipelines—alongside detailed installation and upgrade instructions for Linux and Kubernetes environments.

Cloud Native · KubeSphere · Kubernetes

0 likes · 8 min read

21CTO

Dec 20, 2021 · Cloud Native

Why Cloud‑Native Architecture Is the Future of SaaS and How to Implement It

This article explains what cloud‑native architecture is, why it is essential for modern SaaS businesses, and provides a step‑by‑step guide—including serverless migration, elasticity, observability, resilience, and automation—on how to adopt it using Alibaba Cloud SAE and related services.

Microservices · Observability · SaaS

0 likes · 22 min read

Java Architecture Diary

Dec 13, 2021 · Backend Development

Essential Java & Cloud Native Resources: From JDK 17 to GraalVM, Spring & More

This curated collection gathers essential articles and tutorials covering Java 8‑17 updates, GraalVM performance tricks, Spring Native adoption, Spring Cloud and RSocket alternatives, GraphQL frameworks, observability stacks like Grafana, Prometheus and Loki, IDE enhancements, database fundamentals, and low‑code platform building, providing a comprehensive knowledge base for modern backend developers.

Observability · databases · graalvm

0 likes · 4 min read

Tencent Cloud Middleware

Dec 9, 2021 · Cloud Native

Why Observability Is the Missing Piece for Day‑2 Success in Cloud‑Native and Serverless Systems

The article explains how observability—through logs, metrics, and traces—transforms the opaque, complex day‑2 operations of micro‑service, Kubernetes, and serverless environments into a deterministic, diagnosable system, highlighting OpenTelemetry, practical collection methods, and real‑world implementation challenges and benefits.

Observability · OpenTelemetry · Serverless

0 likes · 17 min read

Alibaba Cloud Native

Dec 7, 2021 · Cloud Native

Unlocking the Third Way of Distributed Tracing: Post‑Aggregation Link Analysis Explained

This article introduces the third, post‑aggregation approach to link tracing—link analysis—showing how real‑time aggregation of stored trace data can quickly pinpoint uneven traffic, single‑machine failures, slow interfaces, business‑level traffic shifts, and gray‑release anomalies while outlining its practical constraints.

APM · Cloud Native · Link Analysis

0 likes · 11 min read

Laravel Tech Community

Dec 2, 2021 · Cloud Native

New Features in Apache APISIX 2.11.0: LDAP Authentication, Observability Plugins, Azure Functions, and WASM Support

Apache APISIX 2.11.0 adds an LDAP‑based authentication plugin, expands observability with Datadog and SkyWalking plugins, introduces Azure Functions integration, provides early WASM support, and enhances existing plugins, all illustrated with detailed configuration examples and code snippets.

Azure Functions · LDAP · Observability

0 likes · 8 min read

GrowingIO Tech Team

Dec 2, 2021 · Cloud Native

Mastering Chaos Mesh: A Hands‑On Guide to Cloud‑Native Chaos Engineering

Chaos Mesh is an open‑source cloud‑native chaos engineering platform that lets you experiment with fault injection across Kubernetes environments, offering visual dashboards, extensive fault types, and step‑by‑step installation and experiment creation guides to help teams uncover system weaknesses and improve resilience.

Chaos Mesh · Fault Injection · Kubernetes

0 likes · 12 min read

Efficient Ops

Nov 24, 2021 · Operations

Why Switch to Loki? Step‑by‑Step Installation and Grafana Visualization

This guide explains why Loki is a lightweight alternative to EFK/ELK, walks through installing Loki and Promtail binaries, configuring them with YAML files, and visualizing logs in Grafana using LogQL, providing a complete end‑to‑end log management solution.

Grafana · Log Management · Loki

0 likes · 6 min read

Baidu Geek Talk

Nov 24, 2021 · Operations

How Baidu’s Fengjing Uses Holographic Logs to Debug Massive Microservices

Baidu’s Fengjing monitoring platform tackles the daunting challenge of pinpointing failures in its massive Java‑based microservice ecosystem by employing a non‑intrusive probe that captures log metadata, stores it in a database, and reconstructs full request‑level logs with minimal storage overhead.

Distributed Tracing · Microservices · Observability

0 likes · 9 min read

Efficient Ops

Nov 16, 2021 · Operations

How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes

This article explains why monitoring is essential for production stability, compares white‑box and black‑box approaches, and provides a step‑by‑step guide to deploying Prometheus, configuring scrape targets, using Pushgateway and Alertmanager, and scaling the solution with Thanos in a Kubernetes environment.

Alertmanager · Observability · Prometheus

0 likes · 21 min read

Code Ape Tech Column

Nov 15, 2021 · Operations

A Comprehensive Guide to Using Apache SkyWalking for Distributed Tracing, Logging, and Performance Analysis

This article introduces Apache SkyWalking as a powerful open‑source APM solution, compares it with Spring Cloud Sleuth and Zipkin, explains its architecture, walks through server and client setup, data persistence, log collection, performance profiling, and alert configuration, and provides practical code snippets and configuration examples.

Distributed Tracing · Observability · SkyWalking

0 likes · 14 min read

Open Source Linux

Oct 31, 2021 · Operations

Designing Effective Metrics: From Requirements to Labels and Buckets

This guide explains how to define, name, and organize monitoring metrics for reliable observability of diverse services, covering Google’s four golden signals, system‑specific measurement objects, vector selection, label conventions, bucket design, and practical Grafana tips.

Metrics · Observability · labeling

0 likes · 10 min read

Top Architect

Oct 17, 2021 · Cloud Native

How Redis Simplifies Microservice Design Patterns, Distributed Transactions, and Observability

This article explains how Redis can be used to implement and simplify a wide range of microservice design patterns—including bounded contexts, asynchronous messaging, orchestrated sagas, transaction inboxes, telemetry, event sourcing, CQRS, and shared data—while improving performance, scalability, and observability in cloud‑native architectures.

CQRS · Cloud Native · Microservices

0 likes · 16 min read

Alibaba Cloud Native

Oct 10, 2021 · Cloud Native

How to Detect Service and Workload Anomalies in Kubernetes with Advanced Monitoring

This article explains the common pain points of locating anomalies in Kubernetes environments and presents a multi‑layer monitoring framework—trace, metrics, events, and alerts—along with best‑practice scenarios such as network performance, DNS issues, full‑link stress testing, external MySQL access, and multi‑tenant architectures.

DNS · Kubernetes · Metrics

0 likes · 20 min read

21CTO

Sep 27, 2021 · Cloud Native

Why Loki Beats ELK for Kubernetes Logging: Architecture, Deployment, and Query Guide

This article explains the motivation behind choosing Loki over heavyweight ELK/EFK stacks for container‑cloud logging, outlines Loki's lightweight architecture and components, provides step‑by‑step deployment instructions on OpenShift/Kubernetes, and demonstrates how to query logs using the LogQL language and HTTP API.

Cloud Native · Kubernetes · LogQL

0 likes · 17 min read

21CTO

Sep 26, 2021 · Backend Development

How Baidu’s Hulk Framework Accelerates Go Service Development

The Hulk framework, built on GDP2, provides a business‑oriented Go web development platform with out‑of‑the‑box components, standardized architecture, rich observability, and tooling that together improve code quality, development speed, and SRE efficiency for large‑scale short‑video services.

Backend · Framework · Go

0 likes · 18 min read

Top Architect

Sep 24, 2021 · Cloud Native

Loki Log System Overview, Architecture, and Deployment Guide

This article introduces Loki, a lightweight log aggregation system for Kubernetes, explains its background and motivations, details its simple architecture and core components (Distributor, Ingester, Querier), discusses scalability and storage options, and provides step‑by‑step deployment instructions with example YAML and shell commands.

Cloud Native · Deployment · Kubernetes

0 likes · 16 min read

IT Architects Alliance

Sep 20, 2021 · Operations

Why Loki Beats ELK for Kubernetes Logging: Architecture and Deployment Guide

This article explains the motivations behind choosing Loki over ELK for container‑cloud logging, details Loki's lightweight architecture—including Distributor, Ingester, and Querier components—covers deployment steps on OpenShift/Kubernetes with YAML manifests, and demonstrates LogQL query syntax for efficient log retrieval.

Kubernetes · LogQL · Loki

0 likes · 18 min read

Alibaba Cloud Native

Sep 16, 2021 · Cloud Native

How to Use Kubernetes Monitoring for End-to-End Application Architecture Exploration

This session explains why Kubernetes monitoring is essential for end-to-end observability, describes the five data sources and layers it covers, and walks through discovering and locating architecture, performance, resource, scheduling, and network issues using topology, anomaly detection, and correlation techniques.

Architecture · Cloud Native · Kubernetes

0 likes · 13 min read

IT Architects Alliance

Sep 15, 2021 · Backend Development

Comprehensive Guide to Backend Architecture: Microservices, Service Mesh, Messaging, and Observability

This article provides a detailed overview of modern backend architecture, covering microservice fundamentals, design principles such as Conway's Law and DDD, gateway patterns, communication protocols, service registration, configuration management, observability pillars, service mesh options, and a comparative analysis of popular message‑queue technologies.

Microservices · Observability · backend-architecture

0 likes · 27 min read

HomeTech

Sep 15, 2021 · Backend Development

How ASF Simplifies gRPC‑to‑Go Migration and Boosts Service Governance

This article explains the AutoHome Service Framework (ASF), its architecture, how it enables seamless migration from gRPC to Go services, the added Dubbo‑go support, configuration optimizations, advanced load‑balancing strategies, observability enhancements, and future plans for adaptive balancing and zero‑downtime deployments.

Go · Microservices · Observability

0 likes · 18 min read

Dada Group Technology

Sep 10, 2021 · Operations

Design and Implementation of JD Daojia Log System Based on Loki

This document details the motivation, architecture, components, query language, and deployment of a Loki‑based log collection and analysis platform for JD Daojia, comparing it with ELK, describing ingestion, real‑time and historical log handling, technical challenges, configuration examples, and future scaling plans.

Grafana · Log Management · Loki

0 likes · 15 min read

Baidu Intelligent Testing

Sep 9, 2021 · Cloud Native

Observability Practices in Baidu Search Platform: Real‑time Metrics, Tracing, Logging, and Topology at Hundred‑Billion Scale

This article explains how Baidu's search middle‑platform adopts cloud‑native observability—covering metrics, distributed tracing, log querying, and topology analysis—to ensure high availability, performance, and controllability for a system handling hundreds of billions of requests across millions of micro‑service instances.

Observability · logging · topology

0 likes · 12 min read

Efficient Ops

Sep 5, 2021 · Operations

Why Prometheus’s TSDB Makes Monitoring Scalable: A Deep Dive

This article explains how Prometheus’s time‑series database handles massive monitoring data, illustrates practical query examples, and shows why its storage engine and pre‑computation features enable efficient, high‑performance observability for large‑scale services.

Observability · Prometheus · TSDB

0 likes · 8 min read

Alibaba Cloud Native

Sep 2, 2021 · Cloud Native

2021 GIAC Cloud Native Conference Highlights: Service Mesh, SkyWalking, Dubbogo 3.0

The article summarizes key insights from the 2021 GIAC Cloud Native conference, covering strategies to limit the blast radius of service failures, SkyWalking-based Kubernetes event monitoring, Kuaishou's Service Mesh implementation, and Dubbogo 3.0's innovations such as proxyless mesh and adaptive throttling.

Cloud Native · Kubernetes · Observability

0 likes · 13 min read

Alibaba Cloud Native

Sep 1, 2021 · Cloud Computing

Understanding Serverless: Architecture, Workflow, and Observability

This article explains the concept of Serverless computing, its components (FaaS and BaaS), typical development workflow, supporting tools, Serverless Workflow orchestration, and observability features such as metrics, logging, and tracing.

BaaS · Cloud Computing · Observability

0 likes · 12 min read

DevOps

Aug 31, 2021 · Backend Development

Designing an Uber‑Like Microservice System with DDD, OpenTelemetry Observability, and Reinforced Chaos Engineering

This article describes how to model a complex Uber‑style ride‑hailing system using Domain‑Driven Design, implement it with Java Spring Boot microservices, instrument it with OpenTelemetry for full observability, and validate the observability pipeline through a gamified chaos‑engineering approach that reduces MTTR.

DDD · Microservices · Observability

0 likes · 13 min read

Open Source Linux

Aug 26, 2021 · Cloud Native

Why Switch from Prometheus to Thanos? Boost Metric Retention & Cut Costs

This article explains the limitations of a traditional Prometheus‑based monitoring stack for Kubernetes, demonstrates how integrating Thanos improves metric retention, scalability, and storage cost, and provides a complete multi‑cluster deployment example with Terraform and Helm configurations.

Cloud Native · Kubernetes · Observability

0 likes · 15 min read

21CTO

Aug 20, 2021 · Backend Development

How the Hulk Framework Boosts Go Service Development and Operations

This article explains the background, design, components, ecosystem, and real‑world benefits of the Hulk Go web framework developed by the short‑video R&D team, showing how it improves development efficiency, code quality, performance, observability, and incident response for large‑scale microservices.

Go · Observability · Web framework

0 likes · 19 min read

MaGe Linux Operations

Aug 14, 2021 · Operations

Boost System Reliability: 4 Proven Practices to Master Observability

This article explains why observability is essential for DevOps, outlines four key practices—including production‑environment monitoring, structured logging, a DevOps‑focused culture, and pre‑deployment observability with remote debugging—to help teams detect, diagnose, and prevent issues throughout the software lifecycle.

CI/CD · Culture · DevOps

0 likes · 9 min read

Java Architecture Diary

Aug 11, 2021 · Operations

Unlock Loki v2.3.0: Custom Retention, Deletion & Recording Rules Explained

Version 2.3.0 of Loki introduces enhanced features such as per‑tenant custom retention policies, time‑range log deletion via the Compactor API, Prometheus‑style recording rules, a new pattern parser for LogQL, ingestion sharding for faster queries, and advanced IP‑matching syntax, all aimed at improving storage efficiency, compliance, and observability.

Log Management · LogQL · Loki

0 likes · 9 min read

DevOps

Aug 11, 2021 · Operations

Introduction to Chaos Engineering – Part 2: Four Steps for Disrupting Complex Systems

This article explains that chaos engineering is not a magic cure but a disciplined practice for testing distributed systems by designing and running controlled experiments, outlining four essential steps—observability, defining steady state, hypothesizing events, and executing experiments—to gain confidence in system resilience.

Observability · Operations · chaos engineering

0 likes · 11 min read

ITFLY8 Architecture Home

Aug 9, 2021 · Operations

How Liulishuo Scaled Its Unified Monitoring Platform for Billions of Users

This article examines the evolution of online education, introduces Liulishuo's massive English‑learning platform, and details the technical challenges, design choices, and architecture of its cloud‑native unified monitoring system that handles tens of terabytes of data daily.

Observability · Prometheus · SLS

0 likes · 13 min read

Code Ape Tech Column

Jul 27, 2021 · Cloud Native

Understanding Loki: Advantages, Architecture, Installation, and Query Practices

This article explains Loki's low‑index overhead, concurrent query handling, label‑based indexing, component roles, read/write paths, step‑by‑step installation of Promtail and Loki, label matching techniques, dynamic‑label handling, high‑cardinality concerns, and query optimization strategies for cloud‑native log aggregation.

Cloud Native · Loki · Observability

0 likes · 13 min read

Tencent Cloud Developer

Jul 22, 2021 · Operations

Observability in Serverless Environments: Monitoring, Logging, Distributed Tracing, and Best Practices

In this talk, Gal Bashan explains how serverless architectures complicate observability and why metrics, logs, and especially distributed tracing with tools like OpenTelemetry, Jaeger, or commercial platforms are essential for gaining end-to-end visibility, automating instrumentation, and maintaining reliable, business-focused services across cloud providers.

Cloud Native · Distributed Tracing · Observability

0 likes · 12 min read

Alibaba Cloud Native

Jul 19, 2021 · Operations

Scaling Distributed Observability: A Case Study of ARMS Front‑End Monitoring at a Kids Coding Platform

This article details how a rapidly growing Chinese children's programming platform tackled the complexity of distributed system observability by adopting SkyWalking, Prometheus, and Alibaba Cloud ARMS front‑end monitoring, achieving faster fault detection, reduced operational workload, and improved user experience.

ARMS · Distributed Systems · Microservices

0 likes · 12 min read

High Availability Architecture

Jul 15, 2021 · Operations

Baidu Game Microservice Monitoring Practice and System Design

This article describes Baidu's comprehensive approach to monitoring game microservices, covering the background, initial monitoring tools, evolution of the monitoring system, systematic design for risk control, intelligent detection, alarm optimization, efficient fault localization, and future outlook for high‑availability architecture.

Baidu · Game Development · Microservices

0 likes · 13 min read

Top Architect

Jul 7, 2021 · Backend Development

Design and Implementation of a High‑Concurrency API Gateway

This article details the architecture and implementation of a high‑concurrency API gateway built on RxNetty, covering request routing, conditional routing, API management, rate limiting, circuit breaking, security policies, monitoring, tracing, and future enhancements within a microservices environment.

Microservices · Observability · api-gateway

0 likes · 11 min read

Baidu Geek Talk

Jul 5, 2021 · Operations

Automated and Intelligent Analysis of Baidu Search Stability Issues

The team automated Baidu Search fault diagnosis by building a side‑index for instant log lookup, streaming incremental analysis, exhaustive rule templates, feature‑engineering pipelines, query‑scene reconstruction, entropy‑based ranking, per‑second timeline views, and chaos‑engineered fault injection, achieving near‑99% accuracy and second‑level, module‑granular stability tracing.

Observability · Search Stability · chaos engineering

0 likes · 15 min read

Cloud Native Technology Community

Jul 2, 2021 · Cloud Native

Unpacking the CNCF Cloud Native Landscape: A Layer‑by‑Layer Guide

This comprehensive guide breaks down the CNCF cloud native landscape into its four core layers—Provisioning, Runtime, Orchestration & Management, and Application Definition—explaining the problems each layer solves, the key technologies involved, and how they interoperate to enable modern, scalable applications.

CNCF · Cloud Native · Observability

0 likes · 60 min read

Programmer DD

Jul 1, 2021 · Operations

Why Loki Beats Elasticsearch: Low Index Overhead, Fast Queries, and Easy Setup

This article explains Loki's advantages over Elasticsearch, including low indexing overhead, concurrent query processing with caching, seamless integration with Prometheus and Grafana, detailed architecture components, installation steps, label handling, high‑cardinality challenges, and best practices for efficient log management.

Elasticsearch · Grafana · Loki

0 likes · 15 min read
