Tagged articles
969 articles
Shopee Tech Team
Jun 10, 2022 · Mobile Development

MDAP Stack Symbolization Service: Architecture, Implementation, and Optimization

The MDAP Stack Symbolization Service unifies high‑throughput address‑ and symbol‑based stack resolution for iOS, Android native, Android Java, Web, and React Native. It parses dSYM/ELF files and source‑map or ProGuard mappings, caches results in Redis (with RocksDB fallback), and exposes a gRPC API for fast, scalable de‑obfuscation.

DWARF · Observability · SourceMap
0 likes · 49 min read
Top Architect
Jun 9, 2022 · Backend Development

Microservice Architecture and Design Patterns Overview

This article provides a comprehensive overview of microservice architecture, covering its core goals, design principles, various decomposition and integration patterns, database strategies, observability, resilience, deployment, and operational concerns, offering practical guidance for building scalable, maintainable services.

Architecture · Backend · Deployment
0 likes · 18 min read
IT Architects Alliance
Jun 8, 2022 · Backend Development

Mastering Microservice Patterns: From Decomposition to Resilience

This article provides a comprehensive overview of common microservice patterns and design principles, covering goals such as cost reduction, faster releases, resilience, visibility, and detailing decomposition, integration, database, CQRS, observability, health‑check, and deployment strategies for building robust backend systems.

Architecture · Blue‑Green deployment · CQRS
0 likes · 20 min read
Alibaba Cloud Developer
Jun 8, 2022 · Fundamentals

eBPF Explained: Core Concepts, Use Cases, and Best Practices

eBPF is a kernel‑level sandbox technology that enables safe, high‑performance, programmable instrumentation for networking, security, and observability, and this article answers seven key questions covering its definition, applications, origins, usage steps, implementation details, best practices, and current ecosystem.

Kernel Instrumentation · Linux · Observability
0 likes · 21 min read
Architect
Jun 6, 2022 · Backend Development

Microservice Architecture and Design Patterns

This article provides a comprehensive overview of microservice architecture, detailing its core objectives, design principles, various decomposition and integration patterns, database strategies, consistency mechanisms, observability techniques, and deployment practices for building resilient, scalable backend systems.

Architecture · Microservices · Observability
0 likes · 18 min read
Efficient Ops
May 24, 2022 · Cloud Native

How AutoTagging and MultistageCodec Transform Cloud‑Native Observability

This article explores the challenges of building a unified observability data platform for hybrid‑cloud microservices, examines six common data‑island scenarios, and presents DeepFlow's AutoTagging and MultistageCodec techniques that dramatically reduce tagging overhead and storage costs while enabling seamless cross‑data correlation.

ClickHouse · Microservices · Observability
0 likes · 11 min read
Snowball Engineer Team
May 24, 2022 · Cloud Native

How Snowball Used Apache APISIX to Build a Dual‑Active Architecture and Streamline Authentication

This article details Snowball's transition from a single‑datacenter setup to a dual‑active, cloud‑native architecture using Apache APISIX, covering background challenges, problem analysis, gateway selection, architectural adjustments, authentication unification, observability enhancements, ZooKeeper integration, and future plans.

Apache APISIX · Authentication · Cloud Native
0 likes · 11 min read
Programmer DD
May 16, 2022 · Cloud Native

Master Loki: Scalable Log Aggregation for Kubernetes and Prometheus

This guide introduces Loki, the open‑source, horizontally scalable log aggregation system optimized for Prometheus and Kubernetes, covering its core concepts, architecture, components, deployment steps, Grafana integration, label‑based indexing, and best practices for handling dynamic and high‑cardinality tags.

Grafana · Kubernetes · Loki
0 likes · 19 min read
Alibaba Cloud Native
May 11, 2022 · Cloud Native

How Zuoyebang Cut 22% Costs with Kubernetes Serverless Virtual Nodes

Zuoyebang’s shift to cloud‑native architecture leveraged Alibaba Cloud’s Kubernetes Serverless virtual nodes, achieving a 22.5% cost reduction during peak traffic by dynamically scaling workloads, while addressing scheduling, observability, and performance challenges through custom schedulers, enhanced monitoring, and careful testing.

Cloud Native · Cost Optimization · Kubernetes
0 likes · 11 min read
Tencent Cloud Developer
May 7, 2022 · Cloud Native

Fourth Techo TVP Developer Conference: Cloud Native Trends and Best Practices

The Fourth Techo TVP Developer Conference highlighted current cloud‑native adoption, FinOps cost‑optimization, distributed‑cloud strategies, and maturity models on Day 1, then showcased practical best‑practice case studies—from automotive edge computing to service‑mesh migration, hybrid‑cloud PaaS evolution, observability standards, and high‑performance API‑gateway deployments—on Day 2.

APISIX · Cloud Native · DevOps
0 likes · 33 min read
Efficient Ops
Apr 27, 2022 · Operations

Why Choose Loki Over ELK? A Practical Guide to Scalable Log Aggregation

This article explains the motivations for selecting Grafana Loki instead of traditional ELK/EFK stacks, introduces Loki's core concepts and architecture, details component roles, provides step‑by‑step deployment of Promtail and Loki, and demonstrates how to configure and query logs in Grafana while addressing label indexing, dynamic tags, high‑cardinality challenges, and query performance.

Grafana · Kubernetes · Loki
0 likes · 18 min read
JD Retail Technology
Apr 27, 2022 · Industry Insights

How JD Achieves Seamless Stability During Massive Sales Events

The article reviews the Global Information System Stability Summit and JD's technical architect Li Junliang's detailed case study on the engineering practices, observability, chaos engineering, and resource‑scheduling innovations that enable JD’s e‑commerce platform to handle sales‑peak traffic that spikes hundreds of times over normal load.

Industry Insights · Observability · chaos engineering
0 likes · 7 min read
Top Architect
Apr 27, 2022 · Backend Development

Comprehensive Guide to Backend Architecture: Microservices, Service Mesh, Observability, and Messaging

This article provides an in‑depth overview of modern backend architecture, covering microservice fundamentals, service mesh concepts, observability pillars, messaging queue choices, and practical design considerations such as service registration, configuration centers, and security mechanisms.

Messaging · Microservices · Observability
0 likes · 28 min read
Volcano Engine Developer Services
Apr 26, 2022 · Operations

How Volcano Engine’s TLS Transforms Log Management for Kubernetes at Scale

This article explains the challenges of traditional open‑source log collection in cloud‑native environments, describes Volcano Engine’s unified TLS architecture, its centralized configuration, CRD‑based deployment, and showcases real‑world case studies that demonstrate improved availability, efficiency, and scalability.

Cloud Native · Distributed Systems · Kubernetes
0 likes · 15 min read
dbaplus Community
Apr 25, 2022 · Operations

From Monitoring to Observability: Expert Insights on Evolving Cloud‑Native Operations

In this interview series, three industry experts explain how monitoring differs from observability, the shifts required for ops, developers, and architects, the core methodologies and technologies behind metrics, traces, and logs, and practical guidance for selecting and integrating observability tools in cloud‑native environments.

Metrics · Observability · Operations
0 likes · 16 min read
MaGe Linux Operations
Apr 22, 2022 · Backend Development

Essential Microservice Patterns: Decomposition, Integration & Observability

This article outlines the key microservice design patterns—including decomposition, integration, event‑driven, saga, and observability techniques—while explaining their goals, principles, and practical considerations such as database per service, CQRS, and cross‑cutting concerns like health checks and circuit breakers.

Backend Architecture · Design Patterns · Microservices
0 likes · 19 min read
Ops Development Stories
Apr 21, 2022 · Cloud Native

Essential Kubernetes Production Checklist for Web Services

A comprehensive, step‑by‑step checklist guides teams through documentation, application design, security, CI/CD, Kubernetes configuration, monitoring, testing, and 24/7 support to reliably run web services with HTTP APIs in production on Kubernetes.

DevOps · Kubernetes · Observability
0 likes · 9 min read
政采云技术
Apr 19, 2022 · Cloud Native

A Practical Guide to Dapr Core Features: Pub/Sub, Resource Bindings, Actors, Observability, Secrets, and Configuration

This comprehensive technical tutorial demonstrates how to implement and configure core Dapr features, including publish/subscribe messaging, resource bindings, virtual actors, distributed tracing, secrets management, and dynamic configuration, using Java applications deployed on Kubernetes with practical code examples and command-line instructions.

Cloud Native · Dapr · Kubernetes
0 likes · 21 min read
YunZhu Net Technology Team
Apr 15, 2022 · Operations

Design and Architecture of a Cloud‑Native Monitoring Platform for Business Systems

The document outlines the background, vision, current status, technical research, value, product and technical architecture, and functional design of a cloud‑native monitoring platform that integrates SkyWalking and Prometheus to provide comprehensive APM, resource utilization, alerting, and rapid fault localization for business and technical middle‑platform services.

APM · Metrics · Observability
0 likes · 10 min read
Alibaba Cloud Native
Apr 13, 2022 · Cloud Native

From Dapper to OpenTelemetry: A Practical Guide to Distributed Tracing and Observability

This article explains the challenges of long request chains in micro‑service architectures, reviews Google’s Dapper tracing requirements, introduces OpenTracing and OpenCensus standards, compares their strengths, and details how OpenTelemetry unifies tracing, metrics and logs with practical integration steps and best‑practice guidance.

Cloud Native · Distributed Tracing · Metrics
0 likes · 24 min read
DevOps
Apr 12, 2022 · Operations

Understanding Observability: Core Concepts, SRE Methodology, AIOps, and Business Architecture

The article explains the rising importance of observability in modern operations, defines its control‑theory roots, breaks it down into metrics, traces and logs, and argues that successful implementation requires three pillars—SRE practices, AIOps algorithms, and deep business‑architecture knowledge—together with well‑designed SLOs and critical‑path mapping.

Observability · SRE · aiops
0 likes · 10 min read
Alibaba Cloud Native
Apr 3, 2022 · Cloud Native

How to Achieve Full Observability for Performance Testing with Prometheus

This guide explains the essential observability concepts—metrics, logs, and traces—for performance testing, compares Zabbix and Prometheus, shows how to extend JMeter with a Prometheus exporter, and details step‑by‑step integration of Alibaba Cloud PTS and Grafana dashboards for comprehensive monitoring.

Cloud Native · Observability · Prometheus
0 likes · 9 min read
SQB Blog
Apr 2, 2022 · Operations

Designing a Next‑Gen Observability Platform: From Zipkin to Hera

This article chronicles the evolution of a company's monitoring system from a Zipkin‑based tracing solution to a cloud‑native observability platform called Hera, detailing design goals, technology choices, challenges with MySQL storage, and the adoption of Prometheus‑compatible metrics, Jaeger tracing, and Kubernetes operators.

Distributed Tracing · Observability · Prometheus
0 likes · 22 min read
Aikesheng Open Source Community
Mar 29, 2022 · Databases

Performance Tuning and Observation Techniques for dble Using BenchmarkSQL

This article shares practical configuration recommendations, system‑resource monitoring methods, and thread‑adjustment strategies for optimizing dble performance during BenchmarkSQL TPC‑C style load testing, highlighting how observable metrics guide effective tuning of the middleware and underlying MySQL nodes.

BenchmarkSQL · Observability · thread optimization
0 likes · 10 min read
StarRocks
Mar 28, 2022 · Backend Development

Scaling Microservice Tracing with Zipkin and StarRocks: A Practical Guide

This article explains how Sohu Smart Media built a high‑performance tracing system for microservices by integrating Zipkin for data collection with StarRocks for storage and analytics, covering architecture, data models, SQL queries, Flink processing, and real‑world results that boost observability and engineering efficiency.

Flink · Microservices · Observability
0 likes · 31 min read
Open Source Linux
Mar 18, 2022 · Operations

Evolution of Open‑Source Monitoring Tools: From Nagios to Prometheus

This article traces the development of open‑source monitoring solutions from early tools like Nagios and Cacti through modern platforms such as Prometheus and Nightingale, comparing their strengths, weaknesses, and typical use cases while also looking ahead to emerging observability trends in cloud‑native environments.

Nagios · Observability · Operations
0 likes · 14 min read
Open Source Linux
Mar 8, 2022 · Operations

Master Kubernetes Troubleshooting: The Three Pillars Every Engineer Needs

This article breaks down Kubernetes troubleshooting into three essential steps—understanding the failure, managing the response, and preventing recurrence—while mapping key monitoring, observability, and incident‑response tools to each phase for reliable cloud‑native operations.

Kubernetes · Observability · Operations
0 likes · 8 min read
政采云技术
Mar 1, 2022 · Cloud Native

Introduction to Dapr: Features, Architecture, and Installation Guide

This article introduces Dapr, a cloud‑native sidecar runtime for building resilient microservices, explains its core features such as service invocation, state management, pub/sub, bindings, actors, observability, and secrets, and provides step‑by‑step installation instructions for CLI, binaries, Kubernetes, and Helm.

Cloud Native · Dapr · Installation
0 likes · 10 min read
Alibaba Cloud Native
Feb 28, 2022 · Cloud Native

How to Observe and Diagnose DNS Failures in Kubernetes Clusters

This article explains how DNS operates inside Kubernetes, enumerates common failure causes, describes CoreDNS's built‑in observability plugins, introduces BPF‑based client‑side diagnostics, and provides a step‑by‑step troubleshooting workflow to identify and resolve DNS issues in cloud‑native environments.

BPF · CoreDNS · DNS
0 likes · 18 min read
21CTO
Feb 24, 2022 · Backend Development

42 Hard‑Earned Lessons for Building Reliable Production Databases

This article translates Mahesh Balakrishnan’s 42‑point guide on building production databases, covering customer focus, project management, design principles, code review practices, strategy, observability, and research, offering concrete advice for engineers and teams creating robust backend systems.

Code review · Observability · Production Systems
0 likes · 12 min read
Laravel Tech Community
Feb 20, 2022 · Backend Development

Highlights of .NET 7 Preview 1: Nullable Annotations, Observability, Code Generation, and New APIs

The article outlines the major features of .NET 7 Preview 1, including nullable annotations for Microsoft.Extensions libraries, enhancements to tracing APIs, code‑generation improvements, dynamic PGO and Arm64 support, p/invoke source generation, new System.Text.Json APIs, and expanded hot‑reload capabilities.

Nullable Annotations · Observability · code-generation
0 likes · 5 min read
Ctrip Technology
Feb 17, 2022 · Operations

Evolution and Architecture of the Hickwall Enterprise Monitoring Platform

The article details the background, challenges, multi‑year evolution, current architecture, and future roadmap of Hickwall, Ctrip's enterprise‑grade monitoring and observability platform, covering metrics, logs, traces, high‑cardinality handling, cloud‑native integration, alert governance, and storage engine migrations.

Alerting · Observability · Operations
0 likes · 15 min read
Baidu Tech Salon
Jan 27, 2022 · Cloud Native

How China Unicom’s Service Mesh Evolved: From SDKs to Sidecars and Beyond

This article details China Unicom Software Research Institute's multi‑year journey of adopting Kubernetes‑based service mesh, outlining the evolution from SDK‑driven microservices to sidecar‑based architectures, migration strategies with Baidu, performance optimizations, observability enhancements, and future product roadmaps.

Cloud Native · Istio · Kubernetes
0 likes · 13 min read
ITFLY8 Architecture Home
Jan 26, 2022 · Operations

Mastering Microservice Monitoring, Fault Tolerance, and Security: A Complete Guide

This article explains how to monitor microservice architectures through logs, traces, and metrics, and compares open‑source tracing tools. It outlines fault‑tolerance strategies such as timeouts, rate limiting, degradation, asynchronous buffering, and circuit breaking, and details access‑security mechanisms including gateway authentication, service‑side authentication, and OAuth 2.0 token flows. It also introduces container technology and its role in microservice deployment.

Containers · Microservices · Observability
0 likes · 43 min read
MaGe Linux Operations
Jan 22, 2022 · Cloud Native

Boost Kubernetes Monitoring: Migrate from Prometheus to Thanos for Scalable Low‑Cost Metrics

This article examines the limitations of a standard Prometheus‑based monitoring stack on Kubernetes, explains how adopting Thanos improves metric retention and reduces infrastructure costs, and provides a detailed multi‑cluster deployment guide with Terraform, TLS configuration, and Grafana visualization.

Kubernetes · Observability · Prometheus
0 likes · 16 min read
Efficient Ops
Jan 20, 2022 · Operations

Mastering Prometheus Metrics: Best Practices for Effective Monitoring

This article outlines practical guidelines for designing Prometheus metrics, covering how to define monitoring targets, choose appropriate vectors and labels, name metrics and labels correctly, select histogram buckets, and leverage Grafana features to visualize and troubleshoot data effectively.

Grafana · Metrics · Observability
0 likes · 11 min read
Baidu Geek Talk
Jan 12, 2022 · Backend Development

Serverless Architecture Evolution: Baidu Search Content Platform's FaaS and Intelligent Transformation

Baidu’s search content platform transitioned to a serverless, FaaS‑based architecture with intelligent scheduling and automated control, cutting resource waste by 87%, boosting automatic recovery to 96.7%, and delivering roughly tenfold productivity gains across development, deployment, and maintenance while simplifying scalability and high‑availability concerns.

FaaS · Intelligent Scheduling · Observability
0 likes · 27 min read
Java High-Performance Architecture
Jan 12, 2022 · Cloud Native

Mastering Service Mesh with Istio: A Hands‑On Guide to Traffic, Security, and Observability

This tutorial explains the fundamentals of service mesh, explores Istio’s architecture and core components, and provides step‑by‑step instructions for installing Istio on Kubernetes, deploying a sample microservice application, and leveraging traffic management, mutual TLS, observability, and advanced use cases such as routing, circuit breaking, and JWT‑based access control.

Istio · Kubernetes · Observability
0 likes · 22 min read
IT Architects Alliance
Jan 7, 2022 · Cloud Native

Introduction to Service Mesh and Istio: Concepts, Architecture, and Practical Deployment

This tutorial explains the fundamentals of service mesh, outlines Istio’s architecture and core components, demonstrates how to install and configure Istio on Kubernetes, and showcases common use cases such as traffic management, security, observability, and alternatives, providing a comprehensive guide for modern micro‑service deployments.

Istio · Microservices · Observability
0 likes · 18 min read
Architect
Jan 5, 2022 · Cloud Native

Introduction to Service Mesh and Istio: Concepts, Architecture, and Hands‑On Guide

This tutorial explains the fundamentals of service mesh, outlines Istio’s architecture and core components, demonstrates how to install Istio on Kubernetes, and walks through practical examples such as traffic routing, security policies, observability, and common use‑cases, while also comparing alternative solutions.

Istio · Kubernetes · Microservices
0 likes · 20 min read
Tencent Cloud Developer
Dec 23, 2021 · Cloud Native

An Overview of OpenTelemetry: Origins, Architecture, and Instrumentation

OpenTelemetry unifies tracing, metrics, and logs by merging OpenTracing and OpenCensus into a cross‑language specification, collector, language SDKs, and instrumentation libraries, offering vendor‑agnostic, low‑maintenance telemetry collection that separates data gathering from business logic while requiring external back‑ends for storage and analysis.

Cloud Native · Collector · Instrumentation
0 likes · 10 min read
Qingyun Technology Community
Dec 22, 2021 · Cloud Native

What’s New in KubeSphere 3.2.1? Key Features, Fixes, and Upgrade Guide

Version 3.2.1 of the open‑source KubeSphere platform introduces a series of enhancements—including container group status filtering, improved image builder dialogs, expanded quota visibility, numerous UI bug fixes, and updated DevOps pipelines—alongside detailed installation and upgrade instructions for Linux and Kubernetes environments.

Cloud Native · KubeSphere · Kubernetes
0 likes · 8 min read
21CTO
Dec 20, 2021 · Cloud Native

Why Cloud‑Native Architecture Is the Future of SaaS and How to Implement It

This article explains what cloud‑native architecture is, why it is essential for modern SaaS businesses, and provides a step‑by‑step guide—including serverless migration, elasticity, observability, resilience, and automation—on how to adopt it using Alibaba Cloud SAE and related services.

Microservices · Observability · SaaS
0 likes · 22 min read
Java Architecture Diary
Dec 13, 2021 · Backend Development

Essential Java & Cloud Native Resources: From JDK 17 to GraalVM, Spring & More

This curated collection gathers essential articles and tutorials covering Java 8‑17 updates, GraalVM performance tricks, Spring Native adoption, Spring Cloud and RSocket alternatives, GraphQL frameworks, observability stacks like Grafana, Prometheus and Loki, IDE enhancements, database fundamentals, and low‑code platform building, providing a comprehensive knowledge base for modern backend developers.

Observability · databases · graalvm
0 likes · 4 min read
Tencent Cloud Middleware
Dec 9, 2021 · Cloud Native

Why Observability Is the Missing Piece for Day‑2 Success in Cloud‑Native and Serverless Systems

The article explains how observability—through logs, metrics, and traces—transforms the opaque, complex day‑2 operations of micro‑service, Kubernetes, and serverless environments into a deterministic, diagnosable system, highlighting OpenTelemetry, practical collection methods, and real‑world implementation challenges and benefits.

Observability · OpenTelemetry · Serverless
0 likes · 17 min read
Alibaba Cloud Native
Dec 7, 2021 · Cloud Native

Unlocking the Third Way of Distributed Tracing: Post‑Aggregation Link Analysis Explained

This article introduces the third, post‑aggregation approach to link tracing—link analysis—showing how real‑time aggregation of stored trace data can quickly pinpoint uneven traffic, single‑machine failures, slow interfaces, business‑level traffic shifts, and gray‑release anomalies while outlining its practical constraints.

APM · Cloud Native · Link Analysis
0 likes · 11 min read
Laravel Tech Community
Dec 2, 2021 · Cloud Native

New Features in Apache APISIX 2.11.0: LDAP Authentication, Observability Plugins, Azure Functions, and WASM Support

Apache APISIX 2.11.0 adds an LDAP‑based authentication plugin, expands observability with Datadog and SkyWalking plugins, introduces Azure Functions integration, provides early WASM support, and enhances existing plugins, all illustrated with detailed configuration examples and code snippets.

Azure Functions · LDAP · Observability
0 likes · 8 min read
GrowingIO Tech Team
Dec 2, 2021 · Cloud Native

Mastering Chaos Mesh: A Hands‑On Guide to Cloud‑Native Chaos Engineering

Chaos Mesh is an open‑source cloud‑native chaos engineering platform that lets you experiment with fault injection across Kubernetes environments, offering visual dashboards, extensive fault types, and step‑by‑step installation and experiment creation guides to help teams uncover system weaknesses and improve resilience.

Chaos Mesh · Fault Injection · Kubernetes
0 likes · 12 min read
Baidu Geek Talk
Nov 24, 2021 · Operations

How Baidu’s Fengjing Uses Holographic Logs to Debug Massive Microservices

Baidu’s Fengjing monitoring platform tackles the daunting challenge of pinpointing failures in its massive Java‑based microservice ecosystem by employing a non‑intrusive probe that captures log metadata, stores it in a database, and reconstructs full request‑level logs with minimal storage overhead.

Distributed Tracing · Microservices · Observability
0 likes · 9 min read
Efficient Ops
Nov 16, 2021 · Operations

How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes

This article explains why monitoring is essential for production stability, compares white‑box and black‑box approaches, and provides a step‑by‑step guide to deploying Prometheus, configuring scrape targets, using Pushgateway and Alertmanager, and scaling the solution with Thanos in a Kubernetes environment.

Alertmanager · Observability · Prometheus
0 likes · 21 min read
Code Ape Tech Column
Nov 15, 2021 · Operations

A Comprehensive Guide to Using Apache SkyWalking for Distributed Tracing, Logging, and Performance Analysis

This article introduces Apache SkyWalking as a powerful open‑source APM solution, compares it with Spring Cloud Sleuth + Zipkin, explains its architecture, walks through server and client setup, data persistence, log collection, performance profiling, and alert configuration, and provides practical code snippets and configuration examples.

Distributed Tracing · Observability · SkyWalking
0 likes · 14 min read
Open Source Linux
Oct 31, 2021 · Operations

Designing Effective Metrics: From Requirements to Labels and Buckets

This guide explains how to define, name, and organize monitoring metrics—covering Google’s four golden indicators, system‑specific measurement objects, vector selection, label conventions, bucket design, and practical Grafana tips—for reliable observability of diverse services.

Metrics · Observability · labeling
0 likes · 10 min read
Top Architect
Oct 17, 2021 · Cloud Native

How Redis Simplifies Microservice Design Patterns, Distributed Transactions, and Observability

This article explains how Redis can be used to implement and simplify a wide range of microservice design patterns—including bounded contexts, asynchronous messaging, orchestrated sagas, transaction inboxes, telemetry, event sourcing, CQRS, and shared data—while improving performance, scalability, and observability in cloud‑native architectures.

CQRS · Cloud Native · Microservices
0 likes · 16 min read
Alibaba Cloud Native
Oct 10, 2021 · Cloud Native

How to Detect Service and Workload Anomalies in Kubernetes with Advanced Monitoring

This article explains the common pain points of locating anomalies in Kubernetes environments and presents a multi‑layer monitoring framework—trace, metrics, events, and alerts—along with best‑practice scenarios such as network performance, DNS issues, full‑link stress testing, external MySQL access, and multi‑tenant architectures.

DNS · Kubernetes · Metrics
0 likes · 20 min read
21CTO
Sep 27, 2021 · Cloud Native

Why Loki Beats ELK for Kubernetes Logging: Architecture, Deployment, and Query Guide

This article explains the motivation behind choosing Loki over heavyweight ELK/EFK stacks for container‑cloud logging, outlines Loki's lightweight architecture and components, provides step‑by‑step deployment instructions on OpenShift/Kubernetes, and demonstrates how to query logs using the LogQL language and HTTP API.

Cloud Native · Kubernetes · LogQL
0 likes · 17 min read
21CTO
Sep 26, 2021 · Backend Development

How Baidu’s Hulk Framework Accelerates Go Service Development

The Hulk framework, built on GDP2, provides a business‑oriented Go web development platform with out‑of‑the‑box components, standardized architecture, rich observability, and tooling that together improve code quality, development speed, and SRE efficiency for large‑scale short‑video services.

Backend · Framework · Go
0 likes · 18 min read
Top Architect
Sep 24, 2021 · Cloud Native

Loki Log System Overview, Architecture, and Deployment Guide

This article introduces Loki, a lightweight log aggregation system for Kubernetes, explains its background and motivations, details its simple architecture and core components (Distributor, Ingester, Querier), discusses scalability and storage options, and provides step‑by‑step deployment instructions with example YAML and shell commands.

Cloud Native · Deployment · Kubernetes
0 likes · 16 min read
IT Architects Alliance
Sep 20, 2021 · Operations

Why Loki Beats ELK for Kubernetes Logging: Architecture and Deployment Guide

This article explains the motivations behind choosing Loki over ELK for container‑cloud logging, details Loki's lightweight architecture—including Distributor, Ingester, and Querier components—covers deployment steps on OpenShift/Kubernetes with YAML manifests, and demonstrates LogQL query syntax for efficient log retrieval.

Kubernetes · LogQL · Loki
0 likes · 18 min read
Alibaba Cloud Native
Sep 16, 2021 · Cloud Native

How to Use Kubernetes Monitoring for End-to-End Application Architecture Exploration

This session explains why Kubernetes monitoring is essential for end-to-end observability, describes the five data sources and layers it covers, and walks through discovering and locating architecture, performance, resource, scheduling, and network issues using topology, anomaly detection, and correlation techniques.

Architecture · Cloud Native · Kubernetes
0 likes · 13 min read
IT Architects Alliance
Sep 15, 2021 · Backend Development

Comprehensive Guide to Backend Architecture: Microservices, Service Mesh, Messaging, and Observability

This article provides a detailed overview of modern backend architecture, covering microservice fundamentals, design principles such as Conway's Law and DDD, gateway patterns, communication protocols, service registration, configuration management, observability pillars, service mesh options, and a comparative analysis of popular message‑queue technologies.

Microservices · Observability · backend-architecture
0 likes · 27 min read
HomeTech
Sep 15, 2021 · Backend Development

How ASF Simplifies gRPC‑to‑Go Migration and Boosts Service Governance

This article explains the AutoHome Service Framework (ASF), its architecture, how it enables seamless migration from gRPC to Go services, the added Dubbo‑go support, configuration optimizations, advanced load‑balancing strategies, observability enhancements, and future plans for adaptive balancing and zero‑downtime deployments.

Go · Microservices · Observability
0 likes · 18 min read
Dada Group Technology
Sep 10, 2021 · Operations

Design and Implementation of JD Daojia Log System Based on Loki

This document details the motivation, architecture, components, query language, and deployment of a Loki‑based log collection and analysis platform for JD Daojia, comparing it with ELK, describing ingestion, real‑time and historical log handling, technical challenges, configuration examples, and future scaling plans.

Grafana · Log Management · Loki
0 likes · 15 min read
Baidu Intelligent Testing
Sep 9, 2021 · Cloud Native

Observability Practices in Baidu Search Platform: Real‑time Metrics, Tracing, Logging, and Topology at Hundred‑Billion Scale

This article explains how Baidu's search middle‑platform adopts cloud‑native observability—covering metrics, distributed tracing, log querying, and topology analysis—to ensure high availability, performance, and controllability for a system handling hundreds of billions of requests across millions of micro‑service instances.

Observability · logging · topology
0 likes · 12 min read
Efficient Ops
Sep 5, 2021 · Operations

Why Prometheus’s TSDB Makes Monitoring Scalable: A Deep Dive

This article explains how Prometheus’s time‑series database handles massive monitoring data, illustrates practical query examples, and shows why its storage engine and pre‑computation features enable efficient, high‑performance observability for large‑scale services.

Observability · Prometheus · TSDB
0 likes · 8 min read
DevOps
Aug 31, 2021 · Backend Development

Designing an Uber‑Like Microservice System with DDD, OpenTelemetry Observability, and Reinforced Chaos Engineering

This article describes how to model a complex Uber‑style ride‑hailing system using Domain‑Driven Design, implement it with Java Spring Boot microservices, instrument it with OpenTelemetry for full observability, and validate the observability pipeline through a gamified chaos‑engineering approach that reduces MTTR.

DDD · Microservices · Observability
0 likes · 13 min read
Open Source Linux
Aug 26, 2021 · Cloud Native

Why Switch from Prometheus to Thanos? Boost Metric Retention & Cut Costs

This article explains the limitations of a traditional Prometheus‑based monitoring stack for Kubernetes, demonstrates how integrating Thanos improves metric retention, scalability, and storage cost, and provides a complete multi‑cluster deployment example with Terraform and Helm configurations.

Cloud Native · Kubernetes · Observability
0 likes · 15 min read
21CTO
Aug 20, 2021 · Backend Development

How the Hulk Framework Boosts Go Service Development and Operations

This article explains the background, design, components, ecosystem, and real‑world benefits of the Hulk Go web framework developed by the short‑video R&D team, showing how it improves development efficiency, code quality, performance, observability, and incident response for large‑scale microservices.

Go · Observability · Web framework
0 likes · 19 min read
MaGe Linux Operations
Aug 14, 2021 · Operations

Boost System Reliability: 4 Proven Practices to Master Observability

This article explains why observability is essential for DevOps, outlines four key practices—including production‑environment monitoring, structured logging, a DevOps‑focused culture, and pre‑deployment observability with remote debugging—to help teams detect, diagnose, and prevent issues throughout the software lifecycle.

CI/CD · Culture · DevOps
0 likes · 9 min read
Java Architecture Diary
Aug 11, 2021 · Operations

Unlock Loki v2.3.0: Custom Retention, Deletion & Recording Rules Explained

Version 2.3.0 of Loki introduces enhanced features such as per‑tenant custom retention policies, time‑range log deletion via the Compactor API, Prometheus‑style recording rules, a new pattern parser for LogQL, ingestion sharding for faster queries, and advanced IP‑matching syntax, all aimed at improving storage efficiency, compliance, and observability.

Log Management · LogQL · Loki
0 likes · 9 min read
DevOps
Aug 11, 2021 · Operations

Introduction to Chaos Engineering – Part 2: Four Steps for Disrupting Complex Systems

This article explains that chaos engineering is not a magic cure but a disciplined practice for testing distributed systems by designing and running controlled experiments, outlining four essential steps—observability, defining steady state, hypothesizing events, and executing experiments—to gain confidence in system resilience.

Observability · Operations · chaos engineering
0 likes · 11 min read
Code Ape Tech Column
Jul 27, 2021 · Cloud Native

Understanding Loki: Advantages, Architecture, Installation, and Query Practices

This article explains Loki's low‑index overhead, concurrent query handling, tag‑based indexing, component roles, read/write paths, step‑by‑step installation of Promtail and Loki, label matching techniques, dynamic‑tag handling, high‑cardinality concerns, and query optimization strategies for cloud‑native log aggregation.

Cloud Native · Loki · Observability
0 likes · 13 min read
Tencent Cloud Developer
Jul 22, 2021 · Operations

Observability in Serverless Environments: Monitoring, Logging, Distributed Tracing, and Best Practices

In this talk, Gal Bashan explains how serverless architectures complicate observability and why metrics, logs, and especially distributed tracing with tools like OpenTelemetry, Jaeger, or commercial platforms are essential for gaining end-to-end visibility, automating instrumentation, and maintaining reliable, business-focused services across cloud providers.

Cloud Native · Distributed Tracing · Observability
0 likes · 12 min read
Alibaba Cloud Native
Jul 19, 2021 · Operations

Scaling Distributed Observability: A Case Study of ARMS Front‑End Monitoring at a Kids Coding Platform

This article details how a rapidly growing Chinese children's programming platform tackled the complexity of distributed system observability by adopting SkyWalking, Prometheus, and Alibaba Cloud ARMS front‑end monitoring, achieving faster fault detection, reduced operational workload, and improved user experience.

ARMS · Distributed Systems · Microservices
0 likes · 12 min read
High Availability Architecture
Jul 15, 2021 · Operations

Baidu Game Microservice Monitoring Practice and System Design

This article describes Baidu's comprehensive approach to monitoring game microservices, covering the background, initial monitoring tools, evolution of the monitoring system, systematic design for risk control, intelligent detection, alarm optimization, efficient fault localization, and future outlook for high‑availability architecture.

Baidu · Game Development · Microservices
0 likes · 13 min read
Top Architect
Jul 7, 2021 · Backend Development

Design and Implementation of a High‑Concurrency API Gateway

This article details the architecture and implementation of a high‑concurrency API gateway built on RxNetty, covering request routing, conditional routing, API management, rate limiting, circuit breaking, security policies, monitoring, tracing, and future enhancements within a microservices environment.

Microservices · Observability · api-gateway
0 likes · 11 min read
Baidu Geek Talk
Jul 5, 2021 · Operations

Automated and Intelligent Analysis of Baidu Search Stability Issues

The team automated Baidu Search fault diagnosis by building a side‑index for instant log lookup, streaming incremental analysis, exhaustive rule templates, feature‑engineering pipelines, query‑scene reconstruction, entropy‑based ranking, per‑second timeline views, and chaos‑engineered fault injection, achieving near‑99% accuracy and second‑level, module‑granular stability tracing.

Observability · Search Stability · chaos engineering
0 likes · 15 min read
Programmer DD
Jul 1, 2021 · Operations

Why Loki Beats Elasticsearch: Low Index Overhead, Fast Queries, and Easy Setup

This article explains Loki's advantages over Elasticsearch, including low indexing overhead, concurrent query processing with caching, seamless integration with Prometheus and Grafana, detailed architecture components, installation steps, label handling, high‑cardinality challenges, and best practices for efficient log management.

Elasticsearch · Grafana · Loki
0 likes · 15 min read