Tag

Observability

2 views collected around this technical thread.

vivo Internet Technology
Jun 11, 2025 · Big Data

How Vivo Built a Scalable Pulsar Monitoring System for Trillion‑Message Workloads

This article details Vivo's end‑to‑end Pulsar observability solution, covering the challenges of Prometheus‑based monitoring, the architecture of the alerting pipeline, adaptor development, metric optimizations for subscription backlog and bundle load, and fixes for KoP (Kafka-on-Pulsar) lag reporting issues.

Big Data · Metrics · Observability
0 likes · 12 min read
Big Data Technology Tribe
Jun 10, 2025 · Cloud Native

Mastering eBPF Maps: Design, Implementation, and Real‑World Use Cases

This article provides an in‑depth analysis of BPF maps—explaining their design principles, core features, various map types with code examples, and the macro expansion process that turns high‑level BCC helpers into native kernel map definitions for cloud‑native observability.

BCC · BPF maps · Linux Kernel
0 likes · 12 min read
DataFunSummit
Jun 1, 2025 · Big Data

Scaling WeChat’s Big Data and AI Workloads on Kubernetes: Challenges and Optimizations

This article details WeChat's migration of large‑scale big data and AI workloads to a cloud‑native Kubernetes platform, discussing performance bottlenecks, API server and etcd overload protection, scheduler enhancements, observability solutions, resource utilization gains, and future serverless directions.

AI · Big Data · Kubernetes
0 likes · 11 min read
Linux Ops Smart Journey
May 29, 2025 · Cloud Native

Master Kubernetes Monitoring with kube-state-metrics and Prometheus

This guide walks you through deploying kube-state-metrics, configuring Prometheus scrape jobs, verifying metric collection, and adding Grafana dashboards to achieve a visible, manageable, and reliable Kubernetes monitoring solution for large‑scale clusters.

Kubernetes · Observability · Prometheus
0 likes · 7 min read
Java Architecture Diary
May 26, 2025 · Artificial Intelligence

How to Build Enterprise‑Ready AI Monitoring with Spring AI and Micrometer

This article explains why observability is essential for Spring AI applications, outlines common cost‑control and performance challenges, and provides a step‑by‑step guide—including Maven setup, client configuration, service implementation, metric exposure, Zipkin tracing, and architecture insights—to create a fully observable, enterprise‑grade AI translation service.

Java · Micrometer · Observability
0 likes · 12 min read
Efficient Ops
May 7, 2025 · Operations

Why Choose SigNoz for Open‑Source Observability? A Deep Dive

This article introduces SigNoz, a self‑hosted open‑source observability platform that unifies metrics, logs, and traces, outlines its core capabilities, shows how to install it with Docker, and compares its resource efficiency to commercial solutions like DataDog and Elastic.

Logs · Metrics · Observability
0 likes · 4 min read
macrozheng
May 7, 2025 · Backend Development

What’s New in Spring Boot 3.5? 13 Must‑Know Features for Java Backend Developers

Spring Boot 3.5 introduces a suite of enhancements—including task decorator support, the Vibur connection pool, SSL health metrics, flexible configuration loading, automatic Trace‑ID headers, richer Actuator capabilities, functional programming hooks, and many more—each explained with code examples and practical usage tips for modern Java backend development.

DevOps · Java · Observability
0 likes · 10 min read
Java Architecture Diary
May 6, 2025 · Backend Development

Spring Boot 3.5 Release: Top 13 New Features You Must Know

Spring Boot 3.5 introduces a suite of powerful enhancements—including task decorator support, a new Vibur connection pool, SSL monitoring, flexible environment variable loading, Actuator-triggered Quartz jobs, automatic Trace ID headers, structured log customization, functional routing insights, expanded SSL client support, OpenTelemetry upgrades, Spring Batch tweaks, OAuth 2.0 JWT profiling, and functional bean registration—providing developers with richer capabilities for modern Java backend applications.

Java · Observability · Spring Boot
0 likes · 11 min read
Raymond Ops
Apr 30, 2025 · Cloud Native

Master Loki Logging: Step-by-Step Kubernetes Deployment & Troubleshooting Guide

This comprehensive guide explains Loki's lightweight log aggregation architecture, compares it with ELK, details AllInOne, Helm, Kubernetes, and bare‑metal deployment methods, shows Promtail and Logstash integration, and provides practical troubleshooting tips for common issues.

Helm · Kubernetes · Loki
0 likes · 23 min read
Efficient Ops
Apr 29, 2025 · Operations

Master Linux Performance: Essential Monitoring Tools & Commands

This guide compiles the most important Linux performance analysis utilities—such as vmstat, iostat, dstat, iotop, pidstat, top, htop, mpstat, netstat, ps, strace, uptime, lsof, and perf—explaining their usage, output fields, and how they fit into a comprehensive system observability workflow.

Linux · Observability · Performance Monitoring
0 likes · 15 min read
Efficient Ops
Apr 25, 2025 · Operations

How Changan Auto Earned Top‑Tier DevOps Certification with a Full‑Link Observability Platform

Changan Automobile’s full‑link observability platform passed both ITU DevOps international and domestic standards assessments, showcasing its advanced monitoring capabilities, improved system stability, and strategic role in the company’s digital transformation, while the interview reveals implementation challenges, benefits, and future AI‑driven enhancements.

DevOps · Digital Transformation · Observability
0 likes · 21 min read
Baidu Geek Talk
Apr 23, 2025 · Operations

Baidu SRE Digital Immunity System: Construction, Evolution, and Practice

Baidu’s SRE digital‑immune system, evolved into an AI‑powered intelligent immunity platform, quantifies and mitigates risk across thousands of services by integrating data‑driven monitoring, rule‑based detection, and large‑model GraphRAG knowledge mining, cutting degradation cases by ~40% and shifting operations from reactive troubleshooting to proactive, data‑centric quality assurance.

AI · Digital Immunity · Observability
0 likes · 14 min read
Raymond Ops
Apr 22, 2025 · Operations

What Is OpenTelemetry? A Complete Guide to Modern Observability

OpenTelemetry unifies tracing and metrics by merging OpenTracing and OpenCensus, offering vendor‑neutral APIs, SDKs, and a collector that standardize telemetry data collection, context propagation, and export to various back‑ends, with detailed components such as Tracer, Meter, and shared Context layers.

Metrics · Observability · cloud native
0 likes · 12 min read
Zhuanzhuan Tech
Apr 16, 2025 · Backend Development

Analyzing Log4j2 Asynchronous Logging Blocking and Strategies for Fine-Grained Log Control

This article examines the causes of Log4j2 asynchronous logging blockage in high‑throughput Java services, explains the underlying Disruptor mechanics, and proposes a dual‑track logging architecture with compile‑time bytecode enhancement and IDE plugins for line‑level log activation.

Java · Logging Strategy · Observability
0 likes · 15 min read
ByteDance Cloud Native
Apr 3, 2025 · Operations

How to Seamlessly Integrate CloudWeGo with APMPlus for Full‑Stack Observability

This article explains the challenges of observability in distributed microservice and LLM architectures, introduces CloudWeGo and APMPlus, and provides step‑by‑step integration guides for Kitex, Hertz, and Eino frameworks, including code samples, data reporting methods, and advanced monitoring features such as RED metrics, LLM‑specific indicators, service topology, and future roadmap.

APM · CloudWeGo · Go
0 likes · 13 min read
ByteDance Cloud Native
Mar 27, 2025 · Operations

Taming High Cardinality in AI & Autonomous Driving with Prometheus

This article shares practical experience from Volcengine's managed Prometheus service and its deep integration with large‑model and autonomous‑driving platforms, explaining what high cardinality is, its impact on monitoring systems, root causes, and a range of design, collection, and analysis techniques to mitigate it.

AI · Observability · Prometheus
0 likes · 12 min read
Airbnb Technology Team
Mar 24, 2025 · Artificial Intelligence

Chronon: Open‑Source Feature Platform for Machine Learning – Architecture, Workflow, and Code Examples

Chronon is an open‑source ML feature platform that lets engineers declaratively define, compute, and serve both batch and real‑time features with built‑in observability, data‑quality checks, and a low‑latency retrieval API, ensuring online‑offline consistency while simplifying pipeline management and enabling future automation.

Chronon · Feature Engineering · Observability
0 likes · 13 min read
Rare Earth Juejin Tech Community
Mar 23, 2025 · Frontend Development

Designing Effective Front-End Error Monitoring and Reporting Strategies

This article explains the core value of front‑end error monitoring, outlines key error categories, presents practical code examples for capturing explicit, implicit, resource, promise and framework errors, and proposes a multi‑layer defense strategy to improve observability, response time and team collaboration.

JavaScript · Observability · error monitoring
0 likes · 12 min read
360 Zhihui Cloud Developer
Mar 20, 2025 · Operations

Unlocking Application Reliability: Core APM Modules and Yunzhou’s OpenTelemetry Design

This article explains Application Performance Monitoring (APM), its key benefits such as business continuity, performance optimization, and cost reduction, outlines essential APM modules, and details Yunzhou Observation’s OpenTelemetry‑based design, data ingestion, processing, visualization, and future roadmap for observability.

APM · Observability · OpenTelemetry
0 likes · 10 min read
Tencent Cloud Developer
Mar 19, 2025 · Cloud Native

Kubernetes Monitoring: Why It’s Needed, Core Components, and Metric Exposure

Monitoring Kubernetes is essential to detect resource contention, component failures, and network issues; it involves tracking core component metrics such as API server latency, etcd write times, scheduler delays, as well as node‑level CPU, memory, disk, and network statistics, pod health, and custom application metrics exposed via Prometheus exporters for comprehensive observability.

Exporters · Kubernetes · Metrics
0 likes · 23 min read