Tagged articles
577 articles
Spring Full-Stack Practical Cases
May 19, 2026 · Backend Development

Why Logs Alone Fail in Spring Boot: Achieving True Observability

The article explains that relying solely on log statements in Spring Boot applications cannot reveal request identities, latency, async task health, failure details, or cross‑service flows, and demonstrates how to augment logs with MDC correlation IDs, Micrometer metrics, and Zipkin tracing for comprehensive observability.

Observability · logging · metrics
0 likes · 9 min read
AI Engineer Programming
May 2, 2026 · Artificial Intelligence

From Demo to Production: How to Evaluate RAG Effectively

This guide outlines a comprehensive RAG evaluation framework covering failure modes, multi‑layer metrics, test‑set construction, open‑source tools, CI/CD quality gates, production monitoring, and special considerations for agentic RAG to ensure reliable, trustworthy retrieval‑augmented generation systems.

AI · Generation · LLM
0 likes · 18 min read
Alibaba Cloud Native
Apr 26, 2026 · Cloud Native

Seeing Inside Hermes: Full Visibility into Agent Execution with OpenTelemetry

The article introduces Alibaba Cloud's Hermes observability plugin built on OpenTelemetry, which transforms the previously opaque AI agent runtime into a fully traceable system by recording every reasoning step, tool invocation, token usage, latency, and security event, enabling precise cost attribution, performance analysis, and audit of high‑risk behaviors.

AI Agent · Hermes · Observability
0 likes · 13 min read
Smart Workplace Lab
Apr 19, 2026 · Industry Insights

How to Turn AI-Boosted Productivity into Visible Performance Metrics

This article presents a practical framework for documenting AI‑enhanced work contributions, introducing a weekly performance‑evidence matrix that quantifies decision density, risk interception, and asset accumulation, along with communication scripts tailored to different manager types and step‑by‑step SOPs for archiving proof, helping professionals turn speed gains into measurable performance value.

AI · Evidence · metrics
0 likes · 7 min read
PMTalk Product Manager Community
Apr 10, 2026 · Artificial Intelligence

Why AI Product Evaluation Is Hard and How to Build a Scientific Assessment Framework

The article analyzes the unique challenges of evaluating AI products—output uncertainty, subjective criteria, over‑fitting risk, high cost, and vague metrics—compares traditional testing with AI testing, proposes a five‑step evaluation workflow, defines concrete metrics such as pass rate and efficiency gain, and illustrates the process with a real‑world sales‑script generation case study, concluding with five key success factors and future trends.

AI Evaluation · Automation · Case Study
0 likes · 13 min read
AI Step-by-Step
Apr 8, 2026 · Operations

How to Light Up the Black Box of LLM Agents with Full‑Stack Observability

The article explains why traditional logs are insufficient for LLM agents, outlines five observability dimensions—tracing, metrics, behavioral governance, state & memory, and evaluation—and provides concrete, open‑source‑based steps to instrument, monitor, and act on agent workloads in production.

Behavioral Governance · LLM agents · Observability
0 likes · 11 min read
Alibaba Cloud Native
Apr 5, 2026 · Operations

How OpenClaw CMS Plugin v0.1.2 Turns Agent Tracing into Precise, Cost‑Effective Observability

The OpenClaw CMS observability plugin v0.1.2 solves the hidden‑trace problem by fully restoring multi‑round LLM execution, stabilizing concurrent chains, and introducing granular agent metrics, enabling developers, testers, and operators to debug faster, assess costs accurately, and improve cross‑team collaboration.

Agent · Cloud Native · Observability
0 likes · 8 min read
AgentGuide
Apr 3, 2026 · Artificial Intelligence

How to Evaluate RAG Systems: Key Metrics and the Ragas Framework

The article explains how to assess Retrieval-Augmented Generation (RAG) projects using the Ragas automated evaluation framework, detailing four key dimensions—recall quality, answer faithfulness, answer relevance, and context utilization—and describes the underlying metrics for both retrieval and generation stages.

LLM · RAG · RAGAS
0 likes · 5 min read
DevOps Coach
Mar 26, 2026 · Industry Insights

Which DevOps Metrics Will Drive Business Success by 2026?

The article analyzes how traditional DevOps activity metrics are being replaced by outcome‑focused indicators that directly affect cost, delivery speed, reliability and overall business performance, citing New Relic and Flexera forecasts and outlining the metrics teams should adopt or discard by 2026.

DevOps · DoRA · FinOps
0 likes · 13 min read
Big Data Tech Team
Mar 18, 2026 · Big Data

From Zero to One: Building Enterprise Data Standards for Data Warehouses

This guide explains why data standards are essential for data warehouses, outlines the four categories of standards, and provides a step‑by‑step process—including research, framework design, template creation, review, implementation, and ongoing maintenance—to help practitioners and interviewees establish robust, business‑aligned data standards.

Data Standardization · Data Warehouse · metrics
0 likes · 10 min read
Woodpecker Software Testing
Mar 15, 2026 · R&D Management

Shift‑Left Testing: Transforming Teams from Reactive Bug‑Fixers to Proactive Quality Architects

The article explains how shift‑left testing evolves from a simple early‑testing tactic into a comprehensive team transformation that embeds quality into every stage of software delivery, detailing new roles, metrics, toolchains, and practical steps for test experts to become quality architects.

Quality Engineering · Shift-Left Testing · Team Transformation
0 likes · 8 min read
PMTalk Product Manager Community
Mar 15, 2026 · Product Management

7-Step Architecture Framework for AI Product Management: A Hands‑On Case Study

This article walks through a real‑world AI‑driven image generation system for cross‑border e‑commerce, detailing business pain points, stakeholder analysis, technical selection, MVP scope, architecture decisions, metric funnels, gray‑release strategy, and continuous evolution that cut per‑image cost to under ¥0.5 and delivery time to one minute.

AI · Case Study · architecture
0 likes · 16 min read
PMTalk Product Manager Community
Mar 13, 2026 · Product Management

How AI Product Managers Should Rethink Funnel Analysis

In the AI era, the classic funnel of exposure‑click‑register‑retain‑pay no longer reflects value creation, so product managers must shift their focus to effective task entry, first usable results, mid‑funnel adoption, retention of high‑impact tasks, and stable commercial metrics.

AI · Funnel Analysis · Growth
0 likes · 24 min read
Architect-Kip
Mar 4, 2026 · Operations

Essential SRE Monitoring and Alerting Standards: From Metrics to Incident Response

This guide outlines comprehensive SRE monitoring and alerting standards, covering core principles, log instrumentation, health‑check requirements, baseline resource and application metrics, alarm severity tiers, response SLAs, on‑call rotation, continuous optimization, and noise‑reduction mechanisms to ensure reliable service operation.

Alerting · Operations · SRE
0 likes · 14 min read
DeWu Technology
Mar 2, 2026 · Big Data

Mastering Spark UI: Deep Dive into Metrics, Tuning, and Real‑World Cases

This article provides a comprehensive guide to Spark UI, explaining each primary and secondary tab, the key metrics they expose, and how to interpret them for performance bottleneck detection, followed by two detailed case studies and practical tuning recommendations for Spark workloads.

Big Data · Case Study · Spark
0 likes · 19 min read
Alibaba Cloud Native
Mar 2, 2026 · Artificial Intelligence

How to Make AI Agents Auditable and Controlled with OpenClaw, SLS, and OTEL

This article explains how to combine OpenClaw session logs, application logs, and OpenTelemetry metrics in Alibaba Cloud SLS to answer who triggered an AI agent, what actions were taken, how much it cost, and whether the behavior is traceable, enabling a complete observability and security solution for AI agents.

AI Agent · OTEL · Observability
0 likes · 26 min read
Woodpecker Software Testing
Mar 1, 2026 · Artificial Intelligence

Four Hidden Model Evaluation Pitfalls That Undermine AI Deployments

The article examines four common yet hidden model evaluation mistakes—confusing attractive metrics with business impact, using static test sets, ignoring statistical significance, and lacking fine‑grained attribution—illustrating each with real‑world cases and offering concrete practices to build a more robust, business‑aligned evaluation pipeline.

A/B testing · AI deployment · Model Evaluation
0 likes · 8 min read
Yunqi AI+
Feb 22, 2026 · R&D Management

Rethinking Product Development: How AI Reshapes the Value Stream, Not Just Code Speed

The article analyzes how AI has evolved from a code‑completion aid to a foundational operating system that forces product‑research teams to redesign the entire requirement‑to‑delivery value stream, outlining practical boundaries, pilot implementation, organizational role changes, metric shifts, and risk governance.

AI · R&D management · Software Engineering
0 likes · 17 min read
dbaplus Community
Feb 8, 2026 · Databases

Why Oracle AWR Is the Gold Standard for DB Performance and How Domestic Databases Compare

The article explains Oracle's Automatic Workload Repository (AWR) as a comprehensive performance‑diagnostic tool, breaks down its core functions, and then evaluates how several domestic databases such as Kingbase measure up in terms of report completeness, metric richness, SQL analysis, wait‑event handling, OS integration, and usability.

AWR · Domestic Databases · Oracle
0 likes · 21 min read
Raymond Ops
Feb 2, 2026 · Operations

10 Essential PromQL Queries Every Ops Engineer Should Master

This article presents ten practical PromQL query examples covering CPU, memory, disk, network, database, Kubernetes, and business metrics, explains the underlying concepts, provides alert thresholds and best‑practice tips, and includes advanced optimization and alert‑rule design guidance for reliable monitoring.

Alerting · Observability · PromQL
0 likes · 22 min read
Ops Community
Jan 27, 2026 · Operations

Master Linux System Monitoring: Deep Dive into CPU, Memory, and I/O Metrics

This comprehensive guide explains how to collect and analyze Linux system metrics—including CPU usage, memory consumption, disk I/O, and load average—using native /proc and /sys interfaces, popular command‑line tools, and Prometheus Node Exporter, with practical scripts, configuration examples, and troubleshooting case studies for reliable performance monitoring and capacity planning.

Linux · Prometheus · Sysadmin
0 likes · 39 min read
PMTalk Product Manager Community
Jan 18, 2026 · Product Management

Cut Through the Fog: How Product Managers Can Re‑Anchor Value and Evolve

Amid slowing growth and noisy data, product managers face three crises—demand fog, value vacuum, and capability gaps; the article offers a step‑by‑step framework with real‑world cases to clarify user needs, align actions with business goals, strengthen technical and analytical skills, and make data‑driven decisions that turn feature work into measurable value.

User Research · decision making · growth strategies
0 likes · 14 min read
Woodpecker Software Testing
Jan 13, 2026 · User Experience Design

A Complete User Experience Testing Process: From Planning to Implementation

The article outlines a systematic, end‑to‑end UX testing workflow—defining goals, designing test plans, recruiting representative users, preparing materials, calibrating and managing test sessions, collecting quantitative and qualitative data, analyzing results with metrics like SUS and efficiency index, extracting actionable insights, and converting findings into concrete product improvements—highlighting how AI‑driven tools can boost test efficiency and business value.

AI Testing Tools · Product Design · UX Research
0 likes · 7 min read
Programmer DD
Jan 12, 2026 · Artificial Intelligence

5 Counterintuitive Lessons for Evaluating AI Agents Effectively

This article shares five surprising, high‑impact lessons from Anthropic on building robust AI agent evaluation suites, covering early failure‑case collections, recognizing clever “failures,” focusing on outcomes over process, choosing the right success metrics, and the irreplaceable value of human review.

AI Evaluation · Anthropic · agent testing
0 likes · 10 min read
Huolala Tech
Jan 7, 2026 · Operations

How Exemplar Bridges the Last‑Mile Gap in Observability

Facing the “last mile” challenge of correlating metrics, logs, and traces, the article examines common heterogeneous storage architectures, critiques existing Exemplar implementations, and presents HuoLala’s end‑to‑end solution that treats Exemplar as an independent observable dimension, detailing its data model, SDK integration, collector, and interactive visualization.

Exemplar · LogAggregation · Observability
0 likes · 22 min read
Woodpecker Software Testing
Jan 5, 2026 · Backend Development

Five Core Dimensions of Maintainability Testing for Microservice Systems

This article presents a detailed, step‑by‑step guide to maintainability testing, defining five core dimensions—modularization, reusability, analysability, modifiability, and testability—along with their metrics, a relationship model, a comprehensive microservice e‑shop case study, concrete test scenarios, code examples, and best‑practice recommendations for improving software quality and delivery speed.

DevOps · Microservices · architecture
0 likes · 20 min read
Woodpecker Software Testing
Jan 5, 2026 · Operations

Three Core Dimensions of Performance Testing: Time Behavior, Resource Utilization, and Capacity

This article breaks down performance testing into three essential dimensions—time behavior, resource utilization, and capacity—explains their key metrics, demonstrates a detailed e‑commerce flash‑sale case study, and shows how systematic testing and optimization can dramatically improve response times, throughput, and scalability.

JMeter · Load Testing · Performance Testing
0 likes · 12 min read
DevOps Coach
Dec 26, 2025 · Operations

10 Actionable Agile Metrics to Replace Velocity and Deliver Real Value

This article presents ten practical, measurable Agile metrics—each with a problem statement, improvement action, real‑world example, concise code snippet, and baseline—showing how teams can shift from velocity to telemetry that reveals flow, quality, and predictability.

agile · metrics · telemetry
0 likes · 20 min read
DevOps Coach
Dec 22, 2025 · R&D Management

Why We Abandoned Scrum: Inside Our Developer‑Led Delivery Transformation

After discovering that traditional Agile rituals stifled high‑output engineering teams, we rebuilt our workflow around autonomous, domain‑owned squads using GitHub PRs, feature flags, and real‑time metrics, resulting in dramatically faster deployments, fewer incidents, and higher developer satisfaction.

Agile Transformation · Developer-Led Delivery · Flow Engineering
0 likes · 8 min read
Alibaba Cloud Observability
Dec 15, 2025 · Cloud Native

How UModel PaaS API Simplifies Observability Queries with Unified Entity Search

This article explains how the UModel PaaS API abstracts complex observability concepts—such as EntitySet, DataSet, StorageLink, and Filter—into a unified, object‑oriented query interface, offering Table, Object, and metadata modes, code examples, UI and SDK usage, and AI‑agent integration for efficient, low‑maintenance monitoring.

AI Agent · API · Cloud Native
0 likes · 16 min read
PMTalk Product Manager Community
Dec 9, 2025 · Product Management

Real‑World AI Data Analysis Case for Product Managers: Iteration & Optimization

The article shows how product managers can avoid the disappointment of a feature that looks perfect but gets no users by building a complete data‑driven loop that combines user‑behavior and business metrics. It walks through a real e‑commerce recommendation case and outlines data‑collection pitfalls, metric‑design methods, hypothesis‑driven analysis, testing procedures, and concrete steps to turn insights into iterative product improvements.

AI · Case Study · data analysis
0 likes · 33 min read
DevOps Coach
Dec 8, 2025 · Operations

How to Quantify SRE ROI: Turning Reliability Metrics into Business Value

This article explains how SRE leaders can bridge the gap between technical reliability metrics and business outcomes by defining core SRE concepts, applying a step‑by‑step ROI formula, illustrating code‑level impact, avoiding common pitfalls, and looking ahead to AI‑driven reliability forecasting.

BusinessValue · Operations · ROI
0 likes · 10 min read
Ray's Galactic Tech
Nov 26, 2025 · Cloud Native

Mastering Kubernetes Performance Bottlenecks: The Ultimate Troubleshooting Guide

This comprehensive guide walks you through the seven key performance metrics, along with resource, application, and system‑component indicators, and provides step‑by‑step methods, advanced tips, and tool recommendations for diagnosing and resolving Kubernetes performance bottlenecks, from the cluster level down to individual pods.

Cloud Native · Kubernetes · metrics
0 likes · 11 min read
IT Architects Alliance
Nov 25, 2025 · Operations

Making Architecture Decisions Observable with DevOps Monitoring

The article explains how to integrate architecture decision tracking into DevOps monitoring, detailing tagging, multi‑layer metric design, time‑window analysis, automated alerts, reporting, and continuous optimization to turn architectural choices into measurable, data‑driven outcomes.

DevOps · Observability · cloud-native
0 likes · 9 min read
Architecture Digest
Nov 24, 2025 · Operations

Boost Java Service Performance with MyPerf4J: A High‑Speed, Low‑Impact Monitoring Tool

MyPerf4J is an open‑source, high‑performance Java monitoring and statistics tool that uses a JavaAgent for zero‑intrusion, records up to ten million method calls per second with nanosecond precision, and provides real‑time metrics such as QPS, latency percentiles, memory and GC stats, making it ideal for both development and production environments.

Java · JavaAgent · Performance Monitoring
0 likes · 6 min read
Wu Shixiong's Large Model Academy
Nov 20, 2025 · Artificial Intelligence

How to Build a Quantifiable Data Quality Framework for Dynamic Incremental RAG

This article explains why static RAG metrics don’t apply to dynamic pipelines, introduces five essential dimensions—Parseability, Deduplication, Relevance, Chunk Quality, and Freshness—and shows how to combine them into a weighted score that enables monitoring, alerts, and continuous improvement of dynamic RAG systems.

Data Quality · Dynamic RAG · Retrieval Augmented Generation
0 likes · 10 min read
High Availability Architecture
Nov 14, 2025 · Artificial Intelligence

Quantifying AI Programming Efficiency: A Traceable and Measurable System

This article outlines the challenges of tracking AI‑generated code and measuring AI contribution, reviews earlier ad‑hoc methods, and presents a comprehensive solution featuring a VSCode plugin for unified AI dialogue management and a cloud service that quantifies AI impact across projects, teams, and individual developers.

AI · Analytics · VSCode
0 likes · 9 min read
DevOps Coach
Nov 10, 2025 · Operations

How to Use SRE Metrics for Data‑Driven Reliability and Faster Releases

This guide explains the SRE framework—SLA, SLO, SLI hierarchy, golden signals, error budgets, and DORA metrics—showing how to instrument a Python app with OpenTelemetry, query Prometheus, avoid common pitfalls, and adopt a cultural and technical process that balances feature velocity with system stability.

DoRA · Error Budget · Golden Signals
0 likes · 18 min read
Architect
Nov 4, 2025 · Operations

How to Accurately Track API Calls per Minute: 5 Proven Monitoring Strategies

This article explores why precise per‑minute API call statistics are essential for performance bottleneck detection, capacity planning, security alerts, billing, and troubleshooting, and presents five practical implementations—including fixed‑window counters, sliding windows, AOP‑based interception, Redis time‑series storage, and Micrometer‑Prometheus integration—along with their trade‑offs and capacity‑planning guidelines.

API monitoring · Java · Performance Optimization
0 likes · 25 min read
JakartaEE China Community
Nov 4, 2025 · Operations

How Logs, Traces, and Metrics Differ—and Why It Matters

Logs, tracing, and metrics each serve distinct monitoring goals—logs capture discrete events for debugging and audit, traces map request flows to pinpoint performance bottlenecks, and metrics provide time‑series health data; understanding their differences and integrating tools like ELK, OpenTelemetry, Prometheus, and Grafana enables robust observability.

ELK · Grafana · Observability
0 likes · 7 min read
Alibaba Cloud Developer
Oct 27, 2025 · Artificial Intelligence

How to Build a Quantifiable AI Coding Efficiency Metric System

This article explains how, amid the rapid rise of AI‑assisted programming, a scientific and actionable R&D efficiency metric framework was designed, detailing core indicators such as AI code adoption rate, data collection methods, platform architecture, and practical insights from a large‑scale implementation.

AI · MCP · coding
0 likes · 18 min read
Raymond Ops
Oct 12, 2025 · Operations

Master PromQL: From Basics to Advanced Query Techniques

This comprehensive guide walks you through PromQL fundamentals, covering data types, gauge and counter metrics, time‑series concepts, query selectors, offsets, arithmetic and logical operators, vector matching, aggregation functions, and key Prometheus functions such as increase, rate, and histogram_quantile, with practical examples and visual illustrations.

Alerting · PromQL · Prometheus
0 likes · 29 min read
Efficient Ops
Oct 9, 2025 · Operations

Changan Auto’s Dual DevOps Certification: Boosting Delivery Speed and Quality

Changan Automobile’s Gaia platform V3.0 earned both international ITU and domestic DevOps certifications, demonstrating a mature, end‑to‑end DevOps system that dramatically shortened deployment cycles, reduced failure rates, and enhanced automation coverage, while outlining future plans for AI‑driven optimization and broader enterprise adoption.

Automation · Continuous Delivery · DevOps
0 likes · 16 min read
Java One
Sep 21, 2025 · Operations

Mastering Prometheus rate, irate, and increase: When and How to Use Each

This article explains how Prometheus’s rate, irate, and increase functions calculate counter growth rates, handle counter resets, and differ in smoothing and responsiveness, guiding you to choose the appropriate function for monitoring request rates, CPU usage, and other metrics.

Prometheus · increase · irate
0 likes · 7 min read
Efficient Ops
Sep 15, 2025 · Operations

Mastering Prometheus Histograms: From Basics to Advanced Queries

This article explains the fundamentals of Prometheus Histogram metrics, covering data format, metric types, how histograms work as cumulative time series, provides Go code examples for collection, and demonstrates practical queries for rate, bucket analysis, and quantile calculations to monitor service performance.

Go · Histogram · metrics
0 likes · 12 min read
Code Ape Tech Column
Sep 12, 2025 · Operations

Master Grafana & Prometheus: Step‑by‑Step Guide to Build a Full‑Featured Monitoring System

This comprehensive tutorial walks you through installing and configuring Grafana, Prometheus, and related exporters, setting up dashboards, enabling email alerts, and extending monitoring to MySQL, RabbitMQ, Redis, and TiDB, all while providing clear code snippets and practical tips for a robust observability stack.

Alerting · DevOps · Grafana
0 likes · 24 min read
dbaplus Community
Sep 1, 2025 · Operations

How to Keep VictoriaMetrics Stable During Sudden Metric Surges

This article outlines practical strategies for protecting VictoriaMetrics storage under bursty metric traffic, covering communication with business teams, splitting deployments, choosing single‑node versus cluster setups, key monitoring metrics, separate storage for self‑monitoring, the VMUI Explore UI, and techniques for discarding high‑cardinality metrics.

VictoriaMetrics · metrics · monitoring
0 likes · 10 min read
Kuaishou Frontend Engineering
Jul 3, 2025 · Frontend Development

How Kuaishou’s Tianshou Platform Scales Front‑End Quality for Billions of Users

The article reviews the evolution of Kuaishou's Tianshou front‑end quality assurance platform, its layered architecture, distributed scheduler, quality models, measurement functions, DMAIC process, and lessons learned in scaling to billions of DAU, offering a blueprint for building robust front‑end engineering systems.

Scalability · architecture · dmaic
0 likes · 25 min read
Alibaba Cloud Developer
Jun 26, 2025 · Artificial Intelligence

How to Build a Multi‑Dimensional Evaluation Framework for AI‑Powered Data Analysis Platforms

This article outlines the design of a scientific, quantifiable, multi‑dimensional evaluation system for the DataV‑Note intelligent analysis platform, addressing the lack of unified standards and accuracy challenges in AI‑driven data reporting, and proposes concrete metrics, model architecture, and future automation plans.

AI Evaluation · Model Design · data analysis
0 likes · 13 min read
Qiming AI - Digital Management Talk
Jun 23, 2025 · Operations

9 Essential Supply Chain Metrics to Transform Data‑Driven Decisions

This article outlines nine crucial supply‑chain metrics across procurement, production, logistics and overall efficiency, explains their formulas and real‑world examples, and shows how each indicator can be used to identify problems, benchmark performance, and drive data‑driven decision‑making for cost reduction and customer satisfaction.

Data-driven · Logistics · efficiency
0 likes · 12 min read
Qunhe Technology Quality Tech
Jun 12, 2025 · Artificial Intelligence

Boosting CAD & Ad Design Algorithms with a Goldenset Review Platform

The article describes how a custom algorithm review platform, built around goldenset test cases, quantifies and visualizes CAD recognition and advertising design tool outputs, enabling rapid regression testing, objective metric tracking, and efficient manual review, ultimately improving development speed and bug detection rates.

Advertising · CAD · algorithm
0 likes · 12 min read
Alibaba Cloud Observability
Jun 3, 2025 · Cloud Native

How PromQL Copilot Turns Natural Language into Precise Monitoring Queries

PromQL Copilot leverages Alibaba Cloud's observability platform and AI techniques to convert ambiguous natural‑language monitoring requests into accurate PromQL statements, addressing challenges of ambiguity, domain knowledge, and metric coverage while providing generation, explanation, diagnosis, and recommendation features for cloud‑native environments.

AI · Cloud Native · PromQL
0 likes · 12 min read
Architecture Breakthrough
Architecture Breakthrough
May 26, 2025 · R&D Management

How One KPI Can Transform R&D Efficiency: Lessons from TDengine

The article analyzes why overly complex R&D metrics often hinder productivity, proposes aligning indicators with company strategy and culture, and illustrates the approach with TDengine’s single‑KPI model and a three‑metric framework for banking, while also detailing the “Everything as Code” practices that boost development speed and quality.

R&D efficiency · everything as code · industry insight
0 likes · 9 min read
Efficient Ops
May 7, 2025 · Operations

Why Choose SigNoz for Open‑Source Observability? A Deep Dive

This article introduces SigNoz, a self‑hosted open‑source observability platform that unifies metrics, logs, and traces, outlines its core capabilities, shows how to install it with Docker, and compares its resource efficiency to commercial solutions like Datadog and Elastic.

Observability · OpenTelemetry · Operations
0 likes · 4 min read
dbaplus Community
Apr 24, 2025 · Operations

How Ctrip Built a Scalable Observability Platform and AIOps Engine for Millions of Metrics and Logs

This article details Ctrip's end‑to‑end observability platform—covering metrics, logging, and tracing—its architecture, data governance, AIOps capabilities, and practical case studies, while addressing challenges like data volume, alert noise, and metric explosion in a massive micro‑service environment.

Ctrip · aiops · cloud‑native
0 likes · 17 min read
Raymond Ops
Apr 22, 2025 · Operations

What Is OpenTelemetry? A Complete Guide to Modern Observability

OpenTelemetry unifies tracing and metrics by merging OpenTracing and OpenCensus, offering vendor‑neutral APIs, SDKs, and a collector that standardize telemetry data collection, context propagation, and export to various back‑ends. The article details its components, including the Tracer, the Meter, and the shared Context layers.

cloud-native · metrics · telemetry
0 likes · 12 min read
21CTO
Apr 9, 2025 · Operations

9 Must‑Have Container Monitoring Tools and Best Practices for Modern Cloud‑Native Environments

This article reviews nine practical container‑monitoring solutions—from Last9 and Prometheus to Dynatrace and Elastic Observability—detailing their key features, pricing, and why developers prefer them, and then offers comprehensive best‑practice guidance for metrics, tagging, alerts, and advanced observability strategies in Kubernetes‑driven cloud‑native deployments.

Alerting · Cloud Native · DevOps
0 likes · 25 min read
Tencent Cloud Developer
Mar 19, 2025 · Cloud Native

Kubernetes Monitoring: Why It’s Needed, Core Components, and Metric Exposure

Monitoring Kubernetes is essential to detect resource contention, component failures, and network issues; it involves tracking core component metrics such as API server latency, etcd write times, scheduler delays, as well as node‑level CPU, memory, disk, and network statistics, pod health, and custom application metrics exposed via Prometheus exporters for comprehensive observability.

Cloud Native · Exporters · Kubernetes
0 likes · 23 min read
JD Tech Talk
Feb 26, 2025 · Operations

Business Monitoring: Importance, Metric System Design, and Practical Implementation

This article explains the significance of business monitoring, distinguishes technical and business metrics, outlines a step‑by‑step process for building a business metric system, and shares practical experiences, tools, and common pitfalls to help teams improve operational reliability and decision‑making.

Operations · business monitoring · incident management
0 likes · 13 min read
Bitu Technology
Jan 15, 2025 · Operations

Refactoring Playback Error Reporting, Metrics, and Recovery in Tubi Web/OTT Player

The article details how Tubi's Web/OTT team restructured player error reporting, statistical metrics, and unified handling, introduced precise error‑tracking enums, defined new recovery strategies for device decoding, network, and cache issues, and validated their impact through extensive experiments that improved user experience and key business KPIs.

OTT · Operations · Video Streaming
0 likes · 14 min read
Alibaba Cloud Infrastructure
Jan 3, 2025 · Cloud Native

How to Enable LLM Traffic Observability with Alibaba Cloud Service Mesh (ASM)

This guide explains how to use Alibaba Cloud Service Mesh (ASM) to add infrastructure‑level observability for large language model (LLM) traffic, covering custom access‑log fields, new Prometheus metrics for token usage, and adding model dimensions to native Istio metrics, with step‑by‑step commands and configuration examples.

ASM · Kubernetes · LLM
0 likes · 14 min read
Architect
Dec 31, 2024 · Operations

Integrating Prometheus with Spring Boot and Visualizing Metrics Using Grafana

This guide explains how to monitor a Spring Boot application using Prometheus, configure Spring Boot Actuator, run Prometheus (including Docker deployment), set up Grafana for visualizing metrics, and create custom metrics with Micrometer, providing step‑by‑step instructions and code examples.

Actuator · Docker · Grafana
0 likes · 10 min read
Kuaishou Tech
Dec 11, 2024 · Frontend Development

Performance Governance and Optimization of Kuaishou Commercial Frontend Pages

This article presents a comprehensive analysis of page performance issues across Kuaishou's commercial front‑end projects, outlines the challenges of unified governance, B‑end experience measurement, and C‑end web‑native integration, and details the systematic optimization strategies and measurable results that significantly improved user experience and business metrics.

Kuaishou · Web · frontend
0 likes · 23 min read
iQIYI Technical Product Team
Nov 28, 2024 · R&D Management

Advanced Exploration and Practice of Value Delivery in Project Management

At the 12th QECon conference, iQIYI presented a systematic value‑delivery framework that tackles misaligned goals, planning‑execution gaps, and metric deficiencies by using a two‑scenario model for iterative and special projects—defining SMART goals, tight scope control, continuous monitoring, and AI‑driven automation—to accelerate rollout, quantify impact, and guide future integrated, intelligent delivery.

AI · R&D management · Value Delivery
0 likes · 15 min read
Alibaba Cloud Native
Nov 27, 2024 · Cloud Native

How to Add Zero‑Code Observability to Golang Apps with Alibaba’s OpenTelemetry Agent

This guide explains how to use Alibaba’s open‑source Golang Agent to automatically instrument Go applications for tracing, metrics, and log correlation without modifying source code, covering binary download, build replacement for go build, endpoint configuration, and step‑by‑step examples with Docker‑based dependencies and Jaeger visualization.

Agent · Golang · OpenTelemetry
0 likes · 11 min read
DevOps
Nov 17, 2024 · R&D Management

Improving R&D Efficiency: Lessons from a Leading Brokerage’s DevOps Journey

The article distills a senior technology leader’s reflections on boosting R&D efficiency, covering metric design versus KPI, Conway’s law, digital‑transformation prerequisites, platform‑engineering strategies, and practical tools that together reshape collaboration, culture, and value delivery in complex financial software projects.

Collaboration · DevOps · R&D efficiency
0 likes · 12 min read
Linux Kernel Journey
Nov 14, 2024 · Artificial Intelligence

Deep Dive: How DeepFlow Collects Business Metrics for Large‑Model Services

This article explains how China Mobile built a hybrid‑cloud production environment for its customer‑service LLM, using eBPF and WebAssembly plugins from DeepFlow to achieve zero‑intrusion observability, automatically capture full‑stack topology, application/network metrics, and key LLM business indicators such as TTFT, TPOT, and token throughput.

DeepFlow · Grafana · LLM
0 likes · 19 min read
Baobao Algorithm Notes
Nov 4, 2024 · Artificial Intelligence

Uncovering 16 Limits of AI Search Engines and 16 Design Recommendations

A user study with 21 participants reveals sixteen critical limitations of generative AI search engines, maps them to eight quantitative metrics, proposes sixteen design recommendations, and evaluates You.com, Perplexity, and Bing Chat against this framework to highlight current performance gaps.

AI search · Generative Search · LLM
0 likes · 12 min read
Linux Ops Smart Journey
Nov 3, 2024 · Cloud Native

Build a Robust Kubernetes Monitoring System with Prometheus and HAProxy

This guide walks you through setting up a comprehensive Kubernetes monitoring solution—covering component metrics collection, configuring HAProxy for network access, exposing metrics from kube-proxy, Calico, and kube-state-metrics, and integrating everything into Prometheus for reliable cluster health visibility.

Calico · HAProxy · Kubernetes
0 likes · 12 min read
DevOps
Oct 21, 2024 · R&D Management

Constructing an Effective R&D Efficiency Measurement System: Strategies, Models, and Implementation

This article explores how to build a comprehensive R&D efficiency measurement framework by outlining goals, principles, metric dimensions, the GQM method, the E³CI model, implementation steps, data collection, continuous improvement mechanisms, cultural integration, tool support, and common pitfalls, aiming to enhance software delivery speed and quality.

Continuous Improvement · DevOps · E3CI
0 likes · 13 min read
Wukong Talks Architecture
Oct 17, 2024 · Operations

A Retrospective on DevOps System Design and Platform Engineering (2008‑2022)

The author chronicles the development of multiple DevOps systems from 2008 onward, examining their origins, design choices, challenges, and evolution. Topics include CI tools such as CruiseControl, Hudson, and Jenkins, custom plugins, metrics, platform engineering, and the impact of AI, with insights for modern continuous integration and delivery practices.

AI · Automation · DevOps
0 likes · 34 min read
Software Development Quality
Oct 12, 2024 · R&D Management

Essential Agile Metrics for R&D Teams: Boost Delivery & Quality

This article presents a comprehensive set of agile and R&D process metrics—including delivery cycle, team productivity, sprint throughput, integration frequency, technical debt, and test coverage—detailing their definitions, calculation formulas, recommended improvement actions, and normal versus warning ranges to help engineering teams monitor and enhance performance.

R&D · metrics
0 likes · 14 min read
Open Source Linux
Oct 11, 2024 · Operations

Essential IT Operations Metrics: Definitions, Formulas, and Benchmarks

This article explains why operations metrics are vital for businesses, describes how tracking availability, failure rate, MTTR, MTBF, response time, throughput, error rate, capacity utilization, latency, data integrity, backup success, recovery time, security patch time, server and network utilization can improve reliability, reduce costs, and boost competitiveness.

Availability · IT Operations · MTBF
0 likes · 7 min read
ITPUB
Oct 6, 2024 · Operations

Mastering Prometheus Metrics: Practical Best‑Practice Guide for Effective Monitoring

This guide explains how to design and implement Prometheus metrics for application monitoring, covering the selection of monitoring targets, the four golden metrics, system‑specific metric groups, vector and label choices, naming conventions, histogram bucket design, and useful Grafana visualization tips.

Grafana · Operations · Prometheus
0 likes · 9 min read
Sohu Tech Products
Sep 25, 2024 · Cloud Native

Observability Concepts and OpenTelemetry Architecture Overview

Observability turns a black‑box application into an observable system by gathering logs, metrics, and traces: alerts surface anomalies, and trace IDs link each anomaly to the relevant logs. OpenTelemetry standardizes this pipeline with instrumented client agents, a Collector (receivers, processors, exporters), and backend storage. The article also covers Java agents, span propagation, exemplars, and eBPF, and notes that bundles such as SigNoz or OpenObserve let teams choose between a custom OTel stack and a bundled solution.

Cloud Native · Observability · OpenTelemetry
0 likes · 11 min read
Airbnb Technology Team
Sep 19, 2024 · Mobile Development

How Airbnb Instruments Android Apps to Capture User‑Centric Performance Metrics

Airbnb’s Android Page Performance Score (PPS) framework instruments fragments to collect user‑centric metrics such as TTFL, TTIL, MTH, ALT and RCLT, using a standardized logging config, LoadableView interface, and visibility algorithms, enabling detailed performance analysis and automated alerts for mobile teams.

Android · Instrumentation · Mobile Development
0 likes · 10 min read