Tagged articles
578 articles
Airbnb Technology Team
Sep 19, 2024 · Mobile Development

How Airbnb Instruments Android Apps to Capture User‑Centric Performance Metrics

Airbnb’s Android Page Performance Score (PPS) framework instruments fragments to collect user‑centric metrics such as TTFL, TTIL, MTH, ALT and RCLT, using a standardized logging config, LoadableView interface, and visibility algorithms, enabling detailed performance analysis and automated alerts for mobile teams.

Android, Instrumentation, Metrics
10 min read
Architect
Sep 13, 2024 · Operations

Introducing MyPerf4J: A High‑Performance Java Monitoring and Statistics Tool

The article presents MyPerf4J, a Java‑agent based, low‑overhead performance monitoring library that provides real‑time metrics such as method latency, QPS, memory usage, GC statistics, and class loading, along with quick‑start instructions, configuration details, and open‑source links for Java backend services.

Backend, Java, JavaAgent
7 min read
Architect
Sep 12, 2024 · Operations

How Bilibili Scaled Its Monitoring: From Prometheus OOMs to VictoriaMetrics & Flink Pre‑Aggregation

The article details Bilibili's evolution of its monitoring platform, describing the stability and performance challenges of a Prometheus‑Thanos stack, the redesign using VictoriaMetrics, collection‑storage separation, unit‑level disaster recovery, query‑tree auto‑replacement, Flink‑based pre‑aggregation, Grafana upgrades, and future roadmap for observability.

Cloud Native, Flink, Metrics
30 min read
Airbnb Technology Team
Sep 9, 2024 · Mobile Development

Implementing Page Performance Score on iOS: PPSStateMachine, Metrics, and Instrumentation

Airbnb’s iOS Page Performance Score implementation introduces a PPSStateMachine that ties a UIViewController’s lifecycle to metric collection—tracking first layout, initial load, scroll thread hangs, additional and rich content load times—using nanosecond timestamps, state‑machine protocols, and view‑responder discovery to emit standardized performance logs.

Instrumentation, Metrics, Mobile Development
9 min read
Sohu Tech Products
Sep 5, 2024 · Backend Development

Instrumentation of gRPC in OpenTelemetry: Adding Request Size Metrics via Byte‑Buddy

The new OpenTelemetry Java instrumentation adds client and server request‑size metrics to gRPC by injecting a tracing interceptor via Byte‑Buddy bytecode enhancement, extracting payload sizes from protobuf messages, recording them with custom attributes and histograms, and applying analogous handler‑based logic for Go.

ByteBuddy, Instrumentation, Java
12 min read
Bilibili Tech
Aug 9, 2024 · Operations

Design and Optimization of Monitoring 2.0 Architecture with VictoriaMetrics and Flink

The new Monitoring 2.0 architecture separates collection, compute and storage, adopts VictoriaMetrics for compact time‑series storage and a zone‑based scheduler, introduces push‑based ingestion, uses Flink for real‑time pre‑aggregation and automatic PromQL rewrite, delivering ten‑fold query speedups, sub‑300 ms p90 latency, and dramatically higher write and query throughput.

Flink, Metrics, Observability
29 min read
phodal
Aug 8, 2024 · Artificial Intelligence

How to Design an AI‑Assisted Software Engineering Framework for Any Team

This article provides a comprehensive, step‑by‑step guide to designing, prototyping, and continuously improving an AI‑assisted software engineering (AI4SE) framework, covering goal definition, pain‑point identification, technology selection, cross‑disciplinary team building, metric evaluation, and real‑world examples for teams of all sizes.

AI integration, AI4SE, Metrics
19 min read
Liangxu Linux
Aug 1, 2024 · Operations

Essential Operations Metrics Every IT Team Should Track

This guide outlines key operational metrics—availability, failure rate, MTTR, MTBF, response time, throughput, error rate, capacity utilization, latency, data integrity, and more—explaining their calculations, typical benchmark values, and practical application areas to help organizations monitor and improve IT performance.

Availability, MTTR, Metrics
6 min read
Spring Full-Stack Practical Cases
Jul 25, 2024 · Backend Development

Master Spring Cloud Gateway: Routing, Metrics, Filters & Debugging Tips

This guide walks through Spring Cloud Gateway features—including marking exchanges as routed, enabling route metrics, configuring metadata, accessing Reactor Netty logs, troubleshooting with debug logging, disabling automatic route refresh, ordering global filters, using Actuator sub‑paths, and rewriting request parameters—complete with code snippets and configuration examples.

Debugging, Filters, Metrics
8 min read
Alibaba Cloud Native
Jul 24, 2024 · Cloud Native

How to Observe and Optimize LLM Applications with Alibaba Cloud ARMS

This article explains the challenges of deploying large language model (LLM) applications, outlines the need for end‑to‑end observability, and details Alibaba Cloud ARMS' LLM‑specific tracing, metrics, and Python agent solutions for monitoring, debugging, and performance optimization.

AI, LLM, Metrics
20 min read
Continuous Delivery 2.0
Jul 24, 2024 · R&D Management

Understanding DORA Metrics and Their Business Implications

The article explains the four core DORA metrics—deployment frequency, change failure rate, mean time to recovery, and lead time for changes—highlighting their focus on speed and stability, their limited business relevance, and proposes using leading indicators and broader measures to align engineering performance with business outcomes.

DevOps, DoRA, Engineering management
5 min read
Airbnb Technology Team
Jul 16, 2024 · Frontend Development

Airbnb Web Performance Metrics and Their Measurement

Airbnb tracks five real‑user performance metrics—Time To First Contentful Paint, Time To First Meaningful Paint, First Input Delay, Total Blocking Time, and Cumulative Layout Shift—using a mix of browser APIs and custom polyfills, combines them into a weighted Page Performance Score, and leverages that score to guide trade‑off decisions and detect regressions.

Airbnb, Metrics, Web Performance
11 min read
Airbnb Technology Team
Jul 2, 2024 · Frontend Development

Airbnb Page Performance Score (PPS): Multi‑Platform Metrics, Weighting, and Evolution

Airbnb created the Page Performance Score (PPS), a unified 0‑100 metric that aggregates platform‑specific initial‑load and post‑load user‑centric measurements for Web, iOS, and Android, using weighted curves to enable cross‑page, cross‑team comparisons, track organizational weighted averages, and evolve with new metrics while preserving a stable scale.

Airbnb, Metrics, Page Performance Score
10 min read
Efficient Ops
Jul 1, 2024 · Cloud Native

How to Monitor Business Metrics with Prometheus in Kubernetes

This article explains the concept of observability, details Prometheus metric definitions and types, and provides Go code examples for exposing, defining, generating, and scraping business‑level metrics in a Kubernetes‑based cloud‑native environment.

Go, Kubernetes, Metrics
11 min read
DataFunTalk
Jul 1, 2024 · Big Data

JD Retail Metric Middle Platform: Architecture, Semantic Layer, Production, Governance and Practical Cases

This article presents JD Retail’s metric middle‑platform practice, describing the background problems of legacy metric systems, the four‑step solution framework, the overall architecture, semantic‑layer construction with the 4W1H method, configurable metric production, acceleration techniques, governance mechanisms, achieved results and future plans.

Metrics, big-data, data-platform
19 min read
DevOps Operations Practice
Jun 17, 2024 · Operations

Key DevOps Metrics: Deployment Frequency, Lead Time, Change Failure Rate, MTTR, and Customer Satisfaction

This article explains essential DevOps metrics—including deployment frequency, lead time for changes, change failure rate, mean time to recovery, and customer satisfaction—detailing why they matter, how to measure them, and practical practices to improve each metric for more efficient and reliable software delivery.

Change Failure Rate, DevOps, Lead Time
9 min read
Alibaba Cloud Observability
Jun 13, 2024 · Cloud Native

Kickstart Your Observability Journey with Alibaba Cloud Monitoring

This guide introduces new Alibaba Cloud users to the fundamentals of cloud observability, explaining the metric‑trace‑log stack, the layered monitoring pyramid, and step‑by‑step how to set up out‑of‑the‑box resource monitoring, dashboards, alerts, and advanced integration options.

Alibaba Cloud, Cloud Native, Metrics
7 min read
Baidu Geek Talk
Jun 12, 2024 · Big Data

Event Tracking Governance and Logging Platform Solutions

The article explains event tracking, its data‑quality challenges, and presents a logging platform that enforces quality standards, an end‑to‑end online workflow, and specialized design, testing, and validation tools—including extended field types—to govern, monitor, and improve tracking point compliance across applications.

Data Quality, Metrics, event tracking
13 min read
Efficient Ops
Jun 4, 2024 · Operations

How Huya Unified Its Monitoring Platform with OpenTelemetry for Zero‑Cost Integration

This article details Huya's transition from fragmented, non‑standard monitoring solutions to a unified OpenTelemetry‑based platform, covering project background, pain points, design decisions, SDK architecture, data pipeline, storage, alerting, root‑cause analysis, and future plans, highlighting the benefits of standardization and zero‑cost service integration.

Huya, Metrics, Observability
13 min read
Qunhe Technology Quality Tech
Jun 3, 2024 · Frontend Development

How to Diagnose and Optimize Frontend Performance in 2D Design Tools

This article outlines the challenges of front‑end performance troubleshooting for a 2D design tool, proposes systematic approaches for identifying issues, describes monitoring metrics such as load time and frame rate, and presents real‑world case studies demonstrating effective optimization and baseline management.

Metrics, Performance Monitoring, Web Optimization
11 min read
DataFunSummit
May 22, 2024 · Operations

Building an Observability System: Practices and Solutions from Yanhuang Data

This article explains how to build a robust observability system for cloud‑native microservice architectures, detailing the three core signals—metrics, traces, and logs—common challenges such as complexity and data silos, and presents Yanhuang Data’s integrated platform with unified data collection, storage, analysis, and visualization solutions.

Kubernetes, Metrics, Observability
23 min read
Tencent Cloud Developer
May 21, 2024 · Operations

Why Prometheus Metrics Aren’t 100% Accurate – The Hidden Trade‑offs Explained

The article analyzes why Prometheus sometimes returns inaccurate metric values, revealing the design trade‑offs that favor efficiency over precision, and walks through common pitfalls in rate/increase calculations, histogram P99 estimation, and practical recommendations for choosing scrape intervals and query windows.

Histogram, Metrics, Observability
20 min read
Model Perspective
May 13, 2024 · Fundamentals

How to Identify and Quantify Core Variables for Better Decision‑Making

The article explains why pinpointing core variables is crucial, outlines domain‑knowledge and technical methods such as sensitivity analysis and data mining to discover them, and describes practical ways to turn those variables into quantitative indicators like scoring systems, composite indices, and real‑world examples.

Metrics, core variables, data mining
10 min read
iKang Technology Team
May 11, 2024 · Operations

How to Conduct Full‑Stack Load Testing for Reliable Production Systems

Full‑link load testing evaluates the performance of an entire application stack—from user interface to databases—by simulating real‑world traffic, isolating test data, verifying security and SLA thresholds, measuring key metrics such as throughput and response time, and comparing tools like tcpcopy and goreplay to ensure system stability and scalability.

Load Testing, Metrics, Tool comparison
7 min read
Data Thinking Notes
May 9, 2024 · Big Data

How to Build an Effective Indicator System: From Concept to Productization

This article explores the complete lifecycle of an indicator system—from defining metrics and addressing common ambiguities, through designing concept consensus, semantic layers, mechanisms, and governance, to productizing platforms, optimizing development, and envisioning future AI‑driven enhancements.

Big Data, Data Platform, Indicator System
22 min read
Sohu Tech Products
Apr 17, 2024 · Operations

Developing an OpenTelemetry Extension for Pulsar Java Client Metrics

The article walks through building a custom OpenTelemetry Java‑agent extension for Pulsar client metrics—migrating from SkyWalking, setting up a Gradle project, using ByteBuddy to instrument methods with advice, registering gauge metrics, packaging the jar, handling common class‑loader pitfalls, and configuring deployment via the OpenTelemetry operator.

Extension, Instrumentation, Java
14 min read
Tencent Cloud Developer
Apr 2, 2024 · Backend Development

tRPC Scaffolding Tooling and Observability Best Practices for Tencent Docs Backend

By introducing the unified tRPC scaffolding tool trpcx and embedding OpenTelemetry‑generated observability configurations, the Tencent Docs backend team streamlined service creation, standardized directory structures, migrated metrics and logs to ClickHouse for cost‑effective performance, and established best‑practice workflows that dramatically improve development speed and fault‑diagnosis efficiency.

Backend Development, ClickHouse, Metrics
18 min read
DeWu Technology
Apr 1, 2024 · R&D Management

Scaling Agile at DeWu: The Type‑P Framework and PMO Evolution

The article details how DeWu’s technology organization scaled from a few hundred to over a thousand engineers by adopting the custom Type‑P framework—emphasizing value‑orientation, small‑step rapid iteration, bi‑weekly sprints, unified domain‑level agile processes, metric‑driven governance, and an evolving PMO that shifted from throughput‑first to value‑first objectives.

Agile Scaling, Metrics, PMO
16 min read
DataFunSummit
Apr 1, 2024 · Big Data

DataOps at ByteDance: Challenges, Implementation, and Future Outlook

This article examines ByteDance's DataOps journey, detailing the data‑engineering challenges faced, the concrete solutions and productization through the DataLeap platform, the metrics and best‑practice framework adopted, and the future directions involving AI‑assisted development and open‑source collaboration.

Big Data, Data Platform, Metrics
20 min read
DevOps Operations Practice
Mar 25, 2024 · Operations

How to Monitor MySQL with Prometheus and Grafana

This tutorial explains how to install the MySQL Exporter, configure Prometheus to scrape MySQL metrics, set up Grafana dashboards for visualization, and define alerting rules for common MySQL performance indicators, providing a complete end‑to‑end monitoring solution.

Alerting, Exporter, Grafana
5 min read
Architect's Alchemy Furnace
Mar 16, 2024 · R&D Management

How to Boost R&D Efficiency: Strategies, Pitfalls, and a Golden Triangle Framework

This article explores the concept of R&D efficiency, outlines its goals, debunks common misconceptions, and presents a practical framework—including practice, platform, and measurement components—supported by visual models to help technology organizations improve delivery speed, quality, reliability, and sustainability.

DevOps, Metrics, R&D efficiency
25 min read
58UXD
Mar 15, 2024 · Product Management

When Data Beats UX: Balancing B2B Design Metrics and User Experience

This article examines a real‑world case where a B2B product team prioritized video‑quality metrics over user experience, discusses the resulting trade‑offs, and offers a step‑by‑step framework for aligning business goals with optimal UX design.

B2B, Data-driven, Metrics
8 min read
Didi Tech
Mar 12, 2024 · Big Data

Understanding Flink Metrics System: Core Concepts, Elastic Design, and Practical Usage

The article explains Flink’s metrics architecture—core concepts, reporter interfaces, built‑in and custom metric types, elastic plugin design, and scheduled reporting—illustrated with a consumption‑latency example, and shows how Didi uses these metrics for real‑time UI curves, alerts, and intelligent task diagnosis.

Big Data, Flink, Metrics
11 min read
Efficient Ops
Mar 3, 2024 · Operations

Mastering Prometheus: From Metrics Collection to Alerting and Visualization

This comprehensive guide explains Prometheus' architecture, metric collection models, storage format, query language (PromQL), alerting workflow, configuration reload methods, metric types, custom exporters, and how to visualise data with Grafana, providing a complete end‑to‑end monitoring solution.

Grafana, Metrics, Observability
21 min read
Efficient Ops
Feb 19, 2024 · Operations

Mastering Prometheus: Practical Tips for Effective Application Monitoring

This article explains how to design and implement Prometheus metrics for application monitoring, covering the selection of monitoring targets, golden metrics, label conventions, naming rules, histogram bucket choices, and Grafana visualization tricks to help engineers build reliable observability pipelines.

Grafana, Metrics, Observability
10 min read
DataFunTalk
Feb 8, 2024 · Big Data

Design and Practice of Ant Group's Metric System

This talk by Ant Group’s senior technical expert Wang Gaohang details the definition, design, mechanism, productization, and future outlook of the company’s metric system, covering concept consensus, semantic layers, workflow, AI assistance, performance optimization, and practical case studies.

AI, Big Data, Data Platform
28 min read
DaTaobao Tech
Jan 29, 2024 · Cloud Native

Observability: Logging, Metrics, and Tracing in Distributed Systems

Observability in distributed systems combines event logging, aggregated metrics, and request tracing—each offering distinct trade‑offs in detail, storage, and overhead—and while the ELK stack dominates log and metric handling, tracing solutions such as EagleEye and SkyWalking differ by protocol and language, prompting many teams to adopt unified, cloud‑native platforms like Alibaba Cloud’s Log Service for lower cost, real‑time analysis and simplified management.

ELK, Metrics, Observability
32 min read
MaGe Linux Operations
Jan 25, 2024 · Operations

Mastering Monitoring: From Concepts to Prometheus in Operations

This article explains monitoring fundamentals, distinguishes black‑box and white‑box approaches, outlines key metrics and their aggregation, and provides a comprehensive guide to Prometheus architecture, data model, query language, and practical examples for CPU, memory, and disk usage monitoring.

Metrics, Observability, Prometheus
18 min read
DataFunTalk
Jan 21, 2024 · Cloud Native

Building a System Observability Framework with YHP: Practices, Challenges, and Integrated Solutions

This article explains how YHP enables cloud‑native systems to achieve comprehensive observability by defining the three core signals—metrics, traces, and logs—addressing common enterprise pain points, and presenting an integrated platform that unifies data collection, storage, analysis, and visualization for efficient fault diagnosis and performance monitoring.

Cloud Native, Data Platform, Metrics
22 min read
dbaplus Community
Jan 2, 2024 · Operations

How Xiaohongshu Scaled Its Metrics System Tenfold with Cloud‑Native Architecture

Facing exploding metric volumes, high resource consumption, and fragile operations, Xiaohongshu's observability team completely rebuilt its metrics pipeline using VictoriaMetrics, achieving ten‑fold performance gains, minute‑level scaling, high‑availability, cost reduction, and robust multi‑cloud active‑active deployment while preserving data safety and query speed.

Metrics, Observability, Prometheus
34 min read
Weimob Technology Center
Dec 26, 2023 · Operations

Rebuilding Our APM: Scalable Metrics & Alerts with VictoriaMetrics & VMAlert

This article details the complete redesign of our internal APM system, covering the motivations, architecture choices, metric collection pipeline, integration of VictoriaMetrics and VMAlert, metric and alert design principles, implementation steps, visualizations, performance gains, and future plans for scaling and SaaS‑ification.

APM, Alerting, Metrics
17 min read
Architecture and Beyond
Dec 23, 2023 · R&D Management

What Is R&D Efficiency and How to Systematically Improve It

The article defines R&D efficiency as the effective use of human and time resources to deliver high‑quality software quickly, and outlines a systematic, multi‑dimensional approach—including culture, structure, architecture, process design, engineering systems, and measurement—to enhance it.

Continuous Delivery, Metrics, R&D efficiency
11 min read
Data Thinking Notes
Dec 21, 2023 · Product Management

Mastering Growth Metrics: Methodologies, Frameworks, and Real‑World Cases

This article explains Douyin’s growth‑analysis methodology, how to construct a comprehensive growth‑metric system with North‑Star indicators and hierarchical metric layers, the end‑to‑end analysis loop, new scenario‑driven metric applications, and a detailed case study on improving video‑submission rates.

AB testing, Growth, Metrics
24 min read
Xiaohongshu Tech REDtech
Dec 14, 2023 · Cloud Native

Evolution of Xiaohongshu Metrics System: Cloud‑Native Observability, High Availability, and Performance Optimizations

Xiaohongshu’s observability team rebuilt its Prometheus‑based metrics platform using vmagent, dual‑active HA clusters, query push‑down, high‑cardinality governance and multi‑cloud active‑active design, delivering ten‑fold collection speed, up to 70× query capacity, massive CPU‑memory‑storage savings and fully automated scaling.

Metrics, Time Series, VictoriaMetrics
35 min read
DataFunTalk
Dec 14, 2023 · Fundamentals

Evaluating Long-Term vs Short-Term Effects in A/B Experiments

While A/B testing is widely used for data-driven decisions, short-term experimental results often diverge from long-term impacts, leading to misguided strategies; this article examines why such inconsistencies arise and reviews major methods—including extended experiments, holdout groups, post‑analysis, CCD, and surrogate‑metric modeling—to reliably estimate long‑term effects.

A/B testing, Data Science, Long-term impact
13 min read
FunTester
Nov 28, 2023 · Operations

How to Adopt a DevOps Culture: Custom Strategies, CI/CD, Automation & Metrics

This article outlines the essential steps for embracing DevOps culture, emphasizing tailored strategies, deep understanding of CI/CD, clear role assignments, extensive automation, key performance metrics, and the critical role of quality assurance to achieve faster, reliable software delivery.

Automation, Culture, DevOps
9 min read
Data Thinking Notes
Nov 23, 2023 · Big Data

How Data-Driven Metrics Transform Product Analytics and Decision-Making

This article explains how to build a data‑driven metric system—from defining end‑to‑start metrics and combining business and data drivers, to applying statistical analysis, machine‑learning, causal inference, and practical case studies for alerting, diagnosing, and strategizing product performance.

Data-driven, Metrics, causal inference
22 min read
Python Programming Learning Circle
Nov 22, 2023 · Big Data

E‑commerce User Behavior Analysis and KPI Modeling with Python and SQL

This study analyzes JD e‑commerce operational data from February to April 2018, employing Python and SQL to compute key metrics such as PV, UV, conversion rates, attrition, purchase frequency, time‑based behavior, funnel analysis, retention, product sales, and RFM segmentation, and provides actionable recommendations for improving user engagement and sales performance.

Metrics, RFM, SQL
30 min read
ZhongAn Tech Team
Nov 17, 2023 · Frontend Development

Understanding Interaction to Next Paint (INP) Metric and How to Measure It

This article explains the INP (Interaction to Next Paint) performance metric, its calculation method, satisfaction thresholds, differences from FID, and provides practical guidance on measuring INP using Chrome's CrUX, PageSpeed Insights, the web‑vitals library, Chrome extensions, and custom console scripts.

Chrome, INP, Metrics
11 min read
NetEase Cloud Music Tech Team
Nov 7, 2023 · Operations

How NetEase Cloud Music Built Pylon APM: A Deep Dive into Tracing, Metrics, and Automated Diagnosis

This article details the design and implementation of the Pylon APM monitoring platform for NetEase Cloud Music, covering background challenges, the choice of Pinpoint, extensions to trace models, tail‑based exception sampling, Prometheus integration, automated JStack collection, and the resulting APM product features.

APM, Backend, Java Agent
12 min read
Efficient Ops
Nov 1, 2023 · Operations

How Dongguan Securities Achieved Leading DevOps Maturity with Continuous Delivery Level 3

Dongguan Securities' "Zhangzhengbao" service platform passed the China Academy of Information and Communications Technology's DevOps Continuous Delivery Level 3 assessment, demonstrating how standardized processes, automation, and metric‑driven improvements can boost efficiency, reduce costs, and enhance competitiveness in the financial sector.

Agile Transformation, Continuous Delivery, DevOps
12 min read
Architect
Oct 25, 2023 · Operations

The Importance of Logging and Distributed Log Operations in Modern Architecture

This article explores why logs are essential in software development, outlines when to record them, discusses the value of logging in large-scale distributed systems, and examines the capabilities required of log‑operation tools such as APM, metrics, tracing, ELK, Prometheus, and custom batch querying solutions.

APM, Distributed Systems, ELK
21 min read
Efficient Ops
Oct 24, 2023 · Operations

How to Monitor Business Metrics with Prometheus in Kubernetes

This article explains how to use Prometheus to monitor business‑level metrics in a Kubernetes environment, covering observability fundamentals, metric definitions, metric types, exposing metrics via a /metrics endpoint, and practical Go code examples for defining, recording, and scraping custom metrics.

Go, Kubernetes, Metrics
11 min read
DataFunSummit
Oct 22, 2023 · Big Data

How Kuaishou E‑commerce Leverages OLAP and a Unified Data Architecture to Solve Business Data Challenges

This article explains how Kuaishou's e‑commerce team built a unified OLAP‑based data platform—covering data ingestion, consistent dimensional and fact layers, metric management, and real‑time services—to address rapid growth, metric inconsistency, and operational inefficiencies across multiple business scenarios.

Big Data · Data Architecture · Data Warehouse
0 likes · 20 min read
HomeTech
Oct 18, 2023 · Frontend Development

Web Page Performance Metrics and Optimization Practices

This article explains why web performance matters, introduces key user‑centric metrics such as First Contentful Paint, Largest Contentful Paint and Cumulative Layout Shift, describes how to measure them with tools like Chrome DevTools, Lighthouse and ftwo, and provides practical optimization techniques including gzip, HTTP/2, CDN, image handling, code splitting and Vue router lazy‑loading.

CDN · Metrics · Vue
0 likes · 9 min read
DevOps
Oct 16, 2023 · R&D Management

Integrating OKR with Agile Practices for Effective Value Delivery

This article explains how to combine OKR and agile activities to set measurable business goals, avoid common pitfalls, and create a continuous loop of planning, execution, review, and optimization that aligns strategic objectives with day‑to‑day value delivery in R&D projects.

Continuous Improvement · Metrics · OKR
0 likes · 19 min read
Architects Research Society
Oct 2, 2023 · Product Management

Value Realization in SaaS: A Customer Success Mantra

This article explains how SaaS companies can quantify and deliver customer value by defining, measuring, and optimizing key success metrics such as ROI, adoption rates, time‑to‑value, and lifecycle duration, while emphasizing continuous communication and expectation management throughout the customer journey.

Customer Success · Metrics · ROI
0 likes · 8 min read
DevOps Cloud Academy
Sep 26, 2023 · Operations

DevOps Testing Best Practices: From Traditional Testing to Automated CI/CD Pipelines

DevOps testing integrates continuous, automated testing throughout the software development lifecycle, shifting left from traditional isolated testing, emphasizing automation, appropriate tool selection, metric tracking, documentation, and dedicated test automation engineers to ensure high‑quality, rapid software delivery.

Automation · DevOps · Metrics
0 likes · 10 min read
Liangxu Linux
Sep 24, 2023 · Operations

Understanding Prometheus Metric Types: Counters, Gauges, Histograms, and Summaries

This article explains the fundamentals of metrics, introduces dimensional metrics, compares Prometheus, OpenMetrics, and OpenTelemetry standards, and provides detailed guidance on the four Prometheus metric types—Counters, Gauges, Histograms, and Summaries—including their use‑cases, PromQL queries, and Python client examples.

Counters · Gauges · Histograms
0 likes · 18 min read
MaGe Linux Operations
Sep 13, 2023 · Cloud Native

Mastering Prometheus Metrics: Counters, Gauges, Histograms & Summaries Explained

This article introduces the fundamentals of metrics in IT monitoring, explains the structure of metric data points, explores dimensional metrics, and provides an in‑depth guide to Prometheus metric types—Counters, Gauges, Histograms, and Summaries—along with practical code examples and usage considerations in cloud‑native environments.

Metrics · Prometheus · monitoring
0 likes · 19 min read
Efficient Ops
Sep 12, 2023 · Operations

Understanding Prometheus Metric Types: Counters, Gauges, Histograms & Summaries

This article explains how metrics are used to monitor software performance, introduces basic metric components and dimensional metrics, compares Prometheus, OpenMetrics and OpenTelemetry standards, and provides detailed guidance on Prometheus metric types—Counter, Gauge, Histogram, and Summary—with code examples and query patterns.

Metrics · Observability · Prometheus
0 likes · 18 min read
Continuous Delivery 2.0
Sep 1, 2023 · Operations

Project Health Metrics and Practices in Google’s SRE and Development Process

The article explains how Google measures and improves software quality before release by separating development and operations responsibilities, using monorepo and trunk‑based development, daily release candidates, automated testing, performance benchmarks, and a comprehensive Project Health (pH) metric system that balances speed, reliability, and quality.

Google · Metrics · Operations
0 likes · 11 min read
Sohu Tech Products
Aug 23, 2023 · Operations

Implementing Global Pulsar Client Monitoring with a SkyWalking Plugin

To give the business team a global, application‑level view of Pulsar performance, the team built a SkyWalking Java‑Agent plugin that automatically collects producer and consumer metrics from the Pulsar client, exposing latency, backlog and failure counts via Prometheus without modifying the client code.

Java · Metrics · Prometheus
0 likes · 7 min read
DevOps Cloud Academy
Aug 19, 2023 · Operations

Understanding DevOps Metrics and the Four DORA Indicators

This article explains why measuring software development productivity is challenging, introduces the concept of DevOps metrics, details the four DORA indicators and their performance levels, and discusses additional metrics such as cycle time, quality, customer and employee satisfaction, CI/CD health, uptime, and service level indicators to help teams monitor progress and identify problems.

DevOps · DORA · Metrics
0 likes · 14 min read
DeWu Technology
Aug 18, 2023 · R&D Management

How Dewu’s SQC Model Revolutionizes Quality Assurance in E‑Commerce

This article examines Dewu Technology’s SQC (Supplier‑Quality‑Customer) quality management model, detailing its AI‑driven inspection and authentication, the comprehensive quality assurance system built on mechanisms, processes, methods and tools, the “Quality Month” initiative, the iteration quality score model, automation testing and traffic‑replay platforms, and the measurable improvements in software quality and production reliability.

AI inspection · Metrics · SQC Model
0 likes · 10 min read
Continuous Delivery 2.0
Aug 17, 2023 · Operations

Understanding and Overcoming the DevOps Adoption Gap

The article analyses why many organizations stall during DevOps adoption, explains the underlying causes, discusses metric design and second‑order changes, and proposes organizational and technical strategies to cross the gap and achieve sustainable continuous delivery.

Metrics · Second-Order Change · organizational design
0 likes · 19 min read
dbaplus Community
Aug 14, 2023 · Operations

Designing Business‑Focused Monitoring for Banking Systems: Metrics, Alerts, and Implementation Challenges

The article outlines a practical framework for business‑level monitoring in banking systems, describing three evolution stages, key metrics such as transaction success rates and volume spikes, concrete alert rules, and the technical challenges of data collection, standardization, and massive parameter management.

Alerting · Metrics · Operations
0 likes · 14 min read
Alibaba Cloud Native
Aug 4, 2023 · Backend Development

Unlocking Dubbo3’s Cloud‑Native Observability: A Complete Guide

This article explains how Dubbo3’s new observability starter provides visual cluster metrics, full‑link tracing, multi‑dimensional monitoring, Prometheus/Grafana integration, and log management, offering practical steps and configurations for building a robust cloud‑native microservice observability platform.

Backend · Cloud Native · Metrics
0 likes · 10 min read
58 Tech
Aug 3, 2023 · R&D Management

Design and Implementation of Anjuke's R&D Efficiency Measurement System

This article describes Anjuke's R&D efficiency measurement framework, detailing its quality and efficiency metrics across project phases, the data collection and processing architecture, visualization dashboards, and analysis methods used to monitor and improve development productivity, reliability, and continuous delivery.

Metrics · R&D management · Software quality
0 likes · 15 min read
Data Thinking Notes
Jul 30, 2023 · Fundamentals

Why Data Analysis Is Essential for Product Success: Real-World Payment Case Studies

This article shares practical experience building a payment data analysis system from scratch, explaining why data analysis matters, outlining a five‑stage framework, detailing metric design, and presenting common analytical methods such as funnel, multi‑dimensional, trend, comparison, Pareto, and cross analysis to drive product decisions.

Business Intelligence · Metrics · data analysis
0 likes · 26 min read
dbaplus Community
Jul 27, 2023 · Operations

How to Build Scalable Observability for Cloud‑Native Environments: Lessons from SRE

This article summarizes a technical talk on the challenges of cloud‑native transformation, the design of an application‑centric observability platform using CMDB, Prometheus, Thanos and VictoriaMetrics, practical solutions for high‑cardinality metrics and alerting, and future directions such as eBPF and AI‑driven fault detection.

CMDB · Metrics · Observability
0 likes · 14 min read
Architects Research Society
Jul 25, 2023 · Operations

Six Reasons to Invest in Enterprise Architecture Tools

Enterprise Architecture (EA) tools bring order to sprawling IT environments by cataloguing assets, breaking down silos, providing measurable metrics, automating data collection, and serving as a single source of truth, helping organizations make informed decisions while acknowledging the need for human oversight and data interpretation.

IT Operations · Metrics
0 likes · 8 min read