Tagged articles
168 articles
Alibaba Cloud Native
Nov 4, 2025 · Cloud Native

How to Leverage ARMS Configuration Templates for Flexible Java Monitoring

This article explains the tiered monitoring needs of Java applications, introduces Alibaba Cloud ARMS configuration templates—including built‑in JVM JMX and full APM templates—shows how to create, customize, and apply these templates via the console or YAML labels, and outlines advanced extensions such as deep framework observation, performance profiling, and business‑level metric customization.

APM · ARMS · Configuration Templates
0 likes · 11 min read
Alibaba Cloud Observability
Oct 27, 2025 · Mobile Development

How to Build a Zero‑Intrusion Android Data Collection SDK with Bytecode Instrumentation

This article explores the challenges of traditional Android APM integration and presents a comprehensive, non‑intrusive bytecode instrumentation approach—using Gradle plugins, AGP APIs, and ASM—to automatically collect user behavior, network, performance, crash, and WebView data without modifying application source code.

APM · Android · Gradle
0 likes · 21 min read
Su San Talks Tech
Oct 20, 2025 · Backend Development

6 Proven Ways to Accurately Measure API Latency in Java Applications

This article explains why measuring API response time is crucial for performance optimization, monitoring, and user experience, and presents six practical methods—from simple System.currentTimeMillis() calls to Spring AOP, interceptors, filters, and production‑grade Micrometer/APM tools—complete with code examples, pros, cons, and suitable scenarios.

API latency · APM · Interceptor
0 likes · 23 min read
macrozheng
Sep 2, 2025 · Operations

How to Master Microservice Performance Monitoring with SkyWalking APM

This tutorial walks you through installing SkyWalking, configuring Java agents, tracing microservice calls, profiling performance bottlenecks, creating custom trace annotations, logging with ActiveSpan, and using OpenTracing to achieve fine‑grained observability of Java‑based microservices.

APM · SkyWalking · java
0 likes · 10 min read
Nightwalker Tech
Aug 28, 2025 · Operations

How to Diagnose and Fix E‑commerce Order Failures with Observability, APM, and Distributed Tracing

This article explains the hierarchical relationship between APM, distributed tracing, and observability, walks through a real Double‑11 e‑commerce incident, and demonstrates how a well‑designed observability stack can pinpoint the root cause, apply emergency fixes, and restore system performance within minutes.

APM · Distributed Tracing · Fault Diagnosis
0 likes · 16 min read
ByteDance Cloud Native
Apr 3, 2025 · Operations

How to Seamlessly Integrate CloudWeGo with APMPlus for Full‑Stack Observability

This article explains the challenges of observability in distributed microservice and LLM architectures, introduces CloudWeGo and APMPlus, and provides step‑by‑step integration guides for Kitex, Hertz, and Eino frameworks, including code samples, data reporting methods, and advanced monitoring features such as RED metrics, LLM‑specific indicators, service topology, and future roadmap.

APM · APMPlus · CloudWeGo
0 likes · 13 min read
360 Zhihui Cloud Developer
Mar 20, 2025 · Operations

Unlocking Application Reliability: Core APM Modules and Yunzhou’s OpenTelemetry Design

This article explains Application Performance Monitoring (APM), its key benefits such as business continuity, performance optimization, and cost reduction, outlines essential APM modules, and details Yunzhou Observation’s OpenTelemetry‑based design, data ingestion, processing, visualization, and future roadmap for observability.

APM · Observability · OpenTelemetry
0 likes · 10 min read
Alibaba Cloud Observability
Jan 27, 2025 · Backend Development

How to Quickly Diagnose and Fix CPU & JVM Memory Hotspots in Java Apps

This article explains common CPU and JVM memory bottlenecks in Java applications, shows how to detect abnormal usage with monitoring and alerts, and provides step‑by‑step methods—including top, jstack, flame‑graph analysis, and APM tools—to pinpoint and resolve performance hotspots efficiently.

APM · CPU optimization · JVM Memory
0 likes · 11 min read
Architect
Nov 29, 2024 · Operations

How to Combine SkyWalking and ELK for End-to-End Trace ID Logging

This article explains how to integrate SkyWalking's distributed tracing with an ELK logging stack, embed Trace IDs into logs via SkyWalking layouts or MDC, and use Kibana to query and visualize trace‑linked log data for comprehensive microservice observability.

APM · ELK · Microservices
0 likes · 11 min read
Architect
Nov 15, 2024 · Frontend Development

How Bilibili Built a Scalable Front‑End Error Monitoring System from Scratch

This article details Bilibili's end‑to‑end front‑end error monitoring solution, covering the custom SDK, error capture and classification, unique ID generation, filtering, white‑screen detection, data pipelines, APM visualisation, lifecycle plugins, one‑click alerts, and future roadmap, all backed by real‑world metrics and code examples.

APM · Alerting · Bilibili
0 likes · 34 min read
ITPUB
Oct 27, 2024 · Operations

How to Combine SkyWalking and ELK for End‑to‑End Trace ID Logging

This article explains how to integrate SkyWalking with an ELK stack to embed Trace IDs into logs, compares the capabilities of both platforms, and provides step‑by‑step configurations—including Logback layout and MDC approaches—to achieve full distributed tracing in microservice environments.

APM · ELK · SkyWalking
0 likes · 9 min read
Sohu Tech Products
Aug 14, 2024 · Operations

How to Combine SkyWalking and ELK for End-to-End Trace ID Logging

This article explains why ELK alone lacks Trace ID support, describes the architectures of SkyWalking and ELK, compares their capabilities, and provides step‑by‑step configurations—including a Logback layout and MDC approach—to embed Trace IDs into logs for full distributed tracing.

APM · Distributed Tracing · ELK
0 likes · 10 min read
dbaplus Community
Aug 12, 2024 · Cloud Native

Scaling Cloud‑Native Metric Monitoring: VictoriaMetrics, Flink, and Prometheus in Action

The article details how NetEase Cloud Music redesigned its APM stack by adopting VictoriaMetrics as a Prometheus‑compatible storage, adding Flink‑based pre‑aggregation, a query‑proxy for seamless Metric‑Trace correlation, and Grafana enhancements to achieve low‑cost, high‑performance observability at massive scale.

APM · Cloud Native
0 likes · 14 min read
High Availability Architecture
Jun 28, 2024 · Backend Development

Deep Dive into pfinder: Architecture, Bytecode Enhancement, and Tracing Mechanisms

This article provides a comprehensive technical overview of pfinder, JD's next‑generation APM system, covering its core concepts, feature set, comparison with other tracing tools, bytecode modification techniques using ASM, Javassist, ByteBuddy and ByteKit, Java agent injection via JVMTI and Instrumentation, plugin loading, trace‑ID propagation across threads, and a prototype hot‑deployment capability.

APM · BytecodeInstrumentation · PerformanceTracing
0 likes · 23 min read
JD Cloud Developers
Jun 28, 2024 · Backend Development

How JD’s pfinder Achieves Full‑Stack Java Monitoring with Bytecode Magic

pfinder, JD’s in‑house APM system, provides full‑link monitoring, multi‑dimensional metrics, automatic instrumentation, topology mapping, trace analysis, and AI‑driven fault detection. It relies on bytecode enhancement techniques such as ASM, Javassist, ByteBuddy and ByteKit, and integrates with JVM agents for hot‑deployment and trace propagation.

APM · JVMTI · bytecode instrumentation
0 likes · 18 min read
JD Tech Talk
Jun 28, 2024 · Backend Development

Deep Dive into JD's PFinder: Architecture, Bytecode Instrumentation, and Monitoring Features

This article provides a comprehensive technical overview of JD's self‑built PFinder APM system, detailing its core concepts, multi‑dimensional monitoring capabilities, bytecode‑enhancement mechanisms using ASM, Javassist, ByteBuddy and ByteKit, JVMTI‑based agents, service and plugin loading, trace‑ID propagation across threads, and a prototype hot‑deployment solution.

APM · Agent · JVMTI
0 likes · 18 min read
JD Tech
Jun 21, 2024 · Operations

Deep Dive into pfinder: Architecture, Features, and Bytecode Instrumentation

This article provides a comprehensive technical overview of pfinder, a Java‑based APM system, covering its core concepts, feature set, comparison with other tracing tools, bytecode instrumentation techniques, plugin architecture, trace‑ID propagation across threads, and a simple hot‑deployment implementation.

APM · Agent · BytecodeInstrumentation
0 likes · 22 min read
dbaplus Community
Mar 7, 2024 · Operations

How We Built a Scalable Java‑Agent APM Platform Using Pinpoint

This article details the design and implementation of Pylon APM, a Java‑agent based monitoring platform built on Pinpoint, covering background challenges, architectural decisions, trace‑model extensions, tail‑based sampling, Prometheus integration, automatic JStack collection, and the resulting product features for fast issue diagnosis.

APM · Java Agent · Pinpoint
0 likes · 13 min read
dbaplus Community
Jan 12, 2024 · Operations

How a Financial Firm Built a Scalable Edge‑Stored APM System for Microservices

This article describes how a securities company tackled the challenges of distributed‑system observability by designing and deploying a self‑developed application performance monitoring platform that supports flexible integration, dynamic metric collection, edge storage, and cross‑center synchronization, delivering measurable improvements in monitoring coverage, alert effectiveness, and bandwidth usage.

APM · Distributed Systems · Edge Storage
0 likes · 16 min read
Tencent Cloud Developer
Jan 9, 2024 · Operations

Tencent Cloud APM Full-Link Tracing Implementation and Best Practices

The article explains how Tencent Cloud APM implements full‑link tracing using OpenTelemetry standards, addresses challenges such as protocol compatibility, massive trace storage, and bytecode overhead with solutions like conversion gateways, tail sampling and thread profiling, and showcases best‑practice scenarios for topology analysis, front‑end/back‑end integration, and log‑trace correlation within the broader TCOP observability suite.

APM · Full‑Link Tracing · Observability
0 likes · 11 min read
Weimob Technology Center
Dec 26, 2023 · Operations

Rebuilding Our APM: Scalable Metrics & Alerts with VictoriaMetrics & VMAlert

This article details the complete redesign of our internal APM system, covering the motivations, architecture choices, metric collection pipeline, integration of VictoriaMetrics and VMAlert, metric and alert design principles, implementation steps, visualizations, performance gains, and future plans for scaling and SaaS‑ification.

APM · Alerting · Metrics
0 likes · 17 min read
DevOps Coach
Dec 8, 2023 · Frontend Development

How to Add Elastic RUM Monitoring to a Hugo Site

This guide explains what Elastic Real User Monitoring (RUM) is, outlines its key benefits, and provides step‑by‑step instructions with code snippets for integrating the Elastic RUM JavaScript agent into a Hugo static site, including configuration parameters and how to view the collected data in Kibana.

APM · Frontend · Hugo
0 likes · 14 min read
NetEase Cloud Music Tech Team
Nov 7, 2023 · Operations

How NetEase Cloud Music Built Pylon APM: A Deep Dive into Tracing, Metrics, and Automated Diagnosis

This article details the design and implementation of the Pylon APM monitoring platform for NetEase Cloud Music, covering background challenges, the choice of Pinpoint, extensions to trace models, tail‑based exception sampling, Prometheus integration, automated JStack collection, and the resulting APM product features.

APM · Backend · Java Agent
0 likes · 12 min read
Qunar Tech Salon
Nov 7, 2023 · Big Data

Building and Optimizing a Distributed Tracing System for Qunar Travel: APM Architecture, Performance Bottlenecks, and Solutions

This article details Qunar Travel's end‑to‑end design and optimization of a distributed tracing system within its APM platform, covering architecture choices, log‑collection and Kafka transmission bottlenecks, Flink task tuning, and the business value derived from trace and metric analysis.

APM · Big Data · Distributed Tracing
0 likes · 22 min read
Architect
Oct 25, 2023 · Operations

The Importance of Logging and Distributed Log Operations in Modern Architecture

This article explores why logs are essential in software development, outlines when to record them, discusses the value of logging in large-scale distributed systems, and examines the capabilities required of log‑operation tools such as APM, metrics, tracing, ELK, Prometheus, and custom batch querying solutions.

APM · Distributed Systems · ELK
0 likes · 21 min read
Alibaba Cloud Native
Oct 21, 2023 · Operations

How to Reveal Tracing Blind Spots with Continuous Profiling and Code Hotspots

This article explains the evolution of observability, outlines a step‑by‑step diagnosis workflow using metrics, logs and tracing, highlights the blind spots of traditional tracing, and demonstrates how Alibaba Cloud ARMS continuous profiling and code‑hotspot features can pinpoint slow call‑chain issues in Java applications.

APM · Continuous Profiling · Observability
0 likes · 14 min read
Architect's Guide
Sep 17, 2023 · Operations

Full‑Link Monitoring in Microservice Architectures: Concepts, Requirements, Architecture and Comparison of Zipkin, SkyWalking and Pinpoint

This article explains the need for full‑link monitoring in microservice systems, outlines its goals and functional modules, describes the core data structures of tracing such as Span and Annotation, and provides a detailed comparison of three popular APM solutions—Zipkin, SkyWalking and Pinpoint—covering performance impact, scalability, data analysis, developer transparency, topology visualization and community support.

APM · Full‑Link Monitoring · Microservices
0 likes · 24 min read
FunTester
Aug 25, 2023 · Cloud Native

Introduction to SkyWalking: Architecture, Components, and Performance Tuning for Cloud‑Native Microservices

This article explains the background of cloud‑native microservices, introduces the open‑source SkyWalking observability platform and its core components (Agent, OAP server, storage, UI), and demonstrates how to extend SkyWalking with custom plugins and tune its performance to minimize monitoring overhead.

APM · Cloud Native · Microservices
0 likes · 8 min read
dbaplus Community
Jul 29, 2023 · Operations

Which Distributed Tracing Tool Wins? Dapper, Zipkin, SkyWalking, or Pinpoint

This article examines the challenges of full‑link monitoring in micro‑service architectures, outlines the goals for an APM component, details core functional modules, explains Google Dapper’s Span‑Trace‑Annotation model, and compares Zipkin, SkyWalking, and Pinpoint across performance, scalability, data analysis, and deployment complexity.

APM · Dapper · Distributed Tracing
0 likes · 25 min read
Code Ape Tech Column
Jul 10, 2023 · Operations

Fixing SkyWalking ThreadPool Plugin Enhancement Failure by Making AgentClassLoader a Singleton

This article details the investigation of a SkyWalking thread‑pool plugin enhancement failure caused by multiple AgentClassLoader instances, explains the debugging steps, class‑loading behavior, and provides two practical solutions to ensure proper bytecode instrumentation for ThreadPoolExecutor in Java applications.

APM · AgentClassLoader · Instrumentation
0 likes · 16 min read
dbaplus Community
Jul 6, 2023 · Operations

How Huya Built a Scalable APM Platform for Full‑Stack Observability

Facing explosive growth and increasingly complex distributed services, Huya designed and deployed a custom APM platform that unifies metric, trace, and log collection, provides zero‑cost integration, supports real‑time root‑cause analysis, and offers open APIs for cross‑team empowerment.

APM · Performance
0 likes · 14 min read
Qunar Tech Salon
Jul 5, 2023 · Mobile Development

Long‑Term Client Crash Governance Mechanism at Qunar: Architecture, Detection, and Resolution Strategies

This article describes Qunar's systematic client crash governance framework, covering background challenges, APM‑based fast problem discovery, multi‑level alerting, common‑issue remediation, code‑level fixes for URL and Bundle size crashes, detection tools, code checks, automated testing, and the measurable improvements achieved in Android and iOS stability.

APM · Android · Mobile
0 likes · 19 min read
dbaplus Community
Jul 3, 2023 · Cloud Native

How Qunar Built a Scalable Distributed Tracing System for Cloud‑Native Observability

This article details Qunar's end‑to‑end design and implementation of a distributed tracing platform, covering background, technology selection, architecture, data flow, performance bottlenecks, and concrete solutions such as Flume tuning, Kafka scaling, Flink back‑pressure handling, and JavaAgent instrumentation to achieve high trace connectivity and low failure rates.

APM · Cloud Native · Flink
0 likes · 18 min read
Code Ape Tech Column
Jun 25, 2023 · Operations

Full-Link Monitoring and Distributed Tracing: Principles, Components, and Comparison of Zipkin, Pinpoint, and SkyWalking

This article explains the need for full‑link monitoring in micro‑service architectures, describes its core concepts and components such as spans, traces, and annotations, and compares three popular APM solutions—Zipkin, Pinpoint, and SkyWalking—across performance, scalability, data analysis, and ease of integration.

APM · Distributed Tracing · Pinpoint
0 likes · 24 min read
Qunar Tech Salon
Jun 2, 2023 · Operations

Design and Implementation of a Distributed Tracing System at Qunar: Architecture, Technical Selection, and Performance Optimizations

This article describes the background, technology selection, architecture design, data flow, monitoring, logging, and trace collection mechanisms of Qunar's self‑built distributed tracing system. It analyzes major performance problems such as Flume interruptions, Kafka bottlenecks, and Flink back‑pressure, and presents concrete solutions including sliding‑window throttling, CGroup limits, and JavaAgent instrumentation that improved trace connectivity and system observability.

APM · Distributed Tracing · Flink
0 likes · 18 min read
DaTaobao Tech
May 31, 2023 · Mobile Development

From Intern to Senior Engineer: Lessons on Writing Quality Android Code

This article shares a senior engineer’s journey from internships to three years at Taobao, offering practical advice on writing readable, high‑performance Android code, mastering design principles, handling performance metrics, and maintaining a growth mindset while contributing to a mobile‑focused tech team.

APM · Android · Career Growth
0 likes · 13 min read
Architecture Digest
Apr 4, 2023 · Operations

Understanding Logs, Their Value, and Practices for Observability and Operations

This article explains what logs are, when to record them, their importance in troubleshooting, performance optimization, security monitoring, and business decisions, and describes how centralized logging, metrics, tracing, and tools like ELK, Prometheus, and OpenTracing enable effective observability in modern distributed systems.

APM · Operations · tracing
0 likes · 19 min read
Top Architect
Mar 22, 2023 · Operations

Log Management, Observability, and APM: Concepts, Practices, and Tools

This article explains what logs are, when to record them, their value in large-scale systems, and how to build effective log‑management and observability platforms using APM concepts, including metrics, tracing, ELK, Prometheus, and custom tooling for distributed architectures.

APM · ELK · Observability
0 likes · 20 min read
Architect
Mar 21, 2023 · Operations

Log Management, Observability, and APM Practices in Distributed Systems

This article explains what logs are, when to record them, their value in large‑scale architectures, and how to build effective logging, metrics, and tracing platforms using tools such as ELK, Prometheus, and SkyWalking, while also presenting good and bad logging practices and sample batch‑log retrieval code.

APM · Distributed Systems · ELK
0 likes · 20 min read
Architecture Digest
Mar 18, 2023 · Operations

Understanding Log Importance and Operations in Distributed Architecture

This article explains what logs are, why they are crucial in large‑scale distributed systems, outlines the requirements for effective log operations, reviews common tooling such as ELK, Prometheus and tracing solutions, provides a Go example for batch log retrieval, and shares best‑practice guidelines to achieve observability.

APM · monitoring
0 likes · 19 min read
NetEase Cloud Music Tech Team
Mar 15, 2023 · Mobile Development

Client-Side APM Monitoring System Implementation for NetEase Cloud Music

The article describes NetEase Cloud Music’s custom client‑side APM system that combats sliding stutter, heating, UI freezes and crashes by employing binary‑tree stack aggregation to halve storage, window‑based CPU analysis, run‑loop jank detection, ping‑based ANR monitoring, and malloc‑logger memory tracking with automated dump thresholds.

ANR detection · APM · CPU Monitoring
0 likes · 14 min read
Baidu Geek Talk
Feb 20, 2023 · Operations

Deep Dive into Logging Operations and Observability in Distributed Systems

The article examines logging’s critical role in distributed systems, detailing its purpose, severity levels, and value for debugging, performance, security, and auditing, while highlighting challenges of inconsistent formats and traceability, and reviewing observability pillars, ELK and tracing tools, and practical implementation best practices.

APM · ELK · Observability
0 likes · 19 min read
ITPUB
Dec 20, 2022 · Operations

How We Scaled SkyWalking to Billions of Segments: A Full‑Stack Monitoring Journey

This article recounts a year‑long, hands‑on experience of deploying and continuously optimizing Apache SkyWalking for full‑link monitoring in a large micro‑service environment, covering the motivations, architecture choices, pre‑research, POC integration, and a series of performance‑tuning steps that brought query latency down to the millisecond level despite billions of stored segments.

APM · Full-Stack Monitoring · Observability
0 likes · 21 min read
360 Quality & Efficiency
Oct 28, 2022 · Operations

Pinpoint APM Overview and PHP Full‑Stack Monitoring Setup

This article introduces the open‑source Pinpoint APM tool for Java micro‑services, explains its architecture and data model, demonstrates deployment options for Tomcat and SpringBoot, and provides a step‑by‑step guide to installing and configuring the Pinpoint PHP agent for end‑to‑end performance monitoring.

APM · Distributed Tracing · Microservices
0 likes · 8 min read
Top Architect
Oct 18, 2022 · Operations

Apache SkyWalking APM: Concepts, Docker Installation, and UI Guide

This article introduces Application Performance Management (APM), explains the features of Apache SkyWalking for micro‑service and cloud‑native monitoring, and provides step‑by‑step Docker‑compose installation, agent configuration, and a detailed walkthrough of the SkyWalking UI components.

APM · Docker · Microservices
0 likes · 13 min read
IT Architects Alliance
Oct 15, 2022 · Operations

Introduction to Application Performance Management (APM) and Apache SkyWalking: Architecture, Installation, and UI Guide

This article explains APM concepts, compares traditional monitoring tools, introduces Apache SkyWalking as a cloud‑native APM solution, details its architecture and core modules, provides step‑by‑step Docker‑based installation of the server and agent, and walks through the SkyWalking UI features for monitoring microservice systems.

APM · Apache SkyWalking · Docker
0 likes · 13 min read
Architect
Oct 13, 2022 · Operations

Introduction to Application Performance Management (APM) and Apache SkyWalking: Concepts, Architecture, and Installation Guide

This article introduces Application Performance Management (APM), explains distributed tracing fundamentals, provides an overview of Apache SkyWalking’s features and architecture, and offers step‑by‑step Docker‑based installation instructions for the SkyWalking server, UI, and Java agent.

APM · Apache SkyWalking · Distributed Tracing
0 likes · 12 min read
dbaplus Community
Oct 9, 2022 · Operations

How Ping An Health Scaled SkyWalking to Billions of Traces: A Full‑Link Monitoring Journey

This article recounts the end‑to‑end design, implementation, and iterative optimization of a billion‑scale full‑link tracing system at Ping An Health using SkyWalking, covering why full‑link monitoring is needed, the selection of SkyWalking, architecture choices, performance bottlenecks, and the roadmap for future enhancements.

APM · Elasticsearch · Full‑Link Tracing
0 likes · 21 min read
IT Architects Alliance
Sep 23, 2022 · Operations

Which APM Tool Wins? A Deep Comparison of Zipkin, SkyWalking, and Pinpoint

This article analyzes full‑link monitoring in micro‑service architectures, outlines the goals and functional modules of tracing systems, explains core concepts such as Span, Trace, and Annotation, and then compares Zipkin, SkyWalking, and Pinpoint across performance impact, scalability, data analysis depth, developer transparency, and topology visualization.

APM · Comparison · Distributed Tracing
0 likes · 27 min read
ITFLY8 Architecture Home
Sep 19, 2022 · Operations

Which Distributed Tracing Tool Wins? Zipkin vs Pinpoint vs SkyWalking Deep Dive

This article examines the challenges of full‑link monitoring in microservice architectures, outlines the goals for an effective tracing system, describes the four core functional modules, compares three popular APM solutions—Zipkin, Pinpoint, and SkyWalking—across performance, scalability, data analysis, developer transparency, and topology features, and clarifies the distinction between tracing and general monitoring.

APM · Distributed Tracing · Microservices
0 likes · 27 min read
Tencent Cloud Developer
Sep 7, 2022 · Cloud Native

Why Build Probe Capabilities Based on OpenTelemetry for Cloud‑Native Observability

Building probe capabilities on OpenTelemetry gives cloud‑native teams a vendor‑neutral, standardized way to extend monitoring into full observability—supporting large‑scale, language‑specific instrumentation, plug‑and‑play plugins, and seamless integration with APM backends—so developers and operators can detect, debug, and predict faults across distributed containers.

APM · Cloud Native · Node.js
0 likes · 15 min read
Alipay Experience Technology
Sep 1, 2022 · Mobile Development

How Alipay Optimizes Cold-Start Performance with Spider SDK and APM

Alipay’s client engineering team details a comprehensive approach to monitoring, measuring, and improving time‑consuming user experiences—especially cold‑start—by employing video frame analysis, ActivityTaskManager, extensive instrumentation, home‑page snapshot techniques, temperature control, patch‑APK injection, AOP‑based diagnostics, and the Spider SDK within a robust APM platform.

APM · Android · Instrumentation
0 likes · 21 min read
Java Architecture Diary
Java Architecture Diary
Aug 8, 2022 · Operations

How to Integrate Jaeger Tracing with Rainbond Using OpenTelemetry

This guide explains why distributed tracing is essential for micro‑service architectures, introduces Jaeger as an open‑source APM solution, and provides step‑by‑step instructions for deploying and configuring Jaeger on Rainbond with OpenTelemetry, including environment variables, service naming, and topology generation.

APM · Distributed Tracing · Observability
0 likes · 11 min read
Zuoyebang Tech Team
Zuoyebang Tech Team
Jul 22, 2022 · Mobile Development

How to Slash Live‑Streaming App Memory & CPU Usage on Mobile Devices

This article analyzes the architecture and performance bottlenecks of a mobile live‑streaming classroom, defines measurable APM metrics, identifies root causes such as CPU, memory, GPU contention and signaling issues, and presents concrete optimization techniques—including independent processes, containerization, dedicated signaling channels, rendering and thread improvements—that dramatically reduce memory, CPU and frame‑rate problems.

APM · Resource Management · live streaming
0 likes · 14 min read
ELab Team
ELab Team
Apr 16, 2022 · Frontend Development

Master Front‑End Monitoring: From Data Collection to Performance Metrics

This article outlines the end‑to‑end workflow for front‑end monitoring in an APM platform, covering data collection, reporting, cleaning, storage, and consumption, and dives deep into environment info, exception handling, performance metrics, and efficient data upload strategies.

APM · Metrics · Web
0 likes · 18 min read
YunZhu Net Technology Team
YunZhu Net Technology Team
Apr 15, 2022 · Operations

Design and Architecture of a Cloud‑Native Monitoring Platform for Business Systems

The document outlines the background, vision, current status, technical research, value, product and technical architecture, and functional design of a cloud‑native monitoring platform that integrates SkyWalking and Prometheus to provide comprehensive APM, resource utilization, alerting, and rapid fault localization for business and technical middle‑platform services.

APM · Metrics · Observability
0 likes · 10 min read
Efficient Ops
Efficient Ops
Mar 29, 2022 · Big Data

How Tencent Cloud Boosted APM Metric Computation Speed 2‑3× with Flink Optimizations

This article explains how Tencent Cloud's APM metric calculation, which transforms massive Span data into aggregated metrics using Flink, faced performance bottlenecks and was optimized through job splitting, batch merging, and dimension pruning, ultimately achieving a 2‑3× speed increase and cutting resource usage to about 30% of the original.

APM · Big Data · Flink
0 likes · 10 min read
IT Architects Alliance
IT Architects Alliance
Mar 5, 2022 · Operations

System Performance Issue Analysis Process and Optimization Practices

This article outlines a comprehensive process for diagnosing and optimizing business system performance problems, covering analysis workflows, influencing factors such as hardware, software, database and middleware, JVM tuning, code inefficiencies, and the use of monitoring and APM tools to improve system reliability.

APM · Performance · jvm-tuning
0 likes · 16 min read
DaTaobao Tech
DaTaobao Tech
Feb 23, 2022 · Mobile Development

APM Page Load Time Calibration: From 0 to 2

The article chronicles Taobao Android’s APM page‑load‑time calibration, moving from the basic 8060 visible‑calculation algorithm to asynchronous, element‑aware methods with view tagging, page‑specific thresholds, custom root views, and H5/Weex support, dramatically improving accuracy by excluding non‑essential elements.

8060 algorithm · APM · Android performance
0 likes · 16 min read
IT Architects Alliance
IT Architects Alliance
Feb 15, 2022 · Operations

What Real-World Performance Tuning Taught Us About Legacy Web Apps

After a traffic surge exposed severe latency in a 15-year-old multi-service web platform, we used monitoring to discover a DB-connection leak caused by a liveness probe, corrected it, and distilled four practical lessons on latency metrics, tooling, legacy maintenance, and code vigilance.

APM · Load Testing · Operations
0 likes · 9 min read
21CTO
21CTO
Feb 7, 2022 · Operations

Why Every Line of Code Matters: Boosting Performance by 3000% with a Simple DB Fix

This article shares hard‑won lessons from optimizing fifteen high‑load web applications, highlighting how a tiny DB‑connection leak in a pod probe caused severe slowdown and how fixing it, along with proper load testing, monitoring, and investment in tools and people, can dramatically improve system performance.

APM · Load Testing · Operations
0 likes · 9 min read
21CTO
21CTO
Jan 30, 2022 · Operations

How Volcengine Violated Apache SkyWalking’s License: Evidence and Response

The article details how Volcengine redistributed the Apache SkyWalking Java agent without proper licensing, presenting file comparisons, code similarities, and package structures as evidence, and describes the community’s reaction and the company’s subsequent apology and corrective actions.

APM · Apache SkyWalking · Java Agent
0 likes · 4 min read
Alibaba Terminal Technology
Alibaba Terminal Technology
Jan 10, 2022 · Mobile Development

How to Accurately Measure and Optimize Android Frame Rate with APM

This article explains how APM provides frame‑rate data for Android, discusses the challenges of inaccurate FPS, introduces metrics such as scroll FPS, frozen‑frame ratio, scrollHitchRate and frame‑cause analysis, details the rendering pipeline, code implementations, optimization techniques, and integration with AB testing for performance improvement.

APM · Android · Frame Rate
0 likes · 20 min read
ByteDance SE Lab
ByteDance SE Lab
Jan 7, 2022 · Mobile Development

Systematic iOS Stability Management: From Crash Classification to Advanced Attribution

This article presents a comprehensive framework for identifying, classifying, and resolving iOS stability issues—covering crash types, governance methodology, deep-dive attribution techniques, real-world case studies, and practical tools such as Zombie monitoring, Coredump, MemoryGraph, and MetricKit—to dramatically improve app reliability.

APM · crash analysis · iOS
0 likes · 30 min read
Alibaba Cloud Native
Alibaba Cloud Native
Dec 16, 2021 · Cloud Native

From Legacy Monitoring to Modern Observability: A Cloud‑Native Journey

This article traces the 30‑year evolution of system monitoring, explains the differences between monitoring, APM and observability, outlines key practices for building an observability platform, and provides a step‑by‑step guide to implementing Prometheus + Grafana in a cloud‑native environment.

APM · ARMS · Grafana
0 likes · 18 min read
Alibaba Cloud Native
Alibaba Cloud Native
Dec 7, 2021 · Cloud Native

Unlocking the Third Way of Distributed Tracing: Post‑Aggregation Link Analysis Explained

This article introduces the third, post‑aggregation approach to link tracing—link analysis—showing how real‑time aggregation of stored trace data can quickly pinpoint uneven traffic, single‑machine failures, slow interfaces, business‑level traffic shifts, and gray‑release anomalies while outlining its practical constraints.

APM · Cloud Native · Link Analysis
0 likes · 11 min read
macrozheng
macrozheng
Nov 25, 2021 · Operations

Master SkyWalking: End‑to‑End Guide for Distributed Tracing & Monitoring

This article introduces SkyWalking, a Chinese open‑source APM framework, compares it with Spring Cloud Sleuth+Zipkin, explains server and client setup, storage configuration, log collection, performance profiling, and alerting, providing step‑by‑step instructions, code snippets, and screenshots to help developers implement comprehensive distributed tracing.

APM · Distributed Tracing · SkyWalking
0 likes · 16 min read
ByteDance Terminal Technology
ByteDance Terminal Technology
Nov 24, 2021 · Mobile Development

Systematic iOS Stability Issue Management: Classification, Methodology, and Root‑Cause Attribution

This article presents a comprehensive guide on systematically managing iOS stability problems, covering issue classification, a governance methodology, detailed root‑cause analysis for crashes, watchdogs, OOM, CPU and disk I/O anomalies, and practical tools and case studies from ByteDance’s APM platform.

APM · Mobile Development · crash analysis
0 likes · 27 min read
Top Architect
Top Architect
Nov 20, 2021 · Operations

Analyzing and Optimizing Business System Performance: A Comprehensive Guide

This article presents a thorough analysis of performance problems in production business systems, covering root causes such as concurrency, data growth, hardware limits, database and middleware bottlenecks, JVM tuning, code inefficiencies, and offers practical diagnostic steps, monitoring tools, and optimization strategies.

APM · JVM · database
0 likes · 15 min read
DevOps
DevOps
Oct 20, 2021 · Backend Development

Microservice Architecture: Stability, Service Degradation, Data Consistency, and Migration Practices

This article summarizes the author's extensive experience with microservice adoption, covering its benefits, the challenges of stability, service degradation strategies, distributed transaction patterns, data migration techniques, and practical monitoring using APM tools to help teams successfully transform to microservices.

APM · Data Migration · Distributed Transactions
0 likes · 35 min read
IT Architects Alliance
IT Architects Alliance
Aug 30, 2021 · Operations

Which Distributed Tracing Tool Wins? Comparing Zipkin, SkyWalking, Pinpoint

As micro‑service architectures grow, tracing every request across thousands of services becomes essential; this article examines the need for full‑link monitoring, outlines core requirements and functional modules, explains Google Dapper’s Span/Trace model, and provides a detailed performance‑focused comparison of Zipkin, SkyWalking, and Pinpoint.

APM · Comparison · Distributed Tracing
0 likes · 26 min read
Baidu Intelligent Testing
Baidu Intelligent Testing
Aug 10, 2021 · Backend Development

Evolution and Architecture of Baidu's Fengjing APM System

This article chronicles the four‑year evolution of Baidu's Fengjing performance‑monitoring platform, detailing its data collection, processing pipelines, successive architectural versions (1.0‑4.0), challenges such as probe intrusion and massive data volume, and the engineering solutions that enabled large‑scale, low‑cost, cloud‑native observability for thousands of Java services.

APM · Architecture · Big Data
0 likes · 9 min read
Java Interview Crash Guide
Java Interview Crash Guide
Jul 23, 2021 · Operations

How to Build a Scalable APM System: Inside the Dog Architecture

This article explains what an APM system is, compares logs, traces and metrics, reviews popular tools, and then details the design and implementation of the in‑house Dog APM platform—including client data models, Kafka pipelines, processing pipelines, storage in ClickHouse/Cassandra, and UI visualizations.

APM · ClickHouse · Kafka
0 likes · 28 min read
Architecture Digest
Architecture Digest
Jul 6, 2021 · Operations

Non‑Invasive Production Debugging: An Emerging DevOps Trend

The article explores the rising DevOps trend of non‑invasive production debugging, explaining its advantages over traditional log‑based methods, detailing instrumentation techniques, showing code examples, and highlighting its impact on key DevOps metrics and industry adoption.

APM · DevOps · Instrumentation
0 likes · 13 min read
Code Ape Tech Column
Code Ape Tech Column
Jun 29, 2021 · Industry Insights

Which Distributed Tracing Tool Wins? A Deep Dive into Dapper, Zipkin, Pinpoint, and SkyWalking

This article examines the challenges of monitoring complex micro‑service architectures, outlines the objectives of full‑link tracing, explains the Span/Trace data model, describes core functional modules, and provides a detailed performance and feature comparison of Google Dapper, Zipkin, Pinpoint, and SkyWalking.

APM · Distributed Tracing · Full‑Link Monitoring
0 likes · 22 min read
ByteDance Terminal Technology
ByteDance Terminal Technology
Jun 1, 2021 · Frontend Development

Background and Problem Localization

The article discusses identifying and resolving ImageIO-related crash issues in iOS applications, particularly those occurring after iOS 14 updates, by analyzing crash logs and system behavior.

APM · Image Processing · ImageIO
0 likes · 6 min read