
How Alibaba’s Mobile Team Built a Full‑Stack Observability System to Boost App Performance

This article describes how Alibaba's mobile engineering team approached full‑link observability. It covers the challenges of the Taobao app architecture, the evolution from monitoring to observability, the Falco OpenTracing model, and practical performance optimizations that improve issue‑resolution efficiency and user experience.

Alibaba Terminal Technology

App Architecture Challenges

Since 2013, Alibaba’s mobile technology has evolved through three stages: the Atlas container framework for large‑scale concurrency, ACCS full‑duplex low‑latency channels, and dynamic cross‑platform frameworks such as Weex and Mini‑Programs. Together these form a three‑layer architecture of business, framework/container, and infrastructure. Common problems across these layers include low operational efficiency, incomplete end‑to‑end tracing, inconsistent performance metrics, and the high cost of mobile PaaS troubleshooting.

(Figure 1 Taobao App architecture challenges)

Observability System

Observability is a philosophy rather than a concrete technology. Traditional monitoring provides high‑level alerts that something is wrong, while observability combines three kinds of data (traces, logs, and metrics) to reveal why a component failed.

(Figure 2 Relationship between monitoring and observability)

Observability Key Data

Logs are derived from the TLOG system and can be structured into traces. Metrics are aggregated values for macro analysis. Traces record parent‑child relationships with detailed operation data, enabling both fine‑grained debugging and high‑level metric extraction.

(Figure 3 Observability key data)
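To illustrate how a trace can serve both purposes, here is a minimal sketch in Python. The `Span` fields and helper names are assumptions made for this example and do not come from the original system. The parent‑child links support drill‑down debugging, and aggregating span durations yields a metric.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation in a trace (field names are illustrative)."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    operation: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def children_of(spans: List[Span], parent_id: str) -> List[Span]:
    """Fine-grained view: the direct children of one span."""
    return [s for s in spans if s.parent_id == parent_id]


def mean_duration_by_operation(spans: List[Span]) -> Dict[str, float]:
    """Macro view: aggregate span durations into a per-operation metric."""
    grouped: Dict[str, List[int]] = {}
    for s in spans:
        grouped.setdefault(s.operation, []).append(s.duration_ms)
    return {op: mean(durations) for op, durations in grouped.items()}


# Hypothetical trace of a single page load.
trace = [
    Span("1", None, "page_load", 0, 500),
    Span("2", "1", "network_request", 20, 320),
    Span("3", "1", "render", 330, 480),
    Span("4", "1", "network_request", 30, 130),
]
```

Calling `children_of(trace, "1")` returns the three spans nested under the page load, while `mean_duration_by_operation(trace)` collapses the same data into averages suitable for dashboards.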

Full‑Link Observability Architecture

The architecture is divided into four layers: Data (metric definitions and OpenTracing reporting), Domain (problem discovery, problem localization, and continuous performance optimization), Platform (benchmarking against competitors and driving performance), and Business (full‑link view across client and server).

(Figure 4 Full‑link observability architecture concept)

Mobile OpenTracing – Falco Architecture

Falco adopts the OpenTracing model to unify Logs, Metrics, and Traces on the client side. Its data model includes Span (core OpenTracing fields), Scene (business scenario), Layer (business, frameworkContainer, ability), Stages (standardized phases), Module (e.g., DX, MTOP), and Logs.

(Figure 6 Falco data table model)
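The data model above could be sketched roughly as follows. This is not Falco's actual schema: the class name `FalcoSpan`, every field name, and the `to_row` flattening are assumptions for illustration. Only the concepts (Span, Scene, Layer, Stages, Module, Logs) and the three layer values come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Layer(Enum):
    """The three layers named in the article."""
    BUSINESS = "business"
    FRAMEWORK_CONTAINER = "frameworkContainer"
    ABILITY = "ability"


@dataclass
class FalcoSpan:
    # Core OpenTracing-style fields (names are hypothetical).
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    operation_name: str
    start_ms: int
    # Falco-specific dimensions described in the text.
    scene: str    # business scenario
    layer: Layer
    module: str   # e.g. "DX" or "MTOP"
    finish_ms: Optional[int] = None
    stages: Dict[str, int] = field(default_factory=dict)
    logs: List[Tuple[int, str]] = field(default_factory=list)

    def mark_stage(self, name: str, timestamp_ms: int) -> None:
        """Record when a standardized phase was reached."""
        self.stages[name] = timestamp_ms

    def log(self, timestamp_ms: int, message: str) -> None:
        """Attach a timestamped log entry to this span."""
        self.logs.append((timestamp_ms, message))

    def finish(self, timestamp_ms: int) -> None:
        self.finish_ms = timestamp_ms

    def to_row(self) -> Dict[str, object]:
        """Flatten into one row with a column per stage offset."""
        duration = None if self.finish_ms is None else self.finish_ms - self.start_ms
        row: Dict[str, object] = {
            "trace_id": self.trace_id,
            "span_id": self.span_id,
            "parent_span_id": self.parent_span_id,
            "operation_name": self.operation_name,
            "scene": self.scene,
            "layer": self.layer.value,
            "module": self.module,
            "duration_ms": duration,
        }
        for name, ts in self.stages.items():
            row[f"stage.{name}"] = ts - self.start_ms
        return row
```

Flattening stages into separate columns is one way such records could be made friendly to columnar aggregation. The real table layout may differ.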

Falco Key Points

Unique, fast, short trace IDs.

TraceID and hierarchical Span IDs propagate end‑to‑end.

Bidirectional mapping between client trace IDs and backend EagleEye IDs for precise failure diagnosis.

Layered measurement enables consistent cross‑module performance comparison.

Structured event logging with columnar storage supports large‑scale aggregation.

Domain‑level problem data is persisted for continuous analysis.
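The first three points can be illustrated with a small Python sketch. The source does not specify Falco's ID formats, so the 16‑character hex trace ID, the dotted span ID scheme ("0", "0.1", "0.1.2"), and the class names below are all assumptions chosen for the example.

```python
import secrets
from typing import Dict, Optional


def new_trace_id() -> str:
    """Short random trace ID: 8 random bytes as 16 hex characters (assumed format)."""
    return secrets.token_hex(8)


class SpanIdAllocator:
    """Hands out hierarchical span IDs so the ID itself encodes the call path."""

    ROOT = "0"

    def __init__(self) -> None:
        self._child_counts: Dict[str, int] = {}

    def child_of(self, parent_id: str) -> str:
        n = self._child_counts.get(parent_id, 0) + 1
        self._child_counts[parent_id] = n
        return f"{parent_id}.{n}"


def parent_of(span_id: str) -> Optional[str]:
    """Recover the parent from a dotted span ID; the root has no parent."""
    head, sep, _ = span_id.rpartition(".")
    return head if sep else None


class TraceIdMap:
    """Bidirectional lookup between client trace IDs and backend trace IDs."""

    def __init__(self) -> None:
        self._to_backend: Dict[str, str] = {}
        self._to_client: Dict[str, str] = {}

    def link(self, client_id: str, backend_id: str) -> None:
        self._to_backend[client_id] = backend_id
        self._to_client[backend_id] = client_id

    def backend_for(self, client_id: str) -> Optional[str]:
        return self._to_backend.get(client_id)

    def client_for(self, backend_id: str) -> Optional[str]:
        return self._to_client.get(backend_id)
```

With a mapping like this, an engineer holding either ID can jump to the other side of the request. That is the property the bidirectional mapping point describes.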

Operational Practices Based on Falco

Improving log upload reliability, classifying logs for quick filtering, visualizing full‑link topologies, and extending EagleEye trace retention from minutes to days dramatically reduce issue‑resolution time.

(Figure 9 Problem‑driven user flow and operations system)

Macro Metric System

APM upgrades focus on user‑perceived metrics such as page‑on‑screen time, click response, and scroll frame rate, aligning data with real user experience.

(Figure 10 Calibrated startup data trend)
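As a rough illustration of one user‑perceived metric, scroll frame rate can be estimated from the timestamps at which frames were presented. The function names and the 32 ms slow‑frame threshold below are assumptions for this sketch, not the definitions used by Alibaba's APM.

```python
from typing import List


def average_fps(frame_timestamps_ms: List[float]) -> float:
    """Average frames per second over a scroll, from frame presentation times."""
    if len(frame_timestamps_ms) < 2:
        return 0.0
    elapsed_ms = frame_timestamps_ms[-1] - frame_timestamps_ms[0]
    if elapsed_ms <= 0:
        return 0.0
    # N timestamps delimit N - 1 frame intervals.
    return (len(frame_timestamps_ms) - 1) * 1000.0 / elapsed_ms


def slow_frame_count(frame_timestamps_ms: List[float], threshold_ms: float = 32.0) -> int:
    """Count frame intervals longer than the threshold (an illustrative jank proxy)."""
    pairs = zip(frame_timestamps_ms, frame_timestamps_ms[1:])
    return sum(1 for prev, cur in pairs if cur - prev > threshold_ms)
```

An average alone can hide a single long stall, which is why the sketch also counts slow intervals. A metric meant to match what users feel usually needs both views.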

Optimization Practices

Examples include simplifying MTOP network calls to reduce data copies and thread switches, enabling dual‑channel Wi‑Fi + cellular networking on Android to improve latency under weak networks, and applying image‑size grading for low‑end devices.

(Figure 21 Extreme‑call AB test results)

(Figure 22 Android dual‑channel network optimization)
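The image‑size grading idea mentioned above can be sketched as follows. The tier thresholds, scale factors, and function names are hypothetical values picked for illustration. The source does not state how devices are classified or how far images are scaled down.

```python
# Illustrative scale factors per device tier (assumed, not from the source).
SCALE_BY_TIER = {"high": 1.0, "mid": 0.75, "low": 0.5}


def device_tier(ram_gb: float, cpu_cores: int) -> str:
    """Classify a device into a coarse tier using assumed thresholds."""
    if ram_gb >= 8 and cpu_cores >= 8:
        return "high"
    if ram_gb >= 4:
        return "mid"
    return "low"


def graded_image_width(view_width_px: int, tier: str) -> int:
    """Pick the image width to request, scaled down for weaker devices."""
    return max(1, round(view_width_px * SCALE_BY_TIER[tier]))


# Example: a 720 px wide image slot on a device with 3 GB RAM and 4 cores.
requested_width = graded_image_width(720, device_tier(ram_gb=3, cpu_cores=4))
```

Requesting a smaller image on a low‑end device reduces download size and decode cost at the expense of sharpness. The right factors would have to be tuned against real measurements.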

Summary & Outlook

The article demonstrates how a full‑link observability system built on OpenTracing and Falco transforms Alibaba’s mobile operations from manual, low‑efficiency processes to data‑driven, automated performance optimization, while outlining remaining challenges and future directions for a comprehensive mobile observability ecosystem.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
