
How Real‑Time Interception and Bitmap UV Calculation Boost Mobile App Quality

This article explains how a performance middle‑platform for mobile apps uses real‑time change interception, unique color IDs, bitmap‑based UV counting, exception de‑obfuscation, and a multi‑stage data pipeline to detect and isolate problems early, reduce user impact, and improve overall app reliability.

Baidu Geek Talk

Background

In the fast‑moving mobile app market, frequent code changes are essential for competitiveness, but each release can introduce crashes, stalls, or functional regressions that degrade user experience. Traditional post‑mortem fixes cannot prevent the negative impact of these issues, prompting the need for early detection and automatic mitigation during rollout.

Technical Background

The performance middle‑platform introduces a multi‑level interception and problem‑distribution mechanism. Each rollout is assigned a unique color ID that tags logs from the affected user segment. Core performance metrics are monitored per color ID, and any anomaly triggers an immediate interception of the rollout and a real‑time distribution of the problem to the responsible module and developers.

Real‑Time UV Calculation

To count the unique users (UV) affected by an anomaly without keeping a set of raw user IDs, the platform uses a bitmap stored in a string‑backed byte array (SDS, the simple dynamic string type used by Redis). Each user is mapped to an integer ID that serves as a bit offset, so one user costs one bit. This takes far less memory than a hash‑based set, and bitwise OR and AND give fast union and intersection operations for multi‑dimensional UV analysis.
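The idea can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the class name `UserBitmap`, the helper `bitmap_of`, and the sample user IDs are all assumptions, and a `bytearray` stands in for the SDS buffer.

```python
import operator


class UserBitmap:
    """Growable bitmap: bit i is set when the user with integer ID i was seen."""

    def __init__(self) -> None:
        self.bits = bytearray()  # stand-in for the SDS byte buffer

    def add(self, user_id: int) -> None:
        if user_id < 0:
            raise ValueError("user IDs must be non-negative")
        byte_index, bit_index = divmod(user_id, 8)
        if byte_index >= len(self.bits):
            # Grow with zero bytes on demand, as a dynamic string would.
            self.bits.extend(bytes(byte_index + 1 - len(self.bits)))
        self.bits[byte_index] |= 1 << bit_index

    def count(self) -> int:
        # The number of set bits is the number of unique users (UV).
        return sum(bin(byte).count("1") for byte in self.bits)

    def _combine(self, other: "UserBitmap", op) -> "UserBitmap":
        size = max(len(self.bits), len(other.bits))
        left = self.bits.ljust(size, b"\x00")
        right = other.bits.ljust(size, b"\x00")
        result = UserBitmap()
        result.bits = bytearray(op(a, b) for a, b in zip(left, right))
        return result

    def union(self, other: "UserBitmap") -> "UserBitmap":
        return self._combine(other, operator.or_)

    def intersection(self, other: "UserBitmap") -> "UserBitmap":
        return self._combine(other, operator.and_)


def bitmap_of(user_ids) -> UserBitmap:
    bitmap = UserBitmap()
    for user_id in user_ids:
        bitmap.add(user_id)
    return bitmap


crashed = bitmap_of([3, 17, 42, 17])    # user 17 reported twice
on_android = bitmap_of([17, 42, 100])

print(crashed.count())                            # 3
print(crashed.intersection(on_android).count())   # 2
print(crashed.union(on_android).count())          # 4
```

Duplicate reports from the same user set the same bit, so deduplication is free, and a multi-dimensional query such as "crashed users on Android" is a byte-wise AND of two bitmaps.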

Exception De‑Obfuscation

When an app is released, it is often obfuscated to protect code. Crash reports therefore contain mangled stack traces. The platform stores mapping files (one per app version and one per OS version) and uses them to translate obfuscated symbols back to readable names, allowing developers to pinpoint the exact cause of a crash.
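As a rough sketch of the lookup step, the snippet below rewrites Java-style stack frames using a per-version symbol table. The mapping layout, the `MAPPINGS` store, the frame regex, and the sample symbols are assumptions made for illustration; real mapping files have their own formats, and the OS-version mappings mentioned above are not modeled.

```python
import re

# Hypothetical mapping store: app version -> obfuscated class ->
# (readable class, {obfuscated method: readable method}).
MAPPINGS = {
    "12.3.0": {
        "a.b": ("com.example.feed.FeedLoader", {"c": "loadPage"}),
    },
}

# Matches "at <class>.<method>(" in a Java-style stack frame.
FRAME = re.compile(r"at (?P<cls>[\w.$]+)\.(?P<method>[\w$<>]+)\(")


def deobfuscate(trace: str, app_version: str) -> str:
    """Replace obfuscated class and method names that have a known mapping."""
    table = MAPPINGS.get(app_version)
    if table is None:
        return trace  # no mapping stored for this version; leave the trace as is

    def restore(match):
        entry = table.get(match.group("cls"))
        if entry is None:
            return match.group(0)  # unknown class: keep the frame unchanged
        readable_class, methods = entry
        method = match.group("method")
        return f"at {readable_class}.{methods.get(method, method)}("

    return FRAME.sub(restore, trace)


trace = (
    "java.lang.NullPointerException\n"
    "\tat a.b.c(SourceFile:12)\n"
    "\tat x.y.z(SourceFile:3)"
)
print(deobfuscate(trace, "12.3.0"))
```

Frames without a mapping entry pass through untouched, so a partially mapped trace is still usable.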

System Design Overview

The new workflow replaces the old reliance on aggregate performance dashboards with two core modules:

Change Interception Module: Registers each change, generates a color ID via an HTTP service, distributes the ID to the client SDK, and attaches it to performance logs.

Problem Distribution Module: After an anomaly is detected, it automatically maps the issue to the responsible component, module, and owning developers, eliminating manual triage.
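The article does not say how an issue is matched to its owner. One plausible approach, shown here purely as an assumption, is to look up the frames of the de-obfuscated stack trace in an ownership table keyed by package prefix. The `OWNERS` table, module names, and developer names are all hypothetical.

```python
# Hypothetical ownership table: package prefix -> (module, developers).
OWNERS = {
    "com.example.feed": ("feed", ["alice"]),
    "com.example.feed.video": ("feed-video", ["bob"]),
    "com.example.search": ("search", ["carol"]),
}


def route(frames, owners=OWNERS):
    """Return (module, developers) for the topmost frame with a known owner."""
    for frame in frames:
        parts = frame.split(".")
        # Try the longest package prefix first so sub-modules win.
        for end in range(len(parts), 0, -1):
            hit = owners.get(".".join(parts[:end]))
            if hit is not None:
                return hit
    return None  # nobody owns these frames; fall back to manual triage


frames = [
    "java.util.ArrayList.get",
    "com.example.feed.video.Player.start",
    "com.example.feed.FeedLoader.loadPage",
]
print(route(frames))  # ('feed-video', ['bob'])
```

Skipping unowned framework frames and preferring the longest prefix keeps the alert pointed at the most specific team.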

Change Interception Process

1️⃣ Register the change in the color‑ID service before the configuration becomes effective.

2️⃣ Obtain a globally unique integer ID for the change.

3️⃣ Distribute the ID to the client SDK (e.g., AB‑SDK) for the targeted user group.

4️⃣ The SDK tags performance‑critical logs with the ID and reports them via the UBC‑SDK.

5️⃣ The real‑time metric service aggregates core metrics (crash count, start‑up count, etc.) per ID across dimensions such as product line, version, OS, and region.

6️⃣ The anomaly detection service monitors the aggregated metrics and raises alerts when thresholds are breached.

7️⃣ Upon alert, the system halts further rollout and notifies the responsible team for rollback.
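Steps 4 through 7 can be sketched as a small in-process guard. This is only an illustration under stated assumptions: the class `RolloutGuard`, the crash-rate rule, and the threshold values are invented for the example, and the real services run as separate distributed components.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Counters:
    crashes: int = 0
    startups: int = 0


class RolloutGuard:
    """Aggregates events per color ID and halts rollouts that look unhealthy."""

    def __init__(self, max_crash_rate: float, min_startups: int) -> None:
        self.max_crash_rate = max_crash_rate  # assumed threshold rule
        self.min_startups = min_startups      # avoid alerting on tiny samples
        self.metrics = defaultdict(Counters)
        self.halted = set()

    def record(self, color_id: int, event: str) -> None:
        counters = self.metrics[color_id]
        if event == "crash":
            counters.crashes += 1
        elif event == "startup":
            counters.startups += 1
        if self._is_anomalous(counters):
            self.halted.add(color_id)  # a real system would also notify owners

    def _is_anomalous(self, counters: Counters) -> bool:
        if counters.startups < self.min_startups:
            return False
        return counters.crashes / counters.startups > self.max_crash_rate


guard = RolloutGuard(max_crash_rate=0.05, min_startups=100)
for _ in range(100):
    guard.record(7, "startup")
    guard.record(8, "startup")
for _ in range(6):
    guard.record(7, "crash")

print(sorted(guard.halted))  # [7]
```

Because every log line carries its color ID, an unhealthy change (ID 7 here) can be stopped without touching the healthy one (ID 8).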

ID Generation Service

The service maps each internal CUID to an integer ID that is used as a bit position in the bitmaps. It must guarantee global uniqueness while sustaining high concurrency, low latency, and high availability, falling back to MySQL when the primary store is down. Mappings are looked up in a local cache first and then in Redis; MySQL persists only the highest ID of each allocated batch.
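A minimal sketch of batch allocation follows, assuming that each service instance reserves a contiguous ID range and that only the upper bound of that range is written to durable storage. `HighWaterStore` stands in for MySQL and a plain dict stands in for Redis; both names, and the batch size, are hypothetical, and concurrency control is omitted.

```python
class HighWaterStore:
    """Stand-in for the durable store: remembers only the highest reserved ID."""

    def __init__(self) -> None:
        self.max_id = 0

    def reserve(self, batch_size: int):
        first = self.max_id + 1
        self.max_id += batch_size  # one write per batch, not per ID
        return first, self.max_id


class IdAllocator:
    def __init__(self, store: HighWaterStore, shared: dict, batch_size: int = 1000):
        self.store = store
        self.shared = shared              # stand-in for the Redis mapping
        self.batch_size = batch_size
        self.local = {}                   # per-instance cache
        self.next_id, self.last_id = 1, 0  # empty range forces a first reserve

    def _allocate(self) -> int:
        if self.next_id > self.last_id:
            self.next_id, self.last_id = self.store.reserve(self.batch_size)
        assigned = self.next_id
        self.next_id += 1
        return assigned

    def id_for(self, cuid: str) -> int:
        if cuid in self.local:
            return self.local[cuid]
        assigned = self.shared.get(cuid)
        if assigned is None:
            assigned = self.shared.setdefault(cuid, self._allocate())
        self.local[cuid] = assigned
        return assigned


store, shared = HighWaterStore(), {}
a = IdAllocator(store, shared, batch_size=3)
b = IdAllocator(store, shared, batch_size=3)

print(a.id_for("u1"), a.id_for("u2"))  # 1 2
print(b.id_for("u1"))                  # 1 (found in the shared mapping)
print(b.id_for("u3"))                  # 4 (b reserved its own batch, 4..6)
```

If an instance restarts, it reserves a fresh batch above the stored maximum, so IDs may have gaps but are never reused.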

Real‑Time Metric Calculation Service

Logs flow from the logging middle‑platform to message queues. The metric service consumes these streams, performs a key‑by on CUID to keep related records on the same compute node, aggregates locally into bitmaps to reduce shuffle traffic, and then merges results globally via a state service that maintains both real‑time and historical bitmap states.
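The two-phase aggregation can be illustrated with plain functions. This is a simplified sketch: a Python integer is used as the bitmap, a modulo on the integer user ID stands in for the key-by step, and the dimension keys and event tuples are assumptions. The state service and historical bitmaps are not modeled.

```python
from collections import defaultdict


def local_aggregate(events):
    """One worker: fold (dimension_key, user_id) events into per-key bitmaps."""
    bitmaps = defaultdict(int)
    for key, user_id in events:
        bitmaps[key] |= 1 << user_id  # Python int used as an arbitrary-size bitset
    return dict(bitmaps)


def global_merge(partials):
    """Merge per-worker bitmaps with OR, then count set bits to get UV per key."""
    merged = defaultdict(int)
    for partial in partials:
        for key, bitmap in partial.items():
            merged[key] |= bitmap
    return {key: bin(bitmap).count("1") for key, bitmap in merged.items()}


def uv_by_dimension(events, workers: int = 2):
    # Key-by: all events of one user land on the same worker.
    shards = [[] for _ in range(workers)]
    for key, user_id in events:
        shards[user_id % workers].append((key, user_id))
    return global_merge(local_aggregate(shard) for shard in shards)


events = [
    ("crash|android", 1),
    ("crash|android", 2),
    ("crash|android", 1),  # repeat crash from user 1
    ("crash|ios", 3),
    ("crash|android", 5),
]
print(uv_by_dimension(events))  # {'crash|android': 3, 'crash|ios': 1}
```

Each worker ships one compact bitmap per dimension key instead of one record per event, which is what keeps shuffle traffic low.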

State Service for Dual‑Stream Correlation

Two streams arrive asynchronously: fast metric data and slower stack‑trace files. The state service first matches records within the same time window, then falls back to historical windows, and keeps unmatched records for up to 30 minutes (TTL). This yields a correlation rate above 99.9%.
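The buffering side of this join might look like the sketch below. It assumes both streams share a join key (called `crash_id` here, a hypothetical name) and takes the clock as an explicit `now` argument so the behavior is deterministic; time windows and the historical fallback are not modeled.

```python
class StreamJoiner:
    """Pairs a metric record with its stack trace, whichever arrives first."""

    def __init__(self, ttl_seconds: float = 30 * 60) -> None:
        self.ttl = ttl_seconds
        self.pending_metrics = {}  # crash_id -> (arrival time, metric)
        self.pending_stacks = {}   # crash_id -> (arrival time, stack)

    def on_metric(self, crash_id, metric, now):
        self._expire(now)
        entry = self.pending_stacks.pop(crash_id, None)
        if entry is not None:
            return metric, entry[1]
        self.pending_metrics[crash_id] = (now, metric)
        return None

    def on_stack(self, crash_id, stack, now):
        self._expire(now)
        entry = self.pending_metrics.pop(crash_id, None)
        if entry is not None:
            return entry[1], stack
        self.pending_stacks[crash_id] = (now, stack)
        return None

    def _expire(self, now) -> None:
        # Drop records that have waited longer than the TTL.
        for pending in (self.pending_metrics, self.pending_stacks):
            stale = [key for key, (arrived, _) in pending.items()
                     if now - arrived > self.ttl]
            for key in stale:
                del pending[key]


joiner = StreamJoiner()
joiner.on_metric("c1", {"color_id": 7}, now=0)
print(joiner.on_stack("c1", "stack for c1", now=600))
# ({'color_id': 7}, 'stack for c1')

joiner.on_metric("c2", {"color_id": 7}, now=1000)
print(joiner.on_stack("c2", "stack for c2", now=2801))  # None: the metric expired
```

The TTL bounds memory use: a record whose partner never shows up is discarded rather than held forever.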

Multi‑Level Caching and W‑TinyLFU

Bitmap files are too large to keep entirely in memory, so a three‑tier cache (in‑memory, Redis, HBase) is used. To improve hit rates, the platform replaces the default LRU eviction with W‑TinyLFU. Compared with LRU, it keeps frequently accessed items resident, gives newly arrived entries a small admission window in which to build up frequency, adapts to changing access patterns, and tracks frequencies in a compact sketch that adds little memory overhead.
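The core idea behind frequency-based admission can be shown with a deliberately simplified cache. This is not W‑TinyLFU itself: it uses exact counters instead of a compact frequency sketch, has no admission window, and never ages its counts. The class name and the `load` callback (standing in for the Redis and HBase tiers) are assumptions.

```python
from collections import Counter, OrderedDict


class FrequencyAdmissionCache:
    """LRU cache that admits a new key only if it is hotter than the victim."""

    def __init__(self, capacity: int, load) -> None:
        self.capacity = capacity
        self.load = load               # called on a miss (slower tiers)
        self.freq = Counter()          # exact counts; TinyLFU uses a sketch
        self.entries = OrderedDict()   # least recently used first

    def get(self, key):
        self.freq[key] += 1
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        value = self.load(key)
        self._maybe_admit(key, value)
        return value

    def _maybe_admit(self, key, value) -> None:
        if len(self.entries) < self.capacity:
            self.entries[key] = value
            return
        victim = next(iter(self.entries))  # LRU candidate for eviction
        if self.freq[key] > self.freq[victim]:
            del self.entries[victim]
            self.entries[key] = value
        # Otherwise the newcomer is served but not cached.


loads = []


def load_from_lower_tier(key):
    loads.append(key)
    return f"bitmap:{key}"


cache = FrequencyAdmissionCache(capacity=2, load=load_from_lower_tier)
for _ in range(3):
    cache.get("hot_a")
    cache.get("hot_b")
for key in ("scan1", "scan2", "scan3"):  # one-off reads, e.g. a backfill scan
    cache.get(key)

print(sorted(cache.entries))  # ['hot_a', 'hot_b']
```

Under plain LRU the three one-off reads would have evicted both hot bitmaps; with the admission check they stay cached.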

Summary and Outlook

The performance middle‑platform now provides minute‑level detection of abnormal rollouts, automatic problem routing, and scalable UV calculation. Future work includes expanding integration with more change‑management systems, covering additional performance scenarios (e.g., latency, network quality), and refining clustering and distribution algorithms for even higher accuracy.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
