
UBC SDK Log Duplicate Packaging Optimization Practices

This article explains how logs uploaded by the UBC SDK can still reach the log center duplicated, either as whole packages or as individual log entries. It identifies three root causes (database corruption, WAL write failures, and multi‑process conflicts) and presents concrete fixes that reduced the duplicate rate from 0.3 % to under 0.1 %.

Baidu Geek Talk

The article introduces the background of log‑center deduplication and the UBC (User Behavior Collection) SDK, which is responsible for collecting, packaging, and uploading user‑behavior logs from client devices.

Two kinds of duplication are defined: package duplication (the whole package is uploaded again because the SDK never received the upload result) and log duplication (individual log entries appear in more than one package). Server‑side stream deduplication can eliminate only package‑level duplicates; the SDK must therefore guarantee that logs are not duplicated before they are sent.

UBC SDK’s architecture is described, including asynchronous staged processing, SQLite persistence for logs, file persistence for packages, and multiple upload triggers (real‑time, non‑real‑time, background/foreground switch, network recovery). Diagrams of the core module and the log flow are provided.

Identification methods rely on unique identifiers such as log UUID, package MD5, and package creation time, as well as auxiliary fields (timestamp, app version, process ID/name, trigger type, SQLite sync mode). These markers enable pinpointing whether a duplicate originates from package or log duplication.
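The identifiers above suggest a simple decision rule: a package whose MD5 has already been seen is a package duplicate, while a new package that carries an already‑seen log UUID points to log duplication. The sketch below is one way to express that rule. `DuplicateClassifier`, `DuplicateKind`, and the in‑memory collections are hypothetical names chosen for illustration, not part of the UBC SDK, and a real analysis would also consult the auxiliary fields.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** The two kinds of duplication described in the text, plus "no duplicate". */
enum DuplicateKind { NONE, PACKAGE, LOG }

/** Hypothetical helper that classifies incoming packages by their identifiers. */
final class DuplicateClassifier {
    // MD5 values of every package seen so far.
    private final Set<String> seenPackageMd5 = new HashSet<>();
    // Log UUID -> MD5 of the first package that carried it.
    private final Map<String, String> firstPackageOfLog = new HashMap<>();

    DuplicateKind classify(String packageMd5, List<String> logUuids) {
        // Same MD5 seen before: the whole package was uploaded again.
        if (!seenPackageMd5.add(packageMd5)) {
            return DuplicateKind.PACKAGE;
        }
        // New package: check whether any log UUID was already recorded,
        // either in an earlier package or earlier in this one.
        boolean repeatedLog = false;
        for (String uuid : logUuids) {
            if (firstPackageOfLog.putIfAbsent(uuid, packageMd5) != null) {
                repeatedLog = true;
            }
        }
        return repeatedLog ? DuplicateKind.LOG : DuplicateKind.NONE;
    }
}
```

Feeding the same package twice yields `PACKAGE` on the second call, while a different package that repeats one UUID yields `LOG`, which is the case server‑side deduplication cannot remove.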

Root‑cause analysis reveals three major sources of duplication:

Database corruption: the delete transaction reports success while the underlying SQLite transaction actually rolls back. Evidence includes SQLiteDatabaseCorruptException stack traces and documentation on corrupted databases.

WAL (Write‑Ahead Logging) write failures in NORMAL sync mode: a power loss or system crash can occur before the WAL file is fsynced, so the upload succeeds but the deletion of the uploaded logs is lost and the logs remain in the database.

Multi‑process safety issues: separate processes (main and sub‑processes) may each instantiate a UBC instance, causing concurrent reads and writes that result in the same logs being packaged twice. Failures in IPC or incorrect main‑process detection (e.g., using package name as process name) exacerbate the problem.
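The first cause is easiest to see in code: a result flag set inside the transaction body says nothing about whether the commit itself succeeded. Below is a minimal sketch of a delete routine whose flag follows the commit outcome. `LogStore`, `PackagedLogCleaner`, and `InMemoryLogStore` are hypothetical names; the transaction method names mirror those of Android's SQLiteDatabase, but the sketch uses a plain in‑memory stand‑in and assumes a failed commit surfaces as a runtime exception from `endTransaction()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical storage interface; names mirror SQLiteDatabase's transaction calls. */
interface LogStore {
    void beginTransaction();
    void deleteLogs(List<String> uuids);
    void setTransactionSuccessful();
    void endTransaction(); // commits if marked successful; may throw when the commit fails
}

final class PackagedLogCleaner {
    private PackagedLogCleaner() {}

    /** Returns true only when the delete was actually committed. */
    static boolean deletePackagedLogs(LogStore store, List<String> uuids) {
        boolean deleted = false;
        store.beginTransaction();
        try {
            store.deleteLogs(uuids);
            store.setTransactionSuccessful();
            deleted = true; // provisional: nothing is committed yet
        } catch (RuntimeException e) {
            deleted = false;
        } finally {
            try {
                store.endTransaction();
            } catch (RuntimeException e) {
                // The commit failed and the store rolled back, so the logs
                // are still present; the flag must not report success.
                deleted = false;
            }
        }
        return deleted;
    }
}

/** In-memory stand-in used only to exercise the sketch. */
final class InMemoryLogStore implements LogStore {
    final List<String> logs = new ArrayList<>();
    private final boolean failOnCommit;
    private List<String> snapshot = new ArrayList<>();
    private boolean successful;

    InMemoryLogStore(List<String> initial, boolean failOnCommit) {
        this.logs.addAll(initial);
        this.failOnCommit = failOnCommit;
    }

    public void beginTransaction() { snapshot = new ArrayList<>(logs); successful = false; }
    public void deleteLogs(List<String> uuids) { logs.removeAll(uuids); }
    public void setTransactionSuccessful() { successful = true; }

    public void endTransaction() {
        if (!successful || failOnCommit) {
            logs.clear();
            logs.addAll(snapshot); // roll back to the state before the transaction
        }
        if (successful && failOnCommit) {
            throw new IllegalStateException("simulated commit failure");
        }
    }
}
```

With `failOnCommit` set, `deletePackagedLogs` returns false and the logs are still in the store, so a caller can treat them as not yet deleted instead of assuming they are gone.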

For each cause, concrete optimization solutions are proposed and evaluated:

Fix database transaction handling by resetting the result flag on commit failure and rebuilding the database when corruption is detected.

In the WAL scenario, force a manual checkpoint after deleting packaged logs (or switch to FULL sync mode) so that the deletion recorded in the WAL is flushed to disk.

Restrict packaging triggers in non‑main processes and deprecate the old multi‑process API, so that only the main process performs packaging. Additionally, improve process‑name detection by using Application.getProcessName(), available from Android API level 28, as a fallback.
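The multi‑process safeguard can be sketched as a gate that resolves the current process name from several sources in order and allows packaging only in the main process. `ProcessNameResolver` and `PackagingGate` are hypothetical names, and the suppliers stand in for platform lookups; on Android the last supplier could wrap Application.getProcessName(). Refusing to package when no name can be resolved is an assumption of this sketch, not something the source states.

```java
import java.util.List;
import java.util.function.Supplier;

/** Tries each process-name source in order and returns the first usable answer. */
final class ProcessNameResolver {
    private final List<Supplier<String>> sources;

    ProcessNameResolver(List<Supplier<String>> sources) {
        this.sources = sources;
    }

    /** Returns the first non-empty name, or null if every source fails. */
    String resolve() {
        for (Supplier<String> source : sources) {
            try {
                String name = source.get();
                if (name != null && !name.isEmpty()) {
                    return name;
                }
            } catch (RuntimeException ignored) {
                // This source failed; fall through to the next one.
            }
        }
        return null;
    }
}

/** Allows packaging only when the current process is the configured main process. */
final class PackagingGate {
    private final String mainProcessName;
    private final ProcessNameResolver resolver;

    // mainProcessName is passed in explicitly instead of being assumed
    // to equal the application's package name.
    PackagingGate(String mainProcessName, ProcessNameResolver resolver) {
        this.mainProcessName = mainProcessName;
        this.resolver = resolver;
    }

    boolean mayPackage() {
        String current = resolver.resolve();
        // Assumption of this sketch: an unknown process name means "do not package".
        return current != null && current.equals(mainProcessName);
    }
}
```

Every packaging trigger would consult `mayPackage()` before reading logs from the database, so a sub‑process that instantiates its own SDK instance never builds a package from rows the main process is also handling.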

Experimental results show that after applying these fixes, the overall log duplicate rate dropped from 0.3 % to below 0.1 %, accounting for roughly 35 % of the improvement. Specific gains include:

Database‑corruption fixes eliminated a large tail of duplicate logs, reducing the duplicate rate by ~0.07 percentage points.

WAL‑related fixes removed ~70 % of the new‑version tail duplicates, cutting duplicate occurrences by over 200 per hour in experiments.

Multi‑process safeguards lowered the duplicate rate to below 0.1 % and halved the one‑second repeat packaging cases.

The article concludes that accurate and efficient log uploading is essential for the UBC SDK’s role as the data source of the log‑center. The systematic debugging, monitoring, and optimization practices described provide a solid foundation for future reliability improvements.

Tags: Android, Database Optimization, Multi-Process, log deduplication, UBC SDK