How DingTalk Tackles Android Power Drain: A Deep Dive into Mobile Power Optimization
This article explains DingTalk's systematic approach to diagnosing and reducing excessive battery consumption on Android, covering background issues, challenges, monitoring techniques, component-level power tracking, abnormal power detection, health scoring, and practical optimizations such as low‑power mode and governance strategies.
Background
DingTalk, a billion‑user enterprise digital platform, is built on instant communication, so real‑time message delivery and background power consumption are both critical to the Android experience. Rapid business growth has increased the volume of message signaling over DingTalk's long‑connection channel, and users have raised concerns about battery life.
Problems
Users reported three main power‑drain issues:
System battery manager warns "DingTalk consumes power too fast". Unexpected high‑power alerts degrade user experience.
Abnormal background power consumption. Users see DingTalk high in the system battery usage list even when not actively using the app.
Foreground high power usage. Video features cause high power draw, device heating, and occasional app crashes.
These problems affect user satisfaction and product reputation, prompting a dedicated power‑optimization project.
Challenges
Power‑consumption issues differ from crashes or ANRs because Android does not expose direct power‑usage metrics to apps. Challenges include:
Unclear system power‑alert rules
The system’s high‑power notifications are a black box to the app, making root‑cause identification difficult.
Lack of on‑site context, hard to reproduce
User reports often lack detailed scenario data, providing only battery‑usage rankings, which hampers reproducibility.
Limited diagnostic tools
Battery usage rankings and details provide limited insight.
Bugreport + Battery Historian offer deeper data but have drawbacks: they do not map directly to system alerts, lack stack traces, and are cumbersome to collect in real time.
Thus, Bugreport is mainly useful for offline analysis.
Power‑Optimization Directions
Four key capabilities were defined:
Perception ability: Build online power‑monitoring and anomaly‑detection models to proactively discover abnormal consumption.
Rapid localization ability: Use anomaly monitoring and power reports to quickly locate root causes.
Problem governance: Target identified high‑risk components for remediation.
Degradation‑prevention ability: Continuously monitor health scores and anomaly rates to prevent new power regressions.
Overall Power‑Optimization Design
Goals
Provide users with an ultra‑low‑power experience and eliminate power issues.
Establish a perception‑governance‑prevention framework.
Overall Architecture
Perception ability: Monitor component usage based on Android Vitals and system power‑stat principles to detect abnormal consumption.
Rapid localization ability: Periodically sample component usage, generate power reports, and pinpoint abnormal drains.
Degradation‑prevention ability: Define a power‑health score that quantifies overall power experience and tracks trends.
Governance optimization: Combine perception and localization to resolve root causes and deliver a low‑power experience.
Perception Ability – Component Monitoring
Understanding component‑level power usage is essential for identifying the top power consumers.
System Power‑Stat Principle
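Energy = Voltage × Current × Time
Because the battery voltage is roughly constant during use, energy consumption can be approximated by current multiplied by time (commonly expressed in mAh). Android’s BatteryStatsService aggregates usage per hardware module, and the system converts that usage into power consumption using the per‑module current values in the power_profile.xml supplied by device manufacturers.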
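The arithmetic behind this principle can be sketched in a few lines of Java. This is only an illustration of "current × time per module, summed": the class name PowerEstimate and the current values are assumptions made up for the example, since real values come from each device's power_profile.xml and vary widely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: estimate charge drawn (mAh) as the sum of current (mA) x active time (h).
final class PowerEstimate {
    // Hypothetical average currents in mA. Real values are device-specific
    // and come from the vendor's power_profile.xml.
    static final Map<String, Double> CURRENT_MA = new LinkedHashMap<>();
    static {
        CURRENT_MA.put("cpu.active", 100.0);
        CURRENT_MA.put("wifi.active", 30.0);
        CURRENT_MA.put("gps.on", 50.0);
    }

    /** Charge in mAh for one module that was active for the given milliseconds. */
    static double moduleMah(String module, long activeMs) {
        Double currentMa = CURRENT_MA.get(module);
        if (currentMa == null) {
            throw new IllegalArgumentException("unknown module: " + module);
        }
        return currentMa * activeMs / 3_600_000.0; // ms -> hours
    }

    /** Total charge in mAh across all modules (module name -> active ms). */
    static double totalMah(Map<String, Long> activeMsByModule) {
        double total = 0.0;
        for (Map.Entry<String, Long> e : activeMsByModule.entrySet()) {
            total += moduleMah(e.getKey(), e.getValue());
        }
        return total;
    }
}
```

An app cannot read another device's profile reliably, which is why the approach described next tracks usage of the same modules rather than absolute mAh.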
Main Strategy
By monitoring the same modules used in system power calculations, DingTalk can approximate its own consumption even without direct power values.
The following modules are monitored (illustrated below):
Network Usage Monitoring
Three metrics are tracked:
Traffic: Mobile and Wi‑Fi upload/download bytes and packet counts.
Network events: DingTalk long‑connection protocol uplink/downlink and HTTP requests.
Network change count: Number of distinct network‑state changes within a time window.
Implementation: Android 10+ uses TrafficStats APIs; older versions read /proc/net/xt_qtaguid/stats.
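Traffic counters of this kind are cumulative, so a monitor has to turn them into per-interval deltas. The sketch below shows one way to do that. TrafficSampler is a hypothetical helper, not DingTalk's code, and it takes the counters as plain arguments so it runs off-device. On Android the inputs could come from TrafficStats.getUidRxBytes and TrafficStats.getUidTxBytes.

```java
// Sketch: turn cumulative byte counters into per-interval deltas.
final class TrafficSampler {
    private long lastRxBytes = -1;
    private long lastTxBytes = -1;

    /**
     * Feed the current cumulative counters and get {rxDelta, txDelta}.
     * The first call, or a counter reset (for example after a reboot),
     * yields zeros instead of a bogus negative or huge delta.
     */
    long[] sample(long rxBytes, long txBytes) {
        long rxDelta = 0;
        long txDelta = 0;
        if (lastRxBytes >= 0 && rxBytes >= lastRxBytes && txBytes >= lastTxBytes) {
            rxDelta = rxBytes - lastRxBytes;
            txDelta = txBytes - lastTxBytes;
        }
        lastRxBytes = rxBytes;
        lastTxBytes = txBytes;
        return new long[] {rxDelta, txDelta};
    }
}
```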
System Service Call Monitoring
Monitored services include WakeLock, Alarm, Bluetooth scan, Wi‑Fi scan, and Location. Metrics such as lock duration, alarm trigger count, scan counts, and location request frequency are recorded.
Implementation: Java Hook techniques capture service calls, with version‑specific adaptations.
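The article does not say which hook mechanism is used. One common Java technique is a dynamic proxy that wraps a service interface, records each call, and forwards it to the real object. The sketch below illustrates that idea only: AlarmService is a made-up stand-in for a system service interface, and ServiceCallCounter is a hypothetical name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a system service interface.
interface AlarmService {
    void setAlarm(String tag, long triggerAtMs);
}

final class ServiceCallCounter {
    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    /** Wraps a service so each call is counted by method name, then forwarded. */
    @SuppressWarnings("unchecked")
    <T> T wrap(Class<T> iface, T real) {
        InvocationHandler handler = (proxy, method, args) -> {
            counts.computeIfAbsent(method.getName(), k -> new AtomicInteger())
                  .incrementAndGet();
            return method.invoke(real, args); // forward to the real service
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    /** Number of recorded calls for a method name (0 if never called). */
    int count(String methodName) {
        AtomicInteger c = counts.get(methodName);
        return c == null ? 0 : c.get();
    }
}
```

A production hook would also capture the caller's stack trace and timestamps so that an anomaly can be attributed to a component.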
CPU Usage Monitoring
Tracks process CPU time, thread CPU time, and detects long‑running loops.
Implementation: Reads /proc/[pid]/stat for process totals and /proc/[pid]/task/[tid]/stat for thread details. Both files expose utime, stime, cutime, and cstime values, and the deltas between sampling periods are used to compute CPU percentages.
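As a rough illustration of the parsing step, the sketch below extracts utime and stime from a stat line and converts a tick delta into a percentage. ProcStat is a hypothetical helper. The tick rate is passed in as a parameter because it is system-dependent (100 per second is common but is an assumption here).

```java
// Sketch: parse CPU ticks from a /proc/[pid]/stat line and compute usage.
final class ProcStat {
    /** Returns utime + stime (in clock ticks) from one stat line. */
    static long cpuTicks(String statLine) {
        // Field 2 (comm) may contain spaces or parentheses,
        // so split only the part after the last ')'.
        int end = statLine.lastIndexOf(')');
        if (end < 0) {
            throw new IllegalArgumentException("malformed stat line");
        }
        String[] f = statLine.substring(end + 1).trim().split("\\s+");
        // f[0] is field 3 (state), so field 14 (utime) is f[11]
        // and field 15 (stime) is f[12].
        long utime = Long.parseLong(f[11]);
        long stime = Long.parseLong(f[12]);
        return utime + stime;
    }

    /** CPU usage as a percentage of one core between two samples. */
    static double cpuPercent(long ticksBefore, long ticksAfter,
                             long elapsedMs, long ticksPerSecond) {
        if (elapsedMs <= 0) {
            return 0.0;
        }
        double cpuMs = (ticksAfter - ticksBefore) * 1000.0 / ticksPerSecond;
        return 100.0 * cpuMs / elapsedMs;
    }
}
```

The same parsing applies to per-thread files, which makes it possible to rank threads by CPU time within a sampling period.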
Self‑Start Monitoring
Records self‑start count, reasons, and recent process‑exit causes using ActivityManager APIs and hooks on component launch paths.
App & Device State Monitoring
Captures app foreground/background state, device charging status, screen on/off, battery level, and temperature to contextualize power usage.
Perception Ability – Abnormal Power Monitoring
Based on Android Vitals thresholds, DingTalk defines rules for detecting abnormal patterns such as excessive background traffic, long‑held WakeLocks, frequent alarms, Bluetooth/Wi‑Fi scans, and high CPU load.
When a metric exceeds its baseline, an anomaly is logged and attributed to the responsible component.
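A threshold rule of this kind might look like the following sketch. The thresholds, rule names, and the Sample shape are all illustrative assumptions. They are not the values used by Android Vitals or DingTalk, which the article does not list.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compare one hour of background metrics against fixed baselines.
final class AnomalyRules {
    // Illustrative thresholds only (assumptions for this example).
    static final long MAX_WAKELOCK_HOLD_MS = 60 * 60 * 1000L;
    static final int MAX_ALARMS_PER_HOUR = 10;
    static final long MAX_BG_BYTES_PER_HOUR = 5L * 1024 * 1024;

    /** One hour of background metrics for a single component. */
    static final class Sample {
        final String component;
        final long longestWakeLockMs;
        final int alarmCount;
        final long backgroundBytes;

        Sample(String component, long longestWakeLockMs,
               int alarmCount, long backgroundBytes) {
            this.component = component;
            this.longestWakeLockMs = longestWakeLockMs;
            this.alarmCount = alarmCount;
            this.backgroundBytes = backgroundBytes;
        }
    }

    /** Returns "component:rule" for every baseline the sample exceeds. */
    static List<String> detect(Sample s) {
        List<String> anomalies = new ArrayList<>();
        if (s.longestWakeLockMs > MAX_WAKELOCK_HOLD_MS) {
            anomalies.add(s.component + ":wakelock_held_too_long");
        }
        if (s.alarmCount > MAX_ALARMS_PER_HOUR) {
            anomalies.add(s.component + ":too_many_alarms");
        }
        if (s.backgroundBytes > MAX_BG_BYTES_PER_HOUR) {
            anomalies.add(s.component + ":excessive_background_traffic");
        }
        return anomalies;
    }
}
```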
Monitoring Effects
The anomaly‑diagnosis model has surfaced numerous hidden power issues, enabling targeted fixes and providing component‑level consumption breakdowns (illustrated below).
Rapid Localization Ability – Power Report
The power report aggregates component usage over a time window, highlighting the dominant consumers and linking to detailed event logs for precise root‑cause analysis.
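The aggregation step can be pictured as summing a per-event cost by component inside a time window and ranking the totals. The sketch below assumes a hypothetical Event record with an abstract cost field. The article does not describe the real report schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rank components by summed cost within [fromMs, toMs).
final class PowerReport {
    /** One recorded usage event (hypothetical shape). */
    static final class Event {
        final String component;
        final long timestampMs;
        final long cost; // abstract usage unit, e.g. bytes or CPU ms

        Event(String component, long timestampMs, long cost) {
            this.component = component;
            this.timestampMs = timestampMs;
            this.cost = cost;
        }
    }

    /** Returns up to n (component, total cost) entries, largest first. */
    static List<Map.Entry<String, Long>> topConsumers(
            List<Event> events, long fromMs, long toMs, int n) {
        Map<String, Long> totals = new HashMap<>();
        for (Event e : events) {
            if (e.timestampMs >= fromMs && e.timestampMs < toMs) {
                totals.merge(e.component, e.cost, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(totals.entrySet());
        ranked.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```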
Degradation‑Prevention Ability
Since raw power values are unavailable, DingTalk defines a "Power Health Score" that normalizes component usage, applies weighted factors, and yields a single metric reflecting overall power experience.
The score, together with anomaly counts and component‑level metrics, forms a comprehensive power‑experience dashboard.
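The article does not give the scoring formula. One plausible reading of "normalize, weight, and combine" is sketched below: each component's usage is divided by a limit, clamped to [0, 1], weighted, and subtracted from 100. The formula, limits, and weights are assumptions chosen for illustration.

```java
// Sketch: score = 100 * (1 - sum(weight[i] * clamp(usage[i] / limit[i], 0, 1))).
final class PowerHealthScore {
    /**
     * usage, limit, and weight are parallel arrays, one slot per component.
     * Weights are expected to sum to 1 so the score stays within [0, 100].
     */
    static double score(double[] usage, double[] limit, double[] weight) {
        if (usage.length != limit.length || usage.length != weight.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        double penalty = 0.0;
        for (int i = 0; i < usage.length; i++) {
            double normalized = Math.min(Math.max(usage[i] / limit[i], 0.0), 1.0);
            penalty += weight[i] * normalized;
        }
        return 100.0 * (1.0 - penalty);
    }
}
```

Clamping keeps a single runaway component from pushing the score below zero, at the cost of hiding how far over the limit it went. The anomaly counts cover that gap.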
Power Governance Practices
Guided by monitoring insights, DingTalk has addressed dozens of power issues through two main strategies:
Low‑Power Mode: For heavy background messaging, the server applies tiered, delayed, and merged push strategies. Low‑priority messages are suppressed in background; medium‑priority messages are batched; high‑priority messages are delivered immediately.
Component‑Level Optimizations: Ongoing fixes target excessive background network traffic, frequent system‑service calls (Alarm, WakeLock, Bluetooth scan), excessive self‑starts, and high CPU load (long‑running threads, animation leaks).
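The tiered push behavior of the low-power mode described above can be sketched as a small dispatcher. TieredPush is a hypothetical class. How suppressed low-priority messages are eventually delivered and how often batches are flushed are not stated in the article, so the sketch only counts suppressed messages and leaves the flush timing to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: priority-based push handling while the client is in the background.
final class TieredPush {
    enum Priority { LOW, MEDIUM, HIGH }

    private final List<String> batch = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();
    private int suppressed = 0;

    /** Routes one message according to its priority. */
    void onMessage(String msg, Priority p) {
        switch (p) {
            case HIGH:
                delivered.add(msg); // pushed immediately
                break;
            case MEDIUM:
                batch.add(msg);     // held for the next merged push
                break;
            case LOW:
                suppressed++;       // not pushed while in the background
                break;
        }
    }

    /** Sends all batched messages in one merged push (timing is up to the caller). */
    void flush() {
        delivered.addAll(batch);
        batch.clear();
    }

    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    int suppressedCount() {
        return suppressed;
    }
}
```

Merging medium-priority messages means the radio and the long connection wake up once per batch instead of once per message, which is where the savings come from.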
These measures have reduced abnormal power events by over 50%, lifted the Power Health Score above 99.9 (from ~95), improved user feedback, and earned DingTalk the Gold‑Label power‑efficiency certification.
Conclusion
By establishing a systematic power‑monitoring, detection, and governance framework, DingTalk has dramatically improved battery life, reduced power‑related complaints, and delivered a low‑power user experience, with ongoing efforts to further refine the app.
