JD Mall App Stutter Monitoring System: Architecture, Implementation, and Optimization Outcomes
This article details the design and deployment of JD Mall's mobile app stutter monitoring system, introducing the stutter rate metric, describing data collection, analysis, and reporting modules, presenting iOS and Android implementation code, and summarizing the performance improvements achieved across the platform.
Research Background
As JD Mall's app grew in functionality, performance issues became more pronounced, with 32% of non‑business user feedback in 2019 related to performance. To accurately locate and resolve these problems, JD built a performance monitoring system focused on app stutter detection.
Stutter Rate & Monitoring Framework
Traditional metrics like FPS or frame time lack a business‑oriented view, so JD introduced the "stutter rate" metric: for a given page, the number of reports that indicate a stutter divided by the total number of reports for that page. The app‑wide severe‑stutter rate and the proportion of users affected are defined analogously. The monitoring system consists of three modules: data collection, collection strategy, and data analysis & visualization.
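As a rough illustration of the definition above, the per-page stutter rate can be computed from a stream of reports. This is a minimal sketch, not JD's implementation: the class name `StutterRateTracker`, its methods, and the choice to return 0.0 for a page without reports are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical aggregator: stutter rate = reports with a stutter / all reports, per page. */
final class StutterRateTracker {
    // counts[0] = reports that indicated a stutter, counts[1] = all reports for the page
    private final Map<String, long[]> countsByPage = new HashMap<>();

    /** Records one report for a page; hadStutter says whether the report indicated a stutter. */
    void addReport(String page, boolean hadStutter) {
        long[] counts = countsByPage.computeIfAbsent(page, p -> new long[2]);
        if (hadStutter) {
            counts[0]++;
        }
        counts[1]++;
    }

    /** Returns the stutter rate for a page, or 0.0 when the page has no reports (an assumption). */
    double stutterRate(String page) {
        long[] counts = countsByPage.get(page);
        if (counts == null || counts[1] == 0) {
            return 0.0;
        }
        return (double) counts[0] / counts[1];
    }
}
```

The same ratio could be taken over users instead of reports to approximate the proportion of users affected.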
Metric Definitions & Collection Scheme
Indicator 1: Frame Rate (FPS)
FPS reflects overall smoothness. Sample collection code:
// iOS: count CADisplayLink ticks and recompute FPS roughly once per second
- (void)tick:(CADisplayLink *)link {
    _count++;
    NSTimeInterval delta = link.timestamp - _lastTime;
    if (delta < 1) return;
    _lastTime = link.timestamp;
    _fps = _count / delta;
    _count = 0;
}
// Android: receive a callback for every frame via Choreographer
public class MyFrameCallback implements Choreographer.FrameCallback {
    @Override
    public void doFrame(long frameTimeNanos) {
        long times = (frameTimeNanos - lastTime) / 1000000; // interval since the previous frame, in ms
        recordFrame(); // one frame rendered
        lastTime = frameTimeNanos;
        analyseStuck(frameTimeNanos); // stutter collection
        Choreographer.getInstance().postFrameCallback(this); // re-register for the next frame
    }
}
Two issues were identified. First, iOS distinguishes "CPU FPS" from "GPU FPS": GPU‑related frame drops can be missed if only CADisplayLink is used, so JD also monitors GPU FPS via OpenGL synchronization and CAEAGLLayer. Second, a high FPS does not always mean the app feels smooth.
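The once-per-second counting used in the FPS samples can be sketched in plain Java, independent of any UI framework. This is an illustrative sketch under assumptions: `FpsCounter` and its methods are hypothetical names, and the caller is assumed to pass one monotonically increasing timestamp per rendered frame (for example the `frameTimeNanos` argument of `doFrame`).

```java
/** Hypothetical plain-Java FPS counter fed with one timestamp (in nanoseconds) per frame. */
final class FpsCounter {
    private static final long WINDOW_NANOS = 1_000_000_000L; // recompute about once per second

    private boolean started;
    private long windowStart;
    private int frames;
    private double fps;

    /** Call once per rendered frame with that frame's timestamp. */
    void onFrame(long frameTimeNanos) {
        if (!started) {
            // The first frame only opens the measurement window.
            started = true;
            windowStart = frameTimeNanos;
            return;
        }
        frames++;
        long elapsed = frameTimeNanos - windowStart;
        if (elapsed < WINDOW_NANOS) {
            return; // window not complete yet
        }
        fps = frames * 1e9 / elapsed; // frames per second over the finished window
        frames = 0;
        windowStart = frameTimeNanos;
    }

    /** Most recently computed FPS, or 0.0 before the first window completes. */
    double fps() {
        return fps;
    }
}
```

Because the value is an average over a whole window, a single very long frame barely moves it, which is one reason a high FPS alone does not guarantee smoothness.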
Indicator 2: Stutter Information
Stutter events are classified into three types based on frame duration and consecutive occurrences:
Severe stutter: a single frame > 240 ms (≈6 movie frames).
General stutter: ≥3 consecutive frames > 80 ms.
Suspected stutter: ≥2 consecutive frames > 50 ms.
The monitoring flow records stutter type and the associated call stack at the moment of detection, and reports the information when the user leaves the page.
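The flow just described (capture the stutter type and call stack at detection time, report when the user leaves the page) could be buffered per page roughly as follows. This is a sketch under assumptions: `StutterType`, `StutterEvent`, `PageStutterSession`, and the `reporter` callback are hypothetical names, and the stack is taken with the standard `Thread.getStackTrace()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

enum StutterType { SUSPECTED, GENERAL, SEVERE }

/** One detected stutter together with the stack captured at detection time. */
final class StutterEvent {
    final StutterType type;
    final StackTraceElement[] stack;

    StutterEvent(StutterType type, StackTraceElement[] stack) {
        this.type = type;
        this.stack = stack;
    }
}

/** Hypothetical per-page buffer: collects events while the page is visible, reports on leave. */
final class PageStutterSession {
    private final String page;
    private final Thread monitoredThread; // typically the UI thread
    private final List<StutterEvent> events = new ArrayList<>();

    PageStutterSession(String page, Thread monitoredThread) {
        this.page = page;
        this.monitoredThread = monitoredThread;
    }

    /** Called at the moment of detection so the stack still reflects the slow code path. */
    synchronized void onStutterDetected(StutterType type) {
        events.add(new StutterEvent(type, monitoredThread.getStackTrace()));
    }

    /** Called when the user leaves the page: hands a copy of the buffer to the reporter, then clears it. */
    synchronized void onPageLeave(BiConsumer<String, List<StutterEvent>> reporter) {
        reporter.accept(page, new ArrayList<>(events));
        events.clear();
    }
}
```

Deferring the upload to page exit keeps network and serialization work out of the moments when the page is already struggling to render.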
Common fields used in the implementation:
lStuckThreshold // threshold for suspected stutter (2)
cStuckThreshold // threshold for general stutter (3)
lightBlockTime // suspected stutter time (50 ms)
criticalBlockTime // general stutter time (80 ms)
bigJankTime // severe stutter time (240 ms)
iOS Implementation Example
// RunLoop state change callback
static void JDRunLoop_CFRunLoopObserverCallBack(CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info) {
    JDUIStuckMonitor *monitor = (__bridge JDUIStuckMonitor *)info;
    monitor->_runLoopActivity = activity;
    dispatch_semaphore_t lock = monitor->_lock;
    if (lock != NULL) {
        dispatch_semaphore_signal(lock); // wake the monitoring loop
    }
}
- (void)initMonitor {
    while (YES) {
        // Block until the observer signals a state change or criticalBlockTime ms elapse
        long result = dispatch_semaphore_wait(_lock, dispatch_time(DISPATCH_TIME_NOW, criticalBlockTime * NSEC_PER_MSEC));
        // delta is not defined in this excerpt; presumably the elapsed time in ms derived from the wait
        [self analyseStuck:delta];
    }
}
- (void)analyseStuck:(NSTimeInterval)integerDelta {
    if (_runLoopActivity == kCFRunLoopBeforeSources || _runLoopActivity == kCFRunLoopAfterWaiting) {
        if (integerDelta < lightBlockTime) {
            // Current interval is smooth
            if (_lastIntegerDelta >= criticalBlockTime) {
                [self p_cstuckTimeArrAppend:integerDelta];
            } else if (_lastIntegerDelta < lightBlockTime) {
                [self p_judgeStuck];
            }
        } else {
            // At least a suspected stutter: capture a stack every lStuckThreshold occurrences
            if (++_lstuckCount && _lstuckCount % lStuckThreshold == 0) [self recordStackBackTrace];
            if (integerDelta < criticalBlockTime) {
                if (_lastIntegerDelta >= criticalBlockTime) {
                    [self p_cstuckTimeArrAppend:integerDelta];
                }
            } else {
                // integerDelta >= criticalBlockTime: counts toward a general stutter
                if (_lastIntegerDelta >= criticalBlockTime) {
                    [self p_cstuckTimeArrAppend:integerDelta];
                }
                if (++_cstuckCount && _cstuckCount % cStuckThreshold == 0) [self recordStackBackTrace];
            }
            _lastIntegerDelta = integerDelta;
        }
    } else if (_runLoopActivity == kCFRunLoopBeforeWaiting) {
        // The run loop is about to sleep: nothing to judge if no slow interval was seen
        if (_lstuckCount == 0 && _cstuckCount == 0) return;
        [self p_judgeStuck];
    }
}
Android Implementation Example
// Analyze frame time
private void analyseStuck(long frameTimeNanos) {
    if (mLastFrameTimeNanos != 0) {
        long diffMs = TimeUnit.NANOSECONDS.toMillis(frameTimeNanos - mLastFrameTimeNanos);
        if (diffMs >= bigJankTime) {
            bJank = true;
        } else if (diffMs >= lightBlockTime) {
            mLightBlockCount++;
            if (diffMs >= criticalBlockTime) {
                mCriticalBlockCount++;
            }
        } else {
            // A smooth frame ends the current run of slow frames
            if (bJank) {
                // report severe stutter
            } else if (mCriticalBlockCount >= cStuckThreshold) {
                // report general stutter
            } else if (mLightBlockCount >= lStuckThreshold) {
                // report suspected stutter
            }
            // Reset so the counters only cover consecutive slow frames
            bJank = false;
            mCriticalBlockCount = 0;
            mLightBlockCount = 0;
        }
    }
    mLastFrameTimeNanos = frameTimeNanos;
}
Indicator 3: Thread Deadlock
The same monitoring logic can also capture main‑thread watchdog (0x8badf00d) crashes. The monitor records how long the thread has been busy, together with its stack trace, before the crash; on the next app launch, JD uses that record to identify that a deadlock occurred.
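One way to picture the "record before the crash, identify on restart" idea is a small marker file that is overwritten while the thread stays blocked and deleted once it recovers. The sketch below is an assumption about how such a record could be kept, not the mechanism the article describes in detail; `DeadlockMarker`, its methods, and the file layout are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical on-disk marker that survives a watchdog kill and is inspected at next launch. */
final class DeadlockMarker {
    private final Path file;

    DeadlockMarker(Path file) {
        this.file = file;
    }

    /** Called repeatedly while the monitored thread stays blocked: overwrites the latest snapshot. */
    void recordBusy(long busyMillis, String stackTrace) {
        try {
            String body = busyMillis + "\n" + stackTrace;
            Files.write(file, body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called once the thread responds again: the block was not fatal, so drop the marker. */
    void clear() {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called at launch: a leftover marker suggests the previous run ended while blocked. */
    Optional<String> consumeLeftover() {
        try {
            if (!Files.exists(file)) {
                return Optional.empty();
            }
            String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return Optional.of(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The first line of the leftover record holds the busy duration in milliseconds and the rest holds the stack, so the next launch can report both.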
Data Analysis
The collected data is analyzed along three dimensions:
General dimension: overall app stutter rate, user impact proportion, system/device/version distribution.
Page dimension: per‑page basic info, FPS, stutter rate, enabling precise business‑level diagnosis.
Call‑stack dimension: clustering of stack traces to surface top‑impact stacks for rapid issue resolution.
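The call-stack dimension can be illustrated with a very simple clustering rule: group traces that share their first few frames and rank the groups by size. The article does not say which clustering rule JD uses, so the prefix-based key below is an assumption, and `StackClusterer`, `keyOf`, and `topClusters` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical clustering of stack traces by their first `depth` frames. */
final class StackClusterer {
    /** Cluster key: the first `depth` frames joined with '|'. */
    static String keyOf(List<String> frames, int depth) {
        return String.join("|", frames.subList(0, Math.min(depth, frames.size())));
    }

    /** Returns up to `limit` clusters as (key, count) entries, largest cluster first. */
    static List<Map.Entry<String, Integer>> topClusters(List<List<String>> traces, int depth, int limit) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> trace : traces) {
            counts.merge(keyOf(trace, depth), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equally sized clusters keep their first-seen order.
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

A production system would likely normalize frames first (for example by stripping addresses or system frames), but even this crude key is enough to surface the stacks that account for most reports.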
JD Mall App Optimization Results
After deploying the stutter monitoring system, JD identified over ten major performance problems and assisted more than thirty business teams in optimization. iOS stutter rate dropped by >50% and Android by >30%, markedly improving overall app smoothness.
Common Optimization Strategies
UI rendering: reduce view hierarchy, cache views, avoid off‑screen rendering, use asynchronous rendering.
I/O rationalization: lower I/O frequency, improve I/O efficiency.
Avoid heavy main‑thread computation: cache layout calculations, parse data off‑thread.
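Of the strategies listed above, caching layout calculations is the easiest to show in a few lines. The sketch below memoizes a measured height per item so repeated layout passes skip the expensive measurement; `LayoutHeightCache` and the `measure` function are hypothetical and stand in for whatever measurement a real list or cell would perform.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical cache of measured item heights, keyed by item id. */
final class LayoutHeightCache {
    private final Map<String, Integer> heights = new HashMap<>();
    private final ToIntFunction<String> measure; // the expensive layout calculation

    LayoutHeightCache(ToIntFunction<String> measure) {
        this.measure = measure;
    }

    /** Returns the cached height, measuring only on the first request for an item. */
    int heightFor(String itemId) {
        Integer cached = heights.get(itemId);
        if (cached != null) {
            return cached;
        }
        int height = measure.applyAsInt(itemId);
        heights.put(itemId, height);
        return height;
    }

    /** Drops the cached value when the item's content or the available width changes. */
    void invalidate(String itemId) {
        heights.remove(itemId);
    }
}
```

The cache is only safe as long as it is invalidated whenever an input to the measurement changes, which is the main cost of this strategy.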
Case Study: Push‑Screen Capture Stutter
The default -drawViewHierarchyInRect:afterScreenUpdates: method caused >300 ms delays on large iPhone models. Replacing it with snapshotViewAfterScreenUpdates: reduced capture time to <100 ms, cutting the related stutter rate dramatically.
Case Study: Thread Deadlock
Blocking calls such as close() on BSD sockets or synchronous openURL: can freeze the main thread for seconds, leading to watchdog crashes. Moving these calls off the main thread or using asynchronous alternatives mitigates the issue.
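The mitigation described here, moving a potentially blocking call off the main thread, has a direct analog in Java. The sketch below hands `close()` to a background worker and returns a future immediately; it illustrates the general pattern rather than JD's iOS fix, and `BackgroundCloser` and `closeAsync` are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper that runs potentially blocking close() calls on a worker thread. */
final class BackgroundCloser {
    // Single daemon worker so a stuck close() never keeps the process alive.
    private final ExecutorService io = Executors.newSingleThreadExecutor(runnable -> {
        Thread worker = new Thread(runnable, "io-closer");
        worker.setDaemon(true);
        return worker;
    });

    /**
     * Schedules resource.close() on the worker and returns at once.
     * The future completes with true if close() succeeded, false if it threw IOException.
     */
    CompletableFuture<Boolean> closeAsync(Closeable resource) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                resource.close();
                return true;
            } catch (IOException e) {
                return false;
            }
        }, io);
    }
}
```

The calling thread stays responsive even if `close()` blocks for seconds; with a single worker, later close requests queue behind a stuck one, which a real implementation might address with a timeout or a larger pool.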
Outlook
With the app’s growing complexity, performance monitoring remains critical. JD plans to extend monitoring to additional metrics, continue refining data collection, and further reduce performance bottlenecks to ensure a smooth user experience.
JD Retail Technology