Uncovering the Mystery Behind App ANR Caused by Unicode Bidi Strings
This article continues the investigation of a mysterious string that freezes Android apps. It explains the Unicode line‑breaking and bidirectional algorithms, why the runs array ends with a double zero, how to craft a string that reproduces the problem, and practical ways to avoid the resulting ANR.
Background
The previous article located the infinite loop triggered by a mysterious string. This continuation looks at why the string behaves that way and how it leads to an ANR.
Unicode Line‑Breaking Algorithm
The line‑breaking algorithm decides where a long line can be split when no explicit newline exists, based on character categories and specific rules such as allowing a break after a space following English letters.
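To make the idea of break opportunities concrete, here is a minimal sketch that lists them with the JDK's java.text.BreakIterator. This is an assumption made for illustration: Android's text layout uses its own line breaker, so the exact offsets on a device may differ. The class name LineBreakDemo and the sample sentence are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LineBreakDemo {
    /** Returns every offset at which the JDK line BreakIterator allows a break. */
    static List<Integer> breakOpportunities(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        // first() is always 0; next() walks forward until DONE.
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A break is allowed after each space, never inside a word.
        System.out.println(breakOpportunities("hello wide world"));
    }
}
```

Each reported offset sits just after a space, which matches the rule described above: the space stays on the earlier line and the next word starts the new one.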
Unicode Bidirectional (Bidi) Algorithm
Most scripts are left‑to‑right (LTR), but languages like Arabic and Hebrew are right‑to‑left (RTL). When LTR and RTL characters appear together, the Bidi algorithm assigns each character a type (strong, weak, neutral) and an embedding level, then determines visual order.
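The embedding levels can be inspected with the JDK's java.text.Bidi class. The sketch below is only an illustration on plain Java, under the assumption that it behaves closely enough to Android's own Bidi implementation for a simple case; BidiLevelsDemo and the sample text are hypothetical names. Even levels mean left‑to‑right, odd levels mean right‑to‑left.

```java
import java.text.Bidi;
import java.util.Arrays;

final class BidiLevelsDemo {
    /** Resolves the embedding level of every char with java.text.Bidi. */
    static int[] embeddingLevels(String text) {
        // Base direction comes from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int[] levels = new int[text.length()];
        for (int i = 0; i < levels.length; i++) {
            levels[i] = bidi.getLevelAt(i);
        }
        return levels;
    }

    public static void main(String[] args) {
        // Latin letters, three Hebrew letters (alef, bet, gimel), Latin letters.
        String mixed = "abc \u05D0\u05D1\u05D2 def";
        System.out.println(Arrays.toString(embeddingLevels(mixed)));
    }
}
```

In this sample the Latin letters and the spaces resolve to level 0 and the Hebrew letters to level 1, so the line splits into three directional runs.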
Why Runs End with Double Zero
During layout, the method TextLine.getOffsetToLeftRightOf walks a runs array that stores the visual direction information for the line. When the cursor is at the line end, runIndex points past the last run, and a subsequent branch expects a non‑zero run value. Because the last run is still zero, the loop never terminates, and the app stops responding until the system reports an ANR.
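The following sketch shows one way to inspect such a runs array. It assumes the layout used by android.text.Layout.Directions in AOSP as I understand it: pairs of ints, where the first is the run's start offset within the line and the second packs the run length in the low 26 bits and the level above them. The constants, the class name RunsDecoder, and the sample array are assumptions for illustration, not values taken from the analysis.

```java
final class RunsDecoder {
    // Assumed to mirror the bit packing in android.text.Layout (AOSP).
    static final int RUN_LENGTH_MASK = 0x03ffffff;
    static final int RUN_LEVEL_SHIFT = 26;
    static final int RUN_LEVEL_MASK = 0x3f;

    /** Start offset of run number runIndex, relative to the line start. */
    static int runStart(int[] runs, int runIndex) {
        return runs[runIndex * 2];
    }

    /** Length in chars of run number runIndex. */
    static int runLength(int[] runs, int runIndex) {
        return runs[runIndex * 2 + 1] & RUN_LENGTH_MASK;
    }

    /** Embedding level of run number runIndex (odd means RTL). */
    static int runLevel(int[] runs, int runIndex) {
        return (runs[runIndex * 2 + 1] >>> RUN_LEVEL_SHIFT) & RUN_LEVEL_MASK;
    }

    /** True when the final (start, lengthAndLevel) pair is all zeros. */
    static boolean endsWithDoubleZero(int[] runs) {
        int n = runs.length;
        return n >= 2 && runs[n - 2] == 0 && runs[n - 1] == 0;
    }

    public static void main(String[] args) {
        // Hypothetical line: 5 LTR chars, 3 RTL chars, then an unfilled pair.
        int[] runs = {0, 5, 5, 3 | (1 << RUN_LEVEL_SHIFT), 0, 0};
        for (int i = 0; i < runs.length / 2; i++) {
            System.out.println("run " + i + ": start=" + runStart(runs, i)
                    + " length=" + runLength(runs, i) + " level=" + runLevel(runs, i));
        }
        System.out.println("ends with double zero: " + endsWithDoubleZero(runs));
    }
}
```

A trailing pair with zero length is the shape to watch for: a loop that advances by the run length makes no progress on it.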
Constructing a String That Triggers ANR
The string must contain:
Arabic or Hebrew characters (RTL strong type)
The sequence LRI (LEFT‑TO‑RIGHT ISOLATE, U+2066, decimal 8294) followed by a space (U+0020, decimal 32)
Any number of LTR characters between these sequences
Repeating the LRI+space combination two or three times increases the chance of the line break occurring at the problematic position.
    char arabicChar = 1766; // U+06E6
    arabicChar = 1727;      // U+06BF, the value actually used below
    char[] chars = new char[]{
        arabicChar, 'A','A','A','A','A','A','A','A','A',
        arabicChar, 'A','A','A','A','A','A','A','A','A','A',
        8294, 32, 'A','A','A', // LRI (U+2066) followed by a space
        arabicChar, 'A','A','A','A','A','A','A','A','A','A',
        arabicChar, 'A','A','A','A','A','A','A','A','A','A',
        8294, 32, 'A','A','A',
        arabicChar, 'A','A','A','A','A','A','A','A','A','A',
        arabicChar, 'A','A','A','A','A','A','A','A','A','A',
        8294, 32, 'A','A','A',
    };
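The same pattern can be generated programmatically, which makes it easier to vary the number of repetitions. This is a sketch only: the class AnrStringBuilder and its parameters are hypothetical, it uses ten Latin letters in every group (the array above uses nine in the first group), and whether a given output actually reproduces the freeze still depends on where the view width forces the line break.

```java
final class AnrStringBuilder {
    static final char LRI = '\u2066';    // LEFT-TO-RIGHT ISOLATE, decimal 8294
    static final char ARABIC = '\u06BF'; // decimal 1727, as in the array above

    /** Appends count copies of the Latin letter 'A'. */
    private static void appendLatin(StringBuilder sb, int count) {
        for (int i = 0; i < count; i++) {
            sb.append('A');
        }
    }

    /** Builds the RTL / LTR / LRI+space pattern, repeated the given number of times. */
    static String build(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < repeats; r++) {
            sb.append(ARABIC);
            appendLatin(sb, 10);
            sb.append(ARABIC);
            appendLatin(sb, 10);
            sb.append(LRI).append(' '); // the combination the line break must land on
            appendLatin(sb, 3);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String candidate = build(3);
        System.out.println("length = " + candidate.length());
    }
}
```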
How to Avoid or Handle This ANR Scenario
Three practical approaches:
When internationalization is not required, disable RTL handling and strip directional formatting characters.
Detect a line ending with the LRI+space pattern (or a run array ending with double zero) and skip the getOffsetForHorizontal call for that line.
Use Paint.measureText to compute character widths and compare with the horizontal offset before invoking the problematic method.
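The first two approaches can be sketched in plain Java. The class BidiControlGuard and its method names are hypothetical, and the set of characters treated as directional formatting characters (U+061C, U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069) is my reading of UAX #9 rather than something the analysis specifies. The third approach depends on Paint and is not covered here.

```java
final class BidiControlGuard {
    /** True for the explicit directional formatting characters listed in the lead-in. */
    static boolean isBidiControl(char c) {
        return c == '\u061C' || c == '\u200E' || c == '\u200F'
                || (c >= '\u202A' && c <= '\u202E')
                || (c >= '\u2066' && c <= '\u2069');
    }

    /** Returns a copy of text with every directional formatting character removed. */
    static String stripBidiControls(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isBidiControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** True when the line [lineStart, lineEnd) ends with LRI followed by a space. */
    static boolean lineEndsWithLriSpace(CharSequence text, int lineStart, int lineEnd) {
        return lineEnd - lineStart >= 2
                && text.charAt(lineEnd - 1) == ' '
                && text.charAt(lineEnd - 2) == '\u2066';
    }

    public static void main(String[] args) {
        String risky = "AAA\u2066 BBB";
        System.out.println(lineEndsWithLriSpace(risky, 0, 5)); // line is "AAA", LRI, space
        System.out.println(stripBidiControls(risky));
    }
}
```

On Android, lineStart and lineEnd would come from Layout.getLineStart(line) and Layout.getLineEnd(line), and a positive result from lineEndsWithLriSpace would be the signal to skip the offset lookup for that line.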
Related Links
UAX #14: Unicode Line Breaking Algorithm
UAX #9: Unicode Bidirectional Algorithm
Android source code and test cases referenced in the analysis
Kuaishou Frontend Engineering
Explore the cutting‑edge tech behind Kuaishou's front‑end ecosystem