Understanding String Encoding in Android JNI: From Native Crash to Source Code Analysis
This article investigates an Android JNI native crash caused by misusing NewString(), examines why a custom UTF‑8‑to‑UTF‑16 conversion was used instead of NewStringUTF(), compares Dalvik and ART string encodings, reveals a Dalvik UTF‑8 conversion bug fixed in ART, and advises developers on encoding nuances across Android versions.
This article analyzes a native crash in Android JNI development, tracing the issue from a string conversion function down to the underlying runtime source code. The author begins with the crash itself, which was triggered by calling NewString() while an earlier exception was still pending (it had never been cleared), and then asks why the code used a custom UTF-8 to UTF-16 conversion function rather than the simpler NewStringUTF() method.
The article then compares UTF-8, UTF-16, and UCS-2, explaining how the same Unicode code point is represented under each encoding scheme. It also walks through the Dalvik and ART source code, showing where the two runtimes differ in how string objects are created and managed.
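To make the encoding comparison concrete, the following is a minimal sketch of how a single code point maps to UTF-8 bytes and UTF-16 code units. The function names EncodeUtf8, EncodeUtf16, and FitsInUcs2 are hypothetical helpers written for illustration; they are not taken from the article or from the Android sources.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if cp is a valid Unicode scalar value
// (in range and not a surrogate code point).
static bool IsScalarValue(uint32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Encode one code point as standard UTF-8 (1 to 4 bytes).
std::vector<uint8_t> EncodeUtf8(uint32_t cp) {
    assert(IsScalarValue(cp));
    std::vector<uint8_t> out;
    if (cp < 0x80) {                       // 1 byte: ASCII
        out.push_back(static_cast<uint8_t>(cp));
    } else if (cp < 0x800) {               // 2 bytes
        out.push_back(static_cast<uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
        out.push_back(static_cast<uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes: supplementary planes
        out.push_back(static_cast<uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Encode one code point as UTF-16: a single unit inside the BMP,
// a high/low surrogate pair above U+FFFF.
std::vector<uint16_t> EncodeUtf16(uint32_t cp) {
    assert(IsScalarValue(cp));
    if (cp < 0x10000) {
        return {static_cast<uint16_t>(cp)};
    }
    uint32_t v = cp - 0x10000;             // 20 bits split into 10 + 10
    return {static_cast<uint16_t>(0xD800 | (v >> 10)),
            static_cast<uint16_t>(0xDC00 | (v & 0x3FF))};
}

// UCS-2 is fixed-width 16-bit, so it cannot represent anything above U+FFFF.
bool FitsInUcs2(uint32_t cp) {
    return cp <= 0xFFFF;
}
```

Under these definitions, U+4E2D takes three UTF-8 bytes and one UTF-16 unit, while U+1F600 takes four UTF-8 bytes, needs a surrogate pair in UTF-16, and has no UCS-2 representation at all.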
Through empirical testing and a review of the official documentation, the author concludes that Dalvik stores all strings as UTF-16, while ART also uses UTF-16 but, on Android 8.0 and later, stores ASCII-only strings in a compact one-byte-per-character form (which the article describes as UTF-8, since the two are byte-identical for ASCII). The article also identifies a critical bug in Dalvik's UTF-8 to UTF-16 conversion: it does not handle 4-byte UTF-8 sequences, which encode code points above U+FFFF. This was later fixed in ART.
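The 4-byte case is the one that needs special handling, because a single UTF-8 sequence has to become two UTF-16 code units. The sketch below shows one way a decoder can cover it. Utf8ToUtf16 is a hypothetical function written for this illustration, not the Dalvik or ART code, and it assumes well-formed standard UTF-8 input: it simply stops at an invalid lead byte or a truncated sequence instead of reporting an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoder: converts standard UTF-8 bytes to UTF-16 code units.
// Assumption: input is well-formed; continuation bytes are not validated.
std::vector<uint16_t> Utf8ToUtf16(const std::string& in) {
    std::vector<uint16_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const uint8_t b0 = static_cast<uint8_t>(in[i]);
        uint32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80) {                    // 0xxxxxxx
            cp = b0;
            len = 1;
        } else if ((b0 & 0xE0) == 0xC0) {   // 110xxxxx
            cp = b0 & 0x1F;
            len = 2;
        } else if ((b0 & 0xF0) == 0xE0) {   // 1110xxxx
            cp = b0 & 0x0F;
            len = 3;
        } else if ((b0 & 0xF8) == 0xF0) {   // 11110xxx: the 4-byte case
            cp = b0 & 0x07;
            len = 4;
        } else {
            break;                          // invalid lead byte: stop
        }
        if (i + len > in.size()) {
            break;                          // truncated sequence: stop
        }
        // Fold in the low 6 bits of each continuation byte.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<uint8_t>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<uint16_t>(cp));
        } else {
            // Above U+FFFF: emit a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<uint16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<uint16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return out;
}
```

A decoder that only recognizes 1-, 2-, and 3-byte lead bytes would have no correct output for the last branch, which is the class of input the article reports Dalvik mishandling.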
For Android developers working with JNI, the practical lessons are that string encodings differ between the Java and native layers, and that string handling has changed across Android versions, so conversion code should be checked against each runtime it has to support.
Tencent Music Tech Team
Public account of Tencent Music's development team, focusing on technology sharing and communication.