Understanding String Encoding in Android JNI: From Native Crash to Source Code Analysis

This article investigates an Android JNI native crash caused by misusing NewString(), examines why a custom UTF‑8‑to‑UTF‑16 conversion was used instead of NewStringUTF(), compares Dalvik and ART string encodings, reveals a Dalvik UTF‑8 conversion bug fixed in ART, and advises developers on encoding nuances across Android versions.

Tencent Music Tech Team

Feb 9, 2018

This article analyzes a native crash in Android JNI development, tracing the issue from a string conversion function down to the underlying runtime source code. The author begins by examining a crash caused by calling NewString() while an earlier exception was still pending (it had not been cleared), then explores why a custom UTF-8 to UTF-16 conversion function was used instead of the simpler NewStringUTF() method.

The article provides a comprehensive analysis of UTF-8, UTF-16, and UCS-2 encoding differences, explaining how Unicode characters are represented across different encoding schemes. It includes detailed source code analysis of both Dalvik and ART implementations, revealing key differences in how string objects are created and managed.
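To make the UTF-8 and UTF-16 relationship concrete, here is a minimal sketch of a standard UTF-8 to UTF-16 decoder in C. It is not the converter from the original article, which this summary does not show; the function name `utf8_to_utf16` and its signature are assumptions chosen for illustration. The point of interest is the 4-byte branch: code points above U+FFFF do not fit in one 16-bit unit, so they are split into a surrogate pair, which is exactly what a UCS-2-style decoder cannot express.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode standard UTF-8 into UTF-16 code units.
 * Returns the number of code units written, or -1 if the input is
 * malformed or truncated, or if `out` is too small.
 * Simplification: overlong forms and encoded surrogates are not rejected.
 */
long utf8_to_utf16(const uint8_t *in, size_t in_len,
                   uint16_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    size_t i = 0, n = 0;

    while (i < in_len) {
        uint8_t b = in[i];
        uint32_t cp;
        size_t extra;  /* number of continuation bytes that follow */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;              /* stray continuation or invalid lead */

        if (extra > in_len - i - 1) return -1;  /* truncated sequence */

        for (size_t k = 1; k <= extra; k++) {
            uint8_t c = in[i + k];
            if ((c & 0xC0) != 0x80) return -1;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp >= 0x10000) {
            /* Supplementary plane: emit a high/low surrogate pair. */
            if (cp > 0x10FFFF || out_cap - n < 2) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (out_cap - n < 1) return -1;
            out[n++] = (uint16_t)cp;
        }
    }
    return (long)n;
}
```

For example, the bytes F0 9F 98 80 (U+1F600) decode to the two code units 0xD83D 0xDE00, while E2 82 AC (U+20AC) decodes to the single unit 0x20AC. A buffer produced this way is the kind of input NewString() expects, together with its length in code units.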

Through empirical testing and a review of the official documentation, the author concludes that Dalvik stores all strings as UTF-16, while ART also uses UTF-16 but, on Android 8.0+, stores ASCII-only strings in a compact one-byte-per-character form (described in the article as UTF-8, which is byte-identical for ASCII). The article also identifies a critical bug in Dalvik's UTF-8 to UTF-16 conversion: it fails to handle 4-byte UTF-8 characters, a problem that was later fixed in ART.
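As background that this summary does not spell out: the JNI specification defines NewStringUTF() as taking Modified UTF-8, not standard UTF-8. In Modified UTF-8 a supplementary character is written as two 3-byte sequences (one per UTF-16 surrogate) instead of one 4-byte sequence, and U+0000 is written as C0 80. The sketch below shows that encoding for a single code point; `encode_modified_utf8` and `put_unit` are hypothetical names, and the code is an illustration of the format rather than anything taken from Dalvik or ART.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one UTF-16 code unit in the 1-3 byte Modified UTF-8 form. */
static size_t put_unit(uint16_t u, uint8_t *out)
{
    if (u != 0 && u < 0x80) {          /* plain ASCII, except NUL */
        out[0] = (uint8_t)u;
        return 1;
    }
    if (u < 0x800) {                   /* also covers U+0000 -> C0 80 */
        out[0] = (uint8_t)(0xC0 | (u >> 6));
        out[1] = (uint8_t)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (u >> 12));
    out[1] = (uint8_t)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (u & 0x3F));
    return 3;
}

/*
 * Hypothetical helper: encode one Unicode code point as Modified UTF-8.
 * `out` must have room for 6 bytes. Returns the number of bytes written.
 */
size_t encode_modified_utf8(uint32_t cp, uint8_t out[6])
{
    assert(out != NULL && cp <= 0x10FFFF);
    if (cp < 0x10000)
        return put_unit((uint16_t)cp, out);

    /* Supplementary plane: encode each surrogate as its own 3-byte unit. */
    cp -= 0x10000;
    size_t n = put_unit((uint16_t)(0xD800 | (cp >> 10)), out);
    return n + put_unit((uint16_t)(0xDC00 | (cp & 0x3FF)), out + n);
}
```

With this encoding, U+1F600 becomes the six bytes ED A0 BD ED B8 80, whereas standard UTF-8 gives the four bytes F0 9F 98 80. Native code that hands standard UTF-8 containing 4-byte sequences to NewStringUTF() is therefore outside what the specification promises, which is one plausible reason to convert to UTF-16 and call NewString() instead.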

For Android developers working with JNI, the takeaway is to account for the encoding differences between the Java and native layers, and for how string handling has changed across Android versions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Tencent Music Tech Team

Public account of Tencent Music's development team, focusing on technology sharing and communication.
