Tagged articles
15 articles
Architecture Digest
Jun 2, 2024 · Fundamentals

Understanding Unicode, UTF-16, and String Length Issues in JavaScript

This article explains why JavaScript string length behaves unexpectedly with Unicode characters, describes UTF‑16 encoding and surrogate pairs, and demonstrates ES6 techniques such as for‑of loops, spread syntax, the u regex flag, codePointAt, and normalize to handle Unicode correctly.
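The techniques named in this summary can be sketched roughly as follows. This is a minimal illustration, not code taken from the article; the helper name `countCodePoints` and the sample strings are assumptions made for the example.

```javascript
// U+1F600 (grinning face) lies outside the Basic Multilingual Plane,
// so UTF-16 stores it as a surrogate pair (two code units).
const s = "a\u{1F600}";

// Hypothetical helper: count code points instead of UTF-16 code units.
function countCodePoints(str) {
  return [...str].length; // spread syntax iterates by code point
}

console.log(s.length);           // 3 (code units)
console.log(countCodePoints(s)); // 2 (code points)

// for-of also walks code points, so the emoji arrives as one item.
const hex = [];
for (const ch of s) {
  hex.push(ch.codePointAt(0).toString(16));
}
console.log(hex); // [ '61', '1f600' ]

// Without the u flag, "." matches a single code unit only.
console.log(/^.$/.test("\u{1F600}"));  // false
console.log(/^.$/u.test("\u{1F600}")); // true

// normalize("NFC") folds "e" + combining acute accent into one code point.
const decomposed = "e\u0301";
console.log(decomposed.length);                  // 2
console.log(decomposed.normalize("NFC").length); // 1
```

Note that counting code points still differs from counting what a user perceives as one character: combined emoji sequences span several code points.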

JavaScript · UTF-16 · Unicode
0 likes · 7 min read
Sohu Tech Products
Dec 6, 2023 · Frontend Development

The Nuances of Base64 Encoding Strings in JavaScript

The article explains that JavaScript’s native btoa() and atob() functions only handle strings in which every character fits in a single byte (code points up to U+00FF), so base64‑encoding an arbitrary Unicode string requires converting it to UTF‑8 bytes with TextEncoder, encoding the resulting Uint8Array, and decoding with TextDecoder. It also recommends checking for lone surrogates with isWellFormed() or encodeURIComponent() to avoid silent data loss.
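A minimal sketch of that round trip, assuming a runtime that exposes `btoa`, `atob`, `TextEncoder`, and `TextDecoder` as globals (modern browsers and recent Node.js). The function names `encodeUnicode`, `decodeUnicode`, `bytesToBase64`, and `base64ToBytes` are hypothetical and chosen for this example only.

```javascript
// Hypothetical helper: turn raw bytes into a base64 string.
function bytesToBase64(bytes) {
  // Map each byte to a one-byte character so btoa() accepts the string.
  const binary = Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  return btoa(binary);
}

// Hypothetical helper: turn a base64 string back into raw bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Encode any well-formed string by going through UTF-8 bytes first.
function encodeUnicode(str) {
  // isWellFormed() is a newer method, so it is feature-detected here.
  if (typeof str.isWellFormed === "function" && !str.isWellFormed()) {
    throw new Error("String contains a lone surrogate");
  }
  return bytesToBase64(new TextEncoder().encode(str));
}

function decodeUnicode(base64) {
  return new TextDecoder().decode(base64ToBytes(base64));
}

console.log(encodeUnicode("\u00e9"));                          // "w6k="
console.log(decodeUnicode(encodeUnicode("a\u{1F600}\u4e2d"))); // round-trips
```

The well-formedness check matters because TextEncoder replaces a lone surrogate with U+FFFD instead of throwing, so the damage would otherwise go unnoticed.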

Base64 · JavaScript · TextDecoder
0 likes · 14 min read
Programmer DD
Aug 16, 2022 · Fundamentals

Why Does JavaScript .length Miscount Emoji? A Deep Dive into UTF‑16 and Unicode

This article explains why JavaScript's string length property returns unexpected values for Unicode characters like emojis, explores UTF‑16 encoding rules, and demonstrates modern ES6 techniques, including for‑of loops, spread syntax, \u{…} code point escapes, and the u regex flag, to handle Unicode strings correctly.
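As a rough illustration of the UTF‑16 encoding rule behind the miscount, the sketch below splits a code point into its code units. The helper `toUtf16CodeUnits` is a hypothetical name, and it does not validate its input (for example, it does not reject values in the surrogate range).

```javascript
// Hypothetical helper: split a code point into UTF-16 code units.
function toUtf16CodeUnits(codePoint) {
  if (codePoint <= 0xffff) {
    return [codePoint]; // Basic Multilingual Plane: one code unit
  }
  const offset = codePoint - 0x10000;    // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low surrogate
  return [high, low];
}

const units = toUtf16CodeUnits(0x1f600);
console.log(units.map((u) => u.toString(16))); // [ 'd83d', 'de00' ]

// The code point escape and the explicit surrogate pair are the same string.
console.log("\u{1F600}" === String.fromCharCode(...units)); // true
console.log("\u{1F600}" === "\uD83D\uDE00");                // true
console.log("\u{1F600}".length);                            // 2
```

Because .length counts code units, any character that needs a surrogate pair contributes 2 to the result.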

Emoji · JavaScript · UTF-16
0 likes · 9 min read
Tencent Cloud Developer
May 17, 2022 · Fundamentals

A Comprehensive History and Overview of Character Encoding and Unicode

The article traces character encoding from early telegraph and Morse code through ASCII, ISO national sets and Chinese standards, explains Unicode’s unification and its UTF‑8/‑16/‑32 forms, and shows how modern languages—especially JavaScript—handle code points, highlighting the cultural and technical significance for developers.

ASCII · JavaScript · UTF-16
0 likes · 31 min read
Programmer DD
Apr 19, 2022 · Backend Development

Why Java 8 Switched String Storage to byte[] and How It Saves Memory

The article explains how Java changed the internal representation of String from a char[] to a byte[] to reduce memory consumption (the compact strings change, which shipped in Java 9 rather than Java 8), the role of Latin‑1 encoding, the impact on garbage collection, and why UTF‑16 remains the practical choice for Java strings.

Java · Memory Optimization · String
0 likes · 8 min read
Programmer DD
Jul 22, 2020 · Fundamentals

Why Java’s char Can’t Represent All Unicode Characters – Understanding UTF‑16 and Code Points

This article explains how Java’s char type stores a single UTF‑16 code unit, why its range of \u0000 to \uffff cannot directly represent characters outside the Basic Multilingual Plane, and how methods like String.length, getBytes, and the code‑point APIs help handle supplementary characters such as emojis and rare Chinese glyphs, which need two chars each.

Code Points · String · UTF-16
0 likes · 10 min read
JD Tech
Dec 18, 2018 · Fundamentals

Understanding Character Encoding: Bits, Bytes, Unicode, UTF-8, UTF-16, and UTF-32

This article explains the origins of character sets, the relationships among various encodings such as ASCII, GB2312, GBK, GB18030, Unicode, UTF-8, UTF-16, and UTF-32, and shows how JavaScript handles Unicode and emoji characters, including practical code examples and solutions for length‑limited input fields.
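The summary does not say which solution the article uses for length‑limited input fields. One common approach, sketched here as an assumption, is to truncate by code points so that a surrogate pair is never cut in half; `truncateByCodePoints` is a hypothetical helper name.

```javascript
// Hypothetical helper: keep at most maxChars code points of the input.
function truncateByCodePoints(str, maxChars) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str).slice(0, maxChars).join("");
}

const input = "ab\u{1F600}cd";

// A naive slice counts code units and leaves a lone high surrogate behind.
console.log(input.slice(0, 3) === "ab\uD83D"); // true

// The code-point-aware version keeps the emoji intact.
console.log(truncateByCodePoints(input, 3) === "ab\u{1F600}"); // true
console.log(truncateByCodePoints(input, 10) === input);        // true
```

This still counts code points rather than user-perceived characters; where available, Intl.Segmenter can split a string into grapheme clusters instead.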

UTF-16 · UTF-8 · Unicode
0 likes · 11 min read
Tencent Music Tech Team
Feb 9, 2018 · Mobile Development

Understanding String Encoding in Android JNI: From Native Crash to Source Code Analysis

This article investigates an Android JNI native crash caused by misusing NewString(), examines why a custom UTF‑8‑to‑UTF‑16 conversion was used instead of NewStringUTF(), compares Dalvik and ART string encodings, reveals a Dalvik UTF‑8 conversion bug fixed in ART, and advises developers on encoding nuances across Android versions.

ART · Android · Dalvik
0 likes · 26 min read