Tagged articles

UTF-16

16 articles · Page 1 of 1

Jun 25, 2026 · Fundamentals

A Programmer’s Intro to Unicode

This guide walks programmers through Unicode’s massive code space, its diverse scripts, encoding schemes like UTF‑8 and UTF‑16, combining marks, canonical equivalence, normalization forms, and grapheme clusters, explaining why the system is complex yet essential for global text handling.

UTF-16 · UTF-8 · Unicode

0 likes · 21 min read

Architecture Digest

Jun 2, 2024 · Fundamentals

Understanding Unicode, UTF-16, and String Length Issues in JavaScript

This article explains why JavaScript string length behaves unexpectedly with Unicode characters, describes UTF‑16 encoding and surrogate pairs, and demonstrates ES6 techniques such as for‑of loops, spread syntax, the u regex flag, codePointAt, and normalize to handle Unicode correctly.
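The length mismatch described in this summary can be illustrated with a minimal sketch. It is not taken from the article; the variable names and the sample characters are assumptions chosen for illustration.

```typescript
// U+1F44D lies outside the Basic Multilingual Plane, so UTF-16 stores it
// as a surrogate pair (two 16-bit code units).
const thumbsUp = "\u{1F44D}";

// .length counts UTF-16 code units, not code points.
const codeUnits: number = thumbsUp.length; // 2

// Array.from uses the string iterator, which walks code points.
// Spread syntax ([...thumbsUp]) and for-of use the same iterator
// when compiling for ES2015 or later.
const codePoints: number = Array.from(thumbsUp).length; // 1

// codePointAt reads the whole code point from the surrogate pair.
const value: number | undefined = thumbsUp.codePointAt(0); // 0x1F44D

// normalize("NFC") composes "e" + combining acute accent into one code point.
const decomposed = "e\u0301";
const composed = decomposed.normalize("NFC");

// With the u flag, "." matches a full code point instead of half a pair.
const matchesWhole: boolean = new RegExp("^.$", "u").test(thumbsUp);
```

Code point counts still differ from what a reader perceives as one character when combining marks or zero-width joiners are involved, so this sketch covers surrogate pairs only.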

JavaScript · Programming Fundamentals · UTF-16

0 likes · 7 min read

Sohu Tech Products

Dec 6, 2023 · Frontend Development

The Nuances of Base64 Encoding Strings in JavaScript

The article explains that JavaScript’s native btoa() and atob() functions only handle characters in the Latin‑1 range (code points up to U+00FF), so to Base64‑encode arbitrary Unicode strings you must first convert them to UTF‑8 bytes (a Uint8Array) with TextEncoder and decode with TextDecoder, while checking for lone surrogates via isWellFormed or encodeURIComponent to avoid silent data loss.
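A rough sketch of the round trip this summary describes follows. The helper names `toBase64` and `fromBase64` are hypothetical and not from the article; the sketch assumes a runtime where `btoa`, `atob`, `TextEncoder`, and `TextDecoder` are available as globals (modern browsers, recent Node.js).

```typescript
// Hypothetical helper: Base64-encode a string by way of its UTF-8 bytes.
function toBase64(text: string): string {
  const bytes: Uint8Array = new TextEncoder().encode(text);
  // btoa expects a "binary string": one character per byte (0-255).
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: reverse the steps above.
function fromBase64(base64: string): string {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder().decode(bytes);
}

const original = "h\u00E9llo \u{1F44D}"; // contains non-ASCII and non-BMP characters
const encoded = toBase64(original);
const decoded = fromBase64(encoded);
```

Note that TextEncoder replaces lone surrogates with U+FFFD, which is the silent data loss the summary warns about; a well-formedness check before encoding would catch that case.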

Base64 · JavaScript · TextDecoder

0 likes · 14 min read

Programmer DD

Aug 16, 2022 · Fundamentals

Why Does JavaScript .length Miscount Emoji? A Deep Dive into UTF‑16 and Unicode

This article explains why JavaScript's string length property returns unexpected values for Unicode characters like emojis, explores UTF‑16 encoding rules, and demonstrates modern ES6 techniques, including for‑of loops, spread syntax, the \u{…} code point escape, and the u regex flag, to correctly handle Unicode strings.

Emoji · JavaScript · UTF-16

0 likes · 9 min read

Tencent Cloud Developer

May 17, 2022 · Fundamentals

A Comprehensive History and Overview of Character Encoding and Unicode

The article traces character encoding from early telegraph and Morse code through ASCII, ISO national sets and Chinese standards, explains Unicode’s unification and its UTF‑8/‑16/‑32 forms, and shows how modern languages—especially JavaScript—handle code points, highlighting the cultural and technical significance for developers.

ASCII · JavaScript · UTF-16

0 likes · 31 min read

Programmer DD

Apr 19, 2022 · Backend Development

Why Java 8 Switched String Storage to byte[] and How It Saves Memory

The article explains how the internal representation of String changed from a char[] to a byte[] to reduce memory consumption (the compact strings change, which shipped in Java 9 as JEP 254), the role of Latin‑1 encoding, the impact on garbage collection, and why UTF‑16 remains the practical choice for Java strings.

Java · Memory optimization · String

0 likes · 8 min read

Open Source Linux

Jul 2, 2021 · Fundamentals

Why Unicode Matters: Understanding UTF‑8, UTF‑16, and UTF‑32 Encoding

This article explains the history and purpose of Unicode, describes how character sets differ from encodings, details the storage formats of UTF‑8, UTF‑16, and UTF‑32, discusses byte order and BOM, and shows common encoding pitfalls in Redis and MySQL with practical solutions.

MySQL · UTF-16 · UTF-32

0 likes · 15 min read

Liangxu Linux

Jun 19, 2021 · Fundamentals

Why Unicode Matters: Understanding UTF‑8, UTF‑16, and UTF‑32 Encoding

This article traces the evolution from ASCII to Unicode, explains how Unicode defines universal code points, compares the UTF‑8, UTF‑16 and UTF‑32 encoding schemes, discusses byte order and BOM, and shows practical fixes for common encoding problems in Redis and MySQL.

MySQL · Redis · UTF-16

0 likes · 15 min read

ELab Team

Mar 31, 2021 · Fundamentals

Why Do Emoji Lengths Differ in JavaScript? Understanding Unicode, UTF‑8 & UTF‑16

This article explains why strings containing emojis report different lengths in JavaScript, covering Unicode fundamentals, code points, UTF‑8 and UTF‑16 encodings, surrogate pairs, grapheme clusters, zero‑width joiners, and modern ES2015‑ESNext features that help handle Unicode correctly.

Emoji · JavaScript · UTF-16

0 likes · 15 min read

Programmer DD

Jul 22, 2020 · Fundamentals

Why Java’s char Can’t Represent All Unicode Characters – Understanding UTF‑16 and Code Points

This article explains how Java’s char type stores UTF‑16 code units, why its range of \u0000 to \uffff prevents a single char from representing characters outside the Basic Multilingual Plane, and how methods like String.length, getBytes, and the code‑point APIs help handle supplementary characters such as emojis and rare Chinese glyphs.

String · UTF-16 · Unicode

0 likes · 10 min read

360 Tech Engineering

Apr 22, 2020 · Fundamentals

Understanding Unicode Encoding and Implementing Emoji Detection in Java

This article explains Unicode's structure, encoding ranges, UTF-8/16/32 representations, byte order considerations, and provides Java code to detect emojis in strings, illustrating practical usage of Unicode concepts for text processing.

Emoji · Java · UTF-16

0 likes · 14 min read

Huajiao Technology

Apr 21, 2020 · Fundamentals

Understanding Unicode Encoding (UTF-8, UTF-16, UTF-32) and Emoji Detection in Java

This article explains the Unicode standard, its code planes and ranges, the three UTF encoding forms (UTF-8, UTF-16, UTF-32), compares their storage characteristics, discusses byte order marks, and provides Java code for detecting emoji characters in strings.

Emoji · Java · UTF-16

0 likes · 11 min read

Senior Brother's Insights

Jan 10, 2020 · Fundamentals

Why Java’s char Can’t Represent All Unicode Characters – Code Units vs. Code Points

This article explains how Java stores characters as UTF‑16 code units, why the char type cannot cover the entire Unicode range, how surrogate pairs work, and demonstrates the differences in length, byte length, and char array size for regular Chinese characters, emojis, and rare Chinese glyphs.

Code Point · Java · Surrogate Pair

0 likes · 9 min read

JD Tech

Dec 18, 2018 · Fundamentals

Understanding Character Encoding: Bits, Bytes, Unicode, UTF-8, UTF-16, and UTF-32

This article explains the origins of character sets, the relationships among various encodings such as ASCII, GB2312, GBK, GB18030, Unicode, UTF-8, UTF-16, and UTF-32, and shows how JavaScript handles Unicode and emoji characters, including practical code examples and solutions for length‑limited input fields.

UTF-16 · UTF-8 · Unicode

0 likes · 11 min read

Tencent Music Tech Team

Feb 9, 2018 · Mobile Development

Understanding String Encoding in Android JNI: From Native Crash to Source Code Analysis

This article investigates an Android JNI native crash caused by misusing NewString(), examines why a custom UTF‑8‑to‑UTF‑16 conversion was used instead of NewStringUTF(), compares Dalvik and ART string encodings, reveals a Dalvik UTF‑8 conversion bug fixed in ART, and advises developers on encoding nuances across Android versions.

ART · Android · Dalvik

0 likes · 26 min read

WeChatFE

Oct 9, 2016 · Fundamentals

Why encodeURIComponent Throws “URI malformed” and How Unicode Encoding Works

This article explains why encodeURIComponent can raise a URI malformed error, clarifies the concepts of high and low surrogate pairs, and provides a comprehensive overview of character sets, Unicode, and the UTF‑8 and UTF‑16 encodings used in JavaScript.
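The error this summary refers to can be reproduced with a short sketch. It is an illustration only, not the article's own code; the helper name `isWellFormedUtf16` and the sample strings are assumptions.

```typescript
// Hypothetical helper: true when every surrogate in the string belongs to a
// valid high + low pair. It relies on encodeURIComponent throwing a URIError
// for lone surrogates, which cannot be converted to UTF-8.
function isWellFormedUtf16(text: string): boolean {
  try {
    encodeURIComponent(text);
    return true;
  } catch (error) {
    if (error instanceof URIError) {
      return false;
    }
    throw error;
  }
}

const paired = "\uD83D\uDC4D"; // high + low surrogate: U+1F44D
const loneHigh = "\uD83D";     // high surrogate with no partner
const loneLow = "\uDC4D";      // low surrogate with no partner

// A valid pair is percent-encoded as the four UTF-8 bytes of U+1F44D.
const encodedPair = encodeURIComponent(paired); // "%F0%9F%91%8D"

const pairedOk = isWellFormedUtf16(paired);     // true
const loneHighOk = isWellFormedUtf16(loneHigh); // false
const loneLowOk = isWellFormedUtf16(loneLow);   // false
```

A common way to end up with a lone surrogate is truncating a string by code unit index, for example with `slice`, in the middle of a pair.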

JavaScript · UTF-16 · UTF-8

0 likes · 12 min read
