Tag

Unicode


Full-Stack Internet Architecture
May 10, 2025 · Fundamentals

Strange Behaviors in Java: Integer Caching, String Comparison, Unary Plus, and Unicode Tricks

This article explains several counter‑intuitive Java behaviors—including integer caching with configurable range, string literal pool versus new objects, the unary plus being a sign rather than an operator, and Unicode escape processing that can make commented code execute—providing code examples and practical insights.

Java · Unicode · integer caching
0 likes · 5 min read
Lobster Programming
Feb 27, 2025 · Fundamentals

Why Garbled Characters Appear: Exploring ASCII, GB2312, GBK & Unicode

This article explains how character encoding works—from ASCII and its extensions to Chinese GB2312 and GBK, through Unicode's UCS‑2, UCS‑4, and the versatile UTF‑8—showing why mismatched encodings produce garbled text and why UTF‑8 is the default in Spring Boot.

ASCII · GB2312 · GBK
0 likes · 9 min read
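The mismatch described in the summary above can be reproduced in a few lines. This is a minimal Python sketch, not code from the article: the sample text and the helper name `decode_with` are assumptions chosen for illustration.

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
text = "中文"


def decode_with(data: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD for sequences invalid in `encoding`."""
    return data.decode(encoding, errors="replace")


gbk_bytes = text.encode("gbk")      # GBK uses 2 bytes for each of these characters
utf8_bytes = text.encode("utf-8")   # UTF-8 uses 3 bytes for each of them

right = decode_with(gbk_bytes, "gbk")     # same encoding on both sides
wrong = decode_with(gbk_bytes, "utf-8")   # GBK bytes misread as UTF-8

print(len(gbk_bytes), len(utf8_bytes))
print(right == text, wrong == text)
```

The bytes never change; only the decoder's assumption does, which is why agreeing on one encoding at both ends removes the garbling.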
Code Mala Tang
Aug 16, 2024 · Fundamentals

Why Emoji Turn into Question Marks? Master Unicode Encoding and Fix Socket Transmission

This article explains why emojis become garbled when transmitted via sockets, explores Unicode encoding fundamentals—including UTF‑8, BMP and high‑code‑point characters—and provides practical solutions using codePointAt, TextEncoder, and TextDecoder to ensure correct emoji handling.

Socket · TextDecoder · TextEncoder
0 likes · 11 min read
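The article works with JavaScript's TextEncoder and TextDecoder. As a rough Python analogue (an assumption for illustration, not the article's code), the sketch below shows one way a multi-byte emoji can break when a stream splits it across reads, and how an incremental decoder avoids that. The helper name `decode_chunks` and the chunk boundary are hypothetical.

```python
import codecs


def decode_chunks(chunks):
    """Decode UTF-8 byte chunks that may split a character across boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if the stream ended in the middle of a character
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


message = "hi 😀"
payload = message.encode("utf-8")   # 3 ASCII bytes + 4 bytes for the emoji

# Simulate a socket that delivers the emoji's 4 bytes in two separate reads
chunks = [payload[:5], payload[5:]]

try:
    chunks[0].decode("utf-8")       # decoding each read on its own fails
except UnicodeDecodeError as exc:
    print("naive decode failed:", exc.reason)

print(decode_chunks(chunks) == message)
```

The incremental decoder buffers the incomplete trailing bytes until the rest arrives, so characters outside the BMP survive arbitrary chunk boundaries.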
Java Tech Enthusiast
Jul 27, 2024 · Fundamentals

The Story Behind the Creation of UTF-8 and Its Advantages

Rob Pike and Ken Thompson devised UTF‑8 in 1992 at Bell Labs, turning a three‑day prototype into the web’s dominant Unicode encoding by using a variable‑length, ASCII‑compatible, length‑prefixed and prefix‑free scheme that maximizes efficiency, robustness, and universal adoption across more than 96 % of sites.

Computer Science · History · UTF-8
0 likes · 6 min read
macrozheng
Jul 19, 2024 · Backend Development

Master Java Obfuscation: 5 Crazy Tricks to Write Unreadable Code

This article reveals five advanced Java tricks—using Unicode escapes in comments, over‑complicating simple logic with bitwise shifts, tampering with Boolean.TRUE via reflection, forcing both branches of an if‑else to run, and leveraging the Unsafe class for low‑level memory manipulation—to deliberately make code hard to understand.

Java · Unicode · Unsafe
0 likes · 12 min read
Architecture Digest
Jun 2, 2024 · Fundamentals

Understanding Unicode, UTF-16, and String Length Issues in JavaScript

This article explains why JavaScript string length behaves unexpectedly with Unicode characters, describes UTF‑16 encoding and surrogate pairs, and demonstrates ES6 techniques such as for‑of loops, spread syntax, the u regex flag, codePointAt, and normalize to handle Unicode correctly.

ES6 · JavaScript · UTF-16
0 likes · 7 min read
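JavaScript's `length` counts UTF-16 code units, not code points. The Python sketch below is an illustrative assumption rather than the article's ES6 code: it reproduces that count by encoding to UTF-16 and also shows the effect of normalization. The helper names `utf16_length` and `code_points` are hypothetical.

```python
import unicodedata


def utf16_length(s: str) -> int:
    """Number of UTF-16 code units, which is what JavaScript's length reports."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


emoji = "😀"                        # U+1F600, outside the Basic Multilingual Plane
print(len(emoji))                   # Python counts code points
print(utf16_length(emoji))          # UTF-16 needs a surrogate pair
print(code_points(emoji))

# Normalization: "e" + combining acute accent composes to a single code point
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

Iterating by code point (for-of or spread syntax in ES6) gives the first count, while indexing by code unit gives the second.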
Java Tech Enthusiast
Apr 21, 2024 · Fundamentals

Decoding Binary UTF-8 Signage in a Public Restroom Using Java

The article explains how a binary message on a multilingual public‑restroom sign was decoded by identifying UTF‑8 byte patterns, extracting the first 24 bits to reveal the Chinese character “向”, and providing a Java program that parses the entire bit string into readable Chinese text.

Binary Encoding · Java · UTF-8
0 likes · 4 min read
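The article's program is written in Java. The same idea can be sketched in Python under stated assumptions: the 24-bit string below is simply the UTF-8 encoding of "向" (bytes E5 90 91), and the function names `sequence_length` and `decode_bits` are hypothetical.

```python
def sequence_length(first_byte: int) -> int:
    """Length of a UTF-8 sequence, read from the high bits of its first byte."""
    if first_byte >> 7 == 0b0:
        return 1                    # 0xxxxxxx: ASCII
    if first_byte >> 5 == 0b110:
        return 2                    # 110xxxxx
    if first_byte >> 4 == 0b1110:
        return 3                    # 1110xxxx
    if first_byte >> 3 == 0b11110:
        return 4                    # 11110xxx
    raise ValueError("not a valid UTF-8 lead byte")


def decode_bits(bits: str) -> str:
    """Turn a string of 0s and 1s (whitespace ignored) into UTF-8 text."""
    bits = "".join(bits.split())
    if len(bits) % 8 != 0:
        raise ValueError("bit string must be a whole number of bytes")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


first_24_bits = "11100101 10010000 10010001"
print(sequence_length(0b11100101))  # the lead byte announces a 3-byte sequence
print(decode_bits(first_24_bits))
```

The lead byte `1110xxxx` signals a three-byte sequence, which is why the first 24 bits are enough to recover one Chinese character.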
Aikesheng Open Source Community
Apr 18, 2024 · Databases

MySQL ‘Disappearing Table’ Issue Caused by Zero‑Width Characters

This article explains how an invisible zero‑width Unicode character embedded in a MySQL table name can make the table appear to disappear, demonstrates the problem with reproducible examples, analyzes the root cause, and provides practical steps to detect and fix such issues.

Database Troubleshooting · MySQL · SQL
0 likes · 5 min read
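One way to detect such characters outside the database is to scan an identifier for invisible code points. This is a hedged Python sketch and not the article's procedure: the function name `find_invisible` and the table name `orders` are hypothetical, and the check covers only Unicode format characters (category Cf) and non-ASCII spaces (category Zs).

```python
import unicodedata


def find_invisible(name: str) -> list:
    """Return (index, code point, Unicode name) for hard-to-see characters."""
    hits = []
    for index, ch in enumerate(name):
        category = unicodedata.category(ch)
        # Cf: format characters such as U+200B; Zs: spaces other than U+0020
        if category == "Cf" or (category == "Zs" and ch != " "):
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


clean = "orders"
tainted = "orders\u200b"            # looks identical when printed

print(clean == tainted)
print(find_invisible(tainted))
```

Two names that print identically but compare unequal are a strong hint that a check like this is worth running.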
Top Architecture Tech Stack
Feb 23, 2024 · Fundamentals

Understanding Character Encoding: ASCII, GB2312, Unicode, and UTF-8

This article explains the history, purpose, and differences of major character encodings—including ASCII, GB2312, Unicode, and UTF-8—while showing how they are used and converted in modern computing environments.

ASCII · GB2312 · UTF-8
0 likes · 11 min read
Aikesheng Open Source Community
Dec 11, 2023 · Databases

Understanding utf8mb4 and Its Advantages in MySQL 8.0

This article explains the differences between utf8, utf8mb3 and utf8mb4 character sets in MySQL, demonstrates how utf8mb4 enables full Unicode support including emojis, and provides step‑by‑step SQL examples for creating tables, inserting data, and querying results with the proper character set.

Database · MySQL · Unicode
0 likes · 12 min read
Sohu Tech Products
Dec 6, 2023 · Frontend Development

The Nuances of Base64 Encoding Strings in JavaScript

The article explains that JavaScript’s native btoa() and atob() functions only handle strings in which every character fits in a single byte (code points up to U+00FF). To base64‑encode arbitrary Unicode strings correctly, you must first convert them to UTF‑8 bytes (a Uint8Array) with TextEncoder and decode the result with TextDecoder. It also recommends checking for lone surrogates via isWellFormed or encodeURIComponent to avoid silent data loss.

Base64 · JavaScript · TextDecoder
0 likes · 14 min read
Aikesheng Open Source Community
Nov 1, 2023 · Databases

Troubleshooting MySQL Configuration Error Caused by Hidden Unicode Space

This article explains how a hidden non‑breaking space character copied from documentation caused the MySQL lower_case_table_names parameter to be unrecognizable, details the reproduction steps, and presents three methods (hexdump, od, editor) to detect and resolve such encoding issues.

MySQL · Troubleshooting · Unicode
0 likes · 7 min read
Test Development Learning Exchange
Sep 23, 2023 · Fundamentals

Understanding Python String Prefixes: u, r, b, and f

Python string prefixes such as u (Unicode), r (raw), b (bytes), and f (formatted) indicate special string types, with each prefix altering how the string is interpreted, and the article explains their meanings, usage, and provides code examples for each.

Formatted Strings · Raw Strings · String Prefixes
0 likes · 3 min read
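A short sketch of the four prefixes in Python 3 follows. The sample values are assumptions chosen for illustration, not the article's own examples.

```python
name = "world"

# r: raw string, backslashes are kept literally
raw = r"C:\new\table"
# no prefix: \n and \t are interpreted as newline and tab
normal = "C:\new\table"

# b: bytes literal, a sequence of integers rather than text
data = b"abc"

# f: formatted string, expressions in braces are evaluated
greeting = f"Hello, {name}!"

# u: accepted in Python 3 for compatibility; identical to a plain str
legacy = u"text"

print(len(raw), len(normal))
print(data[0], type(data).__name__)
print(greeting)
print(legacy == "text", type(legacy).__name__)
```

In Python 3 every `str` is already Unicode, so the `u` prefix changes nothing; it mainly matters when reading code that also had to run on Python 2.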
360 Tech Engineering
Jul 18, 2023 · Fundamentals

Understanding Characters, Character Sets, and Encoding: From ASCII to Unicode

This article explains the concepts of characters, character sets, and character encoding, describes how computers store and render text using methods like ASCII, GB2312, Unicode, and UTF‑8/16/32, and discusses why garbled text occurs across different languages and systems.

ASCII · UTF-8 · Unicode
0 likes · 10 min read
Sohu Tech Products
Jul 12, 2023 · Fundamentals

The Mystery of Character Encoding: Unicode, UTF‑8, UTF‑16, GBK and Emoji

This article explains the fundamentals of character encoding, covering Unicode’s universal character set, the structure of its planes and surrogate areas, the variable‑length UTF‑8 and UTF‑16 encodings, Chinese‑specific GBK encoding, and practical iOS code examples for handling Unicode, emojis and regular‑expression based Chinese character detection.

GBK · UTF-8 · Unicode
0 likes · 12 min read
php中文网 Courses
Sep 26, 2022 · Backend Development

Understanding PHP's JSON_ERROR_UTF16: Unicode Decoding Issues and How to Resolve Them

This article explains why PHP 7.0's json_decode fails and reports JSON_ERROR_UTF16 (via json_last_error) when it encounters malformed Unicode surrogate pairs, demonstrates the problem with example code, and details the underlying re2c scanner logic that causes the error, offering insight for backend developers.

JSON · Unicode · decoding
0 likes · 3 min read
Xianyu Technology
Aug 24, 2022 · Frontend Development

Implementing Length-Limited Input with Unicode and Emoji Support in JavaScript

The article explains how to enforce a custom length limit in JavaScript input fields that counts English letters and numbers as half a unit, Chinese characters as one unit, and each emoji as one unit, using Unicode code‑point detection, regex extraction, and automatic truncation to prevent overflow.

Front-end · JavaScript · Length Calculation
0 likes · 10 min read
IT Services Circle
Jun 27, 2022 · Fundamentals

Understanding the Cyrillic Variable Name е vs Latin e in Python

This article explains how the Cyrillic character е looks identical to the Latin e, why using it as a Python variable leads to NameError, demonstrates the Unicode code point differences, and warns about the potential bugs when unintentionally mixing these characters in code.

Cyrillic · Python · Unicode
0 likes · 3 min read
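The difference between the two letters is easy to make visible. In this minimal sketch the helper name `describe` and the use of `exec` with a scratch namespace are assumptions for illustration, not the article's code.

```python
import unicodedata

latin_e = "e"
cyrillic_e = "\u0435"               # renders like Latin "e" in most fonts


def describe(ch: str) -> str:
    """Show a character's code point and official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(latin_e))
print(describe(cyrillic_e))
print(latin_e == cyrillic_e)

# Assigning to the Cyrillic name does not define the Latin one,
# which is why a later reference to Latin "e" raises NameError.
namespace = {}
exec("\u0435 = 1", namespace)
print("e" in namespace, "\u0435" in namespace)
```

Printing code points this way is a quick check whenever an identifier that looks correct is reported as undefined.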
Tencent Cloud Developer
May 17, 2022 · Fundamentals

A Comprehensive History and Overview of Character Encoding and Unicode

The article traces character encoding from early telegraph and Morse code through ASCII, ISO national sets and Chinese standards, explains Unicode’s unification and its UTF‑8/‑16/‑32 forms, and shows how modern languages—especially JavaScript—handle code points, highlighting the cultural and technical significance for developers.

ASCII · History · JavaScript
0 likes · 31 min read