Tagged articles
102 articles
Page 1 of 2
37 Interactive Technology Team
Jan 14, 2026 · Frontend Development

Hidden Zero‑Width Characters: How They Sabotage Front‑End Apps and How to Detect Them

Zero‑width characters are invisible Unicode symbols that can silently break form validation, URL parsing, and data storage in web applications, but with proper detection, visualization, and input‑filtering techniques developers can mitigate these hidden risks and even use them for legitimate purposes.

Unicode, data integrity, debugging
0 likes · 6 min read
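
The detection, visualization, and filtering steps named in this summary can be sketched roughly as follows. This is a minimal illustration in Python, not the article's own code; the set of code points and the helper names (`find_zero_width`, `visualize`, `strip_zero_width`) are assumptions made for the example.

```python
import re

# An illustrative, non-exhaustive set of zero-width code points (assumed for this sketch).
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}
ZERO_WIDTH_RE = re.compile("[" + "".join(ZERO_WIDTH) + "]")


def find_zero_width(text):
    """Return (index, name) pairs for every zero-width character in text."""
    return [(m.start(), ZERO_WIDTH[m.group()]) for m in ZERO_WIDTH_RE.finditer(text)]


def visualize(text):
    """Replace each zero-width character with a visible <U+XXXX> marker."""
    return ZERO_WIDTH_RE.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)


def strip_zero_width(text):
    """Remove every zero-width character from text."""
    return ZERO_WIDTH_RE.sub("", text)


sample = "user\u200bname@example.com"
print(find_zero_width(sample))
print(visualize(sample))
print(strip_zero_width(sample))
```

Blanket stripping is probably only appropriate for fields such as e-mail addresses or identifiers: U+200D also joins emoji sequences, so removing it from free text would split those emojis apart.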
ITPUB
Dec 2, 2025 · Frontend Development

Why Do Some Emojis Count as Multiple Characters in JavaScript?

When debugging a web app, the author discovered that certain emojis occupy more than one character slot in JavaScript strings, revealing that emoji length varies because they are composed of multiple Unicode code points such as variation selectors and zero‑width joiners.

Emoji, JavaScript, Unicode
0 likes · 4 min read
ITPUB
Nov 3, 2025 · Databases

Why MySQL’s “utf8” Isn’t Real UTF‑8 and How utf8mb4 Fixes It

Although MySQL historically labeled its three‑byte character set as “utf8”, it actually implements a truncated version (utf8mb3) that cannot store the full Unicode range, leading to bugs with emojis and rare characters; the newer utf8mb4 restores true UTF‑8 support and is now the default in MySQL 8.0.

Character Set, Unicode, mysql
0 likes · 7 min read
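
The three-byte limit can be checked from application code before data reaches the database. The sketch below is an illustration in Python, with `fits_utf8mb3` as a hypothetical helper name; it relies only on the fact that UTF-8 needs four bytes exactly for code points above U+FFFF.

```python
def fits_utf8mb3(text):
    """True if every code point needs at most 3 bytes in UTF-8 (lies in the BMP)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


for s in ["hello", "中文", "😀"]:
    # Byte width of each character when encoded as UTF-8.
    widths = [len(ch.encode("utf-8")) for ch in s]
    print(repr(s), widths, fits_utf8mb3(s))
```

A string for which `fits_utf8mb3` returns False is the kind of value that a legacy "utf8" column cannot store and that utf8mb4 can.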
Java Tech Enthusiast
Jun 19, 2025 · Fundamentals

Why Apple Emojis Look Different on Android—and What It Means for Users

Apple's exclusive emojis often appear as unrelated symbols on Android devices, leading to misunderstandings; this article explains the technical reasons behind the display differences, shares real‑world examples, and traces the history and global adoption of emojis from their Japanese origins to today’s Unicode standards.

Emoji, Unicode, User experience
0 likes · 4 min read
Open Source Linux
Apr 30, 2025 · Information Security

Why Linus Torvalds Calls Case‑Insensitive Filesystems a Massive Mistake

Linus Torvalds sharply criticized the case‑insensitive feature in file systems as a huge mistake, warning that it introduces serious security vulnerabilities by allowing mismatched filenames and Unicode characters to be treated as equivalent, undermining user‑space security checks and exposing systems to attacks.

Case Insensitivity, Unicode, file system
0 likes · 2 min read
Full-Stack Cultivation Path
Feb 10, 2025 · Frontend Development

Mastering Emoji in Front‑End Development

This article explains the Unicode foundation of Emoji, shows how to insert and style them in HTML, CSS and JavaScript, discusses common pitfalls with surrogate pairs and string slicing, and presents modern solutions such as Intl.Segmenter and libraries like grapheme‑splitter and emoji‑regex for reliable handling.

CSS, Emoji, Intl.Segmenter
0 likes · 14 min read
Alibaba Cloud Developer
Dec 23, 2024 · Databases

Why MySQL Strings Get Garbled: Mastering Charset and Collation

This article dives deep into MySQL's charset and collation system, explaining concepts, configuration levels, system variables, string literals, conversion rules, Unicode sorting algorithms, binary collations, and practical tips to avoid common encoding pitfalls and ensure correct string handling.

Charset, Unicode, collation
0 likes · 57 min read
21CTO
Oct 7, 2024 · Frontend Development

What’s New in ECMAScript 2024? Exploring Six Major TC39 Proposals

This article reviews the six key ECMAScript 2024 proposals—including well‑formed Unicode strings, asynchronous atomic wait, the new RegExp v flag, ArrayBuffer transfer, array grouping, and Promise.withResolvers—explaining their purpose, API changes, and providing runnable code examples.

ArrayBuffer, ECMAScript, JavaScript
0 likes · 9 min read
21CTO
Jul 30, 2024 · Frontend Development

What’s New in ECMAScript 2024? Key Features and Their Impact on JavaScript Development

The article reviews ECMAScript 2024, highlighting new small‑scale features such as improved WebAssembly interop, enhanced Promise utilities, group‑by methods, better Unicode handling, async locking with Atomics.waitAsync, and resizable ArrayBuffers, while also discussing upcoming proposals for 2025.

Async, ECMAScript 2024, JavaScript
0 likes · 17 min read
Java Tech Enthusiast
Jul 27, 2024 · Fundamentals

The Story Behind the Creation of UTF-8 and Its Advantages

Rob Pike and Ken Thompson devised UTF‑8 in 1992 at Bell Labs, turning a three‑day prototype into the web’s dominant Unicode encoding by using a variable‑length, ASCII‑compatible, length‑prefixed and prefix‑free scheme that maximizes efficiency, robustness, and universal adoption across more than 96 % of sites.

UTF-8, Unicode, encoding
0 likes · 6 min read
macrozheng
Jul 19, 2024 · Backend Development

Master Java Obfuscation: 5 Crazy Tricks to Write Unreadable Code

This article reveals five advanced Java tricks—using Unicode escapes in comments, over‑complicating simple logic with bitwise shifts, tampering with Boolean.TRUE via reflection, forcing both branches of an if‑else to run, and leveraging the Unsafe class for low‑level memory manipulation—to deliberately make code hard to understand.

Reflection, Unicode, bitwise
0 likes · 12 min read
Liangxu Linux
Jul 14, 2024 · Fundamentals

Decoding Chinese Text: ASCII, GB2312, GBK, GB18030, and UTF‑8 Explained

This article explains how computer text is represented by assigning unique numeric codes to characters and converting those codes into binary, then compares the most common Chinese encodings—ASCII, GB2312, GBK, GB18030, and UTF‑8—detailing their compatibility, byte lengths, and practical impact on software development.

ASCII, GB18030, GB2312
0 likes · 14 min read
Architecture Digest
Jun 2, 2024 · Fundamentals

Understanding Unicode, UTF-16, and String Length Issues in JavaScript

This article explains why JavaScript string length behaves unexpectedly with Unicode characters, describes UTF‑16 encoding and surrogate pairs, and demonstrates ES6 techniques such as for‑of loops, spread syntax, the u regex flag, codePointAt, and normalize to handle Unicode correctly.

JavaScript, UTF-16, Unicode
0 likes · 7 min read
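
The gap between code points and UTF-16 code units can be reproduced outside JavaScript. The following is a rough Python sketch, not the article's ES6 code; `utf16_length` and `surrogate_pair` are hypothetical helper names, and the surrogate arithmetic follows the standard UTF-16 rule for code points above U+FFFF.

```python
def utf16_length(text):
    """Number of UTF-16 code units, which is what JavaScript's .length reports."""
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(code_point):
    """Split a supplementary code point (> 0xFFFF) into high and low surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


s = "a😀"
print(len(s))           # counts code points
print(utf16_length(s))  # counts UTF-16 code units
print([hex(unit) for unit in surrogate_pair(ord("😀"))])
```

Python's `len` counts code points, which is roughly what spreading a string or iterating it with for-of yields in JavaScript, while `utf16_length` mirrors `.length`.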
Java Tech Enthusiast
Apr 21, 2024 · Fundamentals

Decoding Binary UTF-8 Signage in a Public Restroom Using Java

The article explains how a binary message on a multilingual public‑restroom sign was decoded by identifying UTF‑8 byte patterns, extracting the first 24 bits to reveal the Chinese character “向”, and providing a Java program that parses the entire bit string into readable Chinese text.

UTF-8, Unicode, binary encoding
0 likes · 4 min read
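
The decoding step can be sketched in a few lines. The article's own program is in Java; this is an equivalent illustration in Python with a hypothetical `decode_bits` helper. The 24 bits used below are simply the UTF-8 encoding of “向” (U+5411), not a transcription of the sign.

```python
def decode_bits(bits):
    """Decode a string of '0'/'1' characters (whitespace ignored) as UTF-8 text."""
    bits = "".join(bits.split())
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    # Convert each group of 8 bits into one byte.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


# A leading 1110xxxx byte announces a 3-byte sequence;
# each following 10xxxxxx byte is a continuation byte.
print(decode_bits("11100101 10010000 10010001"))
```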
Aikesheng Open Source Community
Dec 11, 2023 · Databases

Understanding utf8mb4 and Its Advantages in MySQL 8.0

This article explains the differences between utf8, utf8mb3 and utf8mb4 character sets in MySQL, demonstrates how utf8mb4 enables full Unicode support including emojis, and provides step‑by‑step SQL examples for creating tables, inserting data, and querying results with the proper character set.

Emoji, Unicode, mysql
0 likes · 12 min read
Sohu Tech Products
Dec 6, 2023 · Frontend Development

The Nuances of Base64 Encoding Strings in JavaScript

The article explains that JavaScript’s native btoa() and atob() functions only handle strings whose characters each fit in a single byte (code points up to U+00FF), so to base64‑encode arbitrary Unicode strings correctly you must first convert them to UTF‑8 bytes (a Uint8Array) with TextEncoder and decode the result with TextDecoder, while checking for lone surrogates via isWellFormed or encodeURIComponent to avoid silent data loss.

Base64, JavaScript, TextDecoder
0 likes · 14 min read
Programmer DD
Nov 23, 2023 · Fundamentals

Explore Java 21’s New Emoji Detection APIs

Java 21 introduces six static methods in java.lang.Character that let developers reliably detect emojis, emoji presentations, modifiers, modifier bases, components, and extended pictographic characters using Unicode code points, with practical code examples and regex usage.

Emoji, Unicode
0 likes · 4 min read
Liangxu Linux
Jul 17, 2023 · Fundamentals

Mastering Character Encodings: From ANSI to UTF‑8 and Beyond

This guide explains the essential character set encodings—ANSI, ASCII, GB2312/GBK/GB18030, Unicode planes, UTF‑16, UTF‑32, and UTF‑8—and shows how they are used in MFC and Qt, providing code examples to avoid garbled text in software.

ANSI, MFC, Qt
0 likes · 8 min read
Sohu Tech Products
Jul 12, 2023 · Fundamentals

The Mystery of Character Encoding: Unicode, UTF‑8, UTF‑16, GBK and Emoji

This article explains the fundamentals of character encoding, covering Unicode’s universal character set, the structure of its planes and surrogate areas, the variable‑length UTF‑8 and UTF‑16 encodings, Chinese‑specific GBK encoding, and practical iOS code examples for handling Unicode, emojis and regular‑expression based Chinese character detection.

Emoji, GBK, UTF-8
0 likes · 12 min read
Alipay Experience Technology
Apr 23, 2023 · Fundamentals

Decoding Emoji: Unicode, Variants, and JavaScript Handling

This article explains how emojis are represented in Unicode, covering basic code points, variation selectors, skin‑tone modifiers, zero‑width joiners, flag ligatures, tag sequences and keycap symbols, and shows how JavaScript can correctly process them using grapheme‑cluster techniques.

Emoji, Unicode, character encoding
0 likes · 16 min read
Alibaba Terminal Technology
Sep 28, 2022 · Fundamentals

What’s New in TC39? Exploring Array.fromAsync, Unicode Validation, and Extractors

The article reviews the latest TC39 meeting outcomes, explains the criteria for advancing proposals to higher stages, and dives into three key proposals—Array.fromAsync for async iteration, String.prototype.isWellFormed for Unicode scalar validation, and extractors for pattern matching—complete with code examples and usage notes.

Array.fromAsync, Extractors, TC39
0 likes · 11 min read
Xianyu Technology
Aug 24, 2022 · Frontend Development

Implementing Length-Limited Input with Unicode and Emoji Support in JavaScript

The article explains how to enforce a custom length limit in JavaScript input fields that counts English letters and numbers as half a unit, Chinese characters as one unit, and each emoji as one unit, using Unicode code‑point detection, regex extraction, and automatic truncation to prevent overflow.

Emoji, Front-end, JavaScript
0 likes · 10 min read
Programmer DD
Aug 16, 2022 · Fundamentals

Why Does JavaScript .length Miscount Emoji? A Deep Dive into UTF‑16 and Unicode

This article explains why JavaScript's string length property returns unexpected values for Unicode characters like emojis, explores UTF‑16 encoding rules, and demonstrates modern ES6 techniques for handling Unicode strings correctly: for‑of loops, spread syntax, the \u{…} code point escape, and the u regex flag.

Emoji, JavaScript, UTF-16
0 likes · 9 min read
IT Services Circle
Jun 27, 2022 · Fundamentals

Understanding the Cyrillic Variable Name е vs Latin e in Python

This article explains how the Cyrillic character е looks identical to the Latin e, why using it as a Python variable leads to NameError, demonstrates the Unicode code point differences, and warns about the potential bugs when unintentionally mixing these characters in code.

Cyrillic, Python, Unicode
0 likes · 3 min read
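
The code point difference between the two look-alike letters is easy to inspect with the standard `unicodedata` module. This is a small sketch rather than the article's own listing; `non_ascii_chars` is a hypothetical helper for spotting such characters in an identifier.

```python
import unicodedata

latin = "e"          # U+0065
cyrillic = "\u0435"  # U+0435, renders like the Latin letter in most fonts

for ch in (latin, cyrillic):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))


def non_ascii_chars(identifier):
    """List (index, code point, name) for each non-ASCII character in an identifier."""
    return [
        (i, f"U+{ord(c):04X}", unicodedata.name(c, "UNKNOWN"))
        for i, c in enumerate(identifier)
        if ord(c) > 127
    ]


# The last letter of this name is the Cyrillic one.
print(non_ascii_chars("valu\u0435"))
```

Because the two strings compare unequal, a variable defined with one spelling and read with the other is a different name, which is where the NameError comes from.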
Tencent Cloud Developer
May 17, 2022 · Fundamentals

A Comprehensive History and Overview of Character Encoding and Unicode

The article traces character encoding from early telegraph and Morse code through ASCII, ISO national sets and Chinese standards, explains Unicode’s unification and its UTF‑8/‑16/‑32 forms, and shows how modern languages—especially JavaScript—handle code points, highlighting the cultural and technical significance for developers.

ASCII, JavaScript, UTF-16
0 likes · 31 min read
Liangxu Linux
May 13, 2021 · Fundamentals

Why Does Text Become Garbled? A Deep Dive into UTF‑8, GBK, and Unicode

This article explains why characters appear as garbled text when encoding and decoding methods mismatch, explores how Excel defaults to GBK, shows how to convert files with iconv, and walks through the evolution from ASCII to GB2312, GBK, GB18030, and finally Unicode's UTF‑8 encoding.

Character Set, GBK, UTF-8
0 likes · 6 min read
Python Programming Learning Circle
Apr 22, 2021 · Fundamentals

Common Built‑in String Methods in Python

This article introduces Python’s built‑in string class and explains ten commonly used string methods—including center, count, find, swapcase, startswith/endswith, split, case conversions, justification, strip, and zfill—detailing their syntax, parameters, and example usages while emphasizing that they return new strings without altering the original.

Python, String Methods, Unicode
0 likes · 9 min read
ByteFE
Feb 10, 2021 · Frontend Development

Handling Unicode and Supplementary Characters in JavaScript

This article explains how JavaScript processes Unicode characters, demonstrates the limitations of legacy APIs like charCodeAt and fromCharCode with supplementary characters, and introduces modern methods such as codePointAt, fromCodePoint, Unicode escape syntax, surrogate pairs, and polyfills for full Unicode support.

JavaScript, Surrogate Pair, UTF-8
0 likes · 10 min read
macrozheng
Feb 8, 2021 · Fundamentals

Why Do You See “锟斤拷” in Text? Uncover the Encoding Mystery

This article explains how character encoding works, using ASCII, Unicode, UTF‑8 and GBK examples to reveal why the garbled string “锟斤拷” appears when mismatched encodings are processed, and shows the underlying byte‑level transformations.

ASCII, GBK, UTF-8
0 likes · 4 min read
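
One commonly described route to “锟斤拷” can be reproduced byte by byte. The sketch below assumes that route (undecodable bytes replaced by U+FFFD, re-saved as UTF-8, then read as GBK); the summary does not spell out the steps, so treat the sequence as an assumption rather than the article's exact walkthrough.

```python
# Step 1: bytes that are not valid UTF-8 become U+FFFD when decoded leniently.
broken = b"\xff\xfe".decode("utf-8", errors="replace")

# Step 2: the text is saved again as UTF-8; each U+FFFD becomes EF BF BD.
raw = broken.encode("utf-8")

# Step 3: a GBK reader pairs the six bytes up as three two-byte characters.
print(raw.hex(), raw.decode("gbk"))
```

The original bytes are gone after step 1, which is why the garbled text cannot be converted back.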
FunTester
Sep 8, 2020 · Fundamentals

Mastering Java String Whitespace: trim, strip, replace and More

This article explains the various Java string methods for removing leading and trailing whitespace—including trim, strip, stripLeading, stripTrailing, replace, replaceAll, and replaceFirst—showing their differences, Unicode handling, and practical code examples.

.trim, String, Unicode
0 likes · 8 min read
Programmer DD
Jul 22, 2020 · Fundamentals

Why Java’s char Can’t Represent All Unicode Characters – Understanding UTF‑16 and Code Points

This article explains how Java’s char type stores Unicode code units in UTF‑16, why its range of \u0000 to \uffff limits direct representation of newer Unicode characters, and how methods like String.length, getBytes, and code‑point APIs help handle multi‑byte characters such as emojis and rare Chinese glyphs.

Code Points, String, UTF-16
0 likes · 10 min read
FunTester
Jul 3, 2020 · Backend Development

Handling Unicode Encoding Issues and Database Transaction Rollback in Java Services

The article explains a character‑encoding pitfall caused by a Python middle‑layer converting parameters to Unicode, provides a Java utility to decode escaped Unicode strings, and demonstrates how to use Spring's @Transactional annotation with rollbackFor to ensure database operations are rolled back on errors.

Unicode, encoding, java
0 likes · 4 min read
Laravel Tech Community
Apr 25, 2020 · Backend Development

Integrating Emoji Support in Laravel Using the Laravel‑Emoji Package

This article explains how to add Unicode emoji rendering to a Laravel application by installing the Laravel‑Emoji extension, configuring Composer, registering the service provider and alias, defining a route, and using the Emoji class in a controller to convert aliases, names, and Unicode codes into emoji graphics.

Backend, Composer, Emoji
0 likes · 4 min read
Architect's Tech Stack
Dec 30, 2019 · Fundamentals

New Features and Enhancements in Java 11

The article provides a comprehensive overview of Java 11’s new capabilities, including Unicode 10 support, the standardization of HttpClient, collection API improvements, dynamic compiler threads, advanced garbage collectors, nest‑based access control, added cryptographic algorithms, lambda‑var syntax, single‑file execution, Flight Recorder integration, removed modules, upgrade recommendations, and useful migration tools.

FlightRecorder, GarbageCollection, HttpClient
0 likes · 10 min read
ITPUB
Oct 10, 2019 · Databases

Why MySQL’s “utf8” Isn’t Real UTF‑8 and How to Switch to utf8mb4

The article explains that MySQL’s legacy “utf8” charset only supports three‑byte characters, causing errors when storing true four‑byte UTF‑8 symbols like emojis, and shows why switching to the proper “utf8mb4” charset is essential for correct Unicode handling.

Character Set, MariaDB, Unicode
0 likes · 8 min read
Seewo Tech Circle
Aug 30, 2019 · Fundamentals

Demystifying Character Encoding: From ASCII to Unicode and Beyond

This article explains the fundamentals of character encoding, covering concepts such as information, symbols, character sets, various encoding schemes like ASCII, GB2312, UTF‑8, Unicode planes, common pitfalls, and practical examples to help developers avoid garbled text.

GB2312, UTF-8, Unicode
0 likes · 9 min read
JD Tech
Dec 18, 2018 · Fundamentals

Understanding Character Encoding: Bits, Bytes, Unicode, UTF-8, UTF-16, and UTF-32

This article explains the origins of character sets, the relationships among various encodings such as ASCII, GB2312, GBK, GB18030, Unicode, UTF-8, UTF-16, and UTF-32, and shows how JavaScript handles Unicode and emoji characters, including practical code examples and solutions for length‑limited input fields.

UTF-16, UTF-8, Unicode
0 likes · 11 min read
Tencent Music Tech Team
Feb 9, 2018 · Mobile Development

Understanding String Encoding in Android JNI: From Native Crash to Source Code Analysis

This article investigates an Android JNI native crash caused by misusing NewString(), examines why a custom UTF‑8‑to‑UTF‑16 conversion was used instead of NewStringUTF(), compares Dalvik and ART string encodings, reveals a Dalvik UTF‑8 conversion bug fixed in ART, and advises developers on encoding nuances across Android versions.

ART, Android, Dalvik
0 likes · 26 min read
MaGe Linux Operations
Jan 15, 2018 · Fundamentals

Mastering Character Encoding in Python: From ASCII to UTF‑8

This article explains the fundamental concepts of characters, character sets, and encodings, compares common encodings such as ASCII, Unicode, and UTF‑8, and shows how Python 2 and Python 3 handle default encodings, string types, and common Unicode errors with practical code examples.

UTF-8, Unicode, character encoding
0 likes · 14 min read
ITPUB
Sep 28, 2017 · Databases

What’s New in MySQL 8.0 RC1? Key Features for Modern Apps

MySQL 8.0 RC1 introduces major enhancements such as improved JSON handling, a full‑featured Document Store with transaction support, expanded Unicode and GIS capabilities, plus a suite of modern SQL features designed to better serve mobile‑first and cloud‑native applications.

Database Features, Document Store, JSON
0 likes · 5 min read
Tencent IMWeb Frontend Team
Aug 28, 2017 · Information Security

7 Surprising JavaScript Tricks to Bypass XSS Filters

This article reveals a collection of unconventional JavaScript techniques—including regex replacement, Unicode escapes, eval tricks, unusual operator combinations, custom getters/setters, and URL‑encoded payloads—that can evade common XSS filters and strengthen your understanding of web security.

Bypass, Unicode, XSS
0 likes · 10 min read
Aotu Lab
Jun 30, 2017 · Fundamentals

Why Do escape and encodeURI Encode URLs Differently? Explore Percent-Encoding

This article explains the differences between JavaScript’s escape, encodeURI, and encodeURIComponent functions, detailing their encoding rules, percent‑encoding standards, reserved and unreserved characters, and how Unicode characters are transformed into UTF‑8 byte sequences, while also covering ASCII, Unicode, and UTF‑8 fundamentals.

JavaScript, URL encoding, UTF-8
0 likes · 11 min read
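
The step from a Unicode character to percent-encoded UTF-8 bytes can be illustrated as follows. This is a Python sketch, not a reimplementation of any of the three JavaScript functions; `percent_encode` is a hypothetical helper that escapes everything outside the RFC 3986 unreserved set, which is slightly stricter than encodeURIComponent (that function also leaves `! ' ( ) *` untouched).

```python
from urllib.parse import quote, unquote


def percent_encode(text):
    """Percent-encode every UTF-8 byte outside the RFC 3986 unreserved set."""
    unreserved = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
    return "".join(
        chr(b) if b in unreserved else f"%{b:02X}"
        for b in text.encode("utf-8")
    )


encoded = percent_encode("a b/中")
print(encoded)
print(quote("a b/中", safe=""))  # the standard library produces the same result
print(unquote(encoded))
```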
MaGe Linux Operations
Mar 2, 2017 · Fundamentals

What Changed in Python 3.0? Key Differences and Migration Tips

This article explains the major changes introduced in Python 3.0—including the new print() function, unified Unicode strings, altered division behavior, updated exception syntax, removal of xrange, revised literal formats, module renames, and data‑type updates—while offering guidance for migrating existing Python 2 code.

Exceptions, Python 3, Unicode
0 likes · 10 min read
ITPUB
Sep 19, 2016 · Fundamentals

Understanding Character Encoding: From ASCII to Unicode and UTF‑8

This article explains the fundamentals of character encoding, covering the evolution from the 7‑bit ASCII standard to Chinese GB2312, the development of Unicode and UTF‑8, and provides practical guidance for handling these encodings in Windows and Linux C programs, including a sample UTF‑8 detection function.

ASCII, C Programming, GB2312
0 likes · 13 min read
ITPUB
Jan 22, 2016 · Fundamentals

Mastering Python Encoding Errors with Custom Error Handlers

This article explains Python's two-step encoding conversion, the built‑in error handling options for decode/encode, and how to register custom error handlers to gracefully process mixed‑encoding text and avoid UnicodeDecodeError exceptions.

Custom Decoder, Unicode, encoding
0 likes · 4 min read
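
Registering a custom error handler might look like the sketch below. It uses Python 3's `codecs.register_error`; the handler name `latin1_fallback` and the choice of Latin-1 as the fallback encoding are assumptions for illustration, not necessarily what the article uses.

```python
import codecs


def latin1_fallback(err):
    """Decode bytes that are not valid UTF-8 as Latin-1 instead of raising."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position where decoding resumes.
    return bad.decode("latin-1"), err.end


codecs.register_error("latin1_fallback", latin1_fallback)

# Mixed input: mostly UTF-8, with one stray Latin-1 byte (0xE9) in the middle.
mixed = "caf".encode("utf-8") + "é".encode("latin-1") + " ☕".encode("utf-8")
result = mixed.decode("utf-8", errors="latin1_fallback")
print(result)
```

The built-in handlers (`strict`, `ignore`, `replace`) either fail or lose the stray byte; a registered handler can instead recover it under whatever fallback rule suits the data.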
21CTO
Jan 4, 2016 · Fundamentals

Why Chinese Text Gets Garbled and How to Fix It: A Deep Dive into Encoding Standards

This article explains why Chinese characters often appear as garbled text on Windows and Linux, introduces the history and hierarchy of Chinese encoding standards such as GB2312, GBK, GB18030 and Unicode, compares ASCII, UTF‑8/16/32, shows practical command‑line experiments, and offers guidance for handling Chinese text in C and Python programs.

GB2312, Python, UTF-8
0 likes · 25 min read