Why Garbled Characters Appear: Exploring ASCII, GB2312, GBK & Unicode

This article explains how character encoding works—from ASCII and its extensions to Chinese GB2312 and GBK, through Unicode's UCS‑2, UCS‑4, and the versatile UTF‑8—showing why mismatched encodings produce garbled text and why UTF‑8 is the default in Spring Boot.

Lobster Programming

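Garbled characters are a routine nuisance in day-to-day development, usually showing up when data is read or written. They appear when bytes produced under one character encoding are decoded under a different one. This article walks through the major encodings to explain why that happens.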

1. ASCII Code

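ASCII assigns a 7-bit code (0 to 127) to each of 128 characters: 95 printable characters (letters, digits, punctuation, and the space) and 33 control characters. Because a byte has 8 bits, vendors later used the values 128 to 255, where the highest bit is 1, for another 128 characters. These "extended ASCII" sets were never a single standard: different code pages assign different characters to the same byte value, which is already one source of garbled text.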
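A minimal Python sketch of the 7-bit range and of the ambiguity above 127. The helper name `is_ascii` and the two code pages chosen (`latin-1` and `cp437`) are illustrative assumptions, not something the article specifies.

```python
# ASCII covers code values 0..127, so every ASCII character fits in 7 bits.
for ch in "A", "z", "0", "~":
    code = ord(ch)
    print(ch, code, format(code, "07b"))

# A byte above 127 has no ASCII meaning. What it stands for depends on the
# code page used to decode it, so the same byte yields different characters.
raw = bytes([0xE9])
as_latin1 = raw.decode("latin-1")  # ISO 8859-1 (Western European)
as_cp437 = raw.decode("cp437")     # original IBM PC code page
print(repr(as_latin1), repr(as_cp437))


def is_ascii(data: bytes) -> bool:
    """Return True when every byte has its highest bit cleared."""
    return all(b < 0x80 for b in data)


print(is_ascii(b"Hello"), is_ascii(raw))
```

The two decodes of `0xE9` disagree even though no byte changed, which is the same mechanism that produces garbled Chinese text later in the article.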

2. GB2312

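GB2312 encodes each Chinese character in two bytes. It arranges the character set in a grid of 94 zones with 94 positions each, giving 94 × 94 = 8,836 code points, not all of which are assigned. A character is identified by its zone-position code: for example, “白” sits in zone 16 at position 55, so its code is 1655. Adding 0xA0 to the zone and to the position gives the two bytes that are actually stored, 0xB0 0xD7. Because both bytes are above 127, they never collide with ASCII.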
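The conversion from a zone-position code to stored bytes can be sketched in Python as follows. The function names `quwei_to_bytes` and `char_to_quwei` are hypothetical helpers written for this illustration. The second one relies on Python's built-in `gb2312` codec and assumes its argument is a single two-byte GB2312 character.

```python
from typing import Tuple


def quwei_to_bytes(zone: int, position: int) -> bytes:
    """Convert a zone/position pair (1..94 each) to its two stored bytes."""
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must both be in 1..94")
    # Adding 0xA0 pushes both bytes above 127, out of the ASCII range.
    return bytes([zone + 0xA0, position + 0xA0])


def char_to_quwei(ch: str) -> Tuple[int, int]:
    """Look up the zone/position pair of one GB2312 character via the codec."""
    high, low = ch.encode("gb2312")
    return high - 0xA0, low - 0xA0


# Zone 16, position 1 is the first Chinese character in the table.
print(quwei_to_bytes(16, 1).hex())  # b0a1

# Let the codec report where a given character lives.
zone, position = char_to_quwei("白")
print(f"{zone:02d}{position:02d}", "白".encode("gb2312").hex())
```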

3. GBK

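GBK is a superset of GB2312 that fills in the code points GB2312 left unused. GB2312 requires both bytes to be above 127. GBK requires this only of the first (lead) byte, which lies in 0x81 to 0xFE, while the second byte may be anything from 0x40 to 0xFE except 0x7F. A decoder therefore treats any byte above 127 as the start of a two-byte character. The larger space holds roughly 21,000 Chinese characters, including traditional forms, compared with 6,763 in GB2312.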
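A short Python check of the superset relationship, using the built-in `gb2312` and `gbk` codecs. The helper `encodable` and the choice of the characters 龙 and 龍 (simplified and traditional "dragon") are assumptions made for this example.

```python
def encodable(ch: str, encoding: str) -> bool:
    """Return True if the character exists in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "龙"   # present in GB2312
traditional = "龍"  # traditional form, only available in GBK

for ch in simplified, traditional:
    print(ch, "GB2312:", encodable(ch, "gb2312"), "GBK:", encodable(ch, "gbk"))

# Characters shared by both sets keep exactly the same bytes in GBK.
print(simplified.encode("gb2312") == simplified.encode("gbk"))

# A GBK character is still two bytes, and the lead byte is above 127.
lead, trail = traditional.encode("gbk")
print(hex(lead), hex(trail))
```

Because the bytes of shared characters are identical, text written as GB2312 can be read by a GBK decoder without loss. The reverse is not true.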

4. Unicode

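Unicode assigns a single number, called a code point, to every character in every writing system, so that one standard replaces the many incompatible national encodings. Early Unicode used UCS-2, a fixed 16-bit encoding that can represent at most 65,536 characters. UCS-4 widened each character to 32 bits, a theoretical range of about 4.3 billion values, although Unicode today only defines code points from U+0000 to U+10FFFF. Spending four bytes on every character, including plain ASCII letters, wastes space, which limited its adoption for storage and transmission.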
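The size trade-off can be observed in Python. Python has no codec named UCS-2 or UCS-4, so this sketch uses `utf-16-be` and `utf-32-be` as stand-ins. That is an assumption of the example: UTF-16 matches UCS-2 only for code points up to U+FFFF, while UTF-32 stores each code point in four bytes as UCS-4 does.

```python
samples = ["A", "王", "\U0001F600"]  # Latin letter, Chinese character, emoji

for ch in samples:
    code_point = ord(ch)
    fits_16_bits = code_point <= 0xFFFF  # representable in UCS-2?
    utf16 = ch.encode("utf-16-be")
    utf32 = ch.encode("utf-32-be")
    print(
        f"U+{code_point:04X}",
        "fits in 16 bits:", fits_16_bits,
        "| 16-bit form:", len(utf16), "bytes",
        "| 32-bit form:", len(utf32), "bytes",
    )
```

The emoji lies above U+FFFF, so a strict 16-bit scheme cannot hold it in one unit, and the letter "A" costs four bytes in the 32-bit form even though one would do.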

4.1 UTF‑8

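UTF-8 is a variable-length encoding of Unicode that is backward compatible with ASCII. It uses one to four bytes per character, and the leading bits of the first byte tell the decoder how long the sequence is: a single byte starting with 0 is identical to ASCII, a two-byte sequence starts with 110, a three-byte sequence with 1110, and a four-byte sequence with 11110. Every following (continuation) byte starts with 10. For example, the Chinese character “王” (U+738B) takes three bytes in UTF-8: 0xE7 0x8E 0x8B.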
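The bit layout can be made concrete with a hand-written encoder. This is a teaching sketch, not a replacement for the built-in codec: the function name `utf8_encode` is hypothetical, and it does not reject surrogate code points (U+D800 to U+DFFF) the way a strict encoder would.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point into UTF-8 bytes by hand."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point < 0x80:
        # 0xxxxxxx: identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point is outside the Unicode range")


encoded = utf8_encode(0x738B)  # the character 王
print(encoded.hex(" "))        # e7 8e 8b
print(" ".join(format(b, "08b") for b in encoded))
```

The binary output shows the 1110 prefix on the first byte and the 10 prefix on the two continuation bytes.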

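Spring Boot uses UTF-8 as its default encoding for HTTP requests and responses. UTF-8 is a sound choice for Java web applications because it can represent every Unicode character while keeping ASCII-heavy content such as HTML, JSON, and URLs at one byte per character, so payloads stay smaller than with a fixed-width encoding.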
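The mismatch that produces garbled text can be reproduced in a few lines. The sketch below is an illustration in Python rather than Spring Boot code, and the variable names are assumptions of the example. It writes a character as UTF-8 and then reads the same bytes back under other encodings.

```python
text = "王"
utf8_bytes = text.encode("utf-8")  # what the sender wrote

# Correct: the reader uses the same encoding as the writer.
right = utf8_bytes.decode("utf-8")

# Mismatch: the reader assumes GBK. The bytes are regrouped into different
# characters, and any leftover byte becomes the replacement character.
wrong = utf8_bytes.decode("gbk", errors="replace")

# Latin-1 maps every byte to some character, so the damage is reversible:
# re-encoding with the wrong codec recovers the original bytes.
garbled = utf8_bytes.decode("latin-1")
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(right), repr(wrong), repr(garbled), repr(repaired))
```

No byte is altered at any point. Only the interpretation changes, which is why agreeing on one encoding at every layer, such as UTF-8, removes the problem.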

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
