Why Garbled Characters Appear: Exploring ASCII, GB2312, GBK & Unicode
This article explains how character encoding works, from ASCII and its extensions, through the Chinese standards GB2312 and GBK, to Unicode's UCS‑2 and UCS‑4 and the versatile UTF‑8. Along the way it shows why mismatched encodings produce garbled text and why UTF‑8 is the default in Spring Boot.
In daily development, garbled characters often appear when data is read in or written out. This article explains, from the perspective of character encoding, why that happens: garbled text is what you get when bytes written in one encoding are decoded with another.
1. ASCII Code
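ASCII stores visible characters (letters, digits, punctuation) and control characters as 7‑bit binary values, giving 128 characters (codes 0 to 127). Because a byte has 8 bits, the set was later extended to 256 characters by using the codes whose highest bit is 1 (128 to 255). These extended sets were never unified: different code pages, such as ISO 8859‑1 and Windows‑1251, assign different characters to the same byte, so text read with the wrong code page comes out garbled.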
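To see why the extended range causes trouble, this short Python sketch decodes one byte sequence with two different code pages. The helper name decode_with is only an illustration; "latin-1" and "cp1251" are Python's built‑in codec names for ISO 8859‑1 and Windows‑1251, used here as two examples of extended character sets.

```python
def decode_with(raw: bytes, codecs: tuple[str, ...]) -> dict[str, str]:
    """Decode the same bytes with several code pages and collect the results."""
    return {codec: raw.decode(codec) for codec in codecs}


# 'A' (0x41) is plain 7-bit ASCII; 0xE9 has its highest bit set (extended range).
raw = bytes([0x41, 0xE9])
print(decode_with(raw, ("latin-1", "cp1251")))
# {'latin-1': 'Aé', 'cp1251': 'Aй'}

# Pure ASCII text never uses the high bit, so every byte is below 0x80.
assert all(b < 0x80 for b in "Hello, ASCII!".encode("ascii"))
```

The 'A' survives both decoders unchanged; only the byte above 127 turns into a different character, which is garbled text in its simplest form.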
2. GB2312
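GB2312 encodes each Chinese character in two bytes (16 bits). It arranges its character set into 94 zones of 94 positions each, for 8 836 code points, which hold 6 763 Chinese characters plus symbols. A character's zone number and position number together form its zone/position code, and adding 0xA0 to each number gives the two bytes actually stored. Example: the character “白” sits at zone 16, position 55 (zone/position code 1655); 16 + 0xA0 = 0xB0 and 55 + 0xA0 = 0xD7, so its GB2312 code is 0xB0D7.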
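The zone/position arithmetic can be checked in Python. This is a minimal sketch assuming Python 3.9+; the helper names gb2312_bytes and zone_position are hypothetical, while the "gb2312" codec is Python's built‑in one. The last line decodes the same two bytes as Latin‑1 to show what a mismatched decoder produces.

```python
def gb2312_bytes(zone: int, position: int) -> bytes:
    """Turn a GB2312 zone/position pair into the two bytes that are stored."""
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must each be in 1..94")
    # Adding 0xA0 moves both bytes into 0xA1..0xFE, so neither byte
    # can be mistaken for a 7-bit ASCII character.
    return bytes([zone + 0xA0, position + 0xA0])


def zone_position(ch: str) -> tuple[int, int]:
    """Inverse: recover the zone/position pair of a two-byte GB2312 character."""
    # Unpacking raises ValueError for one-byte (ASCII) characters.
    first, second = ch.encode("gb2312")
    return first - 0xA0, second - 0xA0


code = gb2312_bytes(16, 55)
print(code.hex().upper())      # B0D7
print(code.decode("gb2312"))   # 白
print(zone_position("白"))     # (16, 55)
print(code.decode("latin-1"))  # °×  (same bytes, wrong decoder)
```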
3. GBK
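GBK extends GB2312 by filling code points that GB2312 left unused. It drops the requirement that the second byte be above 127: a first byte above 127 (0x81 to 0xFE) on its own marks the start of a two‑byte character, and the second byte may be any value from 0x40 to 0xFE except 0x7F. This brings the repertoire to about 21 000 Chinese characters, including traditional forms, plus symbols, while staying backward compatible with GB2312.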
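The lead‑byte rule is what lets a decoder walk a GBK byte stream. This sketch uses a hypothetical split_gbk helper and assumes GBK's byte ranges (lead byte 0x81 to 0xFE, trail byte 0x40 to 0xFE except 0x7F); it does no further validation. It also shows that a GBK‑only code such as 0x81 0x40, whose second byte is below 128, is rejected by Python's gb2312 codec but accepted by its gbk codec.

```python
def split_gbk(data: bytes) -> list[bytes]:
    """Split GBK bytes into one chunk per character (simplified, no validation).

    Assumed rule: a byte in 0x81..0xFE starts a two-byte character and takes
    the next byte as its trail byte, whatever its value; any other byte is a
    single-byte ASCII character.
    """
    chars, i = [], 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars


sample = b"A" + b"\xb0\xd7" + b"\x81\x40"  # ASCII 'A', GB2312 "白", a GBK-only code
print(split_gbk(sample))     # [b'A', b'\xb0\xd7', b'\x81@']
print(sample.decode("gbk"))  # succeeds: GBK defines 0x81 0x40
try:
    sample.decode("gb2312")
except UnicodeDecodeError as err:
    print("gb2312 cannot decode:", err.reason)
```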
4. Unicode
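Unicode aims to give every character in every writing system a single code. Its early encoding, UCS‑2, used 16 bits (two bytes) per character, enough for 65 536 characters. UCS‑4 uses 32 bits (four bytes) per character; its code space was defined with 31 bits, about 2.1 billion positions, although Unicode itself is now capped at U+10FFFF (1 114 112 code points). Because every character, even plain ASCII, costs four bytes, UCS‑4 wastes a great deal of storage and bandwidth, which limited its adoption.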
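Counting bytes makes the cost of fixed‑width encodings concrete. Python has no UCS‑2 codec, so this sketch assumes UTF‑16 as a stand‑in (identical to UCS‑2 for code points up to U+FFFF) and UTF‑32 for UCS‑4; fixed_width_sizes is a hypothetical helper. The big‑endian ("-be") variants are used so that no byte‑order mark is added to the count.

```python
def fixed_width_sizes(text: str) -> dict[str, int]:
    """Bytes needed for `text` at 2 bytes/char (UCS-2) and 4 bytes/char (UCS-4)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        # UCS-2 has only 16 bits, so it cannot represent these code points.
        raise ValueError("text contains characters beyond U+FFFF")
    return {
        # Stand-in: UTF-16 equals UCS-2 for code points up to U+FFFF.
        "ucs2": len(text.encode("utf-16-be")),
        # Stand-in: UTF-32 is UCS-4 limited to U+10FFFF.
        "ucs4": len(text.encode("utf-32-be")),
    }


print(fixed_width_sizes("Hello"))    # {'ucs2': 10, 'ucs4': 20}
print(fixed_width_sizes("王"))       # {'ucs2': 2, 'ucs4': 4}
print(len("Hello".encode("ascii")))  # 5: ASCII needs one byte per character
```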
4.1 UTF‑8
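UTF‑8 is a variable‑length encoding that uses one to four bytes per character and is backward compatible with ASCII. The leading bits of the first byte give the length of the sequence: 0 for one byte (identical to ASCII), 110 for two bytes, 1110 for three, and 11110 for four; every following byte starts with 10. For example, the Chinese character “王” (U+738B) falls in the three‑byte range U+0800 to U+FFFF and is encoded as E7 8E 8B.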
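The prefix rules translate directly into bit operations. This hand‑written encoder (utf8_encode is a hypothetical name) is a teaching sketch checked against Python's built‑in str.encode; in real code you would simply call the built‑in. For “王”, 0x738B is split into 4 + 6 + 6 bits (0111, 001110, 001011), which become 1110 0111, 10 001110 and 10 001011, that is E7 8E 8B.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 using the prefix rules."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:                      # 0xxxxxxx (same as ASCII)
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0x738B).hex(" ").upper())  # E7 8E 8B

# Cross-check one code point from each length class against the built-in codec.
for cp in (0x41, 0xE9, 0x738B, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```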
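Spring Boot's default HTTP character encoding (the server.servlet.encoding.charset property) is UTF‑8, and UTF‑8 is the recommended choice for Java web applications. It can represent every Unicode character, so one encoding covers all languages, and ASCII‑heavy content such as HTML, JSON and URLs stays at one byte per character, far smaller than in UCS‑2 or UCS‑4. Chinese text is the exception: common Chinese characters take three bytes in UTF‑8 versus two in GBK.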
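The size trade‑off, and what happens when the two sides of a connection disagree about the charset, can both be checked in Python. In this sketch the helper names encoded_sizes and misread are hypothetical, and the payload is an arbitrary JSON‑like string with 12 ASCII characters and one Chinese character.

```python
def encoded_sizes(text: str,
                  codecs: tuple[str, ...] = ("utf-8", "gbk", "utf-16-be", "utf-32-be"),
                  ) -> dict[str, int]:
    """Number of bytes `text` occupies in each encoding."""
    return {codec: len(text.encode(codec)) for codec in codecs}


def misread(text: str, written_as: str, read_as: str) -> str:
    """Simulate a mismatch: bytes written in one encoding, decoded with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


print(encoded_sizes('{"name": "王"}'))
# {'utf-8': 15, 'gbk': 14, 'utf-16-be': 26, 'utf-32-be': 52}

garbled = misread("王", "utf-8", "latin-1")
print(repr(garbled))  # 'ç\x8e\x8b'

# Repair by reversing the wrong step. This only works because Latin-1
# decoding is lossless; a decoder that replaced bytes cannot be undone.
print(garbled.encode("latin-1").decode("utf-8"))  # 王
```

For mostly‑ASCII payloads UTF‑8 stays close to GBK while covering every language, which is the practical argument for making it the default on both ends.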
Lobster Programming
Sharing insights on technical analysis and exchange, making life better through technology.