Why Garbled Characters Appear: Exploring ASCII, GB2312, GBK & Unicode

This article explains how character encoding works—from ASCII and its extensions to Chinese GB2312 and GBK, through Unicode's UCS‑2, UCS‑4, and the versatile UTF‑8—showing why mismatched encodings produce garbled text and why UTF‑8 is the default in Spring Boot.

Lobster Programming

In day-to-day development, garbled characters often appear when data is read or written. This article explains why from the perspective of character encoding: text turns into gibberish when it is encoded into bytes with one encoding and decoded with another.

1. ASCII Code

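ASCII encodes 128 characters, the visible ones (letters, digits, punctuation) plus control characters such as line feed, as 7‑bit values from 0 to 127; stored in an 8‑bit byte, an ASCII character always has its highest bit set to 0. Extended character sets later put additional characters in the remaining values 128 to 255 (highest bit set to 1), bringing the total to 256. Different extended sets assign those upper values differently, so the same byte can mean different characters depending on which set the reader assumes.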
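To make the 7‑bit layout concrete, here is a minimal Java sketch that prints a few ASCII characters with their decimal value and 7‑bit binary form, then checks that every byte of an ASCII string has its highest bit clear. The class name `AsciiDemo` and its helper methods are illustrative, not part of any library.

```java
import java.nio.charset.StandardCharsets;

final class AsciiDemo {
    // Returns the 7-bit binary form of an ASCII character, left-padded with zeros.
    static String sevenBitBinary(char c) {
        if (c > 0x7F) {
            throw new IllegalArgumentException("not an ASCII character: " + (int) c);
        }
        String bits = Integer.toBinaryString(c);
        return "0".repeat(7 - bits.length()) + bits;
    }

    // True when the byte's highest (eighth) bit is 0, i.e. the byte lies in 0..127.
    static boolean isAsciiByte(byte b) {
        return (b & 0x80) == 0;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'a', '0', '\n'}) {
            String label = c == '\n' ? "\\n" : String.valueOf(c);
            System.out.printf("%-4s %3d  %s%n", label, (int) c, sevenBitBinary(c));
        }
        byte[] bytes = "Hello".getBytes(StandardCharsets.US_ASCII);
        boolean allAscii = true;
        for (byte b : bytes) {
            allAscii &= isAsciiByte(b);
        }
        System.out.println("highest bit clear in every byte: " + allAscii);
    }
}
```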

2. GB2312

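GB2312 encodes each Chinese character in two bytes (16 bits). It arranges its character set in a grid of 94 zones with 94 positions each, for 94 × 94 = 8 836 code points, of which 7 445 are assigned: 6 763 Chinese characters and 682 other symbols. A character's zone/position number becomes its stored bytes by adding 0xA0 to each part. Example: the character “白” sits at zone 16, position 55 (zone/position code 1655); 16 + 0xA0 = 0xB0 and 55 + 0xA0 = 0xD7, so “白” is stored as 0xB0D7.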
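The zone/position arithmetic can be checked against the JDK's own GB2312 encoder. This is a sketch under two assumptions: a standard JDK that ships the extended charsets (GB2312 is not among the charsets every Java platform is required to support), and the illustrative names `Gb2312Demo`, `toBytes`, and `hex`, which are not from any library. By the standard GB2312 table, “白” (U+767D) sits at zone 16, position 55.

```java
import java.nio.charset.Charset;

final class Gb2312Demo {
    // GB2312 stores a zone/position pair (each 1..94) as two bytes,
    // adding 0xA0 to each part so both bytes land in 0xA1..0xFE.
    static byte[] toBytes(int zone, int position) {
        if (zone < 1 || zone > 94 || position < 1 || position > 94) {
            throw new IllegalArgumentException("zone and position must be in 1..94");
        }
        return new byte[] {(byte) (zone + 0xA0), (byte) (position + 0xA0)};
    }

    // Formats bytes as one hex literal, e.g. {0xB0, 0xD7} -> "0xB0D7".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] manual = toBytes(16, 55);                            // zone 16, position 55
        byte[] jdk = "\u767D".getBytes(Charset.forName("GB2312"));  // the character U+767D
        System.out.println("manual: " + hex(manual) + ", JDK: " + hex(jdk));
    }
}
```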

3. GBK

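GBK extends GB2312 without changing it: every GB2312 character keeps the same two bytes in GBK. GB2312 requires both bytes to be above 127 (0xA1 to 0xFE); GBK only requires the first byte to be above 127 (0x81 to 0xFE) and lets the second byte range over 0x40 to 0xFE (except 0x7F). A byte above 127 therefore signals the start of a two‑byte Chinese character, and the extra code space raises the total to 21 886 characters and symbols, including 21 003 Chinese characters with their traditional forms.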
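A rough sketch of how a decoder uses that rule: the hypothetical helper `splitChars` walks a GBK byte stream and treats any byte from 0x81 to 0xFE as the first half of a two‑byte character. It is deliberately simplified (the second byte is not validated), and it assumes a JDK with the extended GBK and GB2312 charsets. The `main` method also checks that a GB2312 character keeps its bytes in GBK, and that the traditional character “龍” (U+9F8D) can be encoded in GBK but not in GB2312.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GbkDemo {
    static final Charset GBK = Charset.forName("GBK");
    static final Charset GB2312 = Charset.forName("GB2312");

    // Splits a GBK byte stream into per-character chunks. A byte below 0x80 is
    // a single ASCII character; a byte in 0x81..0xFE starts a two-byte character.
    // Simplified: trail bytes are not validated and stray bytes are kept as-is.
    static List<byte[]> splitChars(byte[] data) {
        List<byte[]> chars = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            boolean lead = b >= 0x81 && b <= 0xFE && i + 1 < data.length;
            int len = lead ? 2 : 1;
            chars.add(Arrays.copyOfRange(data, i, i + len));
            i += len;
        }
        return chars;
    }

    public static void main(String[] args) {
        String text = "A\u767D1";  // "A", U+767D, "1"
        byte[] gbk = text.getBytes(GBK);
        System.out.println(gbk.length + " bytes, " + splitChars(gbk).size() + " characters");

        // GBK keeps every GB2312 code unchanged.
        System.out.println("same bytes as GB2312: " + Arrays.equals(text.getBytes(GB2312), gbk));

        // U+9F8D (traditional "dragon") is in GBK but not in GB2312.
        System.out.println("GBK: " + GBK.newEncoder().canEncode('\u9F8D')
                + ", GB2312: " + GB2312.newEncoder().canEncode('\u9F8D'));
    }
}
```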

4. Unicode

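Unicode aims to assign a single code to every character in every writing system, so that one character set can replace the many regional ones. Early Unicode used UCS‑2, a fixed 16‑bit form that can represent at most 65 536 characters. UCS‑4 widened every character to 32 bits (ISO 10646 originally defined a 31‑bit code space of about 2.1 billion positions), but because it spends four bytes on every character, even plain ASCII, it saw little use for storage and transmission. Today Unicode code points run from U+0000 to U+10FFFF.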
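The size cost is easy to measure with the JDK's built-in charsets. In this sketch, UTF‑16BE stands in for UCS‑2 (equivalent only for characters in the first 65 536 code points) and UTF‑32BE stands in for UCS‑4; the class name `UcsSizeDemo` and its `size` helper are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UcsSizeDemo {
    // UTF-32BE stores every code point in exactly four bytes, like UCS-4.
    static final Charset UTF_32BE = Charset.forName("UTF-32BE");

    // Number of bytes a string occupies in the given charset.
    static int size(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "Hello", "\u738B"};  // the last sample is U+738B
        for (String s : samples) {
            System.out.printf("%-6s 16-bit: %2d bytes, 32-bit: %2d bytes%n",
                    s, size(s, StandardCharsets.UTF_16BE), size(s, UTF_32BE));
        }
        System.out.println("UCS-2 capacity: " + (1 << 16));
        System.out.printf("highest Unicode code point: U+%X%n", Character.MAX_CODE_POINT);
    }
}
```

Plain English text doubles in size at 16 bits and quadruples at 32 bits, which is the waste that kept UCS‑4 out of everyday storage.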

4.1 UTF‑8

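UTF‑8 is a variable‑length encoding that uses one to four bytes per character and is fully compatible with ASCII. The leading bits of the first byte tell a decoder how long the sequence is: 0 for one byte (identical to ASCII), 110 for two bytes, 1110 for three bytes, and 11110 for four bytes; every following byte starts with 10. For example, the Chinese character “王” (U+738B) needs three bytes: its 16 bits split as 0111 001110 001011 and fill the template 1110xxxx 10xxxxxx 10xxxxxx, giving 0xE7 0x8E 0x8B.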
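The prefix rules translate directly into bit operations. Below is a minimal hand-written UTF‑8 encoder, checked against the JDK's built-in `StandardCharsets.UTF_8`; `Utf8Demo.encode` is an illustrative helper, not a JDK API, and it rejects surrogate code points (U+D800 to U+DFFF), which UTF‑8 does not encode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.StringJoiner;

final class Utf8Demo {
    // Encodes one code point with the UTF-8 prefix rules: 0xxxxxxx,
    // 110xxxxx 10xxxxxx, 1110xxxx 10xxxxxx 10xxxxxx, or 11110xxx + three 10xxxxxx.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {
            return new byte[] {(byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    // Formats bytes as space-separated hex, e.g. "E7 8E 8B".
    static String hex(byte[] bytes) {
        StringJoiner j = new StringJoiner(" ");
        for (byte b : bytes) {
            j.add(String.format("%02X", b & 0xFF));
        }
        return j.toString();
    }

    public static void main(String[] args) {
        int[] samples = {'A', 0xE9, 0x738B, 0x1F600};  // A, e-acute, U+738B, an emoji
        for (int cp : samples) {
            byte[] mine = encode(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X -> %s (matches JDK: %b)%n", cp, hex(mine), Arrays.equals(mine, jdk));
        }
    }
}
```

For U+738B the encoder yields E7 8E 8B, the same three bytes worked out by hand above.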

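Spring Boot uses UTF‑8 by default, and UTF‑8 is the recommended choice for Java web applications: it can represent every Unicode character, so text in almost any language survives the round trip, and ASCII‑heavy content such as markup, JSON keys, and English text stays compact at one byte per character. (Chinese characters take three bytes in UTF‑8 versus two in GBK, so the size advantage depends on the content.) Garbled text appears when the two sides of an exchange disagree, for example when a server writes UTF‑8 bytes and a client decodes them as GBK.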
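The mismatch itself takes only a few lines to reproduce with the plain JDK, no Spring involved. This sketch assumes a JDK with the extended GBK charset, and `MojibakeDemo` and `roundTrip` are illustrative names. It writes “王” as UTF‑8 and reads it as GBK, then does the reverse; both directions garble the text, while reading with the charset used for writing restores it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when writer and reader disagree on the encoding.
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String wang = "\u738B";  // U+738B
        String ok = roundTrip(wang, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String utf8AsGbk = roundTrip(wang, StandardCharsets.UTF_8, GBK);
        String gbkAsUtf8 = roundTrip(wang, GBK, StandardCharsets.UTF_8);

        System.out.println("matching charsets restore the text: " + ok.equals(wang));
        System.out.println("UTF-8 read as GBK: " + utf8AsGbk);
        System.out.println("GBK read as UTF-8: " + gbkAsUtf8);  // undecodable bytes become U+FFFD

        // Here the three UTF-8 bytes do not split evenly into GBK's two-byte units,
        // so the last byte is replaced during decoding and re-encoding cannot restore it.
        String repaired = roundTrip(utf8AsGbk, GBK, StandardCharsets.UTF_8);
        System.out.println("repair attempt recovers the text: " + repaired.equals(wang));
    }
}
```

Because the damage can be irreversible once replacement characters are substituted, the practical fix is to agree on one encoding (UTF‑8) at every boundary rather than to repair garbled text afterwards.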
