Why Do You See “锟斤拷” in Text? Uncover the Encoding Mystery
This article explains how character encoding works, using ASCII, Unicode, UTF‑8 and GBK examples to reveal why the garbled string “锟斤拷” appears when mismatched encodings are processed, and shows the underlying byte‑level transformations.
What is the mysterious “锟斤拷”?
In computing, every character is stored as a binary number. A character encoding is simply a mapping from symbols to those numbers.
ASCII example
For instance, the ASCII code 0100 0001 (decimal 65) corresponds to the letter A.
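A minimal Java sketch of that mapping follows. The class name AsciiDemo and the helper toBinary8 are made up for illustration; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDemo {
    // Hypothetical helper: renders a character's code as an 8-bit binary string.
    static String toBinary8(char c) {
        String bits = Integer.toBinaryString(c);
        // Left-pad with zeros so that codes below 128 always show a full byte.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        // Character -> number -> bits.
        System.out.println((int) 'A');       // 65
        System.out.println(toBinary8('A'));  // 01000001

        // Number -> character.
        System.out.println((char) 65);       // A

        // Encoding "A" as ASCII produces exactly one byte with the value 65.
        byte[] encoded = "A".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encoded.length + " byte, value " + encoded[0]);
    }
}
```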
The Unicode replacement character � (U+FFFD, decimal 65533) is what a decoder emits when it encounters a byte sequence that is not valid in the encoding it expects.
Why “锟斤拷” appears
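When a byte array such as new byte[] {-25, -119, -25, -116} (hex E7 89 E7 8C) is decoded as UTF‑8, it turns out to be malformed: each 0xE7 opens a three‑byte sequence that is cut short. The decoder cannot produce a character from it, so it substitutes the replacement character, which is displayed as “�”.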
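This substitution can be observed directly in Java. The sketch below assumes the default behaviour of the String(byte[], Charset) constructor, which replaces malformed input instead of throwing; ReplacementDemo and decodeUtf8 are illustrative names. How many replacement characters come out depends on how the decoder groups the bad bytes, so the code prints the result instead of assuming a count.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Hypothetical helper: lenient UTF-8 decoding, malformed input becomes U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // E7 89 E7 8C: two three-byte sequences, each missing its final byte.
        byte[] broken = {-25, -119, -25, -116};
        String decoded = decodeUtf8(broken);

        // Print every resulting code point; each one should be U+FFFD.
        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```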
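The replacement character U+FFFD is itself encoded in UTF‑8 as the three bytes EF BF BD, so two of them in a row are written out as the six‑byte sequence 0xEFBFBDEFBFBD. If those six bytes are then read as GBK, they are split into three two‑byte characters: 0xEFBF, 0xBDEF, and 0xBFBD, which correspond to the Chinese characters “锟”, “斤”, and “拷”.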
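The byte-level round trip can be reproduced with a short sketch. It assumes the running JDK ships the GBK charset (standard full JDK builds do, stripped-down runtimes may not); GbkMisreadDemo, toHex, and misreadAsGbk are illustrative names. Unicode escapes are used for the Chinese characters so the result does not depend on the source-file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkMisreadDemo {
    // Hypothetical helper: upper-case hex dump of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: encode as UTF-8, then wrongly decode the bytes as GBK.
    static String misreadAsGbk(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Assumption: "GBK" is available in this runtime.
        return new String(utf8, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String twoReplacements = "\uFFFD\uFFFD";

        // Two replacement characters become six UTF-8 bytes.
        System.out.println(toHex(twoReplacements.getBytes(StandardCharsets.UTF_8)));

        // Reading those six bytes as GBK yields three two-byte characters.
        String garbled = misreadAsGbk(twoReplacements);
        System.out.println(garbled.equals("\u951F\u65A4\u62F7")); // true
        System.out.println(garbled); // display depends on the console encoding
    }
}
```

The same effect shows up in practice whenever one component writes already-damaged text as UTF‑8 and another reads it as GBK.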
Thus the garbled “锟斤拷” you often see is the result of mismatched encoding between UTF‑8 and GBK.
Now you know the reason behind those strange symbols.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.