
How I Decoded a Mysterious Binary Message into Chinese Characters

During a Spring Festival visit to a park, I came across a poster with a message written in binary. I used OCR to extract the bits, tried decoding them with a small Go program, discovered that they were UTF‑8-encoded Chinese characters, and worked out the byte-pattern rules that reveal the hidden message.

Xiao Lou's Tech Notes

Feb 19, 2024


Hello everyone, and happy Spring Festival! I'm Xiao Lou.

While strolling through a park with my child, I came across an intriguing poster in the restroom. It showed the phrase “Take one small step forward, civilization takes a giant leap” written in several programming languages, and the binary version caught my eye.

To decode the binary, I first tried extracting the text with WeChat's built‑in OCR, but it missed many characters. After searching for OCR tools, I found a small WeChat mini‑program that recognized the text correctly and produced a 248‑bit binary string:

Since 248 bits divide evenly into 31 bytes, I guessed the text might be ASCII, with each byte representing one character, and wrote a simple Go test that converts every 8 bits into a character.

11100101100100001001000111100101100010011000110111100100101110001000000011100101101100001000111111100110101011011010010100100000111001101001011010000111111001101001100010001110111001001011100010000000111001011010010010100111111001101010110110100101

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

func TestBinary(t *testing.T) {
	str := "11100101100100001001000111100101100010011000110111100100101110001000000011100101101100001000111111100110101011011010010100100000111001101001011010000111111001101001100010001110111001001011100010000000111001011010010010100111111001101010110110100101"
	// Naive first attempt: assume every 8 bits encode one character.
	for i := 0; i < len(str); i += 8 {
		code, _ := strconv.ParseInt(str[i:i+8], 2, 64)
		// %c prints the value as one code point, so multi-byte characters fall apart.
		fmt.Printf("%c", int(code))
	}
}
```

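The output was garbled: the %c verb prints each value as a single code point, so every byte above 0x7F came out as an unrelated character between U+0080 and U+00FF instead of as part of a multi-byte character. I then turned to Baidu's AI, which returned a Python script that decoded the binary correctly. Running it revealed that the data is not ASCII but UTF‑8-encoded Chinese text.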
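For comparison with the first attempt, here is a minimal Go sketch that decodes the bits correctly: collect the 8-bit values into a byte slice and treat the whole slice as UTF-8 text, instead of printing each value as its own character. This is not the author's program or the AI's Python script; the names decodeBits and poster are hypothetical, and it assumes the OCR output is one contiguous string of 0s and 1s.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"unicode/utf8"
)

// poster holds the 248-bit string recognized from the poster.
const poster = "11100101100100001001000111100101100010011000110111100100101110001000000011100101101100001000111111100110101011011010010100100000111001101001011010000111111001101001100010001110111001001011100010000000111001011010010010100111111001101010110110100101"

// decodeBits turns a string of '0'/'1' characters into bytes (8 bits each)
// and returns those bytes as a string after checking they form valid UTF-8.
func decodeBits(bits string) (string, error) {
	if len(bits)%8 != 0 {
		return "", fmt.Errorf("bit length %d is not a multiple of 8", len(bits))
	}
	buf := make([]byte, 0, len(bits)/8)
	for i := 0; i < len(bits); i += 8 {
		v, err := strconv.ParseUint(bits[i:i+8], 2, 8) // one byte, 0..255
		if err != nil {
			return "", err
		}
		buf = append(buf, byte(v))
	}
	if !utf8.Valid(buf) {
		return "", errors.New("bytes are not valid UTF-8")
	}
	// Keeping the raw bytes together (rather than turning each one into a
	// separate rune) preserves the multi-byte sequences.
	return string(buf), nil
}

func main() {
	msg, err := decodeBits(poster)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(msg) // 向前一小步 文明一大步
}
```

The utf8.Valid check is a design choice: it turns a mis-recognized bit from the OCR step into an explicit error rather than silently printing replacement characters.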

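UTF‑8 stores each Unicode code point in 1 to 4 bytes. The high bits of the first byte tell how long the sequence is, every continuation byte starts with 10, and the x bits carry the code point itself. The byte‑pattern rules are:

1‑byte: 0xxxxxxx (U+0000 to U+007F, identical to ASCII)

2‑byte: 110xxxxx 10xxxxxx (U+0080 to U+07FF)

3‑byte: 1110xxxx 10xxxxxx 10xxxxxx (U+0800 to U+FFFF, which includes the common Chinese characters)

4‑byte: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (U+10000 to U+10FFFF)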
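The pattern table translates almost directly into code. Below is a hand-written decoder, included only to illustrate the rules: decodeUTF8 is a hypothetical name, and it deliberately omits checks a real decoder needs (overlong encodings, surrogates, code points above U+10FFFF), so real programs should rely on Go's unicode/utf8 package instead.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeUTF8 applies the byte-pattern rules directly. Illustration only:
// it does not reject overlong forms, surrogates, or values above U+10FFFF.
func decodeUTF8(b []byte) ([]rune, error) {
	var out []rune
	for i := 0; i < len(b); {
		lead := b[i]
		var n int   // total bytes in this sequence
		var cp rune // code point being assembled
		switch {
		case lead&0x80 == 0x00: // 0xxxxxxx
			n, cp = 1, rune(lead)
		case lead&0xE0 == 0xC0: // 110xxxxx
			n, cp = 2, rune(lead&0x1F)
		case lead&0xF0 == 0xE0: // 1110xxxx
			n, cp = 3, rune(lead&0x0F)
		case lead&0xF8 == 0xF0: // 11110xxx
			n, cp = 4, rune(lead&0x07)
		default:
			return nil, fmt.Errorf("invalid lead byte %08b at offset %d", lead, i)
		}
		if i+n > len(b) {
			return nil, errors.New("truncated sequence")
		}
		for _, c := range b[i+1 : i+n] {
			if c&0xC0 != 0x80 { // every continuation byte must be 10xxxxxx
				return nil, fmt.Errorf("bad continuation byte %08b", c)
			}
			cp = cp<<6 | rune(c&0x3F) // append its 6 payload bits
		}
		out = append(out, cp)
		i += n
	}
	return out, nil
}

func main() {
	runes, err := decodeUTF8([]byte{0xE5, 0x90, 0x91}) // first three bytes on the poster
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range runes {
		fmt.Printf("U+%04X %c\n", r, r) // U+5411 向
	}
}
```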

Applying these rules to the binary, the first three bytes are 11100101 10010000 10010001. The first byte starts with 1110, so this is a 3‑byte character, and the next two bytes carry the expected 10 prefix. Removing the prefixes leaves 0101, 010000 and 010001; concatenated, these give 0101 0100 0001 0001, or U+5411, which is the Chinese character “向”.
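Running the worked example in reverse is a useful sanity check: take the 16 bits of U+5411, split them 4/6/6, and drop them into the 1110xxxx 10xxxxxx 10xxxxxx template. A sketch, assuming a hypothetical helper encode3 that only handles code points in the 3-byte range (U+0800 to U+FFFF) and does not reject surrogates; the result is cross-checked against utf8.EncodeRune from Go's standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// encode3 is a hypothetical helper that packs a code point from the 3-byte
// range (U+0800..U+FFFF) into 1110xxxx 10xxxxxx 10xxxxxx.
// It does not validate its input (no range or surrogate checks).
func encode3(cp rune) [3]byte {
	return [3]byte{
		0xE0 | byte(cp>>12),        // 1110 + top 4 bits
		0x80 | byte((cp>>6)&0x3F), // 10 + middle 6 bits
		0x80 | byte(cp&0x3F),      // 10 + low 6 bits
	}
}

func main() {
	cp := rune(0x5411)                          // 向
	fmt.Printf("code point bits: %016b\n", cp) // 0101010000010001
	b := encode3(cp)
	fmt.Printf("UTF-8 bytes:     %08b %08b %08b\n", b[0], b[1], b[2]) // 11100101 10010000 10010001

	// Cross-check with the standard library encoder.
	want := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(want, cp)
	fmt.Println("matches utf8.EncodeRune:", bytes.Equal(b[:], want[:n])) // true
}
```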

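Applying the same steps to the remaining bytes gives 前, 一, 小 and 步, then the single byte 00100000 (an ASCII space), then 文, 明, 一, 大 and 步, so the full message is “向前一小步 文明一大步”. Reading 8 bits at a time was never the problem, since the Go program did that too; what produces the correct result is treating the chunks as raw bytes and decoding the whole byte sequence as UTF‑8 (in Python, for example, with bytes.decode("utf-8")). Calling chr on each 8‑bit value separately would map it to a code point between U+0000 and U+00FF and give the same garbled text as the Go program.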
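To see both behaviors side by side, the sketch below contrasts mapping each byte to its own code point (what the first Go program did, and what calling chr on each byte would do) with proper UTF-8 decoding via utf8.DecodeRune, listing each character with its code point and bytes. The helper names perByte and describe are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// perByte mimics the first attempt: every byte value becomes its own code
// point (U+0000..U+00FF), so multi-byte characters fall apart.
func perByte(b []byte) string {
	rs := make([]rune, len(b))
	for i, v := range b {
		rs[i] = rune(v)
	}
	return string(rs)
}

// describe decodes b as UTF-8 one character at a time and returns lines of
// the form "<char> U+XXXX <bytes in hex>". Invalid bytes come back as
// U+FFFD (utf8.RuneError) with a size of 1.
func describe(b []byte) []string {
	var lines []string
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		lines = append(lines, fmt.Sprintf("%c U+%04X % X", r, r, b[:size]))
		b = b[size:]
	}
	return lines
}

func main() {
	first := []byte{0xE5, 0x90, 0x91, 0xE5, 0x89, 0x8D} // first six bytes on the poster
	fmt.Printf("per byte: %q\n", perByte(first))
	for _, line := range describe(first) {
		fmt.Println(line)
	}
}
```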

This exploration shows how a seemingly cryptic binary image can be decoded into meaningful text by understanding UTF‑8 encoding.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


