Decoding Binary Signage: Converting UTF‑8 Binary to Chinese Characters in Java

A programmer discovers a binary message on a multilingual restroom sign, deduces it uses UTF‑8 encoding, explains the encoding rules, and provides a Java program that converts the binary sequence into the Chinese character "向", demonstrating practical UTF‑8 decoding.

Liangxu Linux

Apr 11, 2024

A public restroom sign displays the slogan "向前一小步，文明一大步" in Chinese, English, Japanese, and Korean, followed by a long string of 0s and 1s. The author, noticing the binary pattern, suspects it encodes the Chinese text using UTF‑8.

UTF‑8 is a variable‑length Unicode encoding. A single‑byte character starts with the bit 0; the first byte of a two‑byte character starts with 110, of a three‑byte character with 1110, and of a four‑byte character with 11110. Every continuation byte begins with 10. The first four bits of the binary string are 1110, which indicates a three‑byte UTF‑8 character (24 bits).
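The prefix rules above can be sketched as a small Java check. This is a minimal illustration, not code from the original article: the class name `Utf8LeadByte` and the method `sequenceLength` are hypothetical, and the check only matches bit patterns (it does not reject overlong or otherwise invalid sequences).

```java
// Hypothetical helper: classify a UTF-8 lead byte by its high-order bits.
class Utf8LeadByte {

    /**
     * Returns the sequence length (1 to 4) announced by the lead byte's prefix,
     * or -1 if the byte does not match any lead-byte pattern.
     */
    static int sequenceLength(int b) {
        b &= 0xFF;                            // keep only the low 8 bits
        if ((b & 0x80) == 0x00) return 1;     // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2;     // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;     // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;     // 11110xxx
        return -1;                            // 10xxxxxx (continuation) or 11111xxx
    }

    public static void main(String[] args) {
        // First 8 bits of the message on the sign
        int first = Integer.parseInt("11100101", 2);
        System.out.println(sequenceLength(first)); // prints 3
    }
}
```

Masking with `0xF0` and comparing against `0xE0` is simply the test "do the top four bits equal 1110", which is the observation made about the sign's first byte.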

The author extracts the first 24 bits, 11100101 10010000 10010001, and removes the marker bits (highlighted in red in the original image): 1110 from the first byte and 10 from each of the two continuation bytes. The remaining payload is 0101 010000 010001. Read as a single 16‑bit number, this is 0x5411 in hexadecimal, and U+5411 is the Unicode code point of the Chinese character "向".
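The same extraction can be written with bit masks and shifts. The sketch below is an illustration under the assumption that the input is a well-formed three-byte sequence; `ThreeByteDecode` and `codePoint` are hypothetical names, not part of the original program.

```java
// Hypothetical helper: rebuild a code point from one three-byte UTF-8 sequence.
class ThreeByteDecode {

    /** Strips the marker bits and joins the 4 + 6 + 6 payload bits. */
    static int codePoint(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12)   // drop 1110, keep 4 payload bits
             | ((b1 & 0x3F) << 6)    // drop 10, keep 6 payload bits
             |  (b2 & 0x3F);         // drop 10, keep 6 payload bits
    }

    public static void main(String[] args) {
        String bits = "111001011001000010010001"; // first 24 bits of the message
        int b0 = Integer.parseInt(bits.substring(0, 8), 2);
        int b1 = Integer.parseInt(bits.substring(8, 16), 2);
        int b2 = Integer.parseInt(bits.substring(16, 24), 2);

        int cp = codePoint(b0, b1, b2);
        // Prints the code point in U+XXXX form, followed by the character itself
        System.out.printf("U+%04X %s%n", cp, new String(Character.toChars(cp)));
    }
}
```

For the three bytes 0xE5, 0x90, 0x91 the masks leave 0x5, 0x10 and 0x11, which combine to 0x5000 + 0x400 + 0x11 = 0x5411.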

To decode the entire binary message automatically, the author provides a Java method that assembles the binary string, splits it into bytes, converts each byte to an integer, builds a byte array, and constructs a new String from the array. The code is:

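import java.nio.charset.StandardCharsets; // goes at the top of the enclosing class file

private static void toChinese() {
    // UTF-8 bytes of the sign's message, written out as a string of bits
    String bits = "11100101100100001001000111100101100010011000110"
                + "1111001001011100010000000111001011011000010001111111001"
                + "10101011011010010100100000111001101001011010000111111001"
                + "10100110001000111011100100101110001000000011100101101001"
                + "0010100111111001101010110110100101";
    int length = bits.length();
    if (length % 8 != 0) {
        throw new IllegalArgumentException("bit string length must be a multiple of 8");
    }
    byte[] bytes = new byte[length >> 3]; // 8 bits per byte
    for (int i = 0; i < length; i += 8) {
        // Parse each group of 8 bits as an unsigned value, then narrow it to a byte
        String byteString = bits.substring(i, i + 8);
        bytes[i >> 3] = (byte) Integer.parseInt(byteString, 2);
    }
    // Decode explicitly as UTF-8 rather than relying on the platform default charset
    System.out.println("Decoded result: " + new String(bytes, StandardCharsets.UTF_8));
}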

Running the program prints the decoded Chinese text, which begins with "向". This confirms that the binary sequence is the UTF‑8 encoding of the sign's Chinese slogan and shows a practical way to convert UTF‑8 binary data into readable characters.
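One way to double-check a decoding like this is to go in the opposite direction: encode a known character as UTF‑8 and compare the resulting bits with the start of the message. The following is a minimal sketch of that check; `Utf8ToBits` and `toBits` are hypothetical names introduced here for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: render the UTF-8 encoding of a string as a bit string.
class Utf8ToBits {

    static String toBits(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            String s = Integer.toBinaryString(b & 0xFF); // unsigned value, no leading zeros
            for (int i = s.length(); i < 8; i++) {
                sb.append('0');                          // left-pad to 8 digits
            }
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u5411" is the character "向", written as an escape to avoid source-encoding issues
        System.out.println(toBits("\u5411")); // prints 111001011001000010010001
    }
}
```

The printed 24 bits match the first 24 bits of the bit string in `toChinese()`.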
