Why MySQL Shows Garbled Text and How to Fix It
This article explains the root causes of MySQL character‑set garbling, details the encoding and decoding steps during data storage and retrieval, and provides practical methods—including dump/reload and binary conversion—to prevent and repair corrupted text.
MySQL garbled text causes
Garbling occurs when the character set used by the client, the MySQL server (character_set_client), and the table charset are not consistent during either insertion or retrieval, leading to mismatched encoding/decoding steps.
Encoding process when storing data
The data flow includes:
Input method in the terminal produces characters.
The terminal encodes characters into a binary stream.
The binary stream is sent to the MySQL server via the client.
The server decodes the stream using character_set_client.
If the client charset differs from the table charset, a conversion from client‑charset to table‑charset is performed.
The converted binary stream is written to the table file.
Encoding process when retrieving data
The server reads the binary stream from the table file.
The stream is decoded using the table charset.
The data is re‑encoded to character_set_client.
The server encodes it back to a binary stream.
The stream is transmitted to the client.
The client displays the result using its configured charset.
Root causes of garbled text
Inconsistent charsets between storage and retrieval steps (e.g., UTF‑8 input, GBK client).
Inconsistent charsets within a single three‑step process (e.g., terminal UTF‑8, server GBK, table UTF‑8).
Encoding/decoding in MySQL
MySQL must decode incoming binary streams to parse SQL syntax and must decode stored binary data to allow character‑level operations such as SELECT LEFT(col,2). Without these steps, the storage engine cannot manipulate character data.
Lossless vs. lossy conversion
A conversion is lossless only if the source character exists in the target charset and the target charset can represent unsupported characters without data loss (e.g., Latin‑1 may preserve original bytes). UTF‑8 to GBK is often lossy because GBK lacks many Unicode characters.
How to avoid garbled text
Ensure the client, character_set_client, and table charset are identical. This three‑way consistency eliminates garbling.
How to repair corrupted data
Two correct approaches are recommended:
Method 1: Dump & Reload
Export the data using the original (incorrect) charset, create a new table with the correct charset, then import the dump.
mysqldump -u root -p -d --skip-set-charset \
--default-character-set=utf8 test charset_test_latin1 > data.sql
# Verify the dump file is UTF‑8 encoded
mysql -uroot -p -e "CREATE TABLE charset_test_latin1 (id INT PRIMARY KEY AUTO_INCREMENT, char_col VARCHAR(50)) CHARSET=utf8" test
mysql -uroot -p --default-character-set=utf8 test < data.sqlMethod 2: Convert to Binary & Convert Back
Change the column to a binary type to preserve raw bytes, then alter it back to the desired charset.
ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARBINARY(50);
ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARCHAR(50) CHARACTER SET utf8;Incorrect methods to avoid
Using ALTER TABLE ... CHARSET=xxx only changes the default charset for future columns and does not affect existing data. Using ALTER TABLE ... CONVERT TO CHARACTER SET ... on a table that already contains garbled data can further corrupt the data.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
