
Why MySQL Shows Garbled Text and How to Fix It

This article explains the root causes of MySQL character‑set mojibake, illustrates the encoding/decoding steps during data insertion and retrieval, and provides reliable methods to prevent and repair corrupted text.


Everything starts with the end-to-end flow: data passes through three conversion points on its way into MySQL and three more on its way back out, and a mismatch at any of them can produce garbage.

Encoding process when inserting data

The client (for example a Bash terminal, a web page, or an application) encodes the typed characters into a byte stream using its own character set and sends that stream to the MySQL server. The server decodes the stream according to character_set_client, checks whether that charset matches the target table's charset, converts the text if they differ, and finally writes the resulting bytes to the table file.
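The three steps above can be sketched in Python as a byte-level illustration (this is not MySQL's actual code; the function and variable names are invented for the example):

```python
def insert(text, client_charset, character_set_client, table_charset):
    """Sketch of the three conversion points on the way into MySQL."""
    wire_bytes = text.encode(client_charset)                      # 1. client encodes keystrokes
    as_seen_by_server = wire_bytes.decode(character_set_client)   # 2. server decodes the stream
    return as_seen_by_server.encode(table_charset)                # 3. convert to the table charset

# When all three charsets agree, the stored bytes are simply the UTF-8 encoding.
assert insert("中文", "utf-8", "utf-8", "utf-8") == "中文".encode("utf-8")
# A UTF-8 client writing to a GBK table still round-trips correctly,
# because the server converts between the two charsets at step 3.
assert insert("中文", "utf-8", "utf-8", "gbk") == "中文".encode("gbk")
```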

Decoding process when retrieving data

When data is read back, MySQL decodes the stored bytes using the table's charset, re-encodes the text for the client (per character_set_results, which SET NAMES configures together with character_set_client), and sends the byte stream back; the client then renders it using its terminal's charset.
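The read path can be sketched the same way (again only an illustration; the server actually re-encodes results using character_set_results, which SET NAMES sets alongside character_set_client):

```python
def retrieve(stored_bytes, table_charset, character_set_results, terminal_charset):
    """Sketch of the three conversion points on the way out of MySQL."""
    text = stored_bytes.decode(table_charset)           # 1. decode the stored bytes
    wire_bytes = text.encode(character_set_results)     # 2. re-encode for the client
    return wire_bytes.decode(terminal_charset)          # 3. terminal renders the bytes

stored = "中文".encode("gbk")
# All charsets consistent: the original text comes back.
assert retrieve(stored, "gbk", "utf-8", "utf-8") == "中文"
# Terminal set to latin1 while the server sends UTF-8: classic mojibake,
# six latin1 characters where two CJK characters should be.
garbled = retrieve(stored, "gbk", "utf-8", "latin1")
assert garbled != "中文" and len(garbled) == 6
```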

Common causes of mojibake

Inconsistent charsets between the client, character_set_client, and the table during either insertion or retrieval.

Three‑step conversion mismatches within a single flow (e.g., Bash uses UTF‑8, MySQL server expects GBK, table is UTF‑8).

When a character cannot be represented in the target charset, MySQL performs a lossy conversion, often replacing the unknown character with a placeholder such as ‘?’ or the Unicode replacement character U+FFFD.
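The same replacement behavior is easy to observe in Python (which raises an error by default; errors="replace" mimics MySQL's substitution):

```python
# latin1 has no CJK code points, so a lossy conversion turns them into '?'.
assert "中文".encode("latin1", errors="replace") == b"??"

# Bytes that are not valid UTF-8 decode to U+FFFD, the replacement character.
assert b"\xff\xfe".decode("utf-8", errors="replace") == "\ufffd\ufffd"
```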

Lossless vs. lossy conversion

A conversion is lossless only when every character exists in both the source and target charsets. A few single-byte charsets such as latin1 are effectively byte-transparent: every byte value maps to some character, so the original byte sequence survives a round trip even when it "means" something else. In all other cases, unrepresentable characters are lost for good.
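latin1's byte-transparency is easy to verify in Python: every one of the 256 byte values maps to a character, so any byte sequence survives a decode/encode round trip unchanged:

```python
# All 256 possible byte values...
every_byte = bytes(range(256))
# ...decode under latin1 without error, and re-encode to the same bytes.
assert every_byte.decode("latin1").encode("latin1") == every_byte

# That is why a latin1 column can hold, say, raw UTF-8 bytes intact,
# even though they display as gibberish.
utf8_bytes = "中文".encode("utf-8")
assert utf8_bytes.decode("latin1").encode("latin1") == utf8_bytes
```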

How to avoid mojibake

Keep the three charsets identical: the client's terminal charset, character_set_client, and the table charset. When all three agree, no conversion step can corrupt the data; utf8mb4 is the usual modern choice for all three.

Repairing already corrupted data

Two tempting approaches do not work and should be avoided: ALTER TABLE ... CHARSET=xxx only changes the default charset for columns added later, and ALTER TABLE ... CONVERT TO CHARACTER SET ... re-encodes column data that is already misinterpreted, damaging it further.

Two reliable methods are recommended:

Dump & Reload: Export the data using the table's original (incorrect) charset so that the server hands over the raw bytes unconverted, recreate the table with the correct charset, then import the dump.

# 1. Dump only the data (-t skips CREATE TABLE); connecting with the table's
#    declared charset (latin1) makes the server send the raw bytes unconverted.
mysqldump -u root -p -t --skip-set-charset --default-character-set=latin1 test charset_test_latin1 > data.sql
# 2. Recreate the table with the correct charset.
mysql -u root -p -e 'drop table if exists charset_test_latin1; create table charset_test_latin1 (id int primary key auto_increment, char_col varchar(50)) charset=utf8' test
# 3. Reload the dump, interpreting the bytes as UTF-8.
mysql -u root -p --default-character-set=utf8 test < data.sql

Convert to Binary & Convert Back: Change the column to VARBINARY, which preserves the raw bytes, then change it back to VARCHAR with the correct charset.

-- Step 1: drop the (wrong) charset interpretation; VARBINARY keeps the raw bytes.
ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARBINARY(50);
-- Step 2: reinterpret those same bytes under the correct charset.
ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARCHAR(50) CHARACTER SET utf8;

Both methods work the same way underneath: they extract the original byte sequence and reinterpret it under the correct charset, undoing the earlier misreading. (A genuinely lossy conversion, where characters were already replaced by '?', cannot be undone.)
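The byte-level logic behind both repairs can be demonstrated in a few lines of Python (assuming UTF-8 bytes that were mistakenly interpreted as latin1; the charsets here are illustrative):

```python
original = "中文"
stored = original.encode("utf-8")      # the raw bytes sitting in the table
garbled = stored.decode("latin1")      # what a naive SELECT displays
assert garbled != original

# Repair: recover the raw bytes (the VARBINARY step), then decode them
# with the charset they were really written in.
recovered = garbled.encode("latin1").decode("utf-8")
assert recovered == original
```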

Tags: SQL, encoding, MySQL, data corruption, charset conversion, mojibake
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.
