Databases 17 min read

Why Does MySQL Produce Garbled Text? Causes, Prevention, and Fixes

This article explains the multiple encoding/decoding steps MySQL performs when storing and retrieving data, identifies why mismatched character sets cause mojibake, and provides practical methods to avoid and repair corrupted text in MySQL tables.

21CTO
21CTO
21CTO
Why Does MySQL Produce Garbled Text? Causes, Prevention, and Fixes

MySQL Garbled Text Causes

Understanding why garbled text appears requires tracing the encoding/decoding steps from the client request, through MySQL Server, to the table storage and back to the client.

Encoding Process When Storing Data

The following diagram (originally shown) illustrates three encoding/decoding stages (red arrows): client encoding, MySQL Server decoding, and conversion from client charset to table charset if they differ. The terminal (Bash, web page, or app) inputs characters, which are converted to a binary stream, sent to the server, decoded using character-set-client, possibly re‑encoded to the table charset, and finally written to the data file.

Encoding Process When Retrieving Data

Retrieval also involves three steps: client decoding for display, MySQL Server encoding using character-set-client, and conversion from table charset to client charset. The binary stream is read from the file, decoded with the table charset, re‑encoded to the client charset, transmitted, and finally displayed according to the terminal’s configuration.

Root Causes of MySQL Mojibake

Inconsistent character sets between the storing and retrieving phases (e.g., storing with UTF‑8 but retrieving with GBK).

Inconsistent charset settings within a single workflow step (e.g., client uses UTF‑8 while character-set-client is GBK and the table is UTF‑8).

Lossy vs. Lossless Charset Conversion

A conversion is lossy when the target charset lacks a representation for a character from the source charset. For example, converting the Gurmukhi letter “ਅ” (U+0A05) from UTF‑8 to GBK results in a replacement character because GBK cannot represent it.

master [localhost] {msandbox} (test) > create table charset_test_gbk (id int primary key auto_increment, char_col varchar(50)) charset = gbk;
Query OK, 0 rows affected (0.00 sec)
master [localhost] {msandbox} (test) > set names utf8;
Query OK, 0 rows affected (0.00 sec)
master [localhost] {msandbox} (test) > insert into charset_test_gbk (char_col) values ('ਅ');
Query OK, 1 row affected, 1 warning (0.01 sec)
master [localhost] {msandbox} (test) > show warnings;
+---------+------+-------------------------------------------------------+
| Level   | Code | Message                                               |
+---------+------+-------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xE0\xA8\x85' for column 'char_col' at row 1 |
+---------+------+-------------------------------------------------------+
master [localhost] {msandbox} (test) > select id,hex(char_col),char_col,char_length(char_col) from charset_test_gbk;
+----+---------------+----------+-----------------------+
| id | hex(char_col) | char_col | char_length(char_col) |
+----+---------------+----------+-----------------------+
|  1 | 3F            | ?        | 1                     |
+----+---------------+----------+-----------------------+

“Wrong In, Wrong Out” Phenomenon

When the client charset (C) differs from the table charset (S) but the conversion C→S and S→C are both lossless, data can appear correctly despite the mismatch. This requires that the characters exist in both charsets and that the target charset preserves unmapped characters (e.g., Latin‑1 retains original bytes).

How to Prevent Garbled Text

Ensure the three character sets—client, character-set-client, and table charset—are identical. Consistency eliminates the possibility of mojibake.

How to Repair Corrupted Data

Common but ineffective methods include:

ALTER TABLE … CHARSET=xxx : only changes the default charset for new columns, not existing data.

ALTER TABLE … CONVERT TO CHARACTER SET … : works only if the table currently contains no corrupted data; applying it to tables with “wrong‑in‑wrong‑out” data can further corrupt the data.

Effective methods:

Dump & Reload : Export the table with the original (incorrect) charset, create a new table with the correct charset, and import the dump using the proper client charset.

mysqldump -u root -p -d --skip-set-charset --default-character-set=utf8 test charset_test_latin1 > data.sql
# Verify the dump is UTF‑8 without garbled characters
mysql -uroot -p -e "create table charset_test_latin1 (id int primary key auto_increment, char_col varchar(50)) charset = utf8" test
mysql -uroot -p --default-character-set=utf8 test < data.sql

Convert to Binary & Convert Back : Change the column to VARBINARY to preserve raw bytes, then modify it back to VARCHAR with the correct charset.

ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARBINARY(50);
ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARCHAR(50) CHARACTER SET utf8;

These approaches safely restore the original characters without data loss.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

databaseMySQLcharacter encodinggarbled textcharset conversiondata repair
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.