Databases 19 min read

Why MySQL Shows Garbled Text and How to Fix It

This article explains the root causes of MySQL character‑set garbling, illustrates the encoding/decoding steps during data insertion and retrieval, and provides practical methods—including dump‑reload and binary conversion—to prevent and repair corrupted text.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
Why MySQL Shows Garbled Text and How to Fix It

This article details the causes of MySQL garbled characters and offers concrete solutions. Readers unfamiliar with character‑set concepts are encouraged to read the author’s earlier primer on character sets and encodings.

Reasons MySQL Produces Garbled Text

Understanding why garbling occurs requires tracing the encoding/decoding steps from the client request, through MySQL storage, and back to the client when data is retrieved. Two flowcharts illustrate the insertion and extraction phases.

Encoding Process When Storing Data in MySQL

The diagram shows three encoding/decoding steps (red arrows): client encoding, MySQL Server decoding, and conversion from client charset to table charset. The terminal may be a Bash shell, a web page, or an app. The steps are:

Input characters via an input method in the terminal.

The terminal converts characters to a binary stream according to its charset.

The binary stream is sent to MySQL Server through the client.

The server decodes the stream using character_set_client.

The server checks whether character_set_client matches the table’s charset.

If they differ, a conversion from client charset to table charset is performed.

The converted binary stream is written to the table file.

Encoding Process When Retrieving Data from MySQL

The retrieval diagram also shows three steps (red arrows): client decoding for display, MySQL Server encoding according to character_set_client, and conversion from table charset back to client charset.

Read the binary data from the table file.

Decode the binary using the table’s charset.

Convert the result to the client charset.

Encode the data as a binary stream using the client charset.

Transmit the stream over the network to the remote client.

The client displays the result using its configured charset.

Root Causes of MySQL Garbling

1. Mismatched charsets during insertion and retrieval. If the client uses UTF‑8 while the terminal uses GBK, or if set names utf8 is used on insert but set names gbk on select, the result will be garbled.

2. Inconsistent charsets within a single flow. Any two steps in the same direction using different charsets can cause loss‑y conversion, especially when the target charset cannot represent certain characters. The article demonstrates this with a Bash terminal set to UTF‑8, a MySQL server configured for GBK, and a table defined as utf8.

master [localhost] {msandbox} (test) > create table charset_test_utf8 (id int primary key auto_increment, char_col varchar(50)) charset = utf8;
Query OK, 0 rows affected (0.04 sec)

master [localhost] {msandbox} (test) > set names gbk;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > insert into charset_test_utf8 (char_col) values ('中文');
Query OK, 1 row affected, 1 warning (0.01 sec)

master [localhost] {msandbox} (test) > show warnings;
+---------+------+---------------------------------------------------------------+
| Level   | Code | Message                                                       |
+---------+------+---------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xAD\xE6\x96\x87' for column 'char_col' at row 1 |
+---------+------+---------------------------------------------------------------+

master [localhost] {msandbox} (test) > select id,hex(char_col),char_col from charset_test_utf8;
+----+----------------+----------+
| id | hex(char_col)  | char_col |
+----+----------------+----------+
|  1 | E6B6933FE69E83 | �?��    |
+----+----------------+----------+

Why MySQL Performs Encode/Decode

MySQL decodes the incoming binary stream to perform syntax and lexical analysis; without this, the server could not distinguish INSERT from UPDATE statements. After reading data from the file, the storage engine decodes the binary to enable character‑level operations such as LEFT(col,2).

Wrong‑In‑Wrong‑Out Phenomenon

The most common garbling scenario is “wrong‑in‑wrong‑out”: the client charset C differs from the table charset S, yet if the conversion C→S on insert and S→C on select are both loss‑less, the data appears correct. This only works when the two charsets can convert without loss.

MySQL must be able to decode the C‑encoded binary stream into S without loss, and later encode the S‑encoded binary stream back into C without loss.

Lossless vs. Lossy Conversion

A conversion is lossy when the target charset lacks a representation for a character present in the source charset. For example, UTF‑8 can represent many more characters than GBK, so converting a GURMUKHI LETTER A (U+0A05) to GBK results in a replacement character.

When MySQL cannot find a GBK equivalent, it substitutes a placeholder (often ‘?’). Some charsets, like latin1, preserve the original bytes for unsupported characters, enabling the “wrong‑in‑wrong‑out” effect.

How to Avoid Garbled Text

Ensuring that the client, character_set_client, and the table charset are all identical guarantees that no garbling occurs.

How to Repair Already Corrupted Data

Many “quick fixes” found online can worsen the problem.

Incorrect Method 1: ALTER TABLE … CHARSET=xxx

This command only changes the default charset for future columns; existing column definitions and data remain unchanged.

master [localhost] {msandbox} (test) > show create table charset_test;
+--------------+------------------------------------------------------------+
| Table        | Create Table                                               |
+--------------+------------------------------------------------------------+
| charset_test | CREATE TABLE `charset_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `char_col` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1 |
+--------------+------------------------------------------------------------+

master [localhost] {msandbox} (test) > alter table charset_test charset=gbk;
Query OK, 0 rows affected (0.03 sec)

master [localhost] {msandbox} (test) > show create table charset_test;
+--------------+------------------------------------------------------------+
| Table        | Create Table                                               |
+--------------+------------------------------------------------------------+
| charset_test | CREATE TABLE `charset_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `char_col` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=gbk |
+--------------+------------------------------------------------------------+

Incorrect Method 2: ALTER TABLE … CONVERT TO CHARACTER SET …

This command rewrites the data using the new charset, which only works when the table currently contains clean data. Applying it to a table that already suffered “wrong‑in‑wrong‑out” will corrupt the data further.

master [localhost] {msandbox} (test) > create table charset_test_latin1 (id int primary key auto_increment, char_col varchar(50)) charset = latin1;
... (insert Chinese text) ...
master [localhost] {msandbox} (test) > alter table charset_test_latin1 convert to character set utf8;
Query OK, 1 row affected (0.04 sec)

master [localhost] {msandbox} (test) > select id,hex(char_col),char_col from charset_test_latin1;
+----+--------------------------------------------------------+-----------------------------+-----------------------+
| id | hex(char_col)                                          | char_col                    | char_length(char_col) |
+----+--------------------------------------------------------+-----------------------------+-----------------------+
|  1 | C3A8C2BFE284A2C3A6CB9CC2AFC3A4C2B8C2ADC3A6E28093E280A1 | 这是中文                | 12 |
+----+--------------------------------------------------------+-----------------------------+-----------------------+

Correct Method 1: Dump & Reload

Export the data using the original (incorrect) charset, create a new table with the proper charset, and reload the dump.

shell> mysqldump -u root -p -d --skip-set-charset --default-character-set=utf8 test charset_test_latin1 > data.sql
# Verify the dump is UTF‑8 without garbling
shell> mysql -uroot -p -e 'create table charset_test_latin1 (id int primary key auto_increment, char_col varchar(50)) charset = utf8' test
shell> mysql -uroot -p --default-character-set=utf8 test < data.sql

Correct Method 2: Convert to Binary & Convert Back

Change the column to a binary type to preserve the raw bytes, then change it back to the desired charset.

mysql> ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARBINARY(50);
mysql> ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col varchar(50) CHARACTER SET utf8;
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLmysqlcharacter encodinggarbled text
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.