Databases 15 min read

Understanding and Resolving MySQL Character Set and Data Corruption Issues

This article explains why MySQL data can become garbled due to mismatched character sets during insertion or retrieval, explores various scenarios such as client‑side encoding errors, mixed encodings within tables, and LATIN1 storage, and provides detailed solutions including setting proper client encodings, using strict SQL_MODE, and exporting/importing data to unify character sets.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Understanding and Resolving MySQL Character Set and Data Corruption Issues

MySQL data corruption often occurs when character set conversions fail during data insertion, resulting in stored garbled text.

Typical causes include:

Client encoding (CHARACTER_SET_CLIENT) differs from connection encoding (CHARACTER_SET_CONNECTION) and table encoding (CHARACTER_SET_DATABASE).

Incompatible encodings between client and server lead to warnings like Invalid gb2312 character string and Incorrect string value.

Example of a failing insert:

set @a = "文本字符串";
insert into t1 values(@a);

To prevent this, ensure the client encoding matches the table encoding, e.g.:

set names utf8mb4;
truncate t1;
insert into t1 values ("病毒滚吧!");
select * from t1;

Alternatively, enforce strict mode so incompatible inserts raise errors:

set sql_mode = 'STRICT_TRANS_TABLES';
insert into t1(a1) values ("病毒滚吧!");
-- ERROR 1366 (HY000): Incorrect string value ...

Client‑side garbled output occurs when the server returns data in one charset (CHARACTER_SET_RESULTS) but the client expects another. Switching the session charset fixes the display:

set names default;  -- reset to utf8mb4
select * from t3;

When a table contains rows stored in different encodings, export the mismatched portions separately and re‑import them into a new table with a unified charset:

-- Export utf8mb4 rows
select * from t3 limit 0,3 into outfile '/var/lib/mysql-files/tx.txt';
-- Export gbk rows
set names gbk;
select * from t3 limit 3,3 into outfile '/var/lib/mysql-files/ty.txt';
-- Create unified table
create table t4 (a1 varchar(10), a2 varchar(10)) charset utf8mb4;
load data infile '/var/lib/mysql-files/tx.txt' into table t4 character set gbk;
load data infile '/var/lib/mysql-files/ty.txt' into table t4;

Column‑level encoding mismatches can be resolved by converting the column to a binary type, then back to the desired charset:

alter table t2 modify a1 binary(12);
alter table t2 modify a1 varchar(12) charset utf8mb4;

LATIN1 storage stores any byte unchanged, which can hide encoding problems but limits multi‑byte characters such as emojis. Switching to utf8mb4 allows proper storage of Unicode symbols:

create table t2(a1 varchar(12)) charset latin1;
insert into t2 values ('❤❤❤❤');  -- stores 4 emojis
alter table t2 charset utf8mb4;
-- After conversion, the same data displays correctly as emojis.

In summary, avoid MySQL garbled data by aligning client, connection, and table character sets, using strict SQL_MODE to reject incompatible inserts, and, when necessary, exporting and re‑importing data to unify encodings.

SQLencodingMySQLCharacter SetData Corruption
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.