Databases 10 min read

Understanding MySQL Character Sets, Collations, and Common Encoding Issues

This article explains MySQL's various character sets and collations, describes important system variables such as character_set_client and character_set_connection, demonstrates how to query supported encodings, and provides practical examples for avoiding and fixing garbled text caused by mismatched encodings.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Understanding MySQL Character Sets, Collations, and Common Encoding Issues

The author, a middleware developer at iKangSheng, expands on previous posts about MySQL character sets by detailing the fundamentals of different encodings (ASCII, Latin1, UCS‑2, UTF‑16, UTF‑32, GB series, UTF‑8) and how they occupy 1‑4 bytes per character.

1. Character Sets

ASCII uses 1 byte per character, with the high bit always zero.

Latin1 extends ASCII to cover Western European letters.

UCS‑2 uses 2 bytes (65,536 code points).

UTF‑16 builds on UCS‑2, allowing surrogate pairs for additional characters.

UTF‑32 uses a fixed 4‑byte length.

GB series and UTF‑8 are variable‑length encodings (1‑4 bytes).

When only ASCII characters (0‑127) are expressed, Latin1, GB series, and UTF‑8 behave identically, while UTF‑16/UTF‑32 add extra padding.

mysql> select
        hex(convert("sql" using latin1)) as a,
        hex(convert("sql" using gbk)) as b,
        hex(convert("sql" using utf8mb4)) as c,
        hex(convert("sql" using utf16)) as d,
        hex(convert("sql" using utf32)) as e;

| a      | b      | c      | d            | e                     |
+--------+--------+--------+--------------+-----------------------+
| 73716C | 73716C | 73716C | 00730071006C | 00000073000000710000006C |

You can list all character sets and collations supported by the current MySQL instance with:

mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
        ORDER BY CHARACTER_SET_NAME, ID;

What is a collation?

A collation is a set of rules that defines how to compare and sort character strings.

2. MySQL System Variables

Run show variables like 'character%' to see the relevant parameters:

character_set_client
character_set_connection
character_set_results
character_set_server
-- The following three are rarely needed
character_set_filesystem
character_set_database
character_set_system

2.1 character_set_client

It is a server‑side variable, not the client’s actual encoding.

The client sends data in whatever encoding the user typed; MySQL assumes that data is encoded with the value of character_set_client .

It cannot be set to multibyte encodings such as ucs2, utf16, or utf32.

ucs2
utf16 / utf16le
utf32

2.2 character_set_connection

According to the MySQL documentation, after receiving a statement the server converts it from character_set_client to character_set_connection , except for literals that specify their own charset (e.g., _utf8mb4"text").

2.3 SET NAMES

The SET NAMES statement sets three variables at once: character_set_client , character_set_connection , and character_set_results . It does not affect character_set_server .

mysql> SET NAMES gb2312;

3. A Common Garbled‑Text Mistake

A previous article claimed that mismatched client and table encodings cause garbled output, but even when both are set to gb2312 the problem can appear because the data sent to MySQL is actually encoded in UTF‑8. MySQL interprets the UTF‑8 bytes as gb2312, leading to corruption.

# Terminal encoding is UTF‑8; convert to GB2312 before sending
$ echo 'insert into ytt_new10.t1 val("病毒滚吧!");' | iconv -f utf8 -t gb2312 > insert-gb.txt
$ mysql -h ... --default-character-set=gb2312 < insert-gb.txt

When selecting data, MySQL converts it to character_set_results before sending it back; the client may need to convert it back to UTF‑8 for proper display.

$ mysql ... --default-character-set=utf8 -e 'select * from ytt_new10.t1'
病毒滚吧!
$ mysql ... --default-character-set=gb2312 -e 'select * ...' | iconv -f gb2312 -t utf8
病毒滚吧!

4. Why Convert from client to connection?

After receiving a statement, the server converts it from character_set_client to character_set_connection .

This extra layer allows the server to apply charset‑specific behavior (e.g., length calculations) that differs between encodings. Some encodings like UTF‑32 cannot be used directly by the client, so the conversion step enables those behaviors.

mysql> set character_set_connection='utf8';
mysql> select length("hello");
| 5 |

mysql> set character_set_connection='utf32';
mysql> select length("hello");
| 20 |

Thus, the conversion from client to connection is essential for supporting multibyte and special‑purpose character sets in MySQL.

SQLencodingMySQLcharacter setCollationUTF32UTF8
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.