Databases 12 min read

MySQL Character Sets and Collations: Concepts, Usage, and Migration

This article explains MySQL character sets and collations, describes common character sets such as latin1, utf8mb3, utf8mb4 and gb18030, shows how to query available sets and collations, compares PAD SPACE and NO PAD behavior, and provides step‑by‑step migration examples between utf8, utf8mb3 and utf8mb4.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
MySQL Character Sets and Collations: Concepts, Usage, and Migration

Concept A character set defines the mapping between characters and their binary codes, while a collation (sorting rule) determines how characters are compared and ordered within that set.

Character Encoding Classification

• ASCII : single‑byte, covers English letters, digits, basic punctuation (U+0000‑U+007F). • Unicode : universal set covering all scripts (U+0000‑U+10FFFF); ASCII is a subset of Unicode.

MySQL Common Character Sets

• latin1 (alias cp1252, ISO‑8859‑1) – single‑byte, compatible with ASCII, stores any byte stream; range U+0000‑U+00FF. • gb18030 – Chinese national standard, supersets GBK/GB2312, uses 1, 2 or 4 bytes; ideal for Chinese text. • utf8 – implementation of Unicode using 1‑4 bytes; MySQL’s utf8 actually refers to utf8mb3 (max 3 bytes). • utf8mb4 – true 4‑byte UTF‑8, default in MySQL 8.0.

Viewing Available Character Sets

SQL query example:

-- Filter character sets containing "unicode"
show character set where description like '%unicode%' and charset like 'utf8%';

Result columns: charset name, default collation, description, max byte length.

Alternative via metadata table:

select * from information_schema.character_sets where description like '%Unicode%' and character_set_name like 'utf8%';

Viewing Collations

SQL query example for default collation of utf8mb4:

show collation where charset = 'utf8mb4' and `default` = 'yes';

Or via metadata:

select * from information_schema.collations where collation_name like 'utf8mb4%_bin';

PAD SPACE vs NO PAD

• PAD SPACE ignores trailing spaces during comparison. • NO PAD treats trailing spaces as significant characters.

Example demonstrating PAD SPACE:

SET NAMES utf8mb4 COLLATE utf8mb4_bin;  -- utf8mb4_bin uses PAD SPACE
set @a='mysql      ';
set @b='mysql';
select if(@a=@b,'@a and @b equal','@a and @b different') as result;

Result: @a and @b equal (trailing spaces ignored).

Changing to a NO PAD collation:

SET NAMES utf8mb4 COLLATE utf8mb4_0900_bin;  -- NO PAD
set @a='mysql      ';
set @b='mysql';
select if(@a=@b,'@a and @b equal','@a and @b different') as result;

Result: @a and @b different because trailing spaces are considered.

Migration Between UTF8MB3 and UTF8MB4

Upgrade (utf8 → utf8mb4) – lossless because utf8 is a subset of utf8mb4.

create table t1 (a varchar(10) charset utf8);
create table t2 (a varchar(10) charset utf8mb4);
insert into t1 values ('消灭病毒,中国无敌!');
insert into t2 select * from t1;  -- works
select * from t2;

Downgrade (utf8mb4 → utf8) – only possible if stored characters fit within the 3‑byte range.

truncate t1;
insert into t1 select * from t2;  -- succeeds if data fits utf8
select * from t1;

Inserting emoji (4‑byte) into an utf8mb3 column fails:

insert into t2 values ('哈哈🍀🍎💰📱');  -- succeeds in utf8mb4
insert into t1 select * from t2;  -- ERROR: Incorrect string value

MySQL Character Set System Variables

Various layers have their own variables:

Server layer : character_set_server , collation_server

Client layer : character_set_client

Database layer : character_set_database , collation_database

Metadata layer : fixed to UTF‑8 ( character_set_system )

Result set layer : character_set_results

Connection layer : character_set_connection , collation_connection

Filesystem layer : character_set_filesystem (affects file name parsing)

Typical practice is to set client, connection, and result‑set layers together, e.g. SET NAMES utf8; .

Conclusion

The article covered the basic concepts of MySQL character sets and collations, listed common sets, showed how to query them, explained PAD SPACE vs NO PAD behavior, and demonstrated practical migration steps between utf8, utf8mb3, and utf8mb4, helping developers choose appropriate encodings and avoid charset‑related issues.

MySQLcharacter setCollationutf8mb4
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.