Databases 10 min read

Master MySQL Character Sets and Collations: Choose the Right Encoding for Optimal Performance

This article explains MySQL's supported character sets and collations, shows how to list them, demonstrates table‑level and column‑level definitions, clarifies naming conventions, and provides practical examples for verifying and altering collations to ensure correct string comparison and storage.

Senior Brother's Insights
Senior Brother's Insights
Senior Brother's Insights
Master MySQL Character Sets and Collations: Choose the Right Encoding for Optimal Performance

MySQL Character Sets

MySQL supports a wide range of character sets, including local ones such as GB2312, GBK, BIG5 and Unicode sets like utf8mb3, utf8, utf8mb4, ucs2, utf16, and utf32. The recommended set is utf8mb4 because it fully supports all Unicode characters, including emojis and rare Chinese characters.

SELECT * FROM information_schema.character_sets ORDER BY character_set_name;

The query lists every character set MySQL knows, together with its default collation and maximum byte length (MAXLEN). For example, utf8 has MAXLEN 3, while utf8mb4 has MAXLEN 4, reflecting the UTF‑8 standard that allows up to four bytes per character.

Since MySQL 8, utf8mb4 is the default character set; the older utf8 is retained only for backward compatibility and should generally be avoided.

Defining Character Sets

If a table or column does not explicitly specify a character set, the server’s default utf8mb4 is applied (unless overridden by a database‑level or server‑level setting).

CREATE TABLE no_charset (
  my_column VARCHAR(255)
);
SHOW CREATE TABLE no_charset;

The resulting CREATE TABLE statement shows that utf8mb4 and its default collation utf8mb4_0900_ai_ci have been used.

Table‑level definition

Specify the character set for an entire table with the CHARSET= clause:

CREATE TABLE no_charset (
  my_column VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Column‑level definition

Set the character set directly on a column; this overrides any table‑level setting:

CREATE TABLE mixed_collations (
  explicitly_set VARCHAR(255) CHARACTER SET latin1,
  implicitly_set VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Column‑level declarations take precedence over table‑level ones, which in turn override database and server defaults.

MySQL Collations

A collation determines how strings are compared and sorted. MySQL chooses a collation based on the character set of the column, unless a specific collation is declared.

SELECT * FROM information_schema.collations
WHERE character_set_name='utf8mb4'
ORDER BY collation_name;

For utf8mb4, the default collation is utf8mb4_0900_ai_ci. The suffixes in a collation name convey its behavior:

Collation naming rules

_ai

: accent‑insensitive _as: accent‑sensitive _ci: case‑insensitive _cs: case‑sensitive _ks: kana‑sensitive (Japanese) _bin: binary comparison

Thus, utf8mb4_0900_ai_ci means: character set utf8mb4, UCA 9.0.0 weight keys, accent‑insensitive, case‑insensitive.

Verifying collations

Use the COLLATE keyword to force a specific collation in a comparison:

SELECT "MySQL" COLLATE utf8mb4_0900_ai_ci = "mysql" COLLATE utf8mb4_0900_ai_ci;

This returns 1, indicating equality under a case‑insensitive collation. Using a case‑sensitive collation yields 0:

SELECT "MySQL" COLLATE utf8mb4_0900_as_cs = "mysql" COLLATE utf8mb4_0900_as_cs;

Similar logic applies to accent‑sensitivity; for example, resume and résumé compare equal under an accent‑insensitive collation.

Defining Collations

Collations can be set at the table or column level, just like character sets. If omitted, the column inherits the default collation of its character set.

Table‑level definition

CREATE TABLE table_with_collation (
  my_column VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Column‑level definition

CREATE TABLE table_with_collation (
  explicitly_set VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  implicitly_set VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Existing tables can be altered with ALTER TABLE … CHANGE … CHARACTER SET … COLLATE … to modify a column’s collation.

Summary

Understanding character sets and collations is essential for correct storage, comparison, and sorting of string data in MySQL. The most specific level (column > table > database > server) wins. In most cases, using the default utf8mb4 character set with the utf8mb4_0900_ai_ci collation provides full Unicode support with case‑ and accent‑insensitive comparisons.

SQLMySQLCharacter Setcollationutf8mb4
Senior Brother's Insights
Written by

Senior Brother's Insights

A public account focused on workplace, career growth, team management, and self-improvement. The author is the writer of books including 'SpringBoot Technology Insider' and 'Drools 8 Rule Engine: Core Technology and Practice'.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.