Databases 8 min read

Why MySQL’s utf8 Isn’t Real UTF‑8 and How utf8mb4 Fixes Emoji Errors

This article explains why MySQL's default utf8 charset cannot store four‑byte characters like emojis, demonstrates the resulting insert errors, and shows how switching to the utf8mb4 charset resolves the issue while providing a brief history of MySQL's encoding decisions.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
Why MySQL’s utf8 Isn’t Real UTF‑8 and How utf8mb4 Fixes Emoji Errors

1. Error Review

Inserting an emoji character directly into a MySQL table caused an error:

INSERT INTO `csjdemo`.`student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`) VALUES ('20','陈哈哈😓','男','20','181班','9年级','看片儿');
[Err] 1366 - Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME' at row 1

After changing the database, system, and column collations to utf8mb4 , the insert succeeds:

INSERT INTO `student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`) VALUES (null,'陈哈哈😓😓','男','20','181班','9年级','看片儿');

2. MySQL’s utf8 Quirks

MySQL’s "utf8" is not true UTF‑8.

In MySQL, the "utf8" charset only supports up to three bytes per character, while true UTF‑8 supports up to four bytes.

Chinese characters occupy three bytes, while numbers, English letters, and symbols occupy one byte. Emoji symbols require four bytes, causing insert failures unless the charset is changed to utf8mb4 .

The following diagram shows the byte count before and after converting to utf8mb4, illustrating why four‑byte characters cannot be stored in the old utf8 charset.

MySQL introduced the utf8mb4 charset in 2010 to work around this limitation, but it was not widely advertised, leading many developers to continue using the inadequate utf8 charset.

2.1 utf8mb4 Is the Real UTF‑8

Yes, MySQL’s "utf8mb4" is the true UTF‑8 implementation.

The original "utf8" charset is a proprietary encoding that supports only a limited set of Unicode characters.

All MySQL and MariaDB users who still use "utf8" should migrate to "utf8mb4" and never use the old charset again.

2.2 A Brief History of MySQL utf8

MySQL began supporting UTF‑8 in version 4.1 (2003), but the modern UTF‑8 standard (RFC 3629) was released later.

The older RFC 2279 allowed up to six bytes per character; MySQL’s early implementation followed this, then limited it to three bytes in September 2002.

The reason for this change is unclear; source‑control history was lost during migration from BitKeeper to Git.

MySQL developers originally aimed to let users define fixed‑length CHAR columns for performance, assuming each character would occupy a constant number of bytes. They experimented with six‑byte CHAR definitions, but the approach was never officially released.

Consequently, documentation incorrectly described MySQL’s utf8 capabilities, leading many developers to believe it was full UTF‑8.

When users defined CHAR columns with the "utf8" charset, they either wasted space or encountered errors storing four‑byte characters like emojis.

Because fixing the charset would have required all users to rebuild their databases, MySQL instead introduced the proper utf8mb4 charset in 2010.

3. Conclusion

Most online articles still treat MySQL’s "utf8" as real UTF‑8, which is misleading. When setting up MySQL or MariaDB databases, always configure the database, tables, and columns to use utf8mb4 . Doing so will prevent the dreaded emoji insertion errors and demonstrate solid technical competence.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

EmojiSQLdatabasemysqlcharacter encodingutf8mb4
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.