Why MySQL’s utf8 Isn’t Real UTF‑8 and How utf8mb4 Fixes Emoji Errors
This article explains why MySQL's default utf8 charset cannot store four‑byte characters like emojis, demonstrates the resulting insert errors, and shows how switching to the utf8mb4 charset resolves the issue while providing a brief history of MySQL's encoding decisions.
1. Error Review
Inserting an emoji character directly into a MySQL table caused an error:
INSERT INTO `csjdemo`.`student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`) VALUES ('20','陈哈哈😓','男','20','181班','9年级','看片儿');[Err] 1366 - Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME' at row 1
After changing the database, system, and column collations to utf8mb4 , the insert succeeds:
INSERT INTO `student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`) VALUES (null,'陈哈哈😓😓','男','20','181班','9年级','看片儿');2. MySQL’s utf8 Quirks
MySQL’s "utf8" is not true UTF‑8.
In MySQL, the "utf8" charset only supports up to three bytes per character, while true UTF‑8 supports up to four bytes.
Chinese characters occupy three bytes, while numbers, English letters, and symbols occupy one byte. Emoji symbols require four bytes, causing insert failures unless the charset is changed to utf8mb4 .
The following diagram shows the byte count before and after converting to utf8mb4, illustrating why four‑byte characters cannot be stored in the old utf8 charset.
MySQL introduced the utf8mb4 charset in 2010 to work around this limitation, but it was not widely advertised, leading many developers to continue using the inadequate utf8 charset.
2.1 utf8mb4 Is the Real UTF‑8
Yes, MySQL’s "utf8mb4" is the true UTF‑8 implementation.
The original "utf8" charset is a proprietary encoding that supports only a limited set of Unicode characters.
All MySQL and MariaDB users who still use "utf8" should migrate to "utf8mb4" and never use the old charset again.
2.2 A Brief History of MySQL utf8
MySQL began supporting UTF‑8 in version 4.1 (2003), but the modern UTF‑8 standard (RFC 3629) was released later.
The older RFC 2279 allowed up to six bytes per character; MySQL’s early implementation followed this, then limited it to three bytes in September 2002.
The reason for this change is unclear; source‑control history was lost during migration from BitKeeper to Git.
MySQL developers originally aimed to let users define fixed‑length CHAR columns for performance, assuming each character would occupy a constant number of bytes. They experimented with six‑byte CHAR definitions, but the approach was never officially released.
Consequently, documentation incorrectly described MySQL’s utf8 capabilities, leading many developers to believe it was full UTF‑8.
When users defined CHAR columns with the "utf8" charset, they either wasted space or encountered errors storing four‑byte characters like emojis.
Because fixing the charset would have required all users to rebuild their databases, MySQL instead introduced the proper utf8mb4 charset in 2010.
3. Conclusion
Most online articles still treat MySQL’s "utf8" as real UTF‑8, which is misleading. When setting up MySQL or MariaDB databases, always configure the database, tables, and columns to use utf8mb4 . Doing so will prevent the dreaded emoji insertion errors and demonstrate solid technical competence.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
