Why MySQL’s “utf8” Isn’t Real UTF‑8 and How utf8mb4 Fixes Emoji Insert Errors
This article explains why MySQL’s legacy utf8 charset supports only characters of up to three bytes, which makes emoji inserts fail. It shows how switching the database, tables, and columns to utf8mb4 resolves the issue, and it covers the history behind MySQL’s limited utf8 implementation.
Error Recap
Inserting an emoji with the following INSERT statement failed with the error Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME', because the column used the legacy utf8 charset. The bytes in the message are the four‑byte UTF‑8 encoding of 😓.
```sql
INSERT INTO `csjdemo`.`student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`)
VALUES ('20','陈哈哈😓','男','20','181班','9年级','看片儿');
```
After switching the server, database, table, and column character sets (and their collations) to utf8mb4, the insert succeeds:
```sql
INSERT INTO `student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`)
VALUES (NULL,'陈哈哈😓😓','男','20','181班','9年级','看片儿');
```
MySQL’s utf8 Quirks
MySQL’s utf8 charset is not true UTF‑8: it stores at most three bytes per character, whereas the UTF‑8 standard uses up to four. Common Chinese characters fit in three bytes, but emoji and other characters outside the Basic Multilingual Plane (code points above U+FFFF) need four, so inserting them into a utf8 column fails.
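To see these byte counts directly, a query like the following can be run in a MySQL client (5.5.3 or later). It is a minimal sketch that assumes the client terminal sends UTF‑8 text; `LENGTH()` returns a string’s size in bytes and `CHAR_LENGTH()` its size in characters.

```sql
-- Assumption: the client terminal sends UTF-8. Tell the server to treat
-- this connection's text as utf8mb4 so the emoji arrives intact.
SET NAMES utf8mb4;

SELECT
  LENGTH('陈')      AS han_bytes,    -- 3: a common Chinese character
  CHAR_LENGTH('陈') AS han_chars,    -- 1
  LENGTH('😓')      AS emoji_bytes,  -- 4: too long for MySQL's utf8
  CHAR_LENGTH('😓') AS emoji_chars,  -- 1
  HEX('😓')         AS emoji_hex;    -- F09F9893
```

The hex value is the same byte sequence, F0 9F 98 93, that appears in the Incorrect string value error.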
Consequently, any four‑byte content must be stored using utf8mb4. The following images compare byte usage before and after switching to utf8mb4:
MySQL introduced the utf8mb4 charset in 2010 to work around this limitation, but many online guides and tutorials still recommend the older utf8 charset.
1. utf8mb4 Is the Real UTF‑8
MySQL’s utf8mb4 fully implements the UTF‑8 standard and can store every Unicode code point, including emoji. In MySQL 8.0, utf8mb4 is the default charset and utf8 is an alias for the deprecated utf8mb3. All MySQL and MariaDB users should migrate to utf8mb4 and avoid the legacy utf8 charset.
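For an existing schema, the migration can look roughly like the sketch below, written for the `csjdemo`.`student` table from the error example. The collation `utf8mb4_unicode_ci` is an assumption (it is available since MySQL 5.5.3, while MySQL 8.0 defaults to `utf8mb4_0900_ai_ci`), so choose the one that matches your sorting needs, and back up the data first.

```sql
-- 1. Default charset/collation for tables created later in this database.
ALTER DATABASE `csjdemo`
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- 2. Convert the existing table and every character column in it.
--    This can rebuild the whole table, so expect it to take time on large tables.
ALTER TABLE `csjdemo`.`student`
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- 3. Make the client connection use utf8mb4 too; otherwise the emoji
--    can still be rejected or mangled before it reaches the column.
SET NAMES utf8mb4;
```

Because utf8mb4 reserves up to four bytes per character, indexed `VARCHAR` columns grow. On MySQL 5.6 and earlier, InnoDB’s default 767‑byte index key limit means an indexed `VARCHAR(255)` (up to 1,020 bytes in utf8mb4) can make the conversion fail with a "Specified key was too long" error, in which case the column or the index prefix has to be shortened.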
2. Brief History of MySQL utf8
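MySQL’s UTF‑8 support, released with version 4.1 in 2003, was originally written against the older RFC 2279 definition of UTF‑8, which allowed sequences of up to six bytes per character. In September 2002, while that support was still in development, MySQL developers limited "utf8" to sequences of at most three bytes for performance reasons, effectively creating a proprietary charset. RFC 3629, published in November 2003, later restricted standard UTF‑8 to four bytes.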
Because the change was not well documented, many developers continued to use the misleading "utf8" label, unaware that it could not store four‑byte characters. This caused widespread confusion and data loss when inserting emojis.
Only in 2010, with MySQL 5.5.3, did MySQL release the proper utf8mb4 charset, yet the old "utf8" name lived on in many guides.
Conclusion
Most online articles still treat MySQL’s "utf8" as true UTF‑8, which leads to repeated errors when handling emoji or other four‑byte characters. To store them correctly and avoid silent data loss, always configure databases, tables, columns, and client connections to use utf8mb4 instead of the legacy utf8 charset.
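Before and after a migration, the current settings can be checked with queries like these. This is a sketch: the list of excluded system schemas is an assumption, and depending on the version MySQL reports the three‑byte charset as either `utf8` or `utf8mb3`, so both names are matched.

```sql
-- Charset settings for the server, the default database and this session.
-- For emoji to round-trip, character_set_client, character_set_connection
-- and character_set_results should report utf8mb4.
SHOW VARIABLES LIKE 'character_set%';

-- Character columns that still use the three-byte charset.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
       CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND TABLE_SCHEMA NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME;
```

Server‑wide defaults for new databases come from the `character_set_server` and `collation_server` system variables, which are usually set in the server’s option file so they survive restarts.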
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge (fundamentals, applications, and tools) plus Git, databases, Raspberry Pi, and more.
