Why MySQL’s utf8 Isn’t Real UTF‑8 and How utf8mb4 Solves Emoji Errors
The article explains why MySQL’s built‑in utf8 charset cannot store four‑byte characters such as emojis, demonstrates the resulting insert error, shows how switching the database, table and connection to utf8mb4 fixes the issue, and recounts the historical reasons behind MySQL’s limited utf8 implementation.
1. Error Review
When inserting an emoji character into a MySQL table, the following error occurs:
INSERT INTO `csjdemo`.`student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`)
VALUES ('20','陈哈哈😓','男','20','181班','9年级','看片儿');
[Err] 1366 - Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME' at row 1
Changing the character set of the database, the table and its columns, and the connection to utf8mb4 resolves the problem, and the same kind of insert then succeeds:
INSERT INTO `student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`)
VALUES (null,'陈哈哈😓😓','男','20','181班','9年级','看片儿');
2. The utf8 Story in MySQL
MySQL’s "utf8" charset is not true UTF‑8: it stores at most three bytes per character. Standard UTF‑8 uses up to four bytes per character, and four‑byte sequences are required for emojis and other characters outside the Basic Multilingual Plane.
Consequences:
Chinese characters occupy 3 bytes, Latin letters and digits 1 byte.
Emoji symbols occupy 4 bytes, causing insert failures under utf8.
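The byte counts above can be checked directly in a MySQL session. This is a minimal sketch that assumes the client terminal sends UTF‑8 and that the connection charset is set to utf8mb4 first; the column aliases are illustrative only.

```sql
-- Tell the server that the client sends and expects utf8mb4.
SET NAMES utf8mb4;

-- LENGTH() counts bytes, CHAR_LENGTH() counts characters.
SELECT
  LENGTH('A')       AS latin_bytes,   -- 1
  LENGTH('陈')      AS cjk_bytes,     -- 3
  LENGTH('😓')      AS emoji_bytes,   -- 4
  CHAR_LENGTH('😓') AS emoji_chars,   -- 1
  HEX('😓')         AS emoji_hex;     -- F09F9893
```

The hex value F09F9893 is the same byte sequence, '\xF0\x9F\x98\x93', that appears in the 1366 error message.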
MySQL introduced the utf8mb4 charset in 2010 to work around this limitation. The older utf8 implementation dates from MySQL 4.1 (2003), whose development started from the now‑obsolete RFC 2279. That specification allowed up to six bytes per character, but MySQL’s implementation was then restricted to three bytes, apparently for performance reasons.
Historical notes:
MySQL 4.1 was developed against the older RFC 2279, which permitted sequences of up to six bytes.
In September 2002 the source code was altered to enforce the 3‑byte limit.
The exact commit author is unknown; after migrating to Git many contributor names were lost.
Developers originally aimed to let users define CHAR columns with fixed byte lengths for speed, but this design broke true UTF‑8 support.
Because the limitation of the utf8 charset was never prominently advertised, many tutorials still recommend using "utf8", leading to widespread confusion.
3. Conclusion
All MySQL and MariaDB users should migrate to utf8mb4 and avoid the legacy "utf8" charset to ensure proper storage of emojis and other four‑byte Unicode characters.
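For an existing schema, the migration could look roughly like the sketch below. It reuses the csjdemo database and student table from the error in section 1. The utf8mb4_unicode_ci collation is an assumption, not something the article prescribes. Converting a table rewrites it, and on older InnoDB row formats a long indexed VARCHAR column may exceed the index key length limit after conversion, so it is worth trying this on a copy first.

```sql
-- 1. Find columns that still use the legacy three-byte charset.
--    Newer MySQL versions report it as utf8mb3 rather than utf8.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'csjdemo'
  AND CHARACTER_SET_NAME IN ('utf8', 'utf8mb3');

-- 2. Change the database default so that new tables inherit utf8mb4.
ALTER DATABASE csjdemo
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- 3. Convert the existing table and all of its character columns.
ALTER TABLE csjdemo.student
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- 4. Make the current connection use utf8mb4 as well.
SET NAMES utf8mb4;
```

Step 4 only affects the current session. Applications normally set the connection charset in their driver or connection string instead.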
Future database setups that use utf8mb4 will avoid the hidden bugs of MySQL’s historic utf8 implementation.
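A new setup can declare utf8mb4 from the start. The following is a small sketch with a hypothetical database name (demo_utf8mb4) and a cut‑down student table; the collation is again an assumption.

```sql
-- Hypothetical database that defaults to utf8mb4.
CREATE DATABASE IF NOT EXISTS demo_utf8mb4
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- Simplified table: only the columns needed for the demonstration.
CREATE TABLE demo_utf8mb4.student (
  ID   INT AUTO_INCREMENT PRIMARY KEY,
  NAME VARCHAR(64) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- The connection must also use utf8mb4, or the emoji is lost in transit.
SET NAMES utf8mb4;

INSERT INTO demo_utf8mb4.student (NAME) VALUES ('陈哈哈😓');

-- Three 3-byte Chinese characters plus one 4-byte emoji:
-- chars = 4, bytes = 13.
SELECT NAME,
       CHAR_LENGTH(NAME) AS chars,
       LENGTH(NAME)      AS bytes
FROM demo_utf8mb4.student;
```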
dbaplus Community
