Understanding MySQL utf8 vs utf8mb4: Handling Emoji and Unicode Characters
This article explains why inserting emoji into MySQL fails with the default utf8 charset, demonstrates how switching to utf8mb4 resolves the issue, and provides a historical overview of MySQL's limited utf8 implementation and the proper use of true UTF‑8 encoding.
When trying to store emoji characters such as 😲😳 in a MySQL table, the INSERT statement fails with an "Incorrect string value" error because the database is using the legacy utf8 charset.
INSERT INTO `csjdemo`.`student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`) VALUES ('20','哈哈😓','男','20','181班','9年级','看片儿');
[Err] 1366 - Incorrect string value: '\x9F\x98\x93' for column 'NAME' at row 1
Changing the database, table, and column character set to utf8mb4 allows the same INSERT to succeed:
INSERT INTO `student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`) VALUES (null,'哈哈😓😓','男','20','181班','9年级','看片儿');
MySQL's "utf8" is not a true UTF‑8 implementation: it stores at most three bytes per character, which excludes four‑byte characters such as most emoji. The charset that implements full UTF‑8, with up to four bytes per character, is utf8mb4.
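The charset change described above could be applied roughly as follows. This is a minimal sketch, not a definitive migration: the database and table names are taken from the example statements, the utf8mb4_unicode_ci collation is an assumption (any utf8mb4 collation would do), and the table should be backed up before converting.

```sql
-- Sketch only: names come from the article's example; the collation is an assumption.

-- Change the default charset for tables created in this database from now on.
ALTER DATABASE `csjdemo` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert the existing table, including every character column in it.
ALTER TABLE `csjdemo`.`student` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- The client connection must use utf8mb4 as well, otherwise emoji can still
-- be rejected or mangled on the way to the server.
SET NAMES utf8mb4;
```

ALTER DATABASE only affects defaults for new tables, which is why the existing table is converted separately.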
True UTF‑8 can represent any Unicode code point using one to four bytes, which makes it more space‑efficient than UTF‑32 (a fixed four bytes per code point) for most text. MySQL introduced the utf8mb4 character set in 2010, in version 5.5.3, to work around the historic limitation of its "utf8" charset.
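The byte counts can be checked directly in a MySQL session. The query below is a small illustration, assuming the client sends UTF‑8 and the connection charset is utf8mb4. It compares a Chinese character from the example with the emoji that triggered the error.

```sql
-- Assumes the client terminal or driver sends UTF-8 text.
SET NAMES utf8mb4;

SELECT
  CHAR_LENGTH('哈') AS han_chars,   -- 1 character
  LENGTH('哈')      AS han_bytes,   -- 3 bytes: fits in legacy utf8
  HEX('哈')         AS han_hex,     -- E59388
  CHAR_LENGTH('😓') AS emoji_chars, -- 1 character
  LENGTH('😓')      AS emoji_bytes, -- 4 bytes: needs utf8mb4
  HEX('😓')         AS emoji_hex;   -- F09F9893
```

The trailing bytes 9F 98 93 of the emoji are the same ones quoted in the "Incorrect string value" error message.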
Historically, MySQL adopted an early UTF‑8 draft (RFC 2279) that allowed up to six bytes, then later restricted it to three bytes for performance reasons, inadvertently breaking support for characters that require four bytes. The change was never fully documented, leading many developers to continue using the misleading "utf8" label.
For any modern MySQL or MariaDB deployment, it is strongly recommended to configure the server, databases, tables, columns, and client connections to use utf8mb4, and to avoid the legacy "utf8" charset, which current MySQL releases treat as an alias for utf8mb3.
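One way to find what is still using the legacy charset is to inspect the server variables and information_schema. The queries below are a sketch. The list of system schemas to skip is an assumption and may need adjusting for a given installation.

```sql
-- Show the server and connection charset settings currently in effect.
SHOW VARIABLES LIKE 'character_set_%';

-- List columns that still use the 3-byte charset. Depending on the server
-- version it is reported as either 'utf8' or 'utf8mb3'.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
```

An empty result from the second query suggests every user column has already been moved off the 3‑byte charset.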
Understanding how characters are encoded (bytes map to Unicode code points, which in turn map to visual glyphs) helps explain why UTF‑8 is the dominant encoding for web applications and why choosing the right charset is crucial for data integrity.
Laravel Tech Community