
Understanding MySQL utf8 vs utf8mb4: Handling Emoji and Unicode Characters

This article explains why inserting emoji into MySQL fails when a column uses the legacy utf8 charset, demonstrates how switching to utf8mb4 resolves the issue, and provides a historical overview of MySQL's limited utf8 implementation and the proper use of true UTF‑8 encoding.

Laravel Tech Community

When trying to store emoji characters such as 😲😳 in a MySQL table, the INSERT statement fails with an "Incorrect string value" error because the database is using the legacy utf8 charset.

INSERT INTO `csjdemo`.`student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`) VALUES ('20','哈哈😓','男','20','181班','9年级','看片儿');
[Err] 1366 - Incorrect string value: '\x9F\x98\x93' for column 'NAME' at row 1

Changing the database, table, and column character set to utf8mb4, and making sure the client connection uses utf8mb4 as well, allows an equivalent INSERT to succeed:

INSERT INTO `student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`) VALUES (null,'哈哈😓😓','男','20','181班','9年级','看片儿');
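
A minimal sketch of that conversion, assuming the `csjdemo` database and `student` table from the example above. The `utf8mb4_unicode_ci` collation is an illustrative choice, not something the original specifies. Converting a large table rewrites it, so it is worth trying on a copy first.

```sql
-- Use utf8mb4 for the current client connection, so 4-byte characters
-- are not mangled before they reach the server.
SET NAMES utf8mb4;

-- Change the default for tables created in this database from now on.
-- Existing tables are not touched by this statement.
ALTER DATABASE `csjdemo`
  CHARACTER SET = utf8mb4
  COLLATE = utf8mb4_unicode_ci;

-- Convert the existing table: its default charset and every
-- character column (CHAR, VARCHAR, TEXT) it contains.
ALTER TABLE `csjdemo`.`student`
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
```

Two side effects are worth checking afterwards. On older InnoDB row formats, an indexed VARCHAR(255) column can exceed the 767-byte index key limit once each character may take four bytes. TEXT columns may also be widened to the next larger type so they can still hold the same number of characters.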

MySQL's "utf8" is not a true UTF‑8 implementation; it stores at most three bytes per character, which covers only the Basic Multilingual Plane (code points up to U+FFFF) and excludes four‑byte characters such as most emoji. The charset that implements full UTF‑8, with up to four bytes per character, is utf8mb4.
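
The byte counts can be checked from any MySQL client. The query below is a small sketch that assumes the client sends UTF‑8 and that the connection charset is utf8mb4; the column aliases are made up for illustration.

```sql
-- Assumes the client sends UTF-8 and the connection charset is utf8mb4.
SET NAMES utf8mb4;

SELECT
  HEX('哈')         AS cjk_bytes,    -- E59388   (3 bytes, fits in utf8)
  LENGTH('哈')      AS cjk_len,      -- 3
  HEX('😓')         AS emoji_bytes,  -- F09F9893 (4 bytes, needs utf8mb4)
  LENGTH('😓')      AS emoji_len,    -- 4
  CHAR_LENGTH('😓') AS emoji_chars;  -- 1
```

The bytes `9F 98 93` quoted in the 1366 error are the tail of the emoji's four‑byte sequence `F0 9F 98 93`, which a three‑byte charset has no way to store.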

True UTF‑8 can represent any Unicode code point using one to four bytes, making it more space‑efficient than UTF‑32. MySQL introduced the utf8mb4 character set in 2010 to work around the historic limitation of its "utf8" charset.

Historically, MySQL's first UTF‑8 implementation followed an early specification (RFC 2279) that allowed up to six bytes per character. It was later restricted to three bytes, apparently for performance reasons, which left it unable to store characters that require four bytes. The reason for the change was never fully documented, and many developers continued to use the misleading "utf8" label.

For any modern MySQL or MariaDB deployment, it is strongly recommended to configure the server, databases, tables, and columns to use utf8mb4 and avoid the legacy "utf8" charset.
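
To see how far an existing deployment is from that recommendation, a rough audit along these lines may help. It is a sketch that only reads metadata; the list of system schemas to skip is an assumption and may need adjusting. Newer MySQL and MariaDB versions report the legacy charset as `utf8mb3` rather than `utf8`, so the filter checks both names.

```sql
-- Server, database, and connection defaults currently in effect.
SHOW VARIABLES LIKE 'character_set%';

-- Columns that still use the legacy 3-byte charset.
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       COLUMN_NAME,
       CHARACTER_SET_NAME,
       COLLATION_NAME
FROM information_schema.COLUMNS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND TABLE_SCHEMA NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
```

An empty result from the second query suggests no user columns remain on the three‑byte charset. The variables from the first statement show whether new connections and new databases will default to utf8mb4.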

Understanding how characters are encoded—mapping bytes to Unicode code points and then to visual glyphs—helps explain why UTF‑8 is the dominant encoding for web applications and why proper charset selection is crucial for data integrity.

