Why MySQL’s utf8 Isn’t Real UTF‑8 and How utf8mb4 Fixes Emoji Insertion Errors

This article explains why MySQL's legacy utf8 character set cannot store four‑byte characters such as emoji, reproduces the resulting insertion error, and shows how switching to utf8mb4 fixes it. It also covers the history behind MySQL's limited utf8 implementation.

Open Source Linux

1. Error Review

Inserting a row whose NAME value contains an emoji (😓) failed with error 1366:

INSERT INTO `csjdemo`.`student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`) VALUES ('20','陈哈哈😓','男','20','181班','9年级','看片儿');
[Err] 1366 - Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME' at row 1
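Before changing anything, it can help to confirm which character set the failing column actually uses. The queries below are a sketch that assumes the `csjdemo` schema and `student` table from the example; depending on the server version, a column in the three‑byte charset is reported as `utf8` or `utf8mb3`.

```sql
-- Character set and collation of every column in the example table
-- (schema and table names taken from the failing INSERT above).
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'csjdemo'
  AND TABLE_NAME = 'student';

-- Character sets used by the server, the default database, and this connection.
SHOW VARIABLES LIKE 'character_set%';
```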

After changing the character set of the database, the table's columns, and the client connection to utf8mb4, a similar insert (here with two emoji) succeeds:

INSERT INTO `student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`) VALUES (null,'陈哈哈😓😓','男','20','181班','9年级','看片儿');
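The change described above can be sketched with the following statements. They assume the `csjdemo` database and `student` table from the example and use `utf8mb4_unicode_ci` as one reasonable collation (MySQL 8.0 defaults to `utf8mb4_0900_ai_ci`). Converting a table rewrites it, so on large tables this is best done in a maintenance window.

```sql
-- 1. Default character set for tables created later in this database
--    (existing tables are NOT changed by this statement).
ALTER DATABASE `csjdemo` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- 2. Convert the existing table and every character column in it.
ALTER TABLE `csjdemo`.`student`
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- 3. Make this session send and receive utf8mb4.
SET NAMES utf8mb4;
```

Step 3 matters as much as the first two: if the connection still uses utf8, four‑byte characters can be rejected or replaced before they reach the column. Also note that an indexed VARCHAR(255) column needs up to 1,020 bytes after conversion (255 × 4), which exceeds the 767‑byte index prefix limit of older InnoDB configurations.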

2. Fun Facts About utf8 in MySQL

MySQL's "utf8" is not true UTF‑8.

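In MySQL, the "utf8" character set supports at most three bytes per character, which covers only the Basic Multilingual Plane (U+0000 to U+FFFF). Real UTF‑8 uses up to four bytes so it can encode every code point up to U+10FFFF.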

ASCII characters take one byte and common Chinese characters take three, so both fit in utf8. Emoji such as 😓 (U+1F613) take four bytes, so inserting them fails unless the column uses utf8mb4.
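These byte counts can be checked directly from a MySQL client. In this sketch, `LENGTH()` returns a string's size in bytes and `CHAR_LENGTH()` its size in characters; the session is switched to utf8mb4 first so the emoji literal reaches the server intact.

```sql
SET NAMES utf8mb4;

SELECT LENGTH('A')       AS ascii_bytes,    -- 1
       LENGTH('陈')      AS chinese_bytes,  -- 3
       LENGTH('😓')      AS emoji_bytes,    -- 4
       CHAR_LENGTH('😓') AS emoji_chars,    -- 1
       HEX('😓')         AS emoji_hex;      -- F09F9893
```

The hex value is the same four‑byte sequence, `\xF0\x9F\x98\x93`, that appears in the error message in section 1.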

The following diagram shows the byte count before and after switching to utf8mb4, illustrating why four‑byte characters cannot be stored in the old utf8 charset.

MySQL added the utf8mb4 character set in version 5.5.3 (2010) to work around this limitation, yet much documentation still incorrectly recommends "utf8".

2.1 utf8mb4 Is the Real UTF‑8

MySQL's "utf8mb4" is the true UTF‑8 implementation. The older "utf8" (also named utf8mb3) is a non‑standard subset that cannot represent any character outside the Basic Multilingual Plane.

All MySQL and MariaDB users should migrate to utf8mb4 and stop using "utf8".
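To plan that migration, the query below is one way to list the columns that still use the three‑byte charset. It is a sketch: depending on the server version, such columns are reported as `utf8` or `utf8mb3`, so both names are matched, and MySQL's own system schemas are excluded.

```sql
-- Columns still stored in the 3-byte "utf8" charset, ordered by table.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema',
                           'performance_schema', 'sys')
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
```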

2.2 Brief History of utf8

MySQL added UTF‑8 support in version 4.1 (2003), before today's UTF‑8 standard, RFC 3629, was published. The earlier RFC 2279 allowed up to six bytes per character; RFC 3629 restricts UTF‑8 to at most four.

In September 2002, before that release shipped, MySQL's developers limited "utf8" to sequences of at most three bytes, effectively creating a non‑standard character set.

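The developers likely made this change to help fixed‑length CHAR columns, which reserve room for the widest possible character so that every row has the same byte count. Under RFC 2279 a CHAR(n) column would have needed 6n bytes; a three‑byte cap cut that to 3n. However, the decision broke true UTF‑8 support, especially for emoji and the rarer CJK characters that lie outside the Basic Multilingual Plane.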

Because fixing the charset would require users to rebuild their databases, MySQL kept the flawed "utf8" for years and only introduced the proper utf8mb4 charset in 2010.

3. Summary

Many online articles mistakenly treat MySQL's "utf8" as real UTF‑8, so developers who follow them run into insertion errors with four‑byte characters. When creating a new MySQL or MariaDB database, set the database, tables, columns, and client connection to utf8mb4 to get full Unicode support.

Doing so prevents the "Incorrect string value" errors caused by four‑byte characters and follows current best practice for database encoding.
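A new database can be set up this way from the start. The sketch below uses hypothetical names (`new_app`, `users`) chosen only for illustration, and again picks `utf8mb4_unicode_ci` as the collation.

```sql
-- Hypothetical database and table names, for illustration only.
CREATE DATABASE `new_app`
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

CREATE TABLE `new_app`.`users` (
  `id`   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `name` VARCHAR(100) NOT NULL  -- inherits utf8mb4 from the table default
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- The connection must use utf8mb4 as well, or four-byte characters can still fail.
SET NAMES utf8mb4;
INSERT INTO `new_app`.`users` (`name`) VALUES ('陈哈哈😓');
```

To make utf8mb4 the default for every new database on a server, the `character_set_server` and `collation_server` system variables can be set in the server's configuration file.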

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Open Source Linux, which focuses on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
