Why MySQL’s “utf8” Isn’t Real UTF‑8 and How utf8mb4 Solves It
This article explains why MySQL’s built‑in utf8 charset stores at most three bytes per character, why inserting emojis therefore fails, and how switching to utf8mb4 fixes the inserts and provides full Unicode support.
Error Review
Inserting an emoji directly into a MySQL table using the default utf8 charset caused the error:
<code>INSERT INTO `csjdemo`.`student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`)
VALUES ('20', '陈哈哈😓', '男', '20', '181班', '9年级', '看电影');</code>
Result:
<code>[Err] 1366 - Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME' at row 1</code>
After changing the server, database, and column character sets to utf8mb4, the insert succeeds:
<code>INSERT INTO `student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`)
VALUES (null, '陈哈哈😓😓', '男', '20', '181班', '9年级', '看电影');</code>
Fun Facts About MySQL utf8
MySQL’s utf8 is not true UTF‑8; it only supports up to three bytes per character, while real UTF‑8 supports up to four bytes.
Common Chinese characters occupy three bytes and ASCII characters one byte, but emojis require four bytes, so inserting them fails unless utf8mb4 is used.
The comparison image shows how character count and byte size change after converting to utf8mb4.
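The byte sizes described above can be checked directly in MySQL. The following is a minimal sketch, assuming the client sends UTF‑8 text and the connection is switched to utf8mb4 first; the column aliases are made up for illustration. LENGTH() counts bytes, while CHAR_LENGTH() counts characters.

<code>-- Assumption: the client/terminal encoding is UTF-8.
SET NAMES utf8mb4;

SELECT
  CHAR_LENGTH('a')  AS ascii_chars, LENGTH('a')  AS ascii_bytes,  -- 1 character, 1 byte
  CHAR_LENGTH('陈') AS cjk_chars,   LENGTH('陈') AS cjk_bytes,    -- 1 character, 3 bytes
  CHAR_LENGTH('😓') AS emoji_chars, LENGTH('😓') AS emoji_bytes,  -- 1 character, 4 bytes
  HEX('😓') AS emoji_hex;  -- F09F9893, the same bytes reported in error 1366</code>

The four bytes F0 9F 98 93 are exactly the sequence that the error message rejects for a utf8 column.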
MySQL introduced utf8mb4 in 2010 to work around this limitation but never officially announced it, so many developers still use utf8 as if it were full UTF‑8.
utf8mb4 Is the Real UTF‑8
Only utf8mb4 implements the full Unicode range. The older utf8 charset is a limited, MySQL‑specific encoding.
All MySQL and MariaDB users should migrate to utf8mb4 and stop using utf8.
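Before migrating, it helps to know which columns still use the old charset. The query below is a sketch that lists them from information_schema; the schema name csjdemo is taken from the example earlier in this article and should be replaced with your own. Depending on the server version, the old charset may be reported as utf8 or utf8mb3, so the filter simply excludes utf8mb4.

<code>SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'csjdemo'
  AND CHARACTER_SET_NAME IS NOT NULL    -- non-text columns have no charset
  AND CHARACTER_SET_NAME != 'utf8mb4'   -- anything not yet converted
ORDER BY TABLE_NAME, ORDINAL_POSITION;</code>

An empty result means every text column in that schema already uses utf8mb4.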
A Brief History of utf8 in MySQL
MySQL added UTF‑8 support in version 4.1 (2003). At that time RFC 3629, the current UTF‑8 standard, which limits a character to at most four bytes, had not yet been published.
The earlier RFC 2279 allowed up to six bytes per character. MySQL initially followed that version, then restricted utf8 to sequences of at most three bytes in a 2002 update.
The change was likely motivated by performance: with a fixed maximum of three bytes per character, CHAR columns could be stored at a fixed length. It also made the charset incompatible with true UTF‑8.
Because the broken charset had already been released, MySQL could not fix it without forcing users to rebuild their databases, so the limitation remained until utf8mb4 was introduced in 2010.
Conclusion
Many online articles still treat MySQL’s utf8 as real UTF‑8, which leads to widespread errors when storing emojis or other four‑byte characters. When setting up MySQL or MariaDB databases, always configure the server, database, tables, and columns to use utf8mb4 to ensure full Unicode compatibility.
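As a rough sketch of that configuration, the statements below convert the example schema and table from this article. The collation utf8mb4_unicode_ci is an assumption chosen for illustration; pick whichever utf8mb4 collation suits your data, and take a backup before converting real tables.

<code>-- Default charset for tables created in this database from now on
ALTER DATABASE `csjdemo` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert an existing table, including its text columns
ALTER TABLE `csjdemo`.`student` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Make the current connection send and receive utf8mb4
SET NAMES utf8mb4;

-- Inspect the resulting server, database, and connection settings
SHOW VARIABLES LIKE 'character_set%';</code>

The server‑wide default is controlled by the character_set_server system variable, which is usually set in the server configuration file. On older InnoDB row formats, an indexed VARCHAR(255) column can exceed the 767‑byte index key limit once each character may take four bytes, so indexes on long text columns are worth checking after the conversion.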