
Understanding MySQL utf8 Limitations and Why You Should Use utf8mb4

This article explains why MySQL's traditional utf8 character set cannot store 4‑byte characters such as emojis, reproduces the resulting insert error, and shows how switching the database, server, and column character sets to utf8mb4 resolves it. It also gives a brief history of MySQL's charset implementation.

Top Architect

Error Review

Inserting a string that contains an emoji into a table whose columns use MySQL's utf8 character set fails with an error:

```sql
INSERT INTO `csjdemo`.`student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`) VALUES ('20','陈哈哈😓','男','20','181班','9年级','看片儿');
```

```
[Err] 1366 - Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME' at row 1
```

After switching the character set and collation of the database, the server, and the column to utf8mb4, the insert succeeds:

```sql
INSERT INTO `student` (`ID`,`NAME`,`SEX`,`AGE`,`CLASS`,`GRADE`,`HOBBY`) VALUES (null,'陈哈哈😓😓','男','20','181班','9年级','看片儿');
```
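The fix above comes down to a few standard MySQL statements. The sketch below is a hypothetical Python helper (`utf8mb4_migration_sql` and `quote_ident` are illustrative names, not from the original article) that only assembles those statements as strings for the `csjdemo` database and `student` table used above. It assumes the `utf8mb4_unicode_ci` collation and never connects to a server.

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def utf8mb4_migration_sql(database: str, tables: list[str],
                          collation: str = "utf8mb4_unicode_ci") -> list[str]:
    """Build statements that switch the session, a database and its tables to utf8mb4.

    Hypothetical helper: it only returns SQL text and does not talk to MySQL.
    """
    db = quote_ident(database)
    statements = [
        # Connection/session character set, so the client sends 4-byte sequences intact.
        f"SET NAMES utf8mb4 COLLATE {collation};",
        # Default character set for tables created later in this database.
        f"ALTER DATABASE {db} CHARACTER SET utf8mb4 COLLATE {collation};",
    ]
    for table in tables:
        # CONVERT TO changes the existing character columns, not just the table default.
        statements.append(
            f"ALTER TABLE {db}.{quote_ident(table)} "
            f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


for stmt in utf8mb4_migration_sql("csjdemo", ["student"]):
    print(stmt)
```

The server-wide default is normally set in the server configuration instead, for example `character-set-server = utf8mb4` in the `[mysqld]` section, so that new databases start out as utf8mb4.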

MySQL utf8 Trivia

MySQL's utf8 is not true UTF‑8. It is an alias for utf8mb3, which stores at most three bytes per character, so any character that needs four bytes in UTF‑8, such as an emoji, cannot be stored.
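The three-byte limit maps directly onto Unicode code-point ranges. UTF‑8 as standardized in RFC 3629 uses one to four bytes, and only code points from U+10000 upward need the fourth byte. The sketch below computes the sequence length from the code point and checks whether a string would fit MySQL's three-byte utf8. `utf8_sequence_length` and `fits_mysql_utf8` are illustrative names, not MySQL APIs.

```python
def utf8_sequence_length(code_point: int) -> int:
    """Number of bytes UTF-8 (RFC 3629) uses to encode one code point."""
    if code_point < 0x80:
        return 1  # ASCII
    if code_point < 0x800:
        return 2  # e.g. accented Latin, Greek, Cyrillic
    if code_point < 0x10000:
        return 3  # rest of the Basic Multilingual Plane, incl. common CJK
    if code_point <= 0x10FFFF:
        return 4  # supplementary planes: emojis, rare CJK, etc.
    raise ValueError(f"not a Unicode code point: {code_point:#x}")


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes, i.e. utf8 (utf8mb3) can hold it."""
    return all(utf8_sequence_length(ord(ch)) <= 3 for ch in text)


# The hand-written ranges agree with Python's own UTF-8 encoder.
for sample in ["A", "é", "陈", "😓"]:
    assert utf8_sequence_length(ord(sample)) == len(sample.encode("utf-8"))

print(fits_mysql_utf8("陈哈哈"))    # True
print(fits_mysql_utf8("陈哈哈😓"))  # False
```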

In true UTF‑8, most Chinese characters occupy three bytes, while emojis require four. That is why the Chinese text in the example is accepted by a utf8 column, but the emoji in the same value triggers the error.
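The byte counts are easy to confirm in Python, which encodes strings as real UTF‑8. This sketch (the helper name `describe_utf8` is illustrative) prints each character of the failing `NAME` value with its code point and encoded bytes. The emoji's four bytes match the `\xF0\x9F\x98\x93` sequence reported in the MySQL error.

```python
def describe_utf8(text: str) -> list[tuple[str, str, bytes]]:
    """Return (character, code point, UTF-8 bytes) for each character in text."""
    return [(ch, f"U+{ord(ch):04X}", ch.encode("utf-8")) for ch in text]


for ch, code_point, encoded in describe_utf8("陈哈哈😓"):
    # The Chinese characters take 3 bytes each; the emoji takes 4.
    print(ch, code_point, len(encoded), encoded.hex(" ").upper())

# The bytes MySQL rejected decode back to the emoji.
print(bytes.fromhex("F09F9893").decode("utf-8") == "😓")  # True
```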

MySQL introduced utf8mb4 in 2010 (version 5.5.3) to provide full UTF‑8 support. Rather than fixing utf8 itself, it added a new character set that works around the limitation.

1. utf8mb4 Is the Real UTF‑8

All MySQL and MariaDB users should migrate from utf8 to utf8mb4 to avoid data loss and ensure proper handling of 4‑byte characters.

2. Brief History of utf8 in MySQL

MySQL added UTF‑8 support in version 4.1 (2003), based on the older RFC 2279, which allowed up to six bytes per character. The implementation was later restricted to sequences of at most three bytes, so MySQL's utf8 cannot represent any character above U+FFFF.

The developers most likely chose this restriction for storage and performance reasons, since fixed‑length CHAR columns must reserve space for the widest possible character. The result was an incomplete UTF‑8 implementation that cannot store emojis.
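The storage argument is simple arithmetic if one assumes a CHAR(N) column must be able to hold N characters of the character set's maximum width. The sketch below (the function name is illustrative, and it ignores row-format details such as InnoDB trimming trailing spaces) compares that worst case for the six-byte RFC 2279 limit, MySQL's three-byte utf8, and utf8mb4.

```python
def worst_case_char_bytes(length: int, max_bytes_per_char: int) -> int:
    """Upper bound on the bytes a CHAR(length) value can occupy."""
    return length * max_bytes_per_char


for label, width in [("RFC 2279 UTF-8", 6), ("MySQL utf8 (utf8mb3)", 3), ("utf8mb4", 4)]:
    print(f"CHAR(10) in {label}: up to {worst_case_char_bytes(10, width)} bytes")
```

Capping characters at three bytes halves the worst case compared with six, which fits the performance motive described above; utf8mb4 costs one more byte per character than utf8mb3 in that worst case.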

Conclusion

Many online articles still treat MySQL's utf8 as true UTF‑8. To store emojis and other 4‑byte characters correctly, configure databases, tables, columns, and client connections to use utf8mb4.
