Why MySQL’s utf8 Fails with Emojis and How utf8mb4 Solves It
This article explains the difference between MySQL’s utf8 and utf8mb4 character sets, why utf8 cannot store emojis or complex Chinese characters, and provides step‑by‑step examples showing how to configure tables and columns with utf8mb4 to avoid encoding errors.
What Is a Character Set?
Characters include letters, symbols, emojis, numbers, etc. A character set is a collection of characters that can be represented, and each set defines a range of characters it can encode.
Computers store data as binary; the process of mapping characters to binary is called character encoding, and the reverse is character decoding.
Common Character Sets
ASCII – 128 characters, mainly English.
GB2312 – ~6,700 Chinese characters, does not cover rare or traditional characters.
GBK – Extension of GB2312, >20,000 Chinese characters.
GB18030 – Fully compatible with GB2312 and GBK, includes minority scripts and over 70,000 Chinese characters.
BIG5 – Focused on Traditional Chinese, ~13,000 characters.
Unicode & UTF‑8 – Unicode aims to cover virtually all known characters; UTF‑8 is a variable‑length encoding of Unicode that uses 1 to 4 bytes per character.
Using the wrong encoding to view a file causes garbled text; for example, interpreting GB2312‑encoded data with UTF‑8 yields nonsense characters.
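The garbling described above can be reproduced in a few lines. This is only a sketch: the sample text is an assumption, and Python's built-in gb2312 and utf-8 codecs stand in for whatever editor or tool actually reads the file.

```python
original = "中文"  # illustrative sample text

raw = original.encode("gb2312")  # bytes as a GB2312 writer would produce them

# Reading with the wrong codec: invalid sequences become U+FFFD placeholders.
garbled = raw.decode("utf-8", errors="replace")

# Reading with the right codec restores the text.
restored = raw.decode("gb2312")

print(repr(garbled))
print(restored)

# A strict decode rejects the bytes instead of producing nonsense.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)
```

The bytes themselves never change; only the codec used to interpret them decides whether the reader sees the original text or replacement characters.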
MySQL Character Sets
MySQL supports many encodings such as UTF‑8, GB2312, GBK, BIG5. You can list them with the SHOW CHARSET command.
It is recommended to use UTF‑8 as the default, but MySQL provides two UTF‑8 implementations:
utf8 – Supports 1‑3 bytes per character. Chinese characters use 3 bytes, while emojis and many complex characters require 4 bytes and therefore cannot be stored.
utf8mb4 – Full UTF‑8 implementation supporting up to 4 bytes per character, capable of storing emojis and all Unicode characters.
If you need to store emojis or complex Chinese characters, set the database/table/column charset to utf8mb4 instead of utf8 to avoid errors.
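The 3‑byte limit can be checked outside the database before data is written. The sketch below is an assumption-laden illustration: fits_mysql_utf8 is a hypothetical helper, not a MySQL or driver API, and it only tests the byte length of each character in UTF‑8.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character needs at most 3 bytes,
    i.e. the text could be stored in a MySQL utf8 (3-byte) column."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


# \U00020BB7 is a rare CJK character outside the 3-byte range.
samples = ["a", "é", "哥", "\U00020BB7", "😘"]
for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_byte_length(ch)} byte(s)")

print(fits_mysql_utf8("guide哥"))    # True
print(fits_mysql_utf8("guide哥😘"))  # False
```

A check like this could be used to flag problem rows before migrating, although switching the column to utf8mb4 remains the actual fix.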
Demonstration (MySQL 5.7+)
Creating a table with the utf8 charset, so that the error can be reproduced:
-- Columns deliberately use the 3-byte utf8 charset.
CREATE TABLE `user` (
  `id` varchar(66) CHARACTER SET utf8 NOT NULL,
  `name` varchar(33) CHARACTER SET utf8 NOT NULL,
  `phone` varchar(33) CHARACTER SET utf8 DEFAULT NULL,
  `password` varchar(100) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Inserting a row that contains emojis while the table uses utf8 results in an error:
INSERT INTO `user` (`id`,`name`,`phone`,`password`) VALUES
('A00003','guide哥😘😘😘','181631312312','123456');
MySQL reports:
Incorrect string value: '\xF0\x9F\x98\x98\xF0\x9F...' for column 'name' at row 1
Changing the charset to utf8mb4, for example with ALTER TABLE `user` CONVERT TO CHARACTER SET utf8mb4, resolves the issue.
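The bytes quoted in the error message can be decoded to confirm which character MySQL rejected. This is a small sketch that uses only the first four bytes shown in the message; the truncated remainder is left alone.

```python
# First four bytes quoted in the MySQL error message.
rejected = b"\xF0\x9F\x98\x98"

ch = rejected.decode("utf-8")
print(ch)                   # the emoji from the INSERT statement
print(f"U+{ord(ch):04X}")   # its code point, beyond U+FFFF
print(len(rejected))        # 4 bytes, one more than utf8 allows
```

A leading byte of 0xF0 always starts a 4‑byte UTF‑8 sequence, so seeing \xF0 in an "Incorrect string value" message is a strong hint that the column uses utf8 rather than utf8mb4.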
