Understanding utf8mb4 and Its Advantages in MySQL 8.0
This article explains the differences between utf8, utf8mb3 and utf8mb4 character sets in MySQL, demonstrates how utf8mb4 enables full Unicode support including emojis, and provides step‑by‑step SQL examples for creating tables, inserting data, and querying results with the proper character set.
1 Understanding utf8mb4
In modern web applications, supporting multiple languages and character sets is increasingly important. MySQL, one of the most popular relational database management systems, provides the utf8mb4 character set to store the full Unicode range, including emojis and other supplementary characters. utf8mb4 has been available since MySQL 5.5.3 and became the default character set in MySQL 8.0.
Before utf8mb4, MySQL's utf8 implementation used up to three bytes per character and covered only the Basic Multilingual Plane (BMP), that is, code points U+0000 through U+FFFF. utf8mb4 extends this to four bytes per character, allowing storage of the entire Unicode range.
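The byte counts can be checked directly in SQL. The following is a minimal sketch, assuming a MySQL version that supports utf8mb4. It builds each character from its UTF-8 bytes with a hex literal, so the result does not depend on the client's connection character set. The column aliases are illustrative only.

```sql
-- LENGTH() counts bytes, CHAR_LENGTH() counts characters.
-- Each value is built from raw UTF-8 bytes and tagged as utf8mb4 via an introducer.
SELECT
  LENGTH(_utf8mb4 X'41')            AS latin_a_bytes,        -- 'A', U+0041
  LENGTH(_utf8mb4 X'E4B8AD')        AS cjk_bytes,            -- '中', U+4E2D (inside the BMP)
  LENGTH(_utf8mb4 X'F09D8C86')      AS supplementary_bytes,  -- '𝌆', U+1D306 (outside the BMP)
  CHAR_LENGTH(_utf8mb4 X'F09D8C86') AS supplementary_chars;
```

The BMP characters need at most three bytes, while the supplementary character needs four bytes for a single character, which is exactly what a three-byte character set cannot hold.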
| Feature | utf8 | utf8mb3 | utf8mb4 |
| --- | --- | --- | --- |
| Maximum bytes per character | 3 | 3 | 4 |
| Supported characters | BMP | BMP | BMP + supplementary planes |
| MySQL default | No | No | Yes (from 8.0) |
| Status | Deprecated | Deprecated | Not deprecated |
Note: Historically, MySQL has used the name utf8 as an alias for utf8mb3. Starting with MySQL 8.0.28, SHOW statements and INFORMATION_SCHEMA tables report this character set as utf8mb3 rather than utf8, and the plain utf8 name is expected to become an alias for utf8mb4 in a future release. To avoid ambiguity, explicitly specify utf8mb4 when defining character sets.
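As a sketch of what specifying utf8mb4 explicitly can look like, the statements below set the character set at the database and table level and then list any columns that still use the three-byte character set. The names demo_app and messages are hypothetical and exist only for illustration.

```sql
-- Hypothetical schema and table names, for illustration only.
CREATE DATABASE IF NOT EXISTS demo_app
  CHARACTER SET utf8mb4;

CREATE TABLE demo_app.messages (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  body VARCHAR(255)              -- inherits utf8mb4 from the table default
) DEFAULT CHARACTER SET utf8mb4;

-- List columns that still use the three-byte character set.
-- Depending on the server version it is reported as 'utf8' or 'utf8mb3'.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
```

No collation is given here, so the server picks the default collation for utf8mb4, which differs between MySQL 5.7 and 8.0.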
In summary, utf8 and utf8mb3 name the same character set, and what separates them from utf8mb4 is the maximum number of bytes per character. utf8 and utf8mb3 can store only BMP characters, while utf8mb4 can also store supplementary characters such as emojis, some mathematical symbols, and other characters outside the BMP.
In MySQL 5.7 and earlier, the default server character set was latin1, so utf8 had to be requested explicitly; since MySQL 8.0, the default is utf8mb4. Both utf8 and utf8mb3 are deprecated and will eventually be removed, so new applications should use utf8mb4.
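Because the defaults differ between versions, it can help to check what a given server and session actually use before creating tables. A minimal sketch using standard system variables:

```sql
-- Server-wide and current-database defaults.
SELECT @@character_set_server, @@collation_server, @@character_set_database;

-- All character-set-related variables, including the client and connection settings.
SHOW VARIABLES LIKE 'character_set%';

-- Ask the server to treat this session's client, connection, and results as utf8mb4.
SET NAMES utf8mb4;
```

The session settings matter as well as the column definition: a four-byte character has to pass through the connection intact before it can be stored.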
2 Comparison Examples
MySQL 5.7
mysql> select version();
+-------------------+
| version() |
+-------------------+
| 5.7.42-46 |
+-------------------+
mysql> CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) CHARACTER SET utf8,
email VARCHAR(255) CHARACTER SET utf8
);
Query OK, 0 rows affected (0.03 sec)
mysql> SHOW CREATE TABLE users;
*************************** 1. row ***************************
Table: users
Create Table: CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`email` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
Inserting three rows, the third containing a supplementary character (𝌆, U+1D306), fails because the utf8 character set cannot store characters outside the BMP:
mysql> INSERT INTO users (name, email) VALUES
('Arun Jith', '[email protected]'),
('Jane Doe', '[email protected]'),
('𝌆', '[email protected]');
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9D\x8C\x86' for column 'name' at row 3
When the same statements are run on MySQL 8.0 with the columns explicitly declared as utf8mb3, the error persists even though the table default is utf8mb4:
mysql> SELECT version();
+--------------------------+
| version() |
+--------------------------+
| 8.0.33-0ubuntu0.22.04.2 |
+--------------------------+
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,
email VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO users (name, email) VALUES
('Arun Jith', '[email protected]'),
('Jane Doe', '[email protected]'),
('𝌆', '[email protected]');
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9D\x8C\x86' for column 'name' at row 3
After the table is dropped and recreated with utf8mb4 for both columns, the insertion succeeds:
mysql> CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) CHARACTER SET utf8mb4,
email VARCHAR(255) CHARACTER SET utf8mb4
);
Query OK, 0 rows affected (0.03 sec)
INSERT INTO users (name, email) VALUES
('Arun Jith', '[email protected]'),
('Jane Doe', '[email protected]'),
('𝌆', '[email protected]');
Query OK, 3 rows affected (0.01 sec)
SELECT * FROM users;
+----+-----------+----------------------+
| id | name | email |
+----+-----------+----------------------+
| 1 | Arun Jith | [email protected] |
| 2 | Jane Doe | [email protected] |
| 3 | 𝌆 | [email protected] |
+----+-----------+----------------------+
3 Summary
As demonstrated, the utf8mb4 character set can store the full Unicode range, including emojis and other supplementary characters, making it the preferred choice for modern applications that handle multilingual or complex text data. In contrast, utf8 (or utf8mb3) is limited to BMP characters and cannot store emojis.
It is generally recommended that all new applications use utf8mb4 to ensure correct storage and processing of any Unicode characters.
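For a table that already exists with utf8 or utf8mb3 columns, such as the users table from the examples, one possible approach is an in-place conversion. This is a sketch only: it assumes the table is small enough to rebuild, and that any indexes on its character columns still fit within the storage engine's key-length limit once each character may take four bytes. Trying it on a copy first is prudent.

```sql
-- Convert every character column of the example table, and its table default, to utf8mb4.
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4;

-- Inspect what is stored: bytes versus characters for each name.
-- A supplementary character such as U+1D306 occupies four bytes but counts as one character.
SELECT name,
       HEX(name)         AS utf8_bytes,
       LENGTH(name)      AS byte_count,
       CHAR_LENGTH(name) AS char_count
FROM users;
```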
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.