MySQL Character Set Implementation: From System Tables to Source Code
This article explores MySQL character set implementation by analyzing system tables (CHARACTER_SETS and COLLATIONS) in information_schema and tracing internal source code structures like CHARSET_INFO, initialization logic, and client connection handling.
This installment of the 'X Detective Bureau' series builds on the earlier exploration of character set variables in CentOS and turns to MySQL character set implementation. The focus is on MySQL 5.7.36 internals, with the aim of giving beginners an accessible entry point into the character set source code.
First, the article examines two key system tables in information_schema: CHARACTER_SETS and COLLATIONS. These store metadata about supported character sets and their collations, respectively. Screenshots detail the schema and sample data for GB2312, prompting readers to consider why one character set may have multiple collation entries.
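The relationship between the two tables can be sketched in a few lines of Python. This is a minimal model, not MySQL code: the rows are a small hand-picked subset, only some of the columns are shown (CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME, MAXLEN for the first table and COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT for the second), and the helper names collations_by_charset and default_collation are hypothetical.

```python
from collections import defaultdict

# Illustrative rows shaped like information_schema.CHARACTER_SETS
# (CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME, MAXLEN). Subset only.
CHARACTER_SETS = [
    ("gb2312", "gb2312_chinese_ci", 2),
    ("latin1", "latin1_swedish_ci", 1),
]

# Illustrative rows shaped like information_schema.COLLATIONS
# (COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT). Subset only.
COLLATIONS = [
    ("gb2312_chinese_ci", "gb2312", 24, "Yes"),
    ("gb2312_bin", "gb2312", 86, ""),
    ("latin1_swedish_ci", "latin1", 8, "Yes"),
    ("latin1_german2_ci", "latin1", 31, ""),
]


def collations_by_charset(collations):
    """Group collation names under the character set they belong to."""
    grouped = defaultdict(list)
    for name, charset, _coll_id, _is_default in collations:
        grouped[charset].append(name)
    return dict(grouped)


def default_collation(charset, collations):
    """Return the collation flagged as default for a character set, if any."""
    for name, cs, _coll_id, is_default in collations:
        if cs == charset and is_default == "Yes":
            return name
    return None


if __name__ == "__main__":
    for charset, names in collations_by_charset(COLLATIONS).items():
        print(charset, names, "default:", default_collation(charset, COLLATIONS))
```

In this model a character set is listed once, while each of its collations gets its own row that points back to the character set by name.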
Next, the internal structure CHARSET_INFO is introduced, which holds character set metadata such as name and ID. The article traces initialization via the all_charsets array and default_charset_info, showing how compiled character sets (e.g., my_charset_latin1_german2_ci) are defined in source files like ctype-latin1.c. During initialization, functions like add_compiled_collation register each compiled collation in the global array.
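The registration flow described above can be modeled roughly as follows. This is a simplified Python sketch, not the actual C implementation: CharsetInfo keeps only a few of the fields of CHARSET_INFO, the flag values and the array size are assumptions based on my reading of the 5.7 headers, init_compiled_charsets is a hypothetical helper name, and the real get_charset does more work (for example, loading non-compiled character sets) than the lookup shown here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed flag values and array size; treat them as illustrative.
MY_CS_COMPILED = 1
MY_CS_PRIMARY = 32
MY_CS_AVAILABLE = 512
ALL_CHARSETS_SIZE = 2048


@dataclass
class CharsetInfo:
    """A cut-down stand-in for CHARSET_INFO: ID, state flags, and names."""
    number: int   # collation ID
    state: int    # bit flags such as MY_CS_COMPILED
    csname: str   # character set name
    name: str     # collation name


# Compiled-in definitions, analogous to the static structs in ctype-latin1.c.
my_charset_latin1 = CharsetInfo(8, MY_CS_COMPILED | MY_CS_PRIMARY,
                                "latin1", "latin1_swedish_ci")
my_charset_latin1_german2_ci = CharsetInfo(31, MY_CS_COMPILED,
                                           "latin1", "latin1_german2_ci")

# Global table indexed by collation ID, plus the default entry.
all_charsets: List[Optional[CharsetInfo]] = [None] * ALL_CHARSETS_SIZE
default_charset_info = my_charset_latin1


def add_compiled_collation(cs: CharsetInfo) -> None:
    """Store the collation at the slot matching its ID and mark it available."""
    all_charsets[cs.number] = cs
    cs.state |= MY_CS_AVAILABLE


def init_compiled_charsets() -> None:
    """Hypothetical helper: register every compiled-in collation."""
    for cs in (my_charset_latin1, my_charset_latin1_german2_ci):
        add_compiled_collation(cs)


def get_charset(number: int) -> Optional[CharsetInfo]:
    """Simplified lookup by collation ID; returns None when nothing is registered."""
    if number == default_charset_info.number:
        return default_charset_info
    if not 0 <= number < ALL_CHARSETS_SIZE:
        return None
    return all_charsets[number]


init_compiled_charsets()
```

Indexing the array by collation ID keeps the lookup a single array access, which matters because the ID is what travels over the wire when a client connects.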
Finally, the article covers client connection-time character set negotiation. It walks through the MySQL protocol handshake, specifically how the client’s charset code is parsed in parse_client_handshake_packet, and how the server uses this to set the connection’s character set via get_charset and related functions.
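To make the handshake step concrete, here is a hedged Python sketch of reading the charset code from the fixed-size prefix of a protocol 4.1 handshake response. It assumes the documented layout (4-byte capability flags, 4-byte max packet size, 1-byte character set, 23 reserved bytes, then the NUL-terminated user name); build_handshake_prefix, parse_charset_code, and the KNOWN_COLLATIONS table are hypothetical names for illustration and do not mirror the server's C code.

```python
import struct

CLIENT_PROTOCOL_41 = 0x200   # capability flag for the 4.1 protocol
FIXED_PREFIX_LEN = 32        # 4 + 4 + 1 + 23 bytes

# A few well-known collation IDs, used here only to label the result.
KNOWN_COLLATIONS = {
    8: "latin1_swedish_ci",
    33: "utf8_general_ci",
    45: "utf8mb4_general_ci",
}


def build_handshake_prefix(capabilities: int, max_packet_size: int,
                           charset_code: int) -> bytes:
    """Build the fixed prefix of a handshake response (for testing only)."""
    head = struct.pack("<IIB", capabilities, max_packet_size, charset_code)
    return head + bytes(23)  # reserved filler, all zero


def parse_charset_code(payload: bytes) -> int:
    """Extract the 1-byte charset code that follows the two 4-byte fields."""
    if len(payload) < FIXED_PREFIX_LEN:
        raise ValueError("handshake response too short")
    capabilities, _max_packet, charset_code = struct.unpack_from("<IIB", payload, 0)
    if not capabilities & CLIENT_PROTOCOL_41:
        raise ValueError("this sketch only handles the 4.1 protocol layout")
    return charset_code


if __name__ == "__main__":
    packet = build_handshake_prefix(CLIENT_PROTOCOL_41, 1 << 24, 33) + b"root\x00"
    code = parse_charset_code(packet)
    print(code, KNOWN_COLLATIONS.get(code, "unknown"))
```

The value carried in that single byte is a collation ID, which is why the server can pass it to a lookup such as get_charset to obtain the matching character set information.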
The piece ends with open-ended questions (e.g., why GB2312 appears once in CHARACTER_SETS but twice in COLLATIONS) to encourage deeper exploration, reinforcing the educational goal of demystifying MySQL internals for newcomers.
Tencent Database Technology
Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.