Understanding MySQL Character Set Parameters and Troubleshooting Encoding Issues
This article explains MySQL's character set and collation variables, illustrates how they interact during data import, analyzes a failure caused by mismatched charset settings, and provides guidance on proper configuration and troubleshooting of encoding problems.
The article, part of the "Illustrated MySQL" series, introduces MySQL character set and collation parameters and uses a charset‑related failure as a case study.
Fault description : When importing a large GBK‑encoded SQL file after executing set names gbk , the client bundles multiple statements into a single packet, causing MySQL to misinterpret a quoted character ("璡") and treat the following SQL as an unterminated string.
MySQL character set parameters : MySQL distinguishes between character set (encoding) and collation (comparison). The relevant variables are grouped as follows:
Three pairs that set both charset and collation: character_set_connection/collation_connection , character_set_server/collation_server , character_set_database/collation_database (the latter is deprecated).
Four variables that set only the charset: character_set_client , character_set_results , character_set_filesystem , character_set_system (fixed to utf8).
Client memory variable charset_info (referred to as mysql.client.charset in the article).
Storage‑layer charset settings for databases, tables, and columns, configurable via CREATE statements.
The article includes a diagram (shown below) that maps the flow of character set conversion from the client reading the SQL file to the server storing data.
Character set settings : The charset can be set via client handshake ( --default-character-set ), SET NAMES , or SET CHARSET . SET NAMES does not affect mysql.client.charset , while SET CHARSET changes cs_connection to the server's default charset.
Fault analysis : The failure occurs because SET NAMES leaves the client’s file‑reading charset at the default utf8, so the GBK byte sequence for "璡" (0xad5c) is misinterpreted, causing an escaped single‑quote and breaking SQL parsing.
Review : MySQL distinguishes between charset‑plus‑collation settings used for comparisons (e.g., cs_connection ) and charset‑only settings that affect external interfaces (e.g., cs_client , cs_results , cs_filesystem ). The undocumented mysql.client.charset influences how the client reads SQL files.
Further reading :
MySQL Character Set and Collation Documentation
Chinese blog on MySQL charset and collation
MySQL Large Packet Protocol
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.