Master Character Encodings: From ANSI to Unicode and Qt Implementation
This guide explains common character encodings such as ANSI, ASCII, GB2312/GBK/GB18030, Unicode, UTF‑8, UTF‑16, and UTF‑32, and shows how to handle them in MFC and Qt with practical code examples.
Why Encoding Matters
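Software developers often run into garbled Chinese characters or missing Japanese text. The usual cause is that bytes are decoded with a different encoding than the one used to write them, and a systematic understanding of character set encodings is the best defense.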
Common Encodings Overview
Typical encodings include GB2312, GBK, Big5 (Traditional Chinese), UTF‑8, and UTF‑16, along with the Windows umbrella terms "ANSI" and "Unicode". The sections below give concise details for the main ones.
1. ANSI
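In Windows terminology, "ANSI" is not a single encoding: it means the system's default legacy code page. Windows handles these code pages as multibyte character sets (MBCS), a variable‑length scheme that covers both single‑byte (SBCS) and double‑byte (DBCS) code pages. A DBCS code page keeps ASCII as single bytes and encodes other characters with two bytes; the Chinese code pages store their GB2312 part in the EUC‑CN layout. Different regions use different code pages, for example 936 (GBK) on Simplified Chinese Windows and 1252 on Western European systems.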
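To make the variable‑length idea concrete, here is a minimal sketch that counts characters (not bytes) in a DBCS string. It assumes GBK (code page 936) rules: a byte below 0x80 is a single ASCII character, and a byte in 0x81–0xFE is a lead byte that starts a two‑byte character. The names `isGbkLeadByte` and `countDbcsChars` are hypothetical; on Windows the real lead‑byte test is `IsDBCSLeadByteEx`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: assumes GBK (code page 936) lead-byte rules.
// Bytes 0x00-0x7F are single-byte ASCII; 0x81-0xFE start a two-byte character.
bool isGbkLeadByte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Counts characters (not bytes) in a GBK-encoded byte string.
// A lead byte at the very end (truncated input) is counted as one character.
std::size_t countDbcsChars(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++count) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        // Advance two bytes for a complete double-byte character, one otherwise.
        i += (isGbkLeadByte(b) && i + 1 < bytes.size()) ? 2 : 1;
    }
    return count;
}
```

For "A中文" in GBK (bytes 41 D6 D0 CE C4) the function returns 3, although the string is 5 bytes long. The same rule would be wrong for a single‑byte code page such as 1252, which has no lead bytes.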
1.1 ASCII
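ASCII defines 128 characters (7‑bit codes 0x00–0x7F): 33 control characters and 95 printable characters covering the English alphabet, digits, and punctuation. It has no accented letters, so on its own it does not cover Western European languages. First published in 1963 and revised in 1967, it is the US counterpart of the international standard ISO/IEC 646.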
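Because ASCII uses only 7 bits, checking whether data is pure ASCII is a simple byte test. The sketch below (the name `isPureAscii` is hypothetical) relies on the fact that such data is byte‑for‑byte identical in ASCII, UTF‑8, GBK, and GB18030, so it can be decoded safely with any of them.

```cpp
#include <string>

// Returns true if every byte is in the 7-bit ASCII range (0x00-0x7F).
// Such input decodes identically as ASCII, UTF-8, GBK or GB18030.
bool isPureAscii(const std::string& bytes) {
    for (char c : bytes) {
        if (static_cast<unsigned char>(c) > 0x7F) {
            return false;  // high bit set: part of a multibyte sequence
        }
    }
    return true;
}
```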
1.2 GB2312 and Extensions
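GB2312 is a Simplified Chinese character set of 7,445 characters, 6,763 of them hanzi. In its usual EUC‑CN form each Chinese character takes two bytes (both in 0xA1–0xFE) while ASCII stays single‑byte, which keeps it ASCII‑compatible. GBK extends GB2312 to about 21,000 hanzi, including traditional forms, by widening the byte ranges (lead byte 0x81–0xFE, trail byte 0x40–0xFE excluding 0x7F). GB18030 adds four‑byte sequences and can represent every Unicode code point, which covers minority scripts such as Tibetan and Uyghur as well as Japanese and Korean characters.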
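The difference between GBK and GB18030 shows up in the byte structure. A minimal sketch, assuming the GB18030 byte ranges: a first byte of 0x00–0x7F is a single ASCII byte; a first byte of 0x81–0xFE followed by a second byte of 0x30–0x39 (an ASCII digit) starts a four‑byte sequence; a second byte of 0x40–0xFE (excluding 0x7F) completes a GBK‑compatible two‑byte character. The function `gb18030SeqLength` is hypothetical and only classifies lengths; it does not map bytes to code points.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length (1, 2 or 4) of the GB18030 sequence
// starting at bytes[pos], or 0 if the bytes there are not a valid start.
std::size_t gb18030SeqLength(const std::string& bytes, std::size_t pos) {
    if (pos >= bytes.size()) return 0;
    auto at = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
    unsigned char b1 = at(pos);
    if (b1 <= 0x7F) return 1;                  // single-byte ASCII
    if (b1 < 0x81 || b1 > 0xFE) return 0;      // 0x80 and 0xFF never start a sequence
    if (pos + 1 >= bytes.size()) return 0;     // truncated sequence
    unsigned char b2 = at(pos + 1);
    if (b2 >= 0x30 && b2 <= 0x39) {            // digit second byte: four-byte form
        if (pos + 3 >= bytes.size()) return 0;
        unsigned char b3 = at(pos + 2);
        unsigned char b4 = at(pos + 3);
        bool ok = b3 >= 0x81 && b3 <= 0xFE && b4 >= 0x30 && b4 <= 0x39;
        return ok ? 4 : 0;
    }
    if (b2 >= 0x40 && b2 <= 0xFE && b2 != 0x7F) return 2;  // GBK-compatible two-byte form
    return 0;
}
```

For example, D6 D0 ("中") is a two‑byte sequence shared with GBK and GB2312, while 81 30 81 30 is the four‑byte sequence GB18030 assigns to U+0080, a code point GBK cannot represent.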
2. Unicode
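Unicode aims to provide a unique code point for every character worldwide. Its code space is divided into 17 planes (0–16) of 65,536 code points each, for a total of 1,114,112 code points (U+0000 to U+10FFFF). Plane 0 is the Basic Multilingual Plane (BMP), which holds most characters in everyday use. Not every code point is assigned to a character.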
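The plane structure is plain arithmetic: each plane holds 0x10000 code points, so the plane number is the code point shifted right by 16 bits. A small sketch; the names `kMaxCodePoint`, `isInCodeSpace`, and `planeOf` are illustrative.

```cpp
#include <cstdint>

constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;  // last code point, in plane 16

// True for any value in the Unicode code space (U+0000..U+10FFFF).
// Being in the code space does not mean the code point is assigned.
constexpr bool isInCodeSpace(std::uint32_t cp) { return cp <= kMaxCodePoint; }

// Plane number 0-16: plane 0 is the BMP, planes 1-16 are supplementary.
constexpr std::uint32_t planeOf(std::uint32_t cp) { return cp >> 16; }

// 17 planes x 65,536 code points = 1,114,112 code points in total.
static_assert(17u * 0x10000u == 1114112u, "code space size");
static_assert(kMaxCodePoint + 1 == 1114112u, "U+10FFFF is the last code point");
```

For example, 中 (U+4E2D) is in plane 0, and 😀 (U+1F600) is in plane 1.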
2.1 UTF‑16
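UTF‑16 grew out of UCS‑2 and uses one 16‑bit code unit (two bytes) for each character in the Basic Multilingual Plane (BMP). Characters in the supplementary planes need a surrogate pair (two code units, four bytes), so UTF‑16 is a variable‑length encoding. Each code unit can be stored big‑endian or little‑endian. When the label names the byte order (UTF‑16BE, UTF‑16LE), no BOM is used; data labeled plain "UTF‑16" may start with a BOM (FE FF for big‑endian, FF FE for little‑endian) to signal the order.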
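The surrogate‑pair rule can be written out directly: subtract 0x10000 from the code point, then put the top 10 bits of the 20‑bit result into a high surrogate (0xD800 + value) and the bottom 10 bits into a low surrogate (0xDC00 + value). A minimal sketch; `toUtf16` is an illustrative name, and validation only rejects values outside U+0000–U+10FFFF and the surrogate range itself.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Encodes one code point as UTF-16 code units (1 unit in the BMP, 2 otherwise).
std::vector<std::uint16_t> toUtf16(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    if (cp < 0x10000) {
        return {static_cast<std::uint16_t>(cp)};  // BMP: one code unit
    }
    std::uint32_t v = cp - 0x10000;                                // 20-bit value
    auto high = static_cast<std::uint16_t>(0xD800 + (v >> 10));    // top 10 bits
    auto low = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));   // bottom 10 bits
    return {high, low};
}
```

For example, U+4E2D (中) is the single unit 4E2D, while U+1F600 (😀) becomes D83D DE00; written as UTF‑16LE, those two units are the bytes 3D D8 00 DE.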
2.2 UTF‑32
UTF‑32 uses a fixed 32‑bit (four‑byte) representation for each Unicode code point, offering simplicity at the cost of higher memory usage. It also supports both big‑ and little‑endian byte orders.
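Byte order only affects how each 32‑bit unit is written out. The sketch below (the names `toUtf32BE` and `toUtf32LE` are illustrative) serializes one code point in both orders; the BOM U+FEFF written this way gives 00 00 FE FF in big‑endian and FF FE 00 00 in little‑endian.

```cpp
#include <array>
#include <cstdint>

// Writes one code point as four UTF-32 bytes, most significant byte first.
std::array<std::uint8_t, 4> toUtf32BE(std::uint32_t cp) {
    return {static_cast<std::uint8_t>(cp >> 24), static_cast<std::uint8_t>(cp >> 16),
            static_cast<std::uint8_t>(cp >> 8), static_cast<std::uint8_t>(cp)};
}

// Writes the same value least significant byte first.
std::array<std::uint8_t, 4> toUtf32LE(std::uint32_t cp) {
    return {static_cast<std::uint8_t>(cp), static_cast<std::uint8_t>(cp >> 8),
            static_cast<std::uint8_t>(cp >> 16), static_cast<std::uint8_t>(cp >> 24)};
}
```

For example, 中 (U+4E2D) is 00 00 4E 2D in UTF‑32BE and 2D 4E 00 00 in UTF‑32LE, four bytes either way.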
2.3 UTF‑8
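UTF‑8 encodes each code point in 1–4 bytes. The original design allowed sequences of up to 6 bytes, but RFC 3629 (2003) restricted UTF‑8 to the Unicode range U+0000–U+10FFFF, which never needs more than 4. ASCII characters keep their single‑byte values, and this ASCII compatibility helped UTF‑8 become the dominant encoding on the web.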
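The byte layout follows from how many bits the code point needs: up to 7 bits fit in one byte (0xxxxxxx), 11 bits in two (110xxxxx 10xxxxxx), 16 bits in three, and 21 bits in four. A minimal encoder sketch; `toUtf8` is an illustrative name.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Returns the UTF-8 encoding of one code point (1-4 bytes, per RFC 3629).
std::string toUtf8(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    std::string out;
    if (cp < 0x80) {                      // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, 中 (U+4E2D) becomes E4 B8 AD in UTF‑8: three bytes, compared with two in GBK.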
3. Encoding in MFC
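MFC's character set setting controls what TCHAR and the _T() macro expand to. In multibyte mode (_MBCS), TCHAR is char and text is stored in the system's ANSI code page, which is GBK (code page 936) on Simplified Chinese Windows but something else on other systems. In Unicode mode (_UNICODE), TCHAR is wchar_t and text is stored as UTF‑16.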
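The switch happens at compile time in the preprocessor. The portable sketch below imitates the mechanism behind TCHAR and _T() from <tchar.h>; the names `tchar_t`, `TEXT_LIT`, `tstring`, and `greeting` are hypothetical stand‑ins rather than the real Windows definitions, though the sketch keys off the same `_UNICODE` macro.

```cpp
#include <string>

// Hypothetical imitation of <tchar.h>: one source, two character widths.
#ifdef _UNICODE
typedef wchar_t tchar_t;   // Unicode build: UTF-16 code units on Windows
#define TEXT_LIT(s) L##s   // turns "..." into the wide literal L"..."
#else
typedef char tchar_t;      // MBCS build: bytes in the ANSI code page
#define TEXT_LIT(s) s      // leaves the narrow literal unchanged
#endif

// String type built on whichever character type was selected.
typedef std::basic_string<tchar_t> tstring;

// The same call site compiles in both modes.
inline tstring greeting() { return TEXT_LIT("hello"); }
```

Note that wchar_t is 32 bits on Linux, so "wchar_t means UTF‑16" holds only on Windows.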
4. Encoding in Qt
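QString stores text internally as UTF‑16, so every conversion from bytes needs to know the source encoding. In Qt 5, QTextCodec::setCodecForLocale() changes the codec used by QString::fromLocal8Bit() and toLocal8Bit(); it does not affect a plain string literal, because the QString(const char *) constructor always decodes its argument as UTF‑8. In Qt 6, QTextCodec is available only through the Core5Compat module. The following snippets set the locale codec and create strings from UTF‑8 and GBK data.

```cpp
#include <QString>
#include <QTextCodec>  // Qt 5 (Qt 6: Core5Compat module)

// UTF-8: with the source file saved as UTF-8, the literal's bytes are UTF-8,
// and QString(const char *) decodes them as UTF-8 regardless of the locale codec.
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForLocale(codec);
QString str = "右边是UTF-8编码的字符串";  // "On the right is a UTF-8-encoded string"
```

For GBK encoding (the source file must be saved as GBK for the literal's bytes to be GBK):

```cpp
// fromLocal8Bit() decodes with the locale codec set here; a plain QString
// assignment would misread the GBK bytes as UTF-8 and garble them.
QTextCodec *codec = QTextCodec::codecForName("GBK");
QTextCodec::setCodecForLocale(codec);
QString str = QString::fromLocal8Bit("右边是GBK编码的字符串");  // "On the right is a GBK-encoded string"
```

Direct conversion methods (each literal stands for bytes in the encoding it names):

```cpp
QString str1 = QString::fromLocal8Bit("GBK编码字符串");  // "GBK-encoded string": decoded with the locale codec
QString str2 = QString::fromUtf8("UTF-8编码字符串");     // "UTF-8-encoded string": always decoded as UTF-8
```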
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, and tools, plus Git, databases, Raspberry Pi, and more.
