Mastering Character Encoding in Node.js: Avoid Garbled Text
This article introduces the fundamentals of character encoding and decoding, explains the difference between character sets and encodings, and demonstrates practical Node.js examples using the iconv‑lite library to correctly encode and decode Chinese text, helping developers prevent common garbled‑text issues.
Introduction
In web server development, character encoding and decoding come up daily. Handling them incorrectly produces garbled text, and developers who are unfamiliar with the underlying concepts can spend a lot of time troubleshooting it.
This article first introduces basic concepts of character encoding, then shows how to perform encoding and decoding in Node.js, and finally provides a server‑side code example.
About Character Encoding and Decoding
During network communication, data is transmitted as binary bits, regardless of whether the content is text or images, Chinese or English.
Client --- 你好 ---> Server
The process involves two key steps: encoding on the client side and decoding on the server side.
Client: encode the string "你好" into binary bits required by the network.
Server: decode the received binary bits back into the string "你好".
In summary:
Encoding: convert data to binary bits for transmission.
Decoding: convert binary bits back to the original data.
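The round trip above can be sketched with Node.js's built-in Buffer class. This is a minimal illustration that assumes both sides agree on UTF‑8; the variable names are hypothetical, and no real network transfer takes place.

```javascript
// Client side: encode the string into bytes (UTF-8 is assumed here).
const message = '你好';
const wireBytes = Buffer.from(message, 'utf8');
console.log(wireBytes); // <Buffer e4 bd a0 e5 a5 bd>

// Server side: decode the received bytes using the same encoding.
const received = wireBytes.toString('utf8');
console.log(received); // 你好
```

The decoding step only recovers the original string because it uses the same encoding as the encoding step.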
Character Sets and Encodings
The conversion between characters and binary follows defined rules, known as character sets and character encodings.
A character set is a collection of characters, such as ASCII, Unicode, or GBK. Character sets differ mainly in how many characters they contain.
Character encoding defines how characters in a set are represented as bytes. For example, the Unicode set can be encoded as UTF‑8, UTF‑16, or UTF‑32.
Key points:
Character set: a collection of characters.
Character encoding: the specific byte representation of characters in a set.
A character set may have multiple encodings.
Encoding can be viewed as a mapping table used by client and server to transform characters to binary and back.
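To make the distinction concrete, the sketch below takes one character from the Unicode set and encodes it in two different ways using only Node.js built-ins. The variable names are illustrative, and UTF‑16 is shown in its little‑endian form because that is the variant Buffer supports.

```javascript
const ch = '你';

// Position of the character in the Unicode character set (its code point).
const codePoint = ch.codePointAt(0).toString(16);
console.log(codePoint); // 4f60

// The same code point represented as bytes under two different encodings.
const asUtf8 = Buffer.from(ch, 'utf8');
const asUtf16le = Buffer.from(ch, 'utf16le');
console.log(asUtf8);    // <Buffer e4 bd a0>
console.log(asUtf16le); // <Buffer 60 4f>
```

The code point stays the same; only the byte representation changes with the encoding.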
Example: the character "你" occupies three bytes in UTF‑8 (0xe4 0xbd 0xa0) and two bytes in GBK (0xc4 0xe3).
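These byte sequences can be checked with a short sketch that uses the global TextDecoder. It assumes a Node.js build with full ICU data (the default for recent official releases), since the 'gbk' label is not available otherwise; the variable names are illustrative.

```javascript
// The two byte sequences for "你" quoted above.
const bytesInUtf8 = Uint8Array.of(0xe4, 0xbd, 0xa0);
const bytesInGbk = Uint8Array.of(0xc4, 0xe3);

// Each sequence decodes to the same character when the matching table is used.
const fromUtf8 = new TextDecoder('utf-8').decode(bytesInUtf8);
const fromGbk = new TextDecoder('gbk').decode(bytesInGbk); // assumes full ICU
console.log(fromUtf8, fromGbk); // 你 你

// Buffer.byteLength reports how many bytes a string needs in a given encoding.
const utf8Length = Buffer.byteLength('你', 'utf8');
console.log(utf8Length); // 3
```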
Encoding Example in Node.js
The following example uses the iconv-lite library to encode and decode Chinese text.
Encoding with GBK and decoding with GBK works correctly, while decoding with UTF‑8 produces garbled output.
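<code>const iconv = require('iconv-lite');
const oriText = '你';
// Encode the string into GBK bytes.
const encodedBuff = iconv.encode(oriText, 'gbk');
console.log(encodedBuff); // <Buffer c4 e3>
// Decoding with the same encoding restores the original text.
const decodedText = iconv.decode(encodedBuff, 'gbk');
console.log(decodedText); // 你
// Decoding with a different encoding (UTF-8) produces garbled text.
const wrongText = iconv.decode(encodedBuff, 'utf8');
console.log(wrongText); // ��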
</code>
Tencent IMWeb Frontend Team