Mastering Character Encoding in Node.js: Avoid Garbled Text

This article introduces the fundamentals of character encoding and decoding, explains the difference between character sets and encodings, and demonstrates practical Node.js examples using the iconv-lite library to correctly encode and decode Chinese text, helping developers prevent common garbled‑text issues.

Tencent IMWeb Frontend Team

Sep 13, 2017

Introduction

Web server developers deal with character encoding and decoding every day, and handling them incorrectly produces garbled text. Many Node.js developers are not familiar enough with the topic and end up spending a lot of time troubleshooting.

This article first introduces basic concepts of character encoding, then shows how to perform encoding and decoding in Node.js, and finally provides a server‑side code example.

About Character Encoding and Decoding

During network communication, data is transmitted as binary bits, regardless of whether the content is text or images, Chinese or English.

Client --- 你好 ---> Server

The process involves two key steps: encoding on the client side and decoding on the server side.

Client: encode the string "你好" into binary data for transmission over the network.

Server: decode the received binary bits back into the string "你好".

In summary:

Encoding: convert data to binary bits for transmission.

Decoding: convert binary bits back to the original data.
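For UTF-8 text, Node.js can perform both steps with its built-in Buffer class, without any extra library. The following is a minimal sketch that assumes both sides agree on UTF-8; the variable names are illustrative.

```javascript
// Encoding: string -> bytes. Decoding: bytes -> string.
// Node's built-in Buffer handles UTF-8 natively.
const original = '你好';

// Encode: each of these two characters becomes 3 bytes in UTF-8.
const bytes = Buffer.from(original, 'utf8');
console.log(bytes); // <Buffer e4 bd a0 e5 a5 bd>

// Decode with the same encoding to get the original string back.
const restored = bytes.toString('utf8');
console.log(restored); // 你好
console.log(restored === original); // true
```

The round trip only works because the encode and decode calls name the same encoding.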

Character Sets and Encodings

The conversion between characters and binary follows defined rules, known as character sets and character encodings.

A character set is a collection of characters. Examples include ASCII, Unicode, and GBK, which differ mainly in how many characters they contain.

Character encoding defines how characters in a set are represented as bytes. For example, the Unicode set can be encoded as UTF‑8, UTF‑16, or UTF‑32.

Key points:

Character set: a collection of characters.

Character encoding: the specific byte representation of characters in a set.

A character set may have multiple encodings.
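To see one character set with several encodings, the sketch below takes "你" (Unicode code point U+4F60) and encodes it two ways with Node's built-in Buffer. Buffer has no UTF-32 encoding, so the sketch uses only utf8 and utf16le. The variable names are illustrative.

```javascript
// One Unicode character: one code point, several byte representations.
const ch = '你';

// Its code point in the Unicode character set.
const codePoint = ch.codePointAt(0).toString(16).toUpperCase();
console.log(`U+${codePoint}`); // U+4F60

// The same code point under two different encodings.
const utf8 = Buffer.from(ch, 'utf8');
const utf16le = Buffer.from(ch, 'utf16le');
console.log(utf8.toString('hex'), utf8.length); // e4bda0 3
console.log(utf16le.toString('hex'), utf16le.length); // 604f 2
```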

An encoding can be viewed as a mapping table that the client and server share to convert characters to binary and back. Both sides must use the same table, or the decoded text will not match the original.

Example: the character "你" occupies three bytes in UTF‑8 (0xe4 0xbd 0xa0) and two bytes in GBK (0xc4 0xe3).
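A toy sketch can make the mapping-table view concrete. TABLES, toyEncode, and toyDecode are hypothetical names. The tables hold only two characters: "你" with the bytes given above, plus "好" (UTF-8 0xe5 0xa5 0xbd, GBK 0xba 0xc3). Real encoders rely on complete tables or algorithmic rules. This sketch only shows why both sides must agree on the table.

```javascript
// Hypothetical toy mapping tables covering only two characters.
const TABLES = {
  utf8: { '你': [0xe4, 0xbd, 0xa0], '好': [0xe5, 0xa5, 0xbd] },
  gbk: { '你': [0xc4, 0xe3], '好': [0xba, 0xc3] },
};

// Encode: replace each character with its byte sequence from the table.
function toyEncode(text, table) {
  return [...text].flatMap((ch) => {
    const bytes = table[ch];
    if (!bytes) throw new Error(`unmapped character: ${ch}`);
    return bytes;
  });
}

// Decode: at each position, look for a table entry whose bytes match.
function toyDecode(bytes, table) {
  let out = '';
  let i = 0;
  outer: while (i < bytes.length) {
    for (const [ch, seq] of Object.entries(table)) {
      if (seq.every((b, k) => bytes[i + k] === b)) {
        out += ch;
        i += seq.length;
        continue outer;
      }
    }
    out += '\uFFFD'; // no entry matches: emit a replacement character
    i += 1;
  }
  return out;
}

const gbkBytes = toyEncode('你好', TABLES.gbk); // [0xc4, 0xe3, 0xba, 0xc3]
console.log(toyDecode(gbkBytes, TABLES.gbk)); // 你好
console.log(toyDecode(gbkBytes, TABLES.utf8)); // ����
```

When the GBK bytes are decoded with the UTF-8 table, no entry matches, so every byte becomes a replacement character. The iconv-lite example in the next section shows the same kind of failure with a real library.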

Encoding Example in Node.js

The following example uses the iconv-lite library to encode and decode Chinese text. Node's built-in Buffer does not support GBK, so a third-party library is needed here.

Text encoded and decoded with GBK round-trips correctly. Decoding the same GBK bytes as UTF‑8 produces garbled output.

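```js
// Requires the iconv-lite package: npm install iconv-lite
const iconv = require('iconv-lite');

const oriText = '你';

// Encode the string into GBK bytes.
const encodedBuff = iconv.encode(oriText, 'gbk');
console.log(encodedBuff); // <Buffer c4 e3>

// Decoding with the same encoding restores the original text.
const decodedText = iconv.decode(encodedBuff, 'gbk');
console.log(decodedText); // 你

// Decoding GBK bytes as UTF-8 fails: 0xc4 0xe3 is not a valid UTF-8 sequence.
const wrongText = iconv.decode(encodedBuff, 'utf8');
console.log(wrongText); // ��
```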
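On the server, the same rule applies to request bodies, with one extra pitfall. A body arrives as several Buffer chunks, and a multi-byte character can be split across two of them, so decoding each chunk separately garbles it. The following is a minimal sketch, not the original authors' server code: it collects the chunks, joins them with Buffer.concat, and decodes once. decodeChunks, readBody, and createServer are hypothetical names, and decoding defaults to UTF-8. For a GBK body, you could pass (buf) => iconv.decode(buf, 'gbk') as the decode function.

```javascript
const http = require('http');

// Join all chunks first, then decode once. Decoding chunk by chunk can split a
// multi-byte character (e.g. the 3 UTF-8 bytes of "你") across two chunks.
function decodeChunks(chunks, decode = (buf) => buf.toString('utf8')) {
  return decode(Buffer.concat(chunks));
}

// Hypothetical helper: read a whole request body, then decode it.
function readBody(req, decode) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      try {
        resolve(decodeChunks(chunks, decode));
      } catch (err) {
        reject(err); // a custom decode function may throw
      }
    });
    req.on('error', reject);
  });
}

// Hypothetical server: echoes the decoded body back, always as UTF-8.
function createServer(decode) {
  return http.createServer(async (req, res) => {
    try {
      const text = await readBody(req, decode);
      res.setHeader('Content-Type', 'text/plain; charset=utf-8');
      res.end(`received: ${text}\n`);
    } catch (err) {
      res.statusCode = 400;
      res.end('could not read request body\n');
    }
  });
}
```

To try it, call createServer().listen(3000) and POST some Chinese text to it. Passing the decode function in as a parameter keeps the choice of encoding in one place.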

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
