How to Fix Chinese Character Encoding Issues in Python Crawlers

This article walks through a real-world Python encoding problem: json.dumps escapes Chinese characters as \uXXXX sequences, so the Base64 payload does not decode to readable Chinese text on the receiving side. It shows the faulty code and a corrected version that uses json.dumps with ensure_ascii=False and GBK encoding before Base64 conversion.

Python Crawling & Data Mining

Introduction

The author was asked in a Python community about JSON data containing Chinese characters: after serialization the characters appeared as escaped \uXXXX sequences, and the output was unreadable once decoded.

Problem

The original script serialized a dictionary with Chinese keys, removed the spaces from the resulting string, and then Base64-encoded that string as UTF-8 bytes. When the output was decoded on the asker's side, it displayed garbled characters.

Initial Attempt

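import json
import base64

d = {"小明": 55, "小爱": 111, "嘎嘎": True}

# ensure_ascii defaults to True, so each Chinese character becomes a \uXXXX escape
s = json.dumps(d).replace(' ', '')
print(s)
# s is pure ASCII here, so the Base64 payload carries the escapes, not the characters
print(base64.b64encode(s.encode('utf-8')))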

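This code produced the expected Base64 string on the author's machine but showed garbled output for the asker. With the default ensure_ascii=True, json.dumps writes every non-ASCII character as a \uXXXX escape, so the Base64 payload carries the escapes rather than the Chinese characters themselves.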
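To see why the charset makes no difference in this version, it helps to inspect the serialized string. The sketch below is only an illustration that reuses the same dictionary (the variable name `escaped` is not from the original script). It checks that the default output is pure ASCII, so encoding it as UTF-8 or GBK yields identical bytes. It assumes Python 3.7 or later for `str.isascii`.

```python
import json

d = {"小明": 55, "小爱": 111, "嘎嘎": True}

# Default behaviour: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(d).replace(' ', '')

# The result contains only ASCII characters (backslash-u escapes).
print(escaped.isascii())

# ASCII text encodes to the same bytes in UTF-8 and GBK,
# so changing the charset alone cannot change the Base64 output.
print(escaped.encode('utf-8') == escaped.encode('gbk'))

# The escapes are still valid JSON: parsing restores the original keys.
print(json.loads(escaped) == d)
```

All three checks print True, which suggests the escaping step, not the charset, is what has to change first.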

Solution

The fix has two parts: pass ensure_ascii=False so that json.dumps keeps the Chinese characters as they are instead of escaping them, and encode the string with the GBK charset before Base64 conversion.

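import json
import base64

d = {"小明": 55, "小爱": 111, "嘎嘎": True}

# ensure_ascii=False keeps the Chinese characters unescaped in the JSON text
# note: replace() would also strip spaces inside string values; harmless for this data
s = json.dumps(d, ensure_ascii=False).replace(' ', '')
print(s)
# encode as GBK bytes before Base64 conversion
print(base64.b64encode(s.encode('gbk')))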

The corrected script outputs a Base64 string that, once Base64-decoded and read as GBK, yields the original Chinese characters.
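The receiving side has to reverse the same steps with the same charset. As a rough sketch (the variable names `payload`, `raw`, `text`, and `wrong` are illustrative, not from the original question), the payload can be decoded as follows. The last lines show how garbling arises when the bytes are read with a different charset.

```python
import base64
import json

d = {"小明": 55, "小爱": 111, "嘎嘎": True}
s = json.dumps(d, ensure_ascii=False).replace(' ', '')
payload = base64.b64encode(s.encode('gbk'))

# Receiver: undo Base64, then decode the bytes with the sender's charset.
raw = base64.b64decode(payload)
text = raw.decode('gbk')
print(text)
print(json.loads(text) == d)

# Decoding the same bytes with a different charset does not give the text back.
# errors='replace' substitutes U+FFFD for bytes that are invalid in UTF-8.
wrong = raw.decode('utf-8', errors='replace')
print(wrong == s)
```

The practical point is that sender and receiver must agree on the charset; GBK works here only because the decoding side also reads GBK.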

Conclusion

The article demonstrates how to handle Chinese character encoding in Python by passing ensure_ascii=False to json.dumps and encoding with the appropriate charset (GBK) before Base64 conversion, so that the asker can correctly decode and display the data.
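The two steps can be bundled into a small helper. The function name `encode_payload` and its `charset` parameter are hypothetical, introduced here only for illustration. This sketch also swaps `replace(' ', '')` for the `separators` argument of `json.dumps`, which produces compact output without touching spaces inside string values.

```python
import base64
import json


def encode_payload(obj, charset='gbk'):
    """Serialize obj to compact JSON and return it Base64-encoded as ASCII text.

    Hypothetical helper: the name and the charset parameter are assumptions.
    """
    # ensure_ascii=False keeps Chinese characters unescaped;
    # separators=(',', ':') removes the spaces json.dumps adds between tokens.
    text = json.dumps(obj, ensure_ascii=False, separators=(',', ':'))
    return base64.b64encode(text.encode(charset)).decode('ascii')


data = {"小明": 55, "note": "hello world"}
encoded = encode_payload(data)
print(encoded)

# Round trip: the space inside "hello world" survives.
decoded = base64.b64decode(encoded).decode('gbk')
print(json.loads(decoded) == data)
```

Returning a `str` rather than `bytes` avoids the `b'...'` wrapper when the value is printed or embedded in another message.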
