Fix Chinese Encoding Issues in Python Web Scraping: Practical Tips & Code

This article walks through a common Chinese character garbling problem in Python web crawlers, explains why automatic encoding detection can fail, and provides clear code examples—including manual GBK setting and re‑encoding tricks—to reliably extract readable text.

Python Crawling & Data Mining

Introduction

The author encountered garbled Chinese characters when using a Python web crawler and posted the original request code and a screenshot of the malformed output.

Implementation

The initial request code is shown below:

import requests
import parsel

url = 'https://news.p2peye.com/article-514723-1.html'
headers = {
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Cookie': '... (omitted for brevity) ...',
    'Host': 'news.p2peye.com',
    'Referer': 'https://news.p2peye.com/article-514723-1.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
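# Send the request and check the HTTP status.
res = requests.get(url=url, headers=headers)
print(res.status_code)
# Let requests guess the charset from the response bytes (the step that goes wrong here).
res.encoding = res.apparent_encoding
# Parse the decoded HTML and extract the title element.
selector_1 = parsel.Selector(res.text)
title = selector_1.css('#plat-title').get()
print(title)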

With this code, the printed title comes out garbled rather than as readable Chinese.

To fix the issue, the author manually set the response encoding to GBK before reading res.text:

res.encoding = 'gbk'

After this change, the title prints as readable Chinese.
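To make the effect of the explicit encoding concrete, here is a minimal offline sketch. It assumes that reading res.text amounts, roughly, to decoding the raw bytes in res.content with whatever res.encoding is set to. The sample bytes and the decode_body helper are hypothetical stand-ins, not part of the original article or of requests.

```python
# Hypothetical sample: stand-in bytes for a GBK-encoded page body (like res.content).
raw = '<h1 id="plat-title">中文标题</h1>'.encode('gbk')


def decode_body(raw_bytes: bytes, encoding: str = 'gbk') -> str:
    """Decode raw page bytes with an explicitly chosen codec."""
    # errors='replace' substitutes undecodable bytes instead of raising.
    return raw_bytes.decode(encoding, errors='replace')


garbled = decode_body(raw, 'iso-8859-1')  # wrong codec: mojibake
readable = decode_body(raw, 'gbk')        # right codec: readable Chinese

print(garbled)
print(readable)
```

The bytes never change between the two calls; only the codec used to interpret them does, which is why a one-line change to res.encoding is enough.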

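The author also explains why res.encoding = res.apparent_encoding can be unreliable: apparent_encoding is only a guess that a detection library (chardet or charset_normalizer, depending on the installed version of requests) makes from the response bytes, so it can settle on the wrong charset.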
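One way to avoid relying on a guess is to read the charset the page itself declares in its meta tag. The sketch below is an assumption layered on top of the article, not the author's method: the declared_charset helper, the regular expression, the 2048-byte scan limit, and the 'utf-8' fallback are all hypothetical choices for illustration.

```python
import re
from typing import Optional

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def declared_charset(raw_bytes: bytes, scan_limit: int = 2048) -> Optional[str]:
    """Return the charset named in a meta tag near the top of the page, if any."""
    match = _META_CHARSET.search(raw_bytes[:scan_limit])
    return match.group(1).decode('ascii').lower() if match else None


# Hypothetical sample page encoded as GBK.
sample = '<html><head><meta charset="gbk"><title>中文标题</title></head></html>'.encode('gbk')

# Fall back to UTF-8 (an assumption) when the page declares nothing.
encoding = declared_charset(sample) or 'utf-8'
text = sample.decode(encoding, errors='replace')
print(encoding, text)
```

With requests, the same idea could be applied as res.encoding = declared_charset(res.content) or res.apparent_encoding, so that detection is used only when the page declares no charset.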

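An alternative fix is to re-encode the extracted title string:

title.encode('iso-8859-1').decode('gbk')

This works when the text was wrongly decoded as ISO-8859-1. That codec maps every byte to exactly one character, so encoding the string back recovers the original bytes, which can then be decoded as GBK. This method also yields the correct result.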
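The re-encoding trick can be checked offline with a short round trip. This is a sketch under stated assumptions: the fix_mojibake helper and the sample string are hypothetical, and the garbled value is produced locally to imitate a GBK page that was decoded as ISO-8859-1.

```python
def fix_mojibake(text: str, wrong: str = 'iso-8859-1', right: str = 'gbk') -> str:
    """Undo a wrong decode by recovering the original bytes and decoding them again."""
    return text.encode(wrong).decode(right)


original = '中文标题'  # hypothetical sample title

# Imitate the bug: GBK bytes interpreted as ISO-8859-1.
garbled = original.encode('gbk').decode('iso-8859-1')

repaired = fix_mojibake(garbled)
print(repr(garbled))
print(repaired)
```

The repair only works if the wrong decode was lossless. If bytes were already replaced during decoding (for example by a UTF-8 decode with errors='replace'), the original bytes are gone and setting the encoding before reading the text is the safer route.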

Conclusion

The article summarizes the Chinese encoding problem in Python web crawling, provides concrete code solutions—setting res.encoding = 'gbk' or re‑encoding the extracted text—and demonstrates that these approaches restore readable Chinese characters.

Tags: Python, Unicode, web-scraping, requests, Parsel