Fix Chinese Character Garbling in Python Web Scraping: Simple Encoding Hacks

This article explains why Chinese characters become garbled during Python web scraping, demonstrates the problematic code, and provides clear encoding adjustments and alternative solutions to reliably extract readable text.

Python Crawling & Data Mining

1. Introduction

Hello, I'm PiPi. Recently, someone in a Python community asked why Chinese text came out garbled when scraping a page with requests and parsel.

Original code:

import requests
import parsel

url = 'https://news.p2peye.com/article-514723-1.html'
headers = {
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Cookie': '...',
    'Host': 'news.p2peye.com',
    'Referer': 'https://news.p2peye.com/article-514723-1.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
res = requests.get(url=url, headers=headers)
print(res.status_code)

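# Let requests guess the charset from the response body.
# This is the line that leads to the garbled title shown below.
res.encoding = res.apparent_encoding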
# print(res.text)
selector_1 = parsel.Selector(res.text)
title = selector_1.css('#plat-title').get()
print(title)

The output shows garbled characters:

[Image: garbled output]
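
Before changing anything, it can help to see which charsets are in play. The sketch below is not part of the original question: `encoding_report` is a hypothetical helper that gathers the three charset-related values a requests response exposes, namely the Content-Type header, `res.encoding`, and `res.apparent_encoding`. It should be called before `res.encoding` is overwritten.

```python
def encoding_report(res):
    """Collect the charset-related values of a requests-style response.

    `res` is assumed to expose .headers, .encoding and .apparent_encoding,
    as requests.Response does.
    """
    return {
        # Raw header; may or may not carry a "charset=..." parameter.
        'content_type': res.headers.get('Content-Type'),
        # Charset requests will use to build res.text.
        'current_encoding': res.encoding,
        # Charset guessed from the response bytes.
        'guessed_encoding': res.apparent_encoding,
    }


# Usage with a real response (not run here):
#     res = requests.get(url=url, headers=headers)
#     print(encoding_report(res))
```

If the guessed value does not match the charset the page actually uses, `res.text` will be garbled once that guess is assigned to `res.encoding`.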

2. Implementation Process

Manually setting the response encoding to the page's actual charset, before reading res.text, resolves the issue:

res.encoding = 'gbk'

After this change the title is displayed correctly:

[Image: correct title after setting encoding]

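Using res.encoding = res.apparent_encoding may fail because apparent_encoding is only a guess: requests infers it from the response bytes with a charset-detection library, and that guess can be wrong. When you know the page's charset, specifying it explicitly is more reliable.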
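
When the charset is not known in advance, one option is to read it from the page's own `<meta>` declaration instead of guessing. The following is a minimal sketch under that assumption, not the method used in the original question: `declared_charset` is a hypothetical helper, and the sample page is made up to stand in for `res.content`.

```python
import re

# Matches declarations such as <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE
)


def declared_charset(raw, default=None):
    """Return the charset named in an HTML <meta> tag, or `default`."""
    # Charset declarations normally sit near the top of the document.
    match = _META_CHARSET.search(raw[:4096])
    if match is None:
        return default
    return match.group(1).decode('ascii').lower()


# Stand-in for res.content: a tiny GBK-encoded page (made up for this example).
sample = (
    '<html><head><meta charset="gbk">'
    '<title>中文标题</title></head></html>'
).encode('gbk')

charset = declared_charset(sample, default='utf-8')
text = sample.decode(charset)
print(charset, text)
```

With a real response, the same idea could be applied as `res.encoding = declared_charset(res.content) or res.apparent_encoding`, so that detection is only a fallback. Pages that declare the wrong charset, or none at all, still need a manual override.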

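An alternative approach is to re‑encode the extracted title:

title.encode('iso-8859-1').decode('gbk')

This applies when the text was decoded as ISO-8859-1, which requests falls back to for text responses whose Content-Type header names no charset. Encoding the string back to ISO-8859-1 recovers the original bytes, which can then be decoded as GBK. This also yields the correct result: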

[Image: result using encode/decode]
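
Why the round trip works can be shown without any network access. The sketch below uses a made-up sample string in place of the real page title; it assumes the garbling came from decoding GBK bytes as ISO-8859-1, which is the only case this trick repairs.

```python
original = '中文标题'            # made-up sample text
raw = original.encode('gbk')     # the bytes as a GBK server would send them

# Decoding GBK bytes as ISO-8859-1 yields one character per byte: mojibake.
garbled = raw.decode('iso-8859-1')

# ISO-8859-1 maps each of the 256 byte values to exactly one code point,
# so encoding the garbled string back returns the original bytes unchanged.
recovered_bytes = garbled.encode('iso-8859-1')

# Those bytes can now be decoded with the charset the page really uses.
repaired = recovered_bytes.decode('gbk')

print(repr(garbled))
print(repaired)
```

If the text had instead been decoded with a charset that replaces or drops undecodable bytes, the original bytes would be lost and this repair would not work; setting res.encoding before reading res.text avoids the problem altogether.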

3. Conclusion

Chinese text comes out garbled when the response bytes are decoded with the wrong charset. Two fixes were shown: set res.encoding to the page's actual charset (here 'gbk') before reading res.text, or re‑encode the extracted text with encode('iso-8859-1').decode('gbk').

Written by

Python Crawling & Data Mining