
Fixing Python XPath Errors: A Step‑by‑Step Web Scraping Guide

This article walks through a Python web‑scraping problem where an incorrect XPath selector caused unexpected results, demonstrates the faulty code, explains why the XPath was wrong, and provides a corrected script with proper headers to reliably extract the desired text.

Python Crawling & Data Mining

Sep 3, 2025

The author received a question about a Python web crawler whose selector did not extract the expected text. The original code, which uses an incorrect XPath expression, is:

from lxml import etree
import requests
url = "http://zw.hainan.gov.cn/wssc/emalls.html"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"}
html = requests.get(url, headers=headers)
html = html.content.decode('utf-8')
doc = etree.HTML(html)
res = doc.xpath('/html/body/div[5]/ul/text()')
print('*-*--'*20)
for item in res:
    print(type(item))
    print(item[0])
print('*-*--'*20)

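The problem was the XPath expression. /html/body/div[5]/ul/text() selects only the text nodes that are direct children of the ul element, not the text inside its li/a descendants, so it does not return the link text that was wanted. An absolute path with a positional step such as div[5] is also brittle, because it stops matching as soon as the page layout shifts. On top of that, print(item[0]) prints only the first character of each result string.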
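The difference between the two selectors can be reproduced offline. The sketch below is illustrative only: SAMPLE_HTML is invented markup, not the real page (whose structure the article does not show), and div[1] stands in for the original div[5].

```python
from lxml import etree

# Hypothetical markup, invented for this example; the real page may differ.
SAMPLE_HTML = """
<html><body>
  <div>
    <ul>
      <li><a href="#">icon</a><a href="#">First shop</a></li>
      <li><a href="#">icon</a><a href="#">Second shop</a></li>
    </ul>
  </div>
</body></html>
"""

doc = etree.HTML(SAMPLE_HTML)

# Text nodes that are direct children of <ul>: at most the whitespace
# between the <li> elements, never the link text itself.
direct_text = doc.xpath('/html/body/div[1]/ul/text()')

# Text of the second <a> inside each <li>: the strings that were wanted.
link_text = doc.xpath('.//div/ul/li/a[2]/text()')

# Indexing a result string with [0] keeps only its first character.
first_chars = [text[0] for text in link_text]

print(repr(direct_text))
print(link_text)
print(first_chars)
```

Printing repr() of the results, as done here, makes whitespace-only matches visible instead of showing up as blank lines.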

Solution

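After clarifying the requirement, the XPath was changed to .//div/ul/li/a[2]/text(), which selects the text of the second a element in each list item, and the loop now prints item rather than item[0]. With these changes the script produced the expected text.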

from lxml import etree
import requests

url = "http://zw.hainan.gov.cn/wssc/emalls.html"
# Send a browser-like User-Agent with the request.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"}
response = requests.get(url, headers=headers)
html = response.content.decode('utf-8')
doc = etree.HTML(html)
# Select the text of the second <a> in each <li> of any <ul> directly under a <div>.
res = doc.xpath('.//div/ul/li/a[2]/text()')
print('*-*--'*20)
for item in res:
    print(type(item))
    print(item)  # print the whole string, not just item[0]
print('*-*--'*20)
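One way to check an XPath like this without requesting the site each time is to keep parsing separate from fetching. The following is a minimal sketch under stated assumptions: extract_link_texts is a hypothetical helper name, and SAMPLE_HTML is made-up markup rather than the real page.

```python
from lxml import etree


def extract_link_texts(html: str, xpath: str = './/div/ul/li/a[2]/text()') -> list[str]:
    """Hypothetical helper: parse an HTML string and return the non-empty,
    whitespace-stripped strings matched by the XPath."""
    doc = etree.HTML(html)
    return [text.strip() for text in doc.xpath(xpath) if text.strip()]


# Invented markup for illustration; the last <li> has only one <a>,
# so a[2] matches nothing there.
SAMPLE_HTML = (
    "<html><body><div><ul>"
    "<li><a href='#'>icon</a><a href='#'> First shop </a></li>"
    "<li><a href='#'>icon</a><a href='#'>Second shop</a></li>"
    "<li><a href='#'>only one link</a></li>"
    "</ul></div></body></html>"
)

titles = extract_link_texts(SAMPLE_HTML)
print(titles)
```

The same function could then be given the decoded response body from the script above.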

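When crawling, remember to send appropriate request headers, such as the browser-like User-Agent used in both scripts above. Some sites reject requests that arrive with a default client User-Agent, or serve them different content.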
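If several pages are requested, the header can be set once on a requests.Session instead of being passed to every call. This is a sketch rather than the author's method: build_session is a hypothetical helper name, and the request is only prepared, not sent, so the resulting headers can be inspected without network access.

```python
import requests

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
)


def build_session(user_agent: str = USER_AGENT) -> requests.Session:
    """Hypothetical helper: a session whose requests all carry the User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = build_session()

# Prepare the request without sending it; the session headers are merged in.
request = requests.Request("GET", "http://zw.hainan.gov.cn/wssc/emalls.html")
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
```

To actually fetch the page, session.get(url) would send the same headers.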

Conclusion

The article identified a Python web‑crawling issue, explained why the original XPath was faulty, and provided corrected code that successfully extracted the desired information.
