How to Fix Common XPath Errors in Python Web Scraping: A Step-by-Step Guide

This article walks through a real‑world Python web‑scraping problem, shows why the original XPath selector fails, provides corrected code with a working XPath expression, and highlights best practices such as adding request headers for reliable crawling.

Python Crawling & Data Mining

1. Introduction

The author shares a question from a Python community about an XPath selector that returns incorrect results when crawling a jokes website.

2. Problem Code

The original script uses requests and lxml.etree with an XPath expression that does not match the desired elements, leading to unexpected output.

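from lxml import etree
import requests

url = "http://www.xiaohua.com/duanzi/"
resp = requests.get(url)  # sent with the default requests headers (no custom User-Agent)
html = etree.HTML(resp.text)
print('*---*' * 20)
# Absolute path from /html: every @class test is an exact string match,
# and [*] keeps only the one-cont divs that have at least one child element.
result = html.xpath("/html/body/div[@class='main']/div[@class='content']/div[@class='grid clearfix']/div[@class='content-left']/div[@class='one-cont'][*]/p[@class='fonts']")
print(type(result))
print(result)
print('*-*' * 20)
# Print at most the first two matched elements.
b = 0
for i in result:
    b += 1
    print(i, len(result))
    print(b, etree.tostring(i).decode('utf-8'))
    if b > 1:
        break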

Running the script does not print the expected joke paragraphs, which confirms that the XPath expression does not match the page as written.
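One way to find where a long absolute XPath stops matching is to evaluate it one step at a time and watch the match count. The sketch below does that against a small hypothetical HTML sample (the markup, including the extra class on the grid div, is an assumption for illustration and may not reflect the real page). The helper name count_matches_per_step is also made up for this example.

```python
from lxml import etree


def count_matches_per_step(tree, steps):
    """Return (partial_xpath, match_count) for each growing prefix of the path."""
    report = []
    path = ""
    for step in steps:
        path += "/" + step
        report.append((path, len(tree.xpath(path))))
    return report


# Hypothetical markup for illustration only; the real page may differ.
SAMPLE = """
<html><body>
  <div class="main">
    <div class="content">
      <div class="grid clearfix extra">
        <div class="content-left">
          <div class="one-cont"><p class="fonts">joke one</p></div>
        </div>
      </div>
    </div>
  </div>
</body></html>
"""

# The same steps as the problem XPath, split at each "/".
STEPS = [
    "html",
    "body",
    "div[@class='main']",
    "div[@class='content']",
    "div[@class='grid clearfix']",
    "div[@class='content-left']",
    "div[@class='one-cont'][*]",
    "p[@class='fonts']",
]

tree = etree.HTML(SAMPLE)
report = count_matches_per_step(tree, STEPS)
for path, n in report:
    # The first step whose count drops to 0 is the one to inspect.
    print(n, path)
```

In this sample the count drops to zero at the div[@class='grid clearfix'] step, because @class='...' compares the whole attribute string and the sample div carries an extra class. On a live page the failing step could be a different one, so the same check would need to be run against the downloaded HTML.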

3. Solution

A community member provided a revised XPath and minor adjustments, allowing the script to extract the joke text correctly.

Running the revised script now yields the expected joke text.
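The revised script is not reproduced in this write-up, so the following is only a minimal sketch of one common way to make such a selector more tolerant, not the community member's actual expression. It assumes the jokes still sit in p elements with class fonts inside div elements with class one-cont, as in the original XPath. The names JOKE_XPATH and extract_jokes and the sample markup are hypothetical.

```python
from lxml import etree

# Assumed selector: a relative path with contains() instead of a full
# absolute path with exact class matches. Note that contains() is a
# substring test, so it would also match a class such as "one-content".
JOKE_XPATH = "//div[contains(@class, 'one-cont')]//p[contains(@class, 'fonts')]"


def extract_jokes(page_source):
    """Return the stripped text of every paragraph matched by JOKE_XPATH."""
    tree = etree.HTML(page_source)
    jokes = []
    for p in tree.xpath(JOKE_XPATH):
        # itertext() also collects text nested in child tags such as <a>.
        text = "".join(p.itertext()).strip()
        if text:
            jokes.append(text)
    return jokes


# Hypothetical markup for illustration only; the real page may differ.
SAMPLE = """
<html><body>
  <div class="content-left">
    <div class="one-cont"><p class="fonts"><a href="#">first joke</a></p></div>
    <div class="one-cont last"><p class="fonts"> second joke </p></div>
  </div>
</body></html>
"""

jokes = extract_jokes(SAMPLE)
print(jokes)
```

A relative path like this keeps working when wrapper elements above the target change, at the cost of being less specific. If it matches too much on the real page, an extra ancestor step can be added back.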

Another participant shared notes summarizing the fix.

4. Conclusion

The article demonstrates how to diagnose and correct XPath issues in Python web crawlers and reminds readers to set appropriate request headers and follow good scraping practices.
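As a rough illustration of the request-header advice, the sketch below wraps requests in a small helper that sends browser-like headers, sets a timeout, and fails loudly on HTTP errors. The header values, the 10-second timeout, and the helper name fetch_html are assumptions chosen for the example, not settings taken from the original discussion.

```python
import requests

# Illustrative header values (assumptions); adjust them for the target site.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


def fetch_html(url, timeout=10, session=None):
    """Download a page with explicit headers and return its decoded text."""
    http = session or requests.Session()
    resp = http.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    # When no charset is declared, requests may fall back to ISO-8859-1 for
    # text responses; use the encoding detected from the body in that case.
    if resp.encoding is None or resp.encoding.lower() == "iso-8859-1":
        resp.encoding = resp.apparent_encoding
    return resp.text
```

The returned string can be passed to etree.HTML() in place of resp.text from the original script. Whether a given site actually requires these headers has to be checked case by case.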

Written by

Python Crawling & Data Mining
