How to Fix Common XPath Errors in Python Web Scraping – A Step-by-Step Guide

An experienced Python developer walks through a real-world web‑scraping issue, showing how a faulty XPath selector caused empty results, then provides corrected code, execution screenshots, and best practices like adding request headers, helping readers quickly resolve similar problems.

Python Crawling & Data Mining

Introduction

A member of a Python community asked for help with a web‑scraping selector: their XPath expression did not return the expected results.

Problem

The original code, reproduced below, evaluated the following XPath and printed the result, but the output was incorrect.

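from lxml import etree
import requests

url = "http://www.xiaohua.com/duanzi/"

# Fetch the page and parse it into an lxml element tree
resp = requests.get(url)
html = etree.HTML(resp.text)

print('*---*'*20)

# Absolute XPath that requires every class attribute to match exactly
# (this is the selector under investigation)
result = html.xpath("/html/body/div[@class='main']/div[@class='content']/div[@class='grid clearfix']/div[@class='content-left']/div[@class='one-cont'][*]/p[@class='fonts']")
print(type(result))
print(result)
print('*-*'*20)

# Print the first two matched nodes, if there are any
for b, node in enumerate(result, start=1):
    print(node, len(result))
    print(b, etree.tostring(node).decode('utf-8'))
    if b > 1:
        break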

The issue was identified as an incorrect XPath expression.
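
One way to locate the faulty part of a long absolute XPath is to evaluate it one step at a time and report the first prefix that matches nothing. The sketch below illustrates that idea only. The helper name first_failing_step and the sample HTML are hypothetical, and the sample is not the real page, so it says nothing about which step actually failed on the live site.

```python
from lxml import etree

# Hypothetical markup, loosely modelled on the class names in the XPath above.
# The extra "wide" class is invented to make an exact @class match fail.
SAMPLE_HTML = """
<html><body>
  <div class="main">
    <div class="grid clearfix wide">
      <p class="fonts">sample text</p>
    </div>
  </div>
</body></html>
"""


def first_failing_step(tree, steps):
    """Return (index, path) for the first path prefix that matches nothing.

    Returns None when every prefix matches at least one node.
    """
    path = ""
    for index, step in enumerate(steps):
        path += "/" + step
        if not tree.xpath(path):
            return index, path
    return None


tree = etree.HTML(SAMPLE_HTML)
steps = [
    "html",
    "body",
    "div[@class='main']",
    "div[@class='grid clearfix']",  # exact match, so "grid clearfix wide" is rejected
    "p[@class='fonts']",
]
failing = first_failing_step(tree, steps)
print(failing)
```

Running the same helper against the parsed response from the real page would show where the original expression stops matching, which narrows down what needs to change.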

Solution

A community member provided a corrected XPath and updated code, shown as a screenshot in the original post. After running the revised script, the desired joke text was extracted correctly.
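
The corrected expression itself is not available as text here, so the following is only a sketch of a common way to make such a selector less brittle, not the community member's actual fix: a relative path starting with // combined with contains() on the class attribute. The sample HTML and the function name extract_jokes are assumptions made for illustration.

```python
from lxml import etree

# Hypothetical markup that reuses the class names from the original XPath.
SAMPLE_HTML = """
<html><body>
  <div class="main"><div class="content"><div class="grid clearfix">
    <div class="content-left">
      <div class="one-cont"><p class="fonts"><a href="/detail/1">First joke</a></p></div>
      <div class="one-cont extra"><p class="fonts"><a href="/detail/2">Second joke</a></p></div>
    </div>
  </div></div></div>
</body></html>
"""


def extract_jokes(html_text):
    """Return the text of every p.fonts element found inside a div.one-cont."""
    tree = etree.HTML(html_text)
    # Relative path: no dependency on the full chain of ancestor divs.
    # contains() tolerates additional class names on the same element.
    nodes = tree.xpath(
        "//div[contains(@class, 'one-cont')]/p[contains(@class, 'fonts')]"
    )
    # itertext() gathers text from nested tags; split/join normalizes whitespace.
    return [" ".join("".join(node.itertext()).split()) for node in nodes]


jokes = extract_jokes(SAMPLE_HTML)
print(jokes)
```

Note that contains() is a substring test, so it would also accept a class such as one-cont-ad. Whether that trade-off is acceptable depends on the real page's markup.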

A key point is to send appropriate request headers, such as a browser-like User-Agent, so that the site is less likely to block the request.
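
As a minimal sketch of that advice, the snippet below passes a headers dictionary to requests. The header values are assumptions chosen for illustration and are not taken from the original post. The function fetch_html is a hypothetical helper that is defined but not called here; the request is only prepared, not sent, so the outgoing headers can be inspected without touching the network.

```python
import requests

URL = "http://www.xiaohua.com/duanzi/"  # URL from the original script

# Assumed values: any realistic browser User-Agent string would serve here.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


def fetch_html(url, headers=HEADERS, timeout=10):
    """Download a page with custom headers and return its text."""
    resp = requests.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return resp.text


# Prepare (but do not send) the request to see which headers would go out.
prepared = requests.Request("GET", URL, headers=HEADERS).prepare()
print(prepared.headers["User-Agent"])
```

In the scraping script, html = etree.HTML(fetch_html(URL)) would replace the bare requests.get(url) call. Checking the status code first also helps separate a blocked request from a selector that simply matches nothing.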

Conclusion

The article demonstrates how to diagnose and fix XPath problems in Python web scraping, offering a concrete example and best‑practice tips for reliable data extraction.

Written by

Python Crawling & Data Mining
