Master Regex Extraction in Python Web Scraping: From Numbers to Strings

This article walks through a Python web‑scraping issue, shows how to clean scraped data using regular expressions, provides sample code for extracting both floating‑point and integer numbers from mixed strings, and offers a concise solution that readers can apply to their own projects.

Python Crawling & Data Mining

Introduction

Hello, I am PiPi. A member of a Python community recently asked about a web-scraping problem, and an instructor provided the following code:

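import requests
from lxml import etree

url = "http://zw.hainan.gov.cn/wssc/ec/jlyhnkj.html"
resp = requests.get(url)
text = resp.text
# Parse the HTML and select the text of the fourth div in each product entry
parse = etree.HTML(text)
price = parse.xpath("//div[@class='productlist']/ul/li/div[4]/text()")
# Trim each entry and drop the ones that are empty after trimming
price = [i.strip() for i in price if i.strip()]
print(price)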

Readers then asked how to process the extracted data with regular expressions. The approach is shared below.

Implementation

Suppose a scraped string looks like "身高180.3cm" ("height 180.3cm") and only the numeric part, 180.3, is needed. A basic regex for a floating-point number is:

\d+\.\d+
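
To see what this basic pattern does and does not match, here is a minimal sketch using only the standard `re` module. The sample strings come from the examples in this article; the variable names are illustrative and not part of the original code.

```python
import re

# Basic float pattern: one or more digits, a literal dot, one or more digits
FLOAT_PATTERN = re.compile(r'\d+\.\d+')

with_decimal = "身高180.3cm"
without_decimal = "higt180cm"

# search() returns a match object, or None when nothing matches
m = FLOAT_PATTERN.search(with_decimal)
float_text = m.group() if m else None
print(float_text)  # 180.3

# An integer has no decimal point, so the float-only pattern finds nothing
int_match = FLOAT_PATTERN.search(without_decimal)
print(int_match)  # None
```

The second result is why a float-only pattern is not enough when the scraped values may also be whole numbers.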

The pattern was later extended so that it captures integers as well as floats:

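import re

# Sample strings mixing text and numbers (身高 = height, 厘米 = centimeters)
d = ["身高180.3cm", "身高180.3", "身高180.3厘米", "higt180.3cm", "higt180.3厘米", "身高180.3cm", "higt180cm"]
for s in d:
    # Try the float alternative first, then fall back to a plain integer
    r = re.findall(r'\d+\.\d+|\d+', s)
    print(r)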

With this small change the pattern matches both decimals and plain integers. It does not handle signs, thousands separators, or scientific notation.
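
If the goal is numeric values rather than strings, the matches can be converted after extraction. The sketch below is one possible way to do that; `extract_numbers` and `NUMBER_PATTERN` are hypothetical names introduced here, not part of the original code. It also illustrates why the float alternative is listed first: alternation tries branches left to right, so putting `\d+` first splits 180.3 into two integers.

```python
import re

# Same pattern as above: float alternative first, integer fallback second
NUMBER_PATTERN = re.compile(r'\d+\.\d+|\d+')

def extract_numbers(text):
    """Return the numbers found in text as float (if decimal) or int."""
    return [float(tok) if '.' in tok else int(tok)
            for tok in NUMBER_PATTERN.findall(text)]

print(extract_numbers("身高180.3厘米"))  # [180.3]
print(extract_numbers("higt180cm"))      # [180]

# Reversed alternation: the integer branch wins before the float is tried
reversed_result = re.findall(r'\d+|\d+\.\d+', "身高180.3cm")
print(reversed_result)  # ['180', '3']
```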

Conclusion

The article presented a Python web‑crawling issue, demonstrated how to clean the scraped data with regular expressions, and provided a reusable solution for extracting numeric values from mixed strings, helping readers solve similar problems efficiently.
