
How to Extract and Convert Price Ranges with Python Regex and Pandas

This article demonstrates how to use Python regular expressions together with pandas to parse complex price‑range strings, extract numeric values, normalize missing bounds, split them into separate columns, and optionally convert the results to numeric types for further analysis.

Python Crawling & Data Mining

1. Introduction

The author encountered a pandas DataFrame column containing price‑range strings such as "R32 ($16,500,00.01 to $20,000,00)" and needed to extract the numeric minimum and maximum values. The issue arose because the extraction produced a single column while four columns were expected.

2. Implementation

Using re.findall to capture the dollar amounts, the solution first creates a temporary column with the extracted numbers, pads missing values with a zero, removes commas, joins the numbers with a delimiter, and finally splits the string into two separate columns for min_price and max_price. The full code is shown below:

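import re
import pandas as pd

df = pd.DataFrame({'price_range': ['R32 ($16,500,00.01 to $20,000,00)',
                                   'R43 ($5,000,000.00 to $8,000,000.50)',
                                   'R15 (below $1,000,000)']})
# Extract every dollar amount (digits, commas, and dots after a "$") as a list of strings
df['temp'] = df['price_range'].map(lambda x: re.findall(r'\$([0-9,.]+)', x))
# A single amount is treated as the upper bound, so pad the missing lower bound with '0'
df['temp'] = df['temp'].map(lambda x: ['0'] + x if len(x) == 1 else x)
# Join each list into one string with '。' as the delimiter, then strip the thousands separators
df['temp'] = df['temp'].map(lambda x: '。'.join(x).replace(',', ''))
# Split on the same delimiter into two columns (the values are still strings at this point)
df2 = df['temp'].str.split('。', expand=True)
df2.columns = ['min_price', 'max_price']
print(df2)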
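
To see what each step does without pandas, the same logic can be applied to one string at a time. This is a minimal sketch: parse_bounds and AMOUNT_PATTERN are hypothetical names introduced here for illustration, and the helper assumes each label contains one or two dollar amounts.

```python
import re

# Same pattern as in the snippet above: a "$" followed by digits, commas, or dots.
AMOUNT_PATTERN = re.compile(r'\$([0-9,.]+)')

def parse_bounds(text):
    """Return (min_price, max_price) as strings for one price-range label.

    Hypothetical helper for illustration; not part of the original code.
    """
    amounts = AMOUNT_PATTERN.findall(text)
    # A single amount is read as the upper bound, so the lower bound defaults to '0'.
    if len(amounts) == 1:
        amounts = ['0'] + amounts
    # Strip the thousands separators from every amount.
    return tuple(a.replace(',', '') for a in amounts)

print(parse_bounds('R43 ($5,000,000.00 to $8,000,000.50)'))  # ('5000000.00', '8000000.50')
print(parse_bounds('R15 (below $1,000,000)'))                # ('0', '1000000')
```

A label with no dollar amount at all would return an empty tuple, so such rows would need separate handling before being split into two columns.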

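An alternative approach skips the join-and-split round trip: it expands the extracted lists into two columns, removes the commas, and uses pd.to_numeric to convert the strings to floating‑point numbers:

# Continues from the previous snippet: requires `import re`, `import pandas as pd`, and the `df` defined there
# Extract every dollar amount as a list of strings
df['temp'] = df['price_range'].map(lambda x: re.findall(r'\$([0-9,.]+)', x))
# A single amount is treated as the upper bound, so pad the missing lower bound with '0'
df['temp'] = df['temp'].map(lambda x: ['0'] + x if len(x) == 1 else x)
# Expand each two-item list into its own columns
# (.str.split would return NaN here because the cells hold lists, not strings)
df2 = pd.DataFrame(df['temp'].tolist(), index=df.index, columns=['min_price', 'max_price'])
# Remove the thousands separators, then convert both columns to numbers
df2 = df2.replace({',': ''}, regex=True).apply(pd.to_numeric)
print(df2)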
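
Once the bounds are numeric, they can be used directly in arithmetic and comparisons. The sketch below wraps the steps in a function and derives a range width. The names price_bounds and width are assumptions made for this example, and the function assumes every label yields exactly one or two amounts; any other count would make the DataFrame constructor fail.

```python
import re
import pandas as pd

def price_bounds(series):
    """Hypothetical helper: expand price-range labels into numeric min/max columns."""
    amounts = series.map(lambda s: re.findall(r'\$([0-9,.]+)', s))
    # Pad a missing lower bound with '0' when only one amount is present.
    amounts = amounts.map(lambda x: ['0'] + x if len(x) == 1 else x)
    bounds = pd.DataFrame(amounts.tolist(), index=series.index,
                          columns=['min_price', 'max_price'])
    # Strip thousands separators column by column, then convert to numbers.
    return bounds.apply(lambda col: pd.to_numeric(col.str.replace(',', '', regex=False)))

labels = pd.Series(['R43 ($5,000,000.00 to $8,000,000.50)',
                    'R15 (below $1,000,000)'])
bounds = price_bounds(labels)
# Numeric columns support arithmetic, which string columns would not.
bounds['width'] = bounds['max_price'] - bounds['min_price']
print(bounds.dtypes)
print(bounds)
```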

3. Conclusion

Both snippets parse the price‑range strings and handle a single amount by inserting a zero for the missing lower bound. The first leaves min_price and max_price as strings with the commas removed; the second converts them to numeric columns with pd.to_numeric, which is the form needed for sorting, filtering, or arithmetic in further analysis.

Written by

Python Crawling & Data Mining
