How to Extract and Convert Price Ranges with Python Regex and Pandas
This article demonstrates how to use Python regular expressions together with pandas to parse complex price‑range strings, extract numeric values, normalize missing bounds, split them into separate columns, and optionally convert the results to numeric types for further analysis.
1. Introduction
The starting point is a pandas DataFrame column containing price‑range strings such as "R32 ($16,500,00.01 to $20,000,00)", from which the numeric minimum and maximum values need to be extracted. The author's first extraction attempt produced a single column where four columns were expected.
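Before building the DataFrame pipeline, it can help to see what the regular expression returns for each kind of input. The following is a minimal sketch using only the standard library; the names AMOUNT, samples, and matches are illustrative and not part of the original code.

```python
import re

# Same pattern the article uses: a dollar sign followed by digits, commas, and dots.
AMOUNT = re.compile(r'\$([0-9,.]+)')

samples = [
    'R32 ($16,500,00.01 to $20,000,00)',
    'R43 ($5,000,000.00 to $8,000,000.50)',
    'R15 (below $1,000,000)',
]

# findall returns the captured group for every match, so each row yields a list.
matches = [AMOUNT.findall(s) for s in samples]
for s, m in zip(samples, matches):
    print(s, '->', m)
```

Rows with a full range yield two strings, while the "below" row yields only one. That is why the implementation pads single-amount rows before splitting them into columns.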
2. Implementation
Using re.findall to capture the dollar amounts, the solution first creates a temporary column with the extracted numbers, pads missing values with a zero, removes commas, joins the numbers with a delimiter, and finally splits the string into two separate columns for min_price and max_price. The full code is shown below:
import re
import pandas as pd

df = pd.DataFrame({'price_range': ['R32 ($16,500,00.01 to $20,000,00)',
                                   'R43 ($5,000,000.00 to $8,000,000.50)',
                                   'R15 (below $1,000,000)']})
# extract every dollar amount as a string; each row becomes a list
df['temp'] = df['price_range'].map(lambda x: re.findall(r'\$([0-9,.]+)', x))
# if only one amount was found, treat it as the maximum and prepend '0' as the minimum
df['temp'] = df['temp'].map(lambda x: ['0'] + x if len(x) == 1 else x)
# join the amounts with '。' (a delimiter that never occurs in the data) and remove commas
df['temp'] = df['temp'].map(lambda x: '。'.join(x).replace(',', ''))
# split on the delimiter into two columns (the values are still strings at this point)
df2 = df['temp'].str.split('。', expand=True)
df2.columns = ['min_price', 'max_price']
print(df2)
An alternative approach uses pd.to_numeric to convert the extracted strings directly to floating‑point numbers after cleaning commas:
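# reuses re, pd, and df from the first snippet
# extract every dollar amount as a string; each row becomes a list
df['temp'] = df['price_range'].map(lambda x: re.findall(r'\$([0-9,.]+)', x))
# if only one amount was found, prepend '0' as the missing minimum
df['temp'] = df['temp'].map(lambda x: ['0'] + x if len(x) == 1 else x)
# expand the two-element lists into two columns
# (str.split cannot be used here: temp holds lists, not strings, so it would return NaN)
df2 = pd.DataFrame(df['temp'].tolist(), index=df.index)
# remove commas, then convert each column to a numeric dtype
df2 = df2.replace({',': ''}, regex=True).apply(pd.to_numeric)
df2.columns = ['min_price', 'max_price']
print(df2)
3. Conclusion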
The provided snippets successfully parse the price‑range strings, handle cases with a single amount by inserting a zero for the missing bound, and produce a clean DataFrame with separate numeric columns for minimum and maximum prices, ready for further analysis.
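For reuse, the same steps can be gathered into a single function. This is only a sketch under stated assumptions: parse_price_range is a hypothetical helper, not part of the original code, and it keeps the article's convention that a lone amount is an upper bound with 0 as the minimum. Returning NaN for rows with no amount or more than two amounts is an added assumption.

```python
import re
import pandas as pd

AMOUNT = re.compile(r'\$([0-9,.]+)')

def parse_price_range(text):
    """Return (min_price, max_price) as floats. Hypothetical helper for illustration."""
    # Strip thousands separators and convert every captured amount to float.
    amounts = [float(a.replace(',', '')) for a in AMOUNT.findall(text)]
    if len(amounts) == 1:
        # Convention carried over from the article: a single amount is the maximum.
        amounts = [0.0] + amounts
    if len(amounts) != 2:
        # Assumption: mark unexpected rows as missing instead of guessing.
        return (float('nan'), float('nan'))
    return tuple(amounts)

df = pd.DataFrame({'price_range': ['R32 ($16,500,00.01 to $20,000,00)',
                                   'R43 ($5,000,000.00 to $8,000,000.50)',
                                   'R15 (below $1,000,000)']})

# Map each string to a (min, max) tuple, then expand the tuples into two columns.
parsed = df['price_range'].map(parse_price_range)
result = pd.DataFrame(parsed.tolist(), columns=['min_price', 'max_price'], index=df.index)
print(result)
```

Note that padding with zero is only appropriate for "below" style rows. A row phrased as a lower bound (for example, one starting with "above") would need a different rule, which this sketch does not attempt.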