How to Split and Extract Numeric Data from Complex Excel Columns Using Pandas
This article walks through a real‑world Python data‑processing problem where a messy Excel column is parsed into separate numeric fields, presenting two Pandas solutions—one using regular expressions and another using string splitting—along with complete code examples and practical tips.
1. Introduction
Hello, I am a Python enthusiast. Recently, a member of a Python community asked why a piece of code that extracts numbers from a column sometimes fails.
2. Solution Approach
A mentor suggested a straightforward method, but the code looked confusing. Another participant offered an alternative that avoids regular expressions.
Solution 1: Using Regular Expressions
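import re
import pandas as pd

# '费用明细' ("fee details") holds the raw text; every other column names a fee to extract.
test = pd.read_excel("测试数据.xlsx")
extract_cols = test.columns.drop('费用明细')
for c in extract_cols:
    # Escape the column name so it is matched literally, then capture the first number after it.
    test[c] = test['费用明细'].str.extract(fr'{re.escape(c)}.*?(\d+\.?\d*)', expand=False).astype('float64')
test.to_excel("测试数据-结果.xlsx", index=False)
The above code reads the Excel file and builds the list of target columns by dropping "费用明细" from the column index. For each target column it calls str.extract with a regular expression built from the column name, which captures the first number that follows that name in the "费用明细" text and stores it as a float. Rows where the name does not appear are left as NaN. Finally, the result is saved to a new Excel file.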
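The regex approach can be tried without an Excel file. The sketch below is a minimal illustration under stated assumptions: the sample rows, the English labels (fare, toll, surcharge), and the helper name extract_amounts are all hypothetical, since the article does not show the actual workbook.

```python
import re
import pandas as pd

# Hypothetical sample data; the real workbook's contents are not shown in the article.
df = pd.DataFrame({
    "details": ["fare 12.5元,toll 3元", "fare 8元,surcharge 1.2元"],
    "fare": [None, None],
    "toll": [None, None],
    "surcharge": [None, None],
})

def extract_amounts(frame, source_col):
    """Fill every column except source_col with the first number that
    follows that column's name inside source_col (hypothetical helper)."""
    out = frame.copy()
    for col in out.columns.drop(source_col):
        # re.escape keeps characters such as '(' or '+' in a column name literal.
        pattern = rf"{re.escape(col)}.*?(\d+\.?\d*)"
        # expand=False returns a Series; rows without a match become NaN.
        out[col] = out[source_col].str.extract(pattern, expand=False).astype("float64")
    return out

result = extract_amounts(df, "details")
print(result)
```

One caveat of this pattern: because `.*?` can skip over any text, a label that appears without its own amount would pick up the next number in the cell, so it is worth spot-checking the output against the raw text.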
Solution 2: Splitting Without Regular Expressions
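import pandas as pd

test = pd.read_excel("测试数据.xlsx")
extract_cols = test.columns.drop('费用明细')
# Split each cell on commas and give every "label amount" piece its own row.
test['费用明细-c'] = test['费用明细'].str.split(',')
test = test.explode('费用明细-c')
# Split each piece on the space: label goes to '费用明细-c', amount to '费用明细-d'.
test[['费用明细-c', '费用明细-d']] = test['费用明细-c'].str.split(' ', expand=True)
# Strip the currency unit '元' (yuan) and convert the amount to float.
test['费用明细-d'] = test['费用明细-d'].str.strip('元').astype('float64')
# Normalize every label containing '平台加价' (platform surcharge) to that exact name.
test.loc[test['费用明细-c'].str.contains('平台加价'), '费用明细-c'] = '平台加价'
test = test[test['费用明细-d'].notna()]
# For each original cell, turn its label/amount rows into a single wide row.
testc = test.groupby('费用明细', sort=False)[['费用明细-c', '费用明细-d']].apply(lambda x: x.set_index('费用明细-c').T).reset_index(level=-1, drop=True)
testc.reindex(columns=extract_cols).reset_index().to_excel("测试数据-结果.xlsx", index=False)
This method splits the original column by commas, explodes the resulting lists into one row per piece, and then splits each piece on the space to separate the label from the numeric value. It strips the "元" unit from the value, converts it to float, and drops pieces that have no amount. Finally, it groups by the original "费用明细" text, pivots the label/amount rows back into one row per cell, and reindexes the columns to match the original column order.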
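The split-based idea can also be sketched on in-memory data. Note that this is a variant rather than the participant's exact code: it swaps the groupby/apply/transpose step for pivot_table, and it assumes every piece has the form "label amount元". The sample rows, English labels, and the helper name split_amounts are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the real workbook's contents are not shown in the article.
df = pd.DataFrame({"details": ["fare 12.5元,toll 3元", "fare 8元,surcharge 1.2元"]})

def split_amounts(frame, source_col, item_sep=",", unit="元"):
    """Split 'label amount<unit>' pieces into one numeric column per label
    (hypothetical helper; assumes each piece contains a space)."""
    # One row per piece; exploded rows keep the index of the row they came from.
    pieces = frame[source_col].str.split(item_sep).explode().str.strip()
    # Split at the last space: label on the left, amount on the right.
    parts = pieces.str.rsplit(" ", n=1, expand=True)
    parts.columns = ["label", "amount"]
    # Drop the unit and convert; anything non-numeric becomes NaN and is removed.
    parts["amount"] = pd.to_numeric(parts["amount"].str.rstrip(unit), errors="coerce")
    parts = parts.dropna(subset=["amount"])
    # Expose the original row index as a column so it can serve as the pivot index.
    parts = parts.rename_axis("row").reset_index()
    wide = parts.pivot_table(index="row", columns="label", values="amount", aggfunc="first")
    # Join the wide amounts back onto the original rows by index.
    return frame.join(wide)

result = split_amounts(df, "details")
print(result)
```

Pivoting on the original row index instead of grouping by the cell text means two rows with identical detail strings stay separate, which may or may not matter for the data at hand.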
Both solutions successfully resolved the community member's issue.
If you encounter similar Python data‑processing questions, feel free to join the discussion group for help.
3. Summary
The post presented a Python data‑handling problem and offered two concrete Pandas implementations, one regex‑based and one split‑based, showing how to extract numeric values from a free‑text Excel column and restructure them into separate columns.