How to Clean Merged Rows in Pandas: Regex Extraction & Fill‑Forward
This article walks through extracting platform, merchant, and account information from merged Excel rows using pandas, applying regex, dropping empty rows, forward‑filling missing values, and presenting the complete Python code to produce a clean, structured dataframe.
Introduction
In a recent Python community discussion, a user asked how to merge and clean data using pandas. The original Excel data contains merged rows that result in NaN values when read into pandas.
The desired output is a tidy table with columns for product, price, payment method, sales region, sales volume, account, platform, and merchant.
Implementation
The required fields (platform, merchant, account) are embedded in the text of the merged rows and can be extracted with regular expressions. The extracted values are then forward‑filled onto the product rows below, and the merged rows themselves, which hold NaN in every other column, are dropped.
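To see the extraction step in isolation, here is a minimal sketch on a hand-made Series. The sample strings and their `key:value,` layout are assumptions made for illustration. The real merged cells may use different separators, in which case the patterns would need adjusting.

```python
import pandas as pd

# Hypothetical first-column values: one merged cell followed by ordinary product rows.
# 平台 = platform, 商户 = merchant, 账号 = account
col = pd.Series([
    "平台:Taobao,商户:ShopA,账号:10001",
    "apple",
    "banana",
])

# str.extract returns the first capture group, or NaN where the pattern does not match.
account = col.str.extract(r"账号:(\d+)", expand=False)
platform = col.str.extract(r"平台:(.*?),", expand=False)

print(account.tolist())
print(platform.tolist())
```

Only the merged row yields a value. The product rows come back as NaN, which is why the values need to be propagated downward afterwards.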
Key steps include using ffill() to propagate the previous value downward (the older spelling fillna(method='ffill') is deprecated since pandas 2.1) and dropna() to remove the merged rows.
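The two steps can be tried on a small in-memory frame before touching the real workbook. This is only a sketch: the data is invented, and it uses two columns instead of the five in the original sheet.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows 0 and 3 stand in for merged rows
# (text in column 0, NaN in the other column).
df = pd.DataFrame({
    0: ["账号:10001", "apple", "banana", "账号:10002", "cherry"],
    1: [np.nan, 3.5, 2.0, np.nan, 9.9],
})

# Extract the account on merged rows, then carry it down to the product rows below.
df["账号"] = df[0].str.extract(r"账号:(\d+)", expand=False).ffill()

# Merged rows still have NaN in column 1, so dropna() removes exactly those rows.
clean = df.dropna().reset_index(drop=True)
print(clean)
```

The order matters: forward-filling must happen before dropna(), otherwise the merged rows would be removed before their values could be copied to the rows beneath them.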
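import pandas as pd
# Read the Excel file without a header row, so columns are numbered from 0
df = pd.read_excel('20230812.xlsx', header=None)
# Remove duplicate rows and renumber the index
df = df.drop_duplicates(ignore_index=True)
# Extract account (账号), platform (平台) and merchant (商户) from the merged rows in column 0,
# then forward-fill so each product row inherits the values of the merged row above it.
# .ffill() replaces fillna(method='ffill'), which is deprecated since pandas 2.1.
df['账号'] = df[0].str.extract(r'账号:(\d+)', expand=False).ffill()
df['平台'] = df[0].str.extract(r'平台:(.*?),', expand=False).ffill()
df['商户'] = df[0].str.extract(r'商户:(.*?),', expand=False).ffill()
# Drop the merged rows: they still hold NaN in the remaining columns
df = df.dropna().reset_index(drop=True)
# Set column names: product, unit price, payment method, sales region, sales volume,
# account, platform, merchant
df.columns = ['商品','单价','支付方式','销售地','销量','账号','平台','商户']
# Reorder so that platform, merchant and account come first
df = df[['平台','商户','账号','商品','单价','支付方式','销售地','销量']]
print(df)
The script produces the expected clean dataframe, fulfilling the original request.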
Conclusion
This article demonstrates how to clean Excel data containing merged rows in pandas: extract the embedded information with regular expressions, forward‑fill it onto the rows below, and drop the merged rows, leaving a tidy table ready for analysis.
Python Crawling & Data Mining
