
How to Clean Merged Rows in Pandas: Regex Extraction & Fill‑Forward

This article walks through extracting platform, merchant, and account information from merged Excel rows with pandas: pulling the fields out with regular expressions, forward-filling them onto the data rows, and dropping the merged rows, with the complete Python code that produces a clean, structured dataframe.

Python Crawling & Data Mining

Introduction

In a recent Python community discussion, a user asked how to merge and clean data using pandas. The original Excel data contains merged rows that result in NaN values when read into pandas.

The desired output is a tidy table with columns for product, price, payment method, sales region, sales volume, account, platform, and merchant.

Implementation

The required fields (platform, merchant, account) are embedded in the merged rows and can be extracted from the first column with regular expressions. The extracted values are then forward-filled so that each data row inherits the values from the merged row above it. Finally, the merged rows themselves, which still hold NaN in the remaining columns, are dropped.
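As a minimal sketch of the extraction step, the snippet below applies the same three patterns to a small hand-made Series. The sample strings are invented for illustration; the article does not show the raw file, so the exact layout of the merged-row text (field order, half-width colons and commas) is an assumption.

```python
import pandas as pd

# Hypothetical first-column values: one merged row followed by two product rows.
col = pd.Series([
    "平台:淘宝,商户:小店A,账号:1001",  # merged row: platform, merchant, account
    "苹果",
    "香蕉",
])

# With a single capture group and expand=False, str.extract returns a Series.
# Rows where the pattern does not match come back as NaN.
account = col.str.extract(r"账号:(\d+)", expand=False)
platform = col.str.extract(r"平台:(.*?),", expand=False)
merchant = col.str.extract(r"商户:(.*?),", expand=False)

extracted = pd.DataFrame(
    {"account": account, "platform": platform, "merchant": merchant}
)
print(extracted)
```

Only the merged row yields values; the product rows are left as NaN, which is what makes the later forward-fill necessary. Note that the platform and merchant patterns require a trailing comma, so they would not match a field placed last in the string.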

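The two key calls are ffill(), which propagates the last extracted value down to the rows below it (the older spelling fillna(method='ffill') is deprecated in pandas 2.1 and later), and dropna(), which by default removes every row that contains at least one NaN.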
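The interplay of the two calls can be sketched on a toy frame. The data and the English column name account are made up for this example; only ffill() and dropna() themselves come from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is a merged row (text in column 0, NaN elsewhere).
df = pd.DataFrame({
    0: ["账号:1001", "苹果", "香蕉"],
    1: [np.nan, 5.0, 3.0],
})

# The account number is only found on the merged row...
df["account"] = df[0].str.extract(r"账号:(\d+)", expand=False)
# ...so carry it down to the product rows below it.
df["account"] = df["account"].ffill()

# dropna() defaults to how="any": the merged row still has NaN in column 1,
# so it is removed, while the product rows are now complete and survive.
clean = df.dropna().reset_index(drop=True)
print(clean)
```

The order matters: dropping NaN rows before forward-filling would discard the product rows too, because their extracted column is still empty at that point.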

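import pandas as pd

# Read the Excel file without treating any row as a header
df = pd.read_excel('20230812.xlsx', header=None)

# Remove duplicate rows and renumber the index
df = df.drop_duplicates(ignore_index=True)

# Extract account (账号), platform (平台), and merchant (商户) from the merged rows
# in column 0, then forward-fill so each data row inherits the values above it.
# Series.ffill() replaces fillna(method='ffill'), which is deprecated in pandas 2.1+.
df['账号'] = df[0].str.extract(r'账号:(\d+)', expand=False).ffill()
df['平台'] = df[0].str.extract(r'平台:(.*?),', expand=False).ffill()
df['商户'] = df[0].str.extract(r'商户:(.*?),', expand=False).ffill()

# Drop every row that still contains a NaN (the merged rows)
df = df.dropna().reset_index(drop=True)

# Name the columns: product, unit price, payment method, sales region,
# sales volume, account, platform, merchant
df.columns = ['商品','单价','支付方式','销售地','销量','账号','平台','商户']
# Reorder so the extracted fields come first
df = df[['平台','商户','账号','商品','单价','支付方式','销售地','销量']]
print(df)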

The script produces the expected clean dataframe, fulfilling the original request.
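One way to check this without the Excel file is to run the same steps on an in-memory frame. The sketch below wraps them in a helper named clean_merged_rows, which is hypothetical and not part of the original script, and feeds it invented data with two merged blocks. It leaves out the drop_duplicates step and assumes the merged-row text lists platform, merchant, and account in that order.

```python
import numpy as np
import pandas as pd


def clean_merged_rows(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: extract, forward-fill, drop merged rows, rename."""
    df = raw.copy()
    patterns = [
        ("账号", r"账号:(\d+)"),    # account
        ("平台", r"平台:(.*?),"),   # platform
        ("商户", r"商户:(.*?),"),   # merchant
    ]
    for name, pattern in patterns:
        df[name] = df[0].str.extract(pattern, expand=False).ffill()
    # Merged rows still have NaN in columns 1-4, so dropna() removes them.
    df = df.dropna().reset_index(drop=True)
    df.columns = ["商品", "单价", "支付方式", "销售地", "销量", "账号", "平台", "商户"]
    return df[["平台", "商户", "账号", "商品", "单价", "支付方式", "销售地", "销量"]]


nan = np.nan
# Invented sample: two merged rows, each followed by its product rows.
raw = pd.DataFrame([
    ["平台:淘宝,商户:小店A,账号:1001", nan, nan, nan, nan],
    ["苹果", 5.0, "支付宝", "北京", 10],
    ["香蕉", 3.0, "微信", "上海", 20],
    ["平台:京东,商户:小店B,账号:2002", nan, nan, nan, nan],
    ["橙子", 4.0, "支付宝", "广州", 15],
])

result = clean_merged_rows(raw)
print(result)
```

Because each merged row overwrites the forward-filled value, the rows under the second block pick up the second account, platform, and merchant rather than the first.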

Conclusion

This article demonstrates how to clean merged Excel rows in pandas by extracting the embedded information with regular expressions, forward-filling it onto the data rows, and then dropping the merged rows, providing a practical solution for data cleaning tasks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
