How to Split and Explode Multiple Columns in Pandas to Flatten Excel Data
This article walks through a real‑world Pandas problem where product descriptions and quantities stored as comma‑separated strings in Excel are split and exploded into separate rows, explains why NaN values appear, and shows how converting columns to string resolves the issue.
1. Introduction
Hello everyone, I’m PiPi. A fan asked a Pandas data‑processing question: the Excel file contains a "商品内容" (product description) column and a "数量" (quantity) column where each cell holds multiple values separated by commas, and they need to be expanded into separate rows.
Below is the original code they tried:
jigou_df = pd.read_excel(jigou_path)
data = {'商品内容':jigou_df['商品内容'], '数量':jigou_df['数量']}
df = pd.DataFrame(data)
df_expanded = df.assign(数量=df['数量'].astype(str).str.split(',')).explode('数量')
df_expanded.reset_index(drop=True, inplace=True)
df_expandedThe result was not as expected.
2. Solution Process
A hint was given: first split both columns, then explode them.
Using
df = df.explode(['商品内容', '数量']).reset_index(drop=True)works, but the following complete implementation was provided:
import pandas as pd
# Read the Excel file
df = pd.read_excel("机构订单_202401091514_1.0.xlsx")
# Split the "商品内容" column
df['商品内容'] = df['商品内容'].str.split('、')
# Split the "数量" column
df['数量'] = df['数量'].str.split('、')
# Explode both columns
result = df.explode(['商品内容', '数量']).reset_index(drop=True)
print(result)The output shows the data correctly expanded, but the last two rows appear as nan. This happens because those cells are numeric in the original Excel file, and after splitting they become nan.
Fix: convert the "数量" column to string before splitting:
df['数量'] = df['数量'].astype("string").str.split('、')Alternatively, specify dtype=str when reading the file, or use astype or a lambda conversion during map.
3. Summary
The article demonstrates how to handle a common Pandas issue: splitting multi‑value cells and exploding them into separate rows, and why datatype mismatches can produce nan values. Converting columns to string before splitting resolves the problem.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
