How to Clean Messy CSV Data with Python Pandas: A Step-by-Step Guide
This article walks through cleaning a reader's irregular data (stray spaces, Chinese characters, asterisks, and missing brackets) using Python's pandas library. It provides the full code, describes the resulting DataFrame, and shares practical tips for preparing data before analysis.
Hello, I'm Pipi.
Introduction
Recently I asked a practical pandas question in a Python community group. The AI-generated answer was unsatisfactory, so I consulted a senior teacher; the code they provided is shared here.
Implementation Process
The fan's original data contains irregularities such as stray spaces, Chinese characters, asterisks, and missing brackets, all of which must be cleaned before further analysis.
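To make these irregularities concrete, here is a minimal sketch that cleans a few made-up lines. The sample strings, the `NOISE` pattern, and the `clean_line` helper are assumptions for illustration, since the real file's layout is not shown in this article. It also parses with `ast.literal_eval` instead of `eval`, so it is a safer variation rather than the teacher's method.

```python
import ast
import re

# Characters treated as noise (an assumption): CJK ideographs,
# full-width brackets, and asterisks.
NOISE = re.compile(r"[\u4e00-\u9fa5【】*]")


def clean_line(line):
    """Strip noise from one raw line and parse it into a Python value."""
    text = NOISE.sub("", line).strip()
    # Repair a missing closing bracket, e.g. "[1, 2" -> "[1, 2]".
    if text.startswith("[") and not text.endswith("]"):
        text += "]"
    # literal_eval accepts only Python literals, so it cannot execute code.
    return ast.literal_eval(text)


# Hypothetical sample lines, one per kind of irregularity.
samples = [
    " ['000001', '000002'] \n",  # surrounding spaces
    "数据[1.5, 2.5\n",            # Chinese prefix and missing bracket
    "*'2023-01-03'*\n",          # asterisks around a quoted date
]
cleaned = [clean_line(s) for s in samples]
print(cleaned)
```

Each sample comes back as an ordinary Python list or string, ready to be collected into rows for a DataFrame.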
The teacher's specific code is shown below:
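import re
import pandas as pd

result = []
# Open the original txt file
with open('data.txt', 'r', encoding='utf-8') as f:
    # Read line by line
    for line in f:
        if '[' in line and ']' in line:
            # Well-formed list line: parse it directly
            # (eval runs arbitrary code, so use it only on trusted files)
            result.append(eval(line))
        elif '数据' in line:  # '数据' means "data"
            if "备注" not in line:  # skip lines containing '备注' ("remarks")
                # Strip Chinese characters, 【】, "!" and "]" before parsing
                line = re.sub(r"[\u4e00-\u9fa5【】!\]]", "", line).strip()
                result.append(eval(line))
        elif "*" not in line:
            # Skip asterisk lines; drop a stray closing bracket from the rest
            line = line.replace("]", "")
            result.append(eval(line))
# Group the parsed values into rows of three
res = [result[i:i+3] for i in range(0, len(result), 3)]
df = pd.DataFrame(res)
# Expand the list-valued columns 1 and 2 together (needs pandas >= 1.3)
df = df.explode([1, 2], ignore_index=True)
df.columns = ["日期", "股票代码", "data"]  # "date", "stock code", data
print(df)
df.to_excel("data.xlsx")
Running the script produces the expected DataFrame.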
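The reshaping at the end of the script (grouping values in threes, then calling `explode`) may be easier to follow on in-memory data. The sketch below assumes each record is a date followed by two equal-length lists; the sample values and the English column names are made up for illustration and are not taken from the fan's file.

```python
import pandas as pd

# Hypothetical parsed values: a date, a list of codes, a list of numbers,
# repeated once per record.
result = [
    "2023-01-03", ["000001", "000002"], [10.5, 20.1],
    "2023-01-04", ["000003"], [7.8],
]

# Group the flat list into rows of three: [date, codes, values].
rows = [result[i:i + 3] for i in range(0, len(result), 3)]
df = pd.DataFrame(rows)

# Explode both list columns in lockstep; the lists in each row must have
# the same length. Passing a list of columns requires pandas >= 1.3.
df = df.explode([1, 2], ignore_index=True)
df.columns = ["date", "stock_code", "data"]
print(df)
```

Each element of the list columns becomes its own row, and the date is repeated alongside it, which gives the long format that most later analysis expects.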
Summary
This article presented a practical Pandas problem, detailed the data cleaning steps, and provided a complete code solution to help the fan successfully resolve the issue.
