
How to Clean Messy CSV Data with Python Pandas: A Step-by-Step Guide

This article walks through cleaning irregular data sent in by a fan (stray spaces, Chinese characters, asterisks, and missing brackets) using Python's pandas library. It provides the full code snippet and shows the resulting DataFrame, so the data is ready for further analysis.

Python Crawling & Data Mining

Hello, I'm Pipi.

Introduction

Recently a fan asked a practical Pandas question in a Python community group. The AI-generated answer did not solve the problem, so I consulted a senior teacher, and this post shares the code the teacher provided.

Implementation Process

The original fan data contains irregularities such as spaces, Chinese characters, asterisks, and missing brackets, which must be cleaned before further analysis.

Raw data example

The teacher's specific code is shown below:

import re
import pandas as pd

result = []
# Open original txt file
with open('data.txt', 'r', encoding='utf-8') as f:
    # Read line by line
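    for line in f:
        # Lines with both brackets are already valid Python literals
        if '[' in line and ']' in line:
            # eval executes the text it is given: only use it on files you trust
            result.append(eval(line))
        # Labelled lines ("数据" = data): skip remarks ("备注" = remark)
        elif '数据' in line:
            if "备注" not in line:
                # Strip Chinese characters, 【】, "!" and any unmatched "]"
                line = re.sub(r"[\u4e00-\u9fa5【】!\]]", "", line).strip()
                result.append(eval(line))
        # Remaining lines without asterisks: drop the unmatched "]"
        elif "*" not in line:
            line = line.replace("]", "")
            result.append(eval(line))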

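# Every three parsed items form one row of the table
res = [result[i:i+3] for i in range(0, len(result), 3)]
df = pd.DataFrame(res)
# Expand columns 1 and 2 in parallel (multi-column explode needs pandas >= 1.3)
df = df.explode([1, 2], ignore_index=True)
# 日期 = date, 股票代码 = stock code
df.columns = ["日期", "股票代码", "data"]
print(df)
# Writing .xlsx requires an Excel engine such as openpyxl
df.to_excel("data.xlsx")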
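
Because eval executes whatever text it is given, a more cautious variant can be useful when the file comes from someone else. The sketch below is not the teacher's code. It assumes every cleaned line is a plain Python literal (a string, number, list, or tuple), and the helper name parse_line and the sample lines are made up for illustration.

```python
import ast
import re

# Chinese characters plus the full-width brackets and "!" that the script strips
NOISE = re.compile(r"[\u4e00-\u9fa5【】!]")


def parse_line(line):
    """Clean one raw line and parse it as a Python literal.

    Hypothetical helper: returns None when the cleaned text is not a literal.
    """
    cleaned = NOISE.sub("", line).strip()
    # A closing bracket without its opening partner is dropped,
    # so the remaining comma-separated items parse as a tuple.
    if "]" in cleaned and "[" not in cleaned:
        cleaned = cleaned.replace("]", "")
    try:
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        return None


# Made-up sample lines, not the fan's real data
print(parse_line("数据【备】['600000', '000001']"))
print(parse_line("'600000', '000001']"))
print(parse_line("*** not data ***"))
```

ast.literal_eval only accepts literals, so a line that is still malformed comes back as None instead of being run as code.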

Running the script produces the expected DataFrame, as shown:

Result screenshot
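
To see what the last few lines of the script do, here is a small sketch on made-up records (the dates, codes, and values are hypothetical, not the fan's data). It assumes, as the column names suggest, that every three parsed items are a date, a list of stock codes, and a list of values of the same length.

```python
import pandas as pd

# Hypothetical parsed items: date, list of stock codes, list of values, repeated
result = [
    "2023-01-03", ["600000", "000001"], [1.5, 2.5],
    "2023-01-04", ["600519"], [3.0],
]

# Group the flat list into rows of three items each
rows = [result[i:i + 3] for i in range(0, len(result), 3)]
df = pd.DataFrame(rows)

# Explode both list columns together so each code/value pair gets its own row
# (multi-column explode needs pandas 1.3 or newer)
df = df.explode([1, 2], ignore_index=True)
df.columns = ["日期", "股票代码", "data"]
print(df)
```

Each date is repeated once per stock code, which is why the two list columns must have the same number of elements in every row.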

Summary

This article presented a practical Pandas problem, detailed the data cleaning steps, and provided a complete code solution to help the fan successfully resolve the issue.

Written by

Python Crawling & Data Mining
