CSV Trimming: A Python Package for Cleaning Messy CSV Files
CSV Trimming is a lightweight Python library that transforms irregular, poorly formatted CSV files into clean, well‑structured tables with a single line of code, supporting basic trimming as well as advanced row‑correlation handling for complex datasets.
CSV Trimming is designed to convert chaotic CSV files, often obtained from websites, legacy systems, or poorly managed data sources, into clean, well-formatted CSVs using just one line of code, without requiring complex configuration or large language models.
Installation
pip install csv_trimming
Basic Usage
import pandas as pd
from csv_trimming import CSVTrimmer
# Load your CSV into a DataFrame
csv = pd.read_csv("path/to/csv.csv")
# Instantiate the trimmer
trimmer = CSVTrimmer()
# And trim it
trimmed_csv = trimmer.trim(csv)
# That's it!
The package can clean a messy input CSV such as the example shown, removing stray symbols, empty cells, and misaligned rows, and producing a tidy table with only the relevant columns.
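To make the cleanup described above more concrete, here is a minimal standard-library sketch of that kind of trimming: strip stray symbols and whitespace from each cell, then drop the rows and columns that end up empty. It is an illustration only, not the implementation used by csv_trimming. The function name `naive_trim`, the `STRAY_CHARS` set, and the sample data are all assumptions made for this example.

```python
import csv
import io

# Hypothetical set of stray characters to strip from cell edges (an assumption).
STRAY_CHARS = " \t#|*"


def naive_trim(rows):
    """Strip stray characters from cells, then drop empty rows and columns."""
    cleaned = [[cell.strip(STRAY_CHARS) for cell in row] for row in rows]
    # Drop rows in which every cell is empty.
    cleaned = [row for row in cleaned if any(row)]
    if not cleaned:
        return []
    # Pad rows to a common width so columns can be inspected safely.
    width = max(len(row) for row in cleaned)
    cleaned = [row + [""] * (width - len(row)) for row in cleaned]
    # Keep only the columns that contain at least one non-empty cell.
    keep = [i for i in range(width) if any(row[i] for row in cleaned)]
    return [[row[i] for i in keep] for row in cleaned]


# Made-up messy input: stray '#' markers, an empty row, and empty columns.
messy = """#,,region,,province
,,,,
##,,Campania,,Caserta
,,Liguria,,Imperia
"""
rows = list(csv.reader(io.StringIO(messy)))
trimmed = naive_trim(rows)
for row in trimmed:
    print(row)
```

A real cleaner has to be more careful than this, for example to avoid stripping characters that are part of legitimate values. The package handles those decisions for you.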
Advanced Feature – Row Correlation
When a logical record is split across multiple rows (a common issue in real-world CSVs), CSV Trimming can merge them if you provide a callback that decides whether two consecutive rows are related and, if so, returns the merged row.
from typing import Tuple
import pandas as pd
def simple_correlation_callback(current_row: pd.Series, next_row: pd.Series) -> Tuple[bool, pd.Series]:
    """Return the correlation between two rows."""
    # All of the rows that have a subsequent correlated row are
    # non-empty, and the subsequent correlated rows always have
    # the first cell empty.
    if pd.isna(next_row.iloc[0]) and all(pd.notna(current_row)):
        # Merge: keep the current row and append the last cell of the
        # next row under a new "surname" column.
        return True, pd.concat([
            current_row,
            pd.Series({"surname": next_row.iloc[-1]}),
        ])
    # Not correlated: return the current row unchanged.
    return False, current_row
trimmer = CSVTrimmer(simple_correlation_callback)
result = trimmer.trim(csv)
Using this callback, the library merges split rows and produces a final CSV where each logical record occupies a single row, as demonstrated by the before-and-after tables in the original article.
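The callback contract (look at the current row and the next one, then return whether they belong together plus the merged row) can also be illustrated without pandas. The sketch below is an assumption about how such a callback might be driven, not the library's actual merging loop. `merge_correlated_rows` and `continuation_callback` are hypothetical names, rows are plain dictionaries, and the sample data is made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Callback = Callable[[Row, Row], Tuple[bool, Row]]


def continuation_callback(current: Row, nxt: Row) -> Tuple[bool, Row]:
    """Treat `nxt` as a continuation if its first cell is empty and `current` is complete."""
    first_key = next(iter(nxt))
    if nxt[first_key] is None and all(v is not None for v in current.values()):
        # Append the last cell of the continuation row as a new "surname" field.
        last_value = list(nxt.values())[-1]
        return True, {**current, "surname": last_value}
    return False, current


def merge_correlated_rows(rows: List[Row], callback: Callback) -> List[Row]:
    """Walk the rows pairwise, merging a row with its successor when the callback says so."""
    merged: List[Row] = []
    i = 0
    while i < len(rows):
        current = rows[i]
        if i + 1 < len(rows):
            related, combined = callback(current, rows[i + 1])
            if related:
                merged.append(combined)
                i += 2  # Skip the continuation row that was just consumed.
                continue
        merged.append(current)
        i += 1
    return merged


rows = [
    {"region": "Campania", "province": "Caserta"},
    {"region": None, "province": "Ferrero"},
    {"region": "Liguria", "province": "Imperia"},
    {"region": None, "province": "Conti"},
]
result = merge_correlated_rows(rows, continuation_callback)
for row in result:
    print(row)
```

This sketch merges at most one continuation row per record. Records split across three or more rows would need the loop to keep consuming rows for as long as the callback reports a match.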
For more details and the source code, visit the project repository: https://github.com/LucaCappelletti94/csv_trimming