How to Use Pandas for Data Processing and Office Automation
This tutorial introduces Pandas installation, basic DataFrame creation, CSV reading, merging, filtering, chunked processing, value modification, column management, and exporting, providing beginners with practical steps to leverage Pandas for data handling and automation tasks.
Pandas is a powerful data ingestion and processing library widely used in data engineering and data science.
Before using Pandas, make sure Python is installed and the environment you want to work in is activated. Install Pandas with pip install pandas, then optionally create a Jupyter Notebook file with touch tutorial.ipynb.
In the notebook, import the library using import pandas as pd. You can read a CSV file into a DataFrame with people_df = pd.read_csv("./people.csv") or select specific columns using the usecols parameter, e.g., people_df = pd.read_csv("./people.csv", usecols=["first_name", "last_name"]).
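As a minimal sketch of the two read_csv calls above, the snippet below reads from an in-memory string instead of ./people.csv so that it runs anywhere. The sample rows and the age column are assumptions made for illustration, not the tutorial's actual data.

```python
import io

import pandas as pd

# Hypothetical stand-in for people.csv; the real file's columns may differ.
csv_text = """first_name,last_name,age
Ada,Lovelace,36
Alan,Turing,41
"""

# Read every column of the file.
people_df = pd.read_csv(io.StringIO(csv_text))

# Read only the named columns; the others are skipped while parsing.
names_df = pd.read_csv(io.StringIO(csv_text), usecols=["first_name", "last_name"])

print(people_df.shape)
print(list(names_df.columns))
```

Restricting usecols can noticeably reduce memory use on wide files, because the unwanted columns are never loaded.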
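DataFrames can also be created from Python lists or dictionaries, for example: list_df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"]) and dict_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}). Note that a list of lists is read row by row, while a dictionary is read column by column, so these two DataFrames hold the same numbers but are transposes of each other.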
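The two constructors above take the same numbers but lay them out differently, which is easy to miss. The short check below, offered as an illustration rather than as part of the original tutorial, prints column a from each DataFrame.

```python
import pandas as pd

# Each inner list becomes one ROW.
list_df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

# Each dictionary value becomes one COLUMN.
dict_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

list_col_a = list_df["a"].tolist()
dict_col_a = dict_df["a"].tolist()

print(list_col_a)  # [1, 4, 7]
print(dict_col_a)  # [1, 2, 3]
```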
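For merging, the merge() method joins two DataFrames on a common column. The example below splits people_df into its first 500 rows and the remainder, then performs an inner join on genre:
people1_df = people_df[["first_name", "last_name", "genre"]][:500]
people2_df = people_df[["first_name", "last_name", "genre"]][500:]
merged_df = people1_df.merge(people2_df, how="inner", on="genre")
Because genre is not unique, each row in people1_df is paired with every row in people2_df that shares its genre, and the overlapping first_name and last_name columns receive the suffixes _x and _y.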
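To make the behavior of an inner merge on a non-unique key concrete, here is a small sketch with six made-up rows, split at row 3 instead of row 500. The names and genres are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for people.csv.
people_df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cat", "Dan", "Eve", "Fay"],
    "last_name": ["A", "B", "C", "D", "E", "F"],
    "genre": ["rock", "jazz", "rock", "rock", "pop", "rock"],
})

people1_df = people_df[["first_name", "last_name", "genre"]][:3]
people2_df = people_df[["first_name", "last_name", "genre"]][3:]

# Inner join: keep only genres present in both halves.
merged_df = people1_df.merge(people2_df, how="inner", on="genre")

print(len(merged_df))           # 2 rock rows x 2 rock rows = 4
print(list(merged_df.columns))
```

Only rock appears in both halves, so jazz and pop are dropped, and the two rock rows on each side combine into four result rows. On a large file this multiplication can make the merged DataFrame far bigger than either input.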
Filtering rows based on conditions is done with boolean indexing, for example: filtered_df = people_df[(people_df["bot_score"] >= 0.5) & (people_df["bot_score"] <= 0.75)]
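The filter above can be tried on a handful of made-up scores. The sketch below also shows Series.between(), which is inclusive on both ends by default and selects the same rows. The bot_score values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical scores; the real column comes from people.csv.
people_df = pd.DataFrame({"bot_score": [0.1, 0.5, 0.6, 0.75, 0.9]})

# Boolean indexing: each condition needs its own parentheses,
# and conditions are combined with & (not the keyword "and").
filtered_df = people_df[(people_df["bot_score"] >= 0.5) & (people_df["bot_score"] <= 0.75)]

# Equivalent selection using between(), inclusive on both ends by default.
between_df = people_df[people_df["bot_score"].between(0.5, 0.75)]

print(filtered_df["bot_score"].tolist())  # [0.5, 0.6, 0.75]
```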
When handling large datasets, read the CSV in chunks so that only one chunk is held in memory at a time:
mean_chunk_ages = []
for people_chunk_df in pd.read_csv("./people.csv", chunksize=100):
    mean_chunk_ages.append(people_chunk_df["age"].mean())
print("Mean age per chunk:", mean_chunk_ages)
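One caveat with per-chunk means: averaging them does not give the overall mean when the last chunk is shorter than the others. The sketch below uses five made-up ages and a chunk size of 2 to show this, and keeps a running sum and row count as one possible way to get the exact overall mean. The data and the extra variables are assumptions, not part of the original tutorial.

```python
import io

import pandas as pd

# Hypothetical ages standing in for the age column of people.csv.
csv_text = "age\n10\n20\n30\n40\n50\n"

mean_chunk_ages = []
total_age = 0
total_rows = 0

# With chunksize set, read_csv yields DataFrames of up to 2 rows each.
for people_chunk_df in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    mean_chunk_ages.append(float(people_chunk_df["age"].mean()))
    total_age += int(people_chunk_df["age"].sum())
    total_rows += len(people_chunk_df)

overall_mean = total_age / total_rows

print("Mean age per chunk:", mean_chunk_ages)  # [15.0, 35.0, 50.0]
print("Overall mean age:", overall_mean)       # 30.0
```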
To modify values, define a function and apply it to a column with apply():
def capitalize_words(text):
    return " ".join([word.capitalize() if word != "and" else word for word in text.split()])
people_df["genre"] = people_df["genre"].apply(capitalize_words)
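Here is the same function run on two made-up genre strings, as a quick way to see what apply() produces. The sample values are hypothetical.

```python
import pandas as pd


def capitalize_words(text):
    # Capitalize every word except "and", which is left unchanged.
    return " ".join([word.capitalize() if word != "and" else word for word in text.split()])


# Hypothetical genre values; the real ones come from people.csv.
people_df = pd.DataFrame({"genre": ["rock and roll", "hip hop"]})

# apply() calls the function once per value and returns a new Series.
people_df["genre"] = people_df["genre"].apply(capitalize_words)

print(people_df["genre"].tolist())  # ['Rock and Roll', 'Hip Hop']
```

Keep in mind that str.capitalize() lowercases everything after the first character, so a value such as "R&B" would become "R&b".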
Columns can be removed with people_df.drop("random_number", axis=1, inplace=True) or renamed with people_df.rename({"birth_date:date": "birth_date", "bot_score:float": "bot_score"}, axis=1, inplace=True).
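The calls above modify people_df in place. As an alternative sketch, the columns= keyword of drop() and rename() expresses the same operations without axis=1 and returns a new DataFrame, which makes the steps easy to chain. The sample row below is made up.

```python
import pandas as pd

# Hypothetical row using the column names mentioned in the tutorial.
people_df = pd.DataFrame({
    "random_number": [7],
    "birth_date:date": ["1990-01-01"],
    "bot_score:float": [0.42],
})

# drop() and rename() each return a new DataFrame; people_df is unchanged.
cleaned_df = (
    people_df
    .drop(columns=["random_number"])
    .rename(columns={"birth_date:date": "birth_date", "bot_score:float": "bot_score"})
)

print(list(cleaned_df.columns))  # ['birth_date', 'bot_score']
```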
Finally, export the processed DataFrame to a CSV file using people_df.to_csv("./updated_people.csv", index=False). The tutorial emphasizes that Pandas is essential for data processing and office automation tasks.
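To see what index=False does, the sketch below writes a small made-up DataFrame to a temporary directory and reads it back. The temporary path and sample rows are assumptions so that the example leaves no files behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical processed data.
people_df = pd.DataFrame({"first_name": ["Ada", "Alan"], "age": [36, 41]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "updated_people.csv")

    # index=False leaves the row index (0, 1, ...) out of the file.
    people_df.to_csv(csv_path, index=False)

    # Read the file back to confirm what was written.
    round_trip_df = pd.read_csv(csv_path)

print(list(round_trip_df.columns))  # ['first_name', 'age']
```

Without index=False, the index would be written as an extra unnamed first column, which read_csv then loads as a column called "Unnamed: 0".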