How to Use Pandas for Data Processing and Office Automation

This tutorial introduces Pandas installation, basic DataFrame creation, CSV reading, merging, filtering, chunked processing, value modification, column management, and exporting, providing beginners with practical steps to leverage Pandas for data handling and automation tasks.

Python Programming Learning Circle

Jun 18, 2024

Pandas is a powerful data ingestion and processing library widely used in data engineering and data science.

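Before using Pandas, make sure Python is installed and the environment you want to work in is activated. Install Pandas with pip install pandas. Optionally, create a Jupyter Notebook file with touch tutorial.ipynb; some Jupyter front ends may refuse to open an empty .ipynb file, in which case create the notebook from the Jupyter interface instead.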

In the notebook, import the library using import pandas as pd. You can read a CSV file into a DataFrame with people_df = pd.read_csv("./people.csv") or select specific columns using the usecols parameter, e.g.,

people_df = pd.read_csv("./people.csv", usecols=["first_name", "last_name"])
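As a minimal sketch of the usecols behavior, the snippet below reads a small in-memory CSV instead of ./people.csv. The sample rows are made up for illustration; only the column names first_name, last_name, and age come from this tutorial.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for ./people.csv
csv_text = """first_name,last_name,age
Ada,Lovelace,36
Alan,Turing,41
"""

# Read every column
full_df = pd.read_csv(io.StringIO(csv_text))

# Read only the two name columns; the age column is not loaded
names_df = pd.read_csv(io.StringIO(csv_text), usecols=["first_name", "last_name"])

print(list(full_df.columns))   # ['first_name', 'last_name', 'age']
print(list(names_df.columns))  # ['first_name', 'last_name']
```

Loading only the columns you need keeps the DataFrame smaller, which matters more as the file grows.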

DataFrames can also be created from Python lists or dictionaries, for example:

list_df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

and

dict_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
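One detail worth checking: the two constructors above do not produce the same table. A list of lists is read row by row, while a dictionary is read column by column, so the results are transposes of each other. A short sketch using the same values:

```python
import pandas as pd

# Each inner list is one ROW
list_df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

# Each dictionary value is one COLUMN
dict_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

print(list_df["a"].tolist())  # [1, 4, 7]
print(dict_df["a"].tolist())  # [1, 2, 3]

# Transposing the values of one gives the values of the other
same_values = (list_df.to_numpy().T == dict_df.to_numpy()).all()
print(same_values)  # True
```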

To merge, call merge() to join two DataFrames on a shared column. The example below splits people_df into two parts and inner-joins them on genre:

people1_df = people_df[["first_name", "last_name", "genre"]][:500]
people2_df = people_df[["first_name", "last_name", "genre"]][500:]
merged_df = people1_df.merge(people2_df, how="inner", on="genre")
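To see what an inner merge on a repeated key does, here is a small sketch with made-up rows (the names and genre values are hypothetical). Each row on the left is paired with every row on the right that shares its genre, and columns that exist on both sides get the default _x and _y suffixes.

```python
import pandas as pd

left_df = pd.DataFrame({
    "first_name": ["Ada", "Alan"],
    "genre": ["jazz", "rock"],
})
right_df = pd.DataFrame({
    "first_name": ["Grace", "Linus", "Guido"],
    "genre": ["jazz", "jazz", "pop"],
})

merged_df = left_df.merge(right_df, how="inner", on="genre")

# "jazz" matches 1 left row x 2 right rows; "rock" and "pop" have no partner
print(merged_df)
print(list(merged_df.columns))  # ['first_name_x', 'genre', 'first_name_y']
```

On a key with many repeated values, the number of result rows can grow quickly, so it is worth checking len(merged_df) after such a merge.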

Filtering rows based on conditions is done with boolean indexing, for example:

filtered_df = people_df[(people_df["bot_score"] >= 0.5) & (people_df["bot_score"] <= 0.75)]
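The same range filter can also be written with Series.between, which includes both endpoints by default. The sketch below uses made-up bot_score values and checks that both forms select the same rows.

```python
import pandas as pd

# Hypothetical scores for illustration
scores_df = pd.DataFrame({"bot_score": [0.10, 0.50, 0.60, 0.75, 0.90]})

# Boolean indexing with two conditions combined by &
mask_df = scores_df[(scores_df["bot_score"] >= 0.5) & (scores_df["bot_score"] <= 0.75)]

# Equivalent form: between() is inclusive on both ends by default
between_df = scores_df[scores_df["bot_score"].between(0.5, 0.75)]

print(between_df["bot_score"].tolist())  # [0.5, 0.6, 0.75]
print(mask_df.equals(between_df))        # True
```

In the boolean-indexing form, the parentheses around each comparison are required because & binds more tightly than >= and <=.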

When handling large datasets, read the CSV in chunks using:

mean_chunk_ages = []
for people_chunk_df in pd.read_csv("./people.csv", chunksize=100):
    mean_chunk_ages.append(people_chunk_df["age"].mean())
print("Mean age of each chunk:", mean_chunk_ages)
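Note that averaging the per-chunk means gives the overall mean only when every chunk has the same number of rows. If a single overall figure is needed, one option is to accumulate sums and counts per chunk. A minimal sketch, using an in-memory CSV with made-up ages in place of ./people.csv:

```python
import io

import pandas as pd

# Hypothetical data: five rows read two at a time, so the last chunk is shorter
csv_text = "age\n10\n20\n30\n40\n100\n"

total_age = 0
row_count = 0
for chunk_df in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Accumulate the sum and the number of non-missing ages in each chunk
    total_age += chunk_df["age"].sum()
    row_count += chunk_df["age"].count()

overall_mean = total_age / row_count
print(overall_mean)  # 40.0
```

Only one chunk is held in memory at a time, which is the point of chunksize.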

To modify values, define a function and apply it with:

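def capitalize_words(text):
    # Capitalize every word except "and"; assumes text is a string (no missing values)
    return " ".join(word if word == "and" else word.capitalize() for word in text.split())

# Apply the function to every value in the genre column
people_df["genre"] = people_df["genre"].apply(capitalize_words)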
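A quick check of what capitalize_words produces, plus one hedged variation: if the column may contain missing values, text.split() fails on them because a missing value is not a string. The guard that leaves non-string values unchanged is an assumption about how such rows should be treated, not part of the original tutorial, and the sample genres are made up.

```python
import pandas as pd

def capitalize_words(text):
    # Assumed policy: leave non-string values (such as missing values) unchanged
    if not isinstance(text, str):
        return text
    return " ".join(word if word == "and" else word.capitalize() for word in text.split())

# Hypothetical genre values, including one missing entry
genre_series = pd.Series(["rhythm and blues", "hip hop", None])
result = genre_series.apply(capitalize_words)

print(result.tolist()[:2])  # ['Rhythm and Blues', 'Hip Hop']
```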

Columns can be removed with people_df.drop("random_number", axis=1, inplace=True) or renamed with

people_df.rename({"birth_date:date": "birth_date", "bot_score:float": "bot_score"}, axis=1, inplace=True)
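An alternative style, shown here as a sketch rather than a correction, passes the columns= keyword and assigns the result instead of using axis=1 and inplace=True. The one-row DataFrame is made up; the column names come from the calls above.

```python
import pandas as pd

# Hypothetical row using the column names from this tutorial
raw_df = pd.DataFrame({
    "birth_date:date": ["1990-01-01"],
    "bot_score:float": [0.42],
    "random_number": [7],
})

# Each call returns a new DataFrame, so the steps can be chained
clean_df = (
    raw_df
    .drop(columns="random_number")
    .rename(columns={"birth_date:date": "birth_date", "bot_score:float": "bot_score"})
)

print(list(clean_df.columns))  # ['birth_date', 'bot_score']
print(list(raw_df.columns))    # the original DataFrame is unchanged
```

Keeping the original DataFrame untouched makes it easier to rerun a notebook cell without hitting an error for a column that was already dropped.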

Finally, export the processed DataFrame to a CSV file using people_df.to_csv("./updated_people.csv", index=False). The tutorial emphasizes that Pandas is essential for data processing and office automation tasks.
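As a small sketch of what index=False changes in the export step, the snippet below writes a made-up DataFrame to an in-memory buffer rather than to ./updated_people.csv and prints the resulting lines.

```python
import io

import pandas as pd

# Hypothetical one-row DataFrame for illustration
out_df = pd.DataFrame({"first_name": ["Ada"], "age": [36]})

# With index=False the row labels are not written as an extra first column
without_index = io.StringIO()
out_df.to_csv(without_index, index=False)

# Default behavior: the index is written as an unnamed first column
with_index = io.StringIO()
out_df.to_csv(with_index)

print(without_index.getvalue().splitlines())  # ['first_name,age', 'Ada,36']
print(with_index.getvalue().splitlines())     # [',first_name,age', '0,Ada,36']
```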

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us, and we will review it promptly.
