Useful but Uncommon Pandas Functions: to_period, cumsum, groupby, and Category dtype
This article demonstrates several lesser‑known but highly useful Pandas functions—including to_period for period conversion, cumsum with groupby for cumulative sums, and the memory‑efficient Category dtype—through a step‑by‑step example DataFrame with code snippets and output illustrations.
In this tutorial we showcase some uncommon yet very handy Pandas functions using a sample DataFrame with three columns (date, class, amount) and 100 rows.
First, we create the DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({
"date": pd.date_range(start="2021-11-20", periods=100, freq="D"),
"class": ["A","B","C","D"] * 25,
"amount": np.random.randint(10, 100, size=100)
})
df.head()
The DataFrame contains a date column of 100 consecutive days, a categorical class column with four distinct values, and a random integer amount column.
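Because amount is drawn at random, the values differ on every run. If repeatable output is wanted, one option is a seeded generator. The sketch below is an assumption layered on top of the article's setup: the name df_seeded and the seed value 42 are illustrative and do not come from the original.

```python
import numpy as np
import pandas as pd

# Seeded generator so the "amount" column is identical on every run.
# The seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(42)

df_seeded = pd.DataFrame({
    "date": pd.date_range(start="2021-11-20", periods=100, freq="D"),
    "class": ["A", "B", "C", "D"] * 25,
    # integers() excludes the upper bound, like np.random.randint(10, 100)
    "amount": rng.integers(10, 100, size=100),
})

print(df_seeded.shape)  # (100, 3)
```

Rerunning this block yields the same amounts each time. The article's own outputs were produced without a seed, so the specific amount values shown there will not match.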
1. to_period
The to_period method converts datetime values to a specific time period such as month ("M") or quarter ("Q"), enabling proper time‑series grouping.
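To make the grouping use case concrete, here is a minimal sketch on a small hand-made frame. The frame name sales and its values are illustrative assumptions, not data from the article. It groups by the period derived from each timestamp and sums the amounts.

```python
import pandas as pd

# Small hand-made frame (values are illustrative, not from the article).
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-11-29", "2021-11-30", "2021-12-01", "2022-01-15"]
    ),
    "amount": [10, 20, 30, 40],
})

# Map each timestamp to its calendar month, then aggregate per month.
monthly_total = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()
print(monthly_total)    # 2021-11 -> 30, 2021-12 -> 30, 2022-01 -> 40

# The same idea at quarter granularity.
quarterly_total = sales.groupby(sales["date"].dt.to_period("Q"))["amount"].sum()
print(quarterly_total)  # 2021Q4 -> 60, 2022Q1 -> 40
```

Grouping on the period rather than on the raw timestamp is what collapses the two November rows into a single bucket.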
df["month"] = df["date"].dt.to_period("M")
df["quarter"] = df["date"].dt.to_period("Q")
df.head()
We can view the counts of each month and quarter:
df["month"].value_counts()
# output
2021-12 31
2022-01 31
2022-02 27
2021-11 11
Freq: M, Name: month, dtype: int64
--------------------------
df["quarter"].value_counts()
# output
2022Q1 58
2021Q4 42
Freq: Q-DEC, Name: quarter, dtype: int64
2. cumsum and groupby
The cumsum function computes the cumulative sum of a column. Applied directly it gives the running total of amount:
df["cumulative_sum"] = df["amount"].cumsum()
df.head()
To obtain cumulative sums per class, we combine groupby with cumsum:
df["class_cum_sum"] = df.groupby("class")["amount"].cumsum()
Viewing the result for class "A":
df[df["class"] == "A"].head()
The new column class_cum_sum contains cumulative totals calculated separately for each class.
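Since the article's amounts are random, the difference between the two running totals is easier to check on fixed numbers. The following sketch uses a toy frame (the name toy and its values are illustrative assumptions) with the same column names as above.

```python
import pandas as pd

# Toy data with interleaved classes (illustrative values only).
toy = pd.DataFrame({
    "class": ["A", "B", "A", "B", "A"],
    "amount": [1, 10, 2, 20, 3],
})

# Running total over all rows, in row order.
toy["cumulative_sum"] = toy["amount"].cumsum()

# Running total restarted for each class; the result keeps the original
# row order and index, so it can be assigned straight back as a column.
toy["class_cum_sum"] = toy.groupby("class")["amount"].cumsum()

print(toy["cumulative_sum"].tolist())  # [1, 11, 13, 33, 36]
print(toy["class_cum_sum"].tolist())   # [1, 10, 3, 30, 6]
```

The last class_cum_sum value within each class equals that class's plain sum (6 for A, 30 for B), which is a quick sanity check when applying this to real data.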
3. Category dtype
Columns with a limited set of values can be stored as the category dtype, which uses less memory than the default object type.
df.dtypes
# output
date datetime64[ns]
class object
amount int64
month period[M]
quarter period[Q-DEC]
cumulative_sum int64
class_cum_sum int64
Convert the class column to a categorical type:
df["class_category"] = df["class"].astype("category")
df.dtypes
# output includes
class_category category
A memory usage comparison shows that the categorical column consumes less than half the memory of the object column:
df.memory_usage()
# output
Index 128
date 800
class 800
amount 800
month 800
quarter 800
cumulative_sum 800
class_cum_sum 800
class_category 304
dtype: int64
The saving is only 496 bytes (800 versus 304) on this 100-row dataset, but it grows with the number of rows, so the space savings on large data can be substantial.
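One caveat about the figures above: without deep=True, memory_usage() counts only the 8-byte references held by an object column, not the Python strings they point to, so the real gap is larger than 496 bytes. The sketch below repeats the comparison on a longer Series with deep=True. The names labels and as_category and the 100,000-row size are illustrative assumptions. The object dtype is requested explicitly so the result does not depend on which string dtype a given pandas version infers by default.

```python
import pandas as pd

# 100,000 labels drawn from four distinct values, stored as Python objects.
labels = pd.Series(["A", "B", "C", "D"] * 25_000, dtype="object")
as_category = labels.astype("category")

# deep=True also counts the memory of the string objects themselves.
object_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)

# A categorical stores each distinct value once, plus a small integer
# code per row that points into that list of categories.
print(as_category.cat.categories.tolist())     # ['A', 'B', 'C', 'D']
print(as_category.cat.codes.head(4).tolist())  # [0, 1, 2, 3]
```

Exact byte counts vary by platform and pandas version, which is why none are printed as expected output here. The pattern to look for is that the categorical column's size is dominated by one small integer per row.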
Python Programming Learning Circle
