Boost Pandas Data Processing Speed Up to 315× with Vectorized Techniques

This article walks through several pandas performance‑boosting methods—from naive for‑loops and iterrows to apply, .isin, pd.cut, and NumPy digitize—showing timing results and demonstrating how vectorized operations can accelerate hourly tariff calculations by hundreds of times.

Python Crawling & Data Mining

Mar 1, 2021

Introduction

The previous post demonstrated a 50× speed‑up using datetime tricks; this article shares even more common acceleration techniques for pandas data processing.

Naive for‑loop

A simple for loop that applies a custom apply_tariff function to each row takes several seconds for 8,760 rows.

def apply_tariff_loop(df):
    energy_cost_list = []
    for i in range(len(df)):
        energy_used = df.iloc[i]['energy_kwh']
        hour = df.iloc[i]['date_time'].hour
        energy_cost = apply_tariff(energy_used, hour)
        energy_cost_list.append(energy_cost)
    df['cost_cents'] = energy_cost_list
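
The loop calls an apply_tariff helper that is not shown here. Below is a minimal sketch of what it might look like, assuming the rates (12, 20, and 28 cents per kWh) and hour ranges used by the vectorized examples later in this article. make_demo_frame is a hypothetical helper, not part of the original, that builds random hourly data for experimenting:

```python
import numpy as np
import pandas as pd


def apply_tariff(kwh, hour):
    """Return the cost in cents for `kwh` of energy used during `hour` (0-23)."""
    if 0 <= hour < 7:
        rate = 12  # off-peak
    elif 7 <= hour < 17:
        rate = 20  # shoulder
    elif 17 <= hour < 24:
        rate = 28  # peak
    else:
        raise ValueError(f"Invalid hour: {hour}")
    return rate * kwh


def make_demo_frame(n_hours=8760, seed=0):
    """Hypothetical test data: one row per hour, with random energy readings."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "date_time": pd.date_range("2021-01-01", periods=n_hours,
                                   freq=pd.Timedelta(hours=1)),
        "energy_kwh": rng.uniform(0.1, 2.0, size=n_hours).round(3),
    })
```

With these two pieces in place, apply_tariff_loop(make_demo_frame()) should run as written, although the random readings mean the costs will not match any real dataset.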

Using iterrows

Replacing the range‑based loop with df.iterrows() reduces the runtime to about 0.7 seconds.

def apply_tariff_iterrows(df):
    energy_cost_list = []
    for index, row in df.iterrows():
        energy_used = row['energy_kwh']
        hour = row['date_time'].hour
        energy_cost = apply_tariff(energy_used, hour)
        energy_cost_list.append(energy_cost)
    df['cost_cents'] = energy_cost_list

Using pandas apply

Applying the function with df.apply(..., axis=1) cuts the time further to roughly 0.27 seconds.

def apply_tariff_withapply(df):
    df['cost_cents'] = df.apply(
        lambda row: apply_tariff(kwh=row['energy_kwh'], hour=row['date_time'].hour),
        axis=1)
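
The article quotes runtimes without showing how they were measured. One plausible way to reproduce such numbers is a small best-of-N timer; best_time and slow_sum below are hypothetical names used only for illustration, and the figures on any given machine will differ from those quoted here:

```python
import time


def best_time(func, *args, repeats=3):
    """Call func(*args) `repeats` times and return the fastest run in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def slow_sum(values):
    """Deliberately loop in Python so there is something slow to compare."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))
print(f"python loop: {best_time(slow_sum, data):.4f} s")
print(f"builtin sum: {best_time(sum, data):.4f} s")
```

The same helper could be pointed at the functions in this article, for example best_time(apply_tariff_withapply, df). Taking the best of several runs reduces noise from other processes on the machine.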

Vectorized .isin method

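Creating Boolean masks for peak, shoulder, and off‑peak hours and assigning values with df.loc brings the runtime down to 0.01 seconds, a 315× improvement over the naive loop. Unlike the earlier functions, this version reads the hour from df.index, so date_time must first be set as the DataFrame's index (for example with df.set_index('date_time')).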

def apply_tariff_isin(df):
    peak_hours = df.index.hour.isin(range(17, 24))
    shoulder_hours = df.index.hour.isin(range(7, 17))
    off_peak_hours = df.index.hour.isin(range(0, 7))
    df.loc[peak_hours, 'cost_cents'] = df.loc[peak_hours, 'energy_kwh'] * 28
    df.loc[shoulder_hours, 'cost_cents'] = df.loc[shoulder_hours, 'energy_kwh'] * 20
    df.loc[off_peak_hours, 'cost_cents'] = df.loc[off_peak_hours, 'energy_kwh'] * 12
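
The masks only work if the DataFrame has a DatetimeIndex and if every hour belongs to exactly one band. The sketch below checks both on a few hypothetical rows chosen to sit on the tariff boundaries. The rates and hour ranges are the ones from the function above, and the sample values are made up:

```python
import pandas as pd

# Hypothetical sample rows; the hours sit on or near the tariff boundaries.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2021-01-01 03:00", "2021-01-01 07:00", "2021-01-01 16:00",
        "2021-01-01 17:00", "2021-01-01 23:00",
    ]),
    "energy_kwh": [1.0, 2.0, 0.5, 1.5, 1.0],
})
df = df.set_index("date_time")  # df.index.hour needs a DatetimeIndex

hours = df.index.hour
masks = {
    28: hours.isin(range(17, 24)),  # peak
    20: hours.isin(range(7, 17)),   # shoulder
    12: hours.isin(range(0, 7)),    # off-peak
}

# Every row should be matched by exactly one mask.
matches_per_row = sum(mask.astype(int) for mask in masks.values())
assert (matches_per_row == 1).all()

for rate, mask in masks.items():
    df.loc[mask, "cost_cents"] = df.loc[mask, "energy_kwh"] * rate

print(df)
```

Because range(7, 17) excludes 17, the 07:00 row is billed at the shoulder rate and the 17:00 row at the peak rate.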

Using pd.cut

Leveraging pd.cut to bin the hours and multiply by the matching rate handles all three tariff bands in a single step and reduces the runtime further, to roughly 0.003 seconds.

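import pandas as pd

def apply_tariff_cut(df):
    # right=False gives the bins [0, 7), [7, 17), [17, 24), so hours 7 and 17
    # land in the same bands as in apply_tariff_isin.
    cents_per_kwh = pd.cut(
        x=df.index.hour,
        bins=[0, 7, 17, 24],
        right=False,
        labels=[12, 20, 28]
    ).astype(int)
    df['cost_cents'] = cents_per_kwh * df['energy_kwh']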
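
Bin edges in pd.cut are easy to get off by one. By default each bin includes its right edge, so with edges [0, 7, 17, 24] hour 7 falls into the first bin and hour 17 into the second. Passing right=False makes the bins left-closed, which matches the range(7, 17) style of hour ranges used in the .isin version. A small sketch on hand-picked boundary hours:

```python
import numpy as np
import pandas as pd

hours = np.array([0, 6, 7, 16, 17, 23])
edges = [0, 7, 17, 24]
rates = [12, 20, 28]

# Default (right=True): bins are [0, 7], (7, 17], (17, 24].
right_closed = pd.cut(hours, bins=edges, labels=rates,
                      include_lowest=True).astype(int)

# right=False: bins are [0, 7), [7, 17), [17, 24).
left_closed = pd.cut(hours, bins=edges, labels=rates,
                     right=False).astype(int)

for hour, r_closed, l_closed in zip(hours, right_closed, left_closed):
    print(f"hour {hour:2d}: right-closed -> {r_closed}, left-closed -> {l_closed}")
```

Only hours 7 and 17 differ between the two variants, which is why a mismatch like this can go unnoticed unless the boundary hours are checked explicitly.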

Using NumPy digitize

np.digitize maps each hour to the index of the bin it falls into, and that index then looks up the rate in a price array. This yields the fastest result, about 0.002 seconds for the same dataset.

import numpy as np

def apply_tariff_digitize(df):
    prices = np.array([12, 20, 28])
    # Bin index: 0 for hours before 7, 1 for hours 7-16, 2 for hours 17-23.
    bins = np.digitize(df.index.hour.values, bins=[7, 17, 24])
    df['cost_cents'] = prices[bins] * df['energy_kwh'].values
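
To see why indexing prices with the np.digitize result works, it helps to look at the bin indices for a few boundary hours. A minimal sketch using the same edges and prices as above; the hours themselves are hand-picked for illustration:

```python
import numpy as np

hours = np.array([0, 6, 7, 16, 17, 23])
prices = np.array([12, 20, 28])

# With the default right=False, digitize returns i such that
# edges[i-1] <= hour < edges[i]; hours below the first edge get index 0.
bin_index = np.digitize(hours, bins=[7, 17, 24])
rates = prices[bin_index]

for hour, idx, rate in zip(hours, bin_index, rates):
    print(f"hour {hour:2d} -> bin {idx} -> {rate} cents/kWh")
```

The final edge, 24, is never reached by a valid hour, so the index stays within the three-element prices array. An out-of-range value such as 24 would produce index 3 and raise an IndexError.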

Conclusion

Vectorized operations, especially those based on Boolean indexing, pd.cut, or np.digitize, dramatically outperform Python-level loops and even the apply method, making them the preferred choice for large‑scale time‑based calculations in pandas.
