Accelerating Pandas apply: Up to 600× Speedup with Swifter, Vectorization, dtype Conversion, and .values
This article demonstrates how to dramatically speed up the slow pandas apply function—by up to six hundred times—using Swifter for parallel execution, vectorized pandas/numpy operations, dtype downcasting, and direct .values array manipulation, with detailed timing comparisons.
Although packages like Dask and cuDF can accelerate data processing, many users still rely on pandas, whose apply function is notoriously slow; this article shows techniques to speed it up by about 600×.
Baseline (Apply only): Applying a custom function to a 1,000,000‑row DataFrame takes 18.4 seconds.
<code>
import pandas as pd
import numpy as np

# 1,000,000 rows of random integers in [0, 10] across five columns
df = pd.DataFrame(np.random.randint(0, 11, size=(1000000, 5)), columns=('a','b','c','d','e'))

def func(a, b, c, d, e):
    # The branch taken depends on the value of e
    if e == 10:
        return c*d
    elif 5 <= e < 10:
        return c+d
    else:
        return a+b

# Row-wise apply: func is called once per row in Python
df['new'] = df.apply(lambda x: func(x['a'],x['b'],x['c'],x['d'],x['e']), axis=1)
</code>
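The article quotes wall times but does not show how they were taken. The following is a minimal sketch of one way to time the baseline, assuming `time.perf_counter` is an acceptable clock. The helper `time_call` and the 10,000-row frame `small` are hypothetical names introduced here so the example runs quickly; absolute numbers will differ by machine.

```python
import time

import numpy as np
import pandas as pd


def func(a, b, c, d, e):
    # Same branching logic as the article's custom function
    if e == 10:
        return c * d
    elif 5 <= e < 10:
        return c + d
    return a + b


def time_call(fn, *args, **kwargs):
    """Hypothetical helper: run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Assumption: a much smaller frame than the article's 1,000,000 rows
rng = np.random.default_rng(0)
small = pd.DataFrame(rng.integers(0, 11, size=(10_000, 5)), columns=list("abcde"))

baseline, seconds = time_call(
    small.apply,
    lambda x: func(x["a"], x["b"], x["c"], x["d"], x["e"]),
    axis=1,
)
print(f"apply on {len(small):,} rows took {seconds:.3f} s")
```

Timing a single call is noisy; repeating the measurement a few times and taking the minimum usually gives a steadier figure.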
1. Swifter parallelization: Installing and using swifter reduces the wall time to 7.67 seconds.
<code>
import swifter  # importing swifter registers the .swifter accessor on DataFrames

df['new'] = df.swifter.apply(lambda x: func(x['a'],x['b'],x['c'],x['d'],x['e']), axis=1)
</code>
2. Vectorization with pandas/numpy: Rewriting the logic as vectorized column operations brings the execution time down to 421 ms.
<code>
# Start with the e == 10 case for every row, then overwrite the other cases
df['new'] = df['c'] * df['d']

# Rows with e < 10 get c + d
mask = df['e'] < 10
df.loc[mask, 'new'] = df['c'] + df['d']

# Rows with e < 5 get a + b
mask = df['e'] < 5
df.loc[mask, 'new'] = df['a'] + df['b']
</code>
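A vectorized rewrite is only useful if it returns the same values as the original function. The sketch below is not the article's code: it expresses the same three branches with `numpy.select` and checks the result against row-wise `apply` on a small frame. The names `df_check`, `expected`, and `vectorized` are hypothetical.

```python
import numpy as np
import pandas as pd


def func(a, b, c, d, e):
    # Same branching logic as the article's custom function
    if e == 10:
        return c * d
    elif 5 <= e < 10:
        return c + d
    return a + b


# Assumption: 1,000 rows is enough to exercise every branch
rng = np.random.default_rng(42)
df_check = pd.DataFrame(rng.integers(0, 11, size=(1_000, 5)), columns=list("abcde"))

# Reference result from the slow row-wise apply
expected = df_check.apply(
    lambda x: func(x["a"], x["b"], x["c"], x["d"], x["e"]), axis=1
)

# np.select uses the first condition that is True for each row,
# so "e >= 5" only applies to rows where e is not 10.
e = df_check["e"].to_numpy()
conditions = [e == 10, e >= 5]
choices = [
    (df_check["c"] * df_check["d"]).to_numpy(),
    (df_check["c"] + df_check["d"]).to_numpy(),
]
default = (df_check["a"] + df_check["b"]).to_numpy()

vectorized = pd.Series(np.select(conditions, choices, default=default), index=df_check.index)

assert (vectorized == expected).all()
```

Listing the conditions in the same order as the `if`/`elif` chain keeps the vectorized form easy to compare with the original function.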
3. Downcasting column dtypes to int16: This further cuts the time to 116 ms.
<code>
# Values lie in [0, 10], so they fit comfortably in 16-bit integers
for col in ('a','b','c','d'):
    df[col] = df[col].astype(np.int16)

# same vectorized operations as above
</code>
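Downcasting is safe only when every value fits the smaller type, and NumPy integer arithmetic wraps around silently on overflow. The following sketch, using the hypothetical frame `df_small`, checks each column's range against `np.iinfo` before casting and reports the memory saved. It assumes the columns start out as int64.

```python
import numpy as np
import pandas as pd

# Assumption: a 100,000-row frame of int64 values in [0, 10]
rng = np.random.default_rng(0)
df_small = pd.DataFrame(
    rng.integers(0, 11, size=(100_000, 5)), columns=list("abcde")
).astype(np.int64)

bytes_before = int(df_small.memory_usage(index=False).sum())

target = np.int16
limits = np.iinfo(target)  # representable range of the target dtype

for col in df_small.columns:
    # Downcast only when the whole column fits in the target dtype
    if df_small[col].min() >= limits.min and df_small[col].max() <= limits.max:
        df_small[col] = df_small[col].astype(target)

bytes_after = int(df_small.memory_usage(index=False).sum())
print(f"{bytes_before:,} bytes -> {bytes_after:,} bytes")
```

In this example the largest computed value is 10 * 10 = 100, well inside the int16 range, so the results of the vectorized operations also fit.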
4. Using .values (numpy arrays): Performing calculations on the underlying numpy arrays reduces the wall time to 74.9 ms.
<code>
# Multiply the raw numpy arrays, bypassing pandas index alignment
df['new'] = df['c'].values * df['d'].values

# Build the boolean masks from numpy arrays as well
mask = df['e'].values < 10
df.loc[mask, 'new'] = df['c'] + df['d']

mask = df['e'].values < 5
df.loc[mask, 'new'] = df['a'] + df['b']
</code>
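The same idea can be taken one step further by doing all of the work on NumPy arrays and assigning the result back once. This is a sketch of an alternative formulation using nested `numpy.where`, not the article's measured code, so no timing is claimed for it. `df_np` and `reference` are hypothetical names, and `to_numpy()` is used in place of `.values`.

```python
import numpy as np
import pandas as pd

# Assumption: a small frame of integers in [0, 10]
rng = np.random.default_rng(1)
df_np = pd.DataFrame(rng.integers(0, 11, size=(1_000, 5)), columns=list("abcde"))

# Pull each column out as a plain numpy array
a, b, c, d, e = (df_np[col].to_numpy() for col in "abcde")

# Nested where: e == 10 -> c*d, 5 <= e < 10 -> c+d, otherwise a+b
df_np["new"] = np.where(e == 10, c * d, np.where(e >= 5, c + d, a + b))

# Plain-Python reference to confirm the array version matches the branching logic
reference = [
    ci * di if ei == 10 else (ci + di if ei >= 5 else ai + bi)
    for ai, bi, ci, di, ei in zip(a.tolist(), b.tolist(), c.tolist(), d.tolist(), e.tolist())
]
assert df_np["new"].tolist() == reference
```

A single assignment avoids the repeated `df.loc` writes, at the cost of computing all three candidate results for every row.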
Experiment summary: The timings improve from 18.4 s (plain apply) → 7.67 s (apply + Swifter) → 421 ms (vectorized) → 116 ms (vectorized + int16) → 74.9 ms (vectorized + int16 + .values). In these measurements the final version is roughly 245× faster than plain apply, showing that a combination of parallelism, vectorization, dtype optimization, and direct array access can accelerate pandas workflows by more than two orders of magnitude.
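The relative speedups implied by the timings quoted above can be worked out directly. The short sketch below only does arithmetic on the article's reported numbers; the dictionary name `timings_s` is hypothetical.

```python
# Wall times reported in the article, converted to seconds
timings_s = {
    "apply": 18.4,
    "apply + Swifter": 7.67,
    "vectorized": 0.421,
    "vectorized + int16": 0.116,
    "vectorized + int16 + .values": 0.0749,
}

baseline = timings_s["apply"]

# Speedup of each step relative to plain apply
speedups = {name: baseline / seconds for name, seconds in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name:<30} {timings_s[name] * 1000:>9.1f} ms  {factor:>6.1f}x")
```

Most of the gain comes from the switch to vectorized operations; the dtype and array-access steps refine it further.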
Python Programming Learning Circle