Boost Python Performance with Numba: Real-World Pandas Benchmarks
This article introduces Numba, explains its parallel, distributed, and GPU acceleration capabilities, and provides detailed Pandas benchmark examples that show how Numba’s JIT compilation dramatically speeds up data‑frame operations compared with default and Cython engines.
Introduction to Numba
Numba is a library that uses LLVM to compile Python functions to optimized machine code at runtime. Compiled functions can approach the speed of C or Fortran without a separate compilation step or a replacement interpreter.
Typical Use Cases
Parallel computing on CPU and GPU with minimal code changes.
Distributed computing via compatibility with Dask and Spark.
GPU acceleration using NVIDIA CUDA for data‑intensive and deep‑learning workloads.
Numba with Pandas – Example Benchmarks
First, create a synthetic DataFrame with five numeric columns and one categorical column.
import pandas as pd
import numpy as np
data = np.random.rand(int(1e5), 5)
df = pd.DataFrame(data=data, columns=list("ABCDE"))
df["Type"] = np.random.choice(["Class1", "Class2"], size=len(df))
Case 1: Setting the engine
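Calling a rolling mean with no engine argument, with engine='cython', and with engine='numba' shows that Numba cuts user CPU time from 14.1 ms and 12.6 ms to 5.62 ms (7.88 ms total once system time is included). pandas falls back to the Cython engine when no engine is given, so the first two timings measure essentially the same code path.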
%time out = rolling_df.mean()
# CPU times: user 14.1 ms, sys 402 µs, total 14.5 ms
%time out = rolling_df.mean(engine='cython')
# CPU times: user 12.6 ms, sys 0 ns, total 12.6 ms
%time out = rolling_df.mean(engine='numba')
# CPU times: user 5.62 ms, sys 2.26 ms, total 7.88 ms
Case 2: Custom function
Running a user-defined function through apply with engine='numba' cuts the runtime by more than half, from 2.89 s to 1.23 s. The default and explicit Cython runs are indistinguishable.
def custom_mean(x):
    return (x * x).mean()
%time out = rolling_df.apply(custom_mean, raw=True)
# CPU times: user 2.89 s, sys 392 µs, total 2.89 s
%time out = rolling_df.apply(custom_mean, engine='cython', raw=True)
# CPU times: user 2.88 s, sys 3.62 ms, total 2.89 s
%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)
# CPU times: user 1.23 s, sys 117 µs, total 1.23 s
Case 3: Numba JIT
Decorating the same function with @jit and passing it to apply with the default engine reduces the runtime further, to about 951 ms.
from numba import jit, float64
@jit(cache=True)
def custom_mean_jitted(x):
    return (x * x).mean()
%time out = rolling_df.apply(custom_mean_jitted, raw=True)
# CPU times: user 951 ms, sys 0 ns, total 951 ms
Case 4: Python loop JIT
JIT-compiling an explicit loop in nopython mode with a declared signature gives the fastest result of the four cases, about 689 ms, versus 2.89 s for the uncompiled custom_mean in Case 2.
@jit(float64(float64[:]), nopython=True, cache=True)
def custom_mean_loops_jitted(x):
    out = 0.0
    for i in x:
        out += i * i
    return out / len(x)
%time out = rolling_df.apply(custom_mean_loops_jitted, raw=True)
# CPU times: user 689 ms, sys 0 ns, total 689 ms
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
