Boost Python Performance with Numba: Real-World Pandas Benchmarks

This article introduces Numba, explains its parallel, distributed, and GPU acceleration capabilities, and provides detailed Pandas benchmark examples that show how Numba’s JIT compilation dramatically speeds up data‑frame operations compared with default and Cython engines.

Sohu Tech Products

Introduction to Numba

Numba is a just-in-time (JIT) compiler library that uses LLVM to translate Python functions into optimized machine code at runtime. For numerical code it can approach the speed of C or Fortran, without a separate compilation step or a replacement Python interpreter.

Typical Use Cases

Parallel computing on CPU and GPU with minimal code changes.

Distributed computing: Numba-compiled functions can run inside tasks scheduled by frameworks such as Dask and Spark.

GPU acceleration using NVIDIA CUDA for data‑intensive and deep‑learning workloads.

Numba with Pandas – Example Benchmarks

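First, create a synthetic DataFrame with five numeric columns and one categorical column, then build the rolling object that the benchmarks below operate on. The source does not show how rolling_df was created, so the window size of 10 is an assumption.

import numpy as np
import pandas as pd

# 100,000 rows of uniform random floats in five columns, plus a label column
data = np.random.rand(int(1e5), 5)
df = pd.DataFrame(data=data, columns=list("ABCDE"))
df["Type"] = np.random.choice(["Class1", "Class2"], size=len(df))

# Assumed setup: window of 10 over the numeric columns only,
# so the string column "Type" is not passed to mean()
rolling_df = df[list("ABCDE")].rolling(10)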

Case 1: Setting the engine

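This case computes a rolling mean with the default engine, an explicit engine='cython', and engine='numba'. The default and Cython runs use about 14.5 ms and 12.6 ms of total CPU time, while the Numba run uses about 7.88 ms (5.62 ms of user time), roughly half the default.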

%time out = rolling_df.mean()
# CPU times: user 14.1 ms, sys 402 µs, total 14.5 ms

%time out = rolling_df.mean(engine='cython')
# CPU times: user 12.6 ms, sys 0 ns, total 12.6 ms

%time out = rolling_df.mean(engine='numba')
# CPU times: user 5.62 ms, sys 2.26 ms, total 7.88 ms

Case 2: Custom function

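Passing a user-defined function to apply with engine='numba' cuts total CPU time from about 2.89 s to 1.23 s, a speedup of roughly 2.3x over the default engine. Specifying engine='cython' explicitly makes no measurable difference from the default.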

def custom_mean(x):
    return (x * x).mean()

%time out = rolling_df.apply(custom_mean, raw=True)
# CPU times: user 2.89 s, sys 392 µs, total 2.89 s

%time out = rolling_df.apply(custom_mean, engine='cython', raw=True)
# CPU times: user 2.88 s, sys 3.62 ms, total 2.89 s

%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)
# CPU times: user 1.23 s, sys 117 µs, total 1.23 s
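
To make clear what custom_mean computes, here is a small worked example on a hypothetical four-element Series with a window of 2, using the default engine. Each output value is the mean of the squared entries in the window; the first position is NaN because its window is incomplete.

```python
import numpy as np
import pandas as pd

def custom_mean(x):
    # With raw=True, x is a NumPy array holding one window
    return (x * x).mean()

s = pd.Series([1.0, 2.0, 3.0, 4.0])
small_out = s.rolling(2).apply(custom_mean, raw=True)
print(small_out.tolist())  # [nan, 2.5, 6.5, 12.5]
```

For instance, the second value is (1² + 2²) / 2 = 2.5 and the last is (3² + 4²) / 2 = 12.5.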

Case 3: Numba JIT

Decorating the same function with @jit and passing it to apply with the default engine reduces the runtime further, from about 1.23 s with engine='numba' to about 951 ms.

from numba import jit, float64  # float64 is used for the signature in Case 4

@jit(cache=True)
def custom_mean_jitted(x):
    return (x * x).mean()

%time out = rolling_df.apply(custom_mean_jitted, raw=True)
# CPU times: user 951 ms, sys 0 ns, total 951 ms

Case 4: Python loop JIT

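Rewriting the computation as an explicit loop and JIT-compiling it in nopython mode with a declared signature gives the fastest result of the four cases: about 689 ms, compared with 2.89 s for the uncompiled custom_mean under the default engine.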

@jit(float64(float64[:]), nopython=True, cache=True)
def custom_mean_loops_jitted(x):
    out = 0.0
    for i in x:
        out += (i*i)
    return out / len(x)

%time out = rolling_df.apply(custom_mean_loops_jitted, raw=True)
# CPU times: user 689 ms, sys 0 ns, total 689 ms