Why Is Python So Slow? Boost Speed 1000× with NumPy UFuncs

This article examines Python's notorious performance lag, explains why dynamic typing and per-object overhead make simple loops sluggish, and demonstrates how NumPy's universal functions (UFuncs) speed up a reciprocal calculation by more than a thousand times, in this benchmark even beating the timings measured for compiled languages.

MaGe Linux Operations

1. How slow is Python really?

Python often ranks at the bottom of language speed comparisons, and the usual explanation is that it is interpreted. That alone does not account for the gap: Java also runs bytecode on a virtual machine (with JIT compilation), yet it is much faster. In a benchmark that uses a traditional for loop to compute the reciprocals of one million numbers, Python takes about 3.37 seconds, while C finishes in 9 ms, C# in 19 ms, Node.js in 26 ms, and Java in 5 ms.

import numpy as np
np.random.seed(0)
values = np.random.randint(1, 100, size=1000000)

def get_reciprocal(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0/values[i]
    return output

# %timeit is an IPython/Jupyter magic command, not plain Python
%timeit get_reciprocal(values)

The result: 3.37 s ± 582 ms per loop (mean ± standard deviation over seven runs).

2. The root cause of Python's slowness

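Python is a dynamically typed language in which every value is an object. Inside a loop, each operation has to look up the operands' types, dispatch to the matching implementation, and unbox and re-box the underlying values, which adds significant overhead on every iteration. Statically typed compiled languages know the types in advance, so they can operate on the raw data without these checks.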
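The run-time type lookup described above can be seen in a few lines. This is only a sketch: the helper name add is invented for illustration, and the exact bytecode printed by dis differs between CPython versions.

```python
import dis

def add(a, b):
    # One function body: the meaning of "+" is decided at run time
    # from the types of the operands, on every single call.
    return a + b

# The same code path handles ints, floats, strings and lists.
results = [add(1, 2), add(1.5, 2), add("py", "thon"), add([1], [2])]
print(results)

# The bytecode contains a generic add instruction with no type baked in
# (its name varies across CPython versions).
dis.dis(add)
```

Because the interpreter cannot assume anything about a and b, it repeats this dispatch for each of the one million iterations in the benchmark loop.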

Even a simple assignment like a = 1 involves two steps, written here in schematic notation rather than with the actual CPython field names:

Step 1: Set a->PyObject_HEAD->typecode to Integer.

Step 2: Assign the value 1 to a->val.
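The cost of wrapping a value in an object can be made visible from Python itself. The sketch below assumes CPython, where an int carries an object header (a reference count and a type pointer) in addition to its value; the exact size reported depends on the build.

```python
import sys
import numpy as np

a = 1
# In CPython even a small int is a full object: header plus the value.
int_size = sys.getsizeof(a)
print(type(a), int_size)

arr = np.array([1, 2, 3], dtype=np.int64)
# A NumPy int64 array stores raw 8-byte values; the type is recorded
# once on the array (arr.dtype), not once per element.
print(arr.dtype, arr.itemsize)
```

The Python int takes several times the 8 bytes that the raw value needs, and that header has to be consulted whenever the value is used.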

3. The answer: NumPy universal functions (UFuncs)

NumPy arrays are built around typed C arrays whose elements all share one data type, so the type is checked once per array rather than once per element. A UFunc applies an operation to the whole array inside compiled code, which eliminates the per-iteration interpreter overhead.
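As a rough illustration of both points, the following sketch inspects a small array and confirms that the division operator on arrays is backed by a UFunc. The array size of 10 and the variable names are arbitrary choices for the example.

```python
import numpy as np

values = np.random.randint(1, 100, size=10)

# One dtype for the whole array; the buffer is just size * itemsize bytes.
print(values.dtype, values.itemsize, values.nbytes)

# np.divide is a UFunc, and "/" on arrays produces the same result.
print(isinstance(np.divide, np.ufunc))
via_operator = 1.0 / values
via_ufunc = np.divide(1.0, values)

# np.reciprocal is also a UFunc, but it is not designed for integer
# input, so the array is converted to float first.
via_reciprocal = np.reciprocal(values.astype(np.float64))
```

All three expressions run their element loop in compiled code, so none of them pays the per-element type checks of the explicit Python loop.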

import numpy as np
np.random.seed(0)
values = np.random.randint(1, 100, size=1000000)
%timeit result = 1.0/values

This vectorized version runs in about 2.71 ms (±50.8 µs) per loop, roughly 1,200 times faster than the 3.37 s explicit loop.
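The %timeit lines only work inside IPython or Jupyter. A minimal sketch of the same comparison as a plain script, using the standard timeit module, might look like this. It uses 100,000 elements instead of one million to keep the run short, so the absolute numbers will not match the ones quoted in this article, and the ratio will vary by machine.

```python
import timeit
import numpy as np

def get_reciprocal(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

np.random.seed(0)
values = np.random.randint(1, 100, size=100_000)  # smaller than the article's 1,000,000

# Both approaches must agree before their timings are worth comparing.
same = np.allclose(get_reciprocal(values), 1.0 / values)

# Best of three runs; the vectorized call is averaged over 10 executions.
loop_time = min(timeit.repeat(lambda: get_reciprocal(values), number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: 1.0 / values, number=10, repeat=3)) / 10

print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.6f} s, "
      f"ratio: {loop_time / vec_time:.0f}x")
```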

4. Summary

For Python developers handling numeric data, storing values in NumPy arrays or Pandas DataFrames (which are built on NumPy) makes UFuncs available for massive speed gains. In this benchmark, an operation that took seconds in an explicit loop finishes in 2.71 ms, below the 9 ms measured for the C loop, so Python can be surprisingly fast when used this way.

5. Appendix – Test code for C, C#, Java, and Node.js

C:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(){
    struct timeval stop, start;
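    enum { length = 1000000 };
    /* static storage: two 1,000,000-element local arrays can overflow a typical default stack */
    static int rand_array[length];
    static float output_array[length];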
    for(int i = 0; i<length; i++){
        rand_array[i] = rand();
    }
    gettimeofday(&start, NULL);
    for(int i = 0; i<length; i++){
        output_array[i] = 1.0/(rand_array[i]*1.0);
    }
    gettimeofday(&stop, NULL);
    printf("took %ld us\n", (long)((stop.tv_sec - start.tv_sec) * 1000000 + stop.tv_usec - start.tv_usec));
    return 0;
}

C# (.NET 5.0):

using System;
namespace speed_test{
    class Program{
        static void Main(string[] args){
            int length = 1000000;
            double[] rand_array = new double[length];
            double[] output = new double[length];
            var rand = new Random();
            for(int i =0; i<length;i++){
                rand_array[i] = rand.Next();
            }
            long start = DateTimeOffset.Now.ToUnixTimeMilliseconds();
            for(int i =0; i<length;i++){
                output[i] = 1.0/rand_array[i];
            }
            long end = DateTimeOffset.Now.ToUnixTimeMilliseconds();
            Console.WriteLine(end - start);
        }
    }
}

Java:

import java.util.Random;

public class speed_test {
    public static void main(String[] args){
        int length = 1000000;
        long[] rand_array = new long[length];
        double[] output = new double[length];
        Random rand = new Random();
        for(int i =0; i<length; i++){
            rand_array[i] = rand.nextLong();
        }
        long start = System.currentTimeMillis();
        for(int i =0; i<length; i++){
            output[i] = 1.0/rand_array[i];
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);
    }
}

Node.js:

let length = 1000000;
let rand_array = [];
let output = [];
for(var i=0;i<length;i++){
    rand_array[i] = Math.floor(Math.random()*10000000);
}
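// Date.now() returns milliseconds since the epoch; getMilliseconds() only
// returns the 0-999 millisecond component and can yield negative differences.
let start = Date.now();
for(var i=0;i<length;i++){
    output[i] = 1.0/rand_array[i];
}
let end = Date.now();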
console.log(end - start);

Original article: https://python.plainenglish.io/a-solution-to-boost-python-speed-1000x-times-c9e7d5be2f40

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
