Accelerating NumPy with CuPy: GPU Speedup Demonstrations
This article explains how the CuPy library runs NumPy-style array operations on NVIDIA CUDA GPUs, gives installation steps, and presents benchmark code showing speedups of roughly 10× to 700× over CPU-based NumPy on large arrays.
NumPy is a fundamental Python library for multi‑dimensional array and matrix operations, offering significant speed gains over pure Python loops, but its performance is limited to the CPU and the modest core counts of typical consumer processors.
CuPy is a GPU-accelerated counterpart that mirrors most of the NumPy API. Once the CUDA-enabled library is installed, replacing np with cp runs largely the same code on an NVIDIA GPU, where the work is spread across many more cores than a CPU offers.
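The np-to-cp swap described above can be sketched by binding the array module to a single name. This is a minimal illustration, not code from the article: the name xp, the standardize helper, and the fallback to NumPy when no CUDA device is usable are all assumptions made so the sketch runs on any machine.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy or a CUDA device is unavailable.
try:
    import cupy as cp
    cp.zeros(1)  # a tiny allocation; fails here if no CUDA device is usable
    xp = cp
except Exception:
    cp = None
    xp = np


def standardize(a):
    """Shift to zero mean and scale to unit standard deviation.

    Uses only array methods, so it accepts NumPy and CuPy arrays alike.
    """
    return (a - a.mean()) / a.std()


data = xp.arange(10, dtype=xp.float64)  # lives on the GPU when xp is cupy
result = standardize(data)

# Copy back to host memory for printing or for libraries that expect NumPy.
host_result = cp.asnumpy(result) if xp is cp else result
print(type(result).__module__, host_result[:3])
```

Writing the numerical code against xp rather than np or cp keeps one code path for both backends. The explicit cp.asnumpy call marks the point where data crosses from GPU to host memory.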
Installation is straightforward using the Python package manager:
    pip install cupy
After installation, the libraries are imported in the same way as NumPy:
    import numpy as np
    import cupy as cp
    import time
Benchmark 1 creates a 3-D array of 10⁹ elements and measures the time on CPU (NumPy) versus GPU (CuPy). NumPy takes about 1.68 seconds, while CuPy completes the task in roughly 0.16 seconds, a 10.5× speedup.
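A hedged sketch of how Benchmark 1 might be timed follows. The helper name time_ones, the small SHAPE, and the NumPy-only fallback are assumptions for illustration. CuPy queues GPU work asynchronously, so the sketch synchronizes the default stream before reading the clock; without that step the GPU time would be understated.

```python
import time
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # fails here if no CUDA device is usable
except Exception:
    cp = None  # assumption: skip the GPU half on machines without CUDA

# Assumption: a small shape so the sketch runs anywhere. A 10^9-element
# float64 array (for example 1000 x 1000 x 1000) needs about 8 GB.
SHAPE = (100, 100, 100)


def time_ones(xp, shape):
    """Create an array of ones with module xp and return (array, seconds)."""
    start = time.perf_counter()
    x = xp.ones(shape)
    if xp is not np:
        # Wait for queued GPU work to finish before stopping the clock.
        xp.cuda.Stream.null.synchronize()
    return x, time.perf_counter() - start


x_cpu, cpu_seconds = time_ones(np, SHAPE)
print(f"NumPy: {cpu_seconds:.6f} s")

if cp is not None:
    x_gpu, gpu_seconds = time_ones(cp, SHAPE)
    print(f"CuPy:  {gpu_seconds:.6f} s, speedup {cpu_seconds / gpu_seconds:.1f}x")
```

Timings depend heavily on the hardware and on array size, so results from this sketch should not be expected to match the figures quoted above.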
Benchmark 2 multiplies the entire array by 5. NumPy requires 0.507 seconds, whereas CuPy finishes in 0.00071 seconds, delivering a 714.1× acceleration.
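Sub-millisecond GPU timings like the one in Benchmark 2 are sensitive to measurement details. The sketch below shows one cautious way to measure the multiply-by-5 step. The function name time_scale_by_5, the warm-up call, and the best-of-five policy are assumptions, not the article's method. The first CuPy call of a given operation may include kernel compilation, so it is run once before timing.

```python
import time
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # fails here if no CUDA device is usable
except Exception:
    cp = None

SHAPE = (100, 100, 100)  # assumption: small enough to run anywhere


def time_scale_by_5(xp, shape, repeats=5):
    """Return (output, best seconds) for multiplying an array of ones by 5."""
    x = xp.ones(shape)
    out = xp.empty_like(x)  # reuse one output buffer across runs

    def sync():
        if xp is not np:
            xp.cuda.Stream.null.synchronize()

    xp.multiply(x, 5, out=out)  # warm-up run, excluded from the timing
    sync()

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        xp.multiply(x, 5, out=out)
        sync()  # include GPU execution time, not just the kernel launch
        best = min(best, time.perf_counter() - start)
    return out, best


out_cpu, cpu_best = time_scale_by_5(np, SHAPE)
print(f"NumPy best of 5: {cpu_best:.6f} s")

if cp is not None:
    out_gpu, gpu_best = time_scale_by_5(cp, SHAPE)
    print(f"CuPy best of 5:  {gpu_best:.6f} s, speedup {cpu_best / gpu_best:.1f}x")
```

Writing into a preallocated out buffer keeps memory allocation out of the measurement, which is one reasonable choice when the goal is to compare raw arithmetic throughput.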
Benchmark 3 performs three operations (scalar multiplication, element-wise multiplication, and adding the array to itself) on a 10-million-point array. NumPy needs 1.49 seconds, while CuPy completes the sequence in 0.0922 seconds, a 16.16× improvement.
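The three-step sequence in Benchmark 3 could look like the following sketch. The function names three_ops and timed, the in-place operators, and the reduced SHAPE are assumptions. The same function body serves both libraries because NumPy and CuPy arrays support the same operators.

```python
import time
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # fails here if no CUDA device is usable
except Exception:
    cp = None

SHAPE = (100, 100, 100)  # assumption: reduced size so the sketch runs anywhere


def three_ops(x):
    """Apply the three benchmark operations in place and return the array."""
    x *= 5   # scalar multiplication
    x *= x   # element-wise multiplication of the array with itself
    x += x   # add the array to itself
    return x


def timed(xp, shape):
    """Run three_ops on an array of ones and return (array, seconds)."""
    x = xp.ones(shape)
    start = time.perf_counter()
    three_ops(x)
    if xp is not np:
        xp.cuda.Stream.null.synchronize()  # wait for the GPU to finish
    return x, time.perf_counter() - start


x_cpu, cpu_seconds = timed(np, SHAPE)
print(f"NumPy: {cpu_seconds:.6f} s")

if cp is not None:
    x_gpu, gpu_seconds = timed(cp, SHAPE)
    print(f"CuPy:  {gpu_seconds:.6f} s")
```

Starting from ones, every element ends at 50 (1 × 5 = 5, 5 × 5 = 25, 25 + 25 = 50), which gives a quick correctness check alongside the timing.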
The article notes that speed gains grow dramatically once the data size exceeds ten million elements; for arrays larger than 100 million points, GPU acceleration becomes especially pronounced, though NumPy may still be faster for very small datasets. GPU memory capacity also limits the maximum array size that CuPy can handle.
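The memory limit mentioned above can be estimated before allocating anything. The sketch below is an illustration only: array_bytes is a hypothetical helper, and the 1000 × 1000 × 1000 shape with NumPy's default float64 dtype is an assumed layout for a 10⁹-element array.

```python
import math
import numpy as np


def array_bytes(shape, dtype=np.float64):
    """Bytes needed to hold one dense array of the given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize


needed = array_bytes((1000, 1000, 1000))  # 10^9 elements x 8 bytes each
print(f"Array needs {needed / 1e9:.1f} GB")

# Assumption: query the GPU only when CuPy and a CUDA device are available.
try:
    import cupy as cp
    free_bytes, total_bytes = cp.cuda.runtime.memGetInfo()
    print(f"GPU free: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")
    print("Fits in free GPU memory:", needed <= free_bytes)
except Exception:
    print("No usable CUDA device found; skipping the GPU memory query.")
```

A single 10⁹-element float64 array takes 8 GB, and expressions that create temporaries need additional room, so peak usage can be noticeably higher than the size of one array. Switching to float32 halves the footprint where the reduced precision is acceptable.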
In summary, CuPy provides a near drop-in, GPU-accelerated replacement for NumPy-based numerical workloads, with speedups in the benchmarks above ranging from roughly 10× to more than 700× on large arrays.
Python Programming Learning Circle
