Boost Python Performance 100× with Taichi: Real‑World Speedup Examples

Discover how importing the Taichi library can accelerate Python code by up to 100 times, with detailed examples ranging from prime counting and longest common subsequence dynamic programming to reaction‑diffusion simulations, including performance metrics, GPU support, and concise code snippets.

MaGe Linux Operations

Oct 7, 2022

Python's simplicity and readability come at the cost of performance, most visibly in compute‑intensive code such as nested for loops. According to Hu Yuanming, importing a single library, Taichi, can speed up such code by up to 100×.

Prime counting speed ×120

The first example counts all prime numbers less than a given positive integer N. For N = 1,000,000, the standard Python implementation takes 2.235 seconds.
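The listing itself is not reproduced here. A minimal sketch of a pure‑Python prime counter of the kind described, using trial division up to the square root, might look like this; the names is_prime and count_primes are assumptions, not necessarily the ones used in the benchmark.

```python
def is_prime(n: int):
    # Trial division: n is prime if no k in [2, sqrt(n)] divides it.
    result = True
    for k in range(2, int(n ** 0.5) + 1):
        if n % k == 0:
            result = False
            break
    return result


def count_primes(n: int) -> int:
    # Count the primes strictly less than n.
    count = 0
    for k in range(2, n):
        if is_prime(k):
            count += 1
    return count


print(count_primes(100))  # 25 primes below 100
```

The nested loop (one loop over candidates, one over divisors) is the kind of code where interpreter overhead dominates.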

After importing Taichi and adding two decorators, with no changes to the function bodies, the same task finishes in 0.363 seconds, a speedup of about 6×. For N = 10,000,000 the runtime drops from 55 seconds to 0.8 seconds, roughly 70× faster. Enabling GPU execution with ti.init(arch=ti.gpu) brings the speedup over the original Python code to 120×.
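As a hedged illustration of what "adding two decorators" could look like for a trial‑division prime counter: @ti.func marks a helper callable from Taichi code, and @ti.kernel marks the entry point whose outermost loop Taichi parallelizes. The function names are assumptions, and the ImportError fallback is an addition for this sketch only, so that the unchanged function bodies still run as plain Python where Taichi is not installed.

```python
try:
    import taichi as ti
    ti.init(arch=ti.cpu)  # ti.init(arch=ti.gpu) asks for a GPU backend instead
except ImportError:
    # Assumption for this sketch only: without Taichi, fall back to no-op
    # decorators so the function bodies below run as ordinary Python.
    from types import SimpleNamespace
    ti = SimpleNamespace(func=lambda f: f, kernel=lambda f: f)


@ti.func  # helper callable from kernels and other Taichi functions
def is_prime(n: int):
    result = True
    for k in range(2, int(n ** 0.5) + 1):
        if n % k == 0:
            result = False
            break
    return result


@ti.kernel  # entry point; the outermost for loop is parallelized
def count_primes(n: int) -> int:
    count = 0
    for k in range(2, n):
        if is_prime(k):
            count += 1
    return count


print(count_primes(100))
```

Note that the kernel needs type hints on its argument and return value, since Taichi compiles it ahead of the first call.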

Dynamic programming (LCS) speed ×500

The second example is the classic longest common subsequence (LCS) problem from "Introduction to Algorithms". The Taichi implementation solves it in 0.9 seconds, whereas a NumPy version needs 476 seconds, more than 500 times as long.
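For reference, the recurrence behind this benchmark can be sketched in plain Python. This is neither the Taichi nor the NumPy code that was timed, and the function name is an assumption; it only shows the table being filled.

```python
def lcs_length(a, b) -> int:
    # f[i][j] = LCS length of the prefixes a[:i] and b[:j]
    f = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                f[i][j] = f[i - 1][j - 1] + 1
            else:
                f[i][j] = max(f[i - 1][j], f[i][j - 1])
    return f[len(a)][len(b)]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. "BCBA"
```

Each cell depends on its left, upper, and upper‑left neighbors, so the table is filled by a nested loop with a data dependency, which is hard to express as whole‑array operations.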

Reaction‑diffusion equation: impressive results

The reaction‑diffusion model simulates pattern formation by two chemicals, U and V. The governing equations involve the diffusion rates Du and Dv, a feed rate f, and a kill rate k (named feed and kill in the code). Taichi implements the model on a W × H grid: a kernel updates both concentrations in every cell, approximating the Laplacian from the cell's four direct neighbors.
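Written out, a system of this kind takes the Gray‑Scott form below, which is what the update terms in the kernel correspond to. Here u and v are the concentrations of U and V, and the third line is the five‑point discrete Laplacian, assuming unit grid spacing (the same stencil applies to v).

```latex
\begin{aligned}
\frac{\partial u}{\partial t} &= D_u \nabla^2 u - u v^2 + f\,(1 - u) \\
\frac{\partial v}{\partial t} &= D_v \nabla^2 v + u v^2 - (f + k)\,v \\
\nabla^2 u_{i,j} &\approx u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4\,u_{i,j}
\end{aligned}
```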

The kernel (shown below) fits in fewer than ten lines and can run on the GPU, where it exceeds 300 fps; a Numba implementation tops out at around 30 fps.

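import taichi as ti
import taichi.math as tm

# W, H, Du, Dv, feed, kill and the field uv are defined outside this excerpt.
# uv holds two buffers (phase 0 and 1); each cell is a 2-vector (U, V).
@ti.kernel
def compute(phase: int):
    for i, j in ti.ndrange(W, H):
        cen = uv[phase, i, j]
        # Five-point Laplacian from the four direct neighbors
        lapl = uv[phase, i+1, j] + uv[phase, i, j+1] + uv[phase, i-1, j] + uv[phase, i, j-1] - 4.0 * cen
        # Diffusion, reaction (U * V * V), and feed/kill terms
        du = Du * lapl[0] - cen[0] * cen[1] * cen[1] + feed * (1 - cen[0])
        dv = Dv * lapl[1] + cen[0] * cen[1] * cen[1] - (feed + kill) * cen[1]
        # Explicit step with factor 0.5, written to the other buffer
        val = cen + 0.5 * tm.vec2(du, dv)
        uv[1 - phase, i, j] = val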
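When checking a kernel like this on a small grid, a slow NumPy reference of the same update can be useful. The following is a sketch under stated assumptions, not the benchmarked code: the parameter values are illustrative, and the periodic boundaries (via np.roll) are a choice made here, since the excerpt does not show how edges are handled.

```python
import numpy as np

# Illustrative parameter values; assumptions, not taken from the article.
Du, Dv, feed, kill = 0.16, 0.08, 0.06, 0.062


def laplacian(a: np.ndarray) -> np.ndarray:
    # Five-point stencil; np.roll wraps around at the edges (periodic grid).
    return (np.roll(a, 1, axis=0) + np.roll(a, -1, axis=0)
            + np.roll(a, 1, axis=1) + np.roll(a, -1, axis=1) - 4.0 * a)


def step(u: np.ndarray, v: np.ndarray, dt: float = 0.5):
    """One explicit update of U and V, mirroring the kernel's formulas."""
    uvv = u * v * v
    du = Du * laplacian(u) - uvv + feed * (1.0 - u)
    dv = Dv * laplacian(v) + uvv - (feed + kill) * v
    return u + dt * du, v + dt * dv


# Start from U = 1, V = 0 and seed a small square of V in the middle.
u = np.ones((64, 64))
v = np.zeros((64, 64))
v[28:36, 28:36] = 0.5
for _ in range(100):
    u, v = step(u, v)
print(u.min(), v.max())
```

A quick sanity check is that the uniform state U = 1, V = 0 is left unchanged by step, since every term of both updates is zero there.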

Installation

Taichi can be installed with a single command:

pip install taichi

Taichi is a domain‑specific language embedded in Python that compiles @ti.kernel‑decorated functions to run on CPUs or GPUs, providing high‑performance computing without the need to write C++/CUDA code. It interoperates with popular libraries such as NumPy, Matplotlib, and PyTorch.

For a detailed comparison of Taichi with other acceleration methods, see the original documentation.
