Why Go’s regexp Is Slower Than Python – The Safety Trade‑offs Behind the Design

This article examines why Go’s standard regexp package lags behind Python and other languages. It traces the slowdown to four causes: a pure-Go implementation that avoids CGO, the choice of a Thompson-NFA engine for safety, heavy UTF-8 rune decoding, and memory-intensive NFA simulation. It also shows how community projects like coregex reclaim performance while preserving Go’s safety guarantees.

TonyBai

When a Reddit benchmark showed Go’s regexp taking 38.1 seconds on a common log-parsing pattern, far slower than Rust (1.3 s) or even Python, the Go community questioned the design of the regexp package.

Original "sin": avoiding CGO

Ian Lance Taylor explained that Go deliberately eschews the high-performance PCRE library written in C because linking it would require CGO, which complicates Go’s cross-compilation model and adds overhead to every call across the Go/C boundary. Instead, the team wrote a pure-Go engine from scratch, giving up decades of C-level optimisations.

Design trade‑off: safety over raw speed

Russ Cox, co-creator of Go’s regex engine, insisted that system safety and predictable performance outweigh occasional peak speed. The engine therefore adopts a Thompson NFA (the same algorithm used by RE2), which guarantees matching time linear in the input length, O(n) for a fixed pattern, and rules out catastrophic backtracking (the root cause of ReDoS), unlike the backtracking engine used by PCRE.
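The linear-time guarantee can be observed with a pattern that is notorious for catastrophic backtracking in backtracking engines. The sketch below is only an illustration: the pattern `^(a+)+$`, the helper name `matchAdversarial`, and the input sizes are assumptions chosen for the demo, not taken from the benchmark discussed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// nested uses nested quantifiers followed by an anchor. A backtracking
// engine may try exponentially many ways to split a run of a's when the
// overall match fails.
var nested = regexp.MustCompile(`^(a+)+$`)

// matchAdversarial builds n 'a' characters followed by one 'b', so the
// pattern cannot match, and returns what the standard library reports.
func matchAdversarial(n int) bool {
	input := strings.Repeat("a", n) + "b"
	return nested.MatchString(input)
}

func main() {
	for _, n := range []int{1_000, 10_000, 100_000} {
		start := time.Now()
		matched := matchAdversarial(n)
		// Elapsed time should grow roughly in proportion to n.
		fmt.Printf("n=%d matched=%v elapsed=%v\n", n, matched, time.Since(start))
	}
}
```

Exact timings depend on the machine, but the run time should scale with the input length rather than explode, which is the property the design prioritises.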

Where the performance loss occurs

UTF-8 parsing overhead: Go decodes the input into rune values for every character to honour Unicode semantics, incurring far more CPU work than byte-level engines in Rust or C.
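To make the cost difference concrete, the following sketch contrasts a rune-by-rune walk with a byte-level view of the same string. It is not the regexp package's code; `countRunes` and `countBytes` are hypothetical helpers that only show the extra decoding step a rune-oriented matcher pays for each character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes walks the string one code point at a time, decoding each
// UTF-8 sequence into a rune before advancing.
func countRunes(s string) int {
	n := 0
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		i += size // a code point occupies 1 to 4 bytes
		n++
	}
	return n
}

// countBytes is the byte-level view: no decoding, one step per byte.
func countBytes(s string) int {
	return len(s)
}

func main() {
	s := "h\u00e9llo, \u4e16\u754c" // contains 2-byte and 3-byte characters
	fmt.Println("runes:", countRunes(s), "bytes:", countBytes(s))
}
```

For pure ASCII input both counts are equal, yet the rune path still performs a decode call and a width check per character.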

NFA virtual-thread memory churn: The engine maintains two sparse-set queues to simulate parallel NFA states. Each character read triggers massive slice allocations and copies, as shown by the hot functions (*machine).add and (*machine).step in the pprof profiles of Go issues #19629 and #11646.
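A sparse set is a compact way to track which NFA states are currently active. The sketch below is a simplified illustration, not the standard library's implementation: the type `sparseSet`, its methods, and the toy transition rule in `main` are all assumptions, and capture-group bookkeeping is left out entirely.

```go
package main

import "fmt"

// sparseSet stores small integer IDs (here: hypothetical NFA state
// numbers) with O(1) add, membership test, and clear.
type sparseSet struct {
	sparse []uint32 // sparse[id] = position of id in dense, if present
	dense  []uint32 // ids in insertion order
}

func newSparseSet(capacity int) *sparseSet {
	return &sparseSet{
		sparse: make([]uint32, capacity),
		dense:  make([]uint32, 0, capacity),
	}
}

// contains cross-checks both arrays, so stale entries left in sparse
// after a clear are never mistaken for members.
func (s *sparseSet) contains(id uint32) bool {
	i := s.sparse[id]
	return int(i) < len(s.dense) && s.dense[i] == id
}

// add inserts id and reports whether it was newly added.
func (s *sparseSet) add(id uint32) bool {
	if s.contains(id) {
		return false
	}
	s.sparse[id] = uint32(len(s.dense))
	s.dense = append(s.dense, id)
	return true
}

// clear empties the set without touching the sparse array.
func (s *sparseSet) clear() { s.dense = s.dense[:0] }

func main() {
	// Two queues: the states active now and the states active after
	// the next input character.
	cur, next := newSparseSet(8), newSparseSet(8)
	cur.add(0)
	for step := 0; step < 3; step++ {
		for _, id := range cur.dense {
			// Toy transition rule (assumption): id -> id+1 and id+2.
			next.add((id + 1) % 8)
			next.add((id + 2) % 8)
		}
		cur, next = next, cur
		next.clear()
		fmt.Println("active states:", cur.dense)
	}
}
```

Swapping the two queues after every character is what lets the simulation advance all candidate matches in lockstep instead of backtracking.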

Attempts to transplant RE2’s DFA into Go were rejected because the dynamic state explosion would stress Go’s garbage collector and risk OOM failures.

Community response: the coregex project

Developer kolkov released coregex, a pure‑Go library that re‑introduces several performance tricks:

SIMD pre-filtering: The library extracts static substrings from the pattern and searches for them with hand-written AVX2/SSSE3 assembly that compares up to 32 bytes at once, yielding speed-ups of up to 1,500× for patterns like .*\.txt.

Lazy DFA with caching: Builds DFA states on the fly and caches them, so the same set of NFA states is not re-simulated every time it recurs.

Copy-on-write capture groups: Shares slice backing stores to cut allocation by roughly 50%.
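The pre-filtering idea can be approximated in plain Go without any assembly. The sketch below is not coregex's code: it hard-codes the literal `.txt` by hand (an assumption, since a real engine derives it from the pattern) and uses strings.Contains as a stand-in for the SIMD scan, falling back to the standard library matcher only when the literal is present.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// txtPattern is the example pattern from the text.
var txtPattern = regexp.MustCompile(`.*\.txt`)

// requiredLiteral must occur in every string that txtPattern matches,
// so its absence proves there is no match.
const requiredLiteral = ".txt"

// matchWithPrefilter rejects inputs cheaply before running the regex.
func matchWithPrefilter(s string) bool {
	if !strings.Contains(s, requiredLiteral) {
		return false // fast path: the regex engine never runs
	}
	return txtPattern.MatchString(s)
}

func main() {
	for _, name := range []string{"notes.txt", "image.png", "a.txt.bak"} {
		fmt.Printf("%-10s %v\n", name, matchWithPrefilter(name))
	}
}
```

The pre-filter pays off most when the bulk of the inputs lack the literal, because those inputs skip the NFA simulation altogether.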

In a CI benchmark on a 6 MB input, coregex processed email and URI patterns in 1.5 ms versus 260 ms for the standard library, an improvement of roughly 170×.

Practical guidance for Go developers

Even with a faster library, the Go team is unlikely to merge such changes soon. The article therefore recommends three rules of thumb:

Prefer built-in string functions (strings.Contains, strings.HasPrefix) over regex for simple substring checks.

Compile regexes once (e.g., in a global variable or init()) using regexp.MustCompile and reuse the compiled Regexp object, which is safe for concurrent use.

When extreme performance is required, consider either breaking Go’s “purity” with CGO bindings to PCRE, while remaining aware of the ReDoS risks of a backtracking engine, or adopting a pure-Go alternative such as coregex.
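The first two rules can be sketched as follows. The log format, the pattern, and the helper names `isErrorLine` and `statusCode` are hypothetical, chosen only to show a package-level compiled regex alongside a plain string check.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// statusRe is compiled once when the package is initialised. A
// *regexp.Regexp is safe for concurrent use, so goroutines can share it.
var statusRe = regexp.MustCompile(`\s(\d{3})\s`)

// isErrorLine needs only a prefix check, so no regex is involved.
func isErrorLine(line string) bool {
	return strings.HasPrefix(line, "ERROR")
}

// statusCode extracts the first three-digit field surrounded by
// whitespace, reusing the precompiled pattern on every call.
func statusCode(line string) (string, bool) {
	m := statusRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	fmt.Println(isErrorLine("ERROR disk full"))
	code, ok := statusCode("GET /index.html 200 512")
	fmt.Println(code, ok)
}
```

Calling regexp.MustCompile inside a hot function would repeat the parse and compile work on every call, which is exactly the cost the second rule avoids.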

Ultimately, the slower performance is a conscious engineering decision that protects large‑scale, cloud‑native services from catastrophic failures, aligning with Go’s philosophy of “safety first, speed second.”

References

Reddit benchmark: https://www.reddit.com/r/golang/comments/1rr2evh/why_is_gos_regex_so_slow/

Go issues #26623, #19629, #11646

Russ Cox interview and RE2 design: https://swtch.com/~rsc/regexp/

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
