Why Go’s regexp Is Slower Than Python – The Safety Trade‑offs Behind the Design
This article examines why Go’s standard regexp package lags behind Python and other languages. It traces the slowdown to four factors: a pure‑Go implementation that avoids CGO, a Thompson‑NFA engine chosen for safety, heavy UTF‑8 rune decoding, and memory‑intensive NFA simulation. It then shows how community projects such as coregex reclaim performance while preserving Go’s safety guarantees.
When a Reddit benchmark showed Go’s regexp taking 38.1 seconds on a common log‑parsing pattern—far slower than Rust (1.3 s) or even Python—the Go community questioned the design of the regexp package.
Original "sin": avoiding CGO
Ian Lance Taylor explained that Go deliberately eschews the high‑performance PCRE library written in C because linking it would require CGO, which complicates Go’s cross‑compilation model and adds overhead to every call that crosses the Go/C boundary. Instead, the team wrote a pure‑Go engine from scratch, sacrificing decades of C‑level optimisations.
Design trade‑off: safety over raw speed
Russ Cox, co‑creator of Go’s regex engine, insisted that system safety and predictable performance outweigh occasional peak speed. The engine therefore adopts a Thompson‑NFA (the same algorithm used by RE2) that guarantees linear‑time matching O(n) and eliminates catastrophic backtracking (ReDoS), unlike the backtracking NFA used by PCRE.
Where the performance loss occurs
UTF‑8 parsing overhead: Go decodes the input into rune values for every character to honour Unicode semantics, incurring far more CPU work than byte‑level engines in Rust or C.
NFA virtual‑thread memory churn: The engine maintains two sparse‑set queues to simulate parallel NFA states. Each character read triggers massive slice allocations and copies, as shown by the hot functions (*machine).add and (*machine).step in the pprof profiles of Issues #19629 and #11646.
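To make the "two sparse‑set queues" idea concrete, here is a minimal sketch of a sparse set and of the current/next swap that happens once per input character. It is a simplified illustration, not the standard library’s internal code: the type sparseSet, its methods, and the toy transition rule (state i leads to i+1 and i+2) are all assumptions made for the example.

```go
package main

import "fmt"

// sparseSet holds small integer IDs (for example NFA state numbers) with
// O(1) insert, membership test, and clear. IDs must be below the capacity
// passed to newSparseSet.
type sparseSet struct {
	dense  []uint32 // IDs in insertion order
	sparse []uint32 // sparse[id] is the index of id in dense, if present
}

func newSparseSet(capacity int) *sparseSet {
	return &sparseSet{
		dense:  make([]uint32, 0, capacity),
		sparse: make([]uint32, capacity),
	}
}

// contains reports whether id is currently in the set.
func (s *sparseSet) contains(id uint32) bool {
	i := s.sparse[id]
	return int(i) < len(s.dense) && s.dense[i] == id
}

// insert adds id and reports whether it was newly added.
func (s *sparseSet) insert(id uint32) bool {
	if s.contains(id) {
		return false
	}
	s.sparse[id] = uint32(len(s.dense))
	s.dense = append(s.dense, id)
	return true
}

// clear empties the set without touching the sparse array.
func (s *sparseSet) clear() { s.dense = s.dense[:0] }

func main() {
	current, next := newSparseSet(8), newSparseSet(8)
	current.insert(0)

	// One simulated step: every active state adds its successors to next.
	for _, id := range current.dense {
		next.insert(id + 1)
		next.insert(id + 2)
	}

	// Swap the queues and reset the old one for the following character.
	current, next = next, current
	next.clear()

	fmt.Println(current.dense) // prints [1 2]
}
```

The point of the structure is that clearing costs O(1), which matters when the sets are reset after every character; the per‑character work of refilling them is where the profiles cited above spend their time.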
Attempts to transplant RE2’s DFA into Go were rejected because the dynamic state explosion would stress Go’s garbage collector and risk OOM failures.
Community response: the coregex project
Developer kolkov released coregex, a pure‑Go library that re‑introduces several performance tricks:
SIMD pre‑filtering: Literal substrings are extracted from the pattern, and hand‑written AVX2/SSSE3 assembly scans the input for them 32 bytes at a time, yielding up to 1,500× speed‑ups for patterns like .*\.txt.
Lazy DFA with caching: Builds DFA states on‑the‑fly and caches them, avoiding repeated NFA simulation for state sets that have already been seen.
Copy‑on‑write capture groups: Shares slice backing stores to cut allocation by roughly 50%.
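The pre‑filtering idea can be approximated with the standard library alone. The sketch below is not coregex’s implementation and uses no hand‑written assembly; it only shows the principle of rejecting inputs that lack a required literal before the regex engine runs. The function name matchWithPrefilter and the sample inputs are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// txtPattern is the example pattern from the text. Any match must contain
// the literal ".txt", so that literal can serve as a cheap pre-filter.
var txtPattern = regexp.MustCompile(`.*\.txt`)

// matchWithPrefilter skips the regex engine entirely when the required
// literal is absent, and falls back to the full match otherwise.
func matchWithPrefilter(s string) bool {
	if !strings.Contains(s, ".txt") {
		return false
	}
	return txtPattern.MatchString(s)
}

func main() {
	fmt.Println(matchWithPrefilter("notes/report.txt")) // true
	fmt.Println(matchWithPrefilter("notes/report.pdf")) // false
}
```

The substring check is only valid because every match of this particular pattern must contain ".txt"; a general library has to derive such required literals from the pattern automatically.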
In a CI benchmark on a 6 MB input, coregex processed email and URI patterns in 1.5 ms versus 260 ms for the standard library, roughly a 170× improvement.
Practical guidance for Go developers
Even with a faster library, the Go team is unlikely to merge such changes soon. The article therefore recommends three rules of thumb:
Prefer built‑in string functions (strings.Contains, strings.HasPrefix) over regex for simple substring checks.
Compile regexes once (e.g., in a global variable or init()) using regexp.MustCompile and reuse the compiled Regexp object, which is safe for concurrent use.
When extreme performance is required, consider either breaking Go’s “purity” with CGO bindings to PCRE or adopting a pure‑Go alternative such as coregex. With a backtracking engine like PCRE, remain aware of ReDoS risks.
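The first two rules can be sketched together as follows. The log format, the pattern `level=(\w+)`, and the helper names isErrorLine and extractLevel are assumptions chosen for illustration; they do not come from the benchmark discussed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
)

// levelPattern is compiled once at package initialisation. A *regexp.Regexp
// is safe for concurrent use, so all goroutines can share it.
var levelPattern = regexp.MustCompile(`level=(\w+)`)

// isErrorLine needs only a substring check, so it avoids regexp entirely.
func isErrorLine(line string) bool {
	return strings.Contains(line, "level=error")
}

// extractLevel reuses the shared compiled pattern and returns the captured
// level, or "" when the line has no level field.
func extractLevel(line string) string {
	m := levelPattern.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	lines := []string{
		"ts=1 level=info msg=start",
		"ts=2 level=error msg=boom",
	}

	results := make([]string, len(lines))
	var wg sync.WaitGroup
	for i, line := range lines {
		wg.Add(1)
		go func(i int, line string) {
			defer wg.Done()
			results[i] = extractLevel(line) // each goroutine writes its own slot
		}(i, line)
	}
	wg.Wait()

	fmt.Println(results, isErrorLine(lines[1])) // prints [info error] true
}
```

Calling regexp.MustCompile inside extractLevel would recompile the pattern on every call, which is the cost the second rule is meant to avoid.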
Ultimately, the slower performance is a conscious engineering decision that protects large‑scale, cloud‑native services from catastrophic failures, aligning with Go’s philosophy of “safety first, speed second.”
References
Reddit benchmark: https://www.reddit.com/r/golang/comments/1rr2evh/why_is_gos_regex_so_slow/
Go issues #26623, #19629, #11646
Russ Cox interview and RE2 design: https://swtch.com/~rsc/regexp/
TonyBai
Tony Bai's tech world (tonybai.com). Not satisfied with just "knowing how", we strive for mastery. Focused on Go language internals, high-quality engineering practices, and cloud‑native architecture, exploring cutting‑edge intersections of Go and AI. Gophers who pursue technology are welcome—follow me and evolve with Go.
