Avoid Common Go Middleware Pitfalls: Lessons from Alibaba’s Experience
This article shares the most frequent Go middleware pitfalls encountered at Alibaba, explains their root causes (from uneven request distribution and CPU leaks to transaction mishandling and SQL incompatibilities), and provides concrete solutions and best‑practice recommendations to help developers avoid repeating these errors.
Background
Why write this? Alibaba's Go middleware has been shaped by past mistakes. Some of these issues are now avoided outright, while others may still appear. Learning from others' pitfalls helps us avoid similar errors and understand how the middleware works.
1. VipServer Request Imbalance (rand)
Symptom: Load testing shows traffic skewed to certain machines.
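Cause: Business code calls rand.Seed(time.Now().Unix()), which reseeds the global generator that the middleware also uses. Because the seed has one‑second resolution, every process or call that seeds within the same second gets the same random sequence.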
Example:
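package main

import (
	"fmt"
	"math/rand"
)

func main() {
	for i := 0; i < 3; i++ {
		// Reseeding with a fixed value restarts the same sequence.
		// (On Go 1.24+ the top-level Seed is a no-op unless
		// GODEBUG=randseednop=0, so this reproduces on older toolchains.)
		rand.Seed(1234567890)
		fmt.Println(rand.Float64())
		fmt.Println(rand.Float64())
		fmt.Println(rand.Float64())
	}
}

All three loop iterations produce identical random numbers, so the same IPs are selected every time.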
Solution: Use golang.org/x/exp/rand with an independent globalRand and seed it once at startup:

globalRand.Seed(uint64(time.Now().UnixNano()))

Or upgrade to Go 1.22+ and use math/rand/v2, which has no top‑level Seed function, so business code cannot reseed the shared generator.
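The idea of a private generator can be sketched with the standard library alone. The example below swaps in math/rand (rand.New with rand.NewSource) for golang.org/x/exp/rand so that it runs without extra dependencies; lockedRand and newLockedRand are illustrative names, not part of VipServer.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// lockedRand wraps a private *rand.Rand. A *rand.Rand is not safe for
// concurrent use, so every call is guarded by a mutex.
type lockedRand struct {
	mu sync.Mutex
	r  *rand.Rand
}

// newLockedRand builds a generator that business code cannot reseed,
// because it is separate from the package-level generator.
func newLockedRand(seed int64) *lockedRand {
	return &lockedRand{r: rand.New(rand.NewSource(seed))}
}

// Intn returns a pseudo-random int in [0, n).
func (l *lockedRand) Intn(n int) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.r.Intn(n)
}

// globalRand is seeded once at startup and never reseeded afterwards.
var globalRand = newLockedRand(time.Now().UnixNano())

func main() {
	ips := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	// Calls to the top-level rand.Seed elsewhere do not affect globalRand.
	fmt.Println(ips[globalRand.Intn(len(ips))])
}
```

Keeping the generator unexported inside the middleware package is what prevents outside code from touching its seed.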
2. VipServer Hits Only One Data Center
Symptom: Traffic only reaches a single data center, causing uneven load.
Cause: VipServer client refreshes the IP list every 30 seconds and iterates sequentially. With many IPs, the refresh may occur before the full list is traversed, causing repeated selection of the same subset.
Solution: The new version selects hosts fully at random using binary search, which improves both distribution and performance.
3. VipServer CPU & Memory Leak
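The leak originates from a time.Ticker that is never stopped. Before Go 1.23, the garbage collector does not reclaim a ticker that has not been stopped, so each leaked ticker keeps firing and holding memory.
Key point: Always call ticker.Stop(). In older versions, time.After has a similar problem: its timer is not released until it fires, so calling it repeatedly with a long timeout accumulates memory.
Go 1.23 made unreferenced timers and tickers eligible for garbage collection even if Stop is never called, which removes the time.After leak.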
4. VipServer Warm‑up Issue
Symptom: After a process starts, the first request to a VipServer key returns 404.
Cause: The client was not yet listening for updates to that key, so it served an expired snapshot.
Fix: Call SrvHost(vipServerKey) at startup to trigger real‑time updates.
5. Diamond Reads Stale Configuration
Problem: Local snapshot is used before remote config is fetched, leading to outdated settings.
Recommended order: Disaster‑recovery file → Server → Local cache.
6. MySQL Transaction tx vs db Misuse
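Developers often forget to call Rollback on error paths, or mistakenly run statements on *sql.DB inside a transaction. The first leaks the connection held by the transaction; the second executes on a different pooled connection, outside the transaction, which leaves state inconsistent.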
Best practice: Use a helper like WithTransaction to manage commit/rollback automatically.
func WithTransaction(ctx context.Context, db *sql.DB, fn func(ctx context.Context, tx *sql.Tx) error) error {
	_, err := WithTransactionRet(ctx, db, func(ctx context.Context, tx *sql.Tx) (any, error) {
		return nil, fn(ctx, tx)
	})
	return err
}

7. Corona (DRDS) Middleware Rejects uint
Upgrading go‑sql‑driver changes integer handling from int64 to uint64, which Corona doesn’t support.
Solution: Avoid uint types or stay on compatible driver versions.
8. Corona to TDDLX SQL Type Mismatch
Using integer values where Corona expects strings causes errors after migration.
Ensure parameter types match column definitions.
9. TDDLX Index Selection Issue
Minor differences in generated SQL (comments, aliases, LIMIT syntax) change the execution plan cache, leading to full‑table scans.
Fix: Align SQL formatting or force index usage.
10. TDDLX Routing Hash Bug
An early version of the Go client hashed integers by converting them to strings first, which routed requests to the wrong shard.
Upgrade to the latest TDDLX version.
11. MetaQ Consumer Subscribes Multiple Topics
MetaQ 4.x consumer groups cannot subscribe to multiple topics; create separate CIDs for each.
12. MetaQ Expansion Panic
New machines lacking CMDB info cause startup failures; adding environment labels to requests resolves the issue.
13. HSF Go Connection Pool Exhaustion
Default connection limit (700) caused errors; increasing to 10 000 solved the problem.
14. HSF Dependency Version Pitfall
Different strcase versions change camel‑case conversion, breaking method name compatibility.
Pin to github.com/iancoleman/strcase v0.2.0 unless already using v0.3.0.
Conclusion
Since 2019, countless contributors have identified and fixed these pitfalls, leading to a robust Go middleware ecosystem within Alibaba. By learning from these experiences, developers can avoid repeating the same mistakes.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
