Investigation and Resolution of Service Availability Fluctuations in a High‑QPS Go Backend Service
An investigation of a 100k‑QPS Go monolith found that intermittent availability drops were caused by a memory leak in the LFU implementation of the third‑party gcache library. The leak inflated garbage‑collection work and caused long GC mark phases. Upgrading gcache eliminated the leak and restored availability to 0.999+, underscoring the need for thorough observability and for monitoring third‑party dependencies.
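The causal chain above (leaked entries stay reachable, so every GC cycle has more live heap to mark) can be observed with the standard `runtime.ReadMemStats` API. The sketch below is illustrative only: it does not reproduce gcache's code or the fix, and `leakyStore`, `simulateLeak`, and `measureLeak` are hypothetical names for an unbounded map standing in for a cache that never releases entries. Because Go's mark phase runs mostly concurrently, its cost tends to appear in `GCCPUFraction` rather than only in stop‑the‑world pause totals.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcSnapshot captures a few runtime.MemStats fields that help correlate
// heap growth with garbage-collection activity.
type gcSnapshot struct {
	HeapAlloc     uint64  // bytes of allocated heap objects
	HeapObjects   uint64  // number of allocated heap objects
	NumGC         uint32  // completed GC cycles
	PauseTotalNs  uint64  // cumulative stop-the-world pause time
	GCCPUFraction float64 // fraction of available CPU used by GC since start
}

func takeSnapshot() gcSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return gcSnapshot{m.HeapAlloc, m.HeapObjects, m.NumGC, m.PauseTotalNs, m.GCCPUFraction}
}

// leakyStore is a hypothetical stand-in for a cache whose entries are never
// released: everything stored here stays reachable and must be marked by
// every subsequent GC cycle.
var leakyStore = map[int][]byte{}

// simulateLeak adds n 1 KiB buffers under fresh keys and never deletes them.
func simulateLeak(n int) {
	for i := 0; i < n; i++ {
		leakyStore[len(leakyStore)] = make([]byte, 1024)
	}
}

// measureLeak forces a GC before and after leaking n buffers so that the two
// snapshots compare live heap rather than unswept garbage.
func measureLeak(n int) (before, after gcSnapshot) {
	runtime.GC()
	before = takeSnapshot()
	simulateLeak(n)
	runtime.GC()
	after = takeSnapshot()
	return before, after
}

func main() {
	before, after := measureLeak(50_000)
	fmt.Printf("heap: %d -> %d bytes, objects: %d -> %d\n",
		before.HeapAlloc, after.HeapAlloc, before.HeapObjects, after.HeapObjects)
	fmt.Printf("GC cycles: %d -> %d, STW pause total: %d ns, GC CPU fraction: %.4f\n",
		before.NumGC, after.NumGC, after.PauseTotalNs, after.GCCPUFraction)
}
```

In a real service these fields would more likely be exported periodically to a metrics system; a live heap that keeps rising across forced or natural GC cycles, alongside a climbing GC CPU fraction, is the signature of the kind of leak described here.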
