Tagged articles
1 article
Alibaba Cloud Developer
Dec 26, 2024 · Cloud Native

How a New Telemetry Service Overwhelmed OpenAI’s Kubernetes API Server

An in-depth post-mortem reveals how OpenAI's newly deployed telemetry service generated a flood of Kubernetes API requests that overloaded the API server, broke DNS resolution, and slowed recovery. The article also contrasts OpenAI's approach with the design of LoongCollector/iLogtail, which aims to minimize API server load and improve cluster stability.

API Server · Cloud Native · Cluster Reliability
15 min read