Forget Kafka: A Lightweight Go Queue Achieves 2 Million Messages per Second
The article analyzes how replacing Kafka with a simple in‑memory Go queue reduced architectural complexity, boosted throughput from 240‑330 K to 1.8‑2.0 M messages per second, and clarified debugging, while still acknowledging scenarios where Kafka remains the better choice.
Using Kafka as a job queue turned a straightforward backend system into a complex distributed‑systems exercise, forcing engineers to monitor consumer lag, partition behavior, retry settings, node health, serialization paths, and offset states before even reaching business logic.
Root Cause of the Pain
The team conflated the appeal of a "production‑grade" platform with actual maturity needs, treating Kafka as a vanity architecture that became the most fragile component after six weeks.
What a Queue Really Needs
The essential requirements are fast data ingestion, bounded memory usage, predictable back‑pressure, batch processing, retry handling, and a clear recovery path when a worker crashes. Features such as endless replay, multiple downstream subscribers, or elaborate partition strategies are unnecessary for their workload.
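Two of those requirements, bounded memory and predictable back‑pressure, fall out naturally from a bounded buffer. A minimal Go sketch (the `Job` type and `tryEnqueue` helper are illustrative assumptions, not the article's code): a buffered channel caps memory at its capacity, and a non‑blocking send turns a full buffer into an explicit back‑pressure signal the producer can act on.

```go
package main

import "fmt"

// Job is a placeholder payload type (a real job would carry more fields).
type Job struct{ ID int }

// tryEnqueue attempts a non-blocking push into a bounded queue.
// A full buffer surfaces as back-pressure to the caller instead of
// letting memory grow without limit.
func tryEnqueue(q chan Job, j Job) bool {
	select {
	case q <- j:
		return true
	default:
		return false // queue full: caller can retry, shed load, or block
	}
}

func main() {
	q := make(chan Job, 2) // bounded: memory use is capped by capacity

	fmt.Println(tryEnqueue(q, Job{ID: 1})) // true
	fmt.Println(tryEnqueue(q, Job{ID: 2})) // true
	fmt.Println(tryEnqueue(q, Job{ID: 3})) // false: back-pressure kicks in
}
```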
Lightweight Go Queue Design
Producers push jobs into an in‑memory ring buffer. Workers process tasks in batches, and failed jobs are routed to a delayed‑retry channel. A compact acknowledgment log combined with periodic disk flushes provides reliable recovery without per‑message complexity. The hot path stays minimal, keeping the system focused on a single task.
Performance Results
Internal load tests showed a throughput of 1.8‑2.0 M messages per second, compared with 240‑330 K msg/sec for a Kafka‑based queue. The biggest difference stemmed from less coordination, fewer hops, and larger batch sizes.
Internal Test Snapshot
Workload: small jobs, fixed payload shape, batched workers
Path Peak Throughput
Kafka‑based queue 240K–330K msg/sec
Lightweight Go queue 1.8M–2.0M msg/sec
Biggest difference: less coordination, fewer hops, larger batch wins

Beyond raw numbers, the simplified queue made the system easier to understand: overload reasons appear in one place, worker slowdowns are observable centrally, and retry spikes are visible next to the code rather than hidden in multi‑layer ops processes.
When Kafka Still Shines
If the problem requires durable event history, many independent consumers, long replay windows, strict ordering, or a cross‑service data‑flow backbone, Kafka remains the optimal choice.
Conclusion
For most internal task‑transport scenarios, a simple, fast, and transparent queue is wiser than an over‑engineered platform. The author urges engineers to match tools to actual needs rather than defaulting to heavyweight solutions that add hidden debt.