How We Traced and Fixed a Netty Off‑Heap Memory Leak in a WebSocket Service

When a WebSocket service built on Netty began returning 5xx errors in bulk, we combined log analysis, CAT monitoring, reflective access to Netty's internal direct-memory counter, and step-by-step debugging to trace the failures to an off-heap memory leak: a null subType field raised an exception in the encoder, so the buffer allocated for the packet was never released.

Meituan Technology Team

Netty is an asynchronous event‑driven network framework built on JDK NIO, simplifying TCP/UDP socket programming.

Our WebSocket-based long-connection middleware uses the netty-socketio library (a Netty implementation of the Socket.IO protocol). In production we observed frequent 5xx errors from Nginx.

CAT, Meituan's open-source monitoring platform, showed two anomalies at the same timestamp: a spike in GC activity and blocked JVM threads.

Problem

An alert reported a large volume of 5xx responses, which suggested that the backend service was unavailable.

Investigation Process

Stage 1 – Suspect log4j2

We first checked the log4j2 configuration and found a console appender that printed so much output that it blocked NIO threads. Disabling it did not stop the 5xx alerts, so logging was not the root cause.

Stage 2 – Suspicious log entries

The log files showed repeated lines such as "failed to allocate 64 byte(s) of direct memory (...)" together with an OutOfDirectMemoryError, which pointed to off-heap (direct) memory exhaustion.

Stage 3 – Locate OOM source

We traced the error to Netty's PlatformDependent class. Before each off-heap allocation it adds the requested size to the static counter DIRECT_MEMORY_COUNTER, and it throws Netty's own OutOfDirectMemoryError when the total would exceed the configured limit.
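The reserve-then-check pattern described here can be modelled in a few lines. The following is a minimal sketch, not Netty's source: the class DirectMemoryBudget, its method names, the IllegalStateException, and the 1024-byte limit are all hypothetical stand-ins chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Demo: four 256-byte reservations that are never released exhaust a
// 1024-byte budget, so the next 64-byte request fails.
class DirectMemoryBudgetDemo {
    public static void main(String[] args) {
        DirectMemoryBudget budget = new DirectMemoryBudget(1024);
        for (int i = 0; i < 4; i++) {
            budget.reserve(256); // simulated leak: no matching release()
        }
        try {
            budget.reserve(64);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}

// Simplified model of a guarded direct-memory counter (illustrative only).
final class DirectMemoryBudget {
    private final AtomicLong used = new AtomicLong();
    private final long limit;

    DirectMemoryBudget(long limitBytes) {
        this.limit = limitBytes;
    }

    // Add the requested size first; roll back and fail if the limit is exceeded.
    void reserve(int bytes) {
        long newUsed = used.addAndGet(bytes);
        if (newUsed > limit) {
            used.addAndGet(-bytes);
            throw new IllegalStateException("failed to allocate " + bytes
                    + " byte(s) of direct memory (used: " + (newUsed - bytes)
                    + ", max: " + limit + ")");
        }
    }

    // Must be called once per successful reserve(), or the counter only grows.
    void release(int bytes) {
        used.addAndGet(-bytes);
    }

    long used() {
        return used.get();
    }
}
```

The point of the model is that the counter reflects every allocation that was not paired with a release, which is why a slow leak eventually surfaces as an allocation failure on an unrelated request.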

Stage 4 – Reflective monitoring

Since CAT did not report off‑heap usage accurately, we used Java reflection to access DIRECT_MEMORY_COUNTER and printed its value every second.
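A reflective probe of this kind might look like the sketch below. To keep it runnable without Netty on the classpath, it reads the field from a hypothetical stand-in class, FakePlatform; CounterProbe and its method name are also invented for illustration. Against a real service the owner would be io.netty.util.internal.PlatformDependent, the field is assumed to be a static AtomicLong (check your Netty version), and it may be null when Netty is not tracking direct memory.

```java
import java.lang.reflect.Field;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class CounterProbeDemo {
    public static void main(String[] args) throws InterruptedException {
        // In production, pass Class.forName("io.netty.util.internal.PlatformDependent").
        AtomicLong counter =
                CounterProbe.staticAtomicLong(FakePlatform.class, "DIRECT_MEMORY_COUNTER");
        if (counter == null) {
            System.out.println("counter is not tracked in this configuration");
            return;
        }
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Print the current value once per second.
        timer.scheduleAtFixedRate(
                () -> System.out.println("direct memory used: " + counter.get() + " B"),
                0, 1, TimeUnit.SECONDS);
        FakePlatform.allocate(256);
        Thread.sleep(2500);
        timer.shutdownNow();
    }
}

final class CounterProbe {
    // Reads a private static AtomicLong field by name.
    static AtomicLong staticAtomicLong(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            field.setAccessible(true);
            return (AtomicLong) field.get(null); // null target: static field
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + fieldName, e);
        }
    }
}

// Hypothetical stand-in for the class that owns the counter.
final class FakePlatform {
    private static final AtomicLong DIRECT_MEMORY_COUNTER = new AtomicLong();

    static void allocate(int bytes) {
        DIRECT_MEMORY_COUNTER.addAndGet(bytes);
    }
}
```

Because the probe holds a reference to the same AtomicLong instance that the allocator updates, it only needs to be looked up once; each tick is then a plain read.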

Stage 5 – Growth pattern

After deployment the counter started at 16 MiB (the default chunk size) and then crept upward without ever dropping, reaching nearly 1 GiB over a weekend.

Stage 6 – Local reproduction

Running the service locally with non‑pooled memory, we observed that each WebSocket disconnect caused an immediate 256 B increase in off‑heap memory that never decreased.

Stage 7 – Source‑level debugging

Stepping through the code, we narrowed the leak to the encoder.encodePacket() path: a null subType caused a NullPointerException (NPE) there, so the code that would have released the allocated buffer never ran.
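The mechanism can be reproduced in miniature. The sketch below is not netty-socketio code: Packet, SubType, LeakSketch, and the 256-byte buffer size are hypothetical stand-ins, and the counter simply simulates direct-memory accounting. It contrasts an encode path that leaks when the NPE is thrown with one that releases in a finally block, a general defensive pattern rather than the fix described in this article.

```java
import java.util.concurrent.atomic.AtomicLong;

class EncoderLeakDemo {
    public static void main(String[] args) {
        Packet broken = new Packet(null); // subType was never set

        long before = LeakSketch.USED.get();
        try {
            LeakSketch.encodeLeaky(broken);
        } catch (NullPointerException expected) {
            // The exception is swallowed by the caller; the buffer stays allocated.
        }
        System.out.println("leaky path delta:   " + (LeakSketch.USED.get() - before));

        before = LeakSketch.USED.get();
        try {
            LeakSketch.encodeSafely(broken);
        } catch (NullPointerException expected) {
            // Same exception, but the finally block already released the buffer.
        }
        System.out.println("guarded path delta: " + (LeakSketch.USED.get() - before));
    }
}

enum SubType { CONNECT, DISCONNECT }

final class Packet {
    final SubType subType;

    Packet(SubType subType) {
        this.subType = subType;
    }
}

final class LeakSketch {
    static final AtomicLong USED = new AtomicLong(); // simulated off-heap usage
    static final int BUFFER_BYTES = 256;

    static void allocate() { USED.addAndGet(BUFFER_BYTES); }

    static void release() { USED.addAndGet(-BUFFER_BYTES); }

    // Leaks: an exception in writeHeader() skips release().
    static void encodeLeaky(Packet p) {
        allocate();
        writeHeader(p);
        release();
    }

    // Releases on every exit path, including the exceptional one.
    static void encodeSafely(Packet p) {
        allocate();
        try {
            writeHeader(p);
        } finally {
            release();
        }
    }

    // Dereferences subType, so a null value raises NullPointerException.
    static int writeHeader(Packet p) {
        return p.subType.ordinal();
    }
}
```

Setting the field (for example `new Packet(SubType.DISCONNECT)`) removes the exception entirely, which corresponds to the kind of fix the article applies; the finally block only guards against the same class of bug recurring elsewhere.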

Stage 8 – Bug fix

We fixed the NPE by ensuring subType is set (e.g., to DISCONNECT), rebuilt the library, and pushed the changes to our internal repository.

Stage 9 – Local verification

After the fix, repeated connect‑disconnect cycles no longer increased off‑heap memory.

Stage 10 – Production verification

We reported the value of the reflectively read counter to CAT as a custom metric. It remained stable in production, confirming that the leak was resolved.

Conclusion

Off-heap memory leaks can be diagnosed through careful log analysis combined with reflective monitoring of the allocator's counter.

Netty's internal direct-memory counter can be read through reflection, without third-party tools.

Narrowing the scope systematically, debugging at the thread level, and bisecting the suspect code path are effective ways to locate a leak.

IDE debugging features (pre-execution of an expression, thread stack inspection) speed up the process.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
