Why a Tiny RPC Change Crashed Our Service: 4 GB OOM Bug Explained

A seemingly harmless bug in an RPC framework caused a 4 GB byte array allocation, which led to repeated OutOfMemoryErrors in service B after service A was deployed. This article walks through the diagnosis, the root-cause analysis, and a simple fix.

Java Interview Crash Guide

Case Overview

Online systems often encounter OutOfMemory (OOM) errors not because of business code but due to bugs in underlying open‑source components.

System Architecture

Services communicate via an RPC framework built on a custom wrapper.

Incident

Service A was updated and redeployed; shortly after, service B crashed with OOM despite never having this issue before.

Log inspection on service B showed a java.lang.OutOfMemoryError: Java heap space exception.

Initial Diagnosis

Reviewing the logs revealed that the OOM originated from the self‑developed RPC framework during request handling.

Memory Snapshot Analysis

Using MAT, the largest object in the heap was a massive byte[] array occupying the entire 4 GB heap. The array was allocated inside the RPC framework.
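The article does not say how the memory snapshot was captured. One common way to produce a file that MAT can open is the HotSpot diagnostic MXBean. The sketch below assumes a HotSpot-based JVM; the class name `HeapSnapshot` and the output path are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that writes a heap dump for offline analysis in MAT. */
final class HeapSnapshot {

    private HeapSnapshot() {}

    /**
     * Writes a heap dump to the given path.
     *
     * @param path     target file; it should not already exist and should end in ".hprof"
     * @param liveOnly when true, only objects that are still reachable are dumped
     */
    static void dump(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveOnly);
    }
}
```

Starting the JVM with -XX:+HeapDumpOnOutOfMemoryError is the more usual choice in production, because the dump is then written automatically at the moment the error occurs.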

Source Code Analysis

First, check the logs to identify which component triggered the OOM. It is often a framework such as Tomcat or Jetty, or a custom RPC library.

Next, use a heap analysis tool (e.g., MAT) to locate the biggest memory consumer and trace its references.

Finally, inspect the source of the offending framework to understand its request-processing flow.

The RPC framework serializes request objects into a byte[] buffer. When deserialization fails (e.g., due to mismatched Request class definitions between services), the framework allocates a default 4 GB buffer to store the raw bytes, instantly exhausting heap memory.
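The failure path described above can be sketched roughly as follows. This is an illustration under assumptions, not the framework's actual code: `NaiveRequestDecoder`, the `Deserializer` interface, and the `fallbackBufferBytes` parameter are all hypothetical names. The fallback size is left as a constructor parameter rather than hard-coded, partly because a single Java array cannot hold more than Integer.MAX_VALUE elements.

```java
/** Hypothetical sketch of the failure path; not the real framework code. */
final class NaiveRequestDecoder {

    /** Hypothetical stand-in for whatever serializer the framework uses. */
    interface Deserializer<T> {
        T deserialize(byte[] payload) throws Exception;
    }

    /** Result of a decode attempt: either a request object or a raw byte copy. */
    static final class Decoded<T> {
        final T request;      // non-null when deserialization succeeded
        final byte[] rawCopy; // non-null when the fallback path ran

        Decoded(T request, byte[] rawCopy) {
            this.request = request;
            this.rawCopy = rawCopy;
        }
    }

    private final int fallbackBufferBytes;

    NaiveRequestDecoder(int fallbackBufferBytes) {
        this.fallbackBufferBytes = fallbackBufferBytes;
    }

    <T> Decoded<T> decode(byte[] payload, Deserializer<T> deserializer) {
        try {
            return new Decoded<>(deserializer.deserialize(payload), null);
        } catch (Exception e) {
            // The problematic step: the buffer is sized from a fixed default,
            // not from payload.length, so every failed request pays the full cost.
            byte[] buffer = new byte[fallbackBufferBytes];
            int copied = Math.min(payload.length, buffer.length);
            System.arraycopy(payload, 0, buffer, 0, copied);
            return new Decoded<>(null, buffer);
        }
    }
}
```

With a small default the fallback is harmless. With a default close to the heap size, a single request that fails to deserialize is enough to exhaust memory.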

Root Cause

Service A’s engineers added new fields to the Request protobuf class without updating Service B. During deserialization, the mismatch caused failure, triggering the allocation of a 4 GB byte[] as a fallback, leading to OOM.

Solution

Reduce the default buffer size in the RPC framework from 4 GB to a reasonable limit such as 4 MB.
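One way to express the reduced limit in code is sketched below. The 4 MB value comes from the text above; everything else is an assumption of this sketch, including the names `BoundedRawBuffer` and `MAX_FALLBACK_BYTES` and the choice to size the buffer from the payload and truncate anything beyond the cap.

```java
import java.util.Arrays;

/** Hypothetical helper that keeps the raw-bytes fallback within a fixed limit. */
final class BoundedRawBuffer {

    /** Upper bound for the fallback buffer: 4 MB. */
    static final int MAX_FALLBACK_BYTES = 4 * 1024 * 1024;

    private BoundedRawBuffer() {}

    /**
     * Copies at most maxBytes of the payload into a new array.
     * The result is never larger than the payload itself.
     */
    static byte[] copyBounded(byte[] payload, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        int size = Math.min(payload.length, maxBytes);
        return Arrays.copyOf(payload, size);
    }

    /** Same as above, using the 4 MB default limit. */
    static byte[] copyBounded(byte[] payload) {
        return copyBounded(payload, MAX_FALLBACK_BYTES);
    }
}
```

Sizing from the payload means a small failed request costs only a few bytes, while the cap still protects the heap if an oversized payload arrives.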

Ensure that Request class definitions remain consistent across all services.
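The article does not describe how consistency is enforced. As one possible approach, each service could compute a fingerprint of its Request class and compare it with its peers at startup or in a test. The sketch below is purely illustrative: `SchemaFingerprint`, `RequestV1`, and `RequestV2` are hypothetical, and it only compares declared field names and types via reflection.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical check that two request classes declare the same fields. */
final class SchemaFingerprint {

    private SchemaFingerprint() {}

    /** Builds a stable string from the instance fields of a class. */
    static String of(Class<?> type) {
        List<String> parts = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not part of the payload.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            parts.add(f.getName() + ":" + f.getType().getName());
        }
        // Reflection does not guarantee field order, so sort for a stable result.
        Collections.sort(parts);
        return String.join(",", parts);
    }

    static boolean compatible(Class<?> a, Class<?> b) {
        return of(a).equals(of(b));
    }
}

/** Hypothetical request shapes used only to demonstrate the check. */
final class RequestV1 {
    String userId;
}

final class RequestV2 {
    String userId;
    long traceId;
}
```

Here `SchemaFingerprint.compatible(RequestV1.class, RequestV2.class)` returns false, which is the kind of mismatch that could be flagged before deployment rather than discovered in production.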

After applying these changes, the OOM issue disappeared and service stability was restored.

Written by

Java Interview Crash Guide
