
Debugging a Memory Leak in Baidu's Bigpipe Broker Using GDB and Live Process Inspection

This article presents a step‑by‑step case study of locating and fixing a memory‑leak problem in Baidu's Bigpipe Broker backend by analyzing a running leaking process with GDB, pmap, and custom scripts, highlighting the pitfalls of Valgrind and the importance of clear function naming.

Baidu Intelligent Testing

In Baidu's Quality Assurance department, QA engineers regularly discover, locate, and drive the proper fixing of bugs; this report shares a detailed memory‑leak case from the Bigpipe Broker backend server.

Abstract: Memory leaks are common in backend services. Traditional tools like Valgrind require restarting the process, which can be impractical when the leak is hard to reproduce. The article explores using the already‑leaking process instance to pinpoint the leak.

1. Problem Description – The Broker module of Baidu's internal distributed transmission system Bigpipe uses an asynchronous framework and extensive reference‑counting to manage object lifetimes. Under prolonged load, memory usage steadily grows, indicating a leak.
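The lifetime pattern can be sketched as intrusive reference counting. The class and member names below (including _free) are illustrative stand-ins, not Bigpipe's actual code:

```cpp
#include <atomic>
#include <cassert>

// Illustrative intrusive reference counting, as commonly used in
// asynchronous C++ frameworks; not Bigpipe's actual implementation.
class RefCounted {
public:
    RefCounted() : _ref(1) {}                 // creator holds one reference
    virtual ~RefCounted() = default;

    void add_ref() { _ref.fetch_add(1, std::memory_order_relaxed); }

    void release() {
        // fetch_sub returns the PRE-decrement value, so seeing 1 here
        // means this call just dropped the last reference.
        if (_ref.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            _free();
        }
    }

    long ref_count() const { return _ref.load(std::memory_order_relaxed); }

protected:
    virtual void _free() { delete this; }     // object reclaims itself

private:
    std::atomic<long> _ref;
};
```

Every add_ref must be paired with exactly one release; a single missing release keeps the count above zero forever, which is the failure mode this article diagnoses.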

2. Preliminary Analysis – A recent monitoring feature adds read operations on several parameter objects, each incrementing the reference count and decrementing it after use. The task is to identify which parameter object fails to release its reference.

3. Code & Business Analysis – Attempting to reproduce the leak under Valgrind proved ineffective because (a) the trigger conditions are complex and the leak may not reproduce, and (b) objects still referenced from long‑lived containers are reported by Valgrind as "still reachable" rather than as definite leaks, so this class of leak is easily missed. Therefore, the already‑leaking live process is debugged directly, with GDB as the primary tool.

Challenges – Attaching GDB to the process pauses it; if the paired master/slave Broker stops sending heartbeats, the Broker exits, losing the debugging window. Hence, only one attach attempt is feasible.

Solution – Use GDB to print memory information and infer the leak location.

Step 1: View memory map with pmap -x 24671 (replace 24671 with the actual PID) and note the anonymous (anon) regions.
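This step can be scripted so the largest anonymous regions surface first; a sketch (24671 is the PID from this case — substitute your own, and note this requires the live process):

```shell
# Dump the extended memory map, keep only anonymous mappings,
# and sort by resident size so the suspect heap regions come first.
pmap -x 24671 | grep anon | sort -k3 -rn | head -20
```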

Step 2: Launch GDB and attach to the process: gdb ./bin/broker, then attach 24671.

Step 3: Enable full output and logging: set height 0 and set logging on. The log is saved to gdb.txt.

Step 4: Dump a memory region, e.g., for an anon heap of 144508 KB: x/18497024a 0x000000000109d000 (144508 KB ÷ 8 bytes per pointer‑sized word = 18497024 words). Commands can be placed in a file (e.g., command.txt) and executed with source command.txt.
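Steps 2–4 can be combined into a single non‑interactive run. A sketch using the article's command.txt file name (the PID and dump range are this case's values — substitute your own):

```shell
# GDB batch script: disable paging, log to gdb.txt, and dump the anon
# region as 18497024 pointer-sized words (144508 KB / 8 bytes per word).
cat > command.txt <<'EOF'
set height 0
set logging on
x/18497024a 0x000000000109d000
detach
quit
EOF
# -p attaches to the running PID; -x executes the command file.
gdb ./bin/broker -p 24671 -x command.txt
```

Keeping the attach window short this way matters here, since a paused Broker that misses heartbeats will be taken down, as noted above.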

Step 5: Analyze gdb.txt. Each line shows an address, its raw values, and optional symbol information. For example, address 0x22c2f00 holds the virtual table pointer (vptr) of a bigpipe::BigpipeDIEngine object; the vtable resides at 0x10200d0, and its entries include the destructor at 0x53e2c6.
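The reason x/...a resolves heap words to class names is that, on the common Itanium C++ ABI (GCC/Clang on Linux), the first word of any polymorphic object is its vptr. A sketch with illustrative types, not Bigpipe's:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Two unrelated polymorphic types; on the Itanium C++ ABI each instance
// begins with a pointer into its own class's vtable.
struct EngineA { virtual ~EngineA() = default; };
struct EngineB { virtual ~EngineB() = default; };

// Read the first pointer-sized word of an object -- its vptr.
std::uintptr_t read_vptr(const void* obj) {
    std::uintptr_t v;
    std::memcpy(&v, obj, sizeof v);
    return v;
}
```

Counting how often each vtable address (or its symbolized name) appears in the heap dump therefore approximates a per‑class count of live objects, which is exactly what the grep/sort pipeline below exploits.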

By sorting and counting symbol occurrences in gdb.txt , the most frequent project‑related symbols (e.g., bmq , Bigpipe , bmeta ) are extracted:

cat gdb.txt | grep "<" | awk -F '<' '{print $2}' | awk -F '>' '{print $1}' | sort | uniq -c | sort -rn > result.txt

Further filtering yields a list where the CConnect object appears most often, indicating it as the likely leak source.

4. Root Cause – Inspection of atomic_add revealed that it returns the value *before* the addition (fetch‑and‑add semantics). The caller mistakenly assumed it returned the post‑operation value, so reference counts never appeared to reach zero and _free was never called. This subtle naming issue caused the leak.
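A minimal reproduction of the misunderstanding. Here atomic_add mimics fetch‑and‑add semantics; the Connect type and its members are illustrative stand‑ins, not the actual CConnect code:

```cpp
#include <atomic>
#include <cassert>

// Mimics the library routine: returns the value BEFORE the addition.
static long atomic_add(std::atomic<long>& v, long delta) {
    return v.fetch_add(delta);
}

struct Connect {
    std::atomic<long> ref{1};
    bool freed = false;

    void release() {
        // BUG: the caller assumes atomic_add returns the post-decrement
        // count. Dropping the last reference returns 1 (the old value),
        // the `== 0` test never fires, and _free() is never called.
        if (atomic_add(ref, -1) == 0) {
            _free();
        }
    }

    void _free() { freed = true; }  // stand-in for actual destruction
};
```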

5. Solution – The fix adjusts the monitoring code to account for atomic_add returning the pre‑operation value (or to use a more clearly named function), ensuring every acquired reference is released and counts can actually reach zero.
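A sketch of the corrected release path, with illustrative names (atomic_add mimics fetch‑and‑add, returning the pre‑operation value; this is not the actual Bigpipe code):

```cpp
#include <atomic>
#include <cassert>

// Mimics the library routine: returns the value BEFORE the addition.
static long atomic_add(std::atomic<long>& v, long delta) {
    return v.fetch_add(delta);
}

struct Connect {
    std::atomic<long> ref{1};
    bool freed = false;

    void release() {
        // FIX: a pre-decrement value of 1 means this call dropped the
        // last reference, so the object can now be reclaimed.
        if (atomic_add(ref, -1) == 1) {
            _free();
        }
    }

    void _free() { freed = true; }  // stand-in for actual destruction
};
```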

6. Summary

1) Debugging asynchronous frameworks requires combining logs, GDB, pmap, and custom scripts to reproduce and locate leaks.

2) Valgrind is not the only tool; it has limitations for certain leak scenarios.

3) Function names should clearly indicate their behavior to avoid misuse.

4) Developers must read library documentation carefully to understand usage semantics.

5) The presented method works when the leaking process remains alive and its memory layout retains identifiable symbols; it may not help if the leak leaves no traceable symbols.

6) This approach complements other leak‑diagnosis techniques and can be a valuable addition to a developer's toolbox.
