How Minsheng Bank Built eBPF‑Based Observability for Cloud‑Native Services
The article details Minsheng Bank's step‑by‑step journey from traditional network monitoring to a full‑stack, zero‑intrusion observability platform built with DeepFlow, vTap, distributed data collection, and eBPF, illustrating concrete case studies and future plans for expanding business‑level monitoring.
Background
Minsheng Bank needed to move from network‑centric troubleshooting to cloud‑native observability because business continuity demands required faster fault isolation.
Traditional flow analysis platform
About 7‑8 years ago the bank built a flow‑analysis platform that mirrored production traffic via switch port mirroring, filtered and labeled it, and fed it to monitoring systems, providing data services to the transaction, security, and big‑data platforms.
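To make the pipeline concrete, here is a minimal Go sketch of the kind of mirror‑port capture‑and‑filter loop such a platform runs. The interface name "mirror0", the BPF filter string, and the use of the gopacket library are illustrative assumptions, not details from the original platform.

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/gopacket"
	"github.com/google/gopacket/pcap"
)

func main() {
	// Open the interface that receives the switch's mirrored (SPAN) traffic.
	// "mirror0" is a placeholder for the actual mirror-port NIC.
	handle, err := pcap.OpenLive("mirror0", 65535, true, pcap.BlockForever)
	if err != nil {
		log.Fatal(err)
	}
	defer handle.Close()

	// Keep only traffic the downstream monitoring systems care about;
	// the subnet in this filter is illustrative.
	if err := handle.SetBPFFilter("tcp and net 10.0.0.0/16"); err != nil {
		log.Fatal(err)
	}

	src := gopacket.NewPacketSource(handle, handle.LinkType())
	for pkt := range src.Packets() {
		// Label and forward each packet to the consumers (omitted here).
		fmt.Println(pkt.Metadata().Timestamp, pkt.Metadata().Length)
	}
}
```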
Observability evolution
Phase 1 – vTap‑based traffic distribution
Deployed DeepFlow collectors on compute nodes to capture east‑west traffic inside containers and VMs using vTap. Traffic was encapsulated in VXLAN tunnels and sent to a unified aggregation platform, covering the virtual‑network blind spot.
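The encapsulation step is simple enough to sketch. Below is a minimal Go example that prepends an RFC 7348 VXLAN header to a captured Ethernet frame and ships it over UDP; the aggregation address, VNI, and standard port 4789 are placeholders, and a real collector adds batching, MTU handling, and error paths this sketch omits.

```go
package main

import (
	"log"
	"net"
)

// vxlanEncap prepends an 8-byte VXLAN header (RFC 7348) to a captured
// Ethernet frame so it can be tunneled to the aggregation platform.
func vxlanEncap(vni uint32, frame []byte) []byte {
	hdr := make([]byte, 8)
	hdr[0] = 0x08 // flags: valid-VNI bit set
	hdr[4] = byte(vni >> 16)
	hdr[5] = byte(vni >> 8)
	hdr[6] = byte(vni)
	return append(hdr, frame...)
}

func main() {
	// Address of the unified aggregation platform; placeholder values.
	conn, err := net.Dial("udp", "10.1.2.3:4789")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	frame := []byte{ /* raw Ethernet frame captured from a pod NIC */ }
	if _, err := conn.Write(vxlanEncap(100, frame)); err != nil {
		log.Fatal(err)
	}
}
```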
Phase 2 – Distributed data collection
Raw container traffic saturated the aggregation layer, so processing moved into the collectors: metrics, logs, and trace data are computed locally, and only structured data is sent upstream. This reduced bandwidth pressure and enabled full‑path tracing across four TCP capture points (client pod NIC, client node NIC, server node NIC, server pod NIC).
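A rough Go sketch of the in‑collector idea, with invented type and field names: packets are folded into per‑flow counters locally, and only compact structured records leave the node.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// FlowKey identifies one client→server TCP flow at a capture point.
type FlowKey struct {
	Src, Dst string
}

// FlowStats is the structured record shipped upstream instead of raw packets.
type FlowStats struct {
	Packets uint64        `json:"packets"`
	Bytes   uint64        `json:"bytes"`
	RTTSum  time.Duration `json:"rtt_sum_ns"`
}

type Aggregator struct {
	flows map[FlowKey]*FlowStats
}

func NewAggregator() *Aggregator { return &Aggregator{flows: map[FlowKey]*FlowStats{}} }

// Observe folds one packet into the local per-flow counters.
func (a *Aggregator) Observe(k FlowKey, size int, rtt time.Duration) {
	s, ok := a.flows[k]
	if !ok {
		s = &FlowStats{}
		a.flows[k] = s
	}
	s.Packets++
	s.Bytes += uint64(size)
	s.RTTSum += rtt
}

// Flush emits compact structured records and resets state; only these
// records, not raw traffic, travel to the aggregation layer.
func (a *Aggregator) Flush() {
	for k, s := range a.flows {
		rec, _ := json.Marshal(s)
		fmt.Printf("%s -> %s %s\n", k.Src, k.Dst, rec)
	}
	a.flows = map[FlowKey]*FlowStats{}
}

func main() {
	agg := NewAggregator()
	agg.Observe(FlowKey{"pod-a", "pod-b"}, 1500, 2*time.Millisecond)
	agg.Observe(FlowKey{"pod-a", "pod-b"}, 600, 3*time.Millisecond)
	agg.Flush()
}
```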
Phase 3 – eBPF application observability
Leveraged eBPF to capture application‑level data (function calls, metrics, logs) with zero intrusion into the applications, covering services such as Nginx, DNS and Redis. Provided call‑chain tracing, flame‑graph analysis and CPU profiling, extending observability from the network layer to the system and application layers.
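As an illustration of zero‑intrusion capture, here is a hedged Go sketch using the cilium/ebpf library: it loads a precompiled eBPF object, attaches a kprobe to tcp_sendmsg, and streams events from a ring buffer. The object file name ("probe.o") and the program and map names are placeholders; this is not DeepFlow's actual loader.

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/ringbuf"
)

func main() {
	// Load a precompiled eBPF object; "probe.o" and its program/map
	// names are placeholders, not DeepFlow's actual artifacts.
	coll, err := ebpf.LoadCollection("probe.o")
	if err != nil {
		log.Fatal(err)
	}
	defer coll.Close()

	// Attach to tcp_sendmsg so every send on a TCP socket is observed
	// without touching the application: the zero-intrusion part.
	kp, err := link.Kprobe("tcp_sendmsg", coll.Programs["trace_send"], nil)
	if err != nil {
		log.Fatal(err)
	}
	defer kp.Close()

	// Stream events (function calls, latencies, payload slices) that
	// the kernel side pushes into a ring buffer map named "events".
	rd, err := ringbuf.NewReader(coll.Maps["events"])
	if err != nil {
		log.Fatal(err)
	}
	defer rd.Close()

	for {
		rec, err := rd.Read()
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("event: %d bytes", len(rec.RawSample))
	}
}
```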
Phase 4 – Data‑plane unification and exploration
Ingested Prometheus, SkyWalking and Tingyun metrics to build a unified data foundation. Explored WebAssembly‑based deep‑packet and system‑call decoding to expose business‑level fields (transaction IDs, response codes).
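Business‑field decoding can be pictured as a small pure function over the raw payload, which is the shape of logic that compiles cleanly to WebAssembly (e.g. with TinyGo) and runs as a plugin. The sketch below parses a transaction ID and response code out of an HTTP response; the header name "X-Transaction-Id" is hypothetical, and no claim is made about DeepFlow's actual plugin ABI.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// decodeBusinessFields pulls business-level fields out of a raw HTTP
// response payload. The "X-Transaction-Id" header is a hypothetical
// stand-in for the bank's real wire format.
func decodeBusinessFields(payload []byte) (txnID string, code int, err error) {
	resp, err := http.ReadResponse(bufio.NewReader(strings.NewReader(string(payload))), nil)
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	return resp.Header.Get("X-Transaction-Id"), resp.StatusCode, nil
}

func main() {
	raw := []byte("HTTP/1.1 200 OK\r\nX-Transaction-Id: T20240101-0001\r\nContent-Length: 0\r\n\r\n")
	txn, code, err := decodeBusinessFields(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("transaction=%s response_code=%d\n", txn, code)
}
```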
Case studies
A 3.04 s web‑service latency was traced to a backend oms‑app pod via eBPF call‑chain tracing.
A ~500 ms retail‑service latency was pinned to a slow acuiagwapp backend service.
An HTTP 502 error was traced to a failed DNS resolution on the server side.
High‑frequency DNS queries from tpp‑pay‑* pods revealed an inefficient kube‑dns configuration, prompting DNS optimization.
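The last case reduces to a simple aggregation over structured DNS logs. Here is a minimal Go sketch of how high‑frequency DNS clients can be flagged; the record fields and the threshold are illustrative, not the platform's actual schema.

```go
package main

import (
	"fmt"
	"sort"
)

// DNSLog is one structured DNS request record as emitted by the collectors.
type DNSLog struct {
	ClientPod string
	Query     string
}

// topTalkers counts queries per pod and flags any pod above the threshold;
// the threshold is per collection window and purely illustrative.
func topTalkers(logs []DNSLog, threshold int) []string {
	counts := map[string]int{}
	for _, l := range logs {
		counts[l.ClientPod]++
	}
	var noisy []string
	for pod, n := range counts {
		if n > threshold {
			noisy = append(noisy, fmt.Sprintf("%s (%d queries)", pod, n))
		}
	}
	sort.Strings(noisy)
	return noisy
}

func main() {
	logs := []DNSLog{
		{"tpp-pay-0", "svc.cluster.local."},
		{"tpp-pay-0", "svc.cluster.local."},
		{"oms-app-1", "db.example.com."},
	}
	for _, p := range topTalkers(logs, 1) {
		fmt.Println("high-frequency DNS client:", p)
	}
}
```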
Benefits
Full‑stack, proactive observability reduces fault‑location time, provides data services for network, system and application teams, and supports faster root‑cause analysis.
Future work
Plan to achieve 100% kernel‑version coverage for eBPF by 2024, integrate deeper APM data from SkyWalking and Tingyun, and extend business‑transaction monitoring using WebAssembly‑based packet and syscall decoding.
DeepFlow
DeepFlow is an open‑source observability product that uses eBPF for zero‑code metric, trace and log collection and smart tagging for universal correlation. Repository: https://github.com/deepflowio/deepflow.