Reflections on the 3rd eBPF Developer Conference: Harnessing eBPF for AI
The article recaps the 3rd eBPF Developer Conference in Xi'an, highlighting talks on BPF‑on‑MPTCP, system‑wide PGO, bperf, autonomous‑driving use cases, and AI‑driven observability, while sharing the author's insights on continuous profiling, SysOM, and future challenges of scaling eBPF with large models.
Morning Main Forum
Tang Geliang (Kylin Software) presented BPF on MPTCP, showing how eBPF can sense per‑path bandwidth and latency in Multipath TCP and dynamically allocate traffic to avoid congestion.
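A real BPF-on-MPTCP scheduler runs in the kernel; as a rough user-space illustration of the allocation policy described in the talk (path names, fields, and numbers here are invented), the core idea is to weight each subflow by its measured bandwidth, discounted by latency:

```python
# Illustrative sketch only, not the presented implementation: split traffic
# across MPTCP subflows in proportion to bandwidth/latency scores.

def allocate_traffic(paths, total_bytes):
    """Split total_bytes across paths; higher bandwidth and lower RTT earn a larger share."""
    scores = {name: p["bandwidth_mbps"] / p["rtt_ms"] for name, p in paths.items()}
    total_score = sum(scores.values())
    return {name: total_bytes * s / total_score for name, s in scores.items()}

# Hypothetical two-path setup: a fast Wi-Fi link and a slower LTE link.
paths = {
    "wifi": {"bandwidth_mbps": 100.0, "rtt_ms": 20.0},
    "lte":  {"bandwidth_mbps": 50.0,  "rtt_ms": 40.0},
}
shares = allocate_traffic(paths, 1_000_000)
```

Under these invented numbers the Wi-Fi path scores four times higher than LTE and receives 80% of the bytes; an in-kernel scheduler would apply the same kind of policy per packet or per burst.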
Ren Yuxin (openEuler) described a whole‑system PGO solution that uses eBPF to collect runtime performance data across the OS, enabling profile‑guided optimizations for the entire stack.
Liu Song (Meta) introduced bperf, an eBPF‑based enhancement to the Linux perf subsystem that reduces overhead and improves measurement accuracy.
Chen Tao (Didi) shared a case study of eBPF in autonomous driving, where eBPF agents monitor vehicle state, network connectivity, and sensor streams in real time to detect and mitigate anomalies.
Round‑Table Discussion
Experts from academia and industry, including professors from Xiyou and practitioners from Huawei, Alibaba Cloud, and Didi, examined eBPF’s characteristics and its evolution under large‑model AI. They identified two open research directions:
System for AI : using eBPF to observe GPU/CPU faults and performance metrics during model training and inference.
AI for System : applying large language models to correlate business‑level KPIs with low‑level Linux indicators.
Challenges highlighted included integrating AI workloads with eBPF instrumentation and the difficulty of debugging eBPF programs.
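The "AI for System" direction above can be grounded with a small sketch: before handing data to a language model, a system might rank low-level Linux indicators by how strongly they correlate with a business KPI, to surface candidate causes. The metric names and values below are invented examples:

```python
# Hypothetical sketch: rank Linux indicators by Pearson correlation with a
# business-level KPI (e.g. request latency). All data here is made up.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

kpi_latency_ms = [12, 15, 30, 45, 44, 20]        # business-level KPI
metrics = {
    "cpu_runqueue_len": [1, 2, 6, 9, 9, 3],      # tracks the KPI closely
    "disk_io_wait_pct": [5, 5, 4, 6, 5, 5],      # mostly flat
}

# Indicators sorted by absolute correlation with the KPI, strongest first.
ranked = sorted(metrics, key=lambda m: abs(pearson(kpi_latency_ms, metrics[m])), reverse=True)
```

A large model would then reason over the top-ranked indicators rather than the raw firehose; the correlation step is only a pre-filter.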
Parallel Sessions
Four sub‑tracks covered eBPF trends, networking & security, observability, and performance engineering. Cheng Shuyi demonstrated Coolbpf and an AI‑driven flame‑graph that visualizes CPU and GPU call stacks together, enabling rapid bottleneck identification with Perfetto support.
SysOM Intelligent Operations Platform
SysOM provides continuous profiling for both CPU and GPU. It collects low‑overhead, high‑precision samples from user‑space and kernel stacks, stores them in a backend, and offers a UI for differential analysis across instances, models, and GPU cards. The platform extends the traditional observability pillars—logs, tracing, metrics—by adding continuous profiling that merges user‑space and kernel‑space insights.
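The differential analysis SysOM's UI performs on profiles reduces, at its core, to comparing folded stacks between two captures. A minimal sketch (the stack strings are invented examples, not SysOM's data format):

```python
# Illustrative diff over folded stacks (stack -> sample count): report which
# stacks grew or shrank between a baseline and a comparison capture.

def diff_profiles(baseline, comparison):
    """Return {stack: delta_samples} for every stack seen in either profile."""
    stacks = set(baseline) | set(comparison)
    return {s: comparison.get(s, 0) - baseline.get(s, 0) for s in stacks}

baseline   = {"main;handle_req;memcpy": 120, "main;gc": 40}
comparison = {"main;handle_req;memcpy": 300, "main;gc": 35}
delta = diff_profiles(baseline, comparison)
# The memcpy stack gained 180 samples between captures: a regression candidate.
```

The same diff applies across instances, models, or GPU cards once each dimension is captured as its own profile.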
Remaining Challenges and Proposed Solutions
Deploying profiling at scale faces three main issues:
Massive data volume.
High collection cost.
Non‑trivial overhead.
Proposed mitigations include centralizing symbol resolution, using large‑model‑driven adaptive sampling rates, and tuning network parameters to reduce per‑node data transfer.
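Stripped of the model in the loop, adaptive sampling is a control loop: back off the per-node sampling frequency when the emitted data rate exceeds a budget, and raise it when there is headroom. A hedged sketch with invented numbers (a real system would let a model choose the budget and targets):

```python
# Illustrative control loop for adaptive profiling sample rates.

def next_sample_hz(current_hz, observed_bytes_per_s, budget_bytes_per_s,
                   min_hz=1, max_hz=999):
    """Scale sampling frequency toward the data budget, clamped to limits."""
    if observed_bytes_per_s == 0:
        return max_hz  # no data observed; sample as fast as allowed
    scaled = current_hz * budget_bytes_per_s / observed_bytes_per_s
    return max(min_hz, min(max_hz, int(scaled)))

# Over budget: 99 Hz producing 2 MB/s against a 1 MB/s budget -> back off.
hz = next_sample_hz(99, 2_000_000, 1_000_000)
```

Centralized symbol resolution attacks the same problem from the other end: nodes ship raw addresses, and the backend resolves them once against shared symbol tables instead of every agent paying the cost.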
Additional work on Perfetto optimizations for GPU profiling was noted.
References
SysOM AI flame‑graph and profiling details: https://mp.weixin.qq.com/s?__biz=MzkyMjM4MTcwOQ==&mid=2247485451&idx=1&sn=3a76911f89a20368c5aade6d9357ed1b
SysOM observability system construction (Part 1): https://mp.weixin.qq.com/s?__biz=MzkyMjM4MTcwOQ==&mid=2247485414&idx=1&sn=eee2c25c903ce5041b81fe3692031893
Video tutorial code repository: https://github.com/haolipeng/libbpf-ebpf-beginer/tree/master/src
Associated blog post: https://github.com/haolipeng/study_cloud_security_public/blob/master/ebpf%E5%AD%A6%E4%B9%A0/ebpf%E5%BC%80%E5%8F%91%E6%89%8B%E6%8A%8A%E6%89%8B%E6%95%99%E5%AD%A6/%E7%AC%AC%E4%B8%80%E8%AF%BE%20helloworld%E7%A8%8B%E5%BA%8F.md
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
