Tagged articles
11 articles
DeepHub IMBA
Mar 25, 2026 · Artificial Intelligence

TPU Architecture and Pallas Kernels: From Memory Hierarchy to FlashAttention

This article explains why TPU programming differs from GPU programming, describes the explicit HBM‑VMEM‑register data movement that TPUs require, and introduces the Pallas grid‑BlockSpec‑Ref model. It then walks through four progressively more complex kernels (element‑wise add, tiled dot product, fused RMSNorm with scratch memory, and a production‑grade FlashAttention implementation), showing how each kernel maps to the TPU memory hierarchy and uses Pallas features such as input_output_aliases and PrefetchScalarGridSpec.

FlashAttention · JAX · Memory Hierarchy
0 likes · 20 min read
Deepin Linux
Jan 11, 2026 · Fundamentals

Mastering Linux Kernel Linked Lists: From Theory to High‑Performance Code

This article explains the design, implementation, and practical use of the Linux kernel's intrusive linked‑list data structure, covering its core concepts, list_head definition, common macros, insertion, deletion, traversal, optimization techniques, concurrency control with RCU and memory barriers, and real‑world examples in device drivers and process scheduling.

Data Structures · Linux kernel · concurrency
0 likes · 37 min read
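The intrusive layout that the linked‑list abstract above refers to can be sketched roughly as follows. This is a minimal userspace re‑creation, not the kernel header: `struct list_head`, `container_of`, `list_add_tail`, and `list_del` mirror names from `<linux/list.h>`, while `init_list_head`, `struct task`, and `sum_pids` are hypothetical names chosen for illustration. Unlike the kernel version, this `list_del` simply clears the removed node's pointers instead of poisoning them, and no RCU or locking is shown.

```c
#include <stddef.h>

/* The link lives inside the payload; the list itself knows nothing about it. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded link. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list is a head that points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the circular list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink node from whatever list it is on (assumption: pointers set to NULL). */
static void list_del(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = NULL;
    node->prev = NULL;
}

/* Hypothetical payload type that embeds a link. */
struct task {
    int pid;
    struct list_head link;
};

/* Walk the list and add up the pid of every enclosing struct task. */
static int sum_pids(struct list_head *head)
{
    int sum = 0;
    for (struct list_head *pos = head->next; pos != head; pos = pos->next)
        sum += container_of(pos, struct task, link)->pid;
    return sum;
}
```

Because the link is embedded in the payload, one set of list routines serves any structure type, and adding or removing a node never allocates memory.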
Big Data Technology Tribe
Jun 11, 2025 · Fundamentals

Mastering eBPF with BCC: A Step‑by‑Step Guide to Building the opensnoop Tool

This article outlines the standard BCC workflow for creating eBPF tools, then dissects the opensnoop source code, covering requirement analysis, kernel‑space program writing, BPF map configuration, user‑space Python integration, argument handling, testing, optimization, and deployment steps to monitor open system calls.

BCC · Linux tracing · Python
0 likes · 13 min read
Linux Code Review Hub
Mar 22, 2025 · Fundamentals

Inside Linux inotify: How the Kernel Tracks File Changes

This article dissects Linux’s inotify mechanism, detailing the kernel data structures, system‑call flow, and functions that generate and deliver file‑system event notifications, complete with code walkthroughs and diagrams. It explains how read/write operations trigger event creation, how events are queued in inotify_device, and how user‑space processes retrieve them via read and poll interfaces.

File Monitoring · Linux kernel · event-handling
0 likes · 14 min read
Linux Kernel Journey
Sep 27, 2024 · Fundamentals

Understanding eBPF Ringbuf: Design, API, and Comparison

The article explains the motivation, design, and API of the new multi‑producer single‑consumer eBPF Ring Buffer, compares it with perf buffers and other alternatives, and provides complete BPF and userspace code examples demonstrating reservation, commit, and polling of events while preserving ordering across CPUs.

BPF_MAP_TYPE_RINGBUF · Ring Buffer · eBPF
0 likes · 8 min read
Linux Kernel Journey
Sep 2, 2024 · Backend Development

How eBPF Powers Modern Software Network Functions

The article examines why eBPF has become a core building block for cloud‑native network functions, outlines its performance, security and flexibility advantages, discusses technical challenges such as memory constraints and missing SIMD support, and presents the eNetSTL library that mitigates these issues with concrete design details and benchmark results.

Performance Optimization · eBPF · eNetSTL
0 likes · 12 min read
Liangxu Linux
Apr 5, 2024 · Fundamentals

Why Zero‑Length Arrays Matter in Linux Kernel Development

This article explains what zero‑length arrays are, how they are defined in C, why they appear frequently in the Linux kernel as flexible array members, and provides a complete kernel‑style implementation showing creation, expansion, and cleanup of a dynamically sized integer array.

C · Dynamic memory allocation · Flexible array member
0 likes · 10 min read
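The create, expand, and clean‑up cycle mentioned in the zero‑length‑array abstract above might look something like the sketch below. It is not the article's code: `struct int_array` and the `int_array_*` helpers are hypothetical names, and `malloc`/`realloc`/`free` stand in for the kernel allocators so the example builds in userspace. It uses the C99 flexible array member spelling `data[]`; older kernel code wrote the same idea as the GNU extension `data[0]`.

```c
#include <stdlib.h>
#include <string.h>

/* Header and payload share one allocation; data[] itself adds no size. */
struct int_array {
    size_t len;
    int data[];   /* flexible array member (historically written data[0]) */
};

/* Allocate the header plus room for len zero-initialised integers. */
static struct int_array *int_array_create(size_t len)
{
    struct int_array *arr = malloc(sizeof(*arr) + len * sizeof(arr->data[0]));
    if (!arr)
        return NULL;
    arr->len = len;
    memset(arr->data, 0, len * sizeof(arr->data[0]));
    return arr;
}

/* Resize to new_len elements; on failure the original array stays valid. */
static struct int_array *int_array_resize(struct int_array *arr, size_t new_len)
{
    size_t old_len = arr->len;
    struct int_array *grown =
        realloc(arr, sizeof(*arr) + new_len * sizeof(arr->data[0]));
    if (!grown)
        return NULL;
    if (new_len > old_len)   /* zero only the newly added tail */
        memset(grown->data + old_len, 0,
               (new_len - old_len) * sizeof(grown->data[0]));
    grown->len = new_len;
    return grown;
}

/* A single free releases header and elements together. */
static void int_array_destroy(struct int_array *arr)
{
    free(arr);
}
```

Keeping the length and the elements in one block means one allocation and one free per array, which is the main reason this pattern appears in variable‑size kernel structures.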
Liangxu Linux
Jul 3, 2021 · Fundamentals

What’s the Hidden Trick Behind This Unusual C Macro in Kernel Code?

The author comes across an unusual macro definition in kernel source that omits the usual parameter list. The article reviews standard macro usage, shows the original and altered code snippets, and explains why the macro still works without arguments, a practical insight for C and embedded developers.

C language · C macros · Embedded C
0 likes · 2 min read