
Unlocking High‑Performance Packet Processing: Inside DPDK’s libgraph Framework

DPDK’s libgraph, inspired by VPP’s vector packet processing, replaces traditional scalar packet handling with a graph‑based pipeline that improves i‑cache locality, reduces pointer copies, and enables flexible, high‑throughput networking across multiple CPUs and NICs, while introducing trade‑offs such as memory safety risks.

ByteDance SYS Tech

Background

DPDK (Data Plane Development Kit) is a high‑performance packet‑processing framework that runs on major NICs, accelerators and CPU architectures (x86, Arm, Power) under Linux, FreeBSD and Windows, and is widely used in gateways, load balancers, SDN and virtual switches.

libgraph Design

The libgraph component of DPDK adopts the vector‑packet‑processing ideas from the open‑source VPP project. Unlike the scalar model where each function processes a single packet, libgraph processes a batch of packets (a vector) in a graph of nodes, reducing i‑cache misses and pointer‑copy overhead.

Scalar vs Vector Processing

In the scalar model, packets are passed between functions via simple pointer assignments, which can cause high i‑cache miss rates and inefficient memory copies when pipelines become complex. The vector model allows optimized memory copies or even pointer swaps, improving cache locality and decoupling business logic from memory allocation.

Advantages of the Graph Framework

Better i‑cache management and locality.

Flexible pipeline model abstracted from business logic.

Reduced pointer copying.

Nodes can accumulate packets from previous nodes, improving batch performance.

Table‑driven node scheduling that facilitates QoS.

Graph Workflow

A graph consists of nodes connected by edges. At runtime, a circular buffer (cir_start) with head and tail pointers tracks pending streams. The graph walk iterates from source nodes, locating the next node, preparing its objects, invoking its process function, and updating head/tail until no pending streams remain.

Node Object Queue Size

Each node has an object queue whose size defaults to RTE_GRAPH_BURST_SIZE. If a downstream node's queue is full when packets are enqueued to it, that queue's size is dynamically doubled, so upstream nodes can keep delivering in batches. The current API does not support shrinking a queue after it has grown.

Enqueue Mechanisms

Two enqueue types exist:

Normal enqueue: packets are copied or pointer‑assigned to the next node using rte_memcpy or similar.

Home run: when all processed packets go to a single downstream node whose queue is empty, the object pointers of the two nodes are swapped, eliminating copy overhead.

<code>/* Home run scenario */
/* Swap the pointers if dst doesn't hold any valid objects */
if (likely(dst->idx == 0)) {
    void **dobjs = dst->objs;
    uint16_t dsz = dst->size;
    dst->objs = src->objs;
    dst->size = src->size;
    src->objs = dobjs;
    src->size = dsz;
    dst->idx = src->idx;
    __rte_node_enqueue_tail_update(graph, dst);
}
</code>

Core Components

The main components are the graph (global scheduler) and node (encapsulates business logic). A struct node contains fields such as name, flags, process, init, fini functions, identifiers, parent ID, edge count, and next‑node names.

<code>struct node {
    STAILQ_ENTRY(node) next;              /* Next node in the list. */
    char name[RTE_NODE_NAMESIZE];         /* Name of the node. */
    uint64_t flags;                       /* Node config flag - source node? */
    rte_node_process_t process;           /* Node process function. */
    rte_node_init_t init;                 /* Node init function. */
    rte_node_fini_t fini;                 /* Node fini function. */
    rte_node_t id;                        /* Allocated identifier for the node. */
    rte_node_t parent_id;                 /* Parent node identifier. */
    rte_edge_t nb_edges;                  /* Number of edges from this node. */
    char next_nodes[][RTE_NODE_NAMESIZE]; /* Names of next nodes. */
};
</code>

Nodes are registered via constructor attributes, and a source node must exist to start the pipeline; non‑source nodes are added to the pending stream dynamically.

Summary

DPDK’s libgraph provides a vector‑based graph pipeline that reduces i‑cache misses, abstracts business logic, and offers flexible scheduling, but it introduces potential memory‑safety risks, increased code complexity, and a steep learning curve for beginners. It is best suited for complex pipelines where performance gains outweigh these trade‑offs.

Tags: network performance, DPDK, packet processing, libgraph, vector pipeline
Written by

ByteDance SYS Tech

Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.
