Boosting Packet Forwarding with DPDK Graph Pipeline: L3FWD Example and ACL Node Performance
This article demonstrates how to use DPDK's Graph API for L3 packet forwarding, introduces a custom ACL node, and presents performance comparisons of different packet‑transfer mechanisms and batch sizes within the graph pipeline.
Example: l3fwd-graph
DPDK provides the sample application l3fwd-graph to illustrate how the Graph API can be used for L3 packet forwarding.
Packet flow in l3fwd-graph
The processing chain consists of the following nodes:
ethdev‑rx: receive packets on port 0 via rte_eth_rx_burst
pkt cls: classify packets (IPv4, IPv6, etc.) and forward unknown types to pkt drop
ipv4 lookup: route lookup based on destination IP, miss goes to pkt drop
ipv4 rewrite: modify headers such as TTL and checksum
ethdev‑tx: transmit packets on port 1, failures go to pkt drop
pkt drop: release packets
Overall, packets are received on port 0, processed through the node chain, and transmitted on port 1.
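Conceptually, each node is a callback that decides the next edge for every packet, and rte_graph_walk drives bursts through the chain. The flow above can be modeled with a minimal, self-contained sketch (plain C, no DPDK; the struct fields and function names are illustrative):

```c
#include <stddef.h>

/* Toy packet: only the fields the nodes inspect. */
struct pkt { int is_ipv4; int route_ok; unsigned char ttl; int dropped; };

enum node_id { ETHDEV_RX, PKT_CLS, IPV4_LOOKUP, IPV4_REWRITE, ETHDEV_TX, PKT_DROP, DONE };

/* One node's decision, mirroring the l3fwd-graph chain above. */
static enum node_id step(enum node_id cur, struct pkt *p)
{
	switch (cur) {
	case ETHDEV_RX:    return PKT_CLS;
	case PKT_CLS:      return p->is_ipv4 ? IPV4_LOOKUP : PKT_DROP;
	case IPV4_LOOKUP:  return p->route_ok ? IPV4_REWRITE : PKT_DROP;
	case IPV4_REWRITE: p->ttl--; return ETHDEV_TX; /* TTL/checksum rewrite */
	case ETHDEV_TX:    return DONE;                /* out on port 1 */
	case PKT_DROP:     p->dropped = 1; return DONE;
	default:           return DONE;
	}
}

/* Walk one packet from rx to completion, the way rte_graph_walk
 * drives whole bursts through the real nodes. */
static void walk_one(struct pkt *p)
{
	enum node_id n = ETHDEV_RX;
	while (n != DONE)
		n = step(n, p);
}
```

A real node processes a whole burst at once and enqueues packets to per-edge streams; this per-packet walk only illustrates the routing decisions.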
ACL node design
An ACL node is added to demonstrate custom node creation. The node implements a simple rule set (accept or drop) based on five‑tuple matching.
```c
struct {
	char mapped[NB_SOCKETS];
	struct rte_acl_ctx *acx_ipv4[NB_SOCKETS];
} acl_config;

static uint16_t pkt_acl_node_process(struct rte_graph *graph, struct rte_node *node,
				     void **objs, uint16_t nb_objs);

static struct rte_acl_ctx *setup_acl(struct rte_acl_rule *route_base,
				     struct rte_acl_rule *acl_base, unsigned int route_num,
				     unsigned int acl_num, int socketid);

int rte_node_acl_rules_setup(const char *rule_path, int numa_on,
			     uint32_t enabled_port_mask);

static int pkt_acl_node_init(const struct rte_graph *graph, struct rte_node *node);

struct rte_node_register pkt_acl_node = {
	.process = pkt_acl_node_process,
	.name = "pkt_acl",
	.init = pkt_acl_node_init,
	.nb_edges = PKT_ACL_NEXT_MAX,
	.next_nodes = {
		[PKT_ACL_NEXT_PKT_CLS] = "pkt_cls",
		[PKT_ACL_NEXT_PKT_DROP] = "pkt_drop",
	},
};
RTE_NODE_REGISTER(pkt_acl_node);
```

The .process function classifies packets with rte_acl_classify and enqueues them to either pkt_cls or pkt_drop based on the ACL result.
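The accept/drop branch inside the process function boils down to interpreting the classification result. A self-contained sketch of that decision, assuming deny rules mark their result with a high ACL_DENY_SIGNATURE bit (the bit value here is illustrative) and that a result of 0 means no rule matched:

```c
#include <stdint.h>

/* Illustrative encoding: deny rules set the top bit of their userdata. */
#define ACL_DENY_SIGNATURE 0x80000000u

enum pkt_acl_next { PKT_ACL_NEXT_PKT_CLS, PKT_ACL_NEXT_PKT_DROP };

/* Map one classification result to a next edge: a result of 0 ("no rule
 * matched") is dropped, as is any match carrying the deny bit. */
static enum pkt_acl_next acl_next_edge(uint32_t acl_res)
{
	if ((acl_res & ACL_DENY_SIGNATURE) == 0 && acl_res != 0)
		return PKT_ACL_NEXT_PKT_CLS;   /* accept: continue to classifier */
	return PKT_ACL_NEXT_PKT_DROP;          /* deny or no match: drop */
}
```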
```c
rte_acl_classify(acl_config.acx_ipv4[socketid], acl_search.data_ipv4,
		 acl_search.res_ipv4, acl_search.num_ipv4, DEFAULT_MAX_CATEGORIES);
for (i = 0; i < acl_search.num_ipv4; i++) {
	pkt = acl_search.m_ipv4[i];
	acl_res = acl_search.res_ipv4[i];
	/* Forward only packets that matched a rule without the deny bit. */
	if (likely((acl_res & ACL_DENY_SIGNATURE) == 0 && acl_res != 0))
		rte_node_enqueue_x1(graph, node, PKT_ACL_NEXT_PKT_CLS, pkt);
	else
		rte_node_enqueue_x1(graph, node, PKT_ACL_NEXT_PKT_DROP, pkt);
}
```

Performance testing
The experiment compares three packet‑transfer mechanisms—pointer swap, memory copy, and pointer assignment—and evaluates the impact of batch splitting on throughput.
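Before the results, it helps to pin down what the three mechanisms do with a burst of packet pointers. A minimal, self-contained sketch in plain C (names are illustrative; in DPDK, the swap corresponds to what rte_node_next_stream_move does):

```c
#include <string.h>

struct stream { void **objs; unsigned int count; };

/* Pointer swap: exchange the buffers themselves, O(1) per burst. */
static void xfer_swap(struct stream *src, struct stream *dst)
{
	void **tmp = dst->objs;
	dst->objs = src->objs;
	dst->count = src->count;
	src->objs = tmp;
	src->count = 0;
}

/* Memory copy: bulk-copy the pointer array into the next node's buffer. */
static void xfer_memcpy(struct stream *src, struct stream *dst)
{
	memcpy(dst->objs, src->objs, src->count * sizeof(void *));
	dst->count = src->count;
	src->count = 0;
}

/* Pointer assignment: move the pointers one at a time. */
static void xfer_assign(struct stream *src, struct stream *dst)
{
	for (unsigned int i = 0; i < src->count; i++)
		dst->objs[i] = src->objs[i];
	dst->count = src->count;
	src->count = 0;
}
```

The swap costs the same regardless of burst size, while the other two scale with the number of packets; the copy can move several pointers per instruction where assignment moves one at a time.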
Test environment
Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30 GHz
128 cores (1 core bound to the graph program)
54 MB LLC cache
Node design
All nodes are kept minimal. The default node size is 256 packets per graph walk.
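Judging from the node code in this test, the fan-out lives in the node context: ctx[0] holds the number of next edges, ctx[1..8] the edge ids, and ctx[9..16] the percentage of the burst sent to each edge. The per-edge share then works out as in this small sketch (the helper name is illustrative; RTE_GRAPH_BURST_SIZE defaults to 256 in DPDK):

```c
#include <stdint.h>

#define RTE_GRAPH_BURST_SIZE 256  /* DPDK's default burst per graph walk */

/* Share of a burst sent to one next edge, mirroring the
 * (ctx[i + 9] * nb_objs) / 100 arithmetic in the test nodes. */
static uint16_t edge_batch(uint16_t pct, uint16_t nb_objs)
{
	return (uint16_t)((pct * nb_objs) / 100);
}
```

Since the worker copies 8 pointers at a time, the percentages are presumably chosen so that each share is a multiple of 8.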
Source node
```c
static uint16_t test_perf_node_source(struct rte_graph *graph, struct rte_node *node,
				      void **objs, uint16_t nb_objs)
{
	uint16_t count;
	int i;

	RTE_SET_USED(objs);
	RTE_SET_USED(nb_objs);
	/* ctx[0]: number of next edges; ctx[i + 1]: edge id;
	 * ctx[i + 9]: percentage of the burst sent to that edge. */
	for (i = 0; i < node->ctx[0]; i++) {
		count = (node->ctx[i + 9] * RTE_GRAPH_BURST_SIZE) / 100;
		rte_node_next_stream_get(graph, node, node->ctx[i + 1], count);
		rte_node_next_stream_put(graph, node, node->ctx[i + 1], count);
	}
	return RTE_GRAPH_BURST_SIZE;
}
```

Worker node
```c
static uint16_t test_perf_node_worker(struct rte_graph *graph, struct rte_node *node,
				      void **objs, uint16_t nb_objs)
{
	uint16_t next = 0;
	uint16_t count;
	void **to_next;
	int i;

	/* Single next edge: hand the whole stream over with a pointer swap. */
	if (node->ctx[0] == 1) {
		rte_node_next_stream_move(graph, node, node->ctx[1]);
		return nb_objs;
	}
	/* Multiple next edges: copy each edge's share, 8 pointers at a time. */
	for (i = 0; i < node->ctx[0]; i++) {
		next = node->ctx[i + 1];
		count = (node->ctx[i + 9] * nb_objs) / 100;
		to_next = rte_node_next_stream_get(graph, node, next, nb_objs);
		while (count) {
			rte_memcpy(to_next, objs, 8 * sizeof(objs[0]));
			to_next += 8;
			objs += 8;
			count -= 8;
			rte_node_next_stream_put(graph, node, next, 8);
		}
	}
	return nb_objs;
}
```

Destination node
```c
static uint16_t test_perf_node_sink(struct rte_graph *graph, struct rte_node *node,
				    void **objs, uint16_t nb_objs)
{
	return nb_objs;
}
```

Results
Images below illustrate the throughput of each transfer method and the effect of splitting batches.
Pointer swap yields the highest throughput, followed by memory copy and then pointer assignment; memory copy's advantage over assignment varies with the packet count.
Batch splitting scales predictably: each doubling of the number of batches costs a comparable percentage of throughput, indicating that per-batch overhead is roughly constant and that the graph pipeline behaves well as work is divided across more, smaller batches.
Conclusion
The DPDK Graph Pipeline provides a flexible way to build packet‑processing graphs. Adding a custom ACL node demonstrates modularity, and performance tests reveal that pointer swap is the most efficient packet‑transfer mechanism, while memory copy remains a solid alternative. The pipeline also scales predictably when the workload is divided into multiple batches, making it advantageous over traditional linear pipelines.
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.