
Unlocking Linux Networking: The Essential Role of sk_buff Explained

sk_buff is the backbone of Linux’s network stack, handling packet storage, metadata, memory management and protocol‑layer interactions; this article dissects its structure, pointer model, core operations, packet lifecycle, practical code examples, and common pitfalls such as memory shortage, data loss and performance bottlenecks.


1. Introduction to sk_buff

In the Linux kernel, sk_buff (socket buffer) is the fundamental data structure that carries every network packet through the entire networking subsystem, from NIC reception to protocol‑stack processing and back to the device driver.

1.1 What is sk_buff?

sk_buff is a versatile container that holds the actual packet data and a rich set of metadata required for routing, checksum verification, protocol identification, and memory management. It can be visualised as a parcel where the payload is the cargo and the metadata is the shipping label.

1.2 Why understanding sk_buff matters

Without a solid grasp of sk_buff you cannot reason about packet encapsulation/decapsulation, efficient memory handling, or debug core modules such as network drivers and Netfilter. It is the key to unlocking the inner workings of the Linux network stack.

2. Core Structure of sk_buff

2.1 Structure fields

Memory‑layout fields: next, prev – link sk_buffs into a doubly‑linked list; the qlen counter and the lock that protects concurrent access live in the list head, struct sk_buff_head.

Data‑storage fields: head, end, data, tail – define the buffer boundaries; len is the total data length, data_len the portion held in paged fragments (non‑linear data), and mac_len the link‑layer header length.

Protocol‑header pointers: mac_header, network_header, transport_header – give fast access to Ethernet, IP and TCP/UDP headers.

Other key fields: sk (socket), dev (net device), csum (checksum), flag bits such as cloned and nohdr.

2.2 The four‑pointer memory model (head, data, tail, end)

head – points to the beginning of the allocated buffer.

data – points to the first byte of the current valid payload.

tail – points just past the last byte of the current payload.

end – points to the end of the allocated buffer.

These pointers are manipulated by a small set of helpers:

skb_put(skb, len) – moves tail forward, increases len, and returns a pointer to the newly added tail space.

skb_pull(skb, len) – moves data forward, decreases len, and returns the new start of data.

skb_push(skb, len) – moves data backward, opening room for a new header, and returns the new header start.

skb_reserve(skb, len) – advances data and tail together without changing len, reserving headroom for headers to be pushed later (legal only on an empty buffer).

3. Packet Lifecycle in the Kernel

3.1 Reception path

When a NIC receives a frame, DMA places it into a driver‑provided buffer and the driver is notified (by interrupt, or by NAPI polling under load). In the simplest scheme the driver allocates an sk_buff, copies the frame into its linear area, initialises head, data, tail and end, and passes the buffer up the stack; high‑performance drivers instead DMA directly into pre‑allocated sk_buffs to avoid the copy:

// Allocate an sk_buff for the incoming frame (atomic context);
// dev is the receiving net_device
struct sk_buff *skb = netdev_alloc_skb(dev, len);
if (!skb)
    return NET_RX_DROP;
memcpy(skb_put(skb, len), rx_buffer, len);
// eth_type_trans() sets skb->dev and skb->protocol and strips the
// Ethernet header before the buffer is handed to the stack
skb->protocol = eth_type_trans(skb, dev);
netif_receive_skb(skb);

The IP layer parses the IP header, checks the destination address, and either forwards the packet or passes it to the appropriate transport protocol.

3.2 Forwarding path

If the destination IP is not local, the kernel performs a route lookup, decrements the TTL, updates the IP checksum, possibly fragments the packet, rewrites the MAC header, and finally hands the sk_buff to the driver for transmission.

/* Sketch only: is_local_ip(), the route lookup and next_hop_mac() are
 * simplified placeholders; real code uses the FIB and neighbour
 * subsystems, and ip_route_output() takes more arguments. */
if (!is_local_ip(ip_hdr(skb)->daddr)) {
    struct rtable *rt = ip_route_output(ip_hdr(skb)->daddr);
    ip_hdr(skb)->ttl--;                 /* kernel: ip_decrease_ttl() */
    ip_hdr(skb)->check = 0;
    ip_hdr(skb)->check = ip_fast_csum(ip_hdr(skb), ip_hdr(skb)->ihl);
    memcpy(eth_hdr(skb)->h_source, out_dev->dev_addr, ETH_ALEN);
    memcpy(eth_hdr(skb)->h_dest, next_hop_mac(rt), ETH_ALEN);
    dev_queue_xmit(skb);
    return;
}

3.3 Transmission path

For locally generated traffic, the application’s data is copied into an sk_buff, then the TCP (or UDP) layer pushes its header, the IP layer pushes its header, and finally the link‑layer pushes the Ethernet header before the driver queues the packet on the NIC.

// Allocate with headroom for ALL headers, including the Ethernet header
// that will be pushed last
struct sk_buff *skb = alloc_skb(payload_len + ETH_HLEN + IP_HDR_LEN + TCP_HDR_LEN, GFP_KERNEL);
if (!skb) return -ENOMEM;
skb_reserve(skb, ETH_HLEN + IP_HDR_LEN + TCP_HDR_LEN);
memcpy(skb_put(skb, payload_len), payload, payload_len);
// Add TCP header
struct tcphdr *tcp = skb_push(skb, TCP_HDR_LEN);
/* fill tcp fields */
// Add IP header
struct iphdr *ip = skb_push(skb, IP_HDR_LEN);
/* fill ip fields */
// Add Ethernet header and send
struct ethhdr *eth = skb_push(skb, sizeof(struct ethhdr));
/* fill eth fields */
dev_queue_xmit(skb);

4. Core Operations on sk_buff

4.1 Allocation and release

alloc_skb(size, gfp) creates a new buffer; kfree_skb(skb) drops a reference and frees the memory only when the last reference is gone, avoiding leaks. Drivers conventionally call the dev_kfree_skb() wrapper.

struct sk_buff *skb = alloc_skb(1500, GFP_KERNEL);
if (!skb) return -ENOMEM;
/* use skb */
dev_kfree_skb(skb);

4.2 Data‑space management

Typical workflow:

skb_reserve() – reserve headroom for the headers to be added later.

skb_put() – append the payload.

skb_push() – prepend protocol headers, one layer at a time.

skb_pull() – strip headers on the receive path.

Examples are shown in the code snippets above.

4.3 Cloning and copying

skb_clone(skb, gfp) creates a lightweight clone that shares the data buffer; reference counting ensures the data is freed only when the last user releases it. Because the data is shared, a clone must not modify it.

struct sk_buff *c1 = skb_clone(skb, GFP_KERNEL);
struct sk_buff *c2 = skb_clone(skb, GFP_KERNEL);
/* c1 and c2 can be processed independently */
dev_kfree_skb(c1);
dev_kfree_skb(c2);

When the headers must be modified, pskb_copy() produces a private copy of the sk_buff metadata and the linear (header) data, leaving the original untouched; skb_copy() additionally duplicates the paged fragments, for when the payload itself must change.

struct sk_buff *copy = pskb_copy(skb, GFP_KERNEL);
if (!copy) return;
/* modify copy safely */
dev_kfree_skb(copy);

4.4 Concatenation, trimming and sharing

Concatenation uses repeated skb_put() calls; trimming uses skb_pull() for the head and skb_trim() for the tail. Sharing is achieved through reference‑count helpers such as skb_get() and skb_share_check().

// Concatenate two fragments into one linear buffer
struct sk_buff *skb = alloc_skb(len1 + len2, GFP_KERNEL);
if (!skb)
    return -ENOMEM;
memcpy(skb_put(skb, len1), data1, len1);
memcpy(skb_put(skb, len2), data2, len2);

// Trim the Ethernet header (ETH_HLEN, 14 bytes) and the 4-byte trailing CRC
skb_pull(skb, ETH_HLEN);
skb_trim(skb, skb->len - 4);

5. End‑to‑End Example (TCP Send/Receive)

The following sequence demonstrates how a user‑space send() call is transformed into a fully‑encapsulated Ethernet frame and how an incoming frame is processed back to the application.

// ----- Send path -----
struct sk_buff *skb = alloc_skb(data_len + ETH_HLEN + IP_HDR_LEN + TCP_HDR_LEN, GFP_KERNEL);
if (!skb) return -ENOMEM;
skb_reserve(skb, ETH_HLEN + IP_HDR_LEN + TCP_HDR_LEN);
memcpy(skb_put(skb, data_len), user_buf, data_len);
struct tcphdr *tcp = skb_push(skb, TCP_HDR_LEN);
/* fill TCP fields */
struct iphdr *ip = skb_push(skb, IP_HDR_LEN);
/* fill IP fields */
struct ethhdr *eth = skb_push(skb, sizeof(struct ethhdr));
/* fill Ethernet fields */
dev_queue_xmit(skb);

// ----- Receive path -----
struct sk_buff *skb = alloc_skb(len, GFP_ATOMIC);
if (!skb) return NET_RX_DROP;
memcpy(skb_put(skb, len), rx_buf, len);
// Strip the Ethernet header and record where the IP header starts
skb_pull(skb, sizeof(struct ethhdr));
skb_reset_network_header(skb);
// Strip the IP header (ip_hdrlen() reads the offset recorded above)
skb_pull(skb, ip_hdrlen(skb));
skb_reset_transport_header(skb);
// Strip the TCP header
skb_pull(skb, tcp_hdrlen(skb));
// Deliver the remaining payload (illustrative: the real path queues the
// skb on the socket's receive queue and wakes the reading task)
deliver_payload(sk, skb->data, skb->len);
dev_kfree_skb(skb);

6. Common Issues and Solutions

6.1 Memory shortage / skb allocation failure

High‑concurrency workloads can exhaust the pool of sk_buffs. Tuning kernel parameters (e.g., net.core.rmem_max, net.core.wmem_max, net.core.netdev_max_backlog) and using a pre‑allocated skb pool dramatically reduces allocation failures.

# Increase network buffers
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.core.netdev_max_backlog=10000

// Simple skb pool sketch (single-context only: a real pool needs a lock
// or per-CPU lists, must match buffer sizes, and must reset skb state
// before reuse)
#define POOL_SIZE 1024
static struct sk_buff *skb_pool[POOL_SIZE];
static int pool_idx = 0;

static struct sk_buff *alloc_skb_from_pool(unsigned int size, gfp_t gfp)
{
    if (pool_idx > 0)
        return skb_pool[--pool_idx];
    return alloc_skb(size, gfp);
}

static void free_skb_to_pool(struct sk_buff *skb)
{
    if (pool_idx < POOL_SIZE)
        skb_pool[pool_idx++] = skb;
    else
        kfree_skb(skb);
}

6.2 Data loss

Insufficient buffer size leads to truncation. Allocate generous buffers for large frames and validate lengths before processing.

if (skb->len < sizeof(struct ethhdr) + sizeof(struct iphdr)) {
    pr_warn("Packet too short, dropping\n");
    kfree_skb(skb);
    return;
}

6.3 Performance bottlenecks

Frequent alloc/free of sk_buffs is expensive. Employ memory pools, fast‑path forwarding, and hardware off‑load features (TSO, GSO, RSS) to minimise CPU work.

// Fast‑path sketch: is_forward_route() and update_ttl_and_checksum()
// stand in for the route‑cache hit and the header fix‑up
if (is_forward_route(skb)) {
    update_ttl_and_checksum(skb);
    dev_queue_xmit(skb);
    return;
}

Enabling NIC off‑load features (note that RSS is configured through the queue/indirection commands, not the -K feature flags):

ethtool -K eth0 tso on gso on gro on
ethtool -X eth0 equal 8    # spread RSS across 8 receive queues
Tags: memory management, network stack, packet processing, sk_buff, kernel networking
Written by Deepin Linux. Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and the Linux kernel.