Unveiling Linux Kernel Network Stack: Inside sk_buff, net_device, and Netfilter
This article explains core components of the Linux kernel network stack, including the sk_buff packet buffer, net_device interfaces for physical and virtual NICs, the relationship between socket and sock structures, layer transitions, Netfilter hooks, and routing via dst_entry, providing a comprehensive overview for developers.
Protocol Stack Details
Below we introduce concepts frequently involved in the kernel network protocol stack.
sk_buff
The kernel uses the sk_buff (socket buffer) structure to represent a packet, analogous to the BSD mbuf. The structure itself does not store packet data; it contains pointers to the actual packet memory.
sk_buff traverses the entire protocol stack; each layer adjusts the pointers within the structure as the packet moves.
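The pointer adjustment described above can be illustrated with a small user-space model. This is only a sketch: the struct toy_skb type and the toy_skb_* helpers are hypothetical names that loosely mirror the real head, data, tail and end pointers and the kernel's skb_reserve, skb_put, skb_push and skb_pull helpers; the real sk_buff carries far more state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for sk_buff: only the four buffer pointers (hypothetical type). */
struct toy_skb {
    unsigned char *head; /* start of the allocated buffer */
    unsigned char *data; /* start of the data seen by the current layer */
    unsigned char *tail; /* end of the data */
    unsigned char *end;  /* end of the allocated buffer */
};

/* Allocate the backing buffer; returns 0 on success, -1 on failure. */
static int toy_skb_init(struct toy_skb *skb, size_t size)
{
    skb->head = malloc(size);
    if (!skb->head)
        return -1;
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    return 0;
}

/* Reserve headroom on an empty buffer so headers can be prepended later. */
static void toy_skb_reserve(struct toy_skb *skb, size_t len)
{
    assert(skb->data == skb->tail);
    assert(len <= (size_t)(skb->end - skb->tail));
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes at the tail; returns where the caller should write them. */
static unsigned char *toy_skb_put(struct toy_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    assert(len <= (size_t)(skb->end - skb->tail));
    skb->tail += len;
    return old_tail;
}

/* Send path: prepend a header by moving data toward head. */
static unsigned char *toy_skb_push(struct toy_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->data - skb->head));
    skb->data -= len;
    return skb->data;
}

/* Receive path: strip a header by moving data toward tail. */
static unsigned char *toy_skb_pull(struct toy_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->tail - skb->data));
    skb->data += len;
    return skb->data;
}

/* Bytes currently visible to the layer holding the buffer. */
static size_t toy_skb_len(const struct toy_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

On the send path a caller would reserve room for all headers, put the payload, then push one header per layer; on the receive path each layer pulls its own header before handing the buffer up. No payload bytes are copied in either direction.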
net_device
The kernel represents network interfaces with net_device. Devices can be physical (real NICs, including those of virtual machines) or virtual (e.g., tun/tap, vxlan, veth pairs). Physical devices have driver code supplied by the hardware vendor, while virtual devices enable tunneling, container networking, and other functions.
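One way to picture why the stack can treat physical and virtual devices alike is a per-device table of function pointers. The sketch below is a user-space model under that assumption: struct toy_netdev, struct toy_netdev_ops and the toy_* functions are hypothetical and only loosely modelled on the kernel's net_device_ops transmit callback (ndo_start_xmit).

```c
#include <assert.h>
#include <stddef.h>

struct toy_netdev;

/* Per-device-type operations (hypothetical, loosely like net_device_ops). */
struct toy_netdev_ops {
    int (*start_xmit)(struct toy_netdev *dev, const void *pkt, size_t len);
};

struct toy_netdev {
    const char *name;
    const struct toy_netdev_ops *ops;
    struct toy_netdev *peer;   /* used only by the veth-like device */
    unsigned long tx_packets;
    unsigned long rx_packets;
};

/* "Physical" device: a real driver would hand the frame to hardware here. */
static int toy_nic_xmit(struct toy_netdev *dev, const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
    dev->tx_packets++;
    return 0;
}

/* veth-like device: transmitting on one end is receiving on the peer. */
static int toy_veth_xmit(struct toy_netdev *dev, const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
    assert(dev->peer != NULL);
    dev->tx_packets++;
    dev->peer->rx_packets++;
    return 0;
}

static const struct toy_netdev_ops toy_nic_ops = { .start_xmit = toy_nic_xmit };
static const struct toy_netdev_ops toy_veth_ops = { .start_xmit = toy_veth_xmit };

/* Stack-side transmit: the same call regardless of device type. */
static int toy_dev_xmit(struct toy_netdev *dev, const void *pkt, size_t len)
{
    return dev->ops->start_xmit(dev, pkt, len);
}
```

The caller of toy_dev_xmit never checks what kind of device it holds; the difference between a NIC and a veth end lives entirely behind the ops pointer.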
socket & sock
User‑space programs use socket(), bind(), listen(), accept() and other library calls for network programming. In the kernel, socket is the structure exposed to user space, while sock is the lower‑level structure used by the protocol stack.
The two structures correspond one-to-one; each holds a pointer to an operations table, but of different types: socket->ops points to a struct proto_ops, and sock->sk_prot points to a struct proto. The values of these pointers are determined when the structures are created, based on the socket family, type, and protocol.
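A minimal user-space sketch of that selection step follows, assuming a simple table keyed by socket type and protocol. All toy_* types, tables and functions are hypothetical stand-ins; only the SOCK_* and IPPROTO_* constants come from the standard headers. Treating protocol 0 as "default for this type" is a simplification of what the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */
#include <sys/socket.h>   /* SOCK_STREAM, SOCK_DGRAM */

/* Toy stand-ins for the two operation tables (names only). */
struct toy_proto_ops { const char *name; };  /* role of struct proto_ops */
struct toy_proto { const char *name; };      /* role of struct proto */

/* One row per supported (type, protocol) pair. */
struct toy_protosw {
    int type;
    int protocol;
    const struct toy_proto *prot;
    const struct toy_proto_ops *ops;
};

static const struct toy_proto toy_tcp_prot = { "tcp" };
static const struct toy_proto toy_udp_prot = { "udp" };
static const struct toy_proto_ops toy_stream_ops = { "stream" };
static const struct toy_proto_ops toy_dgram_ops = { "dgram" };

static const struct toy_protosw toy_inetsw[] = {
    { .type = SOCK_STREAM, .protocol = IPPROTO_TCP,
      .prot = &toy_tcp_prot, .ops = &toy_stream_ops },
    { .type = SOCK_DGRAM, .protocol = IPPROTO_UDP,
      .prot = &toy_udp_prot, .ops = &toy_dgram_ops },
};

/* The paired structures: one user-facing, one protocol-facing. */
struct toy_sock { const struct toy_proto *prot; };
struct toy_socket {
    const struct toy_proto_ops *ops;
    struct toy_sock sk;
};

/* Fill in both pointers at creation time. Returns 0 on success, -1 if the
 * (type, protocol) pair is not in the table. */
static int toy_inet_create(struct toy_socket *sock, int type, int protocol)
{
    size_t i;

    assert(sock != NULL);
    for (i = 0; i < sizeof(toy_inetsw) / sizeof(toy_inetsw[0]); i++) {
        const struct toy_protosw *e = &toy_inetsw[i];

        if (e->type != type)
            continue;
        if (protocol != 0 && protocol != e->protocol)
            continue;
        sock->ops = e->ops;
        sock->sk.prot = e->prot;
        return 0;
    }
    return -1;
}
```

After creation neither pointer changes: calls arriving from user space go through the socket-level table, which in turn calls into the protocol-level table.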
#include <sys/socket.h>
sockfd = socket(int socket_family, int socket_type, int protocol);
For the common PF_INET family, the socket->ops and sock->sk_prot values are recorded in the INET protocol switch table:
static struct inet_protosw inetsw_array[] = {
    {
        .type = SOCK_STREAM,
        .protocol = IPPROTO_TCP,
        .prot = &tcp_prot, // sock->sk_prot
        .ops = &inet_stream_ops, // socket->ops
        .flags = INET_PROTOSW_PERMANENT | INET_PROTOSW_ICSK,
    },
    {
        .type = SOCK_DGRAM,
        .protocol = IPPROTO_UDP,
        .prot = &udp_prot, // sock->sk_prot
        .ops = &inet_dgram_ops, // socket->ops
        .flags = INET_PROTOSW_PERMANENT,
    },
    // ...
};
L3 → L4
The network stack is logically layered, but in the kernel the layers are implemented as function calls. Outgoing packets follow a direct call chain, while incoming packets use a registration‑callback mechanism because the L3 layer (IP) must dispatch to different L4 handlers (TCP, UDP, ICMP, etc.).
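The registration-callback idea on the receive side can be sketched in user space as a table of handlers indexed by IP protocol number. This is a simplified model: struct toy_pkt, toy_protos and the toy_* functions are hypothetical, and the real kernel adds RCU protection and other handling that is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

#define TOY_MAX_PROTOS 256

/* Toy packet: just the IP protocol number and a marker set by the handler. */
struct toy_pkt {
    unsigned char protocol;
    int handled_by;
};

typedef int (*toy_l4_handler)(struct toy_pkt *pkt);

/* One slot per IP protocol number. */
static toy_l4_handler toy_protos[TOY_MAX_PROTOS];

/* Register an L4 handler; returns -1 if the slot is already taken. */
static int toy_add_protocol(toy_l4_handler handler, unsigned char protocol)
{
    assert(handler != NULL);
    if (toy_protos[protocol])
        return -1;
    toy_protos[protocol] = handler;
    return 0;
}

static int toy_tcp_rcv(struct toy_pkt *pkt)
{
    pkt->handled_by = IPPROTO_TCP;
    return 0;
}

static int toy_udp_rcv(struct toy_pkt *pkt)
{
    pkt->handled_by = IPPROTO_UDP;
    return 0;
}

/* L3 side: look up the handler for this packet and call it.
 * Returns -1 when no handler is registered (the packet is dropped here). */
static int toy_local_deliver(struct toy_pkt *pkt)
{
    toy_l4_handler handler = toy_protos[pkt->protocol];

    if (!handler)
        return -1;
    return handler(pkt);
}
```

The L3 code never names TCP or UDP; adding another L4 protocol only requires one more toy_add_protocol call at initialization.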
Registration of L4 protocols is done via inet_add_protocol:
int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol);
During initialization, TCP and UDP protocols are registered:
static struct net_protocol tcp_protocol = {
    // ...
    .handler = tcp_v4_rcv,
    // ...
};
static struct net_protocol udp_protocol = {
    // ...
    .handler = udp_rcv,
    // ...
};
When a packet reaches the IP layer, ip_local_deliver_finish looks up the appropriate L4 handler in the inet_protos array, indexed by IP protocol number, and invokes it:
static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb) {
    // ...
    ipprot = rcu_dereference(inet_protos[protocol]);
    // ...
    ret = ipprot->handler(skb);
    // ...
}
L2 → L3
Layer-2 to Layer-3 registration uses dev_add_pack:
void dev_add_pack(struct packet_type *pt);
For example, the IP protocol registers its packet type:
static struct packet_type ip_packet_type = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,
};
When a device receives a packet, the kernel sets skb->protocol and dispatches to the corresponding callback:
static int __netif_receive_skb(struct sk_buff *skb) {
    // ...
    type = skb->protocol;
    // ...
    ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
    // ...
}
Netfilter
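Netfilter provides five hook points on the packet path through the kernel: PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. Users can attach iptables rules to filter or modify packets at these hooks. The inline function NF_HOOK runs the registered hooks and, if they let the packet through (nf_hook returns 1), calls okfn to continue normal processing: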
static inline int NF_HOOK(uint8_t pf, unsigned int hook, struct net *net,
                          struct sock *sk, struct sk_buff *skb,
                          struct net_device *in, struct net_device *out,
                          int (*okfn)(struct net *, struct sock *, struct sk_buff *)) {
    int ret = nf_hook(pf, hook, net, sk, skb, in, out, okfn);
    if (ret == 1)
        ret = okfn(net, sk, skb);
    return ret;
}
dst_entry
The kernel uses the forwarding information base (fib) to decide whether a packet should be locally delivered, forwarded, or sent out. The fib lookup takes a sk_buff as input and returns a dst_entry, which is attached to the skb:
static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) {
    skb->_skb_refdst = (unsigned long)dst;
}
A dst_entry contains function pointers for input and output processing:
struct dst_entry {
    // ...
    int (*input)(struct sk_buff *);
    int (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
    // ...
};
For locally delivered packets the input pointer is set to ip_local_deliver, for forwarded packets to ip_forward, and for locally generated packets the output pointer is set to ip_output:
rth->dst.input = ip_local_deliver; // local delivery
rth->dst.input = ip_forward; // forwarding
rth->dst.output = ip_output; // local out
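To show how storing function pointers in the route result removes the need for a later "local or forward?" branch, here is a user-space sketch. Everything prefixed toy_ is hypothetical: the single-address comparison stands in for a real FIB lookup, and the simplified signatures drop the net and sock arguments of the real output callback.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb;

/* Toy stand-in for dst_entry: just the two callbacks. */
struct toy_dst {
    int (*input)(struct toy_skb *skb);
    int (*output)(struct toy_skb *skb);
};

/* Toy packet: only the attached route result. */
struct toy_skb {
    const struct toy_dst *dst;
};

/* Return codes that tell the caller which path ran. */
enum { TOY_LOCAL_DELIVER = 1, TOY_FORWARD = 2, TOY_OUTPUT = 3 };

static int toy_local_deliver(struct toy_skb *skb) { (void)skb; return TOY_LOCAL_DELIVER; }
static int toy_forward(struct toy_skb *skb) { (void)skb; return TOY_FORWARD; }
static int toy_output(struct toy_skb *skb) { (void)skb; return TOY_OUTPUT; }

static const struct toy_dst toy_dst_local = { .input = toy_local_deliver };
static const struct toy_dst toy_dst_forward = { .input = toy_forward, .output = toy_output };
static const struct toy_dst toy_dst_out = { .output = toy_output };

/* Stand-in for the route lookup on receive: packets addressed to us are
 * delivered locally, everything else is forwarded. */
static void toy_route_input(struct toy_skb *skb, unsigned int daddr, unsigned int my_addr)
{
    skb->dst = (daddr == my_addr) ? &toy_dst_local : &toy_dst_forward;
}

/* Callers just invoke whatever the lookup attached to the packet. */
static int toy_dst_input(struct toy_skb *skb)
{
    assert(skb->dst != NULL && skb->dst->input != NULL);
    return skb->dst->input(skb);
}

static int toy_dst_output(struct toy_skb *skb)
{
    assert(skb->dst != NULL && skb->dst->output != NULL);
    return skb->dst->output(skb);
}
```

The routing decision is made once, at lookup time; afterwards the receive and send paths only call through the pointers attached to the packet.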
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
