Demystifying Linux Kernel Networking: Inside sk_buff, net_device, and Netfilter
This article explains core Linux kernel networking concepts such as the sk_buff packet buffer, net_device representation of NICs, the relationship between socket and sock structures, layer registration mechanisms, and how Netfilter and routing tables process packets within the stack.
sk_buff
The kernel represents a packet with the sk_buff (socket buffer) structure, analogous to the BSD mbuf. An sk_buff does not hold the packet data itself; it holds pointers (head, data, tail, and end) into the memory that does. Because the same sk_buff is passed through every layer of the protocol stack, moving a packet between layers only requires adjusting these pointers rather than copying the data.
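The pointer adjustment can be illustrated with a small user-space model. This is only a sketch: the struct mini_skb type and its helpers are hypothetical stand-ins named after the kernel's skb_reserve, skb_put, skb_push, and skb_pull, and they omit the bounds checks, reference counting, and metadata a real sk_buff carries.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for sk_buff: four pointers into one flat buffer.
 * Simplified user-space model, not kernel code. */
struct mini_skb {
    unsigned char *head;  /* start of the allocated buffer */
    unsigned char *data;  /* start of the current layer's data */
    unsigned char *tail;  /* end of the current layer's data */
    unsigned char *end;   /* end of the allocated buffer */
};

static void mini_skb_init(struct mini_skb *skb, unsigned char *buf, size_t size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
}

/* Leave headroom so lower layers can prepend their headers later. */
static void mini_skb_reserve(struct mini_skb *skb, size_t len)
{
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes; returns the address where the caller writes them. */
static unsigned char *mini_skb_put(struct mini_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;
    skb->tail += len;
    return old_tail;
}

/* Transmit path: prepend a header by moving data backwards. */
static unsigned char *mini_skb_push(struct mini_skb *skb, size_t len)
{
    skb->data -= len;
    return skb->data;
}

/* Receive path: strip a header by moving data forwards. */
static unsigned char *mini_skb_pull(struct mini_skb *skb, size_t len)
{
    skb->data += len;
    return skb->data;
}

/* Number of bytes visible to the current layer. */
static size_t mini_skb_len(const struct mini_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

A sender would reserve headroom (say 28 bytes for an IPv4 plus UDP header), put the payload, and then push 8 and 20 bytes as the packet moves down the layers. A receiver pulls the same amounts on the way up. The payload bytes never move in either direction.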
net_device
The kernel represents a network interface with a net_device structure. Physical NICs carry packets on and off the host, while virtual NICs (e.g., tun/tap, vxlan, veth pairs) are used for tunnels and container communication. Each NIC has two ends: one faces the protocol stack (IP/TCP/UDP), and the other faces either a hardware driver (physical) or a software implementation (virtual).
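The "two ends" idea can be sketched as a device structure whose transmit operation is a function pointer. Everything here is hypothetical and simplified: struct mini_dev and struct mini_dev_ops are loosely modelled on net_device and its ndo_start_xmit operation, and the byte counters stand in for actually moving frames.

```c
#include <stddef.h>

struct mini_dev;

/* Per-device operations, loosely modelled on the kernel's net_device_ops. */
struct mini_dev_ops {
    int (*start_xmit)(struct mini_dev *dev, const void *frame, size_t len);
};

struct mini_dev {
    const char *name;
    const struct mini_dev_ops *ops;
    struct mini_dev *peer;   /* only used by the veth-like device */
    size_t tx_bytes;
    size_t rx_bytes;
};

/* "Physical" backend: a real driver would hand the frame to hardware here. */
static int phys_xmit(struct mini_dev *dev, const void *frame, size_t len)
{
    (void)frame;
    dev->tx_bytes += len;
    return 0;
}

/* veth-like backend: transmitting on one end is receiving on the peer. */
static int pair_xmit(struct mini_dev *dev, const void *frame, size_t len)
{
    (void)frame;
    if (!dev->peer)
        return -1;
    dev->tx_bytes += len;
    dev->peer->rx_bytes += len;
    return 0;
}

static const struct mini_dev_ops phys_ops = { .start_xmit = phys_xmit };
static const struct mini_dev_ops pair_ops = { .start_xmit = pair_xmit };

/* The stack-facing end: the same call regardless of what sits behind it. */
static int mini_dev_xmit(struct mini_dev *dev, const void *frame, size_t len)
{
    return dev->ops->start_xmit(dev, frame, len);
}
```

The protocol stack only ever calls mini_dev_xmit. Whether the frame reaches hardware or reappears on a peer device is decided by the ops table attached to the device.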
socket & sock
User‑space programs use socket(), bind(), listen(), accept(), and related calls. Inside the kernel there are two related structures: socket (the object behind the user‑space file descriptor) and sock (used by the protocol stack). Each carries a pointer to an operations table: socket->ops points to a struct proto_ops, while sock->sk_prot points to a struct proto. Both pointers are chosen from the socket_family, socket_type, and protocol arguments.
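A rough user-space model of this two-level dispatch is shown below. All mini_ and toy_ names are hypothetical. For brevity a single socket-level table is shared by both socket types, whereas the kernel keeps separate proto_ops tables for stream and datagram sockets.

```c
#include <stddef.h>

struct mini_sock;

/* Protocol-level operations: the role struct proto plays for sock->sk_prot. */
struct mini_proto {
    const char *name;
    int (*sendmsg)(struct mini_sock *sk, size_t len);
};

struct mini_sock {
    const struct mini_proto *sk_prot;
    size_t queued;   /* bytes held for later transmission (stream only) */
};

struct mini_socket;

/* Socket-level operations: the role struct proto_ops plays for socket->ops. */
struct mini_proto_ops {
    int (*sendmsg)(struct mini_socket *sock, size_t len);
};

struct mini_socket {
    const struct mini_proto_ops *ops;
    struct mini_sock sk;
};

/* Toy stream protocol: queue the bytes. */
static int toy_stream_sendmsg(struct mini_sock *sk, size_t len)
{
    sk->queued += len;
    return (int)len;
}

/* Toy datagram protocol: "send" immediately, nothing is queued. */
static int toy_dgram_sendmsg(struct mini_sock *sk, size_t len)
{
    (void)sk;
    return (int)len;
}

static const struct mini_proto toy_stream_prot = { "stream", toy_stream_sendmsg };
static const struct mini_proto toy_dgram_prot  = { "dgram",  toy_dgram_sendmsg };

/* Family-level entry point: forwards to whatever protocol the sock uses. */
static int toy_family_sendmsg(struct mini_socket *sock, size_t len)
{
    return sock->sk.sk_prot->sendmsg(&sock->sk, len);
}

static const struct mini_proto_ops toy_family_ops = { toy_family_sendmsg };

enum mini_type { MINI_STREAM, MINI_DGRAM };

/* Choose both ops pointers from the requested type, as socket() would. */
static void mini_socket_init(struct mini_socket *sock, enum mini_type type)
{
    sock->ops = &toy_family_ops;
    sock->sk.sk_prot = (type == MINI_STREAM) ? &toy_stream_prot : &toy_dgram_prot;
    sock->sk.queued = 0;
}
```

A caller only ever invokes sock->ops->sendmsg. Which protocol routine finally runs depends on the sk_prot pointer that was chosen when the socket was created.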
#include <sys/socket.h>
sockfd = socket(int socket_family, int socket_type, int protocol);
For the common PF_INET family, the kernel records the corresponding operations in the INET protocol switch table:
static struct inet_protosw inetsw_array[] = {
    {
        .type = SOCK_STREAM,
        .protocol = IPPROTO_TCP,
        .prot = &tcp_prot,          // becomes sock->sk_prot
        .ops = &inet_stream_ops,    // becomes socket->ops
        .flags = INET_PROTOSW_PERMANENT | INET_PROTOSW_ICSK,
    },
    {
        .type = SOCK_DGRAM,
        .protocol = IPPROTO_UDP,
        .prot = &udp_prot,          // becomes sock->sk_prot
        .ops = &inet_dgram_ops,     // becomes socket->ops
        .flags = INET_PROTOSW_PERMANENT,
    },
    // ...
};
L3 → L4
The kernel’s protocol stack is logically layered, but the implementation is a series of function calls. On the send path each layer calls directly into the layer below, while the receive path uses a registration and callback mechanism. Protocols register themselves with the kernel, for example:
int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol);
During initialization, L4 protocols such as TCP and UDP are registered:
static struct net_protocol tcp_protocol = {
    // ...
    .handler = tcp_v4_rcv,
    // ...
};
static struct net_protocol udp_protocol = {
    // ...
    .handler = udp_rcv,
    // ...
};
When the IP layer receives a packet destined for the local host, it looks up the appropriate L4 protocol and invokes its handler:
static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb) {
    // ...
    ipprot = rcu_dereference(inet_protos[protocol]);
    // ...
    ret = ipprot->handler(skb);
    // ...
}
L2 → L3
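The register-then-dispatch pattern can be modelled in user space with a table of function pointers. The sketch below mirrors the inet_add_protocol and ip_local_deliver_finish steps from the L3 → L4 discussion. The names mini_add_protocol, mini_local_deliver, and the toy handlers are hypothetical, and the locking and RCU protection the kernel needs are left out.

```c
#include <stddef.h>

#define MINI_MAX_PROTOS 256

/* Receive callback type; returns a tag so the caller can see which one ran. */
typedef int (*mini_handler_t)(const unsigned char *payload, size_t len);

/* One slot per IP protocol number, all empty until something registers. */
static mini_handler_t mini_protos[MINI_MAX_PROTOS];

/* Register a handler once at init time; refuse to overwrite an existing one. */
static int mini_add_protocol(mini_handler_t handler, unsigned char protocol)
{
    if (mini_protos[protocol])
        return -1;
    mini_protos[protocol] = handler;
    return 0;
}

/* Called by the lower layer: look up the handler and hand the payload up. */
static int mini_local_deliver(unsigned char protocol,
                              const unsigned char *payload, size_t len)
{
    mini_handler_t handler = mini_protos[protocol];

    if (!handler)
        return -1;   /* nothing registered for this protocol number */
    return handler(payload, len);
}

/* Toy L4 handlers; 6 and 17 are the IP protocol numbers of TCP and UDP. */
static int toy_tcp_rcv(const unsigned char *payload, size_t len)
{
    (void)payload; (void)len;
    return 6;
}

static int toy_udp_rcv(const unsigned char *payload, size_t len)
{
    (void)payload; (void)len;
    return 17;
}
```

The lower layer never names the upper layer's functions. It only indexes the table with a value taken from the packet header, which is what lets new protocols be added without touching the dispatch code.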
The same registration pattern applies between L2 and L3. Protocols register a packet_type structure:
void dev_add_pack(struct packet_type *pt);
For example, IP registers itself as follows:
static struct packet_type ip_packet_type = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,
};
When a device driver receives a frame, it sets skb->protocol (typically via eth_type_trans()). The core function netif_receive_skb then dispatches the packet based on this protocol:
static int __netif_receive_skb(struct sk_buff *skb) {
    // ...
    type = skb->protocol;
    // ...
    ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
    // ...
}
Netfilter
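Packets pass through Netfilter hooks at up to five points in the stack: PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. Users can attach iptables rules to these hooks to filter or modify packets. At each hook point the kernel calls the inline helper NF_HOOK, which runs the registered hooks and, if they accept the packet (nf_hook returns 1), calls okfn to continue normal processing: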
static inline int NF_HOOK(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
                          struct sk_buff *skb, struct net_device *in, struct net_device *out,
                          int (*okfn)(struct net *, struct sock *, struct sk_buff *)) {
    int ret = nf_hook(pf, hook, net, sk, skb, in, out, okfn);
    if (ret == 1)
        ret = okfn(net, sk, skb);
    return ret;
}
dst_entry (Routing and Destination)
The kernel decides whether a packet should be locally delivered, forwarded, or sent out. This decision uses the Forwarding Information Base (FIB), which acts like a routing database. A lookup takes the sk_buff as input and returns a dst_entry that is attached to the skb:
static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) {
    skb->_skb_refdst = (unsigned long)dst;
}
The dst_entry contains function pointers for input and output processing:
struct dst_entry {
    // ...
    int (*input)(struct sk_buff *);
    int (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
    // ...
};
For locally delivered packets the kernel sets:
rth->dst.input = ip_local_deliver;
For forwarded packets:
rth->dst.input = ip_forward;
For packets originating from the host:
rth->dst.output = ip_output;
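The hook gate and the dst_entry dispatch can be combined in one small user-space model. This is a sketch under stated assumptions, loosely mirroring how the IPv4 receive path runs the PREROUTING hook before the route lookup. Every mini_ name, the two addresses, and the return codes are hypothetical, and the "route lookup" is a single comparison rather than a FIB query.

```c
#include <stddef.h>
#include <stdint.h>

struct mini_skb;

/* Stand-in for dst_entry: only the input function pointer is modelled. */
struct mini_dst {
    int (*input)(struct mini_skb *skb);
};

struct mini_skb {
    uint32_t daddr;               /* destination address, host byte order */
    const struct mini_dst *dst;   /* filled in by the route lookup */
};

enum { MINI_DROP = 0, MINI_ACCEPT = 1 };                   /* hook verdicts */
enum { MINI_DROPPED = -1, MINI_LOCAL = 1, MINI_FORWARDED = 2 };

#define MINI_LOCAL_ADDR   0x0a000001u   /* 10.0.0.1, assumed to be this host */
#define MINI_BLOCKED_ADDR 0x0a0000ffu   /* 10.0.0.255, assumed filtered */

static int mini_local_deliver(struct mini_skb *skb) { (void)skb; return MINI_LOCAL; }
static int mini_forward(struct mini_skb *skb)       { (void)skb; return MINI_FORWARDED; }

static const struct mini_dst mini_local_dst   = { .input = mini_local_deliver };
static const struct mini_dst mini_forward_dst = { .input = mini_forward };

/* Toy route lookup: attach the dst that matches the destination address. */
static void mini_route_input(struct mini_skb *skb)
{
    skb->dst = (skb->daddr == MINI_LOCAL_ADDR) ? &mini_local_dst
                                               : &mini_forward_dst;
}

/* Continuation (the okfn role): route the packet, then follow dst->input. */
static int mini_rcv_finish(struct mini_skb *skb)
{
    mini_route_input(skb);
    return skb->dst->input(skb);
}

/* Toy filter rule: drop anything addressed to the blocked address. */
static int mini_prerouting_hook(const struct mini_skb *skb)
{
    return skb->daddr == MINI_BLOCKED_ADDR ? MINI_DROP : MINI_ACCEPT;
}

/* Same shape as NF_HOOK: run the hook, call okfn only on an accept verdict. */
static int mini_nf_hook(int (*hook)(const struct mini_skb *),
                        struct mini_skb *skb,
                        int (*okfn)(struct mini_skb *))
{
    if (hook(skb) != MINI_ACCEPT)
        return MINI_DROPPED;
    return okfn(skb);
}
```

Calling mini_nf_hook(mini_prerouting_hook, &skb, mini_rcv_finish) on a packet for the local address ends in mini_local_deliver, any other accepted address ends in mini_forward, and a dropped packet never reaches the route lookup at all.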
