How Linux Connection Tracking (Conntrack) Works: Design, Implementation, and NAT Integration
This article explains the principles, use cases, and kernel implementation of Linux connection tracking (conntrack), covering its role in NAT, L4 load balancing, Netfilter hook mechanisms, key data structures, core functions, and performance considerations.
1 Introduction
Connection tracking (conntrack, CT) is the foundation for many network features such as Kubernetes Services, Service‑Mesh sidecars, LVS/IPVS load balancers, Docker networking, OVS, and iptables firewalls. It records the state of each connection so higher‑level functions like NAT can operate.
1.1 Concept
Conntrack tracks and records the state of connections. For example, a Linux host with IP 10.1.1.2 might have three connections: outbound HTTP (port 80), inbound FTP (port 21), and outbound DNS (port 53). Conntrack extracts a tuple from each packet, uses it to identify the flow, maintains a state database (the conntrack table), recycles expired entries, and provides this information to other modules such as NAT.
Note that a "connection" in conntrack is not the same as a connection in the TCP/IP sense of a connection-oriented protocol. A tuple identifies a unidirectional flow, and a conntrack entry pairs the original-direction tuple with the reply-direction tuple, so connectionless protocols such as UDP and even ICMP can have entries.
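The bookkeeping described above can be sketched in user space. This is a minimal illustration, not the kernel's implementation: `struct flow_tuple`, `struct ct_entry`, `ct_lookup()`, and `ct_track()` are hypothetical names, the table is a small linear array rather than a hash table, and timestamps are plain integers supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical simplified tuple; the kernel's struct nf_conntrack_tuple is richer. */
struct flow_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

struct ct_entry {
    struct flow_tuple orig;   /* tuple of the first packet seen */
    struct flow_tuple reply;  /* tuple expected on return traffic */
    uint64_t expires;         /* absolute expiry time, in seconds */
    bool in_use;
};

#define CT_TABLE_SIZE 64
struct ct_table { struct ct_entry slots[CT_TABLE_SIZE]; };

bool tuple_equal(const struct flow_tuple *a, const struct flow_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Swap source and destination to get the tuple of the opposite direction. */
struct flow_tuple tuple_invert(const struct flow_tuple *t)
{
    struct flow_tuple r = { t->dst_ip, t->src_ip, t->dst_port, t->src_port, t->proto };
    return r;
}

/* Find the entry matching a tuple in either direction; expired entries
 * encountered during the scan are recycled. */
struct ct_entry *ct_lookup(struct ct_table *tbl, const struct flow_tuple *t, uint64_t now)
{
    for (int i = 0; i < CT_TABLE_SIZE; i++) {
        struct ct_entry *e = &tbl->slots[i];
        if (!e->in_use)
            continue;
        if (e->expires <= now) {
            e->in_use = false;
            continue;
        }
        if (tuple_equal(&e->orig, t) || tuple_equal(&e->reply, t))
            return e;
    }
    return NULL;
}

/* Refresh the entry for this packet, or create one. Returns NULL if the table is full. */
struct ct_entry *ct_track(struct ct_table *tbl, const struct flow_tuple *t,
                          uint64_t now, uint64_t timeout)
{
    assert(timeout > 0);
    struct ct_entry *e = ct_lookup(tbl, t, now);
    if (e) {
        e->expires = now + timeout;
        return e;
    }
    for (int i = 0; i < CT_TABLE_SIZE; i++) {
        e = &tbl->slots[i];
        if (!e->in_use) {
            e->orig = *t;
            e->reply = tuple_invert(t);
            e->expires = now + timeout;
            e->in_use = true;
            return e;
        }
    }
    return NULL;
}
```

A reply packet carries the inverted tuple, so it resolves to the same entry as the packet that created it. Once the timeout passes without traffic, the slot is reclaimed.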
1.2 NAT (Network Address Translation)
NAT rewrites the IP address and/or port of packets. For example, a private address such as 192.168.x.x cannot be reached from outside the host, so before a packet leaves, its source IP is replaced with the host's own address (10.1.1.2 in the example above), and the reverse translation is applied to return packets. Docker's default bridge network works this way.
NAT types include:
SNAT – translate source address
DNAT – translate destination address
Full NAT – translate both source and destination
Conntrack provides the essential state information that NAT relies on.
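To make the dependency on state concrete, here is a minimal user-space sketch of source NAT. It is an assumption-laden illustration: `struct snat_state`, `snat_outbound()`, and `snat_inbound()` are hypothetical names, bindings are keyed only on the private source address and port, and translated ports are handed out sequentially. The kernel's NAT core works differently in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pkt {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct snat_binding {
    uint32_t priv_ip;  uint16_t priv_port;  /* original source */
    uint32_t pub_ip;   uint16_t pub_port;   /* translated source */
    bool in_use;
};

#define MAX_BINDINGS 16
struct snat_state {
    uint32_t pub_ip;      /* address to translate to */
    uint16_t next_port;   /* next translated port to hand out */
    struct snat_binding b[MAX_BINDINGS];
};

/* Outbound packet: rewrite the source, creating a binding on the first packet.
 * Returns false if no binding slot is free. */
bool snat_outbound(struct snat_state *s, struct pkt *p)
{
    assert(s != NULL && p != NULL);
    struct snat_binding *free_slot = NULL;

    for (int i = 0; i < MAX_BINDINGS; i++) {
        struct snat_binding *b = &s->b[i];
        if (b->in_use && b->priv_ip == p->src_ip && b->priv_port == p->src_port) {
            p->src_ip = b->pub_ip;
            p->src_port = b->pub_port;
            return true;
        }
        if (!b->in_use && !free_slot)
            free_slot = b;
    }
    if (!free_slot)
        return false;

    free_slot->priv_ip = p->src_ip;
    free_slot->priv_port = p->src_port;
    free_slot->pub_ip = s->pub_ip;
    free_slot->pub_port = s->next_port++;
    free_slot->in_use = true;
    p->src_ip = free_slot->pub_ip;
    p->src_port = free_slot->pub_port;
    return true;
}

/* Return packet: undo the translation on the destination fields.
 * Returns false if no binding matches, i.e. there is no state to rely on. */
bool snat_inbound(struct snat_state *s, struct pkt *p)
{
    assert(s != NULL && p != NULL);
    for (int i = 0; i < MAX_BINDINGS; i++) {
        struct snat_binding *b = &s->b[i];
        if (b->in_use && b->pub_ip == p->dst_ip && b->pub_port == p->dst_port) {
            p->dst_ip = b->priv_ip;
            p->dst_port = b->priv_port;
            return true;
        }
    }
    return false;
}
```

Without the stored binding, `snat_inbound()` has no way to know which private host a return packet belongs to. That is the role conntrack plays for the kernel's NAT.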
1.3 L4 Load Balancing (L4LB)
L4LB distributes traffic based on Layer‑4 fields (src/dst IP, src/dst port, protocol). A virtual IP (VIP) aggregates multiple real backend IPs; packets arrive at the VIP, the load‑balancing algorithm selects a backend, and NAT may be applied to rewrite addresses.
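One simple selection algorithm is to hash the Layer-4 fields and take the result modulo the number of backends. The sketch below assumes that approach; `struct l4_key`, `l4_hash()`, and `pick_backend()` are hypothetical names, and FNV-1a stands in for whatever hash a real load balancer uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct l4_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Feed the low nbytes bytes of v into a running FNV-1a hash.
 * Hashing field by field avoids touching struct padding. */
static uint32_t fnv1a_feed(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}

uint32_t l4_hash(const struct l4_key *k)
{
    uint32_t h = 2166136261u;        /* FNV offset basis */
    h = fnv1a_feed(h, k->src_ip, 4);
    h = fnv1a_feed(h, k->dst_ip, 4);
    h = fnv1a_feed(h, k->src_port, 2);
    h = fnv1a_feed(h, k->dst_port, 2);
    h = fnv1a_feed(h, k->proto, 1);
    return h;
}

/* Map a flow to a backend index. The same flow always maps to the same
 * backend as long as the number of backends does not change. */
size_t pick_backend(const struct l4_key *k, size_t n_backends)
{
    assert(n_backends > 0);
    return l4_hash(k) % n_backends;
}
```

Stateless hashing alone reshuffles flows whenever the backend set changes. This is one reason a load balancer may record the chosen backend per connection, which is the kind of state conntrack provides.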
2 Netfilter Hook Mechanism
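Conntrack and NAT are implemented as kernel modules that attach to Netfilter hooks. The IPv4 hook points are defined as follows (current kernels use the protocol-independent names NF_INET_PRE_ROUTING and so on, with the same values):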
#define NF_IP_PRE_ROUTING 0
#define NF_IP_LOCAL_IN 1
#define NF_IP_FORWARD 2
#define NF_IP_LOCAL_OUT 3
#define NF_IP_POST_ROUTING 4
#define NF_IP_NUMHOOKS 5
Handlers can be registered at each hook with a priority. Hook return values include NF_DROP, NF_ACCEPT, NF_STOLEN, NF_QUEUE, and NF_REPEAT.
2.1 Netfilter Framework
Five hook points allow packet interception, filtering, or modification. Users can register custom handlers that receive each packet at the appropriate stage.
Hook mechanisms simply place detection points on the packet’s mandatory path; each packet must pass through them and be processed according to the handler’s verdict.
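The idea of prioritized handlers returning verdicts can be sketched in user space as follows. All names here (`struct hook_chain`, `hook_register()`, `hook_run()`, and the two sample handlers) are hypothetical. Only two verdicts are modeled, with the same numeric values as NF_DROP and NF_ACCEPT.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { V_DROP = 0, V_ACCEPT = 1 };

struct packet { int len; int mark; };

typedef enum verdict (*hook_fn)(struct packet *pkt);

struct hook_entry { hook_fn fn; int priority; };

#define MAX_HOOKS 8
struct hook_chain { struct hook_entry e[MAX_HOOKS]; size_t n; };

/* Insert a handler, keeping the chain sorted by ascending priority
 * (lower values run first). Returns 0 on success, -1 if the chain is full. */
int hook_register(struct hook_chain *c, hook_fn fn, int priority)
{
    assert(fn != NULL);
    if (c->n == MAX_HOOKS)
        return -1;
    size_t i = c->n;
    while (i > 0 && c->e[i - 1].priority > priority) {
        c->e[i] = c->e[i - 1];
        i--;
    }
    c->e[i].fn = fn;
    c->e[i].priority = priority;
    c->n++;
    return 0;
}

/* Run every handler in order; the first non-accept verdict stops the traversal. */
enum verdict hook_run(const struct hook_chain *c, struct packet *pkt)
{
    for (size_t i = 0; i < c->n; i++)
        if (c->e[i].fn(pkt) != V_ACCEPT)
            return V_DROP;
    return V_ACCEPT;
}

/* Sample handlers: one annotates the packet, the other filters on the annotation. */
enum verdict mark_packet(struct packet *pkt)   { pkt->mark = 1; return V_ACCEPT; }
enum verdict drop_unmarked(struct packet *pkt) { return pkt->mark ? V_ACCEPT : V_DROP; }
```

Registering `mark_packet` at priority -200 and `drop_unmarked` at priority 0 makes the marker run first regardless of registration order. This mirrors how conntrack registers at a lower priority value than filtering, so that filter rules can see connection state.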
3 Netfilter Conntrack Implementation
3.1 Key Structures and Functions
struct nf_conntrack_tuple – defines a tuple (source/destination IP, ports, protocol, etc.)
struct nf_conntrack_l4proto – per‑protocol method set (e.g., pkt_to_tuple())
struct nf_conntrack_tuple_hash – entry in the conntrack hash table
struct nf_conn – represents a flow (connection) with status bits, timeout, and reference count
Important functions:
hash_conntrack_raw() – computes a 32‑bit hash from a tuple
nf_conntrack_in() – core entry point for incoming packets
resolve_normal_ct() → init_conntrack() → l4proto->new() – creates a new conntrack entry
nf_conntrack_confirm() – moves an entry from the unconfirmed list into the conntrack hash table (the confirmed entries) after the packet passes the final hook
3.2 Tuple Details
A tuple uniquely identifies a flow. For IPv4 UDP the five‑tuple consists of:
dst.protonum – protocol number
src.u3.ip – source IP
dst.u3.ip – destination IP
src.u.udp.port – source port
dst.u.udp.port – destination port
In the kernel version discussed here, conntrack supports six protocols: TCP, UDP, ICMP, DCCP, SCTP, and GRE. ICMP has no ports, so its tuple is built from the ICMP type, code, and identifier fields instead.
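For ICMP, the reply tuple cannot be obtained by swapping ports. A rough user-space sketch of one way to derive it is shown below: swap the addresses, keep the identifier and code, and map a query type to its reply type. The names `struct icmp_tuple`, `icmp_invert_type()`, and `icmp_invert_tuple()` are hypothetical, and only the echo and timestamp pairs are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct icmp_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t id;          /* ICMP identifier, plays the role of a port */
    uint8_t  type, code;
};

/* Map a query type to its reply type and back.
 * Returns false for types that have no request/reply counterpart here. */
bool icmp_invert_type(uint8_t type, uint8_t *inv)
{
    assert(inv != NULL);
    switch (type) {
    case 8:  *inv = 0;  return true;   /* echo request   -> echo reply   */
    case 0:  *inv = 8;  return true;   /* echo reply     -> echo request */
    case 13: *inv = 14; return true;   /* timestamp      -> timestamp reply */
    case 14: *inv = 13; return true;   /* timestamp reply -> timestamp   */
    default: return false;
    }
}

/* Build the tuple expected on the return path. */
bool icmp_invert_tuple(const struct icmp_tuple *orig, struct icmp_tuple *reply)
{
    uint8_t t;
    if (!icmp_invert_type(orig->type, &t))
        return false;
    reply->src_ip = orig->dst_ip;
    reply->dst_ip = orig->src_ip;
    reply->id     = orig->id;
    reply->type   = t;
    reply->code   = orig->code;
    return true;
}
```

An echo request (type 8) therefore matches an echo reply (type 0) carrying the same identifier. Error messages such as destination unreachable (type 3) have no inverse in this sketch and would not start a trackable flow of their own.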
3.3 L4 Protocol Method Set
Each trackable protocol implements struct nf_conntrack_l4proto methods such as pkt_to_tuple(), packet(), new(), and error().
struct nf_conntrack_l4proto {
    bool (*pkt_to_tuple)(struct sk_buff *skb, ...);
    int (*packet)(struct nf_conn *ct, const struct sk_buff *skb, ...);
    bool (*new)(struct nf_conn *ct, const struct sk_buff *skb, unsigned int dataoff);
    int (*error)(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb, ...);
    ...
};
3.4 Conntrack Hash Entry
Each flow has two hash entries (original and reply direction). The hash key is derived from the tuple via hash_conntrack_raw().
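The two-entries-per-flow layout can be illustrated with a small user-space sketch. It assumes a layout loosely modeled on the kernel's, where each connection embeds one hash node per direction and the owning connection is recovered from the node by pointer arithmetic. The names `struct conn`, `conn_insert()`, `conn_find()`, and `hash_to_conn()` are hypothetical, and the bucket function is a toy mix, not jhash2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dir { DIR_ORIGINAL = 0, DIR_REPLY = 1, DIR_MAX = 2 };

struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
    uint8_t  dir;                       /* which direction this tuple describes */
};

struct tuple_hash { struct tuple_hash *next; struct tuple tuple; };

struct conn { struct tuple_hash tuplehash[DIR_MAX]; unsigned long status; };

#define N_BUCKETS 16
struct table { struct tuple_hash *bucket[N_BUCKETS]; };

/* Toy bucket function; the direction field is deliberately not hashed. */
static uint32_t tuple_bucket(const struct tuple *t)
{
    uint32_t h = t->src_ip * 31u + t->dst_ip;
    h = h * 31u + t->src_port;
    h = h * 31u + t->dst_port;
    h = h * 31u + t->proto;
    return h % N_BUCKETS;
}

/* Recover the owning connection from either of its two hash nodes. */
struct conn *hash_to_conn(struct tuple_hash *h)
{
    assert(h != NULL);
    return (struct conn *)((char *)h - offsetof(struct conn, tuplehash)
                           - h->tuple.dir * sizeof(struct tuple_hash));
}

/* Link both directions of a connection into the table. */
void conn_insert(struct table *tb, struct conn *c)
{
    for (int d = 0; d < DIR_MAX; d++) {
        struct tuple_hash *h = &c->tuplehash[d];
        uint32_t b = tuple_bucket(&h->tuple);
        h->tuple.dir = (uint8_t)d;
        h->next = tb->bucket[b];
        tb->bucket[b] = h;
    }
}

/* Look up a connection by the tuple of a packet in either direction. */
struct conn *conn_find(struct table *tb, const struct tuple *t)
{
    for (struct tuple_hash *h = tb->bucket[tuple_bucket(t)]; h; h = h->next) {
        const struct tuple *k = &h->tuple;
        if (k->src_ip == t->src_ip && k->dst_ip == t->dst_ip &&
            k->src_port == t->src_port && k->dst_port == t->dst_port &&
            k->proto == t->proto)
            return hash_to_conn(h);
    }
    return NULL;
}
```

Because both tuples are linked into the table, a packet in either direction finds the same connection with a single lookup and no need to try the inverted tuple.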
static u32 hash_conntrack_raw(struct nf_conntrack_tuple *tuple, struct net *net)
{
    unsigned int n;
    u32 seed;

    /* The random seed is initialized once, on first use. */
    get_random_once(&nf_conntrack_hash_rnd, sizeof(nf_conntrack_hash_rnd));
    seed = nf_conntrack_hash_rnd ^ net_hash_mix(net);
    /* Hash tuple->src and the destination address; port and protocol go into the seed. */
    n = (sizeof(tuple->src) + sizeof(tuple->dst.u3)) / sizeof(u32);
    return jhash2((u32 *)tuple, n, seed ^ ((tuple->dst.u.all << 16) | tuple->dst.protonum));
}
3.5 Connection Structure
struct nf_conn holds the flow’s status bits (e.g., IPS_CONFIRMED, IPS_SRC_NAT), timeout, reference count, and pointers to protocol‑specific data.
enum ip_conntrack_status {
    IPS_EXPECTED   = (1 << IPS_EXPECTED_BIT),
    IPS_SEEN_REPLY = (1 << IPS_SEEN_REPLY_BIT),
    IPS_CONFIRMED  = (1 << IPS_CONFIRMED_BIT),
    IPS_SRC_NAT    = (1 << IPS_SRC_NAT_BIT),
    IPS_DST_NAT    = (1 << IPS_DST_NAT_BIT),
    ...
};
3.6 nf_conntrack_in()
The function processes each packet:
Retrieve any existing conntrack record.
Determine whether the packet needs tracking; if not, increment the ignore counter and return NF_ACCEPT.
Extract L4 header information and obtain the protocol‑specific l4proto methods.
Run error() to validate the packet.
Call resolve_normal_ct() which creates a new entry or updates an existing one.
Invoke the protocol’s packet() method for further handling (e.g., timeout updates for UDP).
unsigned int nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum, struct sk_buff *skb)
{
    const struct nf_conntrack_l4proto *l4proto;
    enum ip_conntrack_info ctinfo;
    struct nf_conn *ct, *tmpl;
    int ret;

    /* Abridged: extraction of dataoff and the protocol number is elided. */
    tmpl = nf_ct_get(skb, &ctinfo);
    if (tmpl || ctinfo == IP_CT_UNTRACKED) {
        /* Already seen, or explicitly untracked: ignore. */
        if ((tmpl && !nf_ct_is_template(tmpl)) || ctinfo == IP_CT_UNTRACKED) {
            NF_CT_STAT_INC_ATOMIC(net, ignore);
            return NF_ACCEPT;
        }
        skb->_nfct = 0;
    }
    l4proto = __nf_ct_l4proto_find(...);
    /* Let the protocol reject malformed packets early. */
    if (l4proto->error) {
        ret = l4proto->error(net, tmpl, skb, dataoff, pf, hooknum);
        if (ret <= 0) {
            NF_CT_STAT_INC_ATOMIC(net, error);
            NF_CT_STAT_INC_ATOMIC(net, invalid);
            ret = -ret;
            goto out;
        }
    }
    /* Find the matching entry or create a new one, then fetch it from the skb. */
    resolve_normal_ct(net, tmpl, skb, ...);
    ct = nf_ct_get(skb, &ctinfo);
    ret = l4proto->packet(ct, skb, dataoff, ctinfo);
    if (ctinfo == IP_CT_ESTABLISHED_REPLY && !test_and_set_bit(IPS_SEEN_REPLY_BIT, &ct->status))
        nf_conntrack_event_cache(IPCT_REPLY, ct);
out:
    if (tmpl)
        nf_ct_put(tmpl);
    return ret;
}
3.7 init_conntrack()
When a new flow is detected, init_conntrack() allocates a struct nf_conn, calls the protocol’s new() method, handles expectations, and inserts the entry into the unconfirmed list.
static struct nf_conntrack_tuple_hash *
init_conntrack(struct net *net, struct nf_conn *tmpl,
               const struct nf_conntrack_tuple *tuple,
               const struct nf_conntrack_l4proto *l4proto,
               struct sk_buff *skb, unsigned int dataoff, u32 hash)
{
    /* Abridged: the zone lookup, construction of repl_tuple (the inverted
     * tuple), error paths, and expectation handling are elided. */
    struct nf_conn *ct = __nf_conntrack_alloc(net, zone, tuple, &repl_tuple, GFP_ATOMIC, hash);

    l4proto->new(ct, skb, dataoff);        /* protocol-specific initialization */

    local_bh_disable();
    nf_conntrack_get(&ct->ct_general);     /* take a reference for the list */
    nf_ct_add_to_unconfirmed_list(ct);
    local_bh_enable();

    return &ct->tuplehash[IP_CT_DIR_ORIGINAL];
}
3.8 nf_conntrack_confirm()
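After the packet passes the final hook (POST_ROUTING for outbound or LOCAL_IN for inbound), nf_conntrack_confirm() removes the entry from the unconfirmed list, inserts it into the conntrack hash table, sets IPS_CONFIRMED, updates the timeout, and triggers events. Deferring insertion until this point means that a packet dropped along the way never leaves an entry in the table.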
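The two-phase lifecycle can be sketched in user space as below. This is only an illustration under simplifying assumptions: `struct table`, `conn_init()`, `conn_confirm()`, and `process_first_packet()` are hypothetical names, the table is a plain array, and the filtering that happens between the first and last hook is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STATUS_CONFIRMED (1u << 3)   /* same bit position as IPS_CONFIRMED_BIT */

struct conn { unsigned int status; unsigned int id; };

#define TABLE_SIZE 8
struct table { struct conn *slot[TABLE_SIZE]; size_t n; };

/* First hook: the new connection exists only alongside the packet. */
void conn_init(struct conn *c, unsigned int id)
{
    assert(c != NULL);
    c->status = 0;
    c->id = id;
}

/* Last hook: publish the connection if it is not in the table yet.
 * Returns false if the table is full, in which case the packet would be dropped. */
bool conn_confirm(struct table *t, struct conn *c)
{
    if (c->status & STATUS_CONFIRMED)
        return true;                 /* later packets of the flow: nothing to do */
    if (t->n == TABLE_SIZE)
        return false;
    t->slot[t->n++] = c;
    c->status |= STATUS_CONFIRMED;
    return true;
}

/* Walk the first packet of a flow through the path. filter_accepts stands in
 * for the verdict of whatever rules sit between the first and last hook. */
bool process_first_packet(struct table *t, struct conn *c, unsigned int id,
                          bool filter_accepts)
{
    conn_init(c, id);
    if (!filter_accepts)
        return false;                /* dropped mid-path: never confirmed */
    return conn_confirm(t, c);
}
```

In this sketch a flow whose first packet is filtered never occupies a table slot, and confirming an already confirmed connection is a no-op.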
static inline int nf_conntrack_confirm(struct sk_buff *skb)
{
    struct nf_conn *ct = (struct nf_conn *)skb_nfct(skb);
    int ret = NF_ACCEPT;

    if (ct) {
        /* Only the first packet of a flow needs confirming. */
        if (!nf_ct_is_confirmed(ct))
            ret = __nf_conntrack_confirm(skb);
        if (likely(ret == NF_ACCEPT))
            nf_ct_deliver_cached_events(ct);
    }
    return ret;
}
4 Netfilter NAT Implementation
4.1 Core Data Structures
Each NAT‑capable protocol implements struct nf_nat_l3proto and struct nf_nat_l4proto method sets (e.g., manip_pkt(), unique_tuple()).
struct nf_nat_l4proto {
    u8 l4proto;
    bool (*manip_pkt)(struct sk_buff *skb, ...);
    void (*unique_tuple)(...);
    int (*nlattr_to_range)(struct nlattr *tb[], struct nf_nat_range2 *range);
};
4.2 nf_nat_inet_fn()
This function is invoked at PRE_ROUTING, POST_ROUTING, LOCAL_OUT, and LOCAL_IN (all except FORWARD). It looks up the conntrack entry; if none exists, NAT cannot be applied. When a conntrack entry is present, it retrieves the applicable NAT rules and finally calls nf_nat_packet() to perform address/port rewriting.
unsigned int nf_nat_inet_fn(void *priv, struct sk_buff *skb, const struct nf_hook_state *state)
{
    enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
    enum ip_conntrack_info ctinfo;
    struct nf_conn_nat *nat;
    struct nf_conn *ct;

    ct = nf_ct_get(skb, &ctinfo);
    if (!ct)
        return NF_ACCEPT;            /* untracked packet: nothing to translate */
    nat = nfct_nat(ct);

    switch (ctinfo) {
    case IP_CT_NEW:
    case IP_CT_RELATED:
    case IP_CT_RELATED_REPLY:
        if (!nf_nat_initialized(ct, maniptype)) {
            /* Abridged: evaluate the NAT rules registered for this hook to
             * set up the mapping for this connection. */
        }
        break;
    default: /* ESTABLISHED */
        if (nf_nat_oif_changed(state->hook, ctinfo, nat, state->out))
            return NF_DROP;
        break;
    }
    /* Rewrite the packet according to the connection's mapping. */
    return nf_nat_packet(ct, ctinfo, state->hook, skb);
}
4.3 nf_nat_packet()
Based on the direction and manipulation type (source or destination), this function selects the appropriate L3/L4 protocol handlers and invokes manip_pkt(). If the manipulation fails, the packet is dropped.
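The interaction between direction and manipulation type is easy to misread, so here is a small user-space sketch of the decision alone. The names `needs_manip()`, `ST_SRC_NAT`, and `ST_DST_NAT` are hypothetical. The bit positions follow IPS_SRC_NAT_BIT and IPS_DST_NAT_BIT, and the actual packet rewriting is left out.

```c
#include <assert.h>
#include <stdbool.h>

enum manip { MANIP_SRC, MANIP_DST };
enum dir   { DIR_ORIGINAL, DIR_REPLY };

#define ST_SRC_NAT  (1u << 4)
#define ST_DST_NAT  (1u << 5)
#define ST_NAT_MASK (ST_SRC_NAT | ST_DST_NAT)

/* Decide whether a packet travelling in direction d must be rewritten at a
 * hook whose manipulation type is m, given the connection's status bits. */
bool needs_manip(unsigned int status, enum manip m, enum dir d)
{
    assert(m == MANIP_SRC || m == MANIP_DST);
    unsigned int bit = (m == MANIP_SRC) ? ST_SRC_NAT : ST_DST_NAT;

    /* For replies the roles swap: undoing a source NAT means rewriting
     * the destination, and vice versa. */
    if (d == DIR_REPLY)
        bit ^= ST_NAT_MASK;

    return (status & bit) != 0;
}
```

For a connection with only the source-NAT bit set, original-direction packets are rewritten at a source-manipulation hook, while reply packets are rewritten at a destination-manipulation hook. The XOR with the mask is what flips the tested bit.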
unsigned int nf_nat_packet(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
                           unsigned int hooknum, struct sk_buff *skb)
{
    enum nf_nat_manip_type mtype = HOOK2MANIP(hooknum);
    enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
    unsigned int statusbit = (mtype == NF_NAT_MANIP_SRC) ? IPS_SRC_NAT : IPS_DST_NAT;

    /* Invert the bit for the reply direction. */
    if (dir == IP_CT_DIR_REPLY)
        statusbit ^= IPS_NAT_MASK;

    /* Rewrite only if this connection has the matching NAT type set up. */
    if (ct->status & statusbit)
        return nf_nat_manip_pkt(skb, ct, mtype, dir);
    return NF_ACCEPT;
}
5 Summary
Connection tracking is a core Linux kernel subsystem that records the state of network flows, enabling NAT, load balancing, and firewalling. Its design relies on Netfilter hook points, a hash‑based conntrack table, and per‑protocol method sets. While essential, conntrack can become a bottleneck in high‑concurrency L4 load‑balancing scenarios if the table is undersized or garbage collection is insufficient.
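One way to watch for an undersized table is to compare the current entry count with the configured maximum. The sketch below assumes a Linux host with the conntrack module loaded, where these values are exposed under /proc/sys/net/netfilter/. The helper names `read_ulong()`, `ct_usage_percent()`, and `ct_current_usage()` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single unsigned integer from a file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_ulong(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Percentage of the conntrack table currently in use. */
double ct_usage_percent(unsigned long count, unsigned long max)
{
    assert(max > 0);
    return 100.0 * (double)count / (double)max;
}

/* Returns the usage percentage, or a negative value if the conntrack
 * sysctls are not available on this host. */
double ct_current_usage(void)
{
    unsigned long count, max;

    if (read_ulong("/proc/sys/net/netfilter/nf_conntrack_count", &count) != 0 ||
        read_ulong("/proc/sys/net/netfilter/nf_conntrack_max", &max) != 0 ||
        max == 0)
        return -1.0;
    return ct_usage_percent(count, max);
}
```

A monitoring loop could call `ct_current_usage()` periodically and alert when the value approaches 100, since new flows cannot be tracked once the table is full.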
Liangxu Linux