
How Kernel-Level Content-Based Load Balancing Boosts Server Performance

This article explains the principles and implementation of content‑based request distribution in Linux IPVS and the kernel‑mode KTCPVS, covering TCP gateway vs. migration, scheduling algorithms, high‑availability mechanisms, and performance benefits such as improved cache hit rates and scalability.

Open Source Linux

Jul 15, 2021


Preface

Previously we described the IP Virtual Server (IPVS), which implements three IP load‑balancing techniques within the Linux Virtual Server framework and can be used to build highly scalable, highly available server clusters. IPVS makes the cluster transparent to clients: they access the service as if it were a single high‑performance server, and client programs need no modification. Scalability comes from transparently adding or removing nodes, and high availability from detecting node or service failures and reconfiguring the system accordingly.

IPVS is essentially a Layer‑4 switch that performs load balancing. When the initial TCP SYN of a connection arrives, IPVS selects a server and forwards the packet to it; subsequent packets of that connection are sent to the same server based on their IP and TCP header fields. Because IPVS never inspects request content, every backend server must provide the same service. Many deployments, however, have heterogeneous backends (web, image, and CGI servers, for example), so requests must be distributed according to their content. Content‑based distribution can also improve cache locality on the backends.
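The header-only behaviour described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the IPVS implementation: the class name `Layer4Switch`, the round-robin choice on SYN, and the addresses are all hypothetical.

```python
from typing import Dict, List, Tuple

# (client_ip, client_port, virtual_ip, virtual_port) identifies a TCP connection.
ConnKey = Tuple[str, int, str, int]


class Layer4Switch:
    """Toy model of header-only scheduling; not the IPVS implementation."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.conn_table: Dict[ConnKey, str] = {}
        self._next = 0  # round-robin cursor (scheduler assumed for this sketch)

    def forward(self, key: ConnKey, syn: bool) -> str:
        """Return the backend for a packet, choosing one only on the initial SYN."""
        if syn and key not in self.conn_table:
            self.conn_table[key] = self.servers[self._next % len(self.servers)]
            self._next += 1
        # Later packets are matched purely on header fields; the payload is
        # never read, so the request content cannot influence the choice.
        return self.conn_table[key]
```

Because the backend is fixed when the SYN arrives, before any request bytes exist, a switch of this kind has no way to send image requests to one server and CGI requests to another.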

1. Content‑Based Request Distribution

Scheduling TCP connections based on Layer‑7 information is difficult because application data is only available after the three‑way handshake. Therefore, the switch must first accept the client TCP connection, then extract request content to decide the appropriate backend server.

1.1 Methods for Content Scheduling

Two approaches exist: TCP Gateway, where the switch establishes a TCP connection to the backend and proxies traffic, and TCP Migration, where the client‑to‑switch connection is migrated to the backend, allowing a direct client‑to‑server TCP link. TCP Migration requires modifying the TCP/IP stacks of both the switch and all backends, making it costly and non‑portable, whereas TCP Gateway is widely used in commercial and open‑source Layer‑7 switches.

A TCP Gateway implemented in user space incurs high overhead: each request involves four transitions across the kernel/user‑space boundary (receive request, forward request, receive response, forward response). This limits scalability to a few backend servers, and the switch becomes the bottleneck at high connection rates.
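The decision step of a TCP Gateway can be sketched as follows: once the client connection has been accepted and the first bytes have been read, the switch parses the request line and picks a backend. The rule table, the addresses, and the function names are hypothetical and only illustrate the idea for HTTP/1.x.

```python
from typing import List, Optional, Tuple

# Hypothetical routing rules: (path prefix, backend address). First match wins.
RULES: List[Tuple[str, str]] = [
    ("/images/", "10.0.0.11:80"),
    ("/cgi-bin/", "10.0.0.12:80"),
]
DEFAULT_BACKEND = "10.0.0.10:80"


def extract_path(request_head: bytes) -> Optional[str]:
    """Return the request target from an HTTP/1.x request line, or None."""
    line, sep, _ = request_head.partition(b"\r\n")
    if not sep:
        return None  # request line incomplete; the caller must read more data
    parts = line.split()
    if len(parts) != 3:
        return None  # not "METHOD target VERSION"
    return parts[1].decode("ascii", errors="replace")


def choose_backend(request_head: bytes) -> Optional[str]:
    """Pick a backend from the request path; possible only after data arrives."""
    path = extract_path(request_head)
    if path is None:
        return None
    for prefix, backend in RULES:
        if path.startswith(prefix):
            return backend
    return DEFAULT_BACKEND
```

A real gateway would then open a connection to the chosen backend and relay bytes in both directions, which is where the copying and context-switch cost arises.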

1.2 Example

Content‑based request distribution offers three benefits:

Requests for the same page go to the same server, so they are likely to be served from that server's memory cache.

Web traffic exhibits locality of access; directing similar requests to the same server exploits it, raising cache hit rates and overall system performance.

Backends can run different services (document, image, CGI, database).

In a two‑server cluster handling the request sequence AACBAACABC, content‑based scheduling sends all A requests to server 1 and B/C requests to server 2, greatly increasing the chance that needed objects are already cached. Round‑robin distribution would spread A, B, C across both servers, raising cache miss probability.
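The effect on cache hits can be checked with a small simulation of the request sequence above. It is a sketch under assumptions that the article does not state: each server has an LRU cache, the cache capacity is a parameter (one object by default), and round-robin alternates strictly between the two servers.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, None]" = OrderedDict()

    def access(self, key: str) -> bool:
        """Return True on a hit; on a miss, load the key and evict the LRU entry."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        self.items[key] = None
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)
        return False


def count_hits(requests: str, pick: Callable[[int, str], int], capacity: int = 1) -> int:
    """Replay the requests on two servers and count cache hits."""
    caches = [LRUCache(capacity), LRUCache(capacity)]
    return sum(caches[pick(i, r)].access(r) for i, r in enumerate(requests))


def round_robin(i: int, target: str) -> int:
    return i % 2  # alternate servers regardless of content


def content_based(i: int, target: str) -> int:
    return 0 if target == "A" else 1  # A -> server 1, B and C -> server 2


SEQUENCE = "AACBAACABC"
```

With a one-object cache per server, `count_hits(SEQUENCE, content_based)` gives 4 hits out of 10 requests and `count_hits(SEQUENCE, round_robin)` gives 1. With `capacity=2` the figures are 7 and 4.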

While content‑based distribution improves cache locality, it may cause load imbalance, so algorithms must also consider load‑balancing.

2. Content‑Based Distribution in the Kernel – KTCPVS


To avoid the overhead of user‑space TCP Gateways, we implement a Layer‑7 switch inside the Linux kernel, called KTCPVS (Kernel TCP Virtual Server).

2.1 KTCPVS Architecture

KTCPVS consists of two parts: the KTCPVS switch, which routes requests to different backends based on content, and the backend servers, which can run various network services. The switch and backends are connected via LAN/WAN.

The KTCPVS switch is transparent to clients; they see a single high‑performance, highly available virtual server.

2.2 Implementation Details

In Linux 2.4, a kernel thread implements the Layer‑7 service and is packaged as a loadable KTCPVS module. The module registers control interfaces in /proc and via setsockopt. The user‑space tool tcpvsadm configures server rules. Content‑based modules (e.g., HTTP, RTSP) can be loaded as needed.

The main thread spawns worker threads that listen on a port, receive client requests, invoke the appropriate content‑based module to select a backend, establish a TCP connection to that backend, forward the request, receive the response, and return it to the client. All of this happens in kernel space, which avoids switching between user space and kernel space.

2.3 High Availability

KTCPVS achieves high availability through two mechanisms: server‑failure handling (periodic ARP checks or service‑process health probes) and scheduler‑failure handling (heartbeat or VRRP, similar to IPVS).

3. KTCPVS Scheduling Algorithms

KTCPVS schedules at the granularity of TCP connections. Different connections from the same client may be routed to different servers, helping balance load.

3.1 Weighted Least‑Connection Scheduling

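Least‑Connection Scheduling assigns a new connection to the server with the fewest active connections. Weighted Least‑Connection Scheduling additionally takes each server's weight into account: a new connection goes to the server with the smallest ratio of active connections to weight, so that connection counts stay roughly proportional to the weights.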
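A minimal sketch of the weighted selection, assuming integer weights and treating a weight of 0 as "do not schedule" (that convention, and the names `Server` and `weighted_least_connection`, are assumptions of this example):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    weight: int      # relative capacity; 0 means "do not schedule" (assumed)
    conns: int = 0   # currently active connections


def weighted_least_connection(servers: List[Server]) -> Optional[Server]:
    """Return the server with the smallest conns/weight ratio, or None."""
    best: Optional[Server] = None
    for s in servers:
        if s.weight <= 0:
            continue
        # Compare s.conns/s.weight < best.conns/best.weight by cross
        # multiplication, which avoids division and floating point.
        if best is None or s.conns * best.weight < best.conns * s.weight:
            best = s
    return best
```

Starting from idle servers with weights 1 and 3, eight consecutive connections end up split 2 and 6, matching the 1:3 weight ratio. Ties go to the earlier server in the list.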

3.2 Locality‑Aware Least‑Connection Scheduling

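This algorithm assumes that any backend can handle any request, but it tries to keep requests for the same target on the same server to improve cache locality. Each node has a low and a high connection threshold, and a target is reassigned only when the imbalance is severe: either its node is above the high threshold while some node is below its low threshold, or its node has reached twice the high threshold.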

while (true) {
  get next request r;
  r.target = {extract path from static/dynamic request r};
  if (server[r.target] == NULL) {
    /* first request for this target: bind it to the least-loaded node */
    n = {least connection node};
    server[r.target] = n;
  } else {
    n = server[r.target];
    /* reassign the target only when the imbalance is severe */
    if ((n.conns > n.high && {some node m has m.conns < m.low}) ||
        n.conns >= 2*n.high) {
      n = {least connection node};
      server[r.target] = n;
    }
  }
  /* a dynamic request is counted as two connections */
  if (r is dynamic request) n.conns += 2; else n.conns += 1;
  send r to n and return results to the client;
  if (r is dynamic request) n.conns -= 2; else n.conns -= 1;
}
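The pseudocode above can be turned into a small runnable model. This is a sketch, not the KTCPVS code: the class and method names are hypothetical, and the single loop body is split into `begin` and `end` so that several connections can be open at once.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    low: int         # below this, the node counts as lightly loaded
    high: int        # above this, the node counts as heavily loaded
    conns: int = 0


class LocalityAwareScheduler:
    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.server: Dict[str, Node] = {}  # target path -> assigned node

    def _least_connection(self) -> Node:
        return min(self.nodes, key=lambda n: n.conns)

    def pick(self, target: str) -> Node:
        n = self.server.get(target)
        if n is None:
            n = self._least_connection()
        else:
            idle_exists = any(m.conns < m.low for m in self.nodes)
            # Move the target only when the imbalance is severe.
            if (n.conns > n.high and idle_exists) or n.conns >= 2 * n.high:
                n = self._least_connection()
        self.server[target] = n
        return n

    def begin(self, target: str, dynamic: bool = False) -> Node:
        """Schedule a request; dynamic requests count as two connections."""
        n = self.pick(target)
        n.conns += 2 if dynamic else 1
        return n

    def end(self, n: Node, dynamic: bool = False) -> None:
        """Release the load recorded by the matching begin() call."""
        n.conns -= 2 if dynamic else 1
```

With two nodes whose thresholds are `low=1` and `high=2`, five overlapping requests for the same target go to the first node three times; the fourth request finds that node above `high` while the other is below `low`, so the target moves to the second node and stays there.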

3.3 Content‑Based Scheduling

Requests are dispatched to servers capable of handling their type. If a server set is defined for the request's target, the least‑connection server in that set is chosen; otherwise the least‑connection server in the default server set is used. The algorithm also maintains a set of servers per target and periodically removes the most loaded server when the set has been stable for a configurable interval (default 60 s).

while (true) {
  get next request r;
  extract path from request and set r.target;
  if (definedServerSet[r.target] == ∅) {
    n = {least connection node in defaultServerSet};
  } else {
    n = {least connection node in definedServerSet[r.target]};
  }
  send r to n and return results to the client;
}
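The selection step in the pseudocode above can be sketched in Python as follows. The names are hypothetical, targets are matched by exact path (an assumption, since the article does not say how targets are matched), and connection counting is added so that "least connection" has something to compare.

```python
from typing import Dict, List


class ContentBasedScheduler:
    def __init__(self, default_set: List[str],
                 defined_sets: Dict[str, List[str]]) -> None:
        self.default_set = default_set
        self.defined_sets = defined_sets     # target -> servers able to handle it
        self.conns: Dict[str, int] = {}      # server -> active connections

    def pick(self, target: str) -> str:
        # Fall back to the default set when no (or an empty) set is defined.
        candidates = self.defined_sets.get(target) or self.default_set
        return min(candidates, key=lambda s: self.conns.get(s, 0))

    def begin(self, target: str) -> str:
        """Choose a server for the target and record one more connection."""
        server = self.pick(target)
        self.conns[server] = self.conns.get(server, 0) + 1
        return server

    def end(self, server: str) -> None:
        """Record that a connection to the server has finished."""
        self.conns[server] -= 1
```

For example, with a default set of `web1` and `web2` and a defined set of `cgi1` and `cgi2` for `/cgi-bin/search`, two overlapping search requests go to `cgi1` and then `cgi2`, while a request for `/index.html` falls through to `web1`.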

These algorithms balance load while preserving cache locality, providing a flexible kernel‑level solution for content‑aware load balancing.
