
How Session‑Based Acceleration Cuts Cloud Connect Latency from 500 ms to 1 ms

This article examines the Cloud Connect Network gateway architecture, identifies performance bottlenecks in the V1 packet‑forwarding process, proposes a session‑based optimization that reduces query complexity from O(n) to O(1), and validates the improvement with latency tests showing a drop from 500 ms to about 1 ms.

360 Zhihui Cloud Developer

Project Background

Cloud Connect Network (CCN) provides a fast, high‑quality, stable network capability for building cross‑region VPC and on‑premise data‑center interconnections, enabling a global enterprise‑grade cloud network. By creating cloud connections, users can load VPC instances or on‑premise IP resources into a CCN instance for global connectivity.

Existing Problems

As more business moves to the cloud, VPC‑to‑VPC traffic keeps growing. The CCN gateway carries both cross‑region and intra‑region traffic, so it comes under increasing performance pressure.

CCN Architecture

The gateway consists of a control plane (exposes APIs for creating CCSI rules) and a forwarding plane built on DPDK. Packets are matched against CCSI rules, then encapsulated in VXLAN and sent to the appropriate node. Key concepts include CCSI instance, subnet route, and next‑hop IP.
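
To make the encapsulation step concrete, here is a minimal sketch of the 8-byte VXLAN header defined in RFC 7348, which carries the 24-bit VXLAN ID. The helper names `vxlan_hdr_encode` and `vxlan_hdr_vni` are hypothetical; the article does not show the gateway's actual DPDK encapsulation code, and the outer Ethernet/IP/UDP headers are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VXLAN_HDR_LEN        8
#define VXLAN_FLAG_VNI_VALID 0x08   /* the "I" flag from RFC 7348 */

/* Write an 8-byte VXLAN header for a 24-bit VNI into buf (hypothetical helper). */
static void vxlan_hdr_encode(uint8_t buf[VXLAN_HDR_LEN], uint32_t vni)
{
    assert(vni <= 0xFFFFFF);           /* the VNI field is 24 bits wide */
    memset(buf, 0, VXLAN_HDR_LEN);     /* reserved fields are sent as zero */
    buf[0] = VXLAN_FLAG_VNI_VALID;     /* byte 0: flags */
    buf[4] = (uint8_t)(vni >> 16);     /* bytes 4..6: VNI, network byte order */
    buf[5] = (uint8_t)(vni >> 8);
    buf[6] = (uint8_t)vni;
}

/* Read the VNI back out of an encoded header (hypothetical helper). */
static uint32_t vxlan_hdr_vni(const uint8_t buf[VXLAN_HDR_LEN])
{
    return ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 8) | buf[6];
}
```

In a forwarding plane this header would sit behind an outer UDP header addressed to the next‑hop node; only the VXLAN part is sketched here.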

Cloud network architecture diagram

The current deployment spans Beijing, Zhengzhou, and Shanghai with high‑availability, scalable gateways. When traffic reaches ~20% of a 100 Gbps NIC’s capacity, latency spikes and packet loss occur, affecting service stability.

Optimization Proposal

Introducing a session module creates a session acceleration table based on source‑destination IPs, reducing the number of queries per packet. After the first packet establishes a session, subsequent packets between the same IP pair hit the session table, lowering complexity from O(n) to O(1).
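
The article does not show V1's lookup code, so the following is only a sketch of where an O(n) per‑packet cost could come from, assuming rule matching scans a list of subnet routes. `struct subnet_route`, `prefix_mask`, and `route_scan` are hypothetical names, and addresses are IPv4 in host byte order for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subnet route: network address, prefix length, VXLAN ID. */
struct subnet_route {
    uint32_t net;
    uint8_t  plen;
    uint32_t vxlan_id;
};

/* Netmask for a prefix length in the range 0..32. */
static uint32_t prefix_mask(uint8_t plen)
{
    assert(plen <= 32);
    return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Linear scan: the work per packet grows with the number of routes.
 * Returns the first route whose subnet contains ip, or NULL. */
static const struct subnet_route *
route_scan(const struct subnet_route *routes, size_t n, uint32_t ip)
{
    for (size_t i = 0; i < n; i++) {
        if ((ip & prefix_mask(routes[i].plen)) == routes[i].net)
            return &routes[i];
    }
    return NULL;
}
```

A session table avoids repeating this scan for every packet: the result of the first match is stored under the exact source‑destination pair, so later packets need a single keyed lookup.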

Optimized CCN architecture (V2)

V2 Forwarding Process

a) Check if a session exists for the source‑destination IP.

b) If found, use the session’s stored information to encapsulate and forward the VXLAN packet.

c) If not found, follow the V1 flow, then create a new session.

For every packet that hits the session table, this extra lookup replaces V1's six‑step query chain, which dramatically improves forwarding efficiency.

V2 forwarding flow diagram

Session Creation

struct dp_vs_ccn_conn {
    struct list_head        list;
    union inet_addr         srcaddr;  /* Src address */
    union inet_addr         dstaddr;  /* Dst address */
    uint8_t                 src_plen;
    uint8_t                 dst_plen;
    union inet_addr         src_net;
    union inet_addr         dst_net;
    union inet_addr         vtep_ip;
    uint32_t                src_vxlan_id;
    uint32_t                dst_vxlan_id;
    uint64_t                timestamp;
    char                    ccsiid[CCN_ID_LEN];
    struct dp_vs_vport_entry *ccn_vport;
    struct rte_mempool      *connpool;
} __rte_cache_aligned;

When the first packet passes the CCN hook, the six‑step rule matching records all relevant information (source/destination IP, VXLAN ID, etc.) into a dp_vs_ccn_conn structure. This structure is then hashed by source and destination IP and stored in a per‑CPU session hash table.
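
A minimal sketch of hashing by source and destination IP into a per‑CPU table follows. It assumes IPv4 addresses, chained buckets, and `malloc` in place of the `rte_mempool` that the real structure references; `struct conn`, `conn_hash`, `conn_insert`, `conn_find`, and the table sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CPUS     4     /* hypothetical number of forwarding cores */
#define CONN_BUCKETS 256   /* hypothetical bucket count, power of two */

/* Trimmed-down stand-in for dp_vs_ccn_conn. */
struct conn {
    struct conn *next;
    uint32_t     srcaddr, dstaddr;
    uint32_t     dst_vxlan_id;
};

/* One table per CPU: a core only touches its own buckets, so no locks are needed. */
static struct conn *conn_tbl[MAX_CPUS][CONN_BUCKETS];

/* Mix both addresses so that (src, dst) and (dst, src) usually land in different buckets. */
static unsigned conn_hash(uint32_t src, uint32_t dst)
{
    uint32_t h = src * 2654435761u;
    h ^= dst + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h & (CONN_BUCKETS - 1);
}

/* Allocate a session and push it onto the head of its bucket on the given CPU. */
static struct conn *conn_insert(unsigned cpu, uint32_t src, uint32_t dst, uint32_t vni)
{
    assert(cpu < MAX_CPUS);
    struct conn *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    c->srcaddr = src;
    c->dstaddr = dst;
    c->dst_vxlan_id = vni;
    struct conn **head = &conn_tbl[cpu][conn_hash(src, dst)];
    c->next = *head;
    *head = c;
    return c;
}

/* Walk one bucket on one CPU; returns NULL when no session matches. */
static struct conn *conn_find(unsigned cpu, uint32_t src, uint32_t dst)
{
    assert(cpu < MAX_CPUS);
    for (struct conn *c = conn_tbl[cpu][conn_hash(src, dst)]; c != NULL; c = c->next) {
        if (c->srcaddr == src && c->dstaddr == dst)
            return c;
    }
    return NULL;
}
```

Because each CPU owns its table, a session created on one core is invisible to the others, so the same IP pair arriving on a different core would build its own session.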

Session Cleanup

Sessions must be removed to avoid memory leaks. Cleanup occurs either when policies change (e.g., subnets are no longer connected or a CCSI instance is deleted) or when a session remains idle for more than 180 seconds, detected by a timer that scans the session table.
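
The idle‑timeout half of cleanup can be sketched as a periodic scan. The 180‑second limit comes from the text above; everything else is an assumption for illustration: `struct sess_entry`, `sess_expire`, a flat array instead of the hash table, and a `timestamp` that holds the last hit time in seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SESS_IDLE_TIMEOUT_S 180   /* idle limit stated in the article */

/* Hypothetical session slot: only the fields the sweep needs. */
struct sess_entry {
    int      in_use;
    uint64_t timestamp;   /* assumed: last time the session was hit, in seconds */
};

/* Scan the table and release entries idle for more than the timeout.
 * Returns the number of sessions removed. */
static size_t sess_expire(struct sess_entry *tbl, size_t n, uint64_t now)
{
    size_t removed = 0;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].in_use)
            continue;
        /* Guard against a timestamp that is ahead of "now". */
        if (now >= tbl[i].timestamp &&
            now - tbl[i].timestamp > SESS_IDLE_TIMEOUT_S) {
            tbl[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}
```

A timer callback would call `sess_expire` on each CPU's own table; a policy change would instead remove the affected sessions directly rather than wait for the timeout.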

Performance Test Results

Test Environment

Three identical physical machines serve as client, server, and gateway.

Test environment diagram

Latency Comparison (25 Gbps NIC Full Load)

Before optimization (V1): DPDK port latency rose to ~500 ms with significant packet loss.

V1 latency chart

After optimization (V2): DPDK port latency stabilized around 1 ms with no packet loss.

V2 latency chart

Latency Under Different Session Loads

Latency vs. session entries

Future Optimizations

The high‑performance CCN version has been running stably across three regions for over two months. Future work includes making session handling stateful so that inbound and outbound packets are processed on the same CPU, reducing duplicate session creation, and adding redirect mechanisms similar to load‑balancer stateful sessions. Monitoring tools are being enhanced to visualize per‑CPU session load for easier troubleshooting.
