Investigation and Resolution of Octavia API Slow Response Issue
This article details the background, architecture, step‑by‑step troubleshooting, analysis of network and server queues, and the final configuration changes that resolved the intermittent slow response times of the Octavia load‑balancer API in an OpenStack environment.
Octavia provides a high‑availability load‑balancing solution for OpenStack clusters, exposing a REST API that creates VIPs and routes traffic through HAProxy and LVS. In practice, the API response time varied widely from 0.2 s to 50 s, causing VIP creation and query failures.
The service architecture uses keepalived VRRP for HA and multiple Octavia API nodes behind HAProxy. The deployment diagram shows the high‑availability setup and the flow of requests.
Problem Investigation
Packet capture was performed to pinpoint where latency occurred. Custom HTTP headers distinguished requests, and analysis revealed long‑lasting packets with retransmissions, indicating issues beyond the client‑HAProxy hop.
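The tagging step can be sketched in a few lines of Python. The header name X-Debug-Trace-Id and the VIP address are illustrative placeholders, not the values used in the original investigation.

```python
import urllib.request
import uuid

# Tag each API request with a unique header so a slow response can be
# matched to its exact TCP stream in the packet capture.
# "X-Debug-Trace-Id" and the VIP 203.0.113.10 are placeholder values.
trace_id = str(uuid.uuid4())
req = urllib.request.Request("http://203.0.113.10:9876/v2.0/lbaas/loadbalancers")
req.add_header("X-Debug-Trace-Id", trace_id)
# In the capture, filter on the trace id, e.g.:
#   tcpdump -A -i any port 9876 | grep <trace_id>
```

Because every request carries a distinct id, a single slow client-side call can be traced through HAProxy to the backend without ambiguity.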
Further analysis showed that the client‑HAProxy interaction was fast; the delay originated between HAProxy and the Octavia API, ruling out HAProxy as the bottleneck.
Hardware checks confirmed no packet loss or network jitter, and the three Octavia API servers showed low load and normal connection counts.
Moving to the API hosts themselves, packet inspection showed that the Octavia API sometimes failed to answer incoming SYNs with a SYN/ACK, prompting a check of the listening socket on port 9876.
netstat -s | grep -i listen    # both counters kept growing
    1173805 times the listen queue of a socket overflowed
    1175909 SYNs to LISTEN sockets dropped

The growing counters correspond to the half‑open (SYN) queue and the fully established accept queue. The accept queue size is min(backlog, net.core.somaxconn), and the behavior on overflow is controlled by /proc/sys/net/ipv4/tcp_abort_on_overflow: with 0 (the default) the kernel silently drops the client's final ACK, causing the client to retransmit, and with 1 it replies with an RST.
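Rather than eyeballing the output, the two counters can be extracted programmatically and diffed between samples to confirm they are still growing. A minimal sketch, matching the exact wording of the lines above:

```python
import re

def parse_listen_overflow(netstat_s_output):
    """Extract the accept-queue overflow and SYN-drop counters
    from `netstat -s` text; returns (overflowed, dropped)."""
    overflowed = re.search(
        r"(\d+) times the listen queue of a socket overflowed",
        netstat_s_output)
    dropped = re.search(
        r"(\d+) SYNs to LISTEN sockets dropped",
        netstat_s_output)
    return (int(overflowed.group(1)) if overflowed else None,
            int(dropped.group(1)) if dropped else None)

sample = """\
    1173805 times the listen queue of a socket overflowed
    1175909 SYNs to LISTEN sockets dropped
"""
print(parse_listen_overflow(sample))  # (1173805, 1175909)
```

Sampling twice a few seconds apart and comparing the tuples shows whether overflow is ongoing or historical.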
Setting tcp_abort_on_overflow to 1 made overflowing connections fail immediately with an RST instead of retransmitting, confirming the link between the observed retransmissions and accept‑queue overflow.
Solution
The accept queue overflow was caused by the small backlog (5) used when Octavia API creates its WSGI server via wsgiref.simple_server.make_server. The backlog comes from SocketServer.TCPServer.server_activate, which calls listen() with the class‑level default request_queue_size = 5.
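This default is easy to confirm from the standard library itself (the module is socketserver in Python 3, SocketServer in Python 2):

```python
import socketserver
import wsgiref.simple_server

# wsgiref's WSGIServer inherits request_queue_size from
# socketserver.TCPServer; server_activate() then calls
# self.socket.listen(self.request_queue_size), i.e. listen(5).
print(socketserver.TCPServer.request_queue_size)            # 5
print(wsgiref.simple_server.WSGIServer.request_queue_size)  # 5
```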
Increasing request_queue_size to 128 raised the backlog and eliminated the queue overflow. Response times, however, remained high because the Octavia API still ran as a single‑process wsgiref server.
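A minimal sketch of that interim fix, assuming a plain wsgiref deployment (the subclass name and demo app are illustrative): subclass WSGIServer so the listening socket is created with a backlog of 128, which the kernel still caps at net.core.somaxconn.

```python
from wsgiref.simple_server import WSGIServer, make_server

class LargeBacklogWSGIServer(WSGIServer):
    # Raise the accept-queue backlog from the inherited default of 5;
    # the kernel caps the effective value at net.core.somaxconn.
    request_queue_size = 128

def app(environ, start_response):
    # Trivial placeholder WSGI app for the demo.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

# Port 0 lets the OS pick a free port for this demo; the real API
# listens on 9876.
server = make_server("127.0.0.1", 0, app,
                     server_class=LargeBacklogWSGIServer)
```

The constructor binds and activates the socket, so listen(128) has already been issued by the time make_server returns.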
The fix involved replacing the wsgiref server with OpenStack’s oslo_service WSGI server, allowing multiple worker processes (defaulting to CPU count) and a default backlog of 128. After this change, API latency dropped to under 1 second.
Alternatively, deploying Octavia behind Apache httpd with mod_wsgi can improve throughput, as wsgiref is intended only for demonstration and not production use.
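For reference, a mod_wsgi deployment could look roughly like the fragment below; the script path, process count, and user are placeholders to adapt to the actual installation, not values from the original environment.

```apache
<VirtualHost *:9876>
    # processes/threads/user and the script path are placeholders.
    WSGIDaemonProcess octavia-api processes=4 threads=2 user=octavia
    WSGIProcessGroup octavia-api
    WSGIScriptAlias / /var/www/octavia/octavia-wsgi
</VirtualHost>
```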
References:
https://www.cnblogs.com/Alexkk/p/12101950.html
http://jm.taobao.org/2017/05/25/525-1/
Linux 3.10.0‑957.27.2.el7 kernel source
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.