
Analyzing and Resolving OceanBase Connection Timeout via OBProxy Logs and Packet Capture

This article walks through a production‑level OceanBase connection‑timeout incident, detailing how to examine OBProxy logs, capture and analyze network packets with tcpdump and Wireshark, identify a blocked random port, and apply kernel‑parameter fixes to prevent the issue.

Aikesheng Open Source Community

Compared with single‑node databases like MySQL, OceanBase has a longer access chain (Application → VIP → 3‑node OBProxy → 3‑node OBServer). When connection timeouts occur, troubleshooting requires additional steps. The following case study demonstrates how to capture packets and analyze an OceanBase application connection timeout.

1 Problem Description

The production environment uses a VIP managed by keepalived, OBServer 4.2.1.1 and OBProxy 4.2.2.0. The application intermittently reports a MySQL‑style timeout error:

(pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'xx.xx.xx.9' (timed out)")

Logs show the error occurring 1‑2 times per day.

2 Analysis Process

2.1 OBProxy Log Analysis

Idea 1: The VIP is bound to OBProxy node xx.xx.xx.12, so all traffic to xx.xx.xx.9:3306 actually goes through xx.xx.xx.12. Therefore, inspect obproxy.log on that node.

Idea 2: Determine whether the problem lies in the front‑end (application → OBProxy) or the back‑end (OBProxy → OBServer) connection.

Check if the front‑end connection handling is abnormal.

If the front‑end is normal, examine the back‑end connection establishment.

Using the available log slice (03‑11 08:14), the following searches were performed:

Filter for VC_EVENT_EOS to detect abnormal front‑end disconnects – none found.

grep VC_EVENT_EOS obproxy.log.* | egrep -i 'tenant_name=[a-z].*'

Extract client IP:PORT for requests from xx.xx.xx.12 to xx.xx.xx.9:3306 at the target time.

grep 'xx.xx.xx.9:3306' obproxy.log.20240311084410 | egrep '2024-03-11 08:14:[0-2].*' | egrep 'caddr={xx.xx.xx.12' | awk -F'caddr={' '{print $2}' | awk -F'}' '{print $1}'

Filter for succ to set proxy_sessid events, confirming successful back‑end connections.

grep 'succ to set proxy_sessid' obproxy.log.20240311084410 | egrep '2024-03-11 08:14:[0-2].*' | awk -F'client_addr=' '{print $2}' | awk -F'"' '{print $2}' | grep 'xx.xx.xx.12'
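As a quick sanity check, the caddr extraction used above can be exercised against a synthetic log line (the line below is illustrative, not taken from the real obproxy.log):

```shell
# Illustrative log line (synthetic); the caddr={ip:port} field mimics the real format
line='[2024-03-11 08:14:05.123456] INFO ... caddr={xx.xx.xx.12:4232}, tenant_name=app'
# Same pipeline as above: keep the text between "caddr={" and the closing "}"
echo "$line" | awk -F'caddr={' '{print $2}' | awk -F'}' '{print $1}'
# → xx.xx.xx.12:4232
```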

The logs indicate that OBProxy handled connections correctly, suggesting the timeout is likely network‑related. The next step is packet capture.

2.2 Packet Capture

Because the issue is intermittent, a long‑running capture is needed with file‑size limits and filtering:

tcpdump -X -s 0 -C 50 -W 500 -Z root -i lo -w /tmp/cap/obproxy.cap host xx.xx.xx.9

Key parameters:

Capture on the loopback interface ( -i lo ) since the app and OBProxy share the same host.

Write to /tmp/cap/obproxy.cap .

Rotate files at 50 MB ( -C 50 ) and keep up to 500 files ( -W 500 ).

Filter only traffic to the VIP ( host xx.xx.xx.9 ).
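Before leaving such a capture running for days, it is worth sizing the disk budget. With the rotation flags above, the worst case is:

```shell
# -C 50 rotates at ~50 MB per file and -W 500 keeps at most 500 files,
# so the capture directory can grow to roughly:
echo "$((50 * 500)) MB"   # → 25000 MB, i.e. ~25 GB
```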

Analysis of the first capture (03‑09 15:52:57) revealed repeated SYN retransmissions from client port 4232, while the server's SYN+ACK replies never reached the client; the attempt gave up after about 10 seconds, matching the Python connector's default connect timeout.

Client‑Side View

15:52:47 – SYN to xx.xx.xx.9:3306 (packet 8359).

15:52:48 – Retransmitted SYN (packet 8517) marked as TCP Retransmission.

15:52:50 – Another retransmission (packet 9075).

15:52:54 – Another retransmission (packet 10140).
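The client-side timestamps are consistent with TCP's exponential SYN backoff: retransmits roughly 1 s, 2 s, and 4 s apart, which is why the attempt dies near the 10‑second mark. A quick check of the offsets (the 47 s base is taken from the capture above):

```shell
# Initial SYN at second 47; cumulative retransmit offsets double: +1, +3 (1+2), +7 (1+2+4)
for off in 0 1 3 7; do echo "15:52:$((47 + off))"; done
# → 15:52:47  15:52:48  15:52:50  15:52:54
```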

Server‑Side View

15:52:47 – SYN+ACK to client port 4232 (packet 8360).

15:52:48 – Retransmitted SYN+ACK (packet 8517).

ICMP “Destination unreachable (Port unreachable)” messages indicated that the server could not deliver packets to the client’s random port 4232.

3 Second Capture (03‑22)

A similar packet pattern was captured 13 days later, confirming that the same random client port (4232) was blocked by the network.

4 Conclusion

A network policy blocked external traffic to port 4232: the server's SYN+ACK responses were dropped on the way back, so the client kept retransmitting its SYN packets until the connection attempt timed out.

5 Solution

Restrict the local random port range so it avoids the blocked port by setting the kernel parameter ip_local_port_range (add it to /etc/sysctl.conf as well to persist across reboots):

sysctl -w net.ipv4.ip_local_port_range="10000 60999"

Note: OBServer initialization may set net.ipv4.ip_local_port_range = 3500 65535 , so ensure network policies accommodate this range.
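To see why the narrower range helps, check the blocked port against both ranges mentioned here (the port 4232 and the range bounds come from this article; the script itself is just a sketch):

```shell
# Does the blocked client port fall inside each candidate ephemeral range?
port=4232
for range in "3500 65535" "10000 60999"; do
  set -- $range
  if [ "$port" -ge "$1" ] && [ "$port" -le "$2" ]; then
    echo "$port is inside $1-$2"
  else
    echo "$port is outside $1-$2"
  fi
done
# → 4232 is inside 3500-65535
# → 4232 is outside 10000-60999
```

With the default 3500–65535 range the kernel can hand out the blocked port 4232 as a source port; raising the lower bound to 10000 keeps it out of reach.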

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
