Analyzing and Resolving OceanBase Connection Timeout via OBProxy Logs and Packet Capture
This article walks through a production‑level OceanBase connection‑timeout incident, detailing how to examine OBProxy logs, capture and analyze network packets with tcpdump and Wireshark, identify a blocked random port, and apply kernel‑parameter fixes to prevent the issue.
Compared with single‑node databases like MySQL, OceanBase has a longer access chain (Application → VIP → 3‑node OBProxy → 3‑node OBServer). When connection timeouts occur, troubleshooting requires additional steps. The following case study demonstrates how to capture packets and analyze an OceanBase application connection timeout.
1 Problem Description
The production environment uses a VIP managed by keepalived, OBServer 4.2.1.1 and OBProxy 4.2.2.0. The application intermittently reports a MySQL‑style timeout error:
(pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'xx.xx.xx.9' (timed out)")
Logs show the error occurring 1‑2 times per day.
2 Analysis Process
2.1 OBProxy Log Analysis
Idea 1: The VIP is bound to OBProxy node xx.xx.xx.12, so all traffic to xx.xx.xx.9:3306 actually goes through xx.xx.xx.12. Therefore, inspect obproxy.log on that node.
Idea 2: Determine whether the problem lies in the front‑end (application → OBProxy) or the back‑end (OBProxy → OBServer) connection.
Check if the front‑end connection handling is abnormal.
If the front‑end is normal, examine the back‑end connection establishment.
Using the available log slice (03‑11 08:14), the following searches were performed:
Filter for VC_EVENT_EOS to detect abnormal front‑end disconnects; none were found.
grep VC_EVENT_EOS obproxy.log.* | egrep -i 'tenant_name=[a-z].*'
Extract the client IP:PORT for requests from xx.xx.xx.12 to xx.xx.xx.9:3306 at the target time.
grep 'xx.xx.xx.9:3306' obproxy.log.20240311084410 | egrep '2024-03-11 08:14:[0-2].*' | egrep 'caddr={xx.xx.xx.12' | awk -F'caddr={' '{print $2}' | awk -F'}' '{print $1}'
Filter for "succ to set proxy_sessid" events, which confirm successful back‑end connections.
grep 'succ to set proxy_sessid' obproxy.log.20240311084410 | egrep '2024-03-11 08:14:[0-2].*' | awk -F'client_addr=' '{print $2}' | awk -F'"' '{print $2}' | grep 'xx.xx.xx.12'
The logs indicate that OBProxy handled connections correctly, suggesting the timeout is likely network‑related. The next step is packet capture.
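The searches above share one pattern: narrow the log by address and time window, then cut the client address out of the matching lines. A minimal sketch of that pattern as a reusable shell function follows. The function name client_addrs is hypothetical, and it assumes each relevant log line carries a caddr={IP:PORT} field, as the awk filters above imply.

```shell
#!/bin/sh
# Sketch: list distinct client addresses seen for a VIP within a time window.
# Assumption: log lines contain a caddr={IP:PORT} field.

# $1 = VIP:port string to match, $2 = timestamp prefix (e.g. "2024-03-11 08:14:")
# Reads log lines on stdin and prints one IP:PORT per line, de-duplicated.
# If a line holds several caddr fields, the last one is printed.
client_addrs() {
    grep -F -- "$1" \
        | grep -F -- "$2" \
        | sed -n 's/.*caddr={\([^}]*\)}.*/\1/p' \
        | sort -u
}
```

A possible invocation would be: client_addrs 'xx.xx.xx.9:3306' '2024-03-11 08:14:' < obproxy.log.20240311084410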
2.2 Packet Capture
Because the issue is intermittent, a long‑running capture is needed with file‑size limits and filtering:
tcpdump -X -s 0 -C 50 -W 500 -Z root -i lo -w /tmp/cap/obproxy.cap host xx.xx.xx.9
Key parameters:
Capture on the loopback interface (-i lo), since the app and OBProxy share the same host.
Write to /tmp/cap/obproxy.cap (-w).
Rotate files at about 50 MB (-C 50, counted in millions of bytes) and keep at most 500 files (-W 500); once the limit is reached, the oldest file is overwritten.
Match only traffic to or from the VIP (host xx.xx.xx.9).
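Before leaving such a capture running for days, it helps to know how much disk the file ring can consume. The sketch below works out the upper bound from the -C and -W values; the function name capture_budget_bytes is hypothetical, and the arithmetic relies on tcpdump counting -C in units of 1,000,000 bytes.

```shell
#!/bin/sh
# Upper bound, in bytes, on disk used by a tcpdump -C/-W capture ring.
# $1 = -C value (file size in millions of bytes)
# $2 = -W value (maximum number of files kept)
capture_budget_bytes() {
    echo $(( $1 * 1000000 * $2 ))
}
```

With the values used here, capture_budget_bytes 50 500 gives 25000000000, so /tmp/cap should have roughly 25 GB free (df -h /tmp/cap shows what is available).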
Analysis of the first capture (03‑09 15:52:57) revealed repeated SYN retransmissions from client port 4232. The server replied with SYN+ACK packets, but they never reached the client, so the connection attempt ran into the 10‑second timeout, which matches PyMySQL's default connect_timeout.
Client‑Side View
15:52:47 – SYN to xx.xx.xx.9:3306 (packet 8359).
15:52:48 – Retransmitted SYN (packet 8517) marked as TCP Retransmission.
15:52:50 – Another retransmission (packet 9075).
15:52:54 – Another retransmission (packet 10140).
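The gaps between these SYNs (1 s, 2 s, 4 s) are consistent with exponential backoff starting from a 1‑second initial retransmission timeout. The sketch below reproduces that schedule and shows why only four SYNs appear before the 10‑second client timeout. The 1‑second initial value is an assumption based on typical Linux behavior, and the function name syn_offsets is hypothetical.

```shell
#!/bin/sh
# Print the offsets (seconds after the first SYN) at which SYNs are sent
# before the client gives up, assuming the timeout doubles after each try.
# $1 = initial retransmission timeout in seconds (assumed to be 1)
# $2 = client connect timeout in seconds
syn_offsets() {
    rto=$1
    deadline=$2
    offset=0
    out=""
    while [ "$offset" -lt "$deadline" ]; do
        out="$out${out:+ }$offset"   # append the offset, space-separated
        offset=$((offset + rto))     # next SYN goes out after the current timeout
        rto=$((rto * 2))             # exponential backoff
    done
    printf '%s\n' "$out"
}
```

Calling syn_offsets 1 10 prints "0 1 3 7", which lines up with the SYNs at 15:52:47, :48, :50 and :54. The next retransmission would fall at +15 s, after the client has already timed out at 15:52:57.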
Server‑Side View
15:52:47 – SYN+ACK to client port 4232 (packet 8360).
15:52:48 – Retransmitted SYN+ACK (packet 8517).
ICMP “Destination unreachable (Port unreachable)” messages indicated that the server could not deliver packets to the client’s random port 4232.
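With up to 500 rotated capture files, opening each one in Wireshark is slow. A sketch of scanning them from the command line with tshark follows. The helper names are hypothetical; the display filter fields (tcp.flags.syn, tcp.analysis.retransmission, tcp.port) are standard Wireshark fields, and tshark is assumed to be installed on the analysis host.

```shell
#!/bin/sh
# Build a Wireshark display filter matching retransmitted SYN or SYN+ACK
# segments that involve TCP port $1.
syn_retrans_filter() {
    printf 'tcp.flags.syn == 1 && tcp.analysis.retransmission && tcp.port == %s\n' "$1"
}

# Run the filter over every capture file listed after the port argument.
# Requires tshark (the Wireshark command-line tool).
find_syn_retrans() {
    port=$1
    shift
    for f in "$@"; do
        echo "== $f"
        tshark -r "$f" -Y "$(syn_retrans_filter "$port")"
    done
}
```

For the case above, this could be run as find_syn_retrans 4232 /tmp/cap/obproxy.cap*. The ICMP messages can be listed in the same way with the filter icmp.type == 3 && icmp.code == 3.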
3 Second Capture (03‑22)
A similar packet pattern was captured 13 days later, confirming that the same random client port (4232) was blocked by the network.
4 Conclusion
The network policy prohibited external access to port 4232, causing the client’s SYN packets to be retransmitted and eventually time out because the server’s responses were dropped.
5 Solution
Restrict the local random port range to avoid the blocked port by setting the kernel parameter ip_local_port_range:
sysctl -w net.ipv4.ip_local_port_range="10000 60999"
Note: OBServer initialization may set net.ipv4.ip_local_port_range = 3500 65535, so ensure network policies accommodate this range.
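To verify the change, one can check whether a known blocked port can still be chosen as a source port. The sketch below is a hedged example with hypothetical function names; it reads the live range from /proc/sys/net/ipv4/ip_local_port_range, which exists on Linux only.

```shell
#!/bin/sh
# Succeeds (exit 0) if port $1 lies inside the inclusive range $2..$3.
port_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Report whether port $1 can be picked as an ephemeral source port
# under the range currently active on this host.
check_blocked_port() {
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range
    if port_in_range "$1" "$low" "$high"; then
        echo "port $1 is inside the local port range $low-$high"
    else
        echo "port $1 is outside the local port range $low-$high"
    fi
}
```

With the range 3500 65535, port 4232 is inside and can be selected; with 10000 60999 it is outside. Note that sysctl -w only changes the running kernel; keeping the value across reboots usually means also adding it to /etc/sysctl.conf or a file under /etc/sysctl.d/.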
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.