Investigation of the "Too many open files" Error in Tomcat with Apollo Configuration Center
This article analyzes a production incident where a Java web application using Apollo configuration center encountered "Too many open files" errors, detailing the fault symptoms, root cause analysis involving Tomcat's classloader and file‑descriptor limits, and presenting remediation and preventive measures.
The author, a technical expert from Ctrip's framework R&D department, describes a recurring "Too many open files" error observed on Linux when a Java web application integrated with Apollo configuration center modified its configuration in production.
After the configuration change, the application began throwing errors en masse because Redis connections could not be established, as shown by the stack trace:

Caused by: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
Caused by: java.net.SocketException: Too many open files

Initial investigation revealed that only 5 of the 20 machines successfully received the configuration notification; the remaining 15 reported NoClassDefFoundError for com.ctrip.framework.apollo.model.ConfigChange. The author hypothesized that a shortage of file descriptors prevented the JVM from loading the required JAR files.
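On Linux, the hypothesis can be checked directly against /proc. The helper below is a minimal sketch (the function name `fd_usage` is an assumption, not from the article): it compares a process's current open-descriptor count with its soft "Max open files" limit.

```shell
# Sketch: report a process's open-fd count against its soft limit (Linux /proc).
# Pass the Tomcat JVM's pid, e.g. from `pgrep -f tomcat`.
fd_usage() {
  local pid="$1"
  local used limit
  # Each entry in /proc/<pid>/fd is one open descriptor
  used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  # Field 4 of the "Max open files" row is the soft limit
  limit=$(awk '/Max open files/ {print $4}' "/proc/$pid/limits")
  echo "pid=$pid used=$used soft_limit=$limit"
}

# Example: inspect the current shell
fd_usage "$$"
```

A process whose `used` value sits near `soft_limit` will start failing socket and file opens exactly as described above.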
Further analysis identified the root cause: the process limit Max Open Files was set to 4096 on many machines (instead of the intended 65536). When the configuration change triggered Tomcat's WebappClassLoader to load a previously unused class, it opened all dependent JAR files at once, quickly exhausting the file‑descriptor limit and causing both the NoClassDefFoundError and Redis connection failures.
Empirical verification was performed using lsof commands. Immediately after a configuration push, the number of open file handles jumped from ~192 to ~422, with ~228 handles belonging to WEB-INF/lib JAR files. After about 30 seconds, Tomcat's background thread released the handles, returning the count to ~194.
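The article names lsof as the verification tool but does not reproduce the exact commands; the block below is a hedged reconstruction. The lsof one-liners are shown as comments, and the runnable functions (`count_open` and `count_webinf_jars`, both hypothetical names) give an lsof-free equivalent by reading /proc directly.

```shell
# On the incident machine the counts would typically be taken with lsof, e.g.:
#   lsof -p <tomcat-pid> | wc -l                        # total open handles
#   lsof -p <tomcat-pid> | grep 'WEB-INF/lib' | wc -l   # handles on webapp JARs

# Equivalent without lsof, reading /proc directly (Linux only):
count_open() { ls "/proc/$1/fd" 2>/dev/null | wc -l; }

count_webinf_jars() {
  # readlink resolves each fd symlink to the file it points at
  for fd in /proc/"$1"/fd/*; do readlink "$fd"; done 2>/dev/null |
    { grep -c 'WEB-INF/lib/.*\.jar$' || true; }
}
```

Sampling these two numbers before and after a configuration push would show the jump from ~192 to ~422 handles, with the difference accounted for almost entirely by WEB-INF/lib JARs.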
The article also documents the class‑loading mechanism of Tomcat 7.0.72, showing how it initially opens all JAR files, searches for the required class, and later closes the files via a periodic cleanup thread.
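The open-then-release cycle can be observed from outside the JVM by sampling the handle count over time. This is a sketch, not from the article; `sample_fds` is a hypothetical helper, and the pid must be supplied by the operator.

```shell
# Sketch: sample a process's open-handle count at a fixed interval, to watch the
# spike on a config push and the release ~30 s later by Tomcat's cleanup thread.
sample_fds() {
  local pid="$1" samples="${2:-10}" interval="${3:-5}"
  local i=0
  while [ "$i" -lt "$samples" ]; do
    printf '%s fds=%s\n' "$(date +%T)" "$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
    sleep "$interval"
    i=$((i+1))
  done
}

# Example: one minute of observation around a config push
#   sample_fds "$TOMCAT_PID" 12 5
```

In the incident timeline, such a trace would show the count rise from ~192 to ~422 immediately after the push and fall back to ~194 once the background thread closes the JARs.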
Based on the findings, the author proposes several optimization measures:
Increase the operating‑system Max Open Files limit for production services.
Enhance application monitoring and alerting for connection counts and file‑descriptor usage.
Initialize middleware clients early at startup to preload classes and avoid on‑the‑fly loading.
When incidents occur, retain a problematic instance for post‑mortem analysis instead of immediate restart.
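For the first measure, raising the limit has to be done where the process is actually launched, since limits are inherited per process. The snippet below is a common remediation pattern, not the article's exact procedure; the 65536 target comes from the article, while the config locations are standard Linux conventions.

```shell
# 1) Persist the limit for login sessions in /etc/security/limits.conf:
#      *  soft  nofile  65536
#      *  hard  nofile  65536
#
# 2) For systemd-managed services, limits.conf is bypassed; set it on the unit:
#      [Service]
#      LimitNOFILE=65536
#
# 3) Always verify what a *running* process actually received:
awk '/Max open files/ {print "soft=" $4, "hard=" $5}' "/proc/$$/limits"
```

Step 3 is the check that would have caught the drift here: the intended value was 65536, but many machines were running processes with a soft limit of 4096.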
In summary, the incident was caused by an insufficient file‑descriptor limit combined with Tomcat's class‑loader behavior, leading to cascading failures in Redis connectivity and service availability.