Why MySQL getUiToken Times Out: PhantomReference GC Bottleneck & Fix
The article investigates frequent timeout alerts on the getUiToken API, tracing the issue to excessive PhantomReference objects causing long GC pauses, explains how MySQL’s AbandonedConnectionCleanupThread and default Druid pool settings generate these references, and presents configuration tweaks, JVM flags, and cleanup tasks that reduced GC pause times from 1.25 s to 0.1 s.
Background
Online service frequently raised timeout alerts (1 s) for the getUiToken API, with 4037 occurrences of error code “-1”. The failure rate was 0 % but response times showed long tails (TP99 = 1024 ms, max = 1280 ms).
Environment Information
Server: Linux 4‑core 8 GB
JVM parameters:
-server -Djava.library.path=/usr/local/lib -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/export/log -Djava.awt.headless=true -Dsun.net.client.defaultConnectTimeout=60000 -Dsun.net.client.defaultReadTimeout=60000 -Djmagick.systemclassloader=no -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.ttl=300 -Xms5G -Xmx5G -XX:+UseG1GC -XX:G1HeapRegionSize=4m -Xloggc:/export/log/${APP_NAME}/gc_detail.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10m -XX:MaxTenuringThreshold=15 -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGCInterface traffic:
Problem Investigation
Code analysis showed getUiToken only reads memory data, so the delay is not from business logic. Monitoring with SGM revealed that most of the latency occurs while waiting, suggesting GC pauses.
GC logs indicated a high number of Young GC events (4227) with occasional long pauses (max 1.25 s). Detailed GC logs showed that Ref‑Proc phase spent ~977 ms processing Ref objects.
Ref‑proc handles soft, weak, phantom, final, JNI references.
Further inspection identified many PhantomReference objects, especially ConnectionPhantomReference created by MySQL’s driver.
public class NonRegisteringDriver implements java.sql.Driver {
protected static final ConcurrentHashMap<ConnectionPhantomReference, ConnectionPhantomReference> connectionPhantomRefs = new ConcurrentHashMap<>();
protected static final ReferenceQueue<ConnectionImpl> refQueue = new ReferenceQueue<>();
protected static void trackConnection(Connection newConn) {
ConnectionPhantomReference phantomRef = new ConnectionPhantomReference((ConnectionImpl) newConn, refQueue);
connectionPhantomRefs.put(phantomRef, phantomRef);
}
}The default Druid datasource configuration creates a new connection each time, adding entries to connectionPhantomRefs. Because the pool’s idle timeout is 30 minutes and minIdle is 0, connections are continuously created, inflating the phantom reference set and lengthening Ref‑Proc.
Solutions
Adjust Druid pool settings: increase minimum idle connections and connection lifetime (e.g., minIdle = 4, maxActive = 10, minEvictableIdleTimeMillis ≈ 3540000 ms) and extend MySQL wait_timeout to 59 minutes.
Enable parallel reference processing: add JVM flag -XX:+ParallelRefProcEnabled.
Periodically clean the phantom reference map when its size exceeds a threshold:
@Component
public class CleanPhantomRefsSchedule {
private static final ScheduledExecutorService CLEANER_EXECUTOR =
Executors.newSingleThreadScheduledExecutor(r -> {
Thread t = new Thread(r, "mysql-phantom-ref-cleaner");
t.setDaemon(true);
return t;
});
@PostConstruct
public void doTask() {
try {
Field field = NonRegisteringDriver.class.getDeclaredField("connectionPhantomRefs");
field.setAccessible(true);
CLEANER_EXECUTOR.scheduleAtFixedRate(() -> {
try {
Map<?,?> refs = (Map<?,?>) field.get(null);
if (refs != null && refs.size() > 500) {
refs.clear();
log.info("Cleared MySQL phantom references (count={})", refs.size());
}
} catch (Exception e) {
log.error("connectionPhantomRefs clear error!", e);
}
}, 1, 1, TimeUnit.HOURS);
} catch (NoSuchFieldException e) {
throw new IllegalStateException("Failed to initialize MySQL phantom refs field", e);
}
}
@PreDestroy
void shutdown() {
CLEANER_EXECUTOR.shutdownNow();
}
}Upgrade mysql-connector-java to 8.x or disable phantom reference generation with -Dcom.mysql.cj.disableAbandonedConnectionCleanup=true.
Verification of Optimization
After applying the changes and redeploying, GC pause max dropped from 1.25 s to 0.1 s, Young GC frequency improved from every ~20 s to every 6 minutes, and PhantomReference processing time fell to 0.0001966 s. No further timeout alerts were observed, confirming the effectiveness of the optimizations.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
