
Cache Monitoring Practices for Redis and Caffeine in High‑Traffic Game Services

This article details practical monitoring strategies for both remote Redis and local Caffeine caches in high‑traffic game services, covering prefix‑based Redis key tracking, aspect‑oriented instrumentation, Caffeine statistics collection, and real‑world case studies that show how these metrics surface hot keys and cache‑miss spikes and help reduce system load.

vivo Internet Technology

This article, authored by a member of the vivo Internet Server team, presents a practical guide to monitoring Redis (remote cache) and Caffeine (local cache) in large‑scale game business scenarios, sharing real‑world cases, monitoring objectives, implementation details, and concrete examples.

1. Background

Game services generate massive high‑frequency requests, making caching essential for handling high concurrency.

Both remote Redis and local Caffeine are combined to cope with traffic spikes.

The article distills effective monitoring and governance cases from production experience.

2. Remote Cache (Redis) Monitoring

2.1 Monitoring Scheme

2.1.1 Monitoring Objectives

The goal is to discover, locate, and resolve issues quickly while keeping costs under control and enriching monitoring dimensions.

Beyond basic server metrics (request count, connections), business‑related metrics are needed.

Common Redis problems include hot keys, large keys, and overload from massive request volume.

2.1.2 Monitoring Plan

Monitoring should cover both single hot‑key detection and aggregated trends for keys sharing the same prefix.

Redis‑side key monitoring is built into the Redis server (not detailed here).

Business‑side aggregation is achieved via an Aspect interceptor that reports custom metrics.

2.1.3 Dashboard

Images illustrate Redis server native metrics and business‑dimension prefix metrics (links omitted for brevity).

2.2 Implementation

Key design uses a unified prefix format (e.g., Prefix:UserId), with a utility class to build keys:

public class RedisKeyConstants {
    public static final String REDIS_GAMEGROUP_NEW_KEY = "newgamegroup";
    public static final String REDIS_GAMEGROUP_DETAIL_KEY = "gamegroup:detail";
    public static final String REDIS_KEY_IUNIT_STRATEGY_COUNT = "activity:ihandler:strategy:count";
    public static final String CONTENT_DISTRIBUTE_CURRENT = "content:distribute:current";
    public static final String RECOMMEND_NOTE = "recommend:note";
}

public class RedisUtils {
    public static final String COMMON_REDIS_KEY_SPLIT = ":";
    public static String buildRedisKey(String key, Object... params) {
        if (params == null || params.length == 0) {
            return key;
        }
        for (Object param : params) {
            key += COMMON_REDIS_KEY_SPLIT + param;
        }
        return key;
    }
}
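To make the key scheme concrete, here is a minimal, self‑contained sketch of how a key is composed from a prefix and parameters, mirroring buildRedisKey above (the RedisKeyDemo class name and sample values are illustrative):

```java
public class RedisKeyDemo {
    // Mirrors RedisUtils.buildRedisKey: append each param after a ":" separator.
    static String buildRedisKey(String key, Object... params) {
        StringBuilder sb = new StringBuilder(key);
        if (params != null) {
            for (Object p : params) {
                sb.append(':').append(p);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildRedisKey("recommend:note", 1001));       // recommend:note:1001
        System.out.println(buildRedisKey("gamegroup:detail", "cn", 42)); // gamegroup:detail:cn:42
    }
}
```

Because every key carries a stable prefix, the monitoring side can later aggregate request counts per prefix instead of per individual key.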

The Aspect‑oriented monitor intercepts Redis write operations and reports prefix‑aggregated metrics:

@Slf4j
@Aspect
@Order(0)
@Component
public class RedisMonitorAspect {
    private static final String PREFIX_CONFIG = "redis.monitor.prefix";
    private static final Set<String> PREFIX_SET = new HashSet<>();
    @Resource
    private MonitorComponent monitorComponent;
    static {
        String prefixValue = VivoConfigManager.getString(PREFIX_CONFIG, "");
        refreshConf(prefixValue);
        VivoConfigManager.addListener(new VivoConfigListener() {
            @Override
            public void eventReceived(PropertyItem propertyItem, ChangeEventType changeEventType) {
                if (StringUtils.equalsIgnoreCase(propertyItem.getName(), PREFIX_CONFIG)) {
                    refreshConf(propertyItem.getValue());
                }
            }
        });
    }
    private static void refreshConf(String prefixValue) {
        if (StringUtils.isNotEmpty(prefixValue)) {
            String[] prefixArr = StringUtils.split(prefixValue, ",");
            Arrays.stream(prefixArr).forEach(item -> PREFIX_SET.add(item));
        }
    }
    @Pointcut("execution(* com.vivo.joint.dal.common.redis.dao.RedisDao.set*(..))")
    public void point() {}
    @Around("point()")
    public Object around(ProceedingJoinPoint pjp) throws Throwable {
        Object result = pjp.proceed();
        try {
            if (VivoConfigManager.getBoolean("joint.center.redis.monitor.switch", true)) {
                Object[] args = pjp.getArgs();
                if (args != null && args.length > 0) {
                    String redisKey = String.valueOf(args[0]);
                    if (VivoConfigManager.getBoolean("joint.center.redis.monitor.send.log.switch", true)) {
                        log.info("Updating Redis cache {}", redisKey);
                    }
                    String monitorKey = null;
                    if (!PREFIX_SET.isEmpty()) {
                        for (String prefix : PREFIX_SET) {
                            if (StringUtils.startsWithIgnoreCase(redisKey, prefix)) {
                                monitorKey = prefix;
                                break;
                            }
                        }
                    }
                    if (StringUtils.isEmpty(monitorKey) && StringUtils.contains(redisKey, ":")) {
                        monitorKey = StringUtils.substringBeforeLast(redisKey, ":");
                    }
                    monitorComponent.sendRedisMonitorData(monitorKey);
                }
            }
        } catch (Exception e) {
            // ignore
        }
        return result;
    }
}
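The monitorKey resolution in the aspect can be illustrated in isolation. This sketch mirrors its order of precedence (a configured prefix wins, case‑insensitively; otherwise fall back to everything before the last colon); MonitorKeyDemo and the sample prefixes are illustrative names:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class MonitorKeyDemo {
    // Same resolution order as the aspect: configured prefixes first,
    // then a fallback to the substring before the last ":".
    static String resolveMonitorKey(String redisKey, Set<String> configuredPrefixes) {
        for (String prefix : configuredPrefixes) {
            // case-insensitive startsWith, like StringUtils.startsWithIgnoreCase
            if (redisKey.regionMatches(true, 0, prefix, 0, prefix.length())) {
                return prefix;
            }
        }
        int idx = redisKey.lastIndexOf(':');
        return idx < 0 ? null : redisKey.substring(0, idx);
    }

    public static void main(String[] args) {
        Set<String> prefixes = new LinkedHashSet<>();
        prefixes.add("gamegroup:detail");
        System.out.println(resolveMonitorKey("gamegroup:detail:42", prefixes)); // gamegroup:detail
        System.out.println(resolveMonitorKey("recommend:note:1001", prefixes)); // recommend:note
    }
}
```

The fallback means even keys without an explicitly configured prefix still aggregate sensibly, as long as they follow the Prefix:UserId convention.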

A concrete case shows how prefix‑based monitoring revealed excessive set operations on popup:user:plan, exposing an ineffective cache design; removing it reduced Redis load and improved response time.

3. Local Cache (Caffeine) Monitoring

3.1 Monitoring Scheme

Metrics include request count, hit rate, miss rate, etc.
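These rates follow directly from the hit and miss counters: Caffeine's CacheStats defines hitRate as hits over total requests (and 1.0 when there have been no requests). A minimal sketch of the arithmetic, with CacheRateDemo as an illustrative name:

```java
public class CacheRateDemo {
    // hitRate = hits / (hits + misses); Caffeine reports 1.0 for an unused cache.
    static double hitRate(long hitCount, long missCount) {
        long requests = hitCount + missCount;
        return requests == 0 ? 1.0 : (double) hitCount / requests;
    }

    // missRate is simply the complement of hitRate.
    static double missRate(long hitCount, long missCount) {
        return 1.0 - hitRate(hitCount, missCount);
    }

    public static void main(String[] args) {
        System.out.println(hitRate(90, 10));  // 0.9
        System.out.println(missRate(90, 10)); // ~0.1
    }
}
```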

Caffeine’s native recordStats feature is enabled via a customized vivo‑caffeine library.

Data is reported per‑machine and per‑cache‑instance, supporting full‑stack queries and alerting (e.g., cache‑miss spikes).

Dashboard images (omitted) display Caffeine system metrics.

3.2 Implementation

public final class Caffeine<K, V> {
    /** caffeine instance name */
    String instanceName;
    /** map of instance name to cache instance */
    static Map<String, Cache<?, ?>> cacheInstanceMap = new ConcurrentHashMap<>();

    @NonNull
    public <K1 extends K, V1 extends V> Cache<K1, V1> build() {
        requireWeightWithWeigher();
        requireNonLoadingCache();
        @SuppressWarnings("unchecked")
        Caffeine<K1, V1> self = (Caffeine<K1, V1>) this;
        Cache<K1, V1> localCache = isBounded()
                ? new BoundedLocalCache.BoundedLocalManualCache<>(self)
                : new UnboundedLocalCache.UnboundedLocalManualCache<>(self);
        // register the instance by name so its stats can be pulled later
        if (StringUtils.isNotEmpty(localCache.getInstanceName())) {
            cacheInstanceMap.put(localCache.getInstanceName(), localCache);
        }
        return localCache;
    }
}

static Cache<String, List<String>> accountWhiteCache = Caffeine.newBuilder()
    .applyName("accountWhiteCache")
    .expireAfterWrite(VivoConfigManager.getInteger("trade.account.white.list.cache.ttl", 10), TimeUnit.MINUTES)
    .recordStats()
    .maximumSize(VivoConfigManager.getInteger("trade.account.white.list.cache.size", 100))
    .build();

Utility to extract CacheStats and package them into a StatsData object for reporting:

public static StatsData getCacheStats(String instanceName) {
    Cache cache = Caffeine.getCacheByInstanceName(instanceName);
    CacheStats cacheStats = cache.stats();
    StatsData statsData = new StatsData();
    statsData.setInstanceName(instanceName);
    statsData.setTimeStamp(System.currentTimeMillis() / 1000);
    statsData.setMemoryUsed(String.valueOf(cache.getMemoryUsed()));
    statsData.setEstimatedSize(String.valueOf(cache.estimatedSize()));
    statsData.setRequestCount(String.valueOf(cacheStats.requestCount()));
    statsData.setHitCount(String.valueOf(cacheStats.hitCount()));
    statsData.setHitRate(String.valueOf(cacheStats.hitRate()));
    statsData.setMissCount(String.valueOf(cacheStats.missCount()));
    statsData.setMissRate(String.valueOf(cacheStats.missRate()));
    // additional fields omitted for brevity
    return statsData;
}

Reporting logic gathers all cache instances, builds a ReportData object, and sends it via HTTP POST to the monitoring platform.

public static void sendReportData() {
    try {
        if (!VivoConfigManager.getBoolean("memory.caffeine.data.report.switch", true)) {
            return;
        }
        Method listCacheInstanceMethod = HANDLER_MANAGER_CLASS.getMethod("listCacheInstance");
        List<String> instanceNames = (List<String>) listCacheInstanceMethod.invoke(null);
        if (CollectionUtils.isEmpty(instanceNames)) {
            return;
        }
        String appName = System.getProperty("app.name");
        String localIp = getLocalIp();
        String localPort = String.valueOf(NetPortUtils.getWorkPort());
        ReportData reportData = new ReportData();
        InstanceData instanceData = new InstanceData();
        instanceData.setAppName(appName);
        instanceData.setIp(localIp);
        instanceData.setPort(localPort);
        // collect stats per instance
        Method getCacheStatsMethod = HANDLER_MANAGER_CLASS.getMethod("getCacheStats", String.class);
        Map<String, StatsData> statsDataMap = new HashMap<>();
        instanceNames.forEach(instanceName -> {
            try {
                StatsData statsData = (StatsData) getCacheStatsMethod.invoke(null, instanceName);
                statsDataMap.put(instanceName, statsData);
            } catch (Exception e) {
                // ignore
            }
        });
        reportData.setInstanceData(instanceData);
        reportData.setStatsDataMap(statsDataMap);
        HttpPost httpPost = new HttpPost(getReportDataUrl());
        httpPost.setConfig(requestConfig);
        StringEntity stringEntity = new StringEntity(JSON.toJSONString(reportData));
        stringEntity.setContentType("application/json");
        httpPost.setEntity(stringEntity);
        HttpResponse response = httpClient.execute(httpPost);
        String result = EntityUtils.toString(response.getEntity(), "UTF-8");
        EntityUtils.consume(response.getEntity());
        logger.info("Caffeine data report success URL {} params {} result {}", getReportDataUrl(), JSON.toJSONString(reportData), result);
    } catch (Throwable throwable) {
        logger.error("Caffeine data report failed URL {}", getReportDataUrl(), throwable);
    }
}

3.3 Case Study

In one incident, Redis requests surged suddenly without any obvious key prefix standing out; the business‑side monitoring described above made it possible to identify the root cause quickly.

The surge was traced to a Caffeine local cache whose size was too small, causing massive cache‑misses and consequently a flood of Redis reads, nearly crashing the Redis service.

Monitoring hit‑rate and miss‑rate trends allowed the team to pinpoint the cache‑penetration issue and adjust the Caffeine configuration.
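One way to turn such trends into an alert is a simple baseline comparison on the reported miss rate. This sketch is illustrative only (the MissRateAlertDemo name and the 2x factor are assumptions, not from the article):

```java
public class MissRateAlertDemo {
    // Flags a cache-miss spike: the current miss rate exceeds the baseline
    // by a configurable factor. The factor is an illustrative tuning knob.
    static boolean missSpike(double baselineMissRate, double currentMissRate, double factor) {
        return baselineMissRate > 0 && currentMissRate >= baselineMissRate * factor;
    }

    public static void main(String[] args) {
        // a jump from 5% to 40% misses would trip a 2x threshold
        System.out.println(missSpike(0.05, 0.40, 2.0)); // true
        System.out.println(missSpike(0.05, 0.06, 2.0)); // false
    }
}
```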

4. Conclusion

This article demonstrates effective cache‑monitoring practices derived from real production incidents in game services.

Redis prefix‑key monitoring and Caffeine native statistics together provide a comprehensive view that helps reduce failure impact and improve system stability.

Continuous improvement and community sharing are encouraged.
