
How We Cut Redis Costs by ¥460,000 Monthly: 10 Proven Optimization Strategies

In 2023, TapTap's infrastructure team reduced its Redis bill by 460,000 CNY per month through low‑cost ESSD instances, traffic compression, unused‑instance cleanup, TTL management, data migration, online compression, and targeted cleaning. This article details the ten concrete measures and the open‑source tools that made zero‑downtime optimization possible.

Programmer DD

Author: KL Blog, a TapTap infrastructure engineer, author of the open‑source project kkFileView and a PMC member of the Apollo Config Center.

Optimization Results

In 2023, by switching to low‑cost Redis ESSD instances, implementing traffic compression, cleaning unused data, managing TTL, decommissioning idle instances, and developing tools for traffic replication, data migration, online compression, targeted cleaning, and key‑access analysis, we reduced Redis expenses by 460,000 CNY per month.

Note: all Redis instances mentioned in this article are Alibaba Cloud Redis products.

Optimization Measures (approximate share of total savings)

Clean unused instances – 5%

Instance down‑scaling to improve memory utilization – 15%

Tag usage scenarios, allowing some to fill memory – 1%

Set appropriate TTL – 8%

Clean historical data – 6%

Improve KV structure – 1%

Regular scan to release expired memory – 1%

Reduce availability – 3%

Compress value – 32%

Migrate to Redis‑compatible disk‑storage project – 28%

1. Clean unused instances

Identify instances belonging to services that have been shut down and release them. This is the easiest and fastest win: collect Redis metrics, filter instances with persistently low QPS, verify with business owners, then decommission.
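The filtering step above can be sketched as a small script. This is a minimal illustration, not the team's actual tooling: the metric schema and the QPS thresholds are assumptions, and the real metric source (cloud monitoring API, Prometheus, etc.) is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    """Aggregated metrics for one Redis instance (hypothetical schema)."""
    instance_id: str
    avg_qps: float   # average QPS over the observation window
    peak_qps: float  # peak QPS over the same window


def find_idle_candidates(stats, avg_qps_max=1.0, peak_qps_max=5.0):
    """Return instances whose traffic stayed below both thresholds.

    These are only *candidates*: each must still be confirmed with the
    owning team before the instance is released.
    """
    return [s.instance_id
            for s in stats
            if s.avg_qps <= avg_qps_max and s.peak_qps <= peak_qps_max]
```

The thresholds deliberately err on the low side; an instance with even modest steady traffic should never be released without confirmation.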

2. Instance down‑scaling

Memory usage often lags behind allocated capacity, leading to low utilization. Reasons include over‑estimated capacity and stable traffic phases. Aim for at least 70% memory usage. When down‑scaling, consider:

Cluster to master‑slave downgrade – ensure client compatibility.

Large cluster to smaller cluster – watch for large‑key limits.

Data skew when reducing node count.

Bandwidth considerations.

Prefer specifications with more nodes at the same total memory.

Perform changes during low‑traffic windows to avoid connection spikes.
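To make the 70% utilization target concrete, here is a hedged sketch of picking a down‑scaled specification. The list of purchasable sizes is an assumption; substitute the specs your cloud provider actually offers.

```python
def pick_downscale_spec(used_gb, specs=(1, 2, 4, 8, 16, 32, 64, 128, 256)):
    """Pick the smallest spec that still fits current usage.

    Returns (spec_gb, resulting_utilization), or (None, None) if nothing
    fits. Compare the utilization against the ~70% target: if even the
    tightest spec lands well below it, data cleanup (TTLs, historical
    data) is the better lever than resizing alone.
    """
    for spec in sorted(specs):
        if spec >= used_gb:
            return spec, used_gb / spec
    return None, None
```

For example, 90 GB of live data fits a 128 GB spec at roughly 70% utilization, while 40 GB on a 64 GB spec only reaches 62.5%, suggesting cleanup first.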

3. Tag usage scenarios

Redis usage falls into two scenarios: passive cache and active cache. A passive cache (e.g., sitting in front of MySQL) can be allowed to fill memory completely, relying on an eviction policy. An active cache stores real‑time streams and must not lose data: when memory usage reaches the 90% alert threshold, expand capacity instead of evicting. Each scenario also warrants its own TTL policy.
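The passive/active distinction maps naturally onto Redis's `maxmemory-policy`. The sketch below assumes the redis-py client and a scenario‑tagging convention of our own invention; note that on managed cloud Redis, this setting is often changed through the console rather than CONFIG SET.

```python
# Map our (hypothetical) scenario tags onto Redis eviction policies.
POLICY_BY_SCENARIO = {
    "passive-cache": "allkeys-lru",  # may fill up; Redis evicts cold keys
    "active-cache": "noeviction",    # never silently drop data; alert and scale instead
}


def apply_scenario_policy(client, scenario):
    """Apply the eviction policy matching an instance's tagged scenario.

    `client` is expected to expose redis-py's config_set(name, value).
    """
    policy = POLICY_BY_SCENARIO[scenario]
    client.config_set("maxmemory-policy", policy)
    return policy
```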

4. Set appropriate TTL

In practice, roughly 99% of keys can be given an expiry. Align TTLs with business needs: collect the key last‑access distribution, then apply TTLs to reclaim memory. For example, one 512 GB instance that had no TTLs at all freed about 50% of its memory once reasonable TTLs were set.
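Backfilling TTLs onto existing keys can be done with a non‑blocking SCAN loop. This is a minimal sketch assuming the redis-py client (`scan_iter`, `ttl`, `expire`); the pattern and TTL value are illustrative, and in a real run you would batch, rate‑limit, and dry‑run first.

```python
def backfill_ttl(client, pattern, ttl_seconds, batch=1000):
    """Set a TTL on every key matching `pattern` that has none.

    In Redis, TTL == -1 means the key exists but never expires. Using
    SCAN (redis-py's scan_iter) keeps the server responsive; KEYS would
    block it. Returns the number of keys updated.
    """
    updated = 0
    for key in client.scan_iter(match=pattern, count=batch):
        if client.ttl(key) == -1:  # exists, but never expires
            client.expire(key, ttl_seconds)
            updated += 1
    return updated
```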

5. Clean historical data

Some services use Redis as persistent storage, leaving obsolete structures behind. Share the key‑access distribution with the owning team, then use a tool to delete keys by prefix or by idle time.
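Idle‑time‑based cleanup can lean on Redis's OBJECT IDLETIME command, which reports seconds since a key was last read or written (it is only meaningful when `maxmemory-policy` is not an LFU policy). A hedged sketch, assuming the redis-py client; the prefix and cutoff are illustrative, and `dry_run` defaults to counting only:

```python
def purge_stale_keys(client, prefix, max_idle_seconds, batch=1000, dry_run=True):
    """Delete keys under `prefix` not accessed within the cutoff window.

    With dry_run=True the function only counts what it *would* delete,
    so the result can be reviewed with the owning team first.
    """
    doomed = 0
    for key in client.scan_iter(match=prefix + "*", count=batch):
        if client.object("idletime", key) >= max_idle_seconds:
            if not dry_run:
                client.unlink(key)  # UNLINK frees memory asynchronously
            doomed += 1
    return doomed
```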

6. Improve KV structure

Trim oversized JSON values down to the fields actually read, and replace string‑based membership sets with Bloom filters where occasional false positives are acceptable; in some cases this saved more than 90% of the memory.
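To show why a Bloom filter saves so much memory: it answers "have we seen this member?" with a fixed bit array instead of storing every member string. The self‑contained sketch below is illustrative only (the article's production setup presumably uses a Redis‑side module or library, not this class); false positives are possible, false negatives are not.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: fixed-size bit array plus k hash positions
    per item. Memory cost is size_bits/8 bytes regardless of how many
    members are added (at the price of a rising false-positive rate)."""

    def __init__(self, size_bits=8 * 1024 * 1024, num_hashes=7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k positions by salting one cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Storing a million 40‑byte member strings takes tens of megabytes; a 1 MB filter covers the same membership checks, which is where the >90% figure comes from.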

7. Regular scan to release expired memory

Redis deletes expired keys lazily, so memory held by already‑expired keys may not be reclaimed promptly. A periodic SCAN that touches each key forces those deletions, improving both memory usage and stability.
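A sweep like this works because any read access to an expired key (even a TTL query) makes Redis delete it on the spot. The sketch assumes redis-py; the pause value is an assumption to keep the sweep gentle on a busy instance.

```python
import time


def sweep_expired(client, pattern="*", batch=1000, pause=0.01):
    """Walk the keyspace, touching each key so Redis lazily deletes
    any that have already expired. Returns the number of keys touched."""
    touched = 0
    for key in client.scan_iter(match=pattern, count=batch):
        client.ttl(key)  # the access itself triggers lazy expiration
        touched += 1
        if touched % batch == 0:
            time.sleep(pause)  # throttle between batches
    return touched
```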

8. Reduce availability in dev/qa

Merge small instances, and downgrade clusters to single‑node specifications where high availability is not required.

9. Compress value

We tested gzip, zstd, and snappy on an 852 KB user_profile sample. Results:

snappy – compression time 122,722 µs; compression ratio 22.60%; decompression time 49 µs

gzip – compression time 39,821 µs; compression ratio 15.79%; decompression time 3,521 µs

zstd – compression time 6,953 µs; compression ratio 17.08%; decompression time 3,153 µs

Snappy decompressed fastest in our test, but its compression latency was the highest, so we chose between gzip and zstd case by case, applying compression mainly to instances larger than 100 GB.
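A benchmark of this shape is easy to reproduce. The sketch below uses only the standard library's zlib (the codec behind gzip); it is not the team's harness, and the sample payload is a stand‑in for the 852 KB user_profile value. zstd would plug into the same function via the third‑party `zstandard` package.

```python
import time
import zlib


def benchmark(codec_name, compress, decompress, payload):
    """Time one codec on one payload.

    Returns (name, µs to compress, compressed size as % of original,
    µs to decompress), asserting the round trip is lossless.
    """
    t0 = time.perf_counter()
    blob = compress(payload)
    t1 = time.perf_counter()
    restored = decompress(blob)
    t2 = time.perf_counter()
    assert restored == payload  # round trip must be lossless
    return (codec_name,
            int((t1 - t0) * 1e6),
            round(100 * len(blob) / len(payload), 2),
            int((t2 - t1) * 1e6))


# Stand-in payload; real numbers depend heavily on the actual value shape.
sample = b'{"user_id": 42, "tags": ["a", "b"], "bio": "hello"}' * 2000

print(benchmark("gzip(zlib)", lambda d: zlib.compress(d, 6), zlib.decompress, sample))
# For zstd (third-party `zstandard` package), the call would look like:
#   import zstandard
#   c, d = zstandard.ZstdCompressor(), zstandard.ZstdDecompressor()
#   benchmark("zstd", c.compress, d.decompress, sample)
```

Absolute timings vary by machine; what matters when choosing a codec is the relative trade‑off between compression latency, ratio, and decompression latency for your own value distribution.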

10. Migrate to Redis‑compatible disk storage

Memory‑only Redis becomes costly at large scale. Disk‑based alternatives (e.g., Alibaba Cloud Tair ESSD instances, or the community projects pika and kvrocks) cost roughly 16% as much as memory‑type instances at capacities of 64 GB and above. Migration requires careful traffic replay, capacity planning, and monitoring.

Optimization Tools

We open‑sourced a Redis toolset (https://github.com/taptap/redis-tools) that includes:

Traffic replication & scaling.

Online compression & decompression.

Targeted key cleaning & TTL setting.

Key‑last‑access scanning.

Disk‑instance metric collection.

Summary

By preparing tooling, systematically applying the measures above, and executing without service interruption, we cut Redis operating costs dramatically while maintaining system stability.

Tags: Redis, Performance Tuning, Cost Optimization, Data Compression, Infrastructure, Database Management
Written by Programmer DD, a tinkering programmer and author of "Spring Cloud Microservices in Action".
