How We Cut Redis Costs by 460,000 CNY per Month: 10 Optimization Measures
In 2023, a TapTap infra team reduced Redis operating costs by 460,000 CNY per month through low‑cost ESSD instances, traffic compression, unused‑instance cleanup, TTL management, data migration, online compression, and targeted cleaning, detailing ten concrete measures and the open‑source tools that enabled zero‑downtime optimization.
Author: KL Blog, TapTap infra engineer, author of the open-source project kkFileView and PMC member of Apollo Config Center.
Optimization Results
In 2023, by switching to low‑cost Redis ESSD instances, implementing traffic compression, cleaning unused data, managing TTL, decommissioning idle instances, and developing tools for traffic replication, data migration, online compression, targeted cleaning, and key‑access analysis, we reduced Redis expenses by 460,000 CNY per month.
Note: All Redis mentioned are Alibaba Cloud Redis products.
Optimization Measures
Clean unused instances – 5%
Instance down‑scaling to improve memory utilization – 15%
Tag usage scenarios, allowing some to fill memory – 1%
Set appropriate TTL – 8%
Clean historical data – 6%
Improve KV structure – 1%
Regular scan to release expired memory – 1%
Reduce availability – 3%
Compress value – 32%
Migrate to Redis‑compatible disk‑storage project – 28%
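The percentages above sum to 100%, so they read most naturally as each measure's share of the total saving. Under that assumption (the article does not state it explicitly), a short sketch converts the shares into CNY per month. The dictionary keys are shortened labels for the measures in the list.

```python
# Shares copied from the list above. Reading them as "share of the total
# saving" is an assumption, supported only by the fact that they sum to 100.
TOTAL_SAVED_CNY = 460_000

SHARES_PERCENT = {
    "Clean unused instances": 5,
    "Instance down-scaling": 15,
    "Tag usage scenarios": 1,
    "Set appropriate TTL": 8,
    "Clean historical data": 6,
    "Improve KV structure": 1,
    "Regular scan": 1,
    "Reduce availability": 3,
    "Compress value": 32,
    "Migrate to disk storage": 28,
}


def savings_by_measure(total_cny, shares_percent):
    """Return {measure: CNY per month} using integer arithmetic."""
    return {name: total_cny * pct // 100 for name, pct in shares_percent.items()}


if __name__ == "__main__":
    ranked = sorted(
        savings_by_measure(TOTAL_SAVED_CNY, SHARES_PERCENT).items(),
        key=lambda item: -item[1],
    )
    for name, cny in ranked:
        print(f"{name:26s} {cny:>8,d} CNY")
```

On this reading, value compression and the move to disk storage together account for 60% of the saving.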
1. Clean unused instances
Identify instances belonging to services that have been shut down and release them. This is the easiest and fastest win: collect Redis metrics, filter instances with persistently low QPS, verify with business owners, then decommission.
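The filtering step can be sketched as follows. This is a minimal illustration, not the team's tooling: the `InstanceMetrics` shape, the 1 QPS threshold, and the 14-day window are all assumptions. The threshold is above zero because monitoring probes usually generate a little traffic of their own.

```python
from dataclasses import dataclass


@dataclass
class InstanceMetrics:
    instance_id: str
    daily_peak_qps: list  # one peak-QPS sample per day, oldest first


def find_idle_candidates(metrics, qps_threshold=1.0, min_days=14):
    """Return IDs of instances whose peak QPS stayed below the threshold on
    every one of the last `min_days` days. Instances with fewer samples are
    skipped, because "persistently low" cannot be established for them."""
    candidates = []
    for m in metrics:
        recent = m.daily_peak_qps[-min_days:]
        if len(recent) < min_days:
            continue
        if all(q < qps_threshold for q in recent):
            candidates.append(m.instance_id)
    return candidates
```

The output is only a candidate list. Confirmation with the business owner remains the step that decides whether an instance is released.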
2. Instance down‑scaling
Used memory is often well below allocated capacity, typically because capacity was over-estimated up front or because the service has settled into a stable-traffic phase. Aim for at least 70% memory utilization. When down-scaling, consider:
Cluster to master‑slave downgrade – ensure client compatibility.
Large cluster to smaller cluster – watch for large‑key limits.
Data skew when reducing node count.
Bandwidth considerations.
Prefer specifications with more nodes at the same total memory.
Perform changes during low‑traffic windows to avoid connection spikes.
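The 70% target can be turned into a simple spec-selection rule. The sketch below is an assumption-laden illustration: the candidate spec sizes and the 85% upper bound (headroom so the instance is not immediately full) are hypothetical, and it covers memory only, not the bandwidth, skew, and large-key concerns listed above.

```python
def pick_spec(used_gb, specs_gb, target_util=0.70, max_util=0.85):
    """Pick the smallest spec whose utilization stays at or below max_util.

    Returns (spec_gb, utilization, meets_target), or None if no spec fits.
    meets_target tells whether the 70% goal from the text is reached.
    """
    for spec in sorted(specs_gb):
        util = used_gb / spec
        if util <= max_util:
            return spec, util, util >= target_util
    return None
```

For example, 24 GB of data against hypothetical specs of 16, 32, 64, and 128 GB selects the 32 GB spec at 75% utilization.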
3. Tag usage scenarios
Redis usage falls into passive-cache and active-cache scenarios. A passive cache (e.g., one in front of MySQL) can be allowed to fill memory completely and rely on the eviction policy. An active cache stores real-time streams, so when memory usage reaches the 90% alert threshold, expand capacity to avoid data loss. TTL policies also differ per scenario.
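One way to make the tags actionable is to let the alert handler branch on them. The enum and the returned action strings below are hypothetical names for illustration; only the two scenarios and the 90% threshold come from the text.

```python
from enum import Enum


class Scenario(Enum):
    PASSIVE_CACHE = "passive"  # data can be reloaded from a backing store
    ACTIVE_CACHE = "active"    # data is written to Redis directly


def memory_action(scenario, used_ratio, alert_ratio=0.90):
    """Decide what a memory alert should lead to for a tagged instance."""
    if scenario is Scenario.PASSIVE_CACHE:
        # A full passive cache is fine as long as an eviction policy is set.
        return "rely on eviction"
    if used_ratio >= alert_ratio:
        return "expand capacity"
    return "ok"
```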
4. Set appropriate TTL
Roughly 99% of keys can be given an expiry. Align the TTL with business needs: collect the distribution of key last-access times, then set TTLs accordingly to reduce memory. For example, a 512 GB instance whose keys have no TTL can save 50% of its memory once reasonable TTLs are set.
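A last-access distribution can be approximated from `OBJECT IDLETIME`. The sketch below is not the open-sourced tool. It assumes a client object with redis-py style `scan_iter()` and `object("idletime", key)` methods, and the bucket boundaries are illustrative.

```python
from collections import Counter

# Bucket upper bounds in seconds. Labels and boundaries are illustrative.
BUCKETS = [(3600, "<1h"), (86400, "<1d"), (7 * 86400, "<7d"), (30 * 86400, "<30d")]


def bucket_for(idle_seconds):
    """Map an idle time in seconds to a histogram label."""
    for upper, label in BUCKETS:
        if idle_seconds < upper:
            return label
    return ">=30d"


def idle_distribution(client, match="*", count=1000):
    """Histogram of key idle times for keys matching `match`.

    `client` is assumed to expose redis-py style scan_iter() and
    object("idletime", key).
    """
    hist = Counter()
    for key in client.scan_iter(match=match, count=count):
        idle = client.object("idletime", key)
        if idle is not None:  # the key may vanish between SCAN and OBJECT
            hist[bucket_for(idle)] += 1
    return hist
```

If most keys under a prefix land in the oldest bucket, a TTL shorter than that bucket is unlikely to affect hit rate. Note that idle time is not tracked when an LFU eviction policy is configured, so this approach assumes an LRU-style or no-eviction policy.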
5. Clean historical data
Some services use Redis as persistent storage and leave obsolete data structures behind. Share the key last-access distribution with the business owners, then use a tool to delete keys by prefix or by idle time.
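Deletion by prefix and idle time can be sketched as below. This is an illustration under assumptions, not the published tool: the client is assumed to offer redis-py style `scan_iter()`, `object()`, and `unlink()`, and the function defaults to a dry run so that the match count can be reviewed first.

```python
def clean_keys(client, prefix, min_idle_seconds=0, batch_size=500, dry_run=True):
    """Remove keys under `prefix` that have been idle for min_idle_seconds.

    `client` is assumed to expose redis-py style scan_iter(), object() and
    unlink(). Returns the number of keys matched; they are only deleted
    when dry_run is False.
    """
    matched, batch = 0, []
    for key in client.scan_iter(match=prefix + "*", count=batch_size):
        if min_idle_seconds > 0:
            idle = client.object("idletime", key)
            if idle is None or idle < min_idle_seconds:
                continue
        matched += 1
        batch.append(key)
        if len(batch) >= batch_size:
            if not dry_run:
                client.unlink(*batch)  # UNLINK frees memory asynchronously
            batch = []
    if batch and not dry_run:
        client.unlink(*batch)
    return matched
```

On a cluster, a multi-key `UNLINK` may be rejected when the keys map to different slots, so a real tool would likely delete per node or per key. Prefixes containing glob characters would also need escaping.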
6. Improve KV structure
Trim oversized JSON values and, where only membership checks are needed, use a Bloom filter instead of strings. In suitable cases this saves more than 90% of memory.
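The article does not say which Bloom filter implementation was used. The pure-Python sketch below only illustrates the trade-off: a fixed bit array sized from the expected item count and an accepted false-positive rate, in place of one stored string per item. The class and parameter names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(8, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

At a 1% false-positive rate the filter needs a little under 10 bits per item regardless of item length, which is where the saving over storing each string comes from. The cost is that membership answers can be falsely positive and items cannot be removed.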
7. Regular scan to release expired memory
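Redis removes an expired key either when the key is next accessed or when a background sampling cycle happens to pick it, so expired keys that are never read can hold memory for some time. A periodic SCAN pass that removes them releases this memory and improves stability.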
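The traversal itself can be sketched as a throttled cursor loop. The client is assumed to expose a redis-py style `scan(cursor=..., count=...)` that returns `(next_cursor, keys)`. The `on_batch` hook is a hypothetical extension point for whatever per-key action a real tool performs, since the article does not describe it.

```python
import time


def sweep_keyspace(client, count=1000, pause_seconds=0.0, on_batch=None):
    """Walk the whole keyspace once with SCAN and return the number of keys seen.

    `client` is assumed to expose redis-py style scan(cursor=..., count=...).
    `on_batch`, if given, is called with each list of keys.
    """
    cursor, seen = 0, 0
    while True:
        cursor, keys = client.scan(cursor=cursor, count=count)
        seen += len(keys)
        if on_batch is not None and keys:
            on_batch(keys)
        if cursor == 0:  # SCAN signals completion by returning cursor 0
            break
        if pause_seconds:
            time.sleep(pause_seconds)  # throttle to limit load on the instance
    return seen
```

Running such a sweep during low-traffic hours with a small pause keeps the extra load bounded.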
8. Reduce availability in dev/qa
Merge small instances, downgrade clusters to single‑node specs where high availability is not required.
9. Compress value
We tested gzip, zstd, and snappy on an 852 KB user_profile sample. Results:
snappy
Compression time: 122722 µs
Compression ratio: 22.60%
Decompression time: 49 µs
gzip
Compression time: 39821 µs
Compression ratio: 15.79%
Decompression time: 3521 µs
zstd
Compression time: 6953 µs
Compression ratio: 17.08%
Decompression time: 3153 µs
Snappy's decompression is the fastest, but its compression latency was high in this test, so we choose between gzip and zstd, giving priority to large instances (preferably over 100 GB).
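The article does not describe how compressed and uncompressed values coexist during an online rollout. One common approach, shown here purely as an assumption, is to prefix compressed values with a marker so readers can handle both forms. The sketch uses gzip because it ships with Python; zstd and snappy require third-party packages. The `MAGIC` marker and the 256-byte minimum are hypothetical choices.

```python
import gzip

# Hypothetical marker. It must never occur at the start of a real value.
MAGIC = b"\x00GZ1:"


def compress_value(raw: bytes, min_size: int = 256) -> bytes:
    """Return a marked, gzip-compressed value, or `raw` unchanged when it is
    small, already marked, or would not shrink."""
    if len(raw) < min_size or raw.startswith(MAGIC):
        return raw
    packed = MAGIC + gzip.compress(raw, compresslevel=6)
    return packed if len(packed) < len(raw) else raw


def decompress_value(stored: bytes) -> bytes:
    """Undo compress_value. Unmarked values pass through untouched."""
    if stored.startswith(MAGIC):
        return gzip.decompress(stored[len(MAGIC):])
    return stored
```

Because readers accept both forms, they can be deployed before any writer starts compressing, and an offline pass can later rewrite existing values without a cut-over.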
10. Migrate to Redis‑compatible disk storage
Memory-only Redis becomes costly at large scale. Disk-based, Redis-compatible solutions (e.g., Alibaba Cloud Tair ESSD, or the community projects pika and kvrocks) reduce cost to about 16% of the in-memory instance type while offering capacities of 64 GB and up. Migration requires careful traffic replay, capacity planning, and monitoring.
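A rough way to shortlist migration candidates is to compare costs under a simplified pricing model. Everything below is an assumption for illustration: pricing is treated as linear per GB, the 16% figure is applied per GB, and 64 GB is treated as the smallest disk spec, so smaller datasets are still billed for 64 GB.

```python
def monthly_cost_memory(used_gb, price_per_gb):
    """Cost of an in-memory instance under a linear per-GB price (assumed)."""
    return used_gb * price_per_gb


def monthly_cost_disk(used_gb, price_per_gb, cost_ratio=0.16, min_disk_gb=64):
    """Cost of a disk instance: billed for at least min_disk_gb (assumed),
    at cost_ratio times the in-memory per-GB price."""
    billed_gb = max(used_gb, min_disk_gb)
    return billed_gb * price_per_gb * cost_ratio


def worth_migrating(used_gb, price_per_gb=1.0, cost_ratio=0.16, min_disk_gb=64):
    """True when the disk instance is cheaper than the in-memory one."""
    disk = monthly_cost_disk(used_gb, price_per_gb, cost_ratio, min_disk_gb)
    return disk < monthly_cost_memory(used_gb, price_per_gb)
```

Under these assumptions the break-even point is about 10 GB of data (64 × 0.16). Latency tolerance still has to be validated through traffic replay before any instance is moved.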
Optimization Tools
We open‑sourced a Redis toolset (https://github.com/taptap/redis-tools) that includes:
Traffic replication & scaling.
Online compression & decompression.
Targeted key cleaning & TTL setting.
Key‑last‑access scanning.
Disk‑instance metric collection.
Summary
By preparing tooling, systematically applying the measures above, and executing without service interruption, we cut Redis operating costs dramatically while maintaining system stability.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"