
How We Cut Redis Costs by 95% with AWS ElastiCache Serverless for Argo CD

This article details Kaltura's migration of Argo CD's built‑in Redis to AWS ElastiCache Serverless for Valkey, explaining the cost, performance, and operational challenges of the default setup, the step‑by‑step migration process, and the substantial savings and reliability gains achieved.

DevOps Coach

Background and Motivation

At Kaltura, the platform team manages dozens of Argo CD instances across hundreds of Kubernetes clusters. Each Argo CD instance ran the default high‑availability (HA) architecture, which includes three Redis pods and three HAProxy pods. This setup significantly increased CPU, memory, and network costs and turned Redis into an expensive operational burden.

Problems with the Default Redis Deployment

Each Argo CD instance adds six extra pods, inflating resource consumption.

Redis traffic is routed through HAProxy and often crosses availability zones, incurring additional inter‑zone data transfer fees (approximately $0.04 per GB).

Monthly Redis‑related expenses approached $2,000, not counting engineering time for monitoring, scaling, patching, and security updates (e.g., CVE‑2024‑31989).

Decision to Adopt a Managed Serverless Solution

Realizing that managing Redis conflicted with the team's GitOps focus, Kaltura evaluated cloud‑native managed services and selected AWS ElastiCache Serverless for Valkey, a Redis‑compatible engine offering automatic scaling, high availability, and zero‑maintenance operation.

Migration Strategy

The migration involved three key steps:

Create a Serverless Valkey instance with secure credentials and logical databases to support multiple Argo CD environments.

Update Argo CD configuration to point to the external Redis endpoint and enable TLS using the --redis-use-tls flag for all components (application controller, repo server, and server).

Disable the built‑in Redis and HAProxy pods in the Helm chart, removing the associated resources.

The following values.yaml snippet illustrates the required configuration changes:

# Enable TLS on every Argo CD component that talks to Redis
controller:
  extraArgs:
    - '--redis-use-tls'
repoServer:
  extraArgs:
    - '--redis-use-tls'
server:
  extraArgs:
    - '--redis-use-tls'
# Disable the bundled single-instance Redis
redis:
  enabled: false
# Disable the bundled HA Redis (the Redis and HAProxy pods)
redis-ha:
  enabled: false
externalRedis:
  # -- ElastiCache host address, change this to your own ElastiCache endpoint
  host: example.serverless.use1.cache.amazonaws.com
  port: 6379
  # -- The name of an existing secret with Redis (must contain key `redis-password`)
  existingSecret: 'argocd-redis'
# Do not generate a password secret for the bundled Redis
redisSecretInit:
  enabled: false
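
Because `redisSecretInit` is disabled, the secret named in `externalRedis.existingSecret` has to be created by other means. A minimal sketch of such a secret is shown here; the `argocd` namespace and the placeholder password are assumptions for illustration, and in practice the value would more likely come from a secrets manager than from a plain manifest.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: argocd-redis            # must match externalRedis.existingSecret
  namespace: argocd             # assumption: namespace where Argo CD is installed
type: Opaque
stringData:
  # The chart expects this exact key name
  redis-password: "<your-elasticache-password>"   # placeholder, not a real value
```

The secret should exist before the Helm upgrade is applied so that the Argo CD components can authenticate as soon as they restart.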

Results and Benefits

Cost Savings: Monthly Redis‑related spend dropped from nearly $2,000 to about $100, a reduction of roughly 95%. The Serverless service itself cost roughly $30 per month, while per‑GB network transfer charges fell from $0.04/GB to $0.01/GB.

Operational Efficiency: The team no longer needs to patch, scale, back up, or manage failover for Redis; AWS handles all maintenance, effectively achieving “zero‑ops” for the cache layer.

Performance and Availability: Automatic scaling matches workload demand, and the service provides a 99.99% SLA across multiple AZs, improving reliability and eliminating disaster‑recovery delays.

Lessons Learned

Question the necessity of self‑managed components; if a managed service can meet requirements, it often yields lower cost and complexity.

Logical databases enable multi‑tenant use of a single managed Redis instance, simplifying environment isolation.

Adopting cloud‑native services aligns with GitOps principles by reducing manual operational overhead.
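
The logical‑database lesson above can be sketched in Helm values. The article does not show how each environment was assigned its database, so this is an assumption: it uses Argo CD's `--redisdb` flag, with index `1` chosen arbitrarily for a second environment, and it presumes the managed endpoint accepts the selected index.

```yaml
# Hypothetical values for a second Argo CD environment sharing the same endpoint.
# Every component must use the same database index.
controller:
  extraArgs:
    - '--redis-use-tls'
    - '--redisdb=1'   # assumption: index 1 reserved for this environment
repoServer:
  extraArgs:
    - '--redis-use-tls'
    - '--redisdb=1'
server:
  extraArgs:
    - '--redis-use-tls'
    - '--redisdb=1'
```

Keeping a short record of which index belongs to which environment helps avoid two instances sharing a database by accident.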

Guidance for Others

Teams still using self‑hosted Redis with Argo CD should evaluate total cost of ownership, including compute, network, and engineering effort. Start with a non‑critical Argo CD instance, migrate to a managed Redis service (AWS ElastiCache, Azure Cache for Redis, or Google Cloud Memorystore), and then roll out to production clusters.

The migration can be performed without downtime; Argo CD automatically rebuilds its cache after restart.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
