How Grab Scaled Its Token Service: Migrating Redis to AWS ElastiCache

This article details Grab's migration from a single-node Redis to AWS ElastiCache, outlining the evaluated solutions, the chosen architecture for horizontal read scalability, and a six-step migration process that ensured a safe, zero-downtime transition.

Java High-Performance Architecture

Background

Grab is a ride‑hailing giant in Southeast Asia with 55 million app downloads and 1.2 million drivers.

The app authenticates requests with a token stored in Redis and persisted to MySQL. The original single‑node Redis could not keep up with rapid user growth.

Solution Options

Alternatives

(1) Multi‑node replication – improves fault tolerance but master‑failover causes write pauses.

(2) Build a Redis Cluster – solves availability, but has two drawbacks. Sharding relies on the client, which increases client complexity. Adding new shards is also cumbersome: you must design migration logic and move a set of users to the new shard.

(3) Use AWS ElastiCache – provides on-demand replica nodes and handles data partitioning, but does not support adding new shards.

Choice

Grab needed horizontal scalability for read-heavy traffic. AWS ElastiCache best met this need: each shard can dynamically add replica nodes, which supports read scaling, though write scaling is limited by the fixed number of shards.

Read load dominates; write load is modest. After capacity planning, Grab selected three shards, each with one primary and two replicas (nine nodes total).
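The node count and the way keys spread across shards can be sketched as follows. This is an illustration only: it assumes the deployment uses the standard Redis Cluster key-to-slot mapping (CRC16 of the key modulo 16384 slots), and the even, contiguous split of slots across shards in `shard_for` is a hypothetical layout, not something the source describes. The key name `token:12345` is also made up.

```python
import binascii

TOTAL_SLOTS = 16384      # fixed by the Redis Cluster specification
SHARDS = 3               # from the capacity plan in the text
REPLICAS_PER_SHARD = 2   # from the capacity plan in the text


def key_slot(key: bytes) -> int:
    """Redis Cluster slot: CRC16 (XMODEM variant) of the key, modulo 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]   # hash only a non-empty {tag}
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def shard_for(slot: int) -> int:
    """Assumption: slots are split into contiguous, near-equal ranges."""
    return slot * SHARDS // TOTAL_SLOTS


total_nodes = SHARDS * (1 + REPLICAS_PER_SHARD)  # one primary plus its replicas
slot = key_slot(b"token:12345")                  # hypothetical key name
print(total_nodes, slot, shard_for(slot))
```

Because the number of shards is fixed, write capacity is bounded by the three primaries, while read capacity grows with each replica added to a shard.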

Migration Process

After choosing AWS ElastiCache, Grab split the migration into six steps to ensure safety.

Step 1

Move data from the old Redis nodes to the new cluster using the SCAN, DUMP, and RESTORE commands. SCAN iterates over the keyspace incrementally, which minimizes the impact on the legacy nodes.
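A minimal sketch of such a copy loop is shown below. It is not Grab's actual tool. It assumes client objects that expose redis-py-style `scan`, `dump`, `pttl`, and `restore` methods; `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server, and the key names are invented.

```python
def copy_keys(src, dst, batch=100):
    """Copy every key from src to dst using SCAN + DUMP + PTTL + RESTORE."""
    copied, cursor = 0, 0
    while True:
        cursor, keys = src.scan(cursor=cursor, count=batch)
        for key in keys:
            payload = src.dump(key)
            ttl_ms = src.pttl(key)
            if payload is None or ttl_ms == -2:
                continue                      # key expired or deleted mid-scan
            # RESTORE treats a TTL of 0 as "no expiry" (PTTL reports that as -1).
            dst.restore(key, max(ttl_ms, 0), payload, replace=True)
            copied += 1
        if cursor == 0:                       # SCAN signals completion with 0
            return copied


class FakeRedis:
    """Hypothetical in-memory stand-in; stores key -> (payload, ttl_ms)."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def scan(self, cursor=0, count=10):
        keys = sorted(self.data)
        nxt = cursor + count
        return (nxt if nxt < len(keys) else 0), keys[cursor:nxt]

    def dump(self, key):
        entry = self.data.get(key)
        return entry[0] if entry else None

    def pttl(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else -2

    def restore(self, key, ttl, value, replace=False):
        self.data[key] = (value, ttl if ttl > 0 else -1)


old = FakeRedis({"token:1": (b"a", -1), "token:2": (b"b", 60000), "token:3": (b"c", -1)})
new = FakeRedis()
copied = copy_keys(old, new, batch=2)
```

Carrying the remaining TTL across with PTTL keeps expiring tokens from becoming permanent in the new cluster.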

Step 2

Applications keep writing to the old nodes and additionally write to the new cluster asynchronously. Errors on the cluster can then be observed and verified without affecting production requests.
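One way to picture this step is a wrapper that writes to the old store on the request path and pushes the cluster write to a background thread. The sketch below is an assumption about how such shadow writes could be structured, not Grab's code; `DictStore` and `ShadowWriter` are hypothetical names, and the cluster is simulated as failing to show that its errors stay off the request path.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DictStore:
    """Hypothetical in-memory stand-in for a Redis client."""

    def __init__(self, fail=False):
        self.data, self.fail = {}, fail

    def set(self, key, value):
        if self.fail:
            raise ConnectionError("store unavailable")
        self.data[key] = value


class ShadowWriter:
    def __init__(self, old, new, workers=4):
        self.old, self.new = old, new
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.lock = threading.Lock()
        self.shadow_errors = 0

    def _shadow_set(self, key, value):
        try:
            self.new.set(key, value)
        except Exception:
            with self.lock:
                self.shadow_errors += 1   # record the error, never raise it

    def set(self, key, value):
        self.old.set(key, value)          # source of truth: errors propagate
        return self.pool.submit(self._shadow_set, key, value)


old, broken_cluster = DictStore(), DictStore(fail=True)
writer = ShadowWriter(old, broken_cluster)
writer.set("token:42", "abc").result()    # waiting only keeps the demo deterministic
writer.pool.shutdown()
```

The error counter is what gets watched in this phase: it should settle near zero before the cluster write is moved onto the synchronous path.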

Step 3

Switch to synchronous writes to the cluster, fully involving it in the business flow; any errors now affect real API calls, providing a live test.

Step 4

Read operations are performed against the cluster asynchronously and compared with the old nodes to validate data consistency.
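The comparison can be sketched as a shadow read: the caller is served from the old store, and a background task reads the same key from the cluster and records any difference. This is a simplified illustration under assumed names (`ShadowReader`, the `token:*` keys), with plain dictionaries standing in for the two stores.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ShadowReader:
    def __init__(self, old, new, workers=4):
        self.old, self.new = old, new
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.lock = threading.Lock()
        self.checked = 0
        self.mismatches = []

    def _compare(self, key, expected):
        try:
            actual = self.new.get(key)
        except Exception as exc:          # a cluster error counts as a mismatch
            actual = exc
        with self.lock:
            self.checked += 1
            if actual != expected:
                self.mismatches.append(key)

    def get(self, key):
        value = self.old.get(key)         # the caller is served from the old store
        future = self.pool.submit(self._compare, key, value)
        return value, future              # the future is returned only for the demo


old = {"token:1": "a", "token:2": "b"}
new = {"token:1": "a", "token:2": "stale"}   # simulated inconsistency
reader = ShadowReader(old, new)
for key in ("token:1", "token:2"):
    value, pending = reader.get(key)
    pending.result()                      # wait so the demo is deterministic
reader.pool.shutdown()
```

A mismatch list that stays empty under production traffic is the evidence needed before reads are moved to the cluster.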

Step 5

All reads are migrated to the cluster and the old Redis is retired from read traffic.

Step 6

Stop all writes to the old Redis, cut any remaining interaction, and complete the migration. Each step is configuration‑driven, allowing rapid rollback if needed.
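The configuration-driven switching could look roughly like the sketch below, where a single `phase` setting decides which store is written and which is read. The phase names, the `TokenStore` wrapper, and the dictionary-based config are all hypothetical, and the asynchronous and comparison behaviour of steps 2 and 4 is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    write_old: bool
    write_new: bool
    read_new: bool


# Hypothetical phase names, loosely following the steps in the text.
PHASES = {
    "old_only":   Phase(write_old=True,  write_new=False, read_new=False),
    "dual_write": Phase(write_old=True,  write_new=True,  read_new=False),
    "read_new":   Phase(write_old=True,  write_new=True,  read_new=True),
    "new_only":   Phase(write_old=False, write_new=True,  read_new=True),
}


class TokenStore:
    def __init__(self, old, new, config):
        self.old, self.new, self.config = old, new, config

    def _phase(self):
        return PHASES[self.config["phase"]]   # looked up on every call

    def set(self, key, value):
        phase = self._phase()
        if phase.write_old:
            self.old[key] = value
        if phase.write_new:
            self.new[key] = value

    def get(self, key):
        store = self.new if self._phase().read_new else self.old
        return store.get(key)


config = {"phase": "old_only"}
old, new = {}, {}
store = TokenStore(old, new, config)
store.set("token:1", "a")                    # written to the old store only
config["phase"] = "dual_write"
store.set("token:2", "b")                    # written to both stores
config["phase"] = "read_new"
read_from_cluster = store.get("token:1")     # missing: never copied in this demo
config["phase"] = "dual_write"               # rollback is a one-line config change
read_after_rollback = store.get("token:1")
```

The missing `token:1` in the demo also shows why the bulk copy in Step 1 has to finish before reads are moved: keys written before dual writes began exist only in the old store.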

Conclusion

The migration was straightforward, and Grab’s analytical approach and rigorous process are worth emulating.

This article is a translation of Grab’s engineering post.

http://engineering.grab.com/migrating-existing-datastores