
Building a High‑Performance OAuth Token Service with Tarantool, Raft, and Sharding

This article explains how we designed and implemented a scalable OAuth token storage and refresh system using Tarantool’s in‑memory database, Raft leader election, sharding across multiple data centers, and a custom lightweight queue to handle high‑throughput token updates while maintaining consistency and fault tolerance.

ITPUB

Background

Tarantool is known for its high performance, its on‑disk Vinyl engine, and its JSON support. Many articles overlook another strength: it can run Lua code inside the database itself, next to the data, which makes data processing efficient.

Problem Statement

We needed a service to store and refresh OAuth tokens for Mail.Ru‑related projects. Tokens consist of an access_token (short‑lived, allows actions) and a refresh_token (long‑lived, can obtain new access tokens). The one‑hour lifespan of access tokens creates a high‑load scenario when millions of tokens must be refreshed.

Initial Architecture

The initial design consisted of front‑ends that read and write tokens, an updater that obtains new access tokens from the OAuth provider, and two Tarantool nodes (master and replica) placed in separate data centers.

Challenges

Tokens are valid for only one hour, so keeping 10 million of them fresh means roughly 3,000 refresh requests per second.

Database or server outages can invalidate large fractions of tokens.

The CPU became saturated after two years of adding logic to the service.
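
The load figure in the first challenge follows from simple arithmetic. The sketch below assumes every token is refreshed exactly once per lifetime and that refreshes are spread evenly over the hour; the function name is illustrative only.

```python
def refresh_rate(token_count: int, ttl_seconds: int) -> float:
    """Average refreshes per second if each token is renewed once per TTL."""
    return token_count / ttl_seconds


rate = refresh_rate(10_000_000, 3600)
print(f"{rate:.0f} refreshes per second")  # prints "2778 refreshes per second"
```

Real traffic is rarely this even, so peak load can exceed the average.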

We upgraded the CPU temporarily, then moved to Tarantool 1.6, which supports master‑master replication.

Leader Election with Raft

To avoid a three‑fold increase in OAuth provider requests, we needed a single leader per data‑center. We replaced a complex Paxos‑like algorithm with Raft, in which a node becomes leader only after winning votes from a majority of the cluster, so the leader is always a node that can communicate with most of the others.

Using Tarantool’s net.box, we connect every node to every other node in a mesh and run Raft on top of those connections. Each node ends up in one of three states: leader, follower, or neither.
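
To make the three states concrete, here is a minimal single‑round sketch of a Raft‑style vote in Python. It is not the authors' implementation and leaves out most of Raft (log comparison, election timeouts, heartbeats). `Node`, `run_election`, and the `links` set are hypothetical names, and connectivity is modeled as a set of node pairs rather than real net.box connections.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass
class Node:
    name: str
    term: int = 0
    voted_for: Optional[str] = None
    state: str = "neither"  # "leader", "follower", or "neither"


def run_election(nodes: Dict[str, Node],
                 links: Set[FrozenSet[str]],
                 candidate_name: str) -> Optional[str]:
    """Run one simplified election round; return the leader's name or None."""
    candidate = nodes[candidate_name]
    candidate.term += 1
    candidate.voted_for = candidate.name
    votes = 1  # the candidate votes for itself

    for peer in nodes.values():
        if peer is candidate:
            continue
        if frozenset((candidate.name, peer.name)) not in links:
            continue  # unreachable peers cannot vote
        if peer.term < candidate.term:  # at most one vote per term
            peer.term = candidate.term
            peer.voted_for = candidate.name
            votes += 1

    if votes * 2 <= len(nodes):  # no strict majority, no leader
        candidate.state = "neither"
        return None

    candidate.state = "leader"
    for peer in nodes.values():
        if peer is not candidate:
            voted = (peer.term == candidate.term
                     and peer.voted_for == candidate.name)
            peer.state = "follower" if voted else "neither"
    return candidate.name


# Three nodes; "c" has lost its links to the other two.
nodes = {name: Node(name) for name in ("a", "b", "c")}
links = {frozenset(("a", "b"))}
leader = run_election(nodes, links, "a")
print(leader, {n.name: n.state for n in nodes.values()})
```

In this example "a" collects two of three votes and becomes leader, "b" follows it, and the isolated "c" stays in the "neither" state.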

Handling “Abandoned” Nodes

If a data‑center loses connectivity, its nodes become “abandoned”. Raft still elects a new leader among the remaining nodes, preserving cluster operation. Abandoned nodes can optionally update tokens via a dedicated updater, but this may cause redundant work.

Sharding Strategy

To overcome CPU limits we introduced sharding: two shards, each with its own replica. A deterministic function (e.g., CRC32) maps a key to a shard. We considered two approaches. Client‑side sharding is simple but requires every client to know the function. Database‑side sharding needs more complex database code but hides sharding from clients.
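
A deterministic key‑to‑shard mapping can be sketched in a few lines of Python. This assumes CRC32 over the UTF‑8 bytes of the key, reduced modulo the shard count; the key format and the function name `shard_for` are made up for illustration.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using CRC32."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


# The same key always lands on the same shard.
print(shard_for("user:42", 2) == shard_for("user:42", 2))  # prints True
```

A plain modulo scheme like this remaps most keys when the shard count changes, which is worth planning for before adding shards.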

Proxy Layer

When every node connects to every other node, the number of connections grows quadratically. To avoid this, we introduced a lightweight proxy that knows the shard map (stored in a Lua config). Clients talk to the proxy, and the proxy forwards each request to the leader of the appropriate shard.
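
The routing step and the connection‑count argument can be sketched as follows. The original keeps the shard map in a Lua config; this sketch models it as a Python list instead. The host names, the `route` function, and the single‑proxy counting model are all assumptions made for illustration.

```python
import zlib

# Hypothetical shard map: one entry per shard, addresses are invented.
SHARD_MAP = [
    {"leader": "shard0-a:3301", "replica": "shard0-b:3301"},
    {"leader": "shard1-a:3301", "replica": "shard1-b:3301"},
]


def route(key: str, shard_map=SHARD_MAP) -> str:
    """Return the leader address of the shard that owns the key."""
    index = zlib.crc32(key.encode("utf-8")) % len(shard_map)
    return shard_map[index]["leader"]


def mesh_links(node_count: int) -> int:
    """Connections when every node links to every other node."""
    return node_count * (node_count - 1) // 2


def proxy_links(node_count: int) -> int:
    """Connections when every node links only to a single proxy."""
    return node_count


print(route("user:42"))
print(mesh_links(12), proxy_links(12))  # prints "66 12"
```

With 12 nodes the full mesh needs 66 connections, while the single‑proxy model needs 12.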

Token Update Queue

Standard queues did not meet our timing requirements, because each token must be refreshed before its one‑hour expiry. We built a custom queue directly on Tarantool tuples by adding two fields, status and time. The queue supports two operations:

put: insert a new task with its status and expiry time.

take: create an index‑based iterator that selects ready tasks, waits if none are ready, and uses Tarantool’s fiber channels to notify waiting workers.

When a task is taken, its tuple is tracked against the client that took it. If that client disconnects, its tasks are released automatically and become available to other workers.
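
The queue described above can be approximated outside Tarantool to show how it works. This Python sketch is an analogy, not the authors' Lua code: a heap ordered by time stands in for the index, and `threading.Condition` stands in for fiber channels. The `TokenQueue` class, the `owner` field, and the `release_owner` method are assumptions, and the time field is treated here as the moment a task becomes due.

```python
import heapq
import threading
import time


class TokenQueue:
    def __init__(self):
        self._heap = []   # (due_time, task_id), ordered like an index on time
        self._tasks = {}  # task_id -> {"status", "time", "owner"}
        self._cond = threading.Condition()

    def put(self, task_id, due_time):
        """Insert a task that becomes ready at due_time (epoch seconds)."""
        with self._cond:
            self._tasks[task_id] = {"status": "ready", "time": due_time,
                                    "owner": None}
            heapq.heappush(self._heap, (due_time, task_id))
            self._cond.notify()  # wake one waiting worker

    def take(self, owner, timeout=None):
        """Return a due task id, waiting up to timeout seconds; else None."""
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while True:
                now = time.time()
                if self._heap and self._heap[0][0] <= now:
                    due_time, task_id = heapq.heappop(self._heap)
                    task = self._tasks.get(task_id)
                    if (task is None or task["status"] != "ready"
                            or task["time"] != due_time):
                        continue  # stale heap entry, skip it
                    task["status"], task["owner"] = "taken", owner
                    return task_id
                # Sleep until the next task is due or the timeout expires.
                wait = self._heap[0][0] - now if self._heap else None
                if deadline is not None:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return None
                    wait = remaining if wait is None else min(wait, remaining)
                self._cond.wait(wait)

    def release_owner(self, owner):
        """Return all tasks held by a disconnected client to the queue."""
        with self._cond:
            for task_id, task in self._tasks.items():
                if task["status"] == "taken" and task["owner"] == owner:
                    task["status"], task["owner"] = "ready", None
                    heapq.heappush(self._heap, (task["time"], task_id))
            self._cond.notify_all()


queue = TokenQueue()
queue.put("token-1", time.time() - 1)         # already due
print(queue.take("worker-1", timeout=0.1))    # prints "token-1"
queue.release_owner("worker-1")               # worker-1 disconnected
print(queue.take("worker-2", timeout=0.1))    # prints "token-1"
```

Keeping the status and owner next to the task is what allows a disconnect to release work without a separate bookkeeping store.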

Conclusion

By combining master‑master replication, Raft leader election, sharding with a proxy layer, and a purpose‑built token queue, we achieved a fault‑tolerant, scalable OAuth token service that handles millions of refreshes per hour while keeping connection overhead linear and CPU usage within limits.
