
How Tair Powers Sub‑Second AI Agent Memory for Real‑Time Ordering

This article examines how Taobao Flash Sale’s AI Agent uses Alibaba Cloud’s Tair as a high‑performance short‑term memory layer, detailing data model design, latency impact, concurrency control, elastic scaling, bandwidth handling, and TTL‑based cleanup to achieve sub‑second response times during massive traffic spikes.

Alibaba Cloud Developer

Mar 27, 2026


Background and Problem

When a user asks the AI Agent to "order a cup of tea with less sugar and no ice," the system must quickly perform intent recognition, address parsing, product search, specification matching, and cart addition. Each step relies on accurate recall of the entire conversation history, making short‑term memory latency critical for the AI Agent’s overall response time.

Why a High‑Performance Memory Layer Is Essential

In the "one‑sentence ordering" project, the AI Agent aims to compress the traditional 3‑5 minute ordering process to under 30 seconds. According to Little’s Law (Concurrency ≈ QPS × Latency), at a fixed QPS, raising memory access latency from 5 ms to 50 ms multiplies the number of in‑flight requests tenfold, which can exhaust connections, threads, and queues. Because each dialogue round performs several memory reads and writes, the extra latency accumulates across them and can lead to queuing, timeouts, or even system collapse.
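To make the Little’s Law argument concrete, here is a small Python calculation. The request rate of 10,000 memory requests per second and the six sequential memory calls per round are assumed figures for illustration, not numbers from the project.

```python
def in_flight(qps: float, latency_ms: float) -> float:
    """Little's Law: average in-flight requests = arrival rate x time in system."""
    return qps * latency_ms / 1000.0


def round_memory_time_ms(sequential_ops: int, per_op_ms: float) -> float:
    """Memory calls made one after another inside a dialogue round add up."""
    return sequential_ops * per_op_ms


QPS = 10_000  # hypothetical memory-layer request rate
for latency_ms in (5, 50):
    print(f"{latency_ms} ms -> {in_flight(QPS, latency_ms):.0f} requests in flight")
# 5 ms -> 50 requests in flight
# 50 ms -> 500 requests in flight

# Hypothetical round with 6 sequential memory reads/writes
print(round_memory_time_ms(6, 5), round_memory_time_ms(6, 50))
# 30 300
```

The same 45 ms increase per call shows up twice: once as ten times more requests held open at any moment, and again as 270 ms of extra time per dialogue round.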

Choosing Tair as the Short‑Term Memory Store

Tair (compatible with Redis) was selected for its:

Low latency: a custom multithreaded kernel provides microsecond‑level read/write performance.

Rich data structures: List for ordered conversation history and Hash for structured business context.

Elastic scalability: seamless cluster expansion and bandwidth burst handling.

TTL lifecycle management: automatic expiration of session data.

Memory Model Design

Model Memory (List)

Conversation history is stored in a Tair List, one key per session:

Key: memory:model:{sessionId}
Type: List
Example data:
[
  {"role":"user","content":"Help me order a tea"},
  {"role":"assistant","content":"Found 3 nearby shops...","cards":[...]},
  {"role":"user","content":"Less sugar, no ice"},
  {"role":"assistant","content":"Selected: ..."}
]

Core operations:

# Append each new message (user or assistant) as one record
RPUSH memory:model:{sessionId} "{dialogue JSON}"
# Retrieve the latest N records as context for inference (2N records cover N user/assistant rounds)
LRANGE memory:model:{sessionId} -{N} -1
# Set session expiration (e.g., 30 minutes)
EXPIRE memory:model:{sessionId} 1800
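The same three commands can be issued from application code. The following is a minimal sketch assuming the redis-py client (Tair is Redis‑compatible); the function names are hypothetical. It wraps RPUSH and EXPIRE in one MULTI/EXEC so that a record is never appended without its TTL being refreshed. Both commands touch the same key, so they map to the same slot on a cluster endpoint, though it is worth confirming that your endpoint accepts MULTI/EXEC.

```python
import json

SESSION_TTL_SECONDS = 1800  # e.g., 30 minutes, as in the EXPIRE example


def model_key(session_id: str) -> str:
    return f"memory:model:{session_id}"


def encode_record(role: str, content: str) -> str:
    """One List element = one message, serialized as JSON."""
    return json.dumps({"role": role, "content": content}, ensure_ascii=False)


def last_n_bounds(n: int) -> tuple:
    """LRANGE start/stop for the newest n elements. n must be positive:
    LRANGE key -0 -1 is the same as LRANGE key 0 -1 and returns the whole list."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return -n, -1


def append_record(r, session_id: str, role: str, content: str,
                  ttl: int = SESSION_TTL_SECONDS) -> None:
    """RPUSH the record and refresh the TTL in one MULTI/EXEC round trip."""
    key = model_key(session_id)
    pipe = r.pipeline(transaction=True)
    pipe.rpush(key, encode_record(role, content))
    pipe.expire(key, ttl)
    pipe.execute()


def recent_records(r, session_id: str, n: int) -> list:
    """LRANGE the newest n records (oldest first) and decode them."""
    start, stop = last_n_bounds(n)
    return [json.loads(x) for x in r.lrange(model_key(session_id), start, stop)]


# Usage (assumes redis-py and a reachable Tair endpoint):
# import redis
# r = redis.Redis(host="localhost", port=6379, decode_responses=True)
# append_record(r, "s-123", "user", "Less sugar, no ice")
# context = recent_records(r, "s-123", 10)
```

The guard in `last_n_bounds` matters because `-0` is simply `0` to LRANGE, so a caller asking for zero records would otherwise receive the entire history.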

Note: Before writing, raw data (including rich media) is transformed into a concise natural‑language format to reduce token consumption.
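The article does not specify the compaction format. As a purely hypothetical illustration, a compaction step might fold an assistant message’s product cards into a single line of text before the RPUSH. The `cards` list follows the example data above, but the `title` field inside each card is an assumption.

```python
def compact_record(message: dict, max_cards: int = 3) -> dict:
    """Hypothetical compaction: fold rich-media cards into one line of text
    so the stored record costs fewer tokens when replayed to the model."""
    text = message.get("content", "")
    cards = message.get("cards") or []
    if cards:
        # "title" is an assumed card field, used here for illustration only
        titles = [card.get("title", "untitled") for card in cards[:max_cards]]
        summary = "Shown: " + ", ".join(titles)
        hidden = len(cards) - len(titles)
        if hidden > 0:
            summary += f" (+{hidden} more)"
        text = f"{text} {summary}".strip()
    # Only role and text survive; images, prices, and other card fields are dropped.
    return {"role": message["role"], "content": text}


raw = {
    "role": "assistant",
    "content": "Found 4 nearby shops.",
    "cards": [{"title": "Shop A"}, {"title": "Shop B"}, {"title": "Shop C"}, {"title": "Shop D"}],
}
print(compact_record(raw))
# {'role': 'assistant', 'content': 'Found 4 nearby shops. Shown: Shop A, Shop B, Shop C (+1 more)'}
```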

Business Context Memory (Hash)

Structured state needed by the Agent’s tool layer is stored in a Hash, also keyed by session:

Key: memory:context:{sessionId}
Type: Hash
Fields:
{
  "session":"{userID, channel, stage, ...}",
  "search":"{current query, results, recommended items}",
  "order":"{cart items, selected SKUs, quantities}",
  "conversation":"{current intent, previous intent, intent switch flag}",
  "coupon":"{available coupons, selected coupon}",
  "bizState":"{delivery address, shipping method, payment status}"
}

Core operations:

# Update a single sub‑module (e.g., user confirms address)
HSET memory:context:{sessionId} bizState "{updated JSON}"
# Read a specific sub‑module
HGET memory:context:{sessionId} order
# Retrieve the whole context for intent recognition
HGETALL memory:context:{sessionId}
# Set expiration (e.g., 30 minutes)
EXPIRE memory:context:{sessionId} 1800

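Note: Each module is read or written as its own field. This removes the read‑modify‑write cycle over one large JSON document, so modules that update different fields (for example, search and order) cannot overwrite each other’s changes.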
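A minimal sketch of the Hash operations, again assuming a redis-py client. The helper names are hypothetical. Each sub‑module is stored as a JSON string in its own field, so `update_module` touches exactly one field. `decode_context` is kept separate from the network call so the decoding logic can be checked on its own.

```python
import json

SESSION_TTL_SECONDS = 1800  # e.g., 30 minutes


def context_key(session_id: str) -> str:
    return f"memory:context:{session_id}"


def update_module(r, session_id: str, module: str, value: dict,
                  ttl: int = SESSION_TTL_SECONDS) -> None:
    """HSET a single sub-module (e.g. "bizState") and refresh the TTL.
    Other fields such as "search" or "coupon" are never read or rewritten."""
    key = context_key(session_id)
    pipe = r.pipeline(transaction=True)
    pipe.hset(key, module, json.dumps(value, ensure_ascii=False))
    pipe.expire(key, ttl)
    pipe.execute()


def read_module(r, session_id: str, module: str):
    """HGET one sub-module; None if the field or the session does not exist."""
    raw = r.hget(context_key(session_id), module)
    return None if raw is None else json.loads(raw)


def decode_context(raw: dict) -> dict:
    """Decode an HGETALL reply (field -> JSON string) into a nested dict.
    Handles both str and bytes replies (decode_responses on or off)."""
    return {
        (k.decode() if isinstance(k, bytes) else k): json.loads(v)
        for k, v in raw.items()
    }


def read_context(r, session_id: str) -> dict:
    """HGETALL the whole context, e.g. as input to intent recognition."""
    return decode_context(r.hgetall(context_key(session_id)))
```

Two requests that update the same field can still race. Field‑level writes only separate different modules, which is why the session lock in the next section is still needed.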

Data Structure Comparison

List is ideal for ordered conversation history (supports RPUSH and LRANGE). Hash enables independent field updates, perfect for business context where each domain (search, order, etc.) can be modified without affecting others. Both structures are natively supported by Tair’s Redis‑compatible engine.

Concurrency Isolation with Distributed Locks

Concurrent writes to the same session can occur when a user sends messages in rapid succession or streams input. To keep the memory consistent, the service serializes these writes with a session‑level distributed lock held in Tair:

# Acquire the lock with a 3 s expiry (returns nil if another request already holds it)
SET lock:memory:{sessionId} {requestId} NX EX 3
# Perform read/write operations
HSET memory:context:{sessionId} order "{updated order JSON}"
RPUSH memory:model:{sessionId} "{new dialogue record}"
# Release lock via Lua script (ensures only the owner releases)
EVAL "if redis.call('GET', KEYS[1]) == ARGV[1] then return redis.call('DEL', KEYS[1]) else return 0 end" 1 lock:memory:{sessionId} {requestId}

Note: Locks are scoped to a single session, so different users never contend for the same lock.
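The lock protocol can be wrapped in application code as shown in this sketch, which assumes redis-py. A random token plays the role of `{requestId}`, and release uses the same compare‑and‑delete Lua script as the EVAL above. The retry count and backoff in `run_locked` are hypothetical choices that the article does not specify.

```python
import time
import uuid

LOCK_TTL_SECONDS = 3  # same 3 s expiry as "EX 3" above

# Compare-and-delete: DEL only if the stored token is ours (same logic as the EVAL above).
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""


def lock_key(session_id: str) -> str:
    return f"lock:memory:{session_id}"


def try_acquire(r, session_id: str, ttl: int = LOCK_TTL_SECONDS):
    """SET key token NX EX ttl. Returns the token on success, None if already held."""
    token = uuid.uuid4().hex  # plays the role of {requestId}
    acquired = r.set(lock_key(session_id), token, nx=True, ex=ttl)
    return token if acquired else None


def release(r, session_id: str, token: str) -> bool:
    """True if this call deleted our lock; False if it expired or another holder owns it."""
    return r.eval(RELEASE_SCRIPT, 1, lock_key(session_id), token) == 1


def run_locked(r, session_id: str, fn, attempts: int = 20, backoff_s: float = 0.05):
    """Hypothetical retry policy: poll for the lock, run fn(), always release.
    fn() must finish well within the lock TTL, or the lock can expire mid-write."""
    for _ in range(attempts):
        token = try_acquire(r, session_id)
        if token is not None:
            try:
                return fn()
            finally:
                release(r, session_id, token)
        time.sleep(backoff_s)
    raise TimeoutError(f"session {session_id} is busy")


# Usage (assumes redis-py; order_json is a placeholder for the updated order):
# r = redis.Redis(host="localhost", port=6379, decode_responses=True)
# run_locked(r, "s-123", lambda: r.hset("memory:context:s-123", "order", order_json))
```

The 3 s expiry protects against a crashed holder, but it also bounds the critical section. If the work inside `fn()` can take longer, the lock may lapse and a second writer may enter, so the locked region should stay limited to the memory reads and writes.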

Elastic Scaling and Bandwidth Management

During the Spring Festival “Red Packet” event, traffic peaked at 10‑12× the forecast. Tair’s cloud‑native architecture handled this by:

Adding read replicas (1‑9 per shard) to scale reads, and increasing the shard count (2‑256) to scale throughput roughly linearly.

Using cluster‑level bandwidth expansion: each load balancer provides up to 20 Gbps; additional LB instances can be added without breaking existing connections.

Enabling per‑shard burst bandwidth (up to 288 MB/s), which scales up automatically for hot keys and scales back down when the load subsides.

These mechanisms keep latency stable (P99 latency stays at the millisecond level) even under massive load.
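The two bandwidth figures above can be turned into a rough capacity check. This sketch assumes decimal units (1 Gbps = 10⁹ bit/s, 1 MB = 10⁶ bytes) and a hypothetical peak of 6,000 MB/s. It estimates how many load balancers and shards would be needed if traffic were spread evenly.

```python
import math

LB_GBPS = 20            # per load balancer (from the article)
SHARD_BURST_MB_S = 288  # per-shard burst ceiling (from the article)


def gbps_to_mb_s(gbps: float) -> float:
    """Assumes decimal units: 1 Gbps = 1e9 bit/s and 1 MB = 1e6 bytes."""
    return gbps * 1e9 / 8 / 1e6


def lbs_needed(peak_mb_s: float) -> int:
    """Load balancers required so aggregate LB bandwidth covers the peak."""
    return math.ceil(peak_mb_s / gbps_to_mb_s(LB_GBPS))


def shards_needed(peak_mb_s: float) -> int:
    """Lower bound on shards, assuming traffic spreads evenly and every shard
    runs at its burst ceiling; uneven (hot-key) traffic needs more."""
    return math.ceil(peak_mb_s / SHARD_BURST_MB_S)


peak = 6_000  # hypothetical peak in MB/s
print(gbps_to_mb_s(LB_GBPS))  # 2500.0
print(lbs_needed(peak))       # 3
print(shards_needed(peak))    # 21
```

The shard estimate is only a lower bound. A single hot key lives on one shard, so that shard’s bandwidth caps the key’s traffic regardless of how many shards the cluster has. This is why per‑shard burst bandwidth matters for hot keys.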

TTL‑Based Automatic Cleanup

All session keys receive a TTL (e.g., 30 minutes). Once a traffic spike subsides, expired sessions are removed and their memory is reclaimed automatically without manual cleanup, which keeps off‑peak operation low‑cost.
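TTL cleanup only works if every key written for a session carries a TTL. Ideally, the List and the Hash should also expire together, so that one does not outlive the other. One way to do this, sketched here assuming redis-py, is a sliding expiry that refreshes both keys at the end of each turn. The helper names are hypothetical. The pipeline is non‑transactional because the two keys may hash to different shards.

```python
SESSION_TTL_SECONDS = 1800  # e.g., 30 minutes, as in the article


def session_keys(session_id: str) -> list:
    """Both per-session memory keys from the memory model."""
    return [f"memory:model:{session_id}", f"memory:context:{session_id}"]


def touch_session(r, session_id: str, ttl: int = SESSION_TTL_SECONDS) -> list:
    """Reset the TTL on every session key in one round trip (sliding expiry).
    transaction=False so the keys may live on different shards.
    Each result is False when that key no longer exists (already expired)."""
    pipe = r.pipeline(transaction=False)
    for key in session_keys(session_id):
        pipe.expire(key, ttl)
    return pipe.execute()
```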

Summary and Outlook

The Tair‑based short‑term memory service provides:

Sub‑millisecond read/write latency matching real‑time AI Agent requirements.

Flexible modeling with List, Hash, and String structures for diverse memory types.

Automatic lifecycle management via TTL.

Strong concurrency safety through session‑level distributed locks.

Elastic scaling (read‑write separation, shard/replica expansion) and burst bandwidth to absorb traffic spikes of 10‑12× the forecast.

Future work includes extending the memory layer to long‑term user profiles, enabling the Agent to remember not only the current conversation but also historical preferences and behaviors.