Design and Implementation of lmstfy: A Redis‑Based Task Queue Service
lmstfy is a stateless, Redis-backed task-queue service from Meitu. It provides delayed execution, automatic retries, priority handling, expiration, and a RESTful HTTP API; it scales horizontally via namespace-based token routing, exposes rich Prometheus metrics, and has planned disk-based storage extensions.
lmstfy (Let Me Schedule Task For You) is a simple task‑queue service built by Meitu's architecture foundation team in early 2018 on top of Redis. It has been used in multiple Meitu online products for nearly two years, offering delayed execution, automatic retries, priority handling, and expiration.
Supports delayed tasks, automatic retries, priority, and expiration.
Provides an HTTP RESTful API for task operations.
Horizontally scalable.
Rich business and performance metrics.
GitHub project: https://github.com/meitu/lmstfy
Use Cases
Task queues differ from message queues in that tasks have no strict ordering constraints, while messages often require FIFO ordering. Task queues also need to update task status, which messages typically do not. Typical scenarios include:
Scheduled jobs (e.g., push notifications at 8 am daily, periodic data cleanup).
Workflow orchestration (e.g., resource creation, DNS updates).
Retryable jobs such as offline image processing.
Goals and Research
Before building a custom queue, the team evaluated open‑source solutions against three requirements:
Support for delayed/priority tasks and automatic retries.
High availability with no single point of failure and data loss protection.
Scalability in capacity and performance.
They considered Disque (a distributed queue from the author of Redis) but abandoned it after its author stopped developing it as a standalone project. Beanstalkd was examined but rejected because it lacks replication and therefore has a single point of failure. Kafka/RocketMQ-based designs were also dismissed: supporting arbitrary delays would require one topic per delay interval, so the topic count would explode and performance would degrade.
Design and Implementation
Overall Architecture
lmstfy is a stateless HTTP service that can be fronted by a Layer‑7 load balancer. It consists of four core modules:
Pump Thread – polls Redis every second to move due tasks from the timer to the ready queue.
Metric Collector – periodically gathers queue statistics and exposes them via a Prometheus exporter.
Token Manager – manages namespaces and tokens for business isolation.
Producer/Consumer – handles task submission and consumption.
The default pool stores both business data and metadata such as namespaces and tokens.
Core Concepts
namespace – isolates business domains.
queue – identifies a specific message type within a namespace.
job – the actual task, identified by a globally unique 16‑byte ID.
delay – seconds before the job becomes ready.
tries – maximum retry count.
ttl – time‑to‑live after which the job expires.
ttr – time‑to‑run; if the consumer does not ACK within this period, the job is considered failed and becomes eligible for retry.
Data Storage
Four Redis structures are used:
timer (sorted set) – stores delayed jobs ordered by execution timestamp.
ready queue (list) – holds jobs ready for immediate consumption.
deadletter (list) – stores jobs that have exhausted their retry attempts.
job pool (string) – holds the actual payload; other structures store only job IDs to save memory.
Delayed tasks are thus implemented as a combination of a FIFO list and a sorted set: a background thread periodically moves due entries from the timer to the ready queue using an atomic Redis Lua script.
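The timer-plus-ready-queue mechanism can be sketched in a few lines. This is an illustrative in-memory model, not the actual implementation: the real service keeps the timer in a Redis sorted set scored by the absolute ready timestamp and moves due job IDs with an atomic Lua script; the class and method names here are assumptions.

```python
import heapq
import time
from collections import deque

class MiniQueue:
    """Toy model of lmstfy's timer (sorted set) + ready queue (list)."""

    def __init__(self):
        self.timer = []       # min-heap of (ready_at, job_id), stands in for the ZSET
        self.ready = deque()  # FIFO ready queue, stands in for the Redis list

    def add(self, job_id, delay, now=None):
        now = time.time() if now is None else now
        if delay <= 0:
            self.ready.append(job_id)  # delay = 0: straight to the ready queue
        else:
            heapq.heappush(self.timer, (now + delay, job_id))

    def pump(self, now=None):
        """One tick of the pump thread: move all due jobs to the ready queue."""
        now = time.time() if now is None else now
        moved = 0
        while self.timer and self.timer[0][0] <= now:
            _, job_id = heapq.heappop(self.timer)
            self.ready.append(job_id)
            moved += 1
        return moved
```

Because the real pump runs once per second inside a Lua script, the move is atomic with respect to concurrent producers and consumers; the sketch above has no such guarantee and only shows the data flow.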
Task Insertion
When a task is created, a 16‑byte job ID is generated from the creation timestamp, random bytes, and the delay. The payload is stored under the key j:{namespace}/{queue}/{ID}. Depending on the delay value, the job ID is placed either directly into the ready queue (delay = 0) or into the timer sorted set (delay > 0) with an absolute ready timestamp.
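The ID layout can be illustrated as follows. The exact byte layout lmstfy uses is not specified here, so the field widths below are assumptions; the point is that embedding the timestamp and delay in the ID lets the service derive a job's ready time without a separate lookup.

```python
import os
import struct
import time

def make_job_id(delay, now=None):
    """Pack a 16-byte job ID: timestamp + random bytes + delay (layout illustrative)."""
    now = int(time.time()) if now is None else now
    ts = struct.pack(">I", now)    # 4 bytes: creation timestamp (seconds)
    rnd = os.urandom(8)            # 8 bytes: randomness for global uniqueness
    d = struct.pack(">I", delay)   # 4 bytes: delay in seconds
    return ts + rnd + d            # 16 bytes total

def job_delay(job_id):
    """Recover the delay encoded in the last 4 bytes of the ID."""
    return struct.unpack(">I", job_id[12:])[0]
```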
Task Consumption
Consumers pop a job ID from the ready queue (RPOP). The payload is fetched from the job pool, the remaining retry count is decremented, and the job ID is placed into the timer with ttr as its delay, so an unacknowledged job comes back for retry. Once the retry count reaches zero, the job is transferred to the dead‑letter queue after its ttr expires instead of being retried again.
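The consume-and-retry flow can be sketched with plain in-memory structures. This is a simplified model under assumed names, not lmstfy's code: `ready` stands in for the Redis list (LPUSH/RPOP), `timer` for the sorted set, and `jobs` for the job pool.

```python
from collections import deque

def consume(ready, jobs, timer, now):
    """Pop a job, decrement its tries, and park it in the timer for ttr seconds."""
    if not ready:
        return None
    job_id = ready.pop()                      # RPOP from the ready list
    job = jobs[job_id]
    job["tries"] -= 1
    timer.append((now + job["ttr"], job_id))  # comes back unless ACKed within ttr
    return job["payload"]

def on_ttr_expired(job_id, jobs, ready, deadletter):
    """Pump-side handling when a job's ttr expires without an ACK."""
    if jobs[job_id]["tries"] > 0:
        ready.appendleft(job_id)      # retries remain: requeue (LPUSH analog)
    else:
        deadletter.appendleft(job_id) # retries exhausted: dead-letter queue
```

An ACK (the DELETE call in the HTTP API) would simply remove the job from the pool and the timer before the ttr fires, so neither branch of `on_ttr_expired` runs.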
Synchronous Task Model
Beyond asynchronous and delayed execution, lmstfy can emulate a synchronous request‑response pattern: the producer writes a task and listens on a queue named after the job ID; the consumer, after successful processing, writes a reply to that same queue. The producer treats a timely reply as success, otherwise it times out.
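The request-response emulation above can be modeled with a per-job reply queue. The real pattern runs over lmstfy's HTTP API; this sketch uses in-process threads and queues purely to show the handshake, and all names are illustrative.

```python
import queue
import threading

reply_queues = {}  # one reply queue per job ID, keyed by the ID

def produce_and_wait(job_id, payload, handle, timeout=1.0):
    """Producer: publish a job, then block on the queue named after its ID."""
    reply_queues[job_id] = queue.Queue()
    worker = threading.Thread(target=handle, args=(job_id, payload))
    worker.start()
    try:
        return reply_queues[job_id].get(timeout=timeout)  # timely reply = success
    except queue.Empty:
        return None                                       # timeout = failure
    finally:
        worker.join()

def handler(job_id, payload):
    """Consumer: process the job, then reply on the queue named after the job ID."""
    reply_queues[job_id].put(payload.upper())
```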
Horizontal Scaling
lmstfy is stateless, so scaling the service layer is trivial. Scaling the storage layer is achieved by routing namespaces to different Redis pools via tokens. Example configuration:
[Pool]
[Pool.default]
Addr = "1.1.1.1:6379"
[Pool.meipai]
Addr = "2.2.2.2:6389"

When creating a namespace, the desired pool name is embedded in the token (e.g., meipai:01DT8EZ1N6XT), allowing the service to route requests to the appropriate Redis instance.
How to Use
# Create a namespace and token (admin port)
$ ./scripts/token-cli -c -n test_ns -p default -D "test ns apply by @hulk" 127.0.0.1:7778
{
"token": "01DT9323JACNBQ9JESV80G0000"
}
# Push a task with value payload
$ curl -XPUT -d "value" -i "http://127.0.0.1:7777/api/test_ns/q1?tries=3&delay=1&token=01DT931XGSPKNB7E2XFKPY3ZPB"
{"job_id":"01DT9323JACNBQ9JESV80G0000","msg":"published"}
# Consume a task
$ curl -i "http://127.0.0.1:7777/api/test_ns/q1?ttr=30&timeout=3&token=01DT931XGSPKNB7E2XFKPY3ZPB"
{"data":"value","elapsed_ms":272612,"job_id":"01DT9323JACNBQ9JESV80G0000","msg":"new job","namespace":"test_ns","queue":"q1","ttl":86127}
# ACK the task to prevent retry
$ curl -i -XDELETE "http://127.0.0.1:7777/api/test_ns/q1/job/01DT9323JACNBQ9JESV80G0000?token=01DT931XGSPKNB7E2XFKPY3ZPB"

More detailed API documentation is available in the README (https://github.com/meitu/lmstfy/blob/master/README.md). SDKs for PHP and Go are provided; other languages can use the HTTP API directly.
Monitoring Metrics
lmstfy exposes extensive metrics for both business health and performance:
Production rate
Consumption rate
Number of delayed tasks
Queue size (backlog)
Dead‑letter size (failed tasks)
Latency distribution from production to consumption (P50, P95, …)
API latency (P95) for production and consumption
Concurrent connection count
Future Plans
Currently a 2 GB Redis instance can handle tens of millions of delayed tasks. For scenarios that require huge volumes of jobs with very long TTLs (e.g., object-storage lifecycle management), Redis becomes costly. The team plans to add disk-based storage options such as local files or KVROCKS, an SSD-optimized, Redis-compatible KV store. KVROCKS is already open source and deployed internally at Meitu.
Project links:
KVROCKS: https://github.com/meitu/kvrocks
lmstfy: https://github.com/meitu/lmstfy
For further technical discussion, contact the author at [email protected].
Meitu Technology
Curating Meitu's technical expertise, valuable case studies, and innovation insights. We deliver quality technical content to foster knowledge sharing between Meitu's tech team and outstanding developers worldwide.