Lushan: An Offline Static Data Storage Server for Recommendation Systems
This article details the design, implementation, and performance of Lushan, a high‑throughput offline static data storage server built with libevent that supports dynamic library mounting, key‑value indexing, and efficient query handling for large‑scale recommendation workloads.
Lushan is a storage layer used in the three‑tier architecture of recommendation engines to hold large volumes of offline static data that is updated infrequently (daily or weekly) and replaced in bulk.
Key characteristics of the offline static data include low update frequency, full‑replace updates, large scale, and optional key‑value format.
Originally created to simplify testing multiple recommendation algorithms without deploying separate services, Lushan evolved into a generic offline data storage server. It now handles data such as missed posts, second‑degree relationships, interest co‑occurrence, fan similarity, intimacy scores, user groups, follow relationships, and user features, serving roughly 12 billion requests per day across two data‑center clusters with a total storage of over 2 TB.
1. Lushan Features Overview
Lushan stores offline static data for recommendation, can mount multiple databases on a single instance without restarting, uses a libevent‑based event‑driven model with IO multiplexing to solve the c10k problem, and speaks the memcached protocol, so standard memcached clients can interact with it.
2. Overall Architecture
The system consists of clients, Lushan servers, and source data (HDFS or data files). It supports query and library‑mount operations.
3. Communication Model
3.1 Network Model Comparison
Traditional blocking sockets require one thread per connection, leading to the c10k problem. select() polls every registered descriptor on each call, an O(n) cost. Readiness‑notification mechanisms such as epoll, kqueue, and IOCP deliver events at O(1) cost. libevent abstracts these mechanisms behind a unified API.
3.2 libevent Lushan’s communication relies on libevent, an open‑source event‑driven network library.
Key libevent concepts:
IO events: EV_READ, EV_WRITE
Timeout events: EV_TIMEOUT
Signal events: EV_SIGNAL
Options: EV_PERSIST (the event stays registered after it fires)
Important libevent functions:
event_set() // create new event structure
event_add() // add event to queue
event_dispatch() // start event loop
Callback prototype:
typedef void (*event_callback_fn)(evutil_socket_t sockfd, short event_type, void *arg);
3.3 Lushan Server Initialization
On startup Lushan initializes configuration (settings), statistics (stats), and a free‑connection pool (freeconns). It creates a listening socket, initializes request (REQ) and response (RSP) queues with locks and condition variables, and sets up a pipe for inter‑thread notifications.
Worker threads (default 4) repeatedly:
Block until a connection can be taken from the REQ queue.
Process commands (e.g., stats, open, close, get) and fill the write buffer.
Push the connection onto the RSP queue.
Write a byte to the pipe to trigger a notification event.
The libevent loop registers a notification event whose callback notify_handler reads from the pipe, pulls a connection from the RSP queue, and hands it to drive_machine, a state machine handling connection states (listening, read, write, closing).
State transitions:
conn_listening: accept the new socket, set it non‑blocking, create a new conn.
conn_read: read the command; if complete, process immediately, otherwise enqueue to REQ.
conn_write: write the response, handling EAGAIN/EWOULDBLOCK, then move to closing.
conn_closing: close the connection and recycle it into freeconns.
4. Database (DB) Layer
Each Lushan instance can load multiple DBs, each consisting of an index file (idx) and a data file (dat). The index entry is a 64‑bit key and a 64‑bit position where the high 40 bits are the offset and the low 20 bits are the length (max 1 MiB).
DB structures:
hdict: represents a single DB, kept as a node on a tail queue, with metadata (path, idx_num, fd, open_time, query count, ref count, etc.).
hdb: global container holding all DBs, with open and close lists, a hash table of size 1024 for quick lookup, and reference counting to ensure safe hot‑swap of databases.
Loading a DB involves reading the entire index file into memory (via fread) and opening the data file descriptor. Replacing a DB adds the new hdict to the open list and moves the old one to the close list; a background thread later frees DBs whose reference count drops to zero.
Data lookup uses binary search on the sorted index to obtain pos, then calls pread on the data file to retrieve the value.
5. Control Script
A shell script (not part of the C code) starts Lushan and enables dynamic library mounting.
6. Performance Test
Deployed on two CentOS 5.4 machines (Xeon E5620, 24 GB RAM). 100 k records were randomly sampled (8 B – 102 252 B, average 1 618 B). A Python multi‑process client ran 60 processes × 10 000 requests (600 k total), achieving 9 000 QPS, with 99.9 % of requests under 11 ms and average latency of ~5 ms.
7. Outlook
Future work includes adding a proxy layer to hide online/offline data differences, supporting string keys, and providing deployment tools for more efficient cluster management.