
RocksDB Fundamentals and Its Application in Vivo Message Push System

This article explains RocksDB's LSM-based architecture, column-family isolation, and snapshot features, and shows how Vivo's VPUSH MappingTransformServer uses these capabilities, with C++ code, to store billions of registerId-to-ClientId mappings across multiple replicated servers for high concurrency, low latency, and fast service expansion.

vivo Internet Technology

This article introduces the basic principles of RocksDB and demonstrates how Vivo's message-push system (VPUSH) leverages RocksDB for high-concurrency mapping between registerId and ClientId. The goal is to share practical insights with readers who use RocksDB.

Background

In the VPUSH service, a client device is identified by a registerId; internally, the service uses its own identifier, ClientId. A mapping service called MappingTransformServer (MT) stores the registerId ↔ ClientId mapping in RocksDB, which provides fast reads and writes at low storage cost.

RocksDB Overview

RocksDB is a fork of LevelDB that adds high‑concurrency write support, optimized SST file layout, and multiple compression strategies. It is widely used as the storage engine for distributed databases such as TiDB.

2.1 LSM Design

RocksDB is built on the Log‑Structured Merge‑Tree (LSM) design. LSM avoids random disk writes by first writing data to memory, then flushing to disk in sorted files (SSTs) that are organized into multiple levels (L0 … Ln). The write path is:

Write data to the in‑memory memtable and simultaneously record a write‑ahead log (WAL).

When the memtable reaches a size threshold, it becomes an immutable memtable.

A flush thread persists the immutable memtable as an SST file in level L0.

A compaction thread merges L0 files into higher levels (L1‑Ln).
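The steps above can be sketched in miniature with plain standard-library containers. This toy `ToyLsm` class is purely illustrative (the name and thresholds are made up, and it is not RocksDB's implementation): writes land in an in-memory sorted memtable after being logged, and a full memtable is flushed as a sorted run standing in for an L0 SST file.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy illustration of the LSM write path (not RocksDB's actual code).
class ToyLsm {
public:
    explicit ToyLsm(size_t memtableLimit) : limit_(memtableLimit) {}

    void put(const std::string &key, const std::string &value) {
        wal_.push_back(key + "=" + value);   // 1. append to the write-ahead log
        memtable_[key] = value;              // 2. insert into the active memtable
        if (memtable_.size() >= limit_) {    // 3. threshold reached:
            level0_.push_back(memtable_);    //    flush the (now immutable) memtable
            memtable_.clear();               //    as a sorted L0 run
        }
    }

    size_t memtableSize() const { return memtable_.size(); }
    size_t l0Files() const { return level0_.size(); }

private:
    size_t limit_;
    std::map<std::string, std::string> memtable_;             // sorted in memory
    std::vector<std::map<std::string, std::string>> level0_;  // flushed sorted runs
    std::vector<std::string> wal_;                            // crash-recovery log
};
```

In real RocksDB the flush and the later L0-to-Ln merges happen on background threads; here they are inlined only to make the state transitions visible.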

2.2 Internal Structure

RocksDB stores data in Column Families (CF), each acting as a namespace. A CF consists of three components:

memtable: in-memory write buffer.

SST file: persistent on-disk sorted file.

WAL: shared write-ahead log for crash recovery.

Additional metadata files include Manifest (stores LSM tree information) and Meta for snapshots.

2.3 Write Flow

The write flow follows the LSM steps described above, ensuring high throughput and low latency.

2.4 Read Flow

Read operations start from the memtable, then check immutable memtables, and finally search SST files level by level using binary search.
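This lookup order can be sketched with an illustrative stand-in (the `Run` alias and `lsmGet` function are invented names, not RocksDB API): the active memtable is consulted first, then each flushed sorted run from newest to oldest, so the most recent write for a key always wins.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// A "run" stands in for one sorted SST file; std::map::find performs the
// same kind of ordered (logarithmic) search an SST lookup relies on.
using Run = std::map<std::string, std::string>;

std::optional<std::string> lsmGet(const Run &memtable,
                                  const std::vector<Run> &runs,  // oldest first
                                  const std::string &key) {
    auto it = memtable.find(key);
    if (it != memtable.end()) return it->second;           // freshest data first
    for (auto r = runs.rbegin(); r != runs.rend(); ++r) {  // newest run first
        auto hit = r->find(key);
        if (hit != r->end()) return hit->second;
    }
    return std::nullopt;                                   // key not present
}
```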

2.5 Summary

RocksDB achieves high performance by writing first to memory, flushing to sorted SST files, and organizing files into multiple levels. Hot data stays in lower levels, while cold data moves to higher levels.

Business Scenario

The MT service stores billions of registerId → ClientId mappings. To achieve high availability, each application’s data is cached on multiple MT servers (e.g., MT1, MT2, MT3). This multi‑replica design reduces the risk of a single point of failure compared with a centralized Redis cache.
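The article does not specify how applications are assigned to their replica set; a minimal sketch, assuming each application's data is placed on N consecutive servers chosen by hashing the application id (`pickReplicas` and the MT server names are hypothetical), might look like this:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical placement sketch: hash the application id to a starting
// server, then take the next `replicas` servers in ring order, so each
// app's mappings live on several MT nodes and no node is a single point
// of failure. This is illustrative, not Vivo's actual scheme.
std::vector<std::string> pickReplicas(const std::string &appId,
                                      const std::vector<std::string> &servers,
                                      size_t replicas) {
    std::vector<std::string> out;
    if (servers.empty()) return out;
    size_t start = std::hash<std::string>{}(appId) % servers.size();
    for (size_t i = 0; i < replicas && i < servers.size(); ++i) {
        out.push_back(servers[(start + i) % servers.size()]);
    }
    return out;
}
```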

3.1 Column Family Usage

Each application is assigned its own column family, allowing independent management (e.g., copying, snapshotting). The default column family is used when no explicit CF is specified.

Example code for initializing RocksDB with column families:

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"
#include "rocksdb/utilities/checkpoint.h"
#include "rocksdb/metadata.h"
#include "rocksdb/cache.h"
#include "rocksdb/table.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/filter_policy.h"
#include <fstream>
#include <map>
#include <string>
#include <vector>
using namespace rocksdb;
int32_t RocksDBCache::init() {
    // db and handleMap are class members used by the other methods below.
    std::string m_dbPath = "/rocksdb";
    Options options;
    options.IncreaseParallelism();
    options.OptimizeLevelStyleCompaction();
    options.create_if_missing = true;
    options.create_missing_column_families = true;
    std::vector<std::string> column_families_list;
    DB::ListColumnFamilies(options, m_dbPath, &column_families_list);
    if (column_families_list.empty()) {
        // A database must always open the default column family.
        column_families_list.push_back("default");
    }
    std::vector<ColumnFamilyDescriptor> column_families;
    for (const auto &cfName : column_families_list) {
        column_families.push_back(ColumnFamilyDescriptor(cfName, ColumnFamilyOptions()));
    }
    std::vector<ColumnFamilyHandle*> handles;
    Status s = DB::Open(options, m_dbPath, column_families, &handles, &db);
    if (!s.ok() || column_families_list.size() != handles.size()) {
        return FAILURE;
    }
    for (size_t i = 0; i < column_families_list.size(); i++) {
        handleMap[column_families_list[i]] = handles[i];
    }
    return SUCCESS;
}

Creating a new column family:

int32_t RocksDBCache::createCF(const std::string &cfName) {
    ColumnFamilyHandle *cf = nullptr;
    if(handleMap.find(cfName) != handleMap.end()) {
        return FAILURE; // already exists
    }
    Status s = db->CreateColumnFamily(ColumnFamilyOptions(), cfName, &cf);
    if (!s.ok()) {
        return FAILURE;
    }
    handleMap[cfName] = cf;
    return SUCCESS;
}

Read and write examples (simplified):

int32_t RocksDBCache::get(const std::string &cf, const std::string &key, std::string &value){
    auto it = handleMap.find(cf);
    if (it == handleMap.end()) return FAILURE;
    Status s = db->Get(ReadOptions(), it->second, key, &value);
    return s.ok() ? SUCCESS : FAILURE; // treat not-found and errors alike
}

int32_t RocksDBCache::put(const std::string &cf, const std::string &key, const std::string &value){
    auto it = handleMap.find(cf);
    if (it == handleMap.end()) return FAILURE;
    Status s = db->Put(WriteOptions(), it->second, key, value);
    return s.ok() ? SUCCESS : FAILURE;
}

Batch write example:

int32_t RocksDBCache::writeBatch(const std::string &cfName, const std::string &filePath){
    auto it = handleMap.find(cfName);
    if (it == handleMap.end()) return FAILURE;
    ColumnFamilyHandle *handle = it->second;
    std::ifstream fin(filePath);
    if (!fin.is_open()) return FAILURE;
    WriteBatch batch;
    std::string line;
    int count = 0;
    while (std::getline(fin, line)) {
        // Each line is "key<TAB>value"; skip malformed lines.
        auto pos = line.find('\t');
        if (pos == std::string::npos) continue;
        batch.Put(handle, line.substr(0, pos), line.substr(pos + 1));
        if (++count >= 1000) {               // flush every 1000 entries
            db->Write(WriteOptions(), &batch);
            batch.Clear();
            count = 0;
        }
    }
    db->Write(WriteOptions(), &batch);       // write any remaining entries
    return SUCCESS;
}

3.2 Snapshot Usage

To bring up a new MT server, the team copies only the required column-family data using RocksDB snapshots. The snapshot is generated via Checkpoint::ExportColumnFamily, its metadata is serialized to a JSON file, the files are transferred with rsync/scp, and the data is imported on the target machine with CreateColumnFamilyWithImport.

Snapshot export example:

int32_t RocksDBCache::createCfSnapshot(const std::string &cfName){
    auto it = handleMap.find(cfName);
    if (it == handleMap.end()) return FAILURE;
    ColumnFamilyHandle *cfHandle = it->second;
    std::string exportDir = "/rocksdb_app_snapshot";
    ExportImportFilesMetaData *meta = nullptr;
    Checkpoint *checkpoint = nullptr;
    Status s = Checkpoint::Create(db, &checkpoint);
    if (!s.ok()) return FAILURE;
    s = checkpoint->ExportColumnFamily(cfHandle, exportDir, &meta);
    if (!s.ok()) return FAILURE;
    // Serialize the export metadata to JSON so it can travel with the SST files.
    std::string jsonMeta;
    metaToJson(meta, jsonMeta);
    std::ofstream ofs(exportDir + "/meta.json");
    if (ofs.is_open()) {
        ofs << jsonMeta << std::endl;
        ofs.close();
    }
    return SUCCESS;
}

Importing the snapshot on a new MT server:

int32_t RocksDBCache::importSnapshot(const std::string &cfName, const std::string &path){
    if (handleMap.find(cfName) != handleMap.end()) return FAILURE; // CF already exists
    std::string metaPath = path + "/meta.json";
    std::ifstream fin(metaPath, std::ios::binary);
    if (!fin.is_open()) return FAILURE;
    ExportImportFilesMetaData meta;
    jsonToMeta(fin, meta);  // rebuild the export metadata from JSON
    fin.close();
    ColumnFamilyHandle *cfHandle = nullptr;
    Status s = db->CreateColumnFamilyWithImport(ColumnFamilyOptions(), cfName,
                                                ImportColumnFamilyOptions(), meta, &cfHandle);
    if (!s.ok()) return FAILURE;
    handleMap[cfName] = cfHandle;
    return SUCCESS;
}

The overall expansion process consists of exporting snapshots from existing MT nodes, copying them to the new node, and loading them via the import API, bringing a new server into service within one to two hours.

Conclusion

The article demonstrates how RocksDB’s LSM architecture, column families, and snapshot capabilities enable a scalable, high‑availability mapping service for massive registerId → ClientId datasets. It also provides concrete C++ code snippets for initialization, column‑family management, read/write operations, batch writes, and cross‑machine snapshot import.

Tags: LSM Tree, high concurrency, message-push, snapshot, RocksDB, Column Family, Key-Value Store
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
