RocksDB Fundamentals and Its Application in Vivo Message Push System
The article explains RocksDB's LSM-based architecture, column-family isolation, and snapshot features, and shows with C++ code how Vivo's VPUSH MappingTransformServer uses them to store billions of registerId-to-ClientId mappings on multiple replicated servers, supporting high concurrency, low latency, and fast service expansion.
This article introduces the basic principles of RocksDB and demonstrates how Vivo's message‑push system (VPUSH) leverages RocksDB for high‑concurrency mapping between registerId and ClientId. The goal is to share practical insights for readers who use RocksDB.
Background
In the VPUSH service, a client device is identified by a registerId, while the service itself uses an internal identifier, ClientId. A mapping service called MappingTransformServer (MT) stores the registerId ↔ ClientId mapping in RocksDB, which provides fast reads and writes at low storage cost.
RocksDB Overview
RocksDB is a fork of LevelDB that adds high-concurrency write support, an optimized SST file layout, and multiple compression strategies. It is widely used as the storage engine for distributed databases such as TiDB (through its TiKV storage layer).
2.1 LSM Design
RocksDB is built on the Log-Structured Merge-Tree (LSM) design. LSM avoids random disk writes by first writing data to memory, then flushing it to disk in sorted files (SSTs) that are organized into multiple levels (L0 … Ln). The write path is:
1. Write data to the in-memory memtable and simultaneously record it in the write-ahead log (WAL).
2. When the memtable reaches a size threshold, it becomes an immutable memtable.
3. A flush thread persists the immutable memtable as an SST file in level L0.
4. A compaction thread merges L0 files into higher levels (L1-Ln).

2.2 Internal Structure
RocksDB stores data in Column Families (CF), each acting as a namespace. A CF consists of three components:
- memtable: in-memory write buffer.
- sstfile: persistent on-disk file.
- WAL: shared write-ahead log for crash recovery.
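
The write path from section 2.1 and the column-family layout from section 2.2 can be modeled in a few lines of standard C++. This is only a toy sketch, not RocksDB code: `ToyDb`, its methods, and the in-memory "WAL" and "L0 files" are hypothetical names chosen for illustration. It assumes a flush happens synchronously once a memtable holds `memtableLimit` entries, and it leaves out compaction, background threads, and on-disk persistence.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these names are RocksDB APIs.
class ToyDb {
private:
    using SortedRun = std::map<std::string, std::string>;  // keys kept sorted

    // Each column family owns its memtable and its flushed L0 files.
    struct ColumnFamily {
        SortedRun memtable;
        std::vector<SortedRun> level0;  // oldest file first
    };

    std::size_t memtableLimit_;
    std::vector<std::string> wal_;  // one log shared by all column families
    std::map<std::string, ColumnFamily> families_;

public:
    explicit ToyDb(std::size_t memtableLimit) : memtableLimit_(memtableLimit) {
        assert(memtableLimit > 0);
    }

    // Write path: append to the shared WAL, then insert into the CF's memtable.
    void put(const std::string &cf, const std::string &key, const std::string &value) {
        wal_.push_back(cf + "/" + key + "=" + value);
        ColumnFamily &family = families_[cf];  // created on first use
        family.memtable[key] = value;
        if (family.memtable.size() >= memtableLimit_) {
            // A full memtable is frozen and flushed as one sorted L0 file.
            family.level0.push_back(std::move(family.memtable));
            family.memtable.clear();
        }
    }

    // Lookup inside one CF: memtable first, then L0 files from newest to oldest.
    bool get(const std::string &cf, const std::string &key, std::string *value) const {
        auto fam = families_.find(cf);
        if (fam == families_.end()) return false;
        const ColumnFamily &family = fam->second;
        auto it = family.memtable.find(key);
        if (it != family.memtable.end()) {
            *value = it->second;
            return true;
        }
        for (auto file = family.level0.rbegin(); file != family.level0.rend(); ++file) {
            auto hit = file->find(key);
            if (hit != file->end()) {
                *value = hit->second;
                return true;
            }
        }
        return false;
    }

    std::size_t walRecords() const { return wal_.size(); }

    std::size_t level0Files(const std::string &cf) const {
        auto fam = families_.find(cf);
        return fam == families_.end() ? 0 : fam->second.level0.size();
    }
};
```

With a limit of 2, writing two keys to one column family produces one L0 file for that family only, and the same key written to two column families keeps two independent values, which is the namespace behavior described above.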
Additional metadata files include the Manifest (which stores LSM tree information) and Meta files for snapshots.

2.3 Write Flow
The write flow follows the LSM steps described above. A write only appends to the WAL and updates the in-memory memtable, which keeps throughput high and latency low.

2.4 Read Flow
Read operations start from the memtable, then check the immutable memtables, and finally search the SST files level by level using binary search.

2.5 Summary
RocksDB achieves high performance by writing first to memory, flushing to sorted SST files, and organizing those files into multiple levels. Recently written (hot) data stays in the lower-numbered levels, while older (cold) data moves to the higher-numbered levels.

Business Scenario
The MT service stores billions of registerId → ClientId mappings. To achieve high availability, each application's data is cached on multiple MT servers (e.g., MT1, MT2, MT3). Compared with a centralized Redis cache, this multi-replica design reduces the risk of a single point of failure.

3.1 Column Family Usage
Each application is assigned its own column family, so its data can be managed independently (e.g., copied or snapshotted). The default column family is used when no explicit CF is specified.

Example code for initializing RocksDB with column families:
#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"
#include "rocksdb/utilities/checkpoint.h"
#include "rocksdb/metadata.h"
#include "rocksdb/cache.h"
#include "rocksdb/table.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/filter_policy.h"
#include "rocksdb/write_batch.h"
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>
using namespace rocksdb;

// db (DB*) and handleMap (CF name -> ColumnFamilyHandle*) are assumed to be
// members of RocksDBCache, as the other methods use them; the class
// declaration is not shown in this article.
int32_t RocksDBCache::init(){
    std::string m_dbPath = "/rocksdb";
    Options options;
    options.IncreaseParallelism();
    options.OptimizeLevelStyleCompaction();
    options.create_if_missing = true;  // let the first start create the DB
    options.create_missing_column_families = true;

    // List the column families already on disk; a new DB only has "default".
    std::vector<std::string> column_families_list;
    DB::ListColumnFamilies(options, m_dbPath, &column_families_list);
    if (column_families_list.empty()) {
        column_families_list.push_back("default");
    }

    // Every existing column family must be passed to DB::Open.
    std::vector<ColumnFamilyDescriptor> column_families;
    for (const auto &cfName : column_families_list) {
        column_families.push_back(ColumnFamilyDescriptor(cfName, ColumnFamilyOptions()));
    }

    // Open into the member db so that the other methods can use it.
    std::vector<ColumnFamilyHandle*> handles;
    Status s = DB::Open(options, m_dbPath, column_families, &handles, &db);
    if (!s.ok() || column_families_list.size() != handles.size()) {
        return FAILURE;
    }

    // Remember one handle per column family name.
    for (size_t i = 0; i < column_families_list.size(); i++) {
        handleMap[column_families_list[i]] = handles[i];
    }
    return SUCCESS;
}

Creating a new column family:
int32_t RocksDBCache::createCF(const std::string &cfName) {
    ColumnFamilyHandle *cf = nullptr;
    if (handleMap.find(cfName) != handleMap.end()) {
        return FAILURE; // already exists
    }
    Status s = db->CreateColumnFamily(ColumnFamilyOptions(), cfName, &cf);
    if (!s.ok()) {
        return FAILURE;
    }
    handleMap[cfName] = cf;
    return SUCCESS;
}

Read and write examples (simplified):
int32_t RocksDBCache::get(const std::string &cf, const std::string &key, std::string &value){
    auto it = handleMap.find(cf);
    if (it == handleMap.end()) return FAILURE;  // unknown column family
    Status s = db->Get(ReadOptions(), it->second, key, &value);
    // A missing key (s.IsNotFound()) and any other error both map to FAILURE.
    return s.ok() ? SUCCESS : FAILURE;
}

int32_t RocksDBCache::put(const std::string &cf, const std::string &key, const std::string &value){
    auto it = handleMap.find(cf);
    if (it == handleMap.end()) return FAILURE;  // unknown column family
    Status s = db->Put(WriteOptions(), it->second, key, value);
    return s.ok() ? SUCCESS : FAILURE;
}

Batch write example:
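int32_t RocksDBCache::writeBatch(const std::string &cfName, const std::string &file){
    auto it = handleMap.find(cfName);
    if (it == handleMap.end()) return FAILURE;
    ColumnFamilyHandle *handle = it->second;

    // file is assumed to be the path of the input file; open it before reading.
    std::ifstream fin(file);
    if (!fin.is_open()) return FAILURE;

    WriteBatch batch;
    std::string line, key, value;
    int count = 0;
    while (std::getline(fin, line)) {
        // parse line → key/value (the line format is not given in this article)
        batch.Put(handle, key, value);
        // Commit every 1000 records to bound the size of a single batch.
        if (++count >= 1000) {
            if (!db->Write(WriteOptions(), &batch).ok()) return FAILURE;
            batch.Clear();
            count = 0;
        }
    }
    // Commit the remaining records, if any.
    if (count > 0 && !db->Write(WriteOptions(), &batch).ok()) return FAILURE;
    return SUCCESS;
}

3.2 Snapshot Usage
To add a new MT server, the team copies only the required column-family data using RocksDB snapshots. The snapshot is generated via Checkpoint::ExportColumnFamily, its metadata is serialized to a JSON meta file, the exported files are transferred with rsync/scp, and the column family is imported on the target machine with CreateColumnFamilyWithImport.

Snapshot export example: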
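int32_t RocksDBCache::createCfSnapshot(const std::string &cfName){
    if (handleMap.find(cfName) == handleMap.end()) return FAILURE;
    ColumnFamilyHandle* cfHandle = handleMap[cfName];
    std::string exportDir = "/rocksdb_app_snapshot";  // must not exist yet

    Checkpoint* checkpoint = nullptr;
    Status s = Checkpoint::Create(db, &checkpoint);
    if (!s.ok()) return FAILURE;

    // Export the SST files of this column family and collect their metadata.
    ExportImportFilesMetaData* meta = nullptr;
    s = checkpoint->ExportColumnFamily(cfHandle, exportDir, &meta);
    delete checkpoint;
    if (!s.ok()) return FAILURE;

    // serialize meta to JSON (metaToJson is a helper not shown in this article)
    std::string jsonMeta;
    metaToJson(meta, jsonMeta);
    delete meta;  // the caller releases the exported metadata

    std::ofstream ofs(exportDir + "/meta.json");
    if (!ofs.is_open()) return FAILURE;
    ofs << jsonMeta << std::endl;
    return SUCCESS;
}

Importing the snapshot on a new MT server: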
int32_t RocksDBCache::importSnapshot(const std::string &cfName, const std::string &path){
    if (handleMap.find(cfName) != handleMap.end()) return FAILURE; // already exists
    std::string metaPath = path + "/meta.json";
    std::ifstream fin(metaPath, std::ios::binary);
    if (!fin.is_open()) return FAILURE;

    // Rebuild the metadata from JSON (jsonToMeta is a helper not shown in this article).
    ExportImportFilesMetaData meta;
    jsonToMeta(fin, meta);
    fin.close();

    // Create the column family directly from the exported SST files.
    ColumnFamilyHandle* cfHandle = nullptr;
    Status s = db->CreateColumnFamilyWithImport(ColumnFamilyOptions(), cfName,
                                                ImportColumnFamilyOptions(), meta, &cfHandle);
    if (!s.ok()) return FAILURE;
    handleMap[cfName] = cfHandle;
    return SUCCESS;
}

The overall expansion process consists of exporting snapshots from existing MT nodes, copying them to the new node, and loading them through the import API. This brings a new node into service quickly (1-2 hours).

Conclusion
The article demonstrates how RocksDB's LSM architecture, column families, and snapshot capabilities enable a scalable, high-availability mapping service for massive registerId → ClientId datasets. It also provides concrete C++ code snippets for initialization, column-family management, read/write operations, batch writes, and cross-machine snapshot import.
vivo Internet Technology
