How TiDB’s PointGet Operator Executes Inside TiKV: A Deep Dive
This article provides a comprehensive analysis of TiDB's PointGet operator, detailing its execution flow in the TiKV storage layer, including protobuf interfaces, read‑through lock handling, snapshot creation, MVCC logic, and the integration of the Titan plugin for large values.
Background
TiDB is an HTAP‑capable distributed database that supports horizontal scaling and strong reliability through multi‑replica Raft. Its storage component, TiKV, is an open‑source KV store donated to CNCF and used internally by vivo for various products.
PointGet Overview
PointGet, meaning “point query”, is one of TiDB’s most basic operators. It is used in two typical scenarios:
Query by primary key, e.g.
MySQL [test]> explain select * from user where id = 1024;
Query by unique index, e.g.
MySQL [test]> explain select * from user where name = "test";
TiKV exposes two APIs: RawAPI for simple Set/Get/Del/Scan and TxnAPI for ACID‑compliant multi‑key transactions.
Implementation in TiDB
TiDB parses SQL into an AST, optimizes it, and builds an executor tree based on the volcano model. The PointGet executor is implemented by PointGetExecutor, whose core logic resides in the Next() method. This method interacts with TiKV via gRPC.
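The volcano model can be sketched in a few lines of illustrative Rust (TiDB itself is written in Go; the Executor trait, PointGet struct, and row layout here are invented stand-ins, not TiDB's actual types): each operator exposes a next() that yields rows on demand, and a point query yields at most one.

```rust
use std::collections::HashMap;

// Volcano model: every operator implements next(), pulling rows on demand.
trait Executor {
    fn next(&mut self) -> Option<Vec<i64>>;
}

// A PointGet-style executor fetches a single row by key, then reports
// exhaustion on every later call.
struct PointGet {
    key: i64,
    store: HashMap<i64, Vec<i64>>, // stands in for the storage layer
    done: bool,
}

impl Executor for PointGet {
    fn next(&mut self) -> Option<Vec<i64>> {
        if self.done {
            return None; // a point query produces at most one row
        }
        self.done = true;
        self.store.get(&self.key).cloned()
    }
}

fn main() {
    let mut store = HashMap::new();
    store.insert(1024, vec![1024, 42]);
    let mut exec = PointGet { key: 1024, store, done: false };
    assert_eq!(exec.next(), Some(vec![1024, 42]));
    assert_eq!(exec.next(), None);
}
```

Because next() returns at most one row and then None, the parent operator needs no special casing for point queries versus scans; it just pulls until exhaustion.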
Implementation in TiKV
gRPC Interface
The PointGet RPC is defined in pingcap/kvproto as:
service Tikv {
  rpc KvGet(kvrpcpb.GetRequest) returns (kvrpcpb.GetResponse) {}
}
The request includes a key and a version (the transaction's start_ts).
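The request/response shapes can be modeled as plain structs. This is a hedged sketch: the field names mirror kvrpcpb, but kv_get and the in-memory map are stand-ins for the generated protobuf types and the real server, and MVCC filtering by version is omitted.

```rust
use std::collections::HashMap;

// Stand-ins for the generated kvrpcpb protobuf types.
#[derive(Debug, Default)]
struct GetRequest {
    key: Vec<u8>,
    version: u64, // the transaction's start_ts
}

#[derive(Debug, Default)]
struct GetResponse {
    value: Vec<u8>,
    not_found: bool,
}

// Stub handler: an in-memory map stands in for the TiKV storage layer.
// (MVCC filtering by req.version is omitted in this sketch.)
fn kv_get(store: &HashMap<Vec<u8>, Vec<u8>>, req: &GetRequest) -> GetResponse {
    match store.get(&req.key) {
        Some(v) => GetResponse { value: v.clone(), not_found: false },
        None => GetResponse { not_found: true, ..Default::default() },
    }
}

fn main() {
    let mut store = HashMap::new();
    store.insert(b"user/1024".to_vec(), b"row-bytes".to_vec());
    let req = GetRequest { key: b"user/1024".to_vec(), version: 400_000 };
    let resp = kv_get(&store, &req);
    assert_eq!(resp.value, b"row-bytes".to_vec());
    assert!(!resp.not_found);
}
```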
Call Stack
+TiKV::kv_get (grpc-poll-thread)
+future_get
+Storage::get
+Storage::snapshot (readpool-thread)
+SnapshotStore::get
+PointGetterBuilder::build
+PointGetter::get
Read‑only requests are routed to a dedicated read‑pool thread.
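The hand-off from the gRPC thread to the read pool can be sketched with a worker thread and channels. This is a simplified model: Task, spawn_read_pool, and the in-memory map are invented names, not TiKV's actual readpool API.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// A read task carries the key and a channel on which to send the result,
// so the submitting (gRPC) thread never touches storage itself.
enum Task {
    Get { key: u64, resp: mpsc::Sender<Option<u64>> },
    Shutdown,
}

// Spawn one worker standing in for the read pool; it owns the data and
// serves Get tasks until told to shut down.
fn spawn_read_pool(data: HashMap<u64, u64>) -> mpsc::Sender<Task> {
    let (tx, rx) = mpsc::channel::<Task>();
    thread::spawn(move || {
        for task in rx {
            match task {
                Task::Get { key, resp } => {
                    let _ = resp.send(data.get(&key).copied());
                }
                Task::Shutdown => break,
            }
        }
    });
    tx
}

fn main() {
    let mut data = HashMap::new();
    data.insert(1024, 7);
    let pool = spawn_read_pool(data);
    let (rtx, rrx) = mpsc::channel();
    pool.send(Task::Get { key: 1024, resp: rtx }).unwrap();
    assert_eq!(rrx.recv().unwrap(), Some(7));
    pool.send(Task::Shutdown).unwrap();
}
```

Separating submission from execution this way keeps slow reads from blocking the gRPC poll threads, which is the point of the dedicated read pool.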
Read‑through Locks
TiKV’s Percolator model introduces locks during transaction writes. The Context message now carries two fields:
message Context {
  repeated uint64 resolved_locks = 13; // locks that can be ignored
  repeated uint64 committed_locks = 22; // locks that are committed but not yet cleaned
}
These enable the "read‑through lock" mechanism, allowing safe reads without waiting for certain locks.
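The decision these fields enable can be sketched as follows (can_read_through is a hypothetical helper, not a TiKV function, and the real path distinguishes how resolved versus committed locks affect the value read): a read may skip a lock whose start ts the client has already reported in one of the two lists.

```rust
// Hypothetical helper: a read may ignore a lock whose start ts appears in the
// resolved_locks or committed_locks lists the client sent in Context.
fn can_read_through(lock_ts: u64, resolved_locks: &[u64], committed_locks: &[u64]) -> bool {
    resolved_locks.contains(&lock_ts) || committed_locks.contains(&lock_ts)
}

fn main() {
    // The client learned that the transaction with start_ts 90 was resolved.
    assert!(can_read_through(90, &[90], &[]));
    // A committed-but-uncleaned lock can also be read through.
    assert!(can_read_through(95, &[], &[95]));
    // An unknown lock still blocks the read.
    assert!(!can_read_through(99, &[90], &[95]));
}
```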
Snapshot Creation
TiKV creates a snapshot via Engine::async_snapshot. The snapshot context (SnapContext) contains the start_ts, key ranges, and other flags.
pub fn get(&self, mut ctx: Context, key: Key, start_ts: TimeStamp) -> impl Future {
    self.read_pool.spawn_handle(async move {
        let snap_ctx = prepare_snap_ctx(...);
        let snapshot = Self::with_tls_engine(|e| Self::snapshot(e, snap_ctx)).await?;
        let snap_store = SnapshotStore::new(...);
        let result = snap_store.get(key);
        // update metrics …
    })
}
MVCC and Lock Checking
The SnapshotStore combines the engine snapshot with the request’s start_ts to provide MVCC‑consistent reads. Before reading, need_check_locks determines whether lock checking is required based on the isolation level.
pub fn need_check_locks(iso_level: IsolationLevel) -> bool {
    matches!(iso_level, IsolationLevel::Si | IsolationLevel::RcCheckTs)
}
If a lock is found, Lock::check_ts_conflict decides whether it can be ignored (e.g., lock.ts > ts, lock_type == Lock, etc.) or whether a KeyIsLocked error must be returned.
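That decision can be sketched as follows. This is a simplified model: the Lock struct and string error are illustrative, and TiKV's real check handles further cases (max ts reads, rolled-back locks, the resolved-lock lists) beyond the two shown here.

```rust
// Illustrative lock shape; real TiKV locks carry more fields.
#[derive(PartialEq)]
enum LockType { Put, Delete, Lock, Pessimistic }

struct Lock {
    ts: u64, // the lock owner's start_ts
    lock_type: LockType,
}

// Sketch of the conflict check for a read at start_ts.
fn check_ts_conflict(lock: &Lock, start_ts: u64) -> Result<(), String> {
    if lock.ts > start_ts {
        return Ok(()); // lock belongs to a later transaction; invisible to this read
    }
    if lock.lock_type == LockType::Lock || lock.lock_type == LockType::Pessimistic {
        return Ok(()); // these lock types cannot change the value being read
    }
    Err(format!("KeyIsLocked: lock_ts={}", lock.ts))
}

fn main() {
    // A Put lock from a later transaction does not conflict.
    assert!(check_ts_conflict(&Lock { ts: 20, lock_type: LockType::Put }, 10).is_ok());
    // A Lock-type lock never hides data, so it is ignorable.
    assert!(check_ts_conflict(&Lock { ts: 5, lock_type: LockType::Lock }, 10).is_ok());
    // An earlier uncommitted write must surface KeyIsLocked.
    assert!(check_ts_conflict(&Lock { ts: 5, lock_type: LockType::Delete }, 10).is_err());
}
```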
RegionSnapshot Get
After locating the correct region, RegionSnapshot::get builds a data key, checks range, and queries RocksDB:
fn get_value_cf_opt(&self, opts: &ReadOptions, cf: &str, key: &[u8]) -> EngineResult<Option<DbVector>> {
    // Reject keys outside this region's [start_key, end_key) range.
    check_key_in_range(key, self.region.get_id(), self.region.get_start_key(), self.region.get_end_key())?;
    // Prepend the data prefix to map the user key into RocksDB's key space.
    let data_key = keys::data_key(key);
    self.snap.get_value_cf_opt(opts, cf, &data_key).map_err(|e| self.handle_get_value_error(e, cf, key))
}
Titan Integration
TiKV uses a fork of RocksDB that includes the Titan plugin. Titan stores large values in separate blob files, reducing write amplification. The get path first queries RocksDB; if the value is a BlobIndex, it fetches the actual data from the blob storage.
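The two-step lookup can be sketched as a simplified in-memory model in Rust; StoredValue, the (file_number, offset) blob keying, and titan_get are invented for illustration and do not reflect Titan's actual encoding.

```rust
use std::collections::HashMap;

// The LSM-tree either holds the value inline or a small pointer (BlobIndex)
// into a blob file. Both layouts here are invented for illustration.
enum StoredValue {
    Inline(Vec<u8>),
    BlobIndex { file_number: u64, offset: u64 },
}

// Two-step read: query the LSM-tree first, then dereference a blob pointer.
fn titan_get(
    lsm: &HashMap<Vec<u8>, StoredValue>,
    blobs: &HashMap<(u64, u64), Vec<u8>>, // keyed by (file_number, offset)
    key: &[u8],
) -> Option<Vec<u8>> {
    match lsm.get(key)? {
        StoredValue::Inline(v) => Some(v.clone()),
        StoredValue::BlobIndex { file_number, offset } => {
            blobs.get(&(*file_number, *offset)).cloned()
        }
    }
}

fn main() {
    let mut lsm = HashMap::new();
    lsm.insert(b"a".to_vec(), StoredValue::Inline(b"small".to_vec()));
    lsm.insert(b"b".to_vec(), StoredValue::BlobIndex { file_number: 1, offset: 0 });
    let mut blobs = HashMap::new();
    blobs.insert((1u64, 0u64), b"a-large-value".to_vec());
    assert_eq!(titan_get(&lsm, &blobs, b"a"), Some(b"small".to_vec()));
    assert_eq!(titan_get(&lsm, &blobs, b"b"), Some(b"a-large-value".to_vec()));
}
```

The payoff of this layout is that compactions rewrite only the small pointers, not the large values, which is how Titan reduces write amplification.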
Status TitanDBImpl::GetImpl(const ReadOptions& options, ColumnFamilyHandle* handle, const Slice& key, PinnableSlice* value) {
  s = db_impl_->GetImpl(options, key, gopts);
  if (!s.ok() || !is_blob_index) return s;
  BlobIndex index;
  s = index.DecodeFrom(value);
  assert(s.ok());
  // fetch from BlobStorage …
  return s;
}
Conclusion
TiKV abstracts storage via Engine, Snapshot, and Iterator traits, cleanly separating storage from upper layers.
It builds a Percolator‑style MVCC transaction model on top of RocksDB, adding async‑commit and 1‑PC optimizations.
The Titan plugin separates large values from the LSM‑tree, improving performance for big‑value workloads.
PointGet’s MVCC path shows that locks from previous prewrites must be resolved; developers should avoid issuing reads immediately after committing async‑commit transactions.
vivo Internet Technology