
How CockroachDB’s GEO‑Partitioning Solves Hot‑Cold Data Separation at Scale

This article explains CockroachDB's partitioning features, especially GEO-Partitioning. It covers how they evolved, how they are implemented, how they differ from partitioning in traditional databases, and a step-by-step hot-cold data separation use case with multi-region deployment, configuration commands, performance considerations, and a Q&A.

dbaplus Community

1. CockroachDB version history and features

Since 1.0 (May 2017), CockroachDB has shipped five major releases. The 1.x series added distributed queries, online schema changes, rolling upgrades, and import and dump. The 2.x series introduced GEO-Partitioning, backup/restore, new data types, a cost-based optimizer (CBO), and change data capture (CDC). Later releases (e.g., 19.1) extended the CBO and added encryption at rest, read-from-follower, and read-from-learner.

2. Partitioning concept

What is partitioning?

Partitioning is a table‑level feature that lets users define how rows are distributed across nodes. Example: a student table (id, name, email, city, graduation_date) can be split into logical partitions stored on different node groups.

Without partitioning, keeping subsets of the data in different places would require separate tables on separate clusters, which increases operational complexity and breaks uniqueness guarantees across those tables. CockroachDB stores table data in ranges (contiguous spans of the sorted key space) that it rebalances automatically; defining partitions directs specific rows to specific node groups while the application still sees a single logical table.

How partitioning works

Each range is replicated across several nodes (three by default), and Raft keeps the replicas consistent. When a partition rule requires certain rows (e.g., id ≤ 2497) to live on nodes 1-3, the ranges holding those rows gain new replicas on those nodes, and Raft brings the new replicas up to date. Once they have caught up, the replicas on the other nodes are removed. The migration is transparent to the application, and its speed is configurable.
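The add-then-remove sequence can be modeled in a few lines. This is a toy Python sketch, not CockroachDB's allocator: `rebalance` is a hypothetical name, node numbers are illustrative, and it assumes each step adds a replica on a target node before removing one elsewhere, so the range never drops below its starting replica count.

```python
from __future__ import annotations


def rebalance(replicas: list[int], targets: list[int]) -> list[list[int]]:
    """Move one range's replicas onto `targets`, returning each intermediate replica set.

    Toy model: a replica is first added on a target node and only then removed
    from a non-target node, so the range never has fewer replicas than it
    started with.
    """
    current = list(replicas)
    history = [sorted(current)]
    # Target nodes that do not hold a replica yet, in the order given.
    missing = [n for n in targets if n not in current]
    for node in missing:
        # Up-replicate onto a node that satisfies the partition rule.
        current.append(node)
        history.append(sorted(current))
        # Then drop one replica that sits outside the target node group, if any.
        stale = next((n for n in current if n not in targets), None)
        if stale is not None:
            current.remove(stale)
            history.append(sorted(current))
    return history


if __name__ == "__main__":
    # Hypothetical: the range holding rows with id <= 2497 starts on nodes 4-6,
    # and the partition rule now pins it to nodes 1-3.
    for step in rebalance([4, 5, 6], [1, 2, 3]):
        print(step)
```

The run alternates between four and three replicas and ends with `[1, 2, 3]`; at no point does the range lose a majority of up-to-date copies.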

Difference from traditional databases

Traditional databases implement partitioning by creating physical sub-tables. CockroachDB instead uses directed rebalancing: no sub-tables are created, and partition rules only steer where the replicas of the table's ranges are placed.

Advantages and limitations

Advantages:

Low cost: Changing a partition only updates metadata; the rebalancer handles the data movement.

Simple operation: A single ALTER statement adjusts partitions.

High flexibility: Nodes can be labeled with region, data center, rack, and storage type, enabling multi-dimensional placement.

Limitations:

All data remains in a single logical table, so per-partition schema changes are not possible.

Partition keys must be a prefix of the primary key; only primary-key columns can be used.

Individual partitions cannot be exported or imported directly.
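The primary-key-prefix rule is easy to state as a check. The Python sketch below is illustrative only; `is_valid_partition_key` is a hypothetical helper, not part of CockroachDB.

```python
from __future__ import annotations


def is_valid_partition_key(partition_cols: list[str], primary_key: list[str]) -> bool:
    """True if partition_cols are the leading columns of primary_key, in order."""
    if not partition_cols or len(partition_cols) > len(primary_key):
        return False
    return primary_key[: len(partition_cols)] == partition_cols


if __name__ == "__main__":
    pk = ["graduation_date", "id"]
    print(is_valid_partition_key(["graduation_date"], pk))  # True: leading column
    print(is_valid_partition_key(["id"], pk))               # False: not a prefix
    print(is_valid_partition_key(["city"], pk))             # False: not in the key at all
```

The same check explains the multi-region design later in the article: with a primary key of (city, graduation_date, id), both `["city"]` and `["city", "graduation_date"]` are valid prefixes, so the table can be partitioned by city and then by date.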

3. GEO‑Partitioning practical example

Scenario: a PB‑scale dataset is split into hot (recent) data stored on SSDs and cold (older) data stored on SATA HDDs. A time‑based partition (e.g., before 2019‑07‑01) is defined and each partition is bound to a storage class.
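Range partitions on a date column behave like half-open intervals. As a minimal Python sketch (the constants and `partition_for` are hypothetical, chosen to mirror the example's 2019-07-01 boundary), a date can be routed to its partition and storage class with a binary search over the split points:

```python
from __future__ import annotations

import bisect
from datetime import date

# Split points and names follow the example; the storage mapping encodes the
# scenario's intent (recent = hot = SSD, older = cold = HDD).
SPLIT_POINTS = [date(2019, 7, 1)]
PARTITIONS = ["graduated", "current"]  # always len(SPLIT_POINTS) + 1 names
STORAGE = {"graduated": "hdd", "current": "ssd"}


def partition_for(graduation_date: date) -> str:
    """Half-open ranges [lower, upper): a date equal to a split point goes to the later partition."""
    return PARTITIONS[bisect.bisect_right(SPLIT_POINTS, graduation_date)]


if __name__ == "__main__":
    for d in (date(2019, 6, 30), date(2019, 7, 1), date(2021, 1, 15)):
        part = partition_for(d)
        print(d, part, STORAGE[part])
```

Using `bisect_right` puts a row dated exactly 2019-07-01 in the later partition, matching a lower bound that is inclusive and an upper bound that is exclusive.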

Cluster setup: a three-node cluster (node1-node3) in which each node has both an SSD store and an HDD store. Each node's locality label (the --locality flag) defines its region and data center; each store's attribute (the attrs field of the --store flag) defines its disk type.

Table creation steps:

CREATE DATABASE demo;
SET DATABASE = demo;

CREATE TABLE students (
    id               INT,
    name             STRING,
    email            STRING,
    city             STRING,
    graduation_date  DATE,
    PRIMARY KEY (graduation_date, id)   -- partition columns must be a prefix of the primary key
) PARTITION BY RANGE (graduation_date) (
    PARTITION graduated VALUES FROM (MINVALUE) TO ('2019-07-01'),
    PARTITION current   VALUES FROM ('2019-07-01') TO (MAXVALUE)
);
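Because graduation_date leads the primary key and CockroachDB stores rows in primary-key order, each range partition covers one contiguous span of the table's keys, which ranges can split on cleanly. The Python sketch below models key ordering only, not CockroachDB's key encoding; `label`, `runs`, and the sample rows are hypothetical. It sorts the same rows with the date first and then with the id first.

```python
from __future__ import annotations

from datetime import date

SPLIT = date(2019, 7, 1)  # same boundary as the partition definition


def label(row: tuple[date, int]) -> str:
    """Partition of a (graduation_date, id) row."""
    return "graduated" if row[0] < SPLIT else "current"


def runs(labels: list[str]) -> list[str]:
    """Collapse consecutive equal labels, e.g. [a, a, b] -> [a, b]."""
    out: list[str] = []
    for lab in labels:
        if not out or out[-1] != lab:
            out.append(lab)
    return out


# Hypothetical rows, deliberately listed out of order.
rows = [(date(2020, 6, 30), 7), (date(2018, 6, 30), 3), (date(2019, 7, 1), 1), (date(2017, 6, 30), 9)]

if __name__ == "__main__":
    by_date_first = sorted(rows)                           # key order (graduation_date, id)
    by_id_first = sorted(rows, key=lambda r: (r[1], r[0]))  # key order (id, graduation_date)
    print(runs([label(r) for r in by_date_first]))  # ['graduated', 'current']
    print(runs([label(r) for r in by_id_first]))    # ['current', 'graduated', 'current', 'graduated']
```

With the date leading the key, each partition is one run; with the id leading, the partitions interleave, which is why the partition columns must be a prefix of the primary key.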

Assign storage zones to partitions:

ALTER PARTITION graduated OF TABLE students
    CONFIGURE ZONE USING
        constraints = '[+hdd]';   -- cold (older) data on SATA HDD stores

ALTER PARTITION current OF TABLE students
    CONFIGURE ZONE USING
        constraints = '[+ssd]';   -- hot (recent) data on SSD stores
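To see which stores a partition's constraints admit, here is a small Python model. It follows CockroachDB's constraint notation as we understand it: a bare value such as `+ssd` matches a store attribute, `key=value` matches a locality tier, and `-` forbids. `Store`, `matches`, `eligible`, and the cluster layout are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    node: int
    attrs: frozenset[str]                       # e.g. {"ssd"} for a store started with attrs=ssd
    locality: tuple[tuple[str, str], ...] = ()  # e.g. (("region", "dc1"),)


def matches(store: Store, constraint: str) -> bool:
    """'+x' requires and '-x' forbids; 'k=v' tests a locality tier, bare 'x' a store attribute."""
    sign, body = constraint[0], constraint[1:]
    if sign not in "+-":
        raise ValueError(f"constraint must start with + or -: {constraint!r}")
    if "=" in body:
        key, value = body.split("=", 1)
        hit = (key, value) in store.locality
    else:
        hit = body in store.attrs
    return hit if sign == "+" else not hit


def eligible(stores: list[Store], constraints: list[str]) -> list[Store]:
    """Stores that satisfy every constraint."""
    return [s for s in stores if all(matches(s, c) for c in constraints)]


# Hypothetical three-node cluster: every node has one SSD store and one HDD store.
CLUSTER = [Store(n, frozenset({kind})) for n in (1, 2, 3) for kind in ("ssd", "hdd")]
ZONES = {"current": ["+ssd"], "graduated": ["+hdd"]}  # hot -> SSD, cold -> HDD

if __name__ == "__main__":
    for part, cons in ZONES.items():
        print(part, [(s.node, sorted(s.attrs)) for s in eligible(CLUSTER, cons)])
```

Each partition can still place a replica on every node, just on a different store, so the three-node cluster keeps full replication for both hot and cold data.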

Extending this to a three-region, multi-data-center deployment adds three nodes per region and labels each region's nodes with region=dc1, dc2, or dc3 plus the appropriate storage types. The primary key is changed to (city, graduation_date, id), and the table is partitioned by city (north, south, east) and, within each city partition, by time range (graduated/current).
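The two-level scheme routes a row first by its leading key column and then by the next one. The Python sketch below is a model only: the city lists, the `locate` function, and the `<group>_<time>` subpartition names are hypothetical, since the source names the partitions but not their contents.

```python
from __future__ import annotations

import bisect
from datetime import date

# Hypothetical city lists: the source names the partitions (north, south, east)
# but not which city values belong to each.
CITY_GROUPS = {
    "north": {"beijing", "tianjin"},
    "south": {"guangzhou", "shenzhen"},
    "east": {"shanghai", "hangzhou"},
}
SPLIT_POINTS = [date(2019, 7, 1)]
TIME_PARTS = ["graduated", "current"]


def locate(city: str, graduation_date: date) -> str:
    """Map the leading primary-key columns (city, graduation_date) to a partition name.

    The trailing id column never influences placement. Cities outside every
    list are reported as "unpartitioned" in this sketch.
    """
    group = next((g for g, cities in CITY_GROUPS.items() if city in cities), None)
    if group is None:
        return "unpartitioned"
    time_part = TIME_PARTS[bisect.bisect_right(SPLIT_POINTS, graduation_date)]
    # Hypothetical naming scheme for subpartitions: "<city group>_<time range>".
    return f"{group}_{time_part}"


if __name__ == "__main__":
    print(locate("beijing", date(2018, 6, 30)))  # north_graduated
    print(locate("shanghai", date(2019, 7, 1)))  # east_current
```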

Zone configurations are then updated to enforce both storage-type and region replica constraints, e.g., two replicas per region, one per data center. The result is a globally distributed cluster in which each range's reads are served by its leaseholder, which can be placed in the region that uses the data most, and writes commit once a majority of the range's replicas acknowledge.
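Majority commit means write latency is set by the slowest replica inside the fastest majority, not by the farthest replica. A minimal Python model, assuming the leaseholder is also the Raft leader and counting network round trips only (the RTT figures are hypothetical):

```python
from __future__ import annotations


def quorum(replicas: int) -> int:
    """Smallest majority of a Raft group."""
    return replicas // 2 + 1


def commit_latency_ms(rtts_ms: list[float]) -> float:
    """Time until a majority has acknowledged a write.

    rtts_ms holds leaseholder-to-replica round trips, with 0.0 for the
    leaseholder's own replica. Disk and CPU time are ignored.
    """
    return sorted(rtts_ms)[quorum(len(rtts_ms)) - 1]


if __name__ == "__main__":
    # Hypothetical round trips: the leaseholder itself, a second data center in
    # the same region, two replicas one region away, and one farther away.
    five = [0.0, 2.0, 30.0, 32.0, 60.0]
    print(quorum(len(five)), commit_latency_ms(five))  # 3 30.0
```

In this example the 60 ms replica never delays a commit, but every write still pays one cross-region round trip because a majority cannot be formed inside a single region.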

Because CockroachDB uses Raft, the cluster can lose a single data center and keep serving reads and writes, provided every range still has a majority of its replicas outside that data center, albeit with increased cross-region latency until the failed replicas recover.
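Whether a placement survives a data-center outage reduces to counting. This Python sketch (`survives_dc_loss` and the placements are hypothetical) checks, for each data center, whether the replicas left after losing it still form a majority.

```python
from __future__ import annotations

from collections import Counter


def survives_dc_loss(replica_dcs: list[str]) -> dict[str, bool]:
    """For each data center holding a replica: does a majority remain if it fails?"""
    total = len(replica_dcs)
    need = total // 2 + 1
    per_dc = Counter(replica_dcs)
    return {dc: total - count >= need for dc, count in per_dc.items()}


if __name__ == "__main__":
    print(survives_dc_loss(["dc1", "dc2", "dc3"]))  # {'dc1': True, 'dc2': True, 'dc3': True}
    print(survives_dc_loss(["dc1", "dc1", "dc2"]))  # {'dc1': False, 'dc2': True}
```

Spreading one replica per data center is what makes any single outage survivable; stacking two of three replicas in one data center turns that data center into a single point of failure for the range.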

4. Summary of partitioning capabilities

Partitioning in CockroachDB enables hot-cold data separation, multi-region deployment, and dynamic scaling without manual data movement. Its advantages are low operational cost, simple ALTER-based management, and flexible node labeling. Its limitations are that partition keys must be a prefix of the primary key and that individual partitions cannot be exported or imported. Recent releases add read-from-follower and read-from-learner to reduce read pressure on leaseholders.

5. Technical Q&A

Q1: Are there open‑source online physical backup tools for CockroachDB?

A1: cockroach dump can take a full logical (SQL-statement) backup; incremental backup tools are not yet open source.

Q2: At what data size should an enterprise consider CockroachDB?

A2: Workloads in the terabyte range, or single tables of several hundred gigabytes, are suitable.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
