
How Cold‑Hot Data Separation Boosts Cost Efficiency in Baidu Palo for Apache Doris

This article explains the principles, configuration steps, monitoring metrics, leader selection, data migration granularity, compaction, invalid data cleanup, and cache mechanisms of cold‑hot data separation in Baidu Intelligent Cloud's Palo for Apache Doris, illustrating how tiered storage reduces costs while maintaining query performance.

Baidu Intelligent Cloud Tech Hub

In real data‑analysis scenarios, hot and cold data often have different query frequencies and response‑time requirements. For example, historical log data is accessed rarely but must be retained for long‑term audit, while recent traffic data needs frequent, low‑latency queries.

Generally, the value of historical data decreases over time, and query demand drops sharply. Storing ever‑growing historical data locally leads to massive resource waste.

Cold‑hot data tiering solves this problem by storing hot and cold data on storage media with different costs. Baidu Intelligent Cloud Data Warehouse Palo 2.0 for Apache Doris provides this feature, moving cold data to object storage to maximize cost efficiency.

Palo, built on the industry‑leading OLAP database Apache Doris, uses an MPP architecture. This article focuses on the usage and implementation principles of the cold‑hot separation feature.

1. How to Use Cold‑Hot Separation

1.1 Main Steps

Add remote storage, create a cooldown policy, and bind the policy to a table or partition.

# Add remote storage, using an object‑storage bucket and AK/SK to create a resource.
CREATE EXTERNAL RESOURCE "baidu_bos_s3"
PROPERTIES(
    "type" = "s3",
    "AWS_ENDPOINT" = "s3.bj.bcebos.com",
    "AWS_REGION" = "bj",
    "AWS_BUCKET" = "${BUCKET}",
    "AWS_ROOT_PATH" = "/palo/storage",
    "AWS_ACCESS_KEY" = "${AWS_ACCESS_KEY}",
    "AWS_SECRET_KEY" = "${AWS_SECRET_KEY}",
    "AWS_MAX_CONNECTIONS" = "50",
    "AWS_REQUEST_TIMEOUT_MS" = "3000",
    "AWS_CONNECTION_TIMEOUT_MS" = "3000"
);
# Create cooldown policy (method 1: set cooldown_ttl in seconds, recommended)
CREATE STORAGE POLICY testPolicy
PROPERTIES(
  "storage_resource" = "baidu_bos_s3",
  "cooldown_ttl" = "5"
);
# Create cooldown policy (method 2: set fixed datetime)
CREATE STORAGE POLICY testPolicy
PROPERTIES(
  "storage_resource" = "baidu_bos_s3",
  "cooldown_datetime" = "2023-06-07 21:00:00"
);
# Bind policy to table (method 1: bind whole table)
CREATE TABLE TestTbl (
    aa BIGINT
) ENGINE=olap
DISTRIBUTED BY HASH (aa) BUCKETS 1
PROPERTIES(
    "replication_num" = "1",
    "storage_policy" = "testPolicy"
);
# Bind policy to partition (method 2, recommended)
ALTER TABLE create_table_partition MODIFY PARTITION (*) SET("storage_policy"="testPolicy");

Insert data:

insert into TestTbl values(1);
insert into TestTbl values(2);
insert into TestTbl values(3);
insert into TestTbl values(4);
insert into TestTbl values(5);

After insertion, six data files are generated on the BE.

When data age exceeds the configured cooldown_ttl, it is cooled down to remote storage. The key BE log entry for a cooling event can be found with:

grep "Upload rowset" be.INFO

1.2 View Cooling Status

Use show tablets from xxx to view local and remote data sizes for each tablet:

LocalDataSize: size of data files on the BE node

RemoteDataSize: size of data files on remote storage
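For the TestTbl example above, the relevant columns appear directly in the command's output:

```sql
-- Inspect per-tablet storage: LocalDataSize vs. RemoteDataSize columns
SHOW TABLETS FROM TestTbl;
```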

After cooling, local files are deleted and remote files appear in BOS, including new meta files.

1.3 Monitoring

Doris provides four cold‑hot related metrics that can be configured in Grafana:

doris_be_upload_total_byte: bytes uploaded to remote storage

doris_be_s3_bytes_read_total: bytes read from remote storage

doris_be_upload_rowset_count: number of successful rowset uploads

doris_be_upload_fail_count: number of failed rowset uploads

1.4 Enable Cache

Add parameters to conf/be.conf and restart the BE node.
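A minimal be.conf fragment for enabling the File Cache might look like the following (the path and sizes are illustrative; check the exact keys against your Palo/Doris version's documentation):

```
enable_file_cache = true
# path: local disk directory for cached blocks; total_size: cache capacity in bytes
file_cache_path = [{"path": "/mnt/disk1/file_cache", "total_size": 10737418240}]
```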

View all BE configuration items at http://Host:HttpPort/varz.

After enabling File Cache, cache hit status can be seen in the query profile.

2. FE Metadata

2.1 Hierarchy

Key concepts:

Partition: Doris supports partitioned tables.

MaterializedIndex: an index schema; the base table and each rollup (materialized view) are each a MaterializedIndex.

Tablet: a data shard; each partition is split into tablets by the bucketing columns.

Replica: a copy of a tablet on a BE node.

Example table definition:

CREATE TABLE `TestTbl` (
  `aa` BIGINT NULL,
  `bb` INT NULL
) ENGINE=OLAP
DUPLICATE KEY(`aa`)
PARTITION BY RANGE(`aa`)
(
  PARTITION p1 VALUES [("-9223372036854775808"), ("10")],
  PARTITION p2 VALUES [("10"), ("20")]
)
DISTRIBUTED BY HASH(`aa`) BUCKETS 1;
-- Create a materialized view
create materialized view mv_max as select aa, max(bb) from TestTbl group by aa;

2.2 View Hierarchy Information

Use the show proc command to view metadata, similar to Linux’s /proc system.
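As a sketch, the hierarchy can be walked level by level (the numeric IDs below are illustrative; substitute the IDs returned by the previous command):

```sql
SHOW PROC '/dbs';                          -- list databases
SHOW PROC '/dbs/10003';                    -- tables in one database
SHOW PROC '/dbs/10003/10054/partitions';   -- partitions of one table
```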

3. BE Metadata

3.1 Hierarchy

Tablet: a data shard.

Rowset: a version of data; each successful import creates one rowset.

Segment: a data file; a successful import may generate multiple segments (max 256 MB each).

For example, a 10 GB stream load creates 1 rowset and 40 segment files.

3.2 View Hierarchy

Use show tablet to view tablet information, or open the MetaUrl in a browser to see BE hierarchy.

4. Implementation Details

4.1 Leader Selection

Doris randomly selects one replica as the leader for cooling operations. The chosen leader's replica ID and term are recorded in the BE tablet meta as cooldown_replica_id and cooldown_term.

4.2 Cooling Granularity

Doris uses the rowset as the cooling granularity: a rowset is the basic unit of a data version, so cooling whole rowsets keeps versions intact and handles newly written data gracefully, whereas segment‑level cooling could not guarantee that all data in a segment is eligible to cool.

4.3 Cooling Process

The leader uploads the expired rowset and its meta file to object storage. Followers synchronize by reading the meta file, comparing with local rowsets, and updating if there is overlap. The process handles two cases: no overlap (no local update) and overlap (replace local rowset).
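The follower‑side step can be sketched in Python (an illustrative model, not Doris source code: rowsets are modeled as version ranges, and only the field name cooldown_meta_id follows the text):

```python
def sync_follower(local_rowsets, remote_meta):
    """Sketch of follower synchronization after the leader cools a rowset.

    local_rowsets: set of (start_version, end_version) ranges held locally.
    remote_meta:   the meta file the leader uploaded, listing the cooled
                   version ranges plus the new cooldown_meta_id.
    """
    cooled = remote_meta["cooled_rowsets"]
    overlap = local_rowsets & cooled
    if overlap:
        # Case 2: drop the overlapping local rowsets; those versions are
        # now served from the remote (cooled) copies.
        local_rowsets = local_rowsets - overlap
    # In both cases the follower records the leader's cooldown_meta_id,
    # which later lets cleanup confirm all replicas are in sync.
    return local_rowsets, remote_meta["cooldown_meta_id"]
```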

4.3.1 Case 1: No Overlap

When the leader's cooled rowsets do not overlap with a follower's local rowsets, the follower only updates cooldown_meta_id without changing data.

4.3.2 Case 2: Overlap

If a cooled rowset overlaps with local data, the follower deletes the overlapping local rowsets, adopts the remote rowset meta, and updates cooldown_meta_id.

4.4 Cold Data Compaction

After the TTL expires, all eligible rowsets on the leader are cooled. Doris 2.0 supports compaction of cooled data in object storage, reducing storage space and improving query efficiency.

4.5 Invalid Data Cleanup

When cold data is compacted or partitions are dropped, obsolete files remain in remote storage. Cleanup requires two‑phase confirmation: the leader lists remote rowset files, compares with local versions, and after all replicas have synchronized the same cooldown_meta_id, the obsolete files are deleted.
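A minimal sketch of that two‑phase check, under the assumption that each replica reports the cooldown_meta_id it has synchronized (function and parameter names are illustrative):

```python
def collect_obsolete(remote_files, referenced, replica_meta_ids, leader_meta_id):
    """Sketch of two-phase cleanup of remote storage.

    remote_files:     file names listed in the remote bucket.
    referenced:       files still referenced by the current rowset versions.
    replica_meta_ids: cooldown_meta_id reported by every replica.
    leader_meta_id:   the leader's current cooldown_meta_id.
    """
    # Phase 1: all replicas must have synchronized the same meta id;
    # otherwise a lagging follower may still read an "obsolete" file.
    if any(m != leader_meta_id for m in replica_meta_ids):
        return set()  # not confirmed yet: delete nothing
    # Phase 2: anything no longer referenced is safe to delete.
    return set(remote_files) - set(referenced)
```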

4.6 Leader Version Lag

If the randomly chosen leader lags behind followers (e.g., some writes failed on the leader), Doris uses clone repair to synchronize missing versions before cooling, ensuring consistency.

4.7 Cold Data Cache Mechanism

Doris 2.0 introduces a File Cache to improve query performance on cold data. Cached blocks are stored on BE disks (not in memory) and managed by LRU. When a query accesses remote data, the system checks the cache, reads missing blocks from remote storage, and writes them to the local cache.

4.7.1 Performance Comparison

On the SSB‑sf500 benchmark, fully cooled data caused a 10× performance drop; enabling a 10 GB cache improved performance threefold.

4.7.2 Data Caching

File Cache splits remote files into 1 MB blocks, stores them locally, and reuses them for subsequent reads, reducing network I/O and benefiting column‑store pre‑read patterns.
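As an illustrative sketch (not the actual File Cache implementation), a block‑granularity LRU cache over remote reads could look like:

```python
from collections import OrderedDict

BLOCK_SIZE = 1 << 20  # 1 MB blocks, as described above


class FileBlockCache:
    """Toy block-level LRU cache: hits are served locally, misses
    trigger a remote fetch and may evict the least recently used block."""

    def __init__(self, capacity_blocks, fetch_remote):
        self.capacity = capacity_blocks
        self.fetch_remote = fetch_remote  # callable: (file, block_idx) -> bytes
        self.blocks = OrderedDict()       # (file, block_idx) -> bytes, LRU order

    def read_block(self, file, block_idx):
        key = (file, block_idx)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # cache hit: refresh LRU position
            return self.blocks[key]
        data = self.fetch_remote(file, block_idx)  # cache miss: remote I/O
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data
```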

4.7.3 Cache Size Tuning

Experiments with 1 GB vs. 50 GB cache sizes showed little difference in performance gain, indicating that a modest cache (e.g., 1/50 of cold data size) is sufficient for most workloads.

4.7.4 Summary

Cache is essential for cold data performance, but its size should be tuned based on workload characteristics rather than maximized blindly.

Tags: storage optimization, cold data, data tiering, Apache Doris, Palo