
How Tiered Storage Cuts Data Warehouse Costs: Strategies and Best Practices

This article explains how tiered storage in cloud‑native data warehouses like AnalyticDB MySQL reduces storage costs while maintaining performance by separating hot and cold data, automating migration, and optimizing cold‑data access through caching and archiving techniques.

Alibaba Cloud Developer

Background

According to IDC's "Data Age 2025" report, global data generation is projected to grow from 33 ZB in 2018 to 175 ZB in 2025, or roughly 491 EB per day. As data volumes rise, storage becomes a major part of IT budgets. The cost of storing 1 PB on high‑performance media can be an order of magnitude higher than on low‑cost media, so enterprises need to tier data by access frequency to balance cost and performance.

What Is Tiered Storage

Tiered storage separates data into hot (frequently accessed) and cold (infrequently accessed) layers. Hot data resides on high‑performance, high‑cost media with limited capacity, while cold data uses low‑cost, high‑capacity media. Over time hot data “cools” and is migrated to the cold layer automatically.

Challenges for Data‑Warehouse Tiered Storage

Selecting storage media that meet requirements for performance, cost, reliability, scalability, and operational simplicity.

Defining business‑level hot and cold data boundaries.

Automatically migrating data as its temperature changes.

Accelerating access to cold data for compliance or analytical queries.

Key Technologies in AnalyticDB MySQL

AnalyticDB MySQL (ADB) implements tiered storage through a three‑layer architecture: access layer (SQL parsing, optimization, scheduling), compute engine layer (query execution), and storage engine layer (sharded data with multiple replicas).

Choosing Storage Media

Hot data is stored on SSDs for high IOPS and bandwidth, with multiple replicas for protection. Cold data is stored in Alibaba Cloud Object Storage Service (OSS), which offers low cost, 99.9999999999% durability, and virtually unlimited capacity.

Defining Hot/Cold Data

Users specify a storage_policy when creating a table: 'HOT' for all‑SSD tables, 'COLD' for OSS tables, and 'MIXED' for tables that combine both. Example:

-- HOT: all data stays on SSD
Create table t1(
 id int,
 dt datetime
) distribute by hash(id)
storage_policy = 'HOT';

-- COLD: all data is stored in OSS
Create table t2(
 id int,
 dt datetime
) distribute by hash(id)
storage_policy = 'COLD';

-- MIXED: the newest 7 partitions stay on SSD, older partitions go to OSS
Create table t3(
 id int,
 dt datetime
) distribute by hash(id)
partition by value(date_format(dt,'%Y%m%d'))
 lifecycle 365
storage_policy = 'MIXED' hot_partition_count = 7;
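
The three statements above differ only in their trailing clause, so the clause is easy to generate from a script. The following Python sketch builds it and rejects inconsistent combinations. The function name storage_clause is hypothetical, and the rules it enforces (hot_partition_count is required and positive for 'MIXED', and rejected for the other policies) are assumptions of this sketch, not documented ADB behavior.

```python
VALID_POLICIES = {"HOT", "COLD", "MIXED"}


def storage_clause(policy, hot_partition_count=None):
    """Return the storage_policy clause for a CREATE TABLE statement.

    Hypothetical helper: the validation rules are assumptions of this sketch.
    """
    policy = policy.upper()
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown storage_policy: {policy!r}")
    if policy == "MIXED":
        # Assumption: a mixed table needs a positive hot-partition window.
        if hot_partition_count is None or hot_partition_count < 1:
            raise ValueError("MIXED requires a positive hot_partition_count")
        return (
            f"storage_policy = 'MIXED' "
            f"hot_partition_count = {hot_partition_count}"
        )
    # Assumption: the window is meaningless for all-SSD or all-OSS tables.
    if hot_partition_count is not None:
        raise ValueError("hot_partition_count only applies to MIXED tables")
    return f"storage_policy = '{policy}'"


print(storage_clause("HOT"))
print(storage_clause("mixed", 7))
```

Validating the combination before the statement is sent keeps a typo such as 'WARM' from reaching the database at all.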

Modifying Storage Policies

Policies can be altered as workload changes, e.g.,

Alter table t1 storage_policy = 'COLD';
Alter table t3 storage_policy = 'MIXED' hot_partition_count = 14;

Automatic Hot‑Cold Migration

ADB defines a hot‑partition window through hot_partition_count. The hot_partition_count most recent partitions stay hot, and partitions that fall outside the window are considered cold and are migrated to OSS automatically. For example, with daily partitions and hot_partition_count = 3, the latest three days of a log table stay hot and older partitions become cold.

Create table Event_log (
 event_id bigint,
 dt datetime,
 event varchar
) distribute by hash(event_id)
partition by value(date_format(dt,'%Y%m%d')) lifecycle 365
storage_policy = 'MIXED' hot_partition_count = 3;
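
To make the window concrete, here is a minimal Python sketch that splits a list of partition values into hot, cold, and expired groups. It is a model of the rule, not ADB's implementation. It assumes that partitions are ranked by value with the largest (newest) first, and that lifecycle caps the number of partitions kept. The names classify_partitions and daily_partitions are hypothetical.

```python
from datetime import date, timedelta


def classify_partitions(partitions, hot_partition_count, lifecycle):
    """Split partition values into hot, cold, and expired lists.

    Assumption: partitions are ranked by value, newest (largest) first,
    and `lifecycle` is the maximum number of partitions retained.
    """
    ranked = sorted(set(partitions), reverse=True)
    retained = ranked[:lifecycle]
    return {
        "hot": retained[:hot_partition_count],    # stays on SSD
        "cold": retained[hot_partition_count:],   # migrated to OSS
        "expired": ranked[lifecycle:],            # beyond the lifecycle
    }


def daily_partitions(last_day, days):
    """Return yyyymmdd partition values for `days` days ending at last_day."""
    return [
        int((last_day - timedelta(days=i)).strftime("%Y%m%d"))
        for i in range(days)
    ]


parts = daily_partitions(date(2024, 1, 5), 5)
tiers = classify_partitions(parts, hot_partition_count=3, lifecycle=365)
print(tiers["hot"])   # [20240105, 20240104, 20240103]
print(tiers["cold"])  # [20240102, 20240101]
```

When a new daily partition arrives, the oldest hot partition drops out of the window, so the classification needs no per-row access statistics.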

Cold‑Data Access Optimization

Because OSS has higher access latency and limited bandwidth, ADB packs the files of each partition into a single archive file that exposes a POSIX‑like interface, and caches the archive's metadata in memory and on SSD. This reduces the number of metadata fetches by a factor of hundreds.
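
The idea behind the archive can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ADB's file format: the class PartitionArchive, its methods, and the in-memory BytesIO that stands in for an OSS object are all hypothetical. The point it shows is that one index fetch serves every later file lookup in the partition.

```python
import io


class PartitionArchive:
    """Pack many small files into one blob with a name -> (offset, length) index."""

    def __init__(self):
        self._blob = io.BytesIO()   # stands in for a single object on OSS
        self._index = {}            # file name -> (offset, length)
        self.metadata_fetches = 0   # counts simulated remote metadata calls

    def add(self, name, data):
        """Append a file to the archive and record where it lives."""
        offset = self._blob.seek(0, io.SEEK_END)
        self._blob.write(data)
        self._index[name] = (offset, len(data))

    def load_index(self):
        """Simulate one remote fetch that returns the whole index."""
        self.metadata_fetches += 1
        return dict(self._index)

    def read(self, index, name, offset=0, size=-1):
        """Read part of a file using a locally cached index."""
        base, length = index[name]
        if size < 0 or offset + size > length:
            size = max(length - offset, 0)   # clamp to the end of the file
        self._blob.seek(base + offset)
        return self._blob.read(size)


archive = PartitionArchive()
for i in range(3):
    archive.add(f"segment_{i}.dat", f"rows-{i}".encode())

index = archive.load_index()                        # one fetch per partition
print(archive.read(index, "segment_1.dat"))         # b'rows-1'
print(archive.read(index, "segment_2.dat", 5, 1))   # b'2'
print(archive.metadata_fetches)                     # 1
```

Without the archive, each of the three files would need its own metadata request, and a partition with hundreds of files would need hundreds.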

Data reads also benefit from an SSD cache with multi‑granularity blocks, metadata pre‑warming, lock‑free LRU‑like replacement, and automatic I/O merging, dramatically improving query performance on cold data.
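
Two of these techniques, LRU replacement and I/O merging, can be illustrated with a small Python sketch. It is deliberately simplified and is not ADB's cache: it uses one fixed block size instead of multiple granularities, it is single‑threaded rather than lock‑free, and a bytes object stands in for a cold file on OSS. The names BlockCache, merge_ranges, and BLOCK_SIZE are hypothetical.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def merge_ranges(ranges):
    """Merge overlapping or adjacent (start, end) ranges, end exclusive."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


class BlockCache:
    """Block-aligned read cache with LRU eviction and merged remote reads."""

    def __init__(self, backing, capacity_blocks):
        self._backing = backing        # bytes standing in for a cold file
        self._capacity = capacity_blocks
        self._blocks = OrderedDict()   # block number -> bytes, oldest first
        self.remote_reads = 0          # counts simulated OSS requests

    def _fetch(self, start, end):
        """Fetch blocks [start, end) with a single simulated remote read."""
        self.remote_reads += 1
        raw = self._backing[start * BLOCK_SIZE:end * BLOCK_SIZE]
        fetched = {}
        for block_no in range(start, end):
            lo = (block_no - start) * BLOCK_SIZE
            fetched[block_no] = raw[lo:lo + BLOCK_SIZE]
            self._blocks[block_no] = fetched[block_no]
        return fetched

    def read(self, offset, size):
        if size <= 0:
            return b""
        first = offset // BLOCK_SIZE
        last = (offset + size - 1) // BLOCK_SIZE
        wanted, missing = {}, []
        for block_no in range(first, last + 1):
            if block_no in self._blocks:
                self._blocks.move_to_end(block_no)   # mark as recently used
                wanted[block_no] = self._blocks[block_no]
            else:
                missing.append((block_no, block_no + 1))
        for start, end in merge_ranges(missing):     # adjacent misses: one read
            wanted.update(self._fetch(start, end))
        while len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)         # evict least recently used
        data = b"".join(wanted[b] for b in range(first, last + 1))
        skip = offset - first * BLOCK_SIZE
        return data[skip:skip + size]


cold_file = bytes(range(32))
cache = BlockCache(cold_file, capacity_blocks=4)
cache.read(2, 9)            # misses blocks 0-2, fetched in one merged read
cache.read(0, 4)            # served entirely from the cache
print(cache.remote_reads)   # 1
```

Merging matters because each remote request pays a fixed latency cost, so three adjacent block misses are much cheaper as one request than as three.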

Conclusion

Tiered storage in cloud‑native data warehouses like AnalyticDB MySQL balances cost and performance. Table‑level storage policies define what is hot and what is cold, the hot‑partition window drives automatic migration, and file archiving together with SSD caching speeds up access to cold data.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
