How Tiered Storage Cuts Data Warehouse Costs: Strategies and Best Practices
This article explains how tiered storage in cloud‑native data warehouses like AnalyticDB MySQL reduces storage costs while maintaining performance by separating hot and cold data, automating migration, and optimizing cold‑data access through caching and archiving techniques.
Background
According to IDC's "Data Age 2025" report, global data generation will grow from 33ZB in 2018 to 175ZB in 2025, about 491EB per day. As data volume rises, storage cost becomes a major part of IT budgets. The cost of storing 1PB on high‑performance media can differ from the cost on low‑cost media by an order of magnitude, so enterprises must tier data by access frequency to balance cost and performance.
What Is Tiered Storage
Tiered storage separates data into hot (frequently accessed) and cold (infrequently accessed) layers. Hot data resides on high‑performance, high‑cost media with limited capacity, while cold data uses low‑cost, high‑capacity media. Over time hot data “cools” and is migrated to the cold layer automatically.
Challenges for Data‑Warehouse Tiered Storage
Selecting storage media that meet requirements for performance, cost, reliability, scalability, and operational simplicity.
Defining business‑level hot and cold data boundaries.
Automatically migrating data as its temperature changes.
Accelerating access to cold data for compliance or analytical queries.
Key Technologies in AnalyticDB MySQL
AnalyticDB MySQL (ADB) implements tiered storage through a three‑layer architecture: access layer (SQL parsing, optimization, scheduling), compute engine layer (query execution), and storage engine layer (sharded data with multiple replicas).
Choosing Storage Media
Hot data is stored on SSDs for high IOPS and bandwidth, with multi‑replica protection. Cold data is stored in Alibaba Cloud Object Storage Service (OSS), which offers low cost, high durability (99.9999999999%), and virtually unlimited capacity.
Defining Hot/Cold Data
Users specify a storage_policy when creating a table: 'HOT' for all‑SSD tables, 'COLD' for OSS tables, and 'MIXED' for tables that combine both. Example:
Create table t1(
  id int,
  dt datetime
) distribute by hash(id)
storage_policy = 'HOT';
Create table t2(
  id int,
  dt datetime
) distribute by hash(id)
storage_policy = 'COLD';
Create table t3(
  id int,
  dt datetime
) distribute by hash(id)
partition by value(date_format(dt,'%Y%m%d'))
lifecycle 365
storage_policy = 'MIXED' hot_partition_count = 7;
Modifying Storage Policies
Policies can be altered as workload changes, e.g.,
Alter table t1 storage_policy = 'COLD';
Alter table t3 storage_policy = 'MIXED' hot_partition_count = 14;
Automatic Hot‑Cold Migration
ADB defines a hot‑partition window (hot_partition_count). Partitions outside the window are considered cold and are migrated to OSS automatically. Example: with hot_partition_count = 3, the latest three days of a log table stay hot; older partitions become cold.
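The window rule can be illustrated with a minimal Python sketch. It is not ADB's implementation: the function name classify_partitions is hypothetical, and the sketch assumes that partition values are zero‑padded date strings (the '%Y%m%d' format used in this article's DDL) and that the window keeps the largest values hot.

```python
from typing import Dict, List


def classify_partitions(partitions: List[str], hot_partition_count: int) -> Dict[str, str]:
    """Label each partition 'HOT' or 'COLD' using a hot-partition window.

    Assumption: partition values are sortable strings such as '20240105',
    and the largest `hot_partition_count` values form the hot window.
    """
    if hot_partition_count < 0:
        raise ValueError("hot_partition_count must be non-negative")
    # Newest first; zero-padded date strings sort chronologically.
    ordered = sorted(set(partitions), reverse=True)
    hot = set(ordered[:hot_partition_count])
    return {p: ("HOT" if p in hot else "COLD") for p in ordered}


if __name__ == "__main__":
    days = ["20240101", "20240102", "20240103", "20240104", "20240105"]
    for partition, tier in classify_partitions(days, 3).items():
        print(partition, tier)
```

When a new daily partition arrives, rerunning the classification moves the oldest hot partition into the cold set, which is the point at which a migration to OSS would be triggered. The table definition that follows declares such a window in DDL.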
Create table Event_log (
  event_id bigint,
  dt datetime,
  event varchar
) distribute by hash(event_id)
partition by value(date_format(dt,'%Y%m%d')) lifecycle 365
storage_policy = 'MIXED' hot_partition_count = 3;
Cold‑Data Access Optimization
Because OSS has higher latency and more limited bandwidth than local SSD, ADB packs the files of each partition into a single archive exposed through a POSIX‑like interface, and caches the archive metadata in memory and on SSD. This reduces the number of metadata fetches by a factor of hundreds.
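The archiving idea can be sketched in Python as follows. This is an illustration under stated assumptions, not ADB's on‑disk format: build_archive and ArchiveReader are hypothetical names, the archive is a plain concatenation of files, and an in‑memory bytes object stands in for the remote OSS object.

```python
import io
from typing import Dict, Tuple


def build_archive(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate files into one blob and return (blob, index).

    The index maps each file name to its (offset, length) inside the blob.
    """
    blob = io.BytesIO()
    index: Dict[str, Tuple[int, int]] = {}
    for name in sorted(files):
        data = files[name]
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


class ArchiveReader:
    """File-like reads over one archive, using a locally cached index."""

    def __init__(self, blob: bytes, index: Dict[str, Tuple[int, int]]):
        self._blob = blob        # stands in for the remote archive object
        self._index = index      # cached metadata: no remote lookup per file
        self.remote_reads = 0    # counts simulated ranged reads

    def read(self, name: str, offset: int = 0, size: int = -1) -> bytes:
        start, length = self._index[name]
        if offset < 0 or offset > length:
            raise ValueError("offset out of range")
        end = length if size < 0 else min(length, offset + size)
        self.remote_reads += 1   # one ranged read against the archive
        return self._blob[start + offset:start + end]
```

Because every file is resolved through the cached index, locating a file costs a dictionary lookup rather than a metadata request to object storage, and only the ranged data read goes to the remote store.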
Data reads also benefit from an SSD cache with multi‑granularity blocks, metadata pre‑warming, lock‑free LRU‑like replacement, and automatic I/O merging, dramatically improving query performance on cold data.
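Two of these techniques, I/O merging and LRU replacement, can be sketched in Python. The sketch makes simplifying assumptions: it uses a plain single‑threaded LRU rather than the lock‑free LRU‑like policy the article mentions, a fixed block granularity rather than multiple granularities, and the names merge_ranges, LRUBlockCache, and max_gap are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple


def merge_ranges(ranges: List[Tuple[int, int]], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Merge (offset, length) reads that overlap or lie within max_gap bytes."""
    merged: List[Tuple[int, int]] = []
    for offset, length in sorted(ranges):
        if merged:
            last_offset, last_length = merged[-1]
            last_end = last_offset + last_length
            if offset <= last_end + max_gap:
                # Extend the previous read instead of issuing a new one.
                new_end = max(last_end, offset + length)
                merged[-1] = (last_offset, new_end - last_offset)
                continue
        merged.append((offset, length))
    return merged


class LRUBlockCache:
    """Fixed-capacity block cache that evicts the least recently used block."""

    def __init__(self, capacity: int, fetch: Callable[[int], bytes]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._fetch = fetch  # called on a miss, e.g. a read from object storage
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id: int) -> bytes:
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self._blocks[block_id]
        self.misses += 1
        data = self._fetch(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)    # evict least recently used
        return data
```

Merging trades a few extra bytes read for fewer round trips, which tends to pay off when each request to the cold tier carries a high fixed latency; the max_gap parameter controls how aggressive that trade is.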
Conclusion
Tiered storage in cloud‑native data warehouses like AnalyticDB MySQL balances cost and performance by defining hot/cold policies, using hot‑partition windows, file archiving, and SSD caching to address hot‑cold definition, migration, and cold‑data access challenges.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
