How We Cut MaxCompute Costs Using Information Schema Insights
This article details how a fast‑growing HR SaaS company analyzed MaxCompute billing spikes, identified five key cost drivers, leveraged tenant‑level Information Schema to extract task metadata, applied SQL‑based cost formulas, and implemented targeted optimizations that stabilized their cloud data‑processing expenses.
RenliJia, an HR SaaS startup backed by Alibaba DingTalk and RenliWo, provides human‑resource management, payroll, social security, and value‑added services, serving e‑commerce and retail clients. Their data‑warehouse team faces challenges of stability, accuracy, and timely response while meeting internal CRM data needs and optimizing compute costs.
Because MaxCompute is billed on a pay‑as‑you‑go basis, large compute tasks and QuickBI report queries cause monthly cost fluctuations that exceed expectations, and high‑cost SQL statements and frequent report accesses are hard to detect promptly.
Specific Cause Analysis
The cost volatility stems from five main issues:
High‑cost single SQL queries due to large time ranges or massive base tables.
Unreasonable partitioning, such as querying three‑year partitions.
High report access frequency with varying filter criteria, leading to repeated executions.
Dimension tables added for report compatibility, creating large join queries.
Long execution times for certain compute jobs and reports.
Analyzing with Information Schema
MaxCompute's Information Schema offers metadata and usage history. The tenant‑level schema aggregates all projects under a single SYSTEM_CATALOG, providing read‑only views of project metadata. The Information_Schema.TASKS_HISTORY table records daily task execution time, cost, and count, which is the primary focus for cost analysis.
SQL Script for Data Extraction
set odps.namespace.schema=true;
set odps.sql.decimal.odps2=true;
create table if not exists ads_project_cost_pay_di (
env_type string comment '环境类型',
cost_type string comment '消费类型',
inst_id string comment '唯一id,作业id',
owner_name string comment '作业所属人',
task_type string comment '作业类型 SQL:SQL作业 CUPID:Spark或Mars作业 SQLCost:SQL预估作业 SQLRT:查询加速SQL作业 LOT:MapReduce作业 PS:PAI的Parameter Server AlgoTask:机器学习作业',
input_records string comment '作业输入的records数目',
output_records string comment '作业输出的records数目',
input_bytes string comment '实际扫描的数据量,与Logview相同。',
output_bytes string comment '输出字节数。',
status string comment '数据采集瞬间的运行状态(非实时状态)。包含以下状态:Terminated:作业已执行结束。Failed:作业失败。Cancelled:作业被取消。',
cost_pay DECIMAL(18,5) comment '费用 单位元',
complexity string comment '任务复杂度',
settings string comment '上层调度或用户传入的信息,以JSON格式存储。包含字段:USERAGENT、BIZID、SKYNET_ID和SKYNET_NODENAME。',
sql_script string comment 'sql 代码',
start_time string comment '开始时间',
end_time string comment '结束时间',
data_collection string comment 'quickbi数据集'
) comment 'odps 费用 明细' partitioned by (ds string comment '分区');
insert overwrite table ads_project_cost_pay_di partition(ds=${bizdate})
select case when task_catalog = 'renlijia_ng' then '生产'
when task_catalog = 'renlijia_ng_dev' then '测试'
else task_catalog end as env_type,
if(regexp_count(settings,'quickbi')>0,'quickbi',task_catalog) cost_type,
inst_id,
owner_name,
task_type,
input_records,
output_records,
input_bytes,
output_bytes,
status,
nvl(case when task_type = 'SQL' then cast(input_bytes/1024/1024/1024 * complexity * 0.3 as DECIMAL(18,5))
when task_type = 'SQLRT' then cast(input_bytes/1024/1024/1024 * complexity * 0.3 as DECIMAL(18,5))
when task_type = 'CUPID' and status='Terminated' then cast(cost_cpu/100/3600 * 0.66 as DECIMAL(18,5))
else 0 end,0) cost_pay,
complexity,
settings,
operation_text sql_script,
start_time,
end_time,
regexp_extract(operation_text,'(?<=quickbi=).*?(?==quickbi)',0) data_collection
from SYSTEM_CATALOG.INFORMATION_SCHEMA.TASKS_HISTORY where ds=${bizdate};The official cost calculation formula is:
case
when task_type = 'SQL' then cast(input_bytes/1024/1024/1024 * complexity * 0.3 as DECIMAL(18,5))
when task_type = 'SQLRT' then cast(input_bytes/1024/1024/1024 * complexity * 0.3 as DECIMAL(18,5))
when task_type = 'CUPID' and status='Terminated' then cast(cost_cpu/100/3600 * 0.66 as DECIMAL(18,5))
else 0
end;Cost Comparison Before and After Governance
Report Output Details
Analysis focuses on environment, dataset, and user dimensions. QuickBI datasets are derived from the sql_script field using a regular expression, and a custom field is added to capture the dataset name.
Steps to extract the dataset name:
‘quickbi=xxx数据集=quickbi’ as 数据集自定义字段 regexp_extract(operation_text,'(?<=quickbi=).*?(?==quickbi)',0)Improvement Actions
Replace tables or datasets with unreasonable partitions.
Process dimension tables upstream to minimize downstream dependencies, ideally querying a single table.
Optimize storage size and reduce the number of QuickBI dashboards for high‑frequency datasets.
Shorten report generation time.
In summary, by leveraging MaxCompute's tenant‑level Information Schema to retrieve daily job metadata, the company successfully reduced its MaxCompute costs to a stable, reasonable range.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
