How Simple Parameter Tweaks Slash MaxCompute Job Costs by Half

This article walks through three real‑world MaxCompute optimization cases—tuning UNION handling, enabling JSON‑object batch extraction, and removing unnecessary GROUP BY—to dramatically reduce CPU, memory, and storage usage while improving job stability.

Alibaba Cloud Developer

Apr 3, 2024

Introduction

MaxCompute optimization covers a wide range of techniques, and understanding how ODPS works internally is essential for pinpointing performance problems and applying targeted fixes. The following cases from day-to-day work illustrate simple yet effective solutions.

1. Optimizing Multiple UNIONs

Task: The job combines several tables with UNION ALL and then performs a distmapjoin (distributed map join) on some of them. It frequently ran out of memory, and the usual fix of raising memory to 8192 MB and increasing the shard count gave only temporary relief.

Analysis of the execution graph revealed an inefficient plan where each UNION ALL caused four data distributions, leading to multiple distmapjoin dispatches.

By disabling the UNION split optimization, the execution graph was simplified from 30 steps to 21.

set odps.optimizer.union.split.enable=false;

The result was a significant reduction in CU (compute unit) and memory consumption, and the job became much more stable.
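As a rough illustration of where the flag goes, the sketch below submits it together with a query of the general shape described in this case. The table names (orders_a, orders_b, orders_c, dim_item), the columns, the partition value, and the shard count are all hypothetical, and the exact query in the original job is not shown in the source, so treat this only as an assumed layout.

```sql
-- Flag from the article: disable the UNION split optimization for this job.
set odps.optimizer.union.split.enable=false;

-- Hypothetical query: three tables combined with UNION ALL, then a
-- distributed map join against a dimension table (alias d).
-- shard_count=20 is an illustrative value, not a recommendation.
SELECT /*+ distmapjoin(d(shard_count=20)) */
        u.order_id,
        u.item_id,
        d.item_name
FROM (
    SELECT order_id, item_id FROM orders_a WHERE ds = '20240401'
    UNION ALL
    SELECT order_id, item_id FROM orders_b WHERE ds = '20240401'
    UNION ALL
    SELECT order_id, item_id FROM orders_c WHERE ds = '20240401'
) u
JOIN dim_item d
  ON u.item_id = d.item_id;
```

Because the flag changes how the optimizer plans the job, it is worth comparing the execution graph before and after on a test run rather than assuming the same reduction in steps.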

2. JSON Parsing Parameter Optimization

A large‑scale job (1.4 trillion rows) ran for an hour, with the project1 operator consuming 64% of the time due to nine calls to get_json_object. Enabling the new JSON UDF batch mode reduced the operator’s share to 34%.

set odps.sql.udf.getjsonobj.new=true;

After the change, average instance runtime dropped from 15 minutes to 7 minutes, and both CU and memory usage fell by nearly half.
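A minimal sketch of the pattern this case describes: several get_json_object calls against the same JSON column, with the flag set at the top of the job. The table event_log, the column payload, and the JSON paths are hypothetical; only the flag and the function name come from the article.

```sql
-- Flag from the article: switch get_json_object to the newer implementation.
set odps.sql.udf.getjsonobj.new=true;

-- Hypothetical table and JSON paths. Each call parses the same payload
-- column, which is the kind of repeated work this case targets.
SELECT  get_json_object(payload, '$.user_id')    AS user_id,
        get_json_object(payload, '$.item_id')    AS item_id,
        get_json_object(payload, '$.event.type') AS event_type
FROM    event_log
WHERE   ds = '20240401';
```

Since the flag selects a different implementation of the function, it seems prudent to compare outputs on a sample partition before and after enabling it, in case edge cases such as escaped characters are returned differently.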

3. Removing an Unnecessary GROUP BY

A colleague faced high resource usage on a job with massive data expansion followed by a GROUP BY that barely changed the data size. By rewriting the SQL to eliminate the GROUP BY and adding a local sort, storage decreased dramatically and downstream read costs were cut.
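The rewritten SQL is not shown in the source, so the following is only an assumed shape of such a change. It supposes the expansion comes from a LATERAL VIEW EXPLODE and uses DISTRIBUTE BY with SORT BY as the local sort; the tables src_events and dst_events_expanded and the columns user_id and tags are hypothetical.

```sql
-- Before (assumed shape): expansion followed by a GROUP BY that removes
-- almost no rows but still forces a full aggregation.
-- INSERT OVERWRITE TABLE dst_events_expanded PARTITION (ds = '20240401')
-- SELECT  user_id, tag
-- FROM    src_events
-- LATERAL VIEW EXPLODE(SPLIT(tags, ',')) t AS tag
-- WHERE   ds = '20240401'
-- GROUP BY user_id, tag;

-- After: no GROUP BY. Rows are shuffled by user_id and sorted within each
-- reducer, so similar values end up next to each other in the output files.
INSERT OVERWRITE TABLE dst_events_expanded PARTITION (ds = '20240401')
SELECT  user_id, tag
FROM    src_events
LATERAL VIEW EXPLODE(SPLIT(tags, ',')) t AS tag
WHERE   ds = '20240401'
DISTRIBUTE BY user_id
SORT BY user_id, tag;
```

Two caveats apply to this sketch. Dropping the GROUP BY is only equivalent if the few duplicates it used to remove do not matter downstream. The storage saving is expected because sorted, clustered values generally compress better in columnar storage, but the actual gain depends on the data.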

The optimization also highlighted that when a task only performs simple map operations, it may be better replaced by a view.
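For instance, a task that only filters and casts columns could be expressed as a view along these lines. The names v_orders_clean and raw_orders and the columns are hypothetical, chosen only to illustrate the idea.

```sql
-- Hypothetical example: the map-only logic lives in a view instead of a
-- scheduled task that materializes an intermediate table.
CREATE VIEW IF NOT EXISTS v_orders_clean AS
SELECT  order_id,
        item_id,
        CAST(amount AS DOUBLE) AS amount
FROM    raw_orders
WHERE   status = 'PAID';
```

Downstream jobs then read v_orders_clean directly. The filter and cast run at query time, so no intermediate table has to be stored or refreshed, at the cost of repeating that work in every consumer.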

Conclusion

Performance tuning in ODPS is a core skill for data developers; thoughtful analysis often leads to simple parameter adjustments that cut costs and make the work more enjoyable.
