How MaxCompute’s Resource Advisor Cuts Costs by 60% for Large-Scale Data Workloads
This article details how GoTerra migrated from BigQuery to MaxCompute and used Resource Advisor, tiered quota strategies, and the TopN Fair scheduler to dynamically balance performance and cost across more than 10 accounts and over 100 quota groups, achieving up to a 60% cost reduction.
GoTerra, a leading Southeast Asian internet group, migrated its data platform from Google BigQuery to Alibaba Cloud MaxCompute. The migration involved more than 10 accounts, 70+ projects, and over 100 quota groups, requiring fine‑grained resource management to balance performance with cost.
Key Challenges
Complex multi‑business resource coordination: the platform spans many accounts and projects, and each quota group needs reserved compute units (CUs), which can sit idle and add cost pressure.
Billing model differences: MaxCompute combines prepaid CUs with time‑based elastic resources, so without historical data it is hard to predict how many CUs each workload needs.
Conflicting resource demands: long‑running ETL jobs need sustained throughput, while latency‑sensitive BI jobs need quick, short‑lived resources, and the two contend for the same scheduler.
Resource Advisor and Tiered Quota Strategy
Resource Advisor introduces a layered configuration that mixes prepaid, scheduled elastic, and automatic elastic quotas, enabling flexible, cost‑effective resource allocation.
Flexible composition: prepaid, time‑based elastic, and auto‑elastic resources can be combined in any mix to meet diverse workload needs.
Cost efficiency: the auto‑elastic portion is billed by actual usage, which is cheaper than reserving capacity for peak load.
Out of the box: load‑aware automatic scaling with simple configuration.
Seconds‑level elasticity: MaxCompute can scale within seconds, far faster than BigQuery’s scaling windows.
Resource stability: historical data and predictive models guide scheduling to keep elastic inventory stable.
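To make the layering concrete, here is a minimal Python sketch of how one quota group's demand might be served from the three tiers. The `TieredQuota` class, its fields, and the fill order are hypothetical illustrations, not a MaxCompute API. The sketch assumes prepaid and scheduled (time‑based) elastic CUs are paid for whether or not they are used, so they are consumed before any usage‑billed auto‑elastic CUs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TieredQuota:
    """Hypothetical model of one quota group; not a MaxCompute API."""
    prepaid_cu: int  # subscription capacity, always available
    # Time-based elastic windows as (start_hour, end_hour, extra_cu), end exclusive.
    scheduled: List[Tuple[int, int, int]] = field(default_factory=list)
    auto_elastic_max_cu: int = 0  # cap on usage-billed burst capacity

    def scheduled_cu(self, hour: int) -> int:
        # Sum every scheduled window that covers this hour of the day.
        return sum(cu for start, end, cu in self.scheduled if start <= hour < end)

    def allocate(self, hour: int, demand_cu: int) -> Dict[str, int]:
        # Assumption: prepaid and scheduled CUs are paid for anyway, so use them
        # first and only burst into auto-elastic CUs for whatever is left.
        prepaid = min(demand_cu, self.prepaid_cu)
        remaining = demand_cu - prepaid
        scheduled = min(remaining, self.scheduled_cu(hour))
        remaining -= scheduled
        auto = min(remaining, self.auto_elastic_max_cu)
        remaining -= auto
        return {"prepaid": prepaid, "scheduled": scheduled,
                "auto": auto, "unmet": remaining}


# Illustrative numbers only: 100 prepaid CUs, +200 CUs from 00:00 to 06:00 for
# nightly ETL, and up to 150 auto-elastic CUs for daytime BI spikes.
quota = TieredQuota(prepaid_cu=100, scheduled=[(0, 6, 200)], auto_elastic_max_cu=150)
print(quota.allocate(hour=2, demand_cu=350))
# -> {'prepaid': 100, 'scheduled': 200, 'auto': 50, 'unmet': 0}
print(quota.allocate(hour=14, demand_cu=180))
# -> {'prepaid': 100, 'scheduled': 0, 'auto': 80, 'unmet': 0}
```

The `unmet` field marks where a real system would queue jobs or raise the auto‑elastic cap; the fill order is one reasonable choice under the billing assumption above, not a documented MaxCompute rule.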
Intelligent Resource Recommendation
The Resource Advisor tool predicts next‑day CU demand from historical job logs, CPU and memory consumption, and the SLA requirements of each job type (ETL or BI).
Data collection: runtime, CPU, memory, and concurrency metrics are gathered from job history.
Job classification: a model automatically distinguishes ETL jobs from BI jobs.
Baseline prediction: linear regression estimates baseline demand, and a 10‑20% elastic buffer is added to absorb load spikes.
Feedback loop: actual consumption is compared with the prediction every day to refine model parameters.
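The prediction loop can be sketched as follows. This is a simplified illustration, not the Resource Advisor implementation: it assumes a single input series of daily peak CU usage with a time‑trend feature (the real tool also uses runtime, CPU, memory, concurrency, and job type), and the `update_buffer` rule is a hypothetical stand‑in for the unspecified parameter refinement. Only the linear regression and the 10‑20% buffer range come from the text.

```python
import math
from typing import List, Tuple


def fit_line(ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b*x, with x = 0, 1, ..., n-1 (n >= 2)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recommend_cu(daily_peak_cu: List[float], buffer: float) -> int:
    """Forecast tomorrow's peak CU from the trend line, then add an elastic buffer."""
    intercept, slope = fit_line(daily_peak_cu)
    baseline = intercept + slope * len(daily_peak_cu)  # x = n is "tomorrow"
    # round() strips float noise (e.g. 575.0000000001) before taking the ceiling.
    return math.ceil(round(baseline * (1 + buffer), 6))


def update_buffer(buffer: float, recommended: int, actual: float,
                  step: float = 0.02, lo: float = 0.10, hi: float = 0.20) -> float:
    """Hypothetical daily feedback: widen after a shortfall, tighten otherwise."""
    if actual > recommended:
        buffer += step  # under-provisioned: jobs would have queued
    else:
        buffer -= step  # enough headroom: trim idle CU
    return round(min(hi, max(lo, buffer)), 4)  # stay inside the 10-20% band


history = [400, 420, 440, 460, 480]       # illustrative daily peak CU
cu = recommend_cu(history, buffer=0.15)  # trend gives 500, plus 15% -> 575
buffer = update_buffer(0.15, recommended=cu, actual=600)  # shortfall -> 0.17
```

Clamping the buffer keeps the feedback loop from drifting outside the range the article describes; a production model would refine the regression itself, not just the buffer.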
TopN Fair Scheduling
To handle mixed workloads in which long‑running ETL jobs coexist with latency‑sensitive BI jobs, a new TopN Fair scheduling strategy was introduced.
Design goals: guarantee a minimum concurrency for long jobs (JobMinimumConcurrency), let quota groups lend spare capacity to short jobs, and prioritize by job type and submission time.
Key parameter: JobMinimumConcurrency, the minimum number of concurrent units each job must receive (for example, 10).
TopN Fair policy: jobs are ordered by submission time, and the quota group's capacity is allocated to the first N jobs that can each obtain at least JobMinimumConcurrency units.
Dynamic N: N is recalculated so that the cumulative demand of the selected jobs does not exceed the total quota capacity multiplied by a concurrency factor, which prevents a few jobs from monopolizing resources.
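The selection and allocation steps can be sketched in Python as below. This is one possible reading of the policy, not MaxCompute's scheduler code: the `Job` type, the `concurrency_factor` value, the rule of always admitting the oldest job, and the round‑robin sharing of leftover units are assumptions made for illustration. Only JobMinimumConcurrency (here `min_conc`), submission‑time ordering, and the capacity‑times‑factor bound come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    job_id: str
    submit_time: float  # earlier submissions are served first
    demand: int         # concurrency units the job could use right now


def select_top_n(jobs: List[Job], capacity: int, min_conc: int,
                 concurrency_factor: float) -> List[Job]:
    """Admit jobs in submission order while each can still get min_conc units
    and the admitted jobs' total demand stays within capacity * factor."""
    selected: List[Job] = []
    total_demand = 0
    for job in sorted(jobs, key=lambda j: j.submit_time):
        fits_minimum = (len(selected) + 1) * min_conc <= capacity
        fits_demand = total_demand + job.demand <= capacity * concurrency_factor
        # Assumption: the oldest job is always admitted so it cannot starve.
        if selected and not (fits_minimum and fits_demand):
            break
        selected.append(job)
        total_demand += job.demand
    return selected


def allocate(jobs: List[Job], capacity: int, min_conc: int,
             concurrency_factor: float) -> Dict[str, int]:
    selected = select_top_n(jobs, capacity, min_conc, concurrency_factor)
    alloc = {job.job_id: 0 for job in jobs}  # unselected jobs wait with 0 units
    remaining = capacity
    # Phase 1: guarantee min_conc (or the job's whole demand, if smaller).
    for job in selected:
        grant = min(job.demand, min_conc, remaining)
        alloc[job.job_id] = grant
        remaining -= grant
    # Phase 2: share leftover units one at a time, round-robin in submission order.
    hungry = [job for job in selected if alloc[job.job_id] < job.demand]
    while remaining > 0 and hungry:
        for job in list(hungry):
            if remaining == 0:
                break
            alloc[job.job_id] += 1
            remaining -= 1
            if alloc[job.job_id] == job.demand:
                hungry.remove(job)
    return alloc


jobs = [Job("etl1", 0, 30), Job("etl2", 1, 20), Job("bi1", 2, 5), Job("bi2", 3, 8)]
print(allocate(jobs, capacity=40, min_conc=10, concurrency_factor=1.5))
# -> {'etl1': 18, 'etl2': 17, 'bi1': 5, 'bi2': 0}
```

In this example `bi2` waits because admitting it would raise the selected jobs' total demand to 63 units, above 40 × 1.5 = 60. The sketch orders purely by submission time; boosting BI jobs by job type, as the design goals mention, is left out.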
Results
Applying Resource Advisor and TopN Fair yielded significant improvements:
Overall cluster job count decreased by 15.7% and 95th‑percentile latency dropped 45.7%.
For well‑tuned quota groups, job count fell 31.3% and latency fell 75.4%.
Monthly cost fell to roughly 40% of the original BigQuery spend, a reduction of about 60%.
Conclusion and Outlook
With the AutoScaleQuota product now generally available (GA), MaxCompute can automatically adjust quotas based on real‑time business load, removing the need for manual intervention and addressing resource shortages during traffic spikes. Future work will analyze job execution patterns within each quota group, configure JobMinimumConcurrency automatically, and switch scheduling strategies dynamically to further improve utilization while targeting 99.99% availability.
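As a rough illustration of load‑aware quota adjustment, the Python sketch below sizes a quota so that current demand runs at a target utilization, bounded below by the prepaid CUs and above by a configured maximum. `AutoScalePolicy`, `target_pct`, and the sizing rule are hypothetical; the article does not describe how AutoScaleQuota makes its decisions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AutoScalePolicy:
    prepaid_cu: int       # floor: committed capacity is always kept
    max_cu: int           # ceiling: prepaid plus the largest allowed elastic burst
    target_pct: int = 70  # hypothetical target utilization, in percent


def desired_cu(policy: AutoScalePolicy, demand_cu: int) -> Tuple[int, int]:
    """Return (total_cu, elastic_cu) so demand runs at roughly target_pct utilization."""
    # Integer ceiling division avoids float rounding surprises.
    wanted = (demand_cu * 100 + policy.target_pct - 1) // policy.target_pct
    total = max(policy.prepaid_cu, min(policy.max_cu, wanted))
    return total, total - policy.prepaid_cu


policy = AutoScalePolicy(prepaid_cu=100, max_cu=400)
print(desired_cu(policy, 50))   # light load, stays at the prepaid floor -> (100, 0)
print(desired_cu(policy, 140))  # spike, scales out with elastic CUs -> (200, 100)
print(desired_cu(policy, 350))  # capped at max_cu -> (400, 300)
```

Re‑evaluating this on every load sample gives seconds‑level scale‑out; a real controller would also add hysteresis so the quota does not oscillate when demand hovers near a threshold.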
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
