What Happens Behind the Scenes When a SQL Query Runs in a Big Data Platform?
This article walks through the end‑to‑end lifecycle of a SQL task in a big‑data environment, covering creation, scheduling metadata, instance generation, resource allocation, ODPS execution, and final processing on the Fuxi distributed engine.
As a newcomer to big data development, the author explores the complete lifecycle of a SQL task, from creation in the DataPhin IDE to execution on the ODPS platform.
1. Overall Process
The article begins with a flow diagram illustrating the end‑to‑end steps of a SQL job, using a simple example that counts prize distribution per activity.
2. Task Development and Deployment
In the IDE, a new offline periodic task is created on the DataPhin development page, and the following SQL is written:
    SELECT prize_id,
           COUNT(*) AS prize_send_cnt_1d
    FROM apcdm.dwd_ap_mkt_eqt_send_di
    WHERE dt = '${bizdate}'
      AND prize_id IN ('PZ169328936', 'PZ169298703')
    GROUP BY prize_id;

After writing the SQL, scheduling metadata such as task ID, name, node type, and owner are configured, along with parameters like business date, cron expression, and dependency information (see table).
Basic Information: Task node ID, name, type, owner
Scheduling Parameters: Biz date (previous day), schedule time, etc.
Scheduling Attributes: Instance generation, type, effective dates, retry policy, period, cron expression
Dependencies: Upstream/downstream node relationships
Node Context: Input and output parameters
Execution Info: Engine and resource group
The task is then submitted, optionally passing a smoke test in the development environment before publishing.
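To make the role of the `${bizdate}` scheduling parameter concrete, here is a minimal Python sketch of how a scheduler might render the SQL before submitting it. It is not DataPhin's actual implementation: the `render_sql` helper and the `yyyymmdd` date format are assumptions for illustration, and only the "previous day" rule comes from the table above.

```python
from datetime import date, timedelta
from string import Template

# The SQL from the article, kept as a template with the ${bizdate} placeholder.
SQL_TEMPLATE = Template("""
SELECT prize_id,
       COUNT(*) AS prize_send_cnt_1d
FROM apcdm.dwd_ap_mkt_eqt_send_di
WHERE dt = '${bizdate}'
  AND prize_id IN ('PZ169328936', 'PZ169298703')
GROUP BY prize_id;
""")


def render_sql(run_date: date) -> str:
    """Hypothetical helper: fill in ${bizdate} for one scheduled run."""
    # Assumption: bizdate is the day before the run date, formatted as yyyymmdd.
    bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    return SQL_TEMPLATE.substitute(bizdate=bizdate)


print(render_sql(date(2024, 1, 2)))
```

Each scheduled instance would get its own rendered statement, so the same task definition reads a different `dt` partition every day.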
3. Instance Generation
At the scheduled time (e.g., 22:00), the Phoenix scheduler compiles task definitions into executable instances, builds a DAG based on lineage and time dependencies, and resolves cron expressions.
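The dependency-ordering part of this step can be sketched with Python's standard `graphlib` module (Python 3.9+). This is only an illustration of building a DAG from upstream relationships and deriving a valid run order. The node names other than the table from the example SQL are hypothetical, and the real scheduler also handles time dependencies and cron resolution, which are not modeled here.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of upstream nodes it depends on.
# "ods_prize_send_log" and "prize_report" are hypothetical node names.
dependencies = {
    "dwd_ap_mkt_eqt_send_di": {"ods_prize_send_log"},
    "prize_send_cnt_1d": {"dwd_ap_mkt_eqt_send_di"},
    "prize_report": {"prize_send_cnt_1d"},
}


def run_order(deps):
    """Return one valid execution order in which upstream nodes come first."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

An instance becomes runnable only after every node listed before it in its dependency chain has succeeded, which is why a cycle in the lineage has to be rejected at this stage.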
4. Resource Allocation
The Alisa execution engine allocates slots from resource groups to the task. Gateways submit jobs to ODPS, and slot management ensures priority for critical business and multi‑tenant fairness.
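As a rough illustration of priority-aware slot management, the sketch below grants slots from a single resource group to the highest-priority requests first. The `ResourceGroup` class, the `allocate` function, and the task names are all hypothetical, and multi-tenant fairness is deliberately left out. It shows the general idea only, not how Alisa is implemented.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    """Hypothetical resource group with a fixed number of slots."""
    name: str
    total_slots: int
    used_slots: int = 0

    def try_acquire(self, slots: int) -> bool:
        # Grant the request only if enough free slots remain.
        if self.used_slots + slots > self.total_slots:
            return False
        self.used_slots += slots
        return True

    def release(self, slots: int) -> None:
        # Return slots to the pool when a task finishes.
        self.used_slots = max(0, self.used_slots - slots)


def allocate(group, requests):
    """requests: list of (task_name, priority, slots); higher priority is served first."""
    granted, waiting = [], []
    for name, _priority, slots in sorted(requests, key=lambda r: -r[1]):
        (granted if group.try_acquire(slots) else waiting).append(name)
    return granted, waiting


group = ResourceGroup(name="default", total_slots=3)
granted, waiting = allocate(
    group,
    [("adhoc_query", 1, 2), ("core_report", 9, 2), ("small_task", 5, 1)],
)
print(granted, waiting)
```

In this run the critical `core_report` task is served first and the low-priority ad hoc query waits until slots are released.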
5. ODPS Job Execution
Submitted jobs enter ODPS’s control layer (Worker, Scheduler, Executor). The Scheduler creates instances, breaks them into tasks, and places them in a priority queue. Executors poll the queue, receive tasks, and perform SQL parsing followed by logical and physical planning.
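The queue-and-poll interaction described above can be sketched with a heap-based priority queue. The `TaskQueue` class and `execute` function are hypothetical names, and the planning stages are placeholders that only record the order of work. This is not the ODPS control layer's actual code.

```python
import heapq
import itertools


class TaskQueue:
    """Hypothetical priority queue: highest priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves insertion order

    def put(self, priority, task):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), task))

    def poll(self):
        # Return the next task, or None when the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def execute(task):
    # Placeholder for the stages named in the text: parsing, then logical
    # planning, then physical planning.
    return [f"{stage}:{task}" for stage in ("parse", "logical_plan", "physical_plan")]


queue = TaskQueue()
queue.put(1, "adhoc_sql")
queue.put(5, "prize_count_sql")
queue.put(5, "daily_report_sql")

# An executor loop: keep polling until the queue is drained.
processed = []
while (task := queue.poll()) is not None:
    processed.append(execute(task))
print([steps[0] for steps in processed])
```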
6. Physical Execution on Fuxi
The physical plan is transformed into a DAG of Fuxi tasks. The Fuxi Master schedules these tasks on agents, which launch worker processes that read data, perform computation, and write results back.
When all workers finish, the result is written back, the Application Master reports completion, resources are released, and the task status is updated to SUCCESS, ready for the next scheduling cycle.
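For the example query, one plausible shape of the physical plan is a two-stage DAG: workers first filter and partially count their own data partitions, then a second stage merges the partial counts per `prize_id`. The sketch below assumes that shape for illustration; the source does not show the plan Fuxi actually produces, and the sample rows (including `PZ000000001`) are made up.

```python
from collections import Counter

# The prize IDs from the WHERE clause of the example SQL.
TARGET_IDS = {"PZ169328936", "PZ169298703"}


def map_stage(partition):
    """Stage 1: one worker filters its partition and produces partial counts."""
    return Counter(pid for pid in partition if pid in TARGET_IDS)


def reduce_stage(partials):
    """Stage 2: merge the partial counts from all workers (the GROUP BY)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Hypothetical data: each inner list stands for the prize_id column of one partition.
partitions = [
    ["PZ169328936", "PZ169298703", "PZ000000001"],
    ["PZ169328936", "PZ169328936"],
]

result = reduce_stage(map_stage(p) for p in partitions)
print(result)
```

Counting partially on each worker keeps the data shuffled between stages small, since only one number per prize ID leaves each partition.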
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
