Mastering Inceptor UI: Analyze Jobs and Stages for Better Performance
This guide explains how to use Inceptor's management UI—especially the Jobs and Cluster tabs—to monitor execution metrics, interpret stage details, read DAG visualizations, and diagnose performance issues in large‑scale SQL workloads.
Inceptor Management UI Overview
The Inceptor management interface (served on port 4040) provides a dashboard for monitoring system performance and the execution status of each machine, executor, and task. When analyzing a query, focus on execution time, errors, executor health, executor ID continuity (gaps in the ID sequence usually mean executors were lost and relaunched), and whether configuration changes actually take effect.
Top Navigation Tabs
The UI header contains seven tabs: Jobs, Cluster, Local, Storage, Holodesk, Environment, and Executors. This article covers the Jobs and Cluster tabs; other tabs will be described in future posts.
Cluster Tab Details
The Cluster page shows all SQL statements executed in cluster mode, broken down into stages and tasks. Each stage is a basic execution unit composed of multiple tasks.
Key columns displayed for each stage include the following (a scripted way to read the same fields appears after the list):
Stage Id: Identifier of the stage.
Description: Full SQL statement for the stage, with a link to detailed task information.
Submitted: Submission timestamp.
Duration: How long the stage has run so far.
Tasks Succeeded/Total: Number of successful tasks versus total tasks.
Input: Amount of data read from Hadoop or Spark storage.
Shuffle Read and Shuffle Write: Amount of data shuffled between stages.
Failure Reason: Reason for stage failure, if any.
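If you prefer to track these columns programmatically rather than in the browser, the sketch below polls the UI over HTTP. It assumes the Inceptor UI exposes the Spark-style monitoring REST API under /api/v1 on port 4040; the host name and JSON field names are assumptions to verify against your deployment.

# Minimal sketch: read per-stage metrics over HTTP instead of the browser.
# ASSUMPTION: the Inceptor UI on port 4040 exposes the Spark-style
# monitoring REST API under /api/v1; verify against your deployment.
import requests

UI = "http://inceptor-host:4040/api/v1"  # hypothetical host name

def list_stages():
    # In a single-application UI, the first entry is the running app.
    app_id = requests.get(f"{UI}/applications").json()[0]["id"]
    for s in requests.get(f"{UI}/applications/{app_id}/stages").json():
        print(
            s["stageId"],
            s["status"],
            f"{s.get('numCompleteTasks', 0)}/{s.get('numTasks', 0)} tasks",
            f"input={s.get('inputBytes', 0)}B",
            f"shuffleRead={s.get('shuffleReadBytes', 0)}B",
            f"shuffleWrite={s.get('shuffleWriteBytes', 0)}B",
        )

if __name__ == "__main__":
    list_stages()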
Stages are categorized into three types:
Active Stages: Currently running stages; their task counts may change in real time.
Completed Stages: Finished stages with stable metrics; the succeeded task count equals the total.
Failed Stages: Stages that failed, with detailed failure reasons displayed.
Each category’s current count is shown at the top left of the Cluster page.
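Building on the same assumed endpoint as the previous sketch, a few lines can reproduce those per-category counts and surface failure reasons for failed stages.

# Tally stages by status, mirroring the Active/Completed/Failed counts
# shown at the top left of the Cluster page, and print failure reasons.
# Same assumed Spark-style /api/v1 endpoint as in the previous sketch.
from collections import Counter

import requests

UI = "http://inceptor-host:4040/api/v1"  # hypothetical host name
app_id = requests.get(f"{UI}/applications").json()[0]["id"]
stages = requests.get(f"{UI}/applications/{app_id}/stages").json()

print(Counter(s["status"] for s in stages))  # e.g. ACTIVE / COMPLETE / FAILED
for s in stages:
    if s["status"] == "FAILED":
        print(s["stageId"], s.get("failureReason", "no reason recorded"))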
Jobs Tab Details
The Jobs page aggregates information at the job level; each job consists of multiple stages. It displays six main fields: Job Id, Description, Submitted, Duration, Stages (Succeeded/Total), and Tasks across all stages (Succeeded/Total). Clicking the Description link opens a detailed view of all stages belonging to that job.
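The same kind of scripted check works one level up. A short sketch, again assuming the Spark-style /api/v1 endpoint and its JSON field names, prints the Jobs-tab fields for each job:

# Sketch: per-job summary mirroring the Jobs tab columns (assumed endpoint).
import requests

UI = "http://inceptor-host:4040/api/v1"  # hypothetical host name
app_id = requests.get(f"{UI}/applications").json()[0]["id"]
for j in requests.get(f"{UI}/applications/{app_id}/jobs").json():
    print(
        f"job {j['jobId']} [{j['status']}]",
        f"stages {j.get('numCompletedStages', 0)}/{len(j.get('stageIds', []))}",
        f"tasks {j.get('numCompletedTasks', 0)}/{j.get('numTasks', 0)}",
    )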
A DAG visualization button in the top‑left corner renders the job’s directed acyclic graph, illustrating the order of RDD transformations and helping pinpoint performance bottlenecks.
SQL Example and DAG Analysis
The following SQL query is used as a case study. It defines two common table expressions (ssci and csci) in a WITH clause, applies map-join hints to date_dim, and performs several aggregations over a full outer join.
WITH ssci AS (
  SELECT /*+MAPJOIN(date_dim)*/ ss_customer_sk AS customer_sk, ss_item_sk AS item_sk
  FROM store_sales, date_dim
  WHERE ss_sold_date_sk = d_date_sk
    AND d_month_seq BETWEEN 1212 AND 1212 + 11
  GROUP BY ss_customer_sk, ss_item_sk
),
csci AS (
  SELECT /*+MAPJOIN(date_dim)*/ cs_bill_customer_sk AS customer_sk, cs_item_sk AS item_sk
  FROM catalog_sales, date_dim
  WHERE cs_sold_date_sk = d_date_sk
    AND d_month_seq BETWEEN 1212 AND 1212 + 11
  GROUP BY cs_bill_customer_sk, cs_item_sk
)
SELECT
  SUM(CASE WHEN ssci.customer_sk IS NOT NULL AND csci.customer_sk IS NULL THEN 1 ELSE 0 END) AS store_only,
  SUM(CASE WHEN ssci.customer_sk IS NULL AND csci.customer_sk IS NOT NULL THEN 1 ELSE 0 END) AS catalog_only,
  SUM(CASE WHEN ssci.customer_sk IS NOT NULL AND csci.customer_sk IS NOT NULL THEN 1 ELSE 0 END) AS store_and_catalog
FROM ssci FULL OUTER JOIN csci ON (ssci.customer_sk = csci.customer_sk AND ssci.item_sk = csci.item_sk)
LIMIT 100;
The UI splits this query into two jobs. The first job prepares date_dim for the map-join; the second performs the main processing across multiple stages and transformations. The DAG view shows six stages (IDs 7864-7869), each rendered as a vertical stack of RDD operations. Stages 7865 and 7866 handle the map-joins with store_sales and catalog_sales, stages 7867 and 7868 perform the reductions, stage 7869 executes the common join, and stage 7864 finalizes the output. This execution order matches the logical flow of the SQL statement.
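To confirm a split like this without clicking through the UI, the jobs endpoint (if exposed, as assumed above) reports which stage IDs each job owns:

# Sketch: list which stage IDs belong to each job, to reproduce the
# two-job split described above (same assumed /api/v1 endpoint).
import requests

UI = "http://inceptor-host:4040/api/v1"  # hypothetical host name
app_id = requests.get(f"{UI}/applications").json()[0]["id"]
for j in requests.get(f"{UI}/applications/{app_id}/jobs").json():
    print(f"job {j['jobId']}: stages {sorted(j.get('stageIds', []))}")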
Practical Tips for Monitoring
On the Cluster page, watch for excessively long stage durations, high failure counts, large shuffle read/write volumes, and abnormal task counts; a scripted version of these checks appears after the tips.
On the Jobs page, monitor overall job duration, failure rates, and the distribution of succeeded versus total stages and tasks.
Use the DAG visualization to map stages to specific SQL operators, helping locate performance hotspots.
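As a starting point for automating these checks, the sketch below flags stages that breach simple thresholds for shuffle volume and task failures. The endpoint, field names, and threshold values are all assumptions to adapt to your environment.

# Sketch: flag stages that breach simple thresholds for shuffle volume
# and task failures. Same assumed Spark-style /api/v1 endpoint; the
# thresholds are arbitrary examples to tune per workload.
import requests

UI = "http://inceptor-host:4040/api/v1"  # hypothetical host name
MAX_SHUFFLE_BYTES = 10 * 1024 ** 3       # 10 GiB, example value
MAX_FAILED_TASKS = 0

app_id = requests.get(f"{UI}/applications").json()[0]["id"]
for s in requests.get(f"{UI}/applications/{app_id}/stages").json():
    shuffle = s.get("shuffleReadBytes", 0) + s.get("shuffleWriteBytes", 0)
    if shuffle > MAX_SHUFFLE_BYTES:
        print(f"stage {s['stageId']}: heavy shuffle ({shuffle} bytes)")
    if s.get("numFailedTasks", 0) > MAX_FAILED_TASKS:
        print(f"stage {s['stageId']}: {s['numFailedTasks']} failed tasks")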
Future articles will dive deeper into interpreting metric values and applying optimizations based on these observations.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era | [email protected]