Master 10 Advanced SQL Concepts Every Data Scientist Should Know
This article presents ten essential advanced SQL techniques—including CTEs, recursive CTEs, temporary functions, CASE‑WHEN pivots, EXCEPT vs NOT IN, self‑joins, ranking windows, delta calculations, cumulative totals, and date‑time manipulation—each explained with clear examples and code snippets for interview preparation.
1. Common Table Expressions (CTEs)
CTEs let you define a named, temporary result set that the rest of the query can reference, which makes complex sub‑queries easier to read and more modular. Consider first a query written with nested sub‑queries. It finds people living in Toronto, Canada whose salary is at least the average female salary:
SELECT name, salary
FROM People
WHERE name IN (SELECT DISTINCT name FROM population WHERE country = 'Canada' AND city = 'Toronto')
  AND salary >= (SELECT AVG(salary) FROM salaries WHERE gender = 'Female');
The same query rewritten with two CTEs, each defined once at the top and then referenced by name:
WITH toronto_ppl AS (
    SELECT DISTINCT name
    FROM population
    WHERE country = 'Canada' AND city = 'Toronto'
),
avg_female_salary AS (
    SELECT AVG(salary) AS avgSalary
    FROM salaries
    WHERE gender = 'Female'
)
SELECT name, salary
FROM People
WHERE name IN (SELECT name FROM toronto_ppl)
  AND salary >= (SELECT avgSalary FROM avg_female_salary);
2. Recursive CTEs
Recursive CTEs reference themselves, similar to recursive functions in programming, and are useful for hierarchical data such as organization charts or file‑system trees. They consist of three parts:
Anchor query – returns the base rows.
Recursive member – repeatedly joins the CTE to itself.
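Termination condition – stops the recursion, either through an explicit predicate in the recursive member or implicitly once that member returns no new rows.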
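The three parts above can be seen in a minimal sketch that generates the numbers 1 to 5. It runs the SQL through Python's built-in sqlite3 module against an in-memory database. The CTE name counter is hypothetical, and here the termination condition is an explicit WHERE predicate:

```python
import sqlite3

# In-memory SQLite database; no tables are needed for this example.
conn = sqlite3.connect(":memory:")

query = """
WITH RECURSIVE counter(n) AS (
    SELECT 1          -- anchor query: the base row
    UNION ALL
    SELECT n + 1      -- recursive member: builds on the previous step's rows
    FROM counter
    WHERE n < 5       -- termination condition: no new rows once n reaches 5
)
SELECT n FROM counter ORDER BY n;
"""

numbers = [row[0] for row in conn.execute(query)]
print(numbers)  # [1, 2, 3, 4, 5]
```

Without the WHERE n < 5 predicate, the recursive member would keep producing rows indefinitely, which is why every recursive CTE needs some way for a step to come back empty.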
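Example that builds an organizational hierarchy, starting from staff members with no manager and adding their direct reports level by level. PostgreSQL, MySQL 8.0+, and SQLite require the RECURSIVE keyword; SQL Server and Oracle use a plain WITH: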
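WITH RECURSIVE org_structure AS (
    -- anchor query: staff members with no manager
    SELECT id, manager_id
    FROM staff_members
    WHERE manager_id IS NULL
    UNION ALL
    -- recursive member: direct reports of the rows found in the previous step;
    -- recursion ends when a step finds no further reports
    SELECT sm.id, sm.manager_id
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
)
SELECT id, manager_id FROM org_structure;
3. Temporary Functions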
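Temporary functions encapsulate reusable logic, much like functions in Python, which improves readability and avoids repeating the same expression. The CREATE TEMPORARY FUNCTION syntax used here is BigQuery's, and such a function exists only for the query or script that defines it. Start with the mapping from employee tenure to seniority level written as a plain CASE expression: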
SELECT name,
       CASE
         WHEN tenure < 1 THEN 'analyst'
         WHEN tenure BETWEEN 1 AND 3 THEN 'associate'  -- BETWEEN is inclusive; CASE stops at the first match, so 3 maps to 'associate'
         WHEN tenure BETWEEN 3 AND 5 THEN 'senior'
         WHEN tenure > 5 THEN 'vp'
         ELSE 'n/a'
       END AS seniority
FROM employees;
The same logic can be defined once as a temporary function and then called by name:
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
  CASE
    WHEN tenure < 1 THEN 'analyst'
    WHEN tenure BETWEEN 1 AND 3 THEN 'associate'
    WHEN tenure BETWEEN 3 AND 5 THEN 'senior'
    WHEN tenure > 5 THEN 'vp'
    ELSE 'n/a'
  END
);
SELECT name, get_seniority(tenure) AS seniority FROM employees;
4. Pivoting with CASE WHEN
CASE WHEN, wrapped in an aggregate such as SUM() and grouped by ID, can pivot rows into columns. In the original example, a table with one row per ID and month becomes a single row per ID, with a separate revenue column for each month.
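A minimal sketch of the pattern, run through Python's built-in sqlite3 module. The original input table is not reproduced here, so the table name revenue_by_month, the month labels, and the revenue figures are hypothetical stand-ins. Each CASE WHEN keeps one month's revenue, and SUM() with GROUP BY id collapses the rows into one per id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical long-format input: one row per (id, month).
conn.executescript("""
CREATE TABLE revenue_by_month (id INTEGER, revenue INTEGER, month TEXT);
INSERT INTO revenue_by_month VALUES
  (1, 8000, 'Jan'), (1, 7000, 'Feb'), (1, 6000, 'Mar'),
  (2, 9000, 'Jan'),
  (3, 10000, 'Feb');
""")

# CASE WHEN yields the revenue only for the matching month (NULL otherwise);
# SUM ignores NULLs, so a month with no data comes out as NULL (None in Python).
query = """
SELECT id,
       SUM(CASE WHEN month = 'Jan' THEN revenue END) AS jan_revenue,
       SUM(CASE WHEN month = 'Feb' THEN revenue END) AS feb_revenue,
       SUM(CASE WHEN month = 'Mar' THEN revenue END) AS mar_revenue
FROM revenue_by_month
GROUP BY id
ORDER BY id;
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Each output column must be listed explicitly, so this approach suits a small, known set of pivot values.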
(Original figure: the input table with columns id, revenue, and month, and the resulting pivoted table.)
5. EXCEPT vs NOT IN
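Both compare the results of two queries, but they behave differently. EXCEPT returns the distinct rows of the first query that do not appear in the second, comparing whole rows and treating NULLs as equal to each other. NOT IN filters the first query row by row and keeps its duplicates. If the sub‑query returns even one NULL, x NOT IN (…) evaluates to NULL (unknown) instead of TRUE for every row without a match, so the query silently returns no rows. Filtering NULLs out of the sub‑query, or using NOT EXISTS, avoids this trap.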
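The difference is easiest to see on a tiny example. This sketch runs in SQLite through Python's sqlite3 module. The single-column tables a and b are hypothetical, and b deliberately contains a NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (1), (2), (3);
INSERT INTO b VALUES (2), (NULL);
""")

# EXCEPT: distinct rows of a that are not in b.
except_rows = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()

# NOT IN against a sub-query containing NULL: rows with no match evaluate to
# NULL (unknown) instead of TRUE, so nothing passes the filter.
not_in_rows = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b) ORDER BY x"
).fetchall()

# NOT IN with NULLs filtered out: behaves as expected but keeps duplicates from a.
not_in_filtered = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b WHERE x IS NOT NULL) ORDER BY x"
).fetchall()

print(except_rows)      # [(1,), (3,)]
print(not_in_rows)      # []
print(not_in_filtered)  # [(1,), (1,), (3,)]
```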
6. Self Joins
A self‑join joins a table to itself, useful when hierarchical relationships are stored in a single table. Example finds employees whose salary exceeds that of their manager:
SELECT a.Name AS Employee
FROM Employee a
JOIN Employee b ON a.ManagerID = b.Id   -- b is the manager row of employee a
WHERE a.Salary > b.Salary;
7. RANK, DENSE_RANK, ROW_NUMBER
SQL provides three window functions for ranking rows:
ROW_NUMBER() – a unique sequential number for each row, even when values tie.
RANK() – the same rank for ties, leaving gaps after them (1, 2, 2, 4).
DENSE_RANK() – the same rank for ties, with no gaps (1, 2, 2, 3).
Example query on a student grades table:
SELECT Name, GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC) AS row_num,   -- order among tied GPAs is arbitrary
       RANK()       OVER (ORDER BY GPA DESC) AS rank_num,
       DENSE_RANK() OVER (ORDER BY GPA DESC) AS dense_rank_num
FROM student_grades;
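To see how the three functions treat ties, here is a sketch on a hypothetical four-row student_grades table, run through Python's sqlite3 module (window functions need SQLite 3.25 or newer). Name is added to the ROW_NUMBER ordering so that the numbering of the tied rows is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Ben and Cy tie on GPA.
conn.executescript("""
CREATE TABLE student_grades (Name TEXT, GPA REAL);
INSERT INTO student_grades VALUES
  ('Ana', 3.9), ('Ben', 3.7), ('Cy', 3.7), ('Dee', 3.5);
""")

query = """
SELECT Name, GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC, Name) AS row_num,
       RANK()       OVER (ORDER BY GPA DESC)       AS rank_num,
       DENSE_RANK() OVER (ORDER BY GPA DESC)       AS dense_rank_num
FROM student_grades
ORDER BY GPA DESC, Name;
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
# ('Ana', 3.9, 1, 1, 1)
# ('Ben', 3.7, 2, 2, 2)
# ('Cy', 3.7, 3, 2, 2)
# ('Dee', 3.5, 4, 4, 3)
```

Dee shows the contrast: RANK() skips to 4 after the tie, while DENSE_RANK() continues with 3.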
8. Calculating Deltas
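The window functions LAG() and LEAD() read a value from a previous or following row in the window's order, and subtracting that value from the current row gives the change between rows. The examples below compute month‑over‑month and year‑over‑year sales deltas. LAG(sales, 12) only means "the same month last year" if the table has exactly one row per month with no gaps: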
SELECT month, sales,
       sales - LAG(sales, 1) OVER (ORDER BY month) AS month_delta
FROM monthly_sales;

SELECT month, sales,
       sales - LAG(sales, 12) OVER (ORDER BY month) AS yoy_delta
FROM monthly_sales;
9. Cumulative Totals
The SUM() window function with OVER (ORDER BY …) produces a running total:
SELECT Month, Revenue,
       SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue;
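Deltas and running totals are often computed side by side. This sketch applies both the LAG() month-over-month delta from section 8 and the cumulative SUM() above to a hypothetical monthly_revenue table in SQLite (3.25 or newer, via Python's sqlite3 module). Month is stored as 'YYYY-MM' text so that sorting it as a string is also chronological:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, one row per month.
conn.executescript("""
CREATE TABLE monthly_revenue (Month TEXT, Revenue INTEGER);
INSERT INTO monthly_revenue VALUES
  ('2024-01', 100), ('2024-02', 120), ('2024-03', 90), ('2024-04', 150);
""")

query = """
SELECT Month, Revenue,
       -- LAG returns NULL for the first row, so its delta is NULL too
       Revenue - LAG(Revenue, 1) OVER (ORDER BY Month) AS month_delta,
       -- running total of all rows up to and including the current one
       SUM(Revenue) OVER (ORDER BY Month)               AS Cumulative
FROM monthly_revenue
ORDER BY Month;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2024-01', 100, None, 100)
# ('2024-02', 120, 20, 220)
# ('2024-03', 90, -30, 310)
# ('2024-04', 150, 60, 460)
```

With ORDER BY and no explicit frame, SUM() uses the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that share the same Month value receive the same running total. Writing ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW accumulates strictly row by row instead.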
10. Date‑Time Manipulation
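Common date functions include DATE_ADD, DATE_SUB, DATE_TRUNC, and DATEDIFF, though names and argument order vary by dialect. The following MySQL example uses DATEDIFF(a, b), which returns a minus b in days, in a self‑join that finds days whose temperature is higher than the previous day's: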
SELECT a.Id
FROM Weather a
JOIN Weather b ON DATEDIFF(a.RecordDate, b.RecordDate) = 1   -- b is the day before a
WHERE a.Temperature > b.Temperature;
Mastering these ten concepts equips data‑science candidates to tackle a wide range of SQL interview questions with confidence.
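DATEDIFF is MySQL‑specific, so the section 10 query may not run as written in other engines. A minimal sketch of the same self‑join in SQLite, via Python's built-in sqlite3 module, swaps in SQLite's DATE(x, '-1 day') modifier. The Weather rows are hypothetical sample data, with one date deliberately missing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Weather (Id INTEGER PRIMARY KEY, RecordDate TEXT, Temperature INTEGER);
INSERT INTO Weather VALUES
  (1, '2024-03-01', 10),
  (2, '2024-03-02', 25),
  (3, '2024-03-03', 20),
  (4, '2024-03-04', 30),
  (5, '2024-03-06', 40);  -- 2024-03-05 is missing
""")

# Self-join each day (a) to the row dated exactly one calendar day earlier (b).
# DATE(x, '-1 day') stands in for MySQL's DATEDIFF(a, b) = 1.
query = """
SELECT a.Id
FROM Weather a
JOIN Weather b ON b.RecordDate = DATE(a.RecordDate, '-1 day')
WHERE a.Temperature > b.Temperature
ORDER BY a.Id;
"""
warmer_ids = [row[0] for row in conn.execute(query)]
print(warmer_ids)  # [2, 4]
```

Because the join requires the exact previous calendar date, a day whose predecessor is missing (2024-03-06 here) is never compared. LAG() would instead compare it with whichever row happens to come before it.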
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
