10 Advanced SQL Concepts Every Data Scientist Should Master
This guide walks through ten advanced SQL techniques: CTEs, recursive CTEs, temporary functions, CASE WHEN pivots, EXCEPT vs NOT IN, self‑joins, ranking functions, delta calculations with LAG/LEAD, cumulative sums, and date‑time manipulation. Together they help data professionals handle interview challenges and write cleaner, more powerful queries.
1. Common Table Expressions (CTEs)
CTEs let you break complex sub‑queries into named temporary result sets that the main query can reuse, making queries easier to read and maintain. Consider this query, which nests two sub‑queries in its WHERE clause:
SELECT
    name,
    salary
FROM
    People
WHERE
    name IN (SELECT DISTINCT name FROM population WHERE country = 'Canada' AND city = 'Toronto')
    AND salary >= (
        SELECT AVG(salary)
        FROM salaries
        WHERE gender = 'Female'
    );
Rewritten with CTEs, each sub‑query gets a name and the main query becomes easier to follow:
WITH toronto_ppl AS (
    SELECT DISTINCT name FROM population WHERE country = 'Canada' AND city = 'Toronto'
), avg_female_salary AS (
    SELECT AVG(salary) AS avgSalary FROM salaries WHERE gender = 'Female'
)
SELECT name, salary
FROM People
WHERE name IN (SELECT name FROM toronto_ppl)
    AND salary >= (SELECT avgSalary FROM avg_female_salary);
2. Recursive CTEs
Recursive CTEs reference themselves, similar to recursive functions in programming languages, and are ideal for traversing hierarchical data such as org charts or file systems. A recursive CTE has three parts:
Anchor member – returns the base rows.
Recursive member – joins the CTE to itself to produce the next level.
Termination condition – stops the recursion; by default it ends when the recursive member returns no new rows.
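The three parts can be seen working together in a small runnable sketch. It uses Python's built-in sqlite3 module with an in-memory database; the five sample rows and the extra level column are assumptions made for illustration and are not part of the original article.

```python
import sqlite3

# Hypothetical data: id 1 sits at the top of the hierarchy (no manager).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff_members (id INTEGER PRIMARY KEY, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO staff_members (id, manager_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)],
)

query = """
WITH RECURSIVE org_structure AS (
    -- Anchor member: rows with no manager form level 0
    SELECT id, manager_id, 0 AS level
    FROM staff_members
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: attach the direct reports of rows found so far
    SELECT sm.id, sm.manager_id, os.level + 1
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
    -- Termination: the join produces no new rows once the leaves are reached
)
SELECT id, manager_id, level FROM org_structure ORDER BY level, id
"""
org_rows = conn.execute(query).fetchall()
for row in org_rows:
    print(row)
```

Each printed tuple is (id, manager_id, level), so the level column shows how many recursive steps were needed to reach that employee.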
Example that retrieves each employee’s manager ID:
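-- RECURSIVE is required in PostgreSQL, MySQL, and BigQuery; SQL Server omits the keyword
WITH RECURSIVE org_structure AS (
    -- Anchor member: top-level staff with no manager
    SELECT id, manager_id FROM staff_members WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: staff who report to someone already in the result
    SELECT sm.id, sm.manager_id
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
)
SELECT id, manager_id FROM org_structure;
3. Temporary Functions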
Temporary (inline) functions let you encapsulate reusable logic within a query, improving readability and avoiding repetition.
SELECT name,
       CASE WHEN tenure < 1 THEN 'analyst'
            WHEN tenure BETWEEN 1 AND 3 THEN 'associate'
            WHEN tenure BETWEEN 3 AND 5 THEN 'senior'  -- tenure = 3 already matched 'associate' above
            WHEN tenure > 5 THEN 'vp'
            ELSE 'n/a'
       END AS seniority
FROM employees;
Using a temporary function (BigQuery syntax shown):
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
    CASE WHEN tenure < 1 THEN 'analyst'
         WHEN tenure BETWEEN 1 AND 3 THEN 'associate'
         WHEN tenure BETWEEN 3 AND 5 THEN 'senior'
         WHEN tenure > 5 THEN 'vp'
         ELSE 'n/a'
    END
);
SELECT name, get_seniority(tenure) AS seniority FROM employees;
4. Pivoting Data with CASE WHEN
CASE WHEN can be used to transform rows into columns. Example: turning a month column into separate revenue columns for each month.
-- Input table
+----+---------+-------+
| id | revenue | month |
+----+---------+-------+
|  1 |    8000 | Jan   |
|  2 |    9000 | Jan   |
|  3 |   10000 | Feb   |
|  1 |    7000 | Feb   |
|  1 |    6000 | Mar   |
+----+---------+-------+
-- Pivot query
SELECT
    id,
    MAX(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
    MAX(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
    MAX(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
FROM sales
GROUP BY id;
5. EXCEPT vs NOT IN
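Both operators compare two result sets, but they work differently. EXCEPT compares whole rows, so both queries must return the same number of columns with compatible types; it removes duplicates and returns rows that appear in the first result but not the second. NOT IN tests a value from each row against a subquery or list and keeps duplicates. They also differ on NULLs: EXCEPT treats two NULLs as matching, whereas NOT IN returns no rows at all if the subquery yields a NULL, because the comparison then evaluates to unknown rather than true.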
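The NULL difference is easy to miss, so here is a minimal sketch using Python's built-in sqlite3 module. The all_users and banned tables and their contents are hypothetical, chosen only so that the first table has a duplicate and the second contains a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_users (name TEXT);
CREATE TABLE banned (name TEXT);
INSERT INTO all_users VALUES ('ann'), ('ann'), ('bob'), ('cy');
INSERT INTO banned VALUES ('bob'), (NULL);
""")

# EXCEPT: set difference on whole rows, duplicates removed.
except_rows = conn.execute(
    "SELECT name FROM all_users EXCEPT SELECT name FROM banned ORDER BY name"
).fetchall()

# NOT IN against a subquery that contains NULL: no row can evaluate to true.
not_in_rows = conn.execute(
    "SELECT name FROM all_users WHERE name NOT IN (SELECT name FROM banned)"
).fetchall()

# NOT IN with NULLs filtered out: per-row test, duplicates kept.
not_in_filtered = conn.execute(
    "SELECT name FROM all_users "
    "WHERE name NOT IN (SELECT name FROM banned WHERE name IS NOT NULL) "
    "ORDER BY name"
).fetchall()

print(except_rows)      # [('ann',), ('cy',)]
print(not_in_rows)      # []
print(not_in_filtered)  # [('ann',), ('ann',), ('cy',)]
```

Filtering NULLs out of the subquery, as in the third query, is one common way to make NOT IN behave as expected; NOT EXISTS is another.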
6. Self‑Join
A self‑join links a table to itself, useful when hierarchical relationships are stored in a single table.
Example: find employees whose salary exceeds their manager’s salary.
SELECT a.Name AS Employee
FROM Employee a
JOIN Employee b ON a.ManagerID = b.Id
WHERE a.Salary > b.Salary;
7. Rank, Dense_Rank, and Row_Number
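These window functions number rows according to an ordering column and differ only in how they treat ties. ROW_NUMBER assigns a unique sequential number to every row, RANK gives tied rows the same number and then skips ahead (1, 1, 3), and DENSE_RANK gives tied rows the same number without leaving gaps (1, 1, 2).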
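Ties are where the three functions diverge, which a small sketch can make concrete. It runs on Python's built-in sqlite3 module and assumes a bundled SQLite of version 3.25 or later (the first release with window functions). The four students and their GPAs are made up for illustration, and name is added as a tiebreaker for ROW_NUMBER so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_grades (name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student_grades VALUES (?, ?)",
    [("Ana", 3.9), ("Ben", 3.7), ("Cal", 3.7), ("Dee", 3.5)],  # Ben and Cal tie
)

ranked = conn.execute("""
SELECT name,
       ROW_NUMBER() OVER (ORDER BY gpa DESC, name) AS row_num,
       RANK()       OVER (ORDER BY gpa DESC)       AS gpa_rank,
       DENSE_RANK() OVER (ORDER BY gpa DESC)       AS gpa_dense_rank
FROM student_grades
ORDER BY row_num
""").fetchall()

for row in ranked:
    print(row)
# ('Ana', 1, 1, 1)
# ('Ben', 2, 2, 2)
# ('Cal', 3, 2, 2)
# ('Dee', 4, 4, 3)
```

After the tie, RANK jumps to 4 for the last student while DENSE_RANK continues with 3.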
-- Aliases avoid the names rank and dense_rank, which are reserved words in MySQL 8
SELECT Name,
       GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC) AS row_num,
       RANK() OVER (ORDER BY GPA DESC) AS gpa_rank,
       DENSE_RANK() OVER (ORDER BY GPA DESC) AS gpa_dense_rank
FROM student_grades;
8. Calculating Deltas with LAG/LEAD
LAG and LEAD let you compare a row’s value with a previous or next row, useful for month‑over‑month or year‑over‑year differences.
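As a quick check of how the two functions behave at the edges of the data, here is a minimal sketch on Python's built-in sqlite3 module (it assumes SQLite 3.25 or later for window functions). The four months of sales figures are hypothetical, and month is stored as a sortable 'YYYY-MM' string so that ORDER BY month is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2023-01", 100), ("2023-02", 130), ("2023-03", 90), ("2023-04", 90)],
)

deltas = conn.execute("""
SELECT month,
       sales,
       sales - LAG(sales, 1) OVER (ORDER BY month)  AS prev_delta,  -- change since last month
       LEAD(sales, 1) OVER (ORDER BY month) - sales AS next_delta   -- change into next month
FROM monthly_sales
ORDER BY month
""").fetchall()

for row in deltas:
    print(row)
# ('2023-01', 100, None, 30)
# ('2023-02', 130, 30, -40)
# ('2023-03', 90, -40, 0)
# ('2023-04', 90, 0, None)
```

The first row has no previous month and the last row has no next month, so LAG and LEAD return NULL there and the subtraction yields NULL (None in Python).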
-- Compare each month's sales to the previous month
SELECT month,
       sales,
       sales - LAG(sales, 1) OVER (ORDER BY month) AS month_delta
FROM monthly_sales;
-- Compare each month's sales to the same month last year
-- (assumes month sorts chronologically, with exactly one row per month and no gaps)
SELECT month,
       sales,
       sales - LAG(sales, 12) OVER (ORDER BY month) AS year_delta
FROM monthly_sales;
9. Cumulative Totals
Use the SUM window function to compute running totals.
SELECT Month,
       Revenue,
       SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue;
10. Date‑Time Manipulation
Common functions for handling dates include EXTRACT, DATE_ADD, DATE_SUB, and DATE_TRUNC.
Example: find days where temperature is higher than the previous day.
-- DATEDIFF(a, b) with two arguments is MySQL syntax; it returns a minus b in days
SELECT a.Id
FROM Weather a
JOIN Weather b ON DATEDIFF(a.RecordDate, b.RecordDate) = 1
WHERE a.Temperature > b.Temperature;
Source: towardsdatascience.com/ten-advanced-sql-concepts-you-should-know-for-data-science-interviews-4d7015ec74b0
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
