Ten Advanced SQL Concepts Every Data‑Science Interview Candidate Should Master
This article explains ten advanced SQL techniques—including common table expressions, recursive CTEs, temporary functions, CASE‑WHEN pivots, EXCEPT vs NOT IN, self‑joins, ranking functions, delta calculations, cumulative sums, and date‑time manipulation—providing clear explanations and complete query examples to help readers ace data‑science interview questions.
Below is a concise guide to ten advanced SQL concepts that are frequently asked in data‑science interviews.
1. Common Table Expressions (CTEs)
CTEs let you define a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement, making queries more modular and easier to read.
Example:
SELECT name, salary
FROM People
WHERE NAME IN (
SELECT DISTINCT NAME
FROM population
WHERE country = "Canada" AND city = "Toronto"
)
AND salary >= (
SELECT AVG(salary)
FROM salaries
WHERE gender = "Female"
)The query uses a sub‑query in the WHERE clause; CTEs would simplify this by breaking it into named parts.
with toronto_ppl as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada" AND city = "Toronto"
), avg_female_salary as (
SELECT AVG(salary) as avgSalary
FROM salaries
WHERE gender = "Female"
)
SELECT name, salary
FROM People
WHERE name in (SELECT name FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary)CTEs improve readability and allow you to assign meaningful names such as toronto_ppl and avg_female_salary .
2. Recursive CTEs
A recursive CTE references itself, similar to a recursive function in programming, and is useful for hierarchical data such as organization charts or file‑system trees.
Recursive CTEs consist of three parts: an anchor query, a recursive member, and a termination condition.
Example – retrieving each employee’s manager chain:
with org_structure as (
SELECT id, manager_id
FROM staff_members
WHERE manager_id IS NULL
UNION ALL
SELECT sm.id, sm.manager_id
FROM staff_members sm
INNER JOIN org_structure os ON os.id = sm.manager_id
)3. Temporary Functions
Temporary (inline) functions let you encapsulate reusable logic inside a query, improving clarity and avoiding repetition.
Example – mapping tenure to seniority level using a CASE expression:
SELECT name,
CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 AND 3 THEN "associate"
WHEN tenure BETWEEN 3 AND 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END AS seniority
FROM employeesCreating a temporary function for the same logic:
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 AND 3 THEN "associate"
WHEN tenure BETWEEN 3 AND 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END
);
SELECT name, get_seniority(tenure) AS seniority
FROM employees;4. CASE WHEN Pivoting Data
CASE WHEN can also be used to pivot rows into columns. For example, turning a month column into separate revenue columns per month.
-- Input table
+----+---------+-------+
| id | revenue | month |
+----+---------+-------+
| 1 | 8000 | Jan |
| 2 | 9000 | Jan |
| 3 | 10000 | Feb |
| 1 | 7000 | Feb |
| 1 | 6000 | Mar |
+----+---------+-------+
-- Pivot query
SELECT id,
SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
FROM sales
GROUP BY id;5. EXCEPT vs NOT IN
Both operators compare two result sets, but they differ in handling duplicates and NULLs. EXCEPT removes duplicates and returns rows present in the first query but not the second, while NOT IN returns rows where the column value is not present in the sub‑query, with special NULL semantics.
6. Self‑Join
A self‑join joins a table to itself, useful for hierarchical relationships such as employees and their managers.
Example – find employees whose salary exceeds that of their manager:
SELECT a.Name AS Employee
FROM Employee a
JOIN Employee b ON a.ManagerID = b.Id
WHERE a.Salary > b.Salary;7. ROW_NUMBER, RANK, DENSE_RANK
These window functions assign ranking numbers to rows based on an ORDER BY clause.
ROW_NUMBER() gives a unique sequential number to each row.
RANK() assigns the same rank to ties and leaves gaps.
DENSE_RANK() also assigns the same rank to ties but without gaps.
SELECT Name, GPA,
ROW_NUMBER() OVER (ORDER BY GPA DESC) AS row_num,
RANK() OVER (ORDER BY GPA DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY GPA DESC) AS dense_rnk
FROM student_grades;8. Calculating Deltas
Use LAG() or LEAD() to compare a value with its previous (or next) row, e.g., month‑over‑month sales change.
# Compare each month’s sales to the previous month
SELECT month,
sales,
sales - LAG(sales, 1) OVER (ORDER BY month) AS delta_month
FROM monthly_sales;
# Compare each month’s sales to the same month last year
SELECT month,
sales,
sales - LAG(sales, 12) OVER (ORDER BY month) AS delta_year
FROM monthly_sales;9. Cumulative Totals
Window SUM() can compute running totals.
SELECT Month,
Revenue,
SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue;10. Date‑Time Manipulation
Typical interview tasks involve extracting parts of dates or comparing rows to the previous day.
SELECT a.Id
FROM Weather a, Weather b
WHERE a.Temperature > b.Temperature
AND DATEDIFF(a.RecordDate, b.RecordDate) = 1;Mastering these concepts will greatly improve your ability to solve real‑world SQL interview problems.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.