Databases 11 min read

10 Advanced SQL Concepts Every Data Scientist Should Master

This guide walks through ten essential advanced SQL concepts—including CTEs, recursive queries, temporary functions, CASE‑based pivoting, EXCEPT vs NOT IN, self‑joins, ranking functions, delta calculations, cumulative totals, and date‑time manipulation—providing clear explanations and runnable examples to help data‑science professionals ace interview challenges.

Liangxu Linux
Liangxu Linux
Liangxu Linux
10 Advanced SQL Concepts Every Data Scientist Should Master

1. Common Table Expressions (CTEs)

CTEs let you define a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement, making complex queries easier to read and maintain.

SELECT
    name,
    salary
FROM
    People
WHERE
    NAME IN (SELECT DISTINCT NAME FROM population WHERE country = "Canada" AND city = "Toronto")
  AND salary >= (
    SELECT AVG(salary)
    FROM salaries
    WHERE gender = "Female")

Using a CTE to break the same logic into reusable parts:

WITH toronto_ppl AS (
    SELECT DISTINCT name
    FROM population
    WHERE country = "Canada" AND city = "Toronto"
), avg_female_salary AS (
    SELECT AVG(salary) AS avgSalary
    FROM salaries
    WHERE gender = "Female"
)
SELECT name, salary
FROM People
WHERE name IN (SELECT name FROM toronto_ppl)
  AND salary >= (SELECT avgSalary FROM avg_female_salary);

2. Recursive CTEs

Recursive CTEs reference themselves, similar to recursive functions in programming languages, and are ideal for traversing hierarchical data such as organizational charts or graph structures.

Anchor member – returns the base rows.

Recursive member – repeatedly joins the CTE to itself.

Termination condition – stops the recursion.

WITH org_structure AS (
    SELECT id, manager_id
    FROM staff_members
    WHERE manager_id IS NULL
    UNION ALL
    SELECT sm.id, sm.manager_id
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
)

3. Temporary Functions

Temporary functions let you encapsulate reusable logic inside a query, improving readability and avoiding repetition.

Break code into smaller, maintainable blocks.

Promote clean, DRY SQL.

Reuse logic similarly to functions in programming languages.

SELECT name,
       CASE WHEN tenure < 1 THEN "analyst"
            WHEN tenure BETWEEN 1 AND 3 THEN "associate"
            WHEN tenure BETWEEN 3 AND 5 THEN "senior"
            WHEN tenure > 5 THEN "vp"
            ELSE "n/a"
       END AS seniority
FROM employees;

Using a temporary function to achieve the same result:

CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
   CASE WHEN tenure < 1 THEN "analyst"
        WHEN tenure BETWEEN 1 AND 3 THEN "associate"
        WHEN tenure BETWEEN 3 AND 5 THEN "senior"
        WHEN tenure > 5 THEN "vp"
        ELSE "n/a"
   END
);
SELECT name, get_seniority(tenure) AS seniority FROM employees;

4. CASE WHEN for Pivoting Data

CASE WHEN can be used to transform rows into columns, effectively pivoting data without a dedicated PIVOT clause.

Initial table:
+----+----------+-------+
| id | revenue  | month |
+----+----------+-------+
| 1  | 8000     | Jan   |
| 2  | 9000     | Jan   |
| 3  | 10000    | Feb   |
| 1  | 7000     | Feb   |
| 1  | 6000     | Mar   |
+----+----------+-------+

Result table:
+----+------------+------------+------------+-----+------------+
| id | Jan_Revenue| Feb_Revenue| Mar_Revenue| ... | Dec_Revenue|
+----+------------+------------+------------+-----+------------+
| 1  | 8000       | 7000       | 6000       | ... | null       |
| 2  | 9000       | null       | null       | ... | null       |
| 3  | null       | 10000      | null       | ... | null       |
+----+------------+------------+------------+-----+------------+

5. EXCEPT vs NOT IN

Both operators compare two result sets, but they differ subtly: EXCEPT removes duplicate rows and returns rows present in the first query but not in the second, while NOT IN checks for non‑membership on a column‑by‑column basis and can behave differently with NULL values.

6. Self‑Join

A self‑join joins a table to itself, useful when hierarchical relationships are stored in a single table.

+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+

SELECT a.Name AS Employee
FROM Employee a
JOIN Employee b ON a.ManagerID = b.Id
WHERE a.Salary > b.Salary;

7. Rank vs Dense_Rank vs Row_Number

These window functions assign ranking numbers to rows based on an ordering criterion, each handling ties differently.

SELECT Name,
       GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC) AS row_num,
       RANK()        OVER (ORDER BY GPA DESC) AS rank,
       DENSE_RANK() OVER (ORDER BY GPA DESC) AS dense_rank
FROM student_grades;

8. Calculating Delta Values

Use LAG or LEAD to compare a row with a previous or next row, enabling calculations such as month‑over‑month changes.

# Comparing each month's sales to last month
SELECT month,
       sales,
       sales - LAG(sales, 1) OVER (ORDER BY month) AS delta_month
FROM monthly_sales;

# Comparing each month's sales to the same month last year
SELECT month,
       sales,
       sales - LAG(sales, 12) OVER (ORDER BY month) AS delta_year
FROM monthly_sales;

9. Computing Cumulative Totals

The SUM() window function with an ORDER BY clause produces a running total.

SELECT Month,
       Revenue,
       SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue;

10. Date‑Time Manipulation

Common functions for handling dates include EXTRACT, DATE_ADD, DATE_SUB, and DATE_TRUNC. They allow grouping, formatting, and calculating differences between dates.

Extract components (year, month, day).

Adjust dates with DATE_ADD / DATE_SUB.

Truncate to a specific granularity with DATE_TRUNC.

Example: Find IDs of days where the temperature was higher than the previous day.

SELECT a.Id
FROM Weather a, Weather b
WHERE a.Temperature > b.Temperature
  AND DATEDIFF(a.RecordDate, b.RecordDate) = 1;
SQLdatabasedata scienceWindow FunctionsCTEAdvanced Queries
Liangxu Linux
Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.