10 Advanced MySQL Query Techniques Every Data Engineer Should Know
This article presents ten essential advanced MySQL concepts—including CTEs, recursive CTEs, temporary functions, CASE‑WHEN pivots, EXCEPT vs NOT IN, self‑joins, ranking window functions, delta calculations, running totals, and date‑time manipulation—each explained with clear examples and practical SQL snippets.
As data volumes grow, the demand for professionals fluent in SQL continues to rise, especially at the intermediate and advanced levels. Drawing on insights from StrataScratch founder Nathan Rosidi, the author lists ten crucial MySQL concepts for data-science interviews.
1. Common Table Expressions (CTEs)
CTEs, available in MySQL 8.0 and later, let you break complex queries into modular, named, temporary result sets, similar to dividing an article into sections. They simplify nested subqueries and improve readability. Consider a query with two nested subqueries:
SELECT name, salary
FROM People
WHERE name IN (
    SELECT DISTINCT name FROM population
    WHERE country = 'Canada' AND city = 'Toronto'
)
AND salary >= (
    SELECT AVG(salary) FROM salaries WHERE gender = 'Female'
);
A CTE version makes the same logic clearer:
WITH toronto_ppl AS (
    SELECT DISTINCT name FROM population WHERE country = 'Canada' AND city = 'Toronto'
), avg_female_salary AS (
    SELECT AVG(salary) AS avgSalary FROM salaries WHERE gender = 'Female'
)
SELECT name, salary
FROM People
WHERE name IN (SELECT name FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary);
2. Recursive CTEs
Recursive CTEs reference themselves, enabling hierarchical queries such as organizational charts or file‑system trees. They consist of three parts: an anchor query, a recursive member, and a termination condition.
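The three parts are easiest to see in a counter. The following is a minimal sketch, assuming MySQL 8.0 or later; the CTE name counter and the upper bound of 5 are illustrative choices, not part of the original article.

```sql
-- Minimal recursive CTE: generate the integers 1 through 5.
WITH RECURSIVE counter (n) AS (
    SELECT 1            -- anchor query: the starting row
    UNION ALL
    SELECT n + 1        -- recursive member: builds on the previous row
    FROM counter
    WHERE n < 5         -- termination condition: stop once n reaches 5
)
SELECT n FROM counter;
-- returns five rows: 1, 2, 3, 4, 5
```

The organizational-chart query that follows applies the same pattern to a hierarchy, where recursion ends once no further direct reports are found.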
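WITH RECURSIVE org_structure AS (
    -- Anchor: top-level staff with no manager
    SELECT id, manager_id FROM staff_members WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: staff reporting to rows already found;
    -- recursion stops when this join returns no new rows
    SELECT sm.id, sm.manager_id
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
)
SELECT id, manager_id FROM org_structure;
3. Temporary Functions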
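Temporary functions let you encapsulate reusable logic behind a name and call it from a query, keeping code clean and avoiding repetition. Without one, a classification such as the following has to be repeated wherever it is needed: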
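SELECT name,
       CASE WHEN tenure < 1 THEN 'analyst'
            WHEN tenure BETWEEN 1 AND 3 THEN 'associate'
            WHEN tenure BETWEEN 3 AND 5 THEN 'senior'  -- tenure = 3 already matched 'associate' above
            WHEN tenure > 5 THEN 'vp'
            ELSE 'n/a'
       END AS seniority
FROM employees;
MySQL has no CREATE TEMPORARY FUNCTION statement (that syntax, with its INT64 type, comes from BigQuery). In MySQL, a stored function that is dropped after use plays the same role, and the same logic becomes:
CREATE FUNCTION get_seniority(tenure INT)
RETURNS VARCHAR(10)
DETERMINISTIC
RETURN CASE WHEN tenure < 1 THEN 'analyst'
            WHEN tenure BETWEEN 1 AND 3 THEN 'associate'
            WHEN tenure BETWEEN 3 AND 5 THEN 'senior'
            WHEN tenure > 5 THEN 'vp'
            ELSE 'n/a'
       END;

SELECT name, get_seniority(tenure) AS seniority FROM employees;

-- Remove the function once it is no longer needed
DROP FUNCTION get_seniority;
4. CASE WHEN Pivoting Data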
Beyond conditional logic, CASE WHEN can pivot rows into columns. For a table with monthly revenue rows, the query produces one column per month.
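A minimal sketch of such a pivot, assuming a hypothetical table named monthly_revenue_demo loaded with the input rows shown in this section. The table name is an assumption; the column names follow the input table. Each CASE expression keeps the revenue for one month only, and SUM with GROUP BY collapses the rows for each id.

```sql
-- Hypothetical demo table; the table name is illustrative only.
CREATE TABLE monthly_revenue_demo (
    id      INT,
    revenue INT,
    month   VARCHAR(3)
);

INSERT INTO monthly_revenue_demo (id, revenue, month) VALUES
    (1, 8000, 'Jan'),
    (2, 9000, 'Jan'),
    (3, 10000, 'Feb'),
    (1, 7000, 'Feb'),
    (1, 6000, 'Mar');

-- One output column per month. A CASE without ELSE yields NULL,
-- and SUM over only NULLs is NULL, which gives the NULL cells in the output.
SELECT id,
       SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
       SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
       SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
FROM monthly_revenue_demo
GROUP BY id
ORDER BY id;
```

SUM is one reasonable choice of aggregate here; MAX would return the same result for this data because each id has at most one row per month.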
-- Input
+----+---------+-------+
| id | revenue | month |
+----+---------+-------+
| 1  | 8000    | Jan   |
| 2  | 9000    | Jan   |
| 3  | 10000   | Feb   |
| 1  | 7000    | Feb   |
| 1  | 6000    | Mar   |
+----+---------+-------+
-- Output
+----+-------------+-------------+-------------+
| id | Jan_Revenue | Feb_Revenue | Mar_Revenue |
+----+-------------+-------------+-------------+
| 1  | 8000        | 7000        | 6000        |
| 2  | 9000        | NULL        | NULL        |
| 3  | NULL        | 10000       | NULL        |
+----+-------------+-------------+-------------+
5. EXCEPT vs NOT IN
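Both operators compare rows between two queries, but they differ in several ways. EXCEPT (available in MySQL 8.0.31 and later) compares entire rows, removes duplicates, and returns rows present in the first query but not the second, so both queries must select the same number of columns. NOT IN filters rows where a column value does not appear in a subquery result, keeps duplicates, and normally compares a single column. NOT IN also returns no rows at all when the subquery result contains a NULL, whereas EXCEPT treats two NULLs as matching.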
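The difference is easiest to see on a small data set. The sketch below assumes MySQL 8.0.31 or later (for EXCEPT) and uses two hypothetical tables, except_demo_a and except_demo_b, whose names and values are made up for illustration.

```sql
-- Hypothetical demo tables; names and values are illustrative only.
CREATE TABLE except_demo_a (val INT);
CREATE TABLE except_demo_b (val INT);

INSERT INTO except_demo_a (val) VALUES (1), (2), (2), (3);
INSERT INTO except_demo_b (val) VALUES (3), (NULL);

-- EXCEPT: whole-row comparison, duplicates removed.
-- Returns the values 1 and 2, one row each.
SELECT val FROM except_demo_a
EXCEPT
SELECT val FROM except_demo_b;

-- NOT IN: because except_demo_b contains a NULL, the predicate is never TRUE
-- (it is FALSE for 3 and UNKNOWN for 1 and 2), so this returns no rows.
SELECT val FROM except_demo_a
WHERE val NOT IN (SELECT val FROM except_demo_b);

-- Filtering the NULL out of the subquery restores the rows and keeps duplicates:
-- returns 1, 2, 2.
SELECT val FROM except_demo_a
WHERE val NOT IN (SELECT val FROM except_demo_b WHERE val IS NOT NULL);
```

When the subquery column is nullable, adding the IS NOT NULL filter or switching to NOT EXISTS is a common way to avoid the empty result.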
6. Self‑Join
A self‑join links a table to itself, useful when hierarchical relationships are stored in a single table. Example: find employees whose salary exceeds that of their manager.
SELECT a.Name AS Employee
FROM Employee AS a
JOIN Employee AS b ON a.ManagerID = b.Id
WHERE a.Salary > b.Salary;
7. Rank vs Dense_Rank vs Row_Number
These window functions assign ranking numbers to rows. ROW_NUMBER() gives a unique sequential number, RANK() gives the same number to ties and leaves gaps, and DENSE_RANK() gives the same number to ties without gaps.
SELECT Name, GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC) AS row_num,
       RANK() OVER (ORDER BY GPA DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY GPA DESC) AS dense_rnk
FROM student_grades;
In the accompanying image, the difference between the three functions is illustrated: Daniel receives rank 3 with DENSE_RANK but rank 4 with RANK.
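To see the three functions side by side, here is a sketch using a hypothetical student_grades_demo table, assuming MySQL 8.0 or later. The names and GPAs are made up for illustration and are not taken from the image; Name is added as a tie-breaker inside ROW_NUMBER() only so that the numbering of the tied rows is deterministic.

```sql
-- Hypothetical demo data with one tie (Bob and Carol share a GPA).
CREATE TABLE student_grades_demo (
    Name VARCHAR(20),
    GPA  DECIMAL(3, 2)
);

INSERT INTO student_grades_demo (Name, GPA) VALUES
    ('Alice', 3.90),
    ('Bob', 3.70),
    ('Carol', 3.70),
    ('Daniel', 3.50);

SELECT Name, GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC, Name) AS row_num,
       RANK()       OVER (ORDER BY GPA DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY GPA DESC)       AS dense_rnk
FROM student_grades_demo
ORDER BY GPA DESC, Name;
-- Name    GPA   row_num  rnk  dense_rnk
-- Alice   3.90  1        1    1
-- Bob     3.70  2        2    2
-- Carol   3.70  3        2    2
-- Daniel  3.50  4        4    3
```

RANK() skips 3 after the two-way tie, while DENSE_RANK() continues with 3, which matches the Daniel case described above.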
8. Calculating Delta Values
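To compare values across periods, use LAG() to read an earlier row or LEAD() to read a later one. The queries below assume that month is a sortable date column with exactly one row per month and no gaps; otherwise LAG(sales, 12) would not land on the same month of the previous year. Examples:
-- Compare each month's sales to the previous month
SELECT month, sales,
       sales - LAG(sales, 1) OVER (ORDER BY month) AS month_over_month
FROM monthly_sales;
-- Compare each month's sales to the same month last year
SELECT month, sales,
       sales - LAG(sales, 12) OVER (ORDER BY month) AS year_over_year
FROM monthly_sales;
9. Running Totals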
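A windowed SUM() with an ORDER BY in its OVER clause computes a cumulative total: each row's value is the sum of all rows up to and including the current one (and any rows that share its Month value).
SELECT Month,
       Revenue,
       SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue;
10. Date/Time Manipulation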
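Common functions include DATE_ADD, DATE_SUB, and DATEDIFF. MySQL has no DATE_TRUNC function (that name comes from PostgreSQL and BigQuery); truncating a date d to the start of its month is usually done with DATE_FORMAT(d, '%Y-%m-01'). Example: find the rows where the temperature is higher than on the previous day.
SELECT a.Id
FROM Weather AS a
JOIN Weather AS b
  ON DATEDIFF(a.RecordDate, b.RecordDate) = 1  -- b is the day before a
WHERE a.Temperature > b.Temperature;
These ten techniques equip SQL practitioners with the tools needed for complex data-analysis tasks and interview challenges.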
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
