Master Essential SQL Techniques for Data Analyst Interviews
This article presents a comprehensive guide to essential SQL skills for data‑analysis positions, covering join operations, difference detection, duplicate removal, missing‑value handling, multi‑condition and fuzzy filtering, aggregation, conditional calculations, trimming, cost computation, and includes a collection of practical interview questions with sample queries.
SQL Techniques for Data‑Analysis Interviews
This summary collects a set of practical SQL patterns that are frequently tested in data‑analysis interview questions. The examples use a sample table sale (sales data) and illustrate how to perform joins, set operations, de‑duplication, missing‑value handling, filtering, aggregation, and derived‑column calculations.
1. Join (VLOOKUP‑style lookup)
Create a filtered copy of sale that contains only rows for Guangzhou, then join it back to the original table to retrieve the order number and profit columns.
CREATE TABLE sale_guang AS
SELECT * FROM sale WHERE city = '广州';
SELECT a.*
FROM sale a
INNER JOIN (
SELECT ordernum, profit FROM sale_guang
) b ON a.ordernum = b.ordernum;2. Find rows present in one table but not the other
Identify order numbers that exist in sale but are missing from sale_guang.
SELECT *
FROM sale a
WHERE a.ordernum NOT IN (
SELECT b.ordernum FROM sale_guang b
);3. Remove duplicate salesman codes
Delete rows whose salesman appears more than once. The sub‑query groups by salesman and keeps only those with a count greater than 1.
SELECT *
FROM sale
WHERE salesman NOT IN (
SELECT salesman
FROM sale
GROUP BY salesman
HAVING COUNT(*) > 1
);4. Missing‑value handling
Either replace NULL values in the city column with 0 or delete the rows entirely.
-- Replace NULL with 0
UPDATE sale SET city = 0 WHERE city IS NULL;
-- Delete rows with NULL city
DELETE FROM sale WHERE city IS NULL;5. Multi‑condition filtering
Retrieve orders sold by salesman "张爱" in Beijing with an order amount of at least 6000.
SELECT *
FROM sale
WHERE salesman = '张爱'
AND city = '北京'
AND orderaccount >= 6000;6. Fuzzy (LIKE) filtering
Select rows where the inventory name contains the substring "三星" or "索尼".
SELECT *
FROM sale
WHERE inventoryname LIKE '%三星%'
OR inventoryname LIKE '%索尼%';7. Grouped aggregation
Calculate total profit for each city; the example shows the result for Beijing.
SELECT city, SUM(profit) AS total_profit
FROM sale
WHERE city = '北京'
GROUP BY city;8. Conditional calculation (count, sum, average)
Count orders whose inventory name contains "三星" and whose tax exceeds 1000, then compute the sum and average profit for those orders.
-- Number of matching orders
SELECT COUNT(*)
FROM sale
WHERE inventoryname LIKE '%三星%'
AND tax > 1000;
-- Sum and average profit per order
SELECT ordernum,
SUM(profit) AS profit_sum,
AVG(profit) AS profit_avg
FROM sale
WHERE inventoryname LIKE '%三星%'
AND tax > 1000
GROUP BY ordernum;9. Trim whitespace
Remove leading and trailing spaces from the inventoryname column.
SELECT TRIM(inventoryname) AS inventoryname_trimmed
FROM sale;10. Derived column and sorting
Compute a cost column defined as Nontaxamount - profit and order the result from highest to lowest cost.
SELECT city,
ordernum,
(Nontaxamount - profit) AS cost
FROM sale
ORDER BY cost DESC;Typical Interview‑Style Questions
The following examples are common interview tasks that involve basic SELECT, WHERE, JOIN, GROUP BY and set‑operation clauses.
Student / Score tables
-- 1. List name, sex and class from the Student table
SELECT sname, ssex, class FROM student;
-- 2. Retrieve scores between 60 and 80
SELECT * FROM score WHERE degree BETWEEN 60 AND 80;
-- 3. Average score for classes 95033 and 95031
SELECT class, AVG(degree) AS avg_score
FROM score s
JOIN student st ON s.sno = st.sno
WHERE class IN (95033, 95031)
GROUP BY class;Mobile‑game company schema
-- Create a Student_new table with primary‑key constraint
CREATE TABLE Student_new (
sno VARCHAR(20) PRIMARY KEY,
sname VARCHAR(10),
ssex CHAR(2),
sage INT,
sdept VARCHAR(25)
);
-- Query computer‑science students ordered by student number
SELECT *
FROM student
WHERE sdept = '计算机'
ORDER BY sno;
-- Join three tables to find students with Ccredit = 5 and Grade > 60
SELECT a.sno, a.sname, a.ssex
FROM student a
JOIN (Course b, SC c) ON a.sno = c.sno AND b.cno = c.cno
WHERE b.Ccredit = 5 AND c.Grade > 60;Internet‑finance company set operations
-- Intersection of tables A and B
SELECT a.cus_id
FROM `表a` AS a
INNER JOIN `表b` AS b ON a.cus_id = b.cus_id;
-- Union of tables A and B (duplicates removed)
SELECT * FROM `表a`
UNION
SELECT * FROM `表b`;
-- Symmetric difference (rows in A not in B plus rows in B not in A)
SELECT * FROM `表a`
WHERE cus_id NOT IN (SELECT cus_id FROM `表b`)
UNION
SELECT * FROM `表b`
WHERE cus_id NOT IN (SELECT cus_id FROM `表a`);
-- Rows present in A but absent from B
SELECT * FROM `表a`
WHERE cus_id NOT IN (SELECT cus_id FROM `表b`);These patterns cover the essential SQL operations that interviewers expect candidates to know: table creation, primary‑key definition, filtering, joins, aggregation, set operations, and basic data‑cleansing techniques.
Big Data Tech Team
Focuses on big data, data analysis, data warehousing, data middle platform, data science, Flink, AI and interview experience, side‑hustle earning and career planning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
