Databases 11 min read

Where Does GROUP BY Fall Short? Using Window Functions to Bridge Advanced Data Analysis Gaps

This article reviews MySQL's GROUP BY fundamentals, highlights its limitations and execution order, then demonstrates how window functions—including ROW_NUMBER, RANK, DENSE_RANK, cumulative totals, moving averages, and TOP N techniques—can overcome those gaps for richer analytical queries.

Senior Xiao Ying
Senior Xiao Ying
Senior Xiao Ying
Where Does GROUP BY Fall Short? Using Window Functions to Bridge Advanced Data Analysis Gaps

GROUP BY Basics and HAVING

GROUP BY groups rows and applies aggregate functions such as SUM, COUNT, and AVG. Basic syntax:

SELECT col1, col2, AGG_FUNC(col3)
FROM table_name
GROUP BY col1, col2;

Multi‑level grouping can compute department‑wise statistics, e.g., employee count, average salary, and total salary, ordered by average salary descending.

HAVING for Post‑Group Filtering

HAVING filters grouped results after aggregation. Example:

SELECT department,
       COUNT(*) AS employee_count,
       AVG(salary) AS avg_salary,
       MIN(salary) AS min_salary,
       MAX(salary) AS max_salary
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department
HAVING avg_salary > 5000
   AND employee_count > 3
   AND MAX(salary) < 20000
ORDER BY avg_salary DESC;

Execution order: WHERE → GROUP BY → aggregate functions → HAVING.

Window Functions Overview

Window functions compute over a set of rows without collapsing them, preserving the original row count. Basic syntax:

function_name() OVER (
    [PARTITION BY col1, col2, ...]
    [ORDER BY col1 [ASC|DESC], col2, ...]
    [frame_clause]
);

Ranking Functions Comparison

ROW_NUMBER()

Assigns a unique sequential number within each partition, ordered by a specified column.

SELECT employee_id, name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;

RANK()

Gives the same rank to ties and skips subsequent ranks.

SELECT employee_id, name, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;

DENSE_RANK()

Gives the same rank to ties but does not skip ranks.

SELECT employee_id, name, department, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;

Comparative Example

SELECT employee_id, name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
       RANK()        OVER (ORDER BY salary DESC) AS rank_num,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num
FROM employees
WHERE department = 'IT'
ORDER BY salary DESC;

Result shows how the three functions differ for identical salary values.

Cumulative Calculations and Moving Averages

Running Total

SELECT salesperson_id, sale_month, monthly_amount,
       SUM(monthly_amount) OVER (
           PARTITION BY salesperson_id
           ORDER BY sale_month
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_amount
FROM monthly_sales
ORDER BY salesperson_id, sale_month;

The window frame accumulates sales from the first month up to the current row.

7‑Day Moving Average

SELECT sale_date, daily_amount,
       AVG(daily_amount) OVER (
           ORDER BY sale_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7days,
       AVG(daily_amount) OVER (
           ORDER BY sale_date
           RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW
       ) AS moving_avg_7days_range
FROM daily_sales
ORDER BY sale_date;
ROWS

counts rows; RANGE uses actual date intervals.

Window Frame Variants

SELECT sale_date, daily_amount,
       AVG(daily_amount) OVER (ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS avg_3days,
       SUM(daily_amount) OVER (ORDER BY sale_date ROWS UNBOUNDED PRECEDING) AS cumulative_total,
       MIN(daily_amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS min_last_3days
FROM daily_sales;

Solving TOP N Within Groups

ROW_NUMBER for Strict TOP N

WITH ranked_employees AS (
    SELECT employee_id, name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT *
FROM ranked_employees
WHERE salary_rank <= 3
ORDER BY department, salary_rank;

DENSE_RANK for Ties

WITH ranked_employees AS (
    SELECT employee_id, name, department, salary,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT *
FROM ranked_employees
WHERE salary_rank <= 3
ORDER BY department, salary_rank, employee_id;

Complex TOP N: Monthly Top‑3 Products

WITH monthly_product_sales AS (
    SELECT DATE_FORMAT(order_date, '%Y-%m') AS sale_month,
           product_id, product_name,
           SUM(amount) AS monthly_amount,
           COUNT(DISTINCT order_id) AS order_count,
           ROW_NUMBER() OVER (
               PARTITION BY DATE_FORMAT(order_date, '%Y-%m')
               ORDER BY SUM(amount) DESC
           ) AS monthly_rank
    FROM sales
    WHERE order_date >= '2023-01-01'
    GROUP BY DATE_FORMAT(order_date, '%Y-%m'), product_id, product_name
)
SELECT sale_month, product_id, product_name, monthly_amount, order_count, monthly_rank
FROM monthly_product_sales
WHERE monthly_rank <= 3
ORDER BY sale_month DESC, monthly_rank;

Advanced: Highest and Lowest Salaries per Department

WITH extreme_salaries AS (
    SELECT employee_id, name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS highest_rank,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary ASC)  AS lowest_rank
    FROM employees
    WHERE salary IS NOT NULL
)
SELECT employee_id, name, department, salary,
       CASE WHEN highest_rank <= 2 THEN '最高工资'
            WHEN lowest_rank  <= 2 THEN '最低工资'
       END AS salary_type
FROM extreme_salaries
WHERE highest_rank <= 2 OR lowest_rank <= 2
ORDER BY department, salary_type, salary DESC;

Key Takeaways

Execution Order : FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.

When to Use GROUP BY vs. Window Functions : GROUP BY reduces rows; window functions keep the original row count for richer analysis.

Choosing Ranking Functions : ROW_NUMBER for unique sequencing, RANK for gaps on ties, DENSE_RANK for continuous ties.

Performance Consideration : Use appropriate indexes to avoid full‑table scans.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLRankingMySQLWindow FunctionsGROUP BYTOP N
Senior Xiao Ying
Written by

Senior Xiao Ying

Dedicated to sharing Java backend technical experience and original tutorials, offering career transition advice and resume editing. Recognized as a rising star in CSDN's Java backend community and ranked Top 3 in the 2022 New Star Program for Java backend.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.