Databases 6 min read

Understanding the Execution Order of SQL Queries

The article explains that SQL queries are not executed in the order written—SELECT is evaluated fifth after FROM, WHERE, GROUP BY, and HAVING—illustrates this logical sequence with diagrams, discusses alias handling, optimizer reordering, and shows comparable patterns in LINQ, pandas, and dplyr.

Java Captain

Oct 28, 2019

Understanding the Execution Order of SQL Queries

Many SQL statements start with SELECT, but the logical execution order is different. The author investigated the order and found that SELECT is actually evaluated fifth, after FROM, WHERE, GROUP BY, and HAVING.

The accompanying diagram clarifies where clauses such as WHERE, GROUP BY, window functions, ORDER BY, and LIMIT fit in the logical sequence, answering common questions like whether you can filter after a window function or use WHERE after GROUP BY.

Database engines may reorder operations for performance, so the strict logical order is not always followed during execution.

Column aliases can be confusing: a query like

SELECT CONCAT(first_name, ' ', last_name) AS full_name, count(*)
FROM table
GROUP BY full_name

appears to use the alias in GROUP BY, but the engine can rewrite it as

SELECT CONCAT(first_name, ' ', last_name) AS full_name, count(*)
FROM table
GROUP BY CONCAT(first_name, ' ', last_name)

showing that GROUP BY is still logically evaluated before SELECT. The engine also performs checks before generating an execution plan.

Optimizations can change the physical order. For example, in

SELECT * FROM owners
LEFT JOIN cats ON owners.id = cats.owner
WHERE cats.name = 'mr darcy'

filtering WHERE cats.name = 'mr darcy' before the left join can avoid processing unnecessary rows, while still producing the same result.

Similar logical ordering appears in other query languages. A LINQ example follows the FROM…WHERE…SELECT pattern:

var teenAgerStudent = from s in studentList
                      where s.Age > 12 && s.Age < 20
                      select s;

Pandas code often mirrors this order, though it is not required:

df = thing1.join(thing2)      # JOIN
df = df[df.created_at > 1000] # WHERE
df = df.groupby('something', num_yes = ('yes', 'sum')) # GROUP BY
df = df[df.num_yes > 2]       # HAVING
df = df[['num_yes', 'something1', 'something']] # SELECT
df.sort_values('sometthing', ascending=True)[:30] # ORDER BY and LIMIT
df[:30]

The article concludes that while the logical order helps understand query semantics, actual execution may differ due to engine optimizations. The original post can be found at the provided link.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Optimization SQL Query Execution Databases Window Functions

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.