
Elegant Techniques for Group‑Then‑Sort and Top‑Record Retrieval in MySQL & Elasticsearch

The article walks through a common database requirement—grouping rows, sorting within each group, and extracting the first (or top N) records—by preparing sample data, comparing MySQL window functions, subqueries, and JOIN solutions, adding index optimizations, and demonstrating equivalent Elasticsearch aggregations, all backed by concrete performance measurements.

Shepherd Advanced Notes

Background

A common SQL requirement is to group rows, sort the rows within each group, and fetch the first (or first few) rows of every group, for example the row with the largest id for each name.

Preparation

A test table, tb_user, was created with columns such as id, name, and birthday, and loaded with more than 5 million rows to simulate a realistic workload.
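The original does not show the schema or the data generator. The following is a minimal sketch, assuming a three-column tb_user (id, name, birthday) and a small hypothetical name pool, that produces reproducible rows. Redirect the CSV output to a file and load it with MySQL's LOAD DATA INFILE (or batched INSERTs) to reach millions of rows. With only four names every group is very large; widen NAMES for a more realistic distribution.

```python
import csv
import random
import sys
from datetime import date, timedelta

# Hypothetical name pool: the two names queried in this article plus two fillers.
NAMES = ["徐千云", "李亿石", "张三", "王五"]


def generate_users(n, seed=42):
    """Yield (id, name, birthday) rows for a hypothetical tb_user table.

    ids are sequential from 1, mimicking an AUTO_INCREMENT primary key, so a
    larger id means a later insert. A fixed seed keeps the data reproducible.
    """
    rng = random.Random(seed)
    start = date(1970, 1, 1)
    for user_id in range(1, n + 1):
        name = rng.choice(NAMES)
        # Random birthday within roughly 40 years after 1970-01-01.
        birthday = start + timedelta(days=rng.randrange(365 * 40))
        yield (user_id, name, birthday.isoformat())


if __name__ == "__main__":
    # Write CSV to stdout; redirect to a file for MySQL's LOAD DATA INFILE.
    csv.writer(sys.stdout).writerows(generate_users(10))
```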

MySQL implementations

1. Window function (ROW_NUMBER())

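The query uses ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) to number the rows of each name from the largest id downward, then keeps only the rows with row_num = 1, i.e. the row with the largest id per name. Window functions require MySQL 8.0 or later.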

SELECT * FROM (
  SELECT id, name, birthday,
         ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS row_num
  FROM tb_user
  WHERE name IN ('徐千云','李亿石')
) AS u
WHERE u.row_num = 1;

Execution time (no index on name): 1.547 s
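To see the ranking logic without a MySQL server, here is a sketch that runs the same ROW_NUMBER() query against an in-memory SQLite database (SQLite 3.25 and later support window functions). The five sample rows are hypothetical, and the query selects explicit columns instead of *, so the helper row_num column is not returned.

```python
import sqlite3

# Hypothetical sample rows: (id, name, birthday).
ROWS = [
    (1, "徐千云", "1990-05-01"),
    (2, "李亿石", "1985-03-12"),
    (3, "徐千云", "1995-07-20"),
    (4, "李亿石", "1992-11-30"),
    (5, "张三", "2000-01-01"),
]

# Same shape as the MySQL 8.0 query: rank rows per name by id, keep rank 1.
QUERY = """
SELECT id, name, birthday FROM (
  SELECT id, name, birthday,
         ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS row_num
  FROM tb_user
  WHERE name IN (?, ?)
) AS u
WHERE u.row_num = 1
ORDER BY id
"""


def make_db():
    """Create an in-memory stand-in for tb_user and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb_user (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT)")
    conn.executemany("INSERT INTO tb_user VALUES (?, ?, ?)", ROWS)
    return conn


def latest_per_name(conn, names):
    """Return the row with the largest id for each of the two given names."""
    return conn.execute(QUERY, names).fetchall()


if __name__ == "__main__":
    print(latest_per_name(make_db(), ("徐千云", "李亿石")))
```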

2. Subquery

The inner query finds the maximum id for each name; the outer query then fetches the full rows whose id appears in that result.

SELECT id, name, birthday
FROM tb_user
WHERE id IN (
  SELECT MAX(id)
  FROM tb_user
  WHERE name IN ('徐千云','李亿石')
  GROUP BY name
);

Execution time (no index on name): 3.687 s
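The same IN-subquery pattern can be tried on the hypothetical SQLite stand-in below. Because id is unique, MAX(id) identifies exactly one row per name, so the result matches the window-function version. When no requested name exists, GROUP BY produces no groups and the query returns nothing.

```python
import sqlite3

# Hypothetical sample rows: (id, name, birthday).
ROWS = [
    (1, "徐千云", "1990-05-01"),
    (2, "李亿石", "1985-03-12"),
    (3, "徐千云", "1995-07-20"),
    (4, "李亿石", "1992-11-30"),
    (5, "张三", "2000-01-01"),
]

# Inner query: MAX(id) per name. Outer query: full rows for those ids.
QUERY = """
SELECT id, name, birthday
FROM tb_user
WHERE id IN (
  SELECT MAX(id)
  FROM tb_user
  WHERE name IN (?, ?)
  GROUP BY name
)
ORDER BY id
"""


def make_db():
    """Create an in-memory stand-in for tb_user and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb_user (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT)")
    conn.executemany("INSERT INTO tb_user VALUES (?, ?, ?)", ROWS)
    return conn


def latest_per_name(conn, names):
    """Return the row with the largest id for each of the two given names."""
    return conn.execute(QUERY, names).fetchall()


if __name__ == "__main__":
    print(latest_per_name(make_db(), ("徐千云", "李亿石")))
```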

3. JOIN with subquery

The same MAX(id) subquery is used as a derived table t and joined back to tb_user on id.

SELECT u.id, u.name, u.birthday
FROM tb_user u
INNER JOIN (
  SELECT MAX(id) AS max_id
  FROM tb_user
  WHERE name IN ('徐千云','李亿石')
  GROUP BY name
) t ON u.id = t.max_id;

Execution time (no index on name): 1.418 s
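A sketch of the JOIN variant against the same kind of hypothetical in-memory SQLite table. Only the way the MAX(id) results are consumed changes: they become a derived table that drives a join on the primary key instead of an IN filter.

```python
import sqlite3

# Hypothetical sample rows: (id, name, birthday).
ROWS = [
    (1, "徐千云", "1990-05-01"),
    (2, "李亿石", "1985-03-12"),
    (3, "徐千云", "1995-07-20"),
    (4, "李亿石", "1992-11-30"),
    (5, "张三", "2000-01-01"),
]

# Derived table t holds one max_id per name; join it back to tb_user by id.
QUERY = """
SELECT u.id, u.name, u.birthday
FROM tb_user u
INNER JOIN (
  SELECT MAX(id) AS max_id
  FROM tb_user
  WHERE name IN (?, ?)
  GROUP BY name
) t ON u.id = t.max_id
ORDER BY u.id
"""


def make_db():
    """Create an in-memory stand-in for tb_user and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb_user (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT)")
    conn.executemany("INSERT INTO tb_user VALUES (?, ?, ?)", ROWS)
    return conn


def latest_per_name(conn, names):
    """Return the row with the largest id for each of the two given names."""
    return conn.execute(QUERY, names).fetchall()


if __name__ == "__main__":
    print(latest_per_name(make_db(), ("徐千云", "李亿石")))
```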

4. Index optimization

Added index on name:

ALTER TABLE tb_user ADD INDEX idx_name (name) USING BTREE;

After indexing the timings became:

Window function: 0.026 s

Subquery: 2.229 s

JOIN: 0.014 s

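Even with the index, the IN subquery stays slow. MySQL cannot turn a subquery that contains GROUP BY or an aggregate such as MAX() into a semijoin, so the outer query most likely still passes over all of tb_user and checks each id against the subquery result, whereas the JOIN can start from the two max_id values and fetch them by primary key. When birthday is omitted from the outer select list, that pass can read idx_name alone (a covering index, because InnoDB secondary indexes also store the primary key id), reducing the time to 1.77 s.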
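The effect of idx_name on the inner MAX(id) ... GROUP BY name query can be observed in a query plan. This sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN, so the plan text is SQLite-specific, but the shift from a full scan to a search on idx_name illustrates the same idea. Because id is declared INTEGER PRIMARY KEY (the rowid), idx_name already holds every column this query reads, which SQLite typically reports as a covering index.

```python
import sqlite3

# Inner query shared by the subquery and JOIN variants.
INNER = "SELECT MAX(id) FROM tb_user WHERE name IN (?, ?) GROUP BY name"
PARAMS = ("徐千云", "李亿石")


def plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_user (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT)")

before = plan(conn, INNER, PARAMS)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_name ON tb_user (name)")
after = plan(conn, INNER, PARAMS)   # with the index: expect idx_name in the plan

print("before:", before)
print("after: ", after)
```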

Elasticsearch implementation

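The Elasticsearch version uses a terms aggregation to group documents by name and a top_hits sub-aggregation that sorts each bucket by birthday in descending order and returns the first document. Because it sorts by birthday rather than id, it returns the youngest user per name, not the row with the largest id as the MySQL queries do. Both the terms filter and the terms aggregation assume name is mapped as a keyword field; with default dynamic mapping, the exact-value field is name.keyword.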

GET user_info/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "name": ["徐千云","李亿石"] } }
      ]
    }
  },
  "aggs": {
    "group_by_name": {
      "terms": { "field": "name", "size": 1000 },
      "aggs": {
        "latest_user": {
          "top_hits": {
            "sort": [{ "birthday": { "order": "desc" } }],
            "_source": ["id","name","org_id","birthday"],
            "size": 1
          }
        }
      }
    }
  }
}
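To make the semantics of the request concrete, here is an in-memory Python approximation of terms + top_hits. It is a sketch under simplifying assumptions: a single shard (real terms doc_count values can be approximate across shards), hypothetical documents, and Python's stable sort in place of Elasticsearch's own tie-breaking within top_hits. Buckets follow the default terms order of doc_count descending, then key ascending.

```python
from collections import defaultdict

# Hypothetical documents shaped like the user_info index (org_id omitted).
DOCS = [
    {"id": 1, "name": "徐千云", "birthday": "1990-05-01"},
    {"id": 2, "name": "李亿石", "birthday": "1999-09-09"},
    {"id": 3, "name": "徐千云", "birthday": "1995-07-20"},
    {"id": 4, "name": "李亿石", "birthday": "1992-11-30"},
    {"id": 5, "name": "张三", "birthday": "2000-01-01"},
    {"id": 6, "name": "徐千云", "birthday": "1980-01-01"},
]


def terms_top_hits(docs, field, names, sort_field, buckets=1000, hits_per_bucket=1):
    """Approximate a filtered terms aggregation with a descending top_hits.

    Filter on `names`, bucket by `field`, keep at most `buckets` buckets and
    the top `hits_per_bucket` documents per bucket.
    """
    groups = defaultdict(list)
    for doc in docs:
        if doc[field] in names:  # the bool/filter/terms clause
            groups[doc[field]].append(doc)
    # Default terms ordering: doc_count descending, then key ascending.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:buckets]
    result = []
    for key, members in ordered:
        # top_hits sorted desc; ISO date strings sort chronologically.
        top = sorted(members, key=lambda d: d[sort_field], reverse=True)[:hits_per_bucket]
        result.append({"key": key, "doc_count": len(members), "hits": top})
    return result


if __name__ == "__main__":
    for bucket in terms_top_hits(DOCS, "name", {"徐千云", "李亿石"}, "birthday"):
        print(bucket["key"], bucket["doc_count"], bucket["hits"][0]["id"])
```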

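In the terms aggregation, size (1000 here) caps how many name buckets are returned, while size 1 in top_hits sets how many documents each bucket returns; the top-level "size": 0 suppresses the ordinary search hits. When there are too many groups to return in one response, a composite aggregation can page through the buckets instead.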
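A sketch of the composite variant, written as Python functions that build the request body and read the paging cursor; sending the request is left to whichever client you use, and no client API is assumed. Paging works by copying after_key from each response into the next request's after field until a page comes back with no buckets. The function names are hypothetical, and the fake responses in the test only mimic the relevant part of a real response.

```python
import json


def composite_page_request(names, after_key=None, page_size=100):
    """Build one page of a composite aggregation over name, with the same top_hits."""
    composite = {
        "size": page_size,
        "sources": [{"name": {"terms": {"field": "name"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {"name": list(names)}}]}},
        "aggs": {
            "group_by_name": {
                "composite": composite,
                "aggs": {
                    "latest_user": {
                        "top_hits": {
                            "sort": [{"birthday": {"order": "desc"}}],
                            "_source": ["id", "name", "org_id", "birthday"],
                            "size": 1,
                        }
                    }
                },
            }
        },
    }


def next_after_key(response):
    """Return the after_key for the next page, or None when paging is finished."""
    agg = response["aggregations"]["group_by_name"]
    if not agg.get("buckets"):
        return None
    return agg.get("after_key")


if __name__ == "__main__":
    body = composite_page_request(["徐千云", "李亿石"])
    print(json.dumps(body, ensure_ascii=False, indent=2))
```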

Discussion

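Changing the sort order in the window function or in the top_hits aggregation retrieves a different record per name. For example, ORDER BY birthday DESC in the window function returns the youngest user per name, which is what the Elasticsearch query already does, and filtering on row_num <= N instead of row_num = 1 returns the first N rows per group. The subquery-based approaches return no rows when the inner query is empty, that is, when none of the requested names exists.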
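A sketch of both adjustments on a hypothetical in-memory SQLite table: ordering by birthday DESC to get the youngest user, and row_num <= N to get the first N rows per name. The sample rows are chosen so that the youngest user differs from the row with the largest id. The extra id DESC key is an added tie-breaker; without a unique tie-breaker, rows with equal birthdays are numbered in no guaranteed order.

```python
import sqlite3

# Hypothetical rows chosen so that "youngest" differs from "largest id".
ROWS = [
    (1, "徐千云", "1990-05-01"),
    (2, "李亿石", "1999-09-09"),
    (3, "徐千云", "1995-07-20"),
    (4, "李亿石", "1992-11-30"),
    (6, "徐千云", "1980-01-01"),
]

# birthday DESC picks the youngest first; id DESC breaks ties deterministically.
QUERY = """
SELECT id, name, birthday, row_num FROM (
  SELECT id, name, birthday,
         ROW_NUMBER() OVER (PARTITION BY name ORDER BY birthday DESC, id DESC) AS row_num
  FROM tb_user
  WHERE name IN (?, ?)
) AS u
WHERE u.row_num <= ?
ORDER BY name, row_num
"""


def make_db():
    """Create an in-memory stand-in for tb_user and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb_user (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT)")
    conn.executemany("INSERT INTO tb_user VALUES (?, ?, ?)", ROWS)
    return conn


def top_n_per_name(conn, names, n):
    """Return up to n youngest users for each of the two given names."""
    return conn.execute(QUERY, (*names, n)).fetchall()


if __name__ == "__main__":
    for row in top_n_per_name(make_db(), ("徐千云", "李亿石"), 2):
        print(row)
```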

Conclusion

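For large-scale queries, the window function ROW_NUMBER() is the preferred solution, since it is concise and extends naturally to the top N rows per group. Where window functions are unavailable (MySQL before 8.0), the JOIN with a subquery is the best alternative; it was in fact marginally faster than the window function in both measurements, while the IN subquery was the slowest. Adding an index on the grouping column dramatically improves performance, from about 1.5 s to 0.026 s for the window function and 0.014 s for the JOIN. When data volume is massive, Elasticsearch can take over this workload using terms + top_hits aggregations.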

Git repository for the supporting Spring Boot starter: https://github.com/plasticene/plasticene-boot-starter-parent

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
