Backend Development 7 min read

Optimizing Large-Scale Data Queries with Multithreaded Java and MySQL Indexing

This article demonstrates how to accelerate a one‑million‑row database query by adding indexes and using a multithreaded Java Spring controller with MyBatis, showing code examples, performance tests, and suggestions for further optimization.

Selected Java Interview Questions

Aug 21, 2023

Optimizing Large-Scale Data Queries with Multithreaded Java and MySQL Indexing

Business Requirement

During an interview the HR asked: "Our company does data analysis and needs to query one million rows from the database for each analysis without using pagination. How would you optimize the SQL or Java code?"

A plain query takes more than five minutes, so the solution uses indexes combined with multithreading.

Database Design

Define Database Fields

Generate one million records.

Add indexes to the database.

I'm not very familiar with index optimization; any suggestions are welcome.

Code Implementation

Java Implementation

Controller class implementation:

package com.neu.controller;

import com.neu.mapper.UserMapper;
import com.neu.pojo.User;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.servlet.ModelAndView;

import javax.annotation.Resource;
import java.util.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * User query with multithreading controller
 * @author 薄荷蓝柠
 * @since 2023/6/6
 */
@Controller
public class ExecutorUtils {

    @Resource
    private UserMapper userMapper;

    // Maximum records processed per thread
    private static final int THREAD_COUNT_SIZE = 5000;

    @RequestMapping("Executor")
    public List<User> executeThreadPool() {
        // total record count
        Integer integer = userMapper.UserSum();
        long start = System.currentTimeMillis();
        List<User> threadList = new ArrayList<>(integer);
        int round = integer / THREAD_COUNT_SIZE + 1;
        Map<Integer, ArrayList> temporaryMap = new HashMap<>(round);
        final CountDownLatch count = new CountDownLatch(round);
        ExecutorService executor = Executors.newFixedThreadPool(round);
        for (int i = 0; i < round; i++) {
            int startLen = i * THREAD_COUNT_SIZE;
            int k = i + 1;
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    ArrayList<User> users = userMapper.subList(startLen);
                    temporaryMap.put(k, users);
                    System.out.println("正在处理线程【" + k + "】的数据，数据大小为：" + users.size());
                    count.countDown();
                }
            });
        }
        try {
            count.await();
            long end = System.currentTimeMillis();
            System.out.println("100万数据查询耗时:" + (end - start) + "ms");
            temporaryMap.keySet().forEach(k -> {
                threadList.addAll(temporaryMap.get(k));
            });
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            temporaryMap.clear();
            executor.shutdown();
        }
        System.out.println("list长度为：" + threadList.size());
        return threadList;
    }
}

Mapper implementation:

package com.neu.mapper;

import java.util.ArrayList;
import java.util.List;

import org.apache.ibatis.annotations.*;

import com.neu.pojo.User;

@Mapper
public interface UserMapper {

    /**
     * Retrieve total number of rows in sysuser table
     */
    @Select("SELECT count(*) as sum FROM sysuser")
    Integer UserSum();

    /**
     * Retrieve a page of 5000 rows starting from startLen
     */
    @Select("select * from sysuser LIMIT #{startLen},5000")
    ArrayList<User> subList(@Param("startLen") int startLen);
}

After writing the code we run a test:

Result: the query finishes within 20 seconds, much faster than the original approach.

Fuzzy Query

We modify the mapper to support a fuzzy search on the id column:

package com.neu.mapper;

import java.util.ArrayList;
import java.util.List;

import org.apache.ibatis.annotations.*;

import com.neu.pojo.User;

@Mapper
public interface UserMapper {

    @Select("SELECT count(*) as sum FROM sysuser where id like concat('%',0,'%')")
    Integer UserSum();

    @Select("select * from sysuser where id like concat('%',0,'%') LIMIT #{startLen},5000")
    ArrayList<User> subList(@Param("startLen") int startLen);
}

Testing the fuzzy query yields a runtime of about 5 seconds, satisfying the business requirement.

Conclusion

The basic multithreaded query has been implemented.

Further optimization ideas include:

Improve index design.

Determine the optimal number of rows per thread.

Configure thread pool size based on total rows and desired per‑thread load.

Refactor code to eliminate other performance bottlenecks.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java Spring MyBatis multithreading sql-optimization

Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.