Optimizing Large-Scale Data Queries with Multithreaded Java and MySQL Indexing
This article demonstrates how to accelerate a one‑million‑row database query by adding indexes and using a multithreaded Java Spring controller with MyBatis, showing code examples, performance tests, and suggestions for further optimization.
Business Requirement
During an interview the HR asked: "Our company does data analysis and needs to query one million rows from the database for each analysis without using pagination. How would you optimize the SQL or Java code?"
A plain query takes more than five minutes, so the solution uses indexes combined with multithreading.
Database Design
Define Database Fields
Generate one million records.
Add indexes to the database.
I'm not very familiar with index optimization; any suggestions are welcome.
Code Implementation
Java Implementation
Controller class implementation:
package com.neu.controller;
import com.neu.mapper.UserMapper;
import com.neu.pojo.User;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.servlet.ModelAndView;
import javax.annotation.Resource;
import java.util.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
* User query with multithreading controller
* @author 薄荷蓝柠
* @since 2023/6/6
*/
@Controller
public class ExecutorUtils {
@Resource
private UserMapper userMapper;
// Maximum records processed per thread
private static final int THREAD_COUNT_SIZE = 5000;
@RequestMapping("Executor")
public List<User> executeThreadPool() {
// total record count
Integer integer = userMapper.UserSum();
long start = System.currentTimeMillis();
List<User> threadList = new ArrayList<>(integer);
int round = integer / THREAD_COUNT_SIZE + 1;
Map<Integer, ArrayList> temporaryMap = new HashMap<>(round);
final CountDownLatch count = new CountDownLatch(round);
ExecutorService executor = Executors.newFixedThreadPool(round);
for (int i = 0; i < round; i++) {
int startLen = i * THREAD_COUNT_SIZE;
int k = i + 1;
executor.execute(new Runnable() {
@Override
public void run() {
ArrayList<User> users = userMapper.subList(startLen);
temporaryMap.put(k, users);
System.out.println("正在处理线程【" + k + "】的数据,数据大小为:" + users.size());
count.countDown();
}
});
}
try {
count.await();
long end = System.currentTimeMillis();
System.out.println("100万数据查询耗时:" + (end - start) + "ms");
temporaryMap.keySet().forEach(k -> {
threadList.addAll(temporaryMap.get(k));
});
} catch (Exception e) {
e.printStackTrace();
} finally {
temporaryMap.clear();
executor.shutdown();
}
System.out.println("list长度为:" + threadList.size());
return threadList;
}
}Mapper implementation:
package com.neu.mapper;
import java.util.ArrayList;
import java.util.List;
import org.apache.ibatis.annotations.*;
import com.neu.pojo.User;
@Mapper
public interface UserMapper {
/**
* Retrieve total number of rows in sysuser table
*/
@Select("SELECT count(*) as sum FROM sysuser")
Integer UserSum();
/**
* Retrieve a page of 5000 rows starting from startLen
*/
@Select("select * from sysuser LIMIT #{startLen},5000")
ArrayList<User> subList(@Param("startLen") int startLen);
}After writing the code we run a test:
Result: the query finishes within 20 seconds, much faster than the original approach.
Fuzzy Query
We modify the mapper to support a fuzzy search on the id column:
package com.neu.mapper;
import java.util.ArrayList;
import java.util.List;
import org.apache.ibatis.annotations.*;
import com.neu.pojo.User;
@Mapper
public interface UserMapper {
@Select("SELECT count(*) as sum FROM sysuser where id like concat('%',0,'%')")
Integer UserSum();
@Select("select * from sysuser where id like concat('%',0,'%') LIMIT #{startLen},5000")
ArrayList<User> subList(@Param("startLen") int startLen);
}Testing the fuzzy query yields a runtime of about 5 seconds, satisfying the business requirement.
Conclusion
The basic multithreaded query has been implemented.
Further optimization ideas include:
Improve index design.
Determine the optimal number of rows per thread.
Configure thread pool size based on total rows and desired per‑thread load.
Refactor code to eliminate other performance bottlenecks.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
