Performance Comparison of IN vs JOIN in MySQL Queries
This article experimentally compares the efficiency of using IN clauses versus JOIN operations in MySQL by testing small, medium, and large datasets, measuring query costs, execution times, and highlighting the scalability limits of IN statements.
The article investigates whether using IN or JOIN is more efficient for retrieving related data from MySQL tables, using a simple schema with a user table and an order table.
It first presents the table definitions:
CREATE TABLE `user` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(64) NOT NULL COMMENT '姓名',
`gender` smallint DEFAULT NULL COMMENT '性别',
`mobile` varchar(11) NOT NULL COMMENT '手机号',
`create_time` datetime NOT NULL COMMENT '创建时间',
PRIMARY KEY (`id`),
UNIQUE KEY `mobile` (`mobile`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1005 DEFAULT CHARSET=utf8mb4;
CREATE TABLE `order` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`price` decimal(18,2) NOT NULL,
`user_id` int NOT NULL,
`product_id` int NOT NULL,
`status` smallint NOT NULL DEFAULT '0' COMMENT '订单状态',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `product_id` (`product_id`)
) ENGINE=InnoDB AUTO_INCREMENT=202 DEFAULT CHARSET=utf8mb4;Three experimental scenarios are executed:
Small data set (1,000 users, 100 orders)
Medium data set (10,000 users, 10,000 orders)
Large data set (100,000 users, 1,000,000 orders)
For each scenario the author runs:
A JOIN query (e.g., SELECT o.id, o.price, u.name FROM `order` o JOIN user u ON o.user_id = u.id; )
A separate‑query approach using IN (first fetch order rows, then fetch users with WHERE id IN (...) )
PHP scripts that issue the queries and measure response time with ab -n 100 -c 10
Sample PHP code for the IN method:
$mysqli = new mysqli('127.0.0.1','root','root','test');
$result = $mysqli->query('select `id`,price,user_id from `order`');
$orders = $result->fetch_all(MYSQLI_ASSOC);
$userIds = implode(',', array_column($orders, 'user_id'));
$result = $mysqli->query("select `id`,`name` from `user` where id in ({$userIds})");
$users = $result->fetch_all(MYSQLI_ASSOC);
// combine results ...
$mysqli->close();Sample PHP code for the JOIN method:
$mysqli = new mysqli('127.0.0.1','root','root','test');
$result = $mysqli->query('select o.id, price, u.name from `order` o join user u on o.user_id = u.id;');
$orders = $result->fetch_all(MYSQLI_ASSOC);
var_dump($orders);
$mysqli->close();Results show that with a small dataset the JOIN is cheaper and faster. With medium data the JOIN still outperforms the IN approach, though the cost gap narrows. With the large dataset the IN query often fails because the generated SQL string becomes excessively long, while the JOIN continues to execute more quickly despite higher raw cost.
The author concludes that JOIN generally provides better performance and scalability, while IN should be avoided for large result sets due to SQL length limits and increased application‑level processing overhead.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.