Is JOIN Faster Than IN in MySQL Queries? A Practical Performance Test
The article experimentally compares MySQL JOIN and IN queries across small, medium, and large data sets, showing that JOIN is generally faster for modest volumes while IN can become impractically long and error‑prone with massive lists, leaving neither approach ideal for all scenarios.
1. Table Setup
Two tables are created: user (id, name, gender, mobile, create_time) and order (id, price, user_id, product_id, status). The CREATE statements are shown below.
CREATE TABLE `user` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(64) NOT NULL COMMENT '姓名',
`gender` smallint DEFAULT NULL COMMENT '性别',
`mobile` varchar(11) NOT NULL COMMENT '手机号',
`create_time` datetime NOT NULL COMMENT '创建时间',
PRIMARY KEY (`id`),
UNIQUE KEY `mobile` (`mobile`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1005 DEFAULT CHARSET=utf8mb4;
CREATE TABLE `order` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`price` decimal(18,2) NOT NULL,
`user_id` int NOT NULL,
`product_id` int NOT NULL,
`status` smallint NOT NULL DEFAULT '0' COMMENT '订单状态',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `product_id` (`product_id`)
) ENGINE=InnoDB AUTO_INCREMENT=202 DEFAULT CHARSET=utf8mb4;2. Small‑Data Experiment
Insert 1,000 random users and 100 random orders. The goal is to retrieve all orders together with the corresponding user name. Three approaches are examined:
JOIN query:
explain format=json select order.id, price, user.`name` from `order` join user on order.user_id = user.id;Separate queries with IN:
select `id`,price,user_id from `order`;
select `id`,`name` from `user` where id in (8,11,20,...,995);PHP script that runs the two queries and measures response time with ab -n 100 -c 10.
The author notes the cost model: Multi‑table join cost = driver‑table cost + (rows fetched from driver) × cost of each driven table . The benchmark shows the JOIN version completes noticeably faster than the IN version.
3. Medium‑Data Experiment
Scale up to 10,000 users and 10,000 orders. The same three steps (JOIN, separate IN, PHP timing) are repeated. Result screenshots (omitted here) indicate that the JOIN query still outperforms the IN approach, although the gap narrows.
4. Large‑Data Experiment
Insert 100,000 users and 1,000,000 orders. The JOIN query runs, but the IN query generates an extremely long IN(...) list that exceeds MySQL’s statement length limit, causing the SQL to fail. The author observes that the JOIN still executes (though slower due to data volume) while the IN approach becomes unusable.
5. Conclusions
When the data set is small enough to fit on a single page, JOIN offers both lower cost and higher speed. For very large data sets, separating the queries can reduce the raw join cost, but the overhead of constructing large IN lists and the risk of SQL length errors make this approach fragile. The author suggests that neither method is universally optimal and invites readers to propose better solutions.
Key observations:
JOIN is consistently faster for small‑to‑medium data.
IN queries become problematic when the list grows large, leading to execution failures.
Code‑level overhead (looping over results) adds latency for the separate‑query approach.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
