JOIN vs IN in MySQL: Which Query Strategy Performs Better?
This article experimentally compares MySQL JOIN and IN (separate queries) across small, medium, and large data sets, providing PHP benchmark code, performance results, and practical conclusions on when each approach is more efficient.
Table Structure
User Table
CREATE TABLE `user` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(64) NOT NULL COMMENT '姓名',
`gender` smallint DEFAULT NULL COMMENT '性别',
`mobile` varchar(11) NOT NULL COMMENT '手机号',
`create_time` datetime NOT NULL COMMENT '创建时间',
PRIMARY KEY (`id`),
UNIQUE KEY `mobile` (`mobile`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1005 DEFAULT CHARSET=utf8mb4;Order Table
CREATE TABLE `order` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`price` decimal(18,2) NOT NULL,
`user_id` int NOT NULL,
`product_id` int NOT NULL,
`status` smallint NOT NULL DEFAULT '0' COMMENT '订单状态',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `product_id` (`product_id`)
) ENGINE=InnoDB AUTO_INCREMENT=202 DEFAULT CHARSET=utf8mb4;Experiment 1: Small Data Set
Inserted 1,000 random users and 100 random orders, then retrieved all orders together with their corresponding user information using three methods: a direct JOIN, separate queries with an IN clause, and a PHP script that issues two queries and merges the results.
JOIN
EXPLAIN FORMAT=JSON SELECT o.id, o.price, u.name FROM `order` o JOIN `user` u ON o.user_id = u.id;Separate Queries (IN)
SELECT id, price, user_id FROM `order`;
SELECT name FROM `user` WHERE id IN (8,11,20,...,995);PHP Benchmark (using ab -n 100 -c 10)
$mysqli = new mysqli('127.0.0.1','root','root','test');
$result = $mysqli->query('SELECT `id`,price,user_id FROM `order`');
$orders = $result->fetch_all(MYSQLI_ASSOC);
$userIds = implode(',', array_column($orders,'user_id'));
$result = $mysqli->query("SELECT `id`,`name` FROM `user` WHERE id IN ({$userIds})");
$users = $result->fetch_all(MYSQLI_ASSOC);
// merge results into $res ...
$mysqli->close();The benchmark showed that the JOIN query completed noticeably faster than the two‑step IN approach.
Experiment 2: Medium Data Set (10,000 rows each)
Scaled the tables to 10,000 users and 10,000 orders and repeated the three methods. Images of the query plans and result sets are omitted for brevity, but the same pattern emerged: JOIN remained faster, while the IN‑based approach incurred additional overhead from the large IN list.
Experiment 3: Large Data Set (100,000 users, 1,000,000 orders)
Further increased the data volume to 100 k users and 1 M orders. The JOIN query still outperformed the separate‑query method in raw execution time, but the IN list grew to a size that caused MySQL to reject the statement with a “SQL too long” error, demonstrating a practical limit of the IN approach.
Conclusion
When the dataset is small, JOIN is both cheaper and faster. With larger datasets, the cost of a JOIN rises, but the overhead of building and executing massive IN lists (including potential SQL length limits) often makes the overall solution slower. Neither technique is universally optimal; developers should consider indexing, pagination, or alternative query designs (e.g., subqueries, temporary tables) based on the specific data volume and performance requirements.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
