Databases 7 min read

Is JOIN Faster Than IN in MySQL? A Practical Performance Comparison

The article experimentally compares MySQL JOIN and IN queries across small, medium, and large data sets, showing that JOIN generally outperforms IN for modest data, while IN suffers from long‑list limitations and higher overhead as data grows.

IoT Full-Stack Technology
IoT Full-Stack Technology
IoT Full-Stack Technology
Is JOIN Faster Than IN in MySQL? A Practical Performance Comparison

1. Table Structure

The test uses two tables:

CREATE TABLE `user` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(64) NOT NULL COMMENT '姓名',
  `gender` smallint DEFAULT NULL COMMENT '性别',
  `mobile` varchar(11) NOT NULL COMMENT '手机号',
  `create_time` datetime NOT NULL COMMENT '创建时间',
  PRIMARY KEY (`id`),
  UNIQUE KEY `mobile` (`mobile`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1005 DEFAULT CHARSET=utf8mb4;

CREATE TABLE `order` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `price` decimal(18,2) NOT NULL,
  `user_id` int NOT NULL,
  `product_id` int NOT NULL,
  `status` smallint NOT NULL DEFAULT '0' COMMENT '订单状态',
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`),
  KEY `product_id` (`product_id`)
) ENGINE=InnoDB AUTO_INCREMENT=202 DEFAULT CHARSET=utf8mb4;

2. Small‑Data Test (1 000 users, 100 orders)

Three approaches are evaluated:

JOIN query:

EXPLAIN FORMAT=JSON SELECT order.id, price, user.`name`
FROM `order` JOIN user ON order.user_id = user.id;

Separate queries (first fetch orders, then fetch users with an IN list).

PHP implementation that runs the two queries and merges results in code.

The cost model described is "multi‑table join cost = cost of driving table + (rows from driving table × cost of driven table)". The measurements (using ab -n 100 -c 10) show that the JOIN executes noticeably faster than the separate‑query method.

3. Medium‑Data Test (10 000 users, 10 000 orders)

Running the same three approaches on a larger dataset still shows the JOIN to be faster; the cost gap between JOIN and separate queries does not widen dramatically.

4. Large‑Data Test (100 000 users, 1 000 000 orders)

When the tables grow further, the JOIN’s absolute cost increases, but it remains faster than the separate‑query method. The IN‑based query, however, produces an extremely long SQL statement that may exceed the server’s limit, causing execution failures. The article notes that very large IN lists often need to be split into multiple statements, which adds overhead and breaks pagination.

5. Conclusion

Overall, for small result sets a JOIN is both cheaper and quicker. As data volume grows, the JOIN’s cost rises but it still tends to be faster than issuing separate queries, while the IN approach becomes impractical due to statement length limits and additional looping cost in application code. The author suggests that neither method is ideal for massive data sets and invites readers to propose better solutions.

PerformanceSQLMySQLPHPJOININ
IoT Full-Stack Technology
Written by

IoT Full-Stack Technology

Dedicated to sharing IoT cloud services, embedded systems, and mobile client technology, with no spam ads.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.