Why MySQL Joins Lag Behind PostgreSQL and How to Optimize Multi‑Table Queries
This article examines MySQL's limited join capabilities compared to PostgreSQL, explains why queries that join more than three tables can be inefficient, and explores strategies such as service-layer joins, query decomposition, and caching to improve performance and scalability in database-centric applications.
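Before version 8.0.18, MySQL executed every join as a nested-loop join (including its block nested-loop variant). Hash joins for equi-joins arrived in 8.0.18, and MySQL still has no sort-merge join. PostgreSQL's planner can choose among nested-loop, hash, and merge joins; consequently, for queries involving more than three tables, MySQL often performs worse than PostgreSQL.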
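To make the difference between the join algorithms concrete, here is a minimal sketch in plain Python. It is not how either database is implemented: the function names, the `users` and `orders` lists, and the `user_id` key are all made up for illustration, and the nested loop shown is the naive unindexed form. It only counts how many row comparisons or probes each strategy performs.

```python
def nested_loop_join(outer, inner, key):
    """Naive nested-loop join: compare every outer row with every inner row."""
    comparisons = 0
    result = []
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                result.append({**o, **i})
    return result, comparisons


def hash_join(build, probe, key):
    """Hash join: build a hash table on one input, probe it once per row of the other."""
    table = {}
    for b in build:
        table.setdefault(b[key], []).append(b)
    probes = 0
    result = []
    for p in probe:
        probes += 1
        for b in table.get(p[key], []):
            result.append({**b, **p})
    return result, probes


# Hypothetical sample data: 100 users, 1000 orders spread evenly across them.
users = [{"user_id": n, "name": f"user{n}"} for n in range(100)]
orders = [{"user_id": n % 100, "order_no": n} for n in range(1000)]

nl_rows, nl_cost = nested_loop_join(users, orders, "user_id")
hj_rows, hj_cost = hash_join(users, orders, "user_id")

print(len(nl_rows), nl_cost)  # 1000 joined rows after 100000 comparisons
print(len(hj_rows), hj_cost)  # 1000 joined rows after 1000 probes
```

With an index on the inner table, a real nested-loop join replaces the inner scan with an index lookup, so the gap is usually much smaller than in this unindexed sketch.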
1. Summary
Keeping joins to three tables or fewer keeps MySQL queries efficient.
Associating data in the service layer is a more generic approach and better prepares systems for distributed architectures.
Below we briefly explore MySQL’s multi‑table join characteristics.
2. Multi‑Table Joins
Is a MySQL multi‑table join more efficient than executing several single‑table queries?
Consider two tables A and B, each with hundreds of thousands of rows and no indexes. If the join is a Cartesian product, the result set can explode to billions of rows, making network I/O the bottleneck; pulling two 100 k‑row result sets may be far cheaper than pulling a single billion‑row set, so merging in the service layer can be faster.
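The row counts behind this argument are easy to check. The sketch below assumes exactly 100,000 rows per table, matching the "100 k-row" figure above; the variable names are illustrative only.

```python
rows_a = 100_000  # assumed size of table A
rows_b = 100_000  # assumed size of table B

# An unconditioned join returns every pairing of an A row with a B row.
cartesian_rows = rows_a * rows_b

# Fetching the two tables separately transfers each row exactly once.
separate_rows = rows_a + rows_b

ratio = cartesian_rows // separate_rows

print(cartesian_rows)  # 10000000000
print(separate_rows)   # 200000
print(ratio)           # 50000
```

Under these assumptions the joined result carries 50,000 times as many rows over the network as the two separate result sets.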
In real business scenarios, joins usually have conditions and indexes, producing a relatively small intermediate result that is then used to fetch additional data from the other table.
If the join is performed in the service layer, the fastest pattern is: query table A to obtain a small result set (first RPC), build query conditions for table B from that set and fetch B's matching rows (second RPC), and finally merge the two result sets in the service, which needs no further round trip.
Using a database join pulls the combined result in a single RPC, saving one round trip; internally the database typically performs a nested-loop join, which is common in practice.
However, most businesses still prefer moving this merging logic to the service layer for several reasons:
1. Compute resources on a single‑node database are expensive. The database must serve both reads and writes, consuming CPU; to increase DB throughput while tolerating a few hundred microseconds of latency, businesses shift heavy computation to the service layer and treat the DB mainly as a transactional key‑value store.
2. Many complex systems use multiple databases behind middleware. Cross‑database joins are not possible, so a service layer abstracts the underlying databases and reduces coupling.
3. Large companies often shard data across many physical databases. Cross‑shard joins are limited unless the sharding key guarantees that the two tables reside in the same shard; middleware typically lacks robust cross‑shard join support.
For example, when two tables in different shards need to be updated simultaneously, a distributed transaction with a global lock can severely impact performance. Some applications tolerate brief inconsistency and instead use a scheduled task to reconcile failed updates, performing the merge in the service layer.
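The co-location condition in point 3 can be illustrated with a small routing sketch. Everything here is an assumption for illustration: the shard count, the CRC32-modulo routing rule, and the function names do not come from any particular middleware.

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of physical databases


def shard_for(sharding_key):
    """Map a sharding key to a shard index with a stable hash (assumed routing rule)."""
    return zlib.crc32(str(sharding_key).encode()) % NUM_SHARDS


def can_join_locally(key_a, key_b):
    """Two rows can be joined inside one database only if they route to the same shard."""
    return shard_for(key_a) == shard_for(key_b)


# If both tables are sharded by user_id, a user's rows always share a shard,
# so a join on user_id never has to cross shard boundaries.
user_id = 42
print(can_join_locally(user_id, user_id))  # True

# If one table is sharded by user_id and the other by order_id, nothing
# guarantees co-location and the join may have to span two databases.
order_id = 9001
print(shard_for(user_id), shard_for(order_id))
```

When the second case applies, the merge has to happen somewhere above the shards, which in practice usually means the service layer.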
Decomposing Join Queries
High‑performance applications often break a join into separate single‑table queries and associate the results in the application.
For instance, the following query:
SELECT * FROM tag
JOIN tag_post ON tag_post.tag_id=tag.id
JOIN post ON tag_post.post_id=post.id
WHERE tag.tag='mysql';
can be decomposed into:
SELECT * FROM tag WHERE tag='mysql';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE id IN (123,456,567,9989,8909);
Although the original single query is replaced by multiple queries, the returned data remains identical.
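The decomposition can be tried end to end with a small script. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere, the sample rows are invented, and the function names `posts_via_join` and `posts_via_decomposition` are hypothetical. It selects only the `post` columns to keep the comparison simple.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
    INSERT INTO tag VALUES (1234, 'mysql'), (99, 'postgres');
    INSERT INTO post VALUES (123, 'Indexes'), (456, 'Joins'), (777, 'Vacuum');
    INSERT INTO tag_post VALUES (1234, 123), (1234, 456), (99, 777);
""")


def posts_via_join(tag_name):
    """One round trip: let the database join the three tables."""
    rows = conn.execute(
        "SELECT post.id, post.title FROM tag "
        "JOIN tag_post ON tag_post.tag_id = tag.id "
        "JOIN post ON tag_post.post_id = post.id "
        "WHERE tag.tag = ?",
        (tag_name,),
    ).fetchall()
    return sorted(rows)


def posts_via_decomposition(tag_name):
    """Three single-table queries, associated in the application."""
    # Query 1: resolve the tag name to its id(s).
    tag_ids = [r[0] for r in conn.execute(
        "SELECT id FROM tag WHERE tag = ?", (tag_name,))]
    if not tag_ids:
        return []
    # Query 2: collect the post ids linked to those tags.
    marks = ",".join("?" * len(tag_ids))
    post_ids = [r[0] for r in conn.execute(
        f"SELECT post_id FROM tag_post WHERE tag_id IN ({marks})", tag_ids)]
    if not post_ids:
        return []
    # Query 3: fetch the posts themselves with an IN list.
    marks = ",".join("?" * len(post_ids))
    rows = conn.execute(
        f"SELECT id, title FROM post WHERE id IN ({marks})", post_ids).fetchall()
    return sorted(rows)


print(posts_via_join("mysql"))
print(posts_via_decomposition("mysql"))
```

Both calls print the same two posts. The decomposed version costs two extra round trips, which is the trade-off weighed in the previous section.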
Using decomposed joins offers several advantages:
Higher cache efficiency: single-table results are easier to cache, and when only one table changes, the cached results for the other tables can still be reused.
Reduced lock contention, because each query is smaller.
Easier database sharding and better scalability, since the tables no longer have to live on the same server.
Potential overall query performance gains.
Less redundant record retrieval: the application fetches each row once, whereas a join may read the same row repeatedly.
An effective hash join in the application instead of MySQL's nested-loop join, which can be much faster in certain scenarios.
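The caching advantage is the easiest one to demonstrate. The sketch below is a hypothetical per-table query cache, not a real library: `TableQueryCache`, `fake_backend`, and the SQL strings are assumptions made for illustration, and the backend is a stub rather than a database connection.

```python
class TableQueryCache:
    """Caches single-table query results and invalidates them one table at a time."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable(sql, params) -> rows
        self._entries = {}           # (table, sql, params) -> cached rows
        self.hits = 0
        self.misses = 0

    def fetch(self, table, sql, params=()):
        key = (table, sql, tuple(params))
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = self._run_query(sql, params)
        return self._entries[key]

    def invalidate(self, table):
        """Drop cached results for one table only; other tables stay cached."""
        for key in [k for k in self._entries if k[0] == table]:
            del self._entries[key]


def fake_backend(sql, params):
    # Stub standing in for a real database call (assumption for the sketch).
    return [("result of", sql, tuple(params))]


cache = TableQueryCache(fake_backend)
tag_sql = "SELECT * FROM tag WHERE tag = %s"
post_sql = "SELECT * FROM post WHERE id IN (%s, %s)"

cache.fetch("tag", tag_sql, ("mysql",))   # miss: first request
cache.fetch("post", post_sql, (123, 456)) # miss: first request
cache.invalidate("post")                  # a post was edited
cache.fetch("tag", tag_sql, ("mysql",))   # hit: tag data is unaffected
cache.fetch("post", post_sql, (123, 456)) # miss: post data must be reloaded

print(cache.hits, cache.misses)  # 1 3
```

A cached three-table join would have to be discarded whenever any of the three tables changed; here only the `post` entries are reloaded.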
3. Explanation
RPC (Remote Procedure Call) is a method of requesting a service from a program on another computer over a network without needing to understand the underlying network details.
Source: blog.csdn.net/NumberOneStudent/article/details/102776289
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
