Databases 14 min read

Understanding LEFT JOIN, Common Pitfalls, and Practical Solutions in SQL

This article explains how LEFT JOIN works in SQL, illustrates why duplicate rows can appear when joining tables, and provides practical techniques such as DISTINCT, GROUP BY, MAX, and IN subqueries to ensure a one‑to‑one result set while also covering RIGHT JOIN, INNER JOIN, update and delete operations, and the concept of Cartesian products.

Top Architect
Top Architect
Top Architect
Understanding LEFT JOIN, Common Pitfalls, and Practical Solutions in SQL

This guide demonstrates the creation of two sample tables ( table1 and table2 ) and populates them with test data using standard CREATE TABLE and INSERT statements.

CREATE TABLE `table1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(60) DEFAULT NULL,
  `age` varchar(200) DEFAULT NULL,
  `sponsor_id` varchar(20) DEFAULT NULL COMMENT '业务发起人',
  `gmt_create_user` int(11) NOT NULL COMMENT '创建人id',
  `gmt_create` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `gmt_modified` datetime DEFAULT NULL COMMENT '修改时间',
  `gmt_modified_user` int(11) DEFAULT NULL COMMENT '修改人id',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8mb4 COMMENT='测试表1';

CREATE TABLE `table2` (
  `kid` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(60) DEFAULT NULL,
  `sponsor_id` varchar(20) DEFAULT NULL COMMENT '业务发起人',
  `type` int(11) NOT NULL COMMENT '创建人id',
  `gmt_create` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `gmt_modified` datetime DEFAULT NULL COMMENT '修改时间',
  `gmt_modified_user` int(11) DEFAULT NULL COMMENT '修改人id',
  PRIMARY KEY (`kid`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8mb4 COMMENT='测试表2';

Data is inserted into both tables with multiple rows, illustrating typical values for IDs, names, ages, sponsor identifiers, and timestamps.

INSERT INTO `table1`(`id`,`name`,`age`,`sponsor_id`,`gmt_create_user`,`gmt_create`,`gmt_modified`,`gmt_modified_user`) VALUES
(1,'t1','11','10',1,'2018-10-10 20:34:03',NULL,NULL),
(2,'t2','12','10',2,'2018-10-10 20:34:03',NULL,NULL),
(3,'t3','13','10',3,'2018-10-10 20:34:03',NULL,NULL),
(4,'t4','14','20',4,'2018-10-10 20:34:03',NULL,NULL);

INSERT INTO `table2`(`kid`,`name`,`sponsor_id`,`type`,`gmt_create`,`gmt_modified`,`gmt_modified_user`) VALUES
(1,'t1','10',1,'2018-10-10 20:38:10',NULL,NULL),
(2,'t2','10',1,'2018-10-10 20:38:10',NULL,NULL),
(3,'t3','10',1,'2018-10-10 20:38:10',NULL,NULL),
(4,'t4','10',1,'2018-10-10 20:38:10',NULL,NULL),
(5,'t5','10',1,'2018-10-10 20:38:10',NULL,NULL),
(6,'t6','10',1,'2018-10-10 20:38:10',NULL,NULL),
(7,'t7','10',2,'2018-10-10 20:38:10',NULL,NULL),
(8,'t1','11',1,'2018-10-10 20:38:10',NULL,NULL);

A problematic query that joins the two tables is shown:

SELECT a.*, b.type
FROM table1 a
LEFT JOIN table2 b ON a.sponsor_id = b.sponsor_id
WHERE b.type = 1 AND a.sponsor_id = 10;

Because table2 can contain multiple rows with the same sponsor_id , the LEFT JOIN may produce a one‑to‑many result, inflating the row count. The article then presents four typical remedies:

DISTINCT : Use SELECT DISTINCT(id) FROM a LEFT JOIN b ON a.id=b.aid or SELECT DISTINCT a.*, b.type … to keep only unique rows.

GROUP BY : Join a sub‑query that groups table2 by sponsor_id so that each sponsor appears only once.

MAX : Select the maximum id (or another unique column) per group and join on that result.

IN sub‑query : Filter the left table using WHERE a.sponsor_id IN (SELECT sponsor_id FROM table2 WHERE type=1 AND sponsor_id=10) , avoiding duplicate matches.

The article also clarifies the semantics of other join types:

-- LEFT JOIN (keeps all rows from the left table)
SELECT * FROM A LEFT JOIN B ON A.aid = B.bid;

-- RIGHT JOIN (keeps all rows from the right table)
SELECT * FROM A RIGHT JOIN B ON A.aid = B.bid;

-- INNER JOIN (keeps only rows that satisfy the join condition)
SELECT * FROM A INNER JOIN B ON A.aid = B.bid;

It demonstrates how UPDATE statements can be combined with LEFT JOIN to modify data in the left table based on matching rows in the right table, and warns that in Access the UPDATE syntax cannot contain a FROM clause.

UPDATE A LEFT JOIN B ON A.aid = B.bid
SET A.aname = B.bname;

UPDATE A LEFT JOIN B ON A.aid = B.bid
SET A.aname = B.bname
WHERE A.aid <> 5;

Similarly, DELETE operations using joins are discussed; Access does not support DELETE with JOIN directly, so a two‑step approach using an IN sub‑query is recommended.

DELETE FROM A
WHERE A.aid IN (SELECT bid FROM B WHERE B.bname="b1991");

The concept of a Cartesian product is explained: joining two tables without a condition multiplies the row counts (e.g., 20 rows × 30 rows = 600 rows), and the various join types are essentially different ways of filtering that product.

Overall, the article provides a comprehensive tutorial on SQL join behavior, common pitfalls when multiple matching rows exist, and practical SQL patterns to enforce a one‑to‑one relationship in query results.

SQLDatabaseQuery OptimizationMySQLjoinLEFT JOIN
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.