
# Introduction
INNER JOIN and LEFT JOIN handle most SQL queries. A smaller class of problems calls for different join types: counting set-returning function results row by row, filtering rows by existence in another table, and returning rows that have no match in another table.
Three less-common joins handle these cleanly. LATERAL joins let a subquery in the FROM clause reference columns from earlier in the same FROM clause. Semi joins return rows where a match exists in another table, without duplicating those rows. Anti joins return rows where no match exists.
Let's explore how to apply these patterns in practice.

# LATERAL Joins
A LATERAL subquery in the FROM clause can reference columns from preceding tables in the same FROM clause. Without LATERAL, a subquery in FROM is evaluated independently and cannot see those columns.
This matters most when calling a set-returning function (one that returns multiple rows per input). Set-returning functions can be called in the SELECT list, but applying them row by row to a column from an outer table inside the FROM clause requires lateral semantics. PostgreSQL treats function calls in FROM as implicitly lateral, so the keyword is optional there, but writing LATERAL makes the dependency explicit.
Common cases:
- Calling `unnest()` on an array column to get one row per array element
- Calling `regexp_matches()` with the `'g'` flag to extract every match per row
- Computing a top-N-per-group result with a correlated subquery in FROM
- Splitting JSON arrays per row
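
Two of these cases can be sketched as follows. This is a minimal illustration assuming PostgreSQL; the `posts` and `employees` tables, their columns, and their rows are hypothetical and are not part of the examples in this article.

```sql
-- Hypothetical sample tables, for illustration only (PostgreSQL).
CREATE TEMP TABLE posts (post_id int, tags text[]);
INSERT INTO posts VALUES
    (1, ARRAY['sql', 'joins']),
    (2, ARRAY['postgres']);

-- One output row per array element: unnest() reads p.tags from the
-- preceding FROM item, which is what LATERAL permits.
SELECT p.post_id, t.tag
FROM posts AS p
CROSS JOIN LATERAL unnest(p.tags) AS t(tag);

CREATE TEMP TABLE employees (emp_id int, dept text, salary numeric);
INSERT INTO employees VALUES
    (1, 'eng', 120), (2, 'eng', 150), (3, 'eng', 90),
    (4, 'ops', 80),  (5, 'ops', 95);

-- Top 2 salaries per department: the subquery references d.dept,
-- so it is evaluated once for each department row.
SELECT d.dept, best.emp_id, best.salary
FROM (SELECT DISTINCT dept FROM employees) AS d
CROSS JOIN LATERAL (
    SELECT e.emp_id, e.salary
    FROM employees AS e
    WHERE e.dept = d.dept
    ORDER BY e.salary DESC
    LIMIT 2
) AS best;
```

Removing the LATERAL keyword from the second query makes PostgreSQL reject the reference to `d.dept`, because a plain subquery in FROM cannot see its sibling FROM items.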
// Example: Counting Word Occurrences
This Google question asks us to count how many times the words “bull” and “bear” appear in a `contents` column. Matches must be case-insensitive, and substrings like “bullish” or “bearing” should be excluded.
Data: the `google_file_store` table is:

| filename | contents |
|---|---|
| draft1.txt | The stock exchange predicts a bull market which would make many investors happy. |
| draft2.txt | The stock exchange predicts a bull market… but analysts warn… we are awaiting a bear market. |
| final.txt | The stock exchange predicts a bull market… a bear market. As always predicting the future market is uncertain… |

Code: `regexp_matches()` returns one row per match when called with the `'g'` flag. To run it once per row of `google_file_store` and count all matches across the table, we put it in the FROM clause with LATERAL. The `\m` and `\M` anchors are PostgreSQL word boundaries (start and end of a word), which is what excludes “bullish” and “bearing”.

    SELECT 'bull' AS word,
           COUNT(*) AS nentry
    FROM google_file_store,
         LATERAL regexp_matches(LOWER(contents), '\m(bull)\M', 'g')
    UNION ALL
    SELECT 'bear' AS word,
           COUNT(*) AS nentry
    FROM google_file_store,
         LATERAL regexp_matches(LOWER(contents), '\m(bear)\M', 'g');

// Output

| word | nentry |
|---|---|
| bull | 3 |
| bear | 2 |
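
The two UNION ALL branches differ only in the search word, so the same count can also be written as one query that joins a list of words to the table. The following is a sketch under stated assumptions, not the reference solution: the two sample rows are hypothetical stand-ins for the real table, and the `w`, `f`, and `m(hit)` aliases are introduced here for illustration.

```sql
-- Hypothetical stand-in rows so the query can be run in isolation (PostgreSQL).
CREATE TEMP TABLE google_file_store (filename text, contents text);
INSERT INTO google_file_store VALUES
    ('a.txt', 'A bull market. Bullish traders cheer.'),
    ('b.txt', 'One bear, then another Bear. Bearing down.');

SELECT w.word,
       COUNT(m.hit) AS nentry   -- counts only real matches, so a word can report 0
FROM (VALUES ('bull'), ('bear')) AS w(word)
CROSS JOIN google_file_store AS f
-- LEFT JOIN LATERAL keeps (word, file) pairs that produce no match;
-- the pattern is built per word from the same \m ... \M boundaries.
LEFT JOIN LATERAL regexp_matches(
    LOWER(f.contents), '\m' || w.word || '\M', 'g'
) AS m(hit) ON true
GROUP BY w.word
ORDER BY w.word;
```

With these sample rows the query should report 2 for bear and 1 for bull, since “Bullish” and “Bearing” fail the word-boundary test. The LEFT JOIN is a design choice: a comma join or CROSS JOIN LATERAL would drop any word that never matches instead of reporting it with a count of 0.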
# Semi Joins
A semi join returns rows from the left table where at least one match exists in the right table, with each left-table row appearing at most once. INNER JOIN duplicates left-table rows when the right side has multiple matches. Semi joins do not.
Two SQL implementations:
- `WHERE EXISTS (SELECT 1 FROM ...)`
- `WHERE col IN (SELECT col FROM ...)`
EXISTS is the more general form because it handles multi-column join conditions and correlated subqueries without rewriting the query.
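
A multi-column condition might look like the sketch below. It assumes PostgreSQL; the `products` and `stock` tables and their rows are hypothetical and exist only to show the shape of the query.

```sql
-- Hypothetical tables, for illustration only (PostgreSQL).
CREATE TEMP TABLE products (sku text, region text, name text);
CREATE TEMP TABLE stock (sku text, region text, qty int);
INSERT INTO products VALUES
    ('A1', 'EU', 'Kettle'),
    ('A1', 'US', 'Kettle'),
    ('B2', 'EU', 'Toaster');
INSERT INTO stock VALUES
    ('A1', 'EU', 5),
    ('A1', 'EU', 7),
    ('B2', 'US', 3);

-- Semi join on two columns: a product row qualifies only if stock exists
-- for the same sku AND the same region. Extra conditions such as qty > 0
-- go straight into the subquery's WHERE clause.
SELECT p.sku, p.region, p.name
FROM products AS p
WHERE EXISTS (
    SELECT 1
    FROM stock AS s
    WHERE s.sku = p.sku
      AND s.region = p.region
      AND s.qty > 0
);
```

With these rows, only the EU Kettle qualifies, and it is returned once even though two stock rows match. PostgreSQL also accepts a row-wise `(p.sku, p.region) IN (SELECT sku, region FROM stock)`, but that form is less portable across databases.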
// Example: Finding High-Value Customers
This question asks us to find customers who have placed at least one order over $100 and return their customer ID and name.
Data: Previews of `online_store_customers` and `online_store_orders`:

| customer_id | customer_name |
|---|---|
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carol Williams |
| … | … |
| 10 | Jack Anderson |

| order_id | customer_id | amount | status |
|---|---|---|---|
| 101 | 1 | 150 | paid |
| 102 | 1 | 200 | paid |
| 103 | 1 | 75 | paid |
| … | … | … | … |
| 115 | 9 | 450 | paid |

Code: The EXISTS subquery checks, per customer, whether any order over $100 exists. `SELECT 1` is the convention because EXISTS only cares whether any row comes back, not what is in it.

    SELECT
        c.customer_id,
        c.customer_name
    FROM online_store_customers c
    WHERE EXISTS (
        SELECT 1
        FROM online_store_orders o
        WHERE o.customer_id = c.customer_id
          AND o.amount > 100
    );

If we used INNER JOIN instead, customer 1 would appear twice in the result because two of that customer's orders (101 and 102) exceed $100. EXISTS returns customer 1 once.
// Output
| customer_id | customer_name |
|---|---|
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carol Williams |
| … | … |
| 9 | Ivy Taylor |
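
To see the duplication described above, the two forms can be run side by side. This sketch assumes PostgreSQL and loads only the rows visible in the previews into temporary tables; the rows hidden behind the ellipses are deliberately not reproduced, so the results cover customer 1 only.

```sql
-- Only the preview rows shown above, loaded into temp tables (PostgreSQL).
CREATE TEMP TABLE online_store_customers (customer_id int, customer_name text);
CREATE TEMP TABLE online_store_orders (
    order_id int, customer_id int, amount numeric, status text
);
INSERT INTO online_store_customers VALUES
    (1, 'Alice Johnson'), (2, 'Bob Smith'), (3, 'Carol Williams');
INSERT INTO online_store_orders VALUES
    (101, 1, 150, 'paid'), (102, 1, 200, 'paid'), (103, 1, 75, 'paid');

-- INNER JOIN: one output row per matching order,
-- so customer 1 appears twice (orders 101 and 102).
SELECT c.customer_id, c.customer_name
FROM online_store_customers AS c
JOIN online_store_orders AS o
  ON o.customer_id = c.customer_id
 AND o.amount > 100;

-- Semi join: customer 1 appears once, however many orders match.
SELECT c.customer_id, c.customer_name
FROM online_store_customers AS c
WHERE EXISTS (
    SELECT 1
    FROM online_store_orders AS o
    WHERE o.customer_id = c.customer_id
      AND o.amount > 100
);
```

Adding DISTINCT to the join version would also remove the duplicate, but it deduplicates after the join has already produced the extra rows, whereas EXISTS can stop at the first matching order.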
# Anti Joins
An anti join returns rows from the left table where no match exists in the right table. It is the inverse of a semi join.
Two SQL implementations:
- `LEFT JOIN ... WHERE right_table.col IS NULL`
- `WHERE NOT EXISTS (SELECT 1 FROM ...)`
Both produce the same result, provided the column tested with IS NULL is never NULL in a matched row (the join key is the safe choice). NOT EXISTS often produces a better query plan in modern PostgreSQL versions and states the intent more directly. The LEFT JOIN + IS NULL pattern is older and useful when you also need columns from the right side for non-matching rows.
// Example: Free Users With No April Calls
This question asks us to return free users who did not make any calls in April 2020.
Data: Previews of `rc_calls` and `rc_users`:
| user_id | call_id | call_date |
|---|---|---|
| 1218 | 0 | 2020-04-19 01:06:00 |
| 1554 | 1 | 2020-03-01 16:51:00 |
| 1857 | 2 | 2020-03-29 07:06:00 |
| 1525 | 3 | 2020-03-07 02:01:00 |
| … | … | … |
| 1910 | 39 | 2020-03-11 08:33:00 |

| user_id | status | company_id |
|---|---|---|
| 1218 | free | 1 |
| 1554 | inactive | 1 |
| 1857 | free | 2 |
| … | … | … |
| 1884 | free | 1 |

Code: The date filter sits in the ON clause, not WHERE. That distinction is what makes this an anti join. Putting the date filter in WHERE would discard the rows where the LEFT JOIN produced NULLs, collapsing it back to an INNER JOIN. With the filter in ON, free users with no qualifying April call still produce a row, with NULLs on the right side, and the IS NULL check keeps only those rows. Because `call_date` is a timestamp, the April range is written as a half-open interval so that calls made during April 30 are included.

    SELECT DISTINCT u.user_id
    FROM rc_users u
    LEFT JOIN rc_calls c
           ON u.user_id = c.user_id
          AND c.call_date >= '2020-04-01'
          AND c.call_date <  '2020-05-01'
    WHERE u.status = 'free'
      AND c.user_id IS NULL;
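
The same anti join can be written with NOT EXISTS, and a deliberately broken variant shows what moving the date filter into WHERE does. This is a sketch assuming PostgreSQL; it loads a few of the preview rows into temporary tables and uses a half-open date range so that timestamps on April 30 count as April.

```sql
-- A few of the preview rows shown above, loaded into temp tables (PostgreSQL).
CREATE TEMP TABLE rc_users (user_id int, status text, company_id int);
CREATE TEMP TABLE rc_calls (user_id int, call_id int, call_date timestamp);
INSERT INTO rc_users VALUES
    (1218, 'free', 1), (1554, 'inactive', 1), (1857, 'free', 2);
INSERT INTO rc_calls VALUES
    (1218, 0, '2020-04-19 01:06:00'),
    (1554, 1, '2020-03-01 16:51:00'),
    (1857, 2, '2020-03-29 07:06:00');

-- Anti join with NOT EXISTS: free users with no call in April 2020.
SELECT u.user_id
FROM rc_users AS u
WHERE u.status = 'free'
  AND NOT EXISTS (
      SELECT 1
      FROM rc_calls AS c
      WHERE c.user_id = u.user_id
        AND c.call_date >= '2020-04-01'
        AND c.call_date <  '2020-05-01'
  );

-- Broken variant, for contrast: the date filter has moved from ON to WHERE.
-- An unmatched row has a NULL call_date and fails the date test, while a
-- matched row has a non-NULL user_id and fails the IS NULL test.
SELECT u.user_id
FROM rc_users AS u
LEFT JOIN rc_calls AS c
       ON u.user_id = c.user_id
WHERE u.status = 'free'
  AND c.call_date >= '2020-04-01'
  AND c.call_date <  '2020-05-01'
  AND c.user_id IS NULL;
```

With these rows the first query should return only user 1857, and the broken variant returns no rows at all. NOT EXISTS also needs no DISTINCT, because each user row is tested once rather than joined to its calls.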
// Output
# Conclusion

These three joins solve cases where INNER JOIN and LEFT JOIN are awkward or wrong:
- LATERAL is the way to call set-returning functions row by row inside FROM.
- EXISTS gives you “rows with a match” without the duplication that INNER JOIN causes.
- NOT EXISTS or LEFT JOIN + IS NULL gives you “rows with no match” cleanly.
The pattern to remember is short. When INNER JOIN duplicates rows you do not want, use EXISTS. When you need rows that have no match, use NOT EXISTS or LEFT JOIN + IS NULL. When a subquery in FROM needs to reference columns from an outer table, add LATERAL.
Practice these on real SQL interview questions, and the syntax becomes automatic.
Nate Rosidi is a data scientist and in product strategy. He is also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
