Thursday, June 18, 2026

Advanced Join Techniques: LATERAL Joins, Semi Joins, Anti Joins

 

Introduction

 
INNER JOIN and LEFT JOIN handle most SQL queries. A smaller class of problems needs different join types: counting set-returning function results row by row, filtering rows by existence in another table, and returning rows that have no match in another table.

Three less-common joins handle these cleanly. LATERAL joins let a subquery in the FROM clause reference columns from earlier in the same FROM clause. Semi joins return rows where a match exists in another table, without duplicating those rows. Anti joins return rows where no match exists.

Let’s explore how to apply these patterns in practice.

 

 

LATERAL Joins

 
A LATERAL subquery in the FROM clause can reference columns from preceding tables in the same FROM clause. Without LATERAL, a subquery in FROM is evaluated independently and cannot see those columns.

This matters most when calling a set-returning function (one that returns multiple rows per input). Set-returning functions can be called in the SELECT list, but applying them row by row to a column from an outer table inside the FROM clause needs lateral semantics. PostgreSQL treats a function call in FROM as lateral implicitly, so the keyword is optional there and mainly documents intent; for a subquery it is required.

Common cases:

  • Calling unnest() on an array column to get one row per array element
  • Calling regexp_matches() with the 'g' flag to extract every match per row
  • Computing a top-N-per-group result with a correlated subquery in FROM
  • Splitting JSON arrays per row

 

// Example: Counting Word Occurrences

This Google question asks us to count how many times the words “bull” and “bear” appear in a contents column. Matches must be case-insensitive, and substrings like bullish or bearing should be excluded.

Data: the google_file_store table is:
 

filename contents
draft1.txt The stock exchange predicts a bull market which would make many investors happy.
draft2.txt The stock exchange predicts a bull market… but analysts warn… we’re awaiting a bear market.
final.txt The stock exchange predicts a bull market… a bear market. As always predicting the future market is uncertain…

 

Code: regexp_matches() returns one row per match. To run it once per row of google_file_store and count all matches across the table, we put it in the FROM clause with LATERAL. The \m and \M anchors are PostgreSQL word boundaries (start of word and end of word), which is what excludes “bullish” and “bearing”.

SELECT 'bull' AS word,
       COUNT(*) AS nentry
FROM google_file_store,
     LATERAL regexp_matches(LOWER(contents), '\m(bull)\M', 'g')
UNION ALL
SELECT 'bear' AS word,
       COUNT(*) AS nentry
FROM google_file_store,
     LATERAL regexp_matches(LOWER(contents), '\m(bear)\M', 'g');

// Output

word nentry
bull 3
bear 2
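
To make the per-row evaluation concrete, here is a minimal Python sketch of what the lateral call does conceptually: for each row, a set-returning function is evaluated with that row's column in scope, and one output row is produced per match. The sample strings and the helper names (word_matches, lateral_count) are hypothetical, and Python's \b stands in for PostgreSQL's \m and \M; this is an illustration of the idea, not how PostgreSQL executes the query.

```python
import re

# Hypothetical stand-in rows for google_file_store: (filename, contents).
files = [
    ("a.txt", "A bull market is coming. Bullish traders cheer the BULL."),
    ("b.txt", "Bears fear the bear market; bearing losses is hard."),
    ("c.txt", "No animals here."),
]

def word_matches(contents, word):
    """Set-returning stand-in: every whole-word match, case-insensitively."""
    # \b is Python's word boundary; PostgreSQL spells the same idea \m and \M.
    return re.findall(rf"\b{re.escape(word)}\b", contents.lower())

def lateral_count(rows, word):
    """Mimic FROM rows, LATERAL f(rows.contents): one joined row per match."""
    joined = [
        (name, match)
        for name, contents in rows                  # outer table, row by row
        for match in word_matches(contents, word)   # evaluated per outer row
    ]
    return len(joined)

counts = {w: lateral_count(files, w) for w in ("bull", "bear")}
print(counts)  # {'bull': 2, 'bear': 1}
```

Rows with no match (c.txt here) contribute nothing to the joined list, which mirrors how the comma-style lateral join drops rows for which the function returns an empty set.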

 

Semi Joins

 
A semi join returns rows from the left table where at least one match exists in the right table, with each left-table row appearing at most once. INNER JOIN duplicates left-table rows when the right side has multiple matches. Semi joins don’t.

Two SQL implementations:

  • WHERE EXISTS (SELECT 1 FROM ...)
  • WHERE col IN (SELECT col FROM ...)

EXISTS is the more general form because it handles multi-column join conditions and correlated subqueries without rewriting the query.

 

// Example: Finding High-Value Customers

This question asks us to find customers who have placed at least one order over $100 and return their customer ID and name.

Data: Previews of online_store_customers and online_store_orders:
 

customer_id customer_name
1 Alice Johnson
2 Bob Smith
3 Carol Williams
10 Jack Anderson

 

order_id customer_id amount status
101 1 150 paid
102 1 200 paid
103 1 75 paid
115 9 450 paid

Code: The EXISTS subquery checks, per customer, whether any order over $100 exists. SELECT 1 is the convention because EXISTS only cares whether any row comes back, not what’s in it.

SELECT
    c.customer_id,
    c.customer_name
FROM online_store_customers c
WHERE EXISTS (
    SELECT 1
    FROM online_store_orders o
    WHERE o.customer_id = c.customer_id
      AND o.amount > 100
);

 

If we used INNER JOIN instead, customer 1 would appear twice in the result because two of that customer’s orders exceed $100. EXISTS returns customer 1 once.

 

// Output

 

customer_id customer_name
1 Alice Johnson
2 Bob Smith
3 Carol Williams
9 Ivy Taylor
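
The duplication difference between INNER JOIN and EXISTS can be reproduced on a tiny in-memory database. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL; the table names, the amount column name, and order 104 are assumptions made for illustration, while the other rows follow the previews above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Alice Johnson'), (2, 'Bob Smith'), (3, 'Carol Williams');
-- Order 104 is hypothetical: a small order that must not qualify.
INSERT INTO orders VALUES (101, 1, 150), (102, 1, 200), (103, 1, 75), (104, 2, 40);
""")

# INNER JOIN: one output row per matching order, so customer 1 repeats.
inner_rows = conn.execute("""
    SELECT c.customer_id
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.amount > 100
    ORDER BY c.customer_id
""").fetchall()

# Semi join via EXISTS: each customer appears at most once.
semi_rows = conn.execute("""
    SELECT c.customer_id
    FROM customers c
    WHERE EXISTS (
        SELECT 1 FROM orders o
        WHERE o.customer_id = c.customer_id AND o.amount > 100
    )
    ORDER BY c.customer_id
""").fetchall()

print(inner_rows)  # [(1,), (1,)]
print(semi_rows)   # [(1,)]
```

Adding DISTINCT to the INNER JOIN version would hide the duplicates, but EXISTS avoids producing them in the first place.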

 

Anti Joins

 
An anti join returns rows from the left table where no match exists in the right table. It’s the inverse of a semi join.

Two SQL implementations:

  • LEFT JOIN ... WHERE right_table.col IS NULL
  • WHERE NOT EXISTS (SELECT 1 FROM ...)

Both produce the same result, provided the IS NULL test targets a right-side column that is never NULL in a matched row, such as the join key. NOT EXISTS reads more directly and often gets an equal or better query plan in modern PostgreSQL versions. The LEFT JOIN + IS NULL pattern is older and still common in existing code; for the rows it keeps, every right-side column is NULL.
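
As a quick check that the two forms agree, the following sketch runs both against the same in-memory SQLite tables. The table names left_t and right_t and their contents are hypothetical, and SQLite is only a stand-in for PostgreSQL here; the SQL itself is the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: ids 1 and 3 have matches, ids 2 and 4 do not.
CREATE TABLE left_t (id INTEGER PRIMARY KEY);
CREATE TABLE right_t (left_id INTEGER NOT NULL);
INSERT INTO left_t VALUES (1), (2), (3), (4);
INSERT INTO right_t VALUES (1), (1), (3);
""")

# Anti join, form 1: NOT EXISTS with a correlated subquery.
not_exists = conn.execute("""
    SELECT l.id
    FROM left_t l
    WHERE NOT EXISTS (SELECT 1 FROM right_t r WHERE r.left_id = l.id)
    ORDER BY l.id
""").fetchall()

# Anti join, form 2: LEFT JOIN, then keep only rows with no right-side match.
left_join_null = conn.execute("""
    SELECT l.id
    FROM left_t l
    LEFT JOIN right_t r ON r.left_id = l.id
    WHERE r.left_id IS NULL
    ORDER BY l.id
""").fetchall()

print(not_exists)      # [(2,), (4,)]
print(left_join_null)  # [(2,), (4,)]
```

The IS NULL test is applied to the join key, which is declared NOT NULL, so a NULL there can only come from a missing match.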

 

// Example: Free Users With No April Calls

This question asks us to return free users who did not make any calls in April 2020.

Data: Previews of rc_calls and rc_users:
 

user_id call_id call_date
1218 0 2020-04-19 01:06:00
1554 1 2020-03-01 16:51:00
1857 2 2020-03-29 07:06:00
1525 3 2020-03-07 02:01:00
1910 39 2020-03-11 08:33:00

 

user_id status company_id
1218 free 1
1554 inactive 1
1857 free 2
1884 free 1

Code: The date filter sits in the ON clause, not WHERE. That placement is what makes this an anti join. Putting the date filter in WHERE would discard the rows where the LEFT JOIN produced NULLs, collapsing the query back to an INNER JOIN. With the filter in ON, free users with no qualifying April call still produce a row, with NULLs on the right side, and the IS NULL check keeps only those rows. Because call_date holds a timestamp, the month is bounded with a half-open range so that calls made during April 30 still count.

SELECT DISTINCT u.user_id
FROM rc_users u
LEFT JOIN rc_calls c
       ON u.user_id = c.user_id
      AND c.call_date >= '2020-04-01'
      AND c.call_date <  '2020-05-01'
WHERE u.status = 'free'
  AND c.user_id IS NULL;
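
The effect of filter placement is easy to see on a few rows. This sketch loads some of the preview rows into an in-memory SQLite database (a stand-in for PostgreSQL, with dates stored as ISO-8601 text, and with shortened table names that are assumptions) and runs the query twice: once with the April filter in ON, once with it moved to WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE calls (call_id INTEGER PRIMARY KEY, user_id INTEGER, call_date TEXT);
INSERT INTO users VALUES (1218, 'free'), (1554, 'inactive'), (1857, 'free'), (1884, 'free');
INSERT INTO calls VALUES
    (0, 1218, '2020-04-19 01:06:00'),
    (1, 1554, '2020-03-01 16:51:00'),
    (2, 1857, '2020-03-29 07:06:00');
""")

# Half-open April range; ISO-8601 text compares correctly as strings.
APRIL = "c.call_date >= '2020-04-01' AND c.call_date < '2020-05-01'"

# Filter in ON: users without an April call get a NULL right side and are kept.
filter_in_on = conn.execute(f"""
    SELECT u.user_id
    FROM users u
    LEFT JOIN calls c ON c.user_id = u.user_id AND {APRIL}
    WHERE u.status = 'free' AND c.user_id IS NULL
    ORDER BY u.user_id
""").fetchall()

# Filter in WHERE: a row cannot have a NULL key and an April date at once.
filter_in_where = conn.execute(f"""
    SELECT u.user_id
    FROM users u
    LEFT JOIN calls c ON c.user_id = u.user_id
    WHERE u.status = 'free' AND c.user_id IS NULL AND {APRIL}
    ORDER BY u.user_id
""").fetchall()

print(filter_in_on)     # [(1857,), (1884,)]
print(filter_in_where)  # []
```

User 1857 called only in March and user 1884 never called; both are found only when the date condition is part of the join. On these sample rows the WHERE variant returns nothing, which is why the placement matters.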

 

// Output

 

 

Conclusion

 
 

These three joins solve cases where INNER JOIN and LEFT JOIN are awkward or wrong:

  • LATERAL is the way to call set-returning functions row by row inside FROM.
  • EXISTS gives you “rows with a match” without the duplication that INNER JOIN causes.
  • NOT EXISTS or LEFT JOIN + IS NULL gives you “rows with no match” cleanly.

The pattern to remember is short. When INNER JOIN duplicates rows you don’t want, use EXISTS. When you need rows that have no match, use NOT EXISTS or LEFT JOIN + IS NULL. When a subquery in FROM needs to reference columns from an outer table, add LATERAL.

Practice these on real SQL interview questions, and the syntax becomes automatic.
 
 

Nate Rosidi is a data scientist and in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.

