source: kdnuggets: advanced join techniques: lateral joins, semi joins, anti joins

level: technical

inner join and left join cover most sql queries, but some tasks need other join types. lateral joins let a subquery in the from clause reference columns from earlier tables in the same from clause. this is useful when calling set-returning functions like unnest() or regexp_matches() row by row. without lateral, a subquery in from cannot see those columns. semi joins return rows from the left table where at least one match exists in the right table, without duplicating left rows. anti joins return rows where no match exists.
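the row-by-row behavior can be sketched with python's built-in sqlite3 module. this is an illustration under assumptions, not the article's query: sqlite has no lateral keyword, so the sketch uses its json_each() table-valued function, which may reference columns of tables to its left and so acts as a stand-in for unnest() in a lateral join. the posts table, its tags column, and the sample rows are hypothetical, and the sketch assumes the sqlite build includes the json functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, tags TEXT);  -- tags holds a JSON array
    INSERT INTO posts VALUES
        (1, '["sql", "joins"]'),
        (2, '["sql"]'),
        (3, '[]');
""")

# json_each(p.tags) reads a column from the table to its left, so it is
# evaluated once per row of posts, the way a lateral subquery would be.
rows = conn.execute("""
    SELECT p.id, t.value AS tag
    FROM posts AS p, json_each(p.tags) AS t
    ORDER BY p.id, t.key
""").fetchall()

# PostgreSQL form of the same idea (not executed here), assuming tags is text[]:
#   SELECT p.id, t.tag
#   FROM posts AS p CROSS JOIN LATERAL unnest(p.tags) AS t(tag);

print(rows)
```

post 3 has an empty array, so it contributes no rows, which mirrors how a cross join lateral drops left rows whose function call returns nothing.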

one lateral join example counts occurrences of a word in a text column. calling regexp_matches() with the 'g' flag inside a lateral subquery returns one row per match for each input row, and the outer query then counts those rows across the whole table. for semi joins, an exists subquery finds customers with at least one order over $100. an inner join would repeat a customer once per qualifying order; exists only tests whether a match is present, so the result lists each customer once.
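the duplication difference between an inner join and an exists semi join can be sketched as follows, again with python's sqlite3 module. the customers and orders tables, their columns, and the sample rows are hypothetical and chosen only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
    INSERT INTO orders VALUES
        (10, 1, 150.0),
        (11, 1, 220.0),
        (12, 2, 40.0);
""")

# Inner join: one output row per qualifying order, so 'ana' appears twice.
inner_rows = conn.execute("""
    SELECT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount > 100
    ORDER BY c.name
""").fetchall()

# Semi join: EXISTS only checks that a qualifying order is present,
# so each customer appears at most once.
semi_rows = conn.execute("""
    SELECT c.name
    FROM customers AS c
    WHERE EXISTS (
        SELECT 1
        FROM orders AS o
        WHERE o.customer_id = c.id AND o.amount > 100
    )
    ORDER BY c.name
""").fetchall()

print(inner_rows)
print(semi_rows)
```

adding distinct to the inner join would also remove the repeats, but exists states the intent directly and avoids building the duplicate rows first.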

anti joins can be written with not exists or with a left join plus an is null check. an example finds free users who made no calls in april 2020. the date filter goes in the on clause of the left join, not the where clause: a where filter on the calls table would discard the unmatched rows that the left join preserves. the is null check on a calls column then keeps only the users with no matching call. these three join types handle cases where inner and left joins are awkward or wrong. use exists to avoid duplication, not exists or left join with is null for missing matches, and lateral for row-by-row function calls.
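a minimal sketch of both anti join forms, plus the misplaced-filter variant for contrast, using python's sqlite3 module. the users and calls tables, the plan and call_date columns, and the sample rows are assumptions made for illustration; the article's actual schema is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT);
    CREATE TABLE calls (id INTEGER PRIMARY KEY, user_id INTEGER, call_date TEXT);
    INSERT INTO users VALUES (1, 'free'), (2, 'free'), (3, 'paid'), (4, 'free');
    INSERT INTO calls VALUES
        (1, 1, '2020-04-10'),   -- free user with an April 2020 call
        (2, 2, '2020-03-05'),   -- free user who called only in March
        (3, 3, '2020-05-01');   -- paid user
""")

# LEFT JOIN + IS NULL: the date filter sits in ON, so users without an
# April call still come through with NULLs in the calls columns.
left_join_rows = conn.execute("""
    SELECT u.id
    FROM users AS u
    LEFT JOIN calls AS c
        ON c.user_id = u.id
       AND c.call_date >= '2020-04-01' AND c.call_date < '2020-05-01'
    WHERE u.plan = 'free' AND c.id IS NULL
    ORDER BY u.id
""").fetchall()

# NOT EXISTS: the same anti join written as a correlated subquery.
not_exists_rows = conn.execute("""
    SELECT u.id
    FROM users AS u
    WHERE u.plan = 'free'
      AND NOT EXISTS (
          SELECT 1 FROM calls AS c
          WHERE c.user_id = u.id
            AND c.call_date >= '2020-04-01' AND c.call_date < '2020-05-01'
      )
    ORDER BY u.id
""").fetchall()

# Misplaced filter: with the date test in WHERE, a NULL call_date never
# passes it, so the IS NULL check can no longer match any row.
wrong_rows = conn.execute("""
    SELECT u.id
    FROM users AS u
    LEFT JOIN calls AS c ON c.user_id = u.id
    WHERE u.plan = 'free'
      AND c.call_date >= '2020-04-01' AND c.call_date < '2020-05-01'
      AND c.id IS NULL
""").fetchall()

print(left_join_rows, not_exists_rows, wrong_rows)
```

the filter on u.plan can stay in where because it applies to the left table, whose rows are never null-extended.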

why it matters: mastering these joins helps data scientists write cleaner, more efficient sql for filtering, deduplication, and complex row-wise operations.

