How to Use Lateral Joins

SQLFEATURE RELEASE
5 min read

Dolt is the first SQL database that versions data with Git semantics. Using Dolt is as easy as downloading the binary, picking a directory for your data (mydb), running dolt sql-server, and using a MySQL client to connect mysql -h 0.0.0.0 -u root.

The path to full MySQL compatibility is full of obscure SQL operators. Many of those SQL operators are "syntactic sugar." This happens when a short-hand pattern is accepted as an official replacement for a common but verbose long-hand. For example, NATURAL JOIN and USING join are both short-hands for joining tables on columns of the same name.

James recently implemented LATERAL joins. LATERAL joins are still syntactic sugar, but can also be more complicated. LATERAL joins duplicate left-hand side inputs (like joins). But unlike joins, LATERAL operator right-hand side (RHS) has access to left-hand side (LHS) column references (like scalar subqueries). My rule of thumb is that LATERAL joins connect relations in a way that accumulates both (1) projections and (2) rows from each side of the join.

diagram

Rather than writing another word or math heavy explanation of LATERAL, this blog will be example-based. We will share a lot of simple queries that show subtle aspects of LATERAL behavior. Many of the examples are perhaps trivial! But I found LATERALS to be so unlike regular SQL that I needed simple examples to learn. After mastering the basics, you might find LATERAL queries to be a more natural way to write some queries.

Lets jump in!

Simple examples of LATERAL behavior

Add trivial projection

LATERAL joins can be used to append a projection column to every output row of its LHS:

Select * from test, lateral (select 1) one;
+---+---+
| x | 1 |
+---+---+
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
+---+---+

The three rows in test are all returned with an extra column, 1, appended.

Add trivial multi projection

The RHS of a lateral join can be arbitrarily complex, including multiple return values (unlike scalar subquery expressions):

Select * from test, lateral (select 1, 2) one_two;
+---+---+---+
| x | 1 | 2 |
+---+---+---+
| 0 | 1 | 2 |
| 1 | 1 | 2 |
| 2 | 1 | 2 |
+---+---+---+

For each row in test, we now add two extra columns, '1' and '2'.

LATERAL reference to left-hand side

Like scalar subquery expressions, the RHS of a lateral join can reference outputs of the LHS:

Select * from test, lateral (select x) one_two;
+---+---+
| x | x |
+---+---+
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
+---+---+

Here the RHS serves to duplicate the input column from the left.

Cartesian join

A LATERAL join extends projection and reference properties with join duplication and filtering:

Select * from test, lateral (select 1 union select 2) one_two;
+---+---+
| x | 1 |
+---+---+
| 0 | 2 |
| 0 | 1 |
| 1 | 2 |
| 1 | 1 |
| 2 | 2 |
| 2 | 1 |
+---+---+

The RHS of the LATERAL inputs each from from x, and will return two new rows: the first with 1 appended, the second with '2' appended.

Join with filter

Not all joins have to be cross products. We can add a filter to make the LATERAL act like a INNER_JOIN. Below we add a new table, test2, with values 1,2,3.

Select * from test as test1,
lateral (select * from test2 where test1.x = test2.x) test2;
+---+---+
| x | x |
+---+---+
| 1 | 1 |
| 2 | 2 |
+---+---+

All rows from test1 feed forward into the test2 RHS. Within test2 we concatenate the two input source columns (test1.x, test2.x) and then filter for rows with a matching join condition (test1.x = test2.x1).

Restrict output rows

We have been returning select *, but the first projection in a LATERAL join is considered special:

Select test2.x from test as test1,
lateral (select * from test2 where test1.x = test2.x) test2;
+---+
| x |
+---+
| 1 |
| 2 |
+---+

This is an easy way to select a subset of all of the columns accumulated during the LATERAL stages.

You may say, these examples are too simple! You may also say, LATERALS are too complicated! Why wouldn't I just inline an extra projection? Why wouldn't I just join the two tables!

I'm not here on a LATERAL join crusade. You should mostly avoid LATERAL if you can. But most of us are not fortunate enough to live in a world with only simple SQL queries. The linear/feed forward nature of LATERAL joins can make complicated queries easier to read for humans, and slow queries easier to optimize. We'll consider one more example to try to spark your imagination for your own work.

Complex query

The opposite of a linear pipeline of SQL operators is a highly nested set of subqueries. We will consider a small database of school data:

Create table students (
  Id int primary key,
  Name varchar(50),
  Major int
);
Create table classes (
  Id int primary key,
  Name varchar(50),
  Department int
);
Create table grades (
  Grade float,
  Student int,
  Class int,
  Primary key (class, student)
);
Create table majors (
  Id int,
  Name varchar(50),
  Department int,
  Primary key (name, department)
);
Create table departments (
  Id int primary key,
  Name varchar(50)
);

At a high level, we want to search for each student's highest grade among classes in their major. We assume that each major department can have several different major names within. For example. Roman and Bronze Age are two majors within the history department.

Here is a small SQL pyramid of doom for this search:

-- outer scope iterates students table
Select
  name,
  (
    -- first subquery gets the class_name for the max grade within the major department
    Select classes.name
    from grades
    Join classes
      On grades.class = classes.id
    where
      classes.department in (
        select department
        from majors
        where
          majors.id = students.major
      ) and
      Grade = (
        Select max(grade)
        from grades
        Where grades.student = students.id
	  )
  ) class_name,
  (
    -- second subquery backtracks to lookup the max_grade for the student and class combo`
    select max(grade)
    from classes
    Join grades
    Where classes.name = class_name and
      grades.class = classes.id and
      students.id = grades.student
) max_grade
From students;
+--------+-----------------+-----------+
| Name   | class_name      | max_grade |
+--------+-----------------+-----------+
| Elle   | ESG Studies     | 97        |
| Latham | Greek Mythology | 92        |
+--------+-----------------+-----------+

The subquery logic is equivalent to the rewrite below, but uses several layers of nesting to accomplish the same goal. We use a subquery to compute class_name and max_grade, and do not share intermediate logic between the two.

For any engineers that find imperative code easy, but struggle to translate the same work into SQL, LATERAL joins might be for you:

  Select students.name, class.class_name, grade.max_grade
  -- (1) start with students table
  from students,
  LATERAL (
    -- (2) find student's major department
    Select departments.id as did
    from majors join departments
    On majors.department = departments.id
    where majors.id = students.major
  ) dept,
  LATERAL (
    -- (3) get student's best performing class within their major's department
    Select
      grade as max_grade,
      classes.id as cid
    From grades
    Join classes
      On grades.class = classes.id
    Where grades.student = students.id and
          classes.department = dept.did
    Order by grade desc limit 1
  ) grade,
  LATERAL (
    -- (4) convert class_id into a class_name
    Select name as class_name from classes where grade.cid = classes.id
  ) class
+--------+-----------------+-----------+
| Name   | class_name      | max_grade |
+--------+-----------------+-----------+
| Elle   | ESG Studies     | 97        |
| Latham | Greek Mythology | 92        |
+--------+-----------------+-----------+

The nesting is now decoupled into a series of phases. We (1) read the student information that we need. Then (2) convert the student's major into a department name. Then (3) select the class with the best grade filtering for the student's classes in their major's department. Finally, we (4) convert the class identifier into a class name. A top-level projection makes sure we only return name, max_grade, class to the client.

Which strategy to use depends on personal preference. I usually suspect highly nested subqueries as good candidates for LATERAL joins, but you should try and judge for yourself. The last benefit of LATERAL joins are that they are usually easier for query optimizers to make performant. Internally in fact, we try to convert from subqueries to LATERAL joins when possible to simplify and linearize the execution plan.

Summary

This blog gives small examples of LATERAL join behavior. In some ways, LATERAL join is syntactic sugar that is easy to replace with subquery or joins syntax. LATERAL join can be a different frame of reference for writing complex SQL queries. Like common table expressions, LATERAL is a new tool that makes it easier to build more complicated queries.

If you have any questions about Dolt, databases, or Golang performance reach out to us on Twitter, Discord, and GitHub!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.