Improved SQL Alias Support

SQL
5 min read

Dolt is a MySQL-compatible database with Git's distributed versioning features. It's the first relational database that versions your data changes AND your schema changes. Dolt allows you to branch and merge, push and pull, fork and clone, diff and audit history for your relational data, just like Git allows you to do with files.

In today's blog post, we're looking at some of the work we've done recently to improve SQL alias support to better match MySQL's behavior.

What are Aliases?

Aliases are a key feature of SQL. They allow you to make your queries more expressive and readable and they enable you to write certain types of queries that you just couldn't do without aliases.

Alias

Much like the J. J. Abrams TV series, Alias, where Jennifer Garner played a quirky spy who assumed alternate identities, SQL aliases allow you to assign alternate identities to the tables and expressions in your SQL queries. Unlike the TV show however, we make no promises that these aliases will help you foil nefarious plans of international criminal organizations.

There are two types of aliases in SQL: "expression" (or "column") aliases, and "table" aliases:

  • Table aliases allow you to give a table an alternate name and require that you always reference that instance of the table using that alternate name in your query. This is often handy for making your queries more readable and is required when you need to join a table with itself.
  • Expression aliases allow you to give a name to any projected expression so that you can easily reference that same expression and also so that you can control its name in the projected schema.

Alias support is tricky because aliases can be used in just about any part of a query and because the rules they follow are nuanced and sometimes rather opaque and surprising. Additionally, as with just about any major SQL feature, each database vendor tends to go slightly off script from the SQL standard in order to support other features custom to their product.

So, for Dolt to be fully MySQL-compliant, we need to deeply understand all the standard SQL rules for aliases, as well as any deviations from the spec that MySQL has implemented. For example, standard SQL does not allow aliases to be referenced in the GROUP BY clause or the HAVING clause. MySQL relaxes this restriction and allows aliases to be used in both, which allows you to write a query such as:

SELECT id as alias1, FLOOR(value/100) AS alias2
  FROM tbl_name
  GROUP BY alias1, alias2
  HAVING alias1 > 100;

Examples

Let's take a look at some specific queries with aliases that used to cause problems for Dolt just a month ago...

For each of the examples below, we compared the results from an older version of Dolt (0.41.5, released on 9/20/2022), with the results from MySQL (8.0.30), and from the current version of Dolt (0.50.6) using queries against the following sample data:

create table xy (x int primary key, y int);
create table uv (u int primary key, v int);
insert into xy values (0,0),(1,1),(2,2),(3,3);
insert into uv values (0,3),(3,0),(2,1),(1,2);

Correctly resolving ambiguity between aliases and column names is a critical feature for the query analyzer. In older versions of Dolt, ambiguity between multiple aliases with the same name was allowed in the ORDER BY clause, but the same query against MySQL results in an error.

> select u as u, v as u from uv order by u;

-- Dolt 0.41.5 Results:
+---+---+
| u | u |
+---+---+
| 0 | 3 |
| 1 | 2 |
| 2 | 1 |
| 3 | 0 |
+---+---+

-- MySQL 8.0.30 Results:
ERROR 1052 (23000): Column 'u' in order clause is ambiguous
    
-- Dolt 0.50.6 Results:
ambiguous column or alias name "u"

Not surprisingly, digging through all sorts of different uses (and misuses) of aliases turned up lots of interesting edge cases. For example, we found some GROUP BY queries that implicitly use a single group for all results weren't able to correctly filter with a HAVING clause that used a projected alias. Kudos to Jennifer for tackling this fix!

> select t1.u as a from uv as t1 having a = t1.u order by a;

-- Dolt 0.41.5 Results:
found HAVING clause with no GROUP BY
    
-- MySQL 8.0.30 Results:
+---+
| a |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
+---+

-- Dolt 0.50.6 Results:
+---+
| a |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
+---+

Sometimes we found some really weird stuff! In the query below, older versions of Dolt didn't correctly resolve the alias dummy and let fake data from the dual table infiltrate query results. In addition to fixing the issue with the dual table, fixing this query also meant fixing alias resolution in subquery expressions, which fixed several other queries such as SELECT 1 as a, (select a) as a; that rely on visibility to expression aliases defined in outer scopes.

> select "foo" as dummy, (select dummy);

-- Dolt 0.41.5 Results:
+-------+----------------+
| dummy | (select dummy) |
+-------+----------------+
| foo   | x              |
+-------+----------------+

-- MySQL 8.0.30 Results:
+-------+----------------+
| dummy | (select dummy) |
+-------+----------------+
| foo   | foo            |
+-------+----------------+
    
-- Dolt 0.50.6 Results:
+-------+----------------+
| dummy | (select dummy) |
+-------+----------------+
| foo   | foo            |
+-------+----------------+

This next broken query came from a customer-reported issue. The root problem here was again around resolution of ambiguity between column and alias names. (btw... in case you hadn't heard... we LOVE crushing bugs quickly for customers, so please send us GitHub issues if you find anything odd! 🦟)

> select x as v, (select u from uv where v = y) as u from xy;

-- Dolt 0.41.5 Results:
the subquery returned more than 1 row
    
-- MySQL 8.0.30 Results:
+---+---+
| v | u |
+---+---+
| 0 | 3 |
| 1 | 2 |
| 2 | 1 |
| 3 | 0 |
+---+---+
    
-- Dolt 0.50.6 Results:
+---+---+
| v | u |
+---+---+
| 0 | 3 |
| 1 | 2 |
| 2 | 1 |
| 3 | 0 |
+---+---+

Alias visibility and ambiguity resolution is nuanced and varies depending on what clause of the statement they are used in, if they are in a subquery expression or a derived table, and many other factors. For example, the SQL standard (and MySQL) disallows aliases from being used in a WHERE clause in the same scope where the expression alias was defined. Older versions of Dolt were not compliant with this, but newer versions now correctly match MySQL's behavior.

> select 1 as a where a > 0;

-- Dolt 0.41.5 Results:
+---+
| a |
+---+
| 1 |
+---+
    
-- MySQL 8.0.30 Results:
ERROR 1054 (42S22): Unknown column 'a' in 'where clause'
    
-- Dolt 0.50.6 Results:
column "a" could not be found in any table in scope

Conclusion

Aliases are an important feature of SQL and Dolt's support for table and expression aliases has steadily improved over time. There are still some broken edge cases with aliases in some queries (e.g. conflating multiple projections with the same alias, disallowing forward-references in projections), but we will keep hunting them down, and moving closer to full MySQL compliance, with the same tenacity that Sydney Bristow showed whenever she went after covert agents plotting to destroy the world.

If you run into any issues using aliases with Dolt, please cut us an issue on GitHub! Identifying broken queries is a huge help to us, plus we love fixing bugs quickly for customers. 💜 The Dolt dev team, as well as the Dolt customer community, is also active on the Dolt Discord server, if you want to come by and chat with us!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.