Dolt is the world’s first SQL database with Git-style version control built in. Dolt gives you all the power of SQL combined with the ability to branch, merge, diff, clone, and push your data, just like you would with source code in Git. Today we’re excited to announce that Dolt now supports functional indexes!
Functional indexes have been part of MySQL since MySQL 8.0. They’re not something every application needs, but when they are needed, they can have a big impact on query performance, as we’ll see later in this post. This is a feature we’ve wanted to build for a long time, and after receiving a few customer requests for it, we decided to prioritize it. We’ve launched the initial support in Dolt and are now expanding that to add this feature to Doltgres, add additional syntax support, enable usage of the functional indexes in more queries, and fine tune performance.
In this blog post, we’ll walk through what functional indexes are, how they work, how to use them, and see them in action.
What is a Functional Index?#
A regular index stores the values of one or more columns and lets Dolt quickly look up rows by those raw column values. A functional index is similar, but instead of storing a column’s raw value, it stores the result of a function or expression applied to that column. That stored result is what gets indexed, so when a query contains the same expression in its WHERE clause, Dolt can use the index to satisfy the query efficiently — without scanning the full table and evaluating the expression row by row.
The syntax uses an extra set of parentheses to mark the expression:
CREATE INDEX idx_lower_email ON users ((LOWER(email)));
The double parentheses are important — that’s how MySQL and Dolt distinguish a functional expression from a regular column reference inside an index definition.
When Does This Help?#
Let’s look at a common scenario for functional indexes. Imagine you have a users table and you want to look up users by email address, but you don’t want User@Example.COM and user@example.com to be treated as different entries. There are several ways to handle this, including normalizing values to lowercase when filtering on them.
SELECT * FROM users WHERE LOWER(email) = LOWER('User@Example.COM');
Without a functional index on LOWER(email), satisfying that query requires a full table scan. The database engine has to evaluate LOWER(email) for every single row and compare it against your search value. On a large table, that’s going to be slow.
A functional index solves this cleanly, without requiring you to store a redundant email_lower column purely for indexing purposes.
Other common cases for using a functional index include:
- Date part extraction — index on
YEAR(created_at)orMONTH(order_date)for reporting queries that filter by year or month - JSON path extraction — index on
JSON_UNQUOTE(JSON_EXTRACT(metadata, '$.type'))to speed up queries on specific fields inside a JSON column - String transformations — indexing trimmed or normalized versions of values without duplicating the data in an extra column
Getting Started#
Let’s see functional indexes in action. We’ll use the dolthub/employees database from DoltHub — a sample database with 300,024 employees and 443,308 title records, which gives us enough data to see a real difference in query performance.
If you don’t have Dolt yet, install it here. Then clone the database:
$ dolt clone dolthub/employees
$ cd employees
From here, you can run dolt sql to start a SQL shell:
$ dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit. "\help" for help.
employees/main>
Then you can start exploring the database:
employees/main> DESCRIBE employees;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+---------------+------+-----+---------+-------+
Out of the box, the only index is the primary key on emp_no. There’s no index on last_name. Suppose we want to find all employees by last name, but we need the search to be case-insensitive — the data was imported from multiple sources and the casing isn’t consistent. We write:
SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello';
Without a Functional Index#
Let’s look at the query plan Dolt uses when there’s no index. We’ll use EXPLAIN FORMAT=TREE which gives us Dolt’s detailed execution plan:
employees/main> EXPLAIN FORMAT=TREE SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello';
+----------------------------------------------------------------------------+
| plan |
+----------------------------------------------------------------------------+
| Project |
| ├─ columns: [employees.emp_no, employees.first_name, employees.last_name] |
| └─ Filter |
| ├─ (lower(employees.last_name) = 'facello') |
| └─ Table |
| ├─ name: employees |
| └─ columns: [emp_no first_name last_name] |
+----------------------------------------------------------------------------+
7 rows in set (0.00 sec)
The plan is a Project of three columns, over a Filter node on top of a full Table scan. Dolt reads all 300,024 rows, evaluates LOWER(last_name) on each one, and discards the ones that don’t match. Against 300K rows, on my Apple M1 Max laptop, this query takes about 150ms.
Adding the Functional Index#
Now let’s create the functional index, using the same syntax we covered earlier:
employees/main> CREATE INDEX idx_lower_last_name ON employees ((LOWER(last_name)));
Let’s look at the query plan for the same query and see if it’s different after creating a functional index on LOWER(last_name):
employees/main> EXPLAIN FORMAT=TREE SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello';
+----------------------------------------------------------------------------+
| plan |
+----------------------------------------------------------------------------+
| Project |
| ├─ columns: [employees.emp_no, employees.first_name, employees.last_name] |
| └─ IndexedTableAccess(employees) |
| ├─ index: [employees.!hidden!idx_lower_last_name!0!0] |
| └─ filters: [{[facello, facello]}] |
+----------------------------------------------------------------------------+
5 rows in set (0.00 sec)
The full table scan is gone. The plan now shows IndexedTableAccess — Dolt goes directly to the index, seeks to the entries where the indexed expression equals 'facello', and returns only those rows. The same query now takes under 2ms, a ~75x improvement on a 300K row table.
Indexed lookups into a large table are one of the areas where functional indexes can have a huge impact on query performance. Going from a full table scan to a direct lookup resulted in a 75x improvement for this query against 300k rows, and as the table gets bigger, the full table scan performance gets slower, while the indexed lookup performance remains constant, resulting in a larger and larger improvement.
Functional Indexes in Joins#
The functional index doesn’t just speed up point lookups — it also improves queries where the filtered table is one side of a join. Let’s look at a query that finds the current job title for every employee with the last name ‘Facello’:
SELECT f.emp_no, f.first_name, f.last_name, t.title, t.from_date
FROM (SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello') f
JOIN titles t ON f.emp_no = t.emp_no
WHERE t.to_date = '9999-01-01';
Without the Index#
Let’s start off by looking at the query plan for this query, just like we did earlier. To remove the index we just created on the employees table, we can run:
employees/main> DROP INDEX idx_lower_last_name ON employees;
And here’s the query plan:
employees/main> EXPLAIN FORMAT=TREE
SELECT f.emp_no, f.first_name, f.last_name, t.title, t.from_date
FROM (SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello') f
JOIN titles t ON f.emp_no = t.emp_no
WHERE t.to_date = '9999-01-01';
+--------------------------------------------------------------------------+
| plan |
+--------------------------------------------------------------------------+
| Project |
| ├─ columns: [f.emp_no, f.first_name, f.last_name, t.title, t.from_date] |
| └─ LookupJoin |
| ├─ SubqueryAlias |
| │ ├─ name: f |
| │ ├─ outerVisibility: false |
| │ ├─ isLateral: false |
| │ ├─ cacheable: true |
| │ ├─ colSet: (7-9) |
| │ ├─ tableId: 2 |
| │ └─ Filter |
| │ ├─ (lower(employees.last_name) = 'facello') |
| │ └─ Table |
| │ ├─ name: employees |
| │ └─ columns: [emp_no first_name last_name] |
| └─ Filter |
| ├─ (t.to_date = '9999-01-01') |
| └─ TableAlias(t) |
| └─ IndexedTableAccess(titles) |
| ├─ index: [titles.emp_no,titles.title,titles.from_date] |
| ├─ columns: [emp_no title from_date to_date] |
| └─ keys: f.emp_no |
+--------------------------------------------------------------------------+
22 rows in set (0.00 sec)
The subquery on one side of the LookupJoin does a full table scan of all 300K employees to find the Facello employees. Then, for each matching employee, it does a fast indexed lookup into the titles table using the primary key. The bottleneck is that first full scan — the query runs in about 150ms. This is very similar performance to our first un-indexed query, which makes sense, because the most significant factor in the performance of both queries is the full table scan over the 300k items in the employees table.
With the Index#
Just like before, let’s create a functional index over LOWER(last_name):
employees/main> CREATE INDEX idx_lower_last_name ON employees ((LOWER(last_name)));
Now let’s look at the query plan:
employees/main> EXPLAIN FORMAT=TREE
SELECT f.emp_no, f.first_name, f.last_name, t.title, t.from_date
FROM (SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello') f
JOIN titles t ON f.emp_no = t.emp_no
WHERE t.to_date = '9999-01-01';
+----------------------------------------------------------------------------------------+
| plan |
+----------------------------------------------------------------------------------------+
| Project |
| ├─ columns: [f.emp_no, f.first_name, f.last_name, t.title, t.from_date] |
| └─ LookupJoin |
| ├─ SubqueryAlias |
| │ ├─ name: f |
| │ ├─ outerVisibility: false |
| │ ├─ isLateral: false |
| │ ├─ cacheable: true |
| │ ├─ colSet: (15-17) |
| │ ├─ tableId: 3 |
| │ └─ Project |
| │ ├─ columns: [employees.emp_no, employees.first_name, employees.last_name] |
| │ └─ IndexedTableAccess(employees) |
| │ ├─ index: [employees.!hidden!idx_lower_last_name!0!0] |
| │ └─ filters: [{[facello, facello]}] |
| └─ Filter |
| ├─ (t.to_date = '9999-01-01') |
| └─ TableAlias(t) |
| └─ IndexedTableAccess(titles) |
| ├─ index: [titles.emp_no,titles.title,titles.from_date] |
| ├─ columns: [emp_no title from_date to_date] |
| └─ keys: f.emp_no |
+----------------------------------------------------------------------------------------+
22 rows in set (0.00 sec)
The join structure is identical — a LookupJoin with a per-employee index seek into titles — but now the employees side uses IndexedTableAccess with the functional index instead of a full table scan. Dolt resolves only the 186 matching employees before doing any join work at all, which brings the query down to under 2ms.
Here’s a sample of the results:
+--------+------------+-----------+------------------+------------+
| emp_no | first_name | last_name | title | from_date |
+--------+------------+-----------+------------------+------------+
| 10001 | Georgi | Facello | Senior Engineer | 1986-06-26 |
| 15346 | Kirk | Facello | Senior Engineer | 1997-12-06 |
| 15685 | Kasturi | Facello | Senior Engineer | 1997-03-13 |
| 18686 | Kwangyoen | Facello | Senior Staff | 1994-05-02 |
| 21947 | Taisook | Facello | Senior Engineer | 1998-08-28 |
| 23938 | Nahum | Facello | Senior Engineer | 1985-09-15 |
| 24774 | Uno | Facello | Senior Staff | 2002-05-15 |
| 24806 | Charmane | Facello | Senior Engineer | 1999-03-27 |
| 25955 | Christoph | Facello | Technique Leader | 1995-08-21 |
| 27732 | Girolamo | Facello | Senior Engineer | 1986-06-30 |
| ... | | | | |
+--------+------------+-----------+------------------+------------+
186 rows in set
Functional Indexes and Dolt Branches#
One thing worth noting for Dolt users: functional indexes are part of your schema, and like everything else in Dolt, your schema lives on branches. If you create a functional index on a feature branch, it exists only on that branch until you merge it. Schema changes, including functional index additions and removals, show up in dolt diff and in pull requests on DoltHub, so your team can review index changes the same way they review any other change to the database.
This means you can safely experiment with different indexing strategies on isolated branches and only merge the ones that work the way you need them to. It also allows you to offload the work to build the initial index to be performed in the branch, then Dolt can often reuse the index data directly and update it from data differences when you merge it into another branch.
How It Works#
Under the hood, Dolt implements functional indexes the same way MySQL does: by automatically creating a hidden virtual generated column that stores the result of the expression, then building a regular index on that hidden column. The virtual column is entirely transparent — it won’t appear in SHOW COLUMNS or SELECT * output. Dolt handles computing and maintaining it automatically as rows are inserted and updated.
Future Enhancements#
Our initial release of functional index support includes the features that customers told us they needed from functional indexes (e.g. support for a single functional expression, using functional indexes in joins and filters). Now that those customer use cases are unblocked, we’re following up with a few more enhancements:
Multiple expressions per functional index. Each functional index currently supports exactly one functional expression. If you need indexes on multiple expressions, you’ll need a separate index for each one. This was an expedient way to build what our first customers needed, and we’re already working on expanding support for mixing functional expressions with column references in an index, and using multiple functional expressions in a single index. We expect to deliver this as a quick follow-up to the initial support.
Doltgres support. Doltgres is our PostgreSQL-compatible database engine. Dolt has a head start of a few years over Doltgres, but Doltgres is catching up quickly and moving towards a 1.0 milestone. We’ve already started working on enabling functional index support in Doltgres and expect to deliver this as another fast follow-up.
Use functional indexes for more queries. Today, functional indexes are used in joins and filters to speed up queries. These are the main places where functional index speed up queries, but there are other places where functional indexes could also be used, such as to optimize sorting. For example, a query like SELECT * FROM users ORDER BY LOWER(email) won’t yet use the functional index to avoid a sort.
Performance testing. Our initial performance testing for functional indexes shows a dramatic speed up for queries that performed a full table scan before being optimized with a functional index. When compared to MySQL performance for those same queries and same index, Dolt matches or beats MySQL’s performance for those queries. We’ll be adding sysbench tests to track query performance with functional indexes and digging into other cases, like JSON usage with functional indexes to continue tuning performance.
Wrap Up#
The ability to index functional expressions on your data and then use those precomputed values to speed up joins and filtering in queries is a big feature for Dolt. These functional indexes can turn slow table scans over large tables into lightning fast point lookups. In the examples we walked through here, on a table with 300k rows, a functional index improved performance by 75x. The performance impact gets larger as the table size grows.
If you want to dig deeper into how Dolt uses indexes and plans joins, check out Nick’s recent post on improving index selection for join queries.
If you haven’t tried Dolt yet, install it here and give functional indexes a spin! If you have questions or feedback, swing by our Discord server or file an issue on GitHub. We love hearing from customers and are always happy to prioritize features or bug fixes that customers tell us they need.