PRODUCTS

KEYWORDS

Announcing Functional Index Support in Doltgres

Doltgres is the world’s first PostgreSQL-compatible database with Git-style version control built in. Today we’re excited to announce that Doltgres now supports functional indexes!

A few weeks ago, we announced functional index support in Dolt, and in that post we mentioned that Doltgres support was on the way. Well, here it is! Functional indexes aren’t something every application needs, but when they are needed, they can have a big impact on query performance, as we’ll see later in this post.

In this blog post, we’ll walk through what functional indexes are, how they work in Doltgres, and how to use them, as well as see them in action.

What is a Functional Index?#

A regular index stores the values of one or more columns and allows the database engine to quickly look up rows by those column values. A functional index is similar, but instead of storing a column’s value, it stores the result of a function or expression. That computed value is what gets indexed, so when a query contains the same expression in its WHERE clause, Doltgres can use the index to satisfy the query efficiently, without having to scan the full table and evaluate the expression for every row.

PostgreSQL refers to these as indexes on expressions in the documentation, though the term “functional index” is widely used as well. Creating one looks like this:

CREATE INDEX idx_lower_email ON users (LOWER(email));

In the ON clause, instead of listing column names, we include the expression LOWER(email). The new index will store the result of calling the LOWER() function on the email column, giving us an index of lower-cased email addresses, where each row in the index also points back to the corresponding full row in the table.

When Does This Help?#

Let’s look at a common scenario for functional indexes. Imagine you have a users table and you want to look up users by email address, but you don’t want User@Example.COM and user@example.com to be treated as different entries. There are several clever ways to handle this, but for the sake of this demonstration, we’ll normalize the values to lowercase when filtering on them.

SELECT * FROM users WHERE LOWER(email) = LOWER('User@Example.COM');

Without a functional index on LOWER(email), satisfying that query requires a full table scan. The database engine has to evaluate LOWER(email) for every single row and compare it against your search value. On a large table, that’s going to be slow.

A functional index solves this cleanly, without requiring you to store a redundant email_lower column purely for indexing purposes.

Here are some other examples of good cases to use a functional index:

  • Date part extraction — index on DATE_PART('year', created_at) or DATE_TRUNC('month', order_date) for reporting queries that filter by year or month
  • JSON path extraction — index on (metadata->>'type') to speed up queries filtering on specific fields inside a JSONB column
  • String transformations — indexing trimmed or normalized versions of values without duplicating the data in an extra column

Getting Started#

Let’s see functional indexes in action. We’ll create a synthetic employees table with 300,000 rows — large enough to see a real performance difference, but still easy to set up.

If you don’t have Doltgres yet, install it here. Then start the server:

$ doltgres

In another terminal window, connect with psql, create a database, and switch to it:

$ PGPASSWORD=password psql -U postgres
psql (15.17 (Homebrew))
Type "help" for help.

postgres=# CREATE DATABASE employees;
CREATE DATABASE
postgres=# \c employees
You are now connected to database "employees" as user "postgres".

Now create the employees and titles tables and populate them with some sample data:

CREATE TABLE employees (
    emp_no     SERIAL PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    hire_date  DATE NOT NULL
);

CREATE TABLE titles (
    id        SERIAL PRIMARY KEY,
    emp_no    INTEGER NOT NULL REFERENCES employees(emp_no),
    title     TEXT NOT NULL,
    from_date DATE NOT NULL,
    to_date   DATE
);

INSERT INTO employees (first_name, last_name, hire_date)
SELECT
    (ARRAY['Georgi','Bezalel','Parto','Kyoichi','Anneke',
           'Tzvetan','Saniya','Sumant','Duangkaew','Mary'])[1 + (n % 10)],
    CASE WHEN n % 1500 = 0
         THEN 'Facello'
         ELSE (ARRAY['Smith','Johnson','Williams','Brown','Jones',
                     'Garcia','Miller','Davis','Wilson','Moore',
                     'Taylor','Anderson','Thomas','Jackson','White'])[1 + (n % 15)]
    END,
    '1985-01-01'::date + (n % 9000)
FROM (SELECT generate_series(1, 300000) AS n) AS seq;

INSERT INTO titles (emp_no, title, from_date, to_date)
SELECT
    emp_no,
    (ARRAY['Engineer','Senior Engineer','Staff',
           'Senior Staff','Technique Leader'])[1 + (emp_no % 5)],
    hire_date,
    CASE WHEN emp_no % 4 = 0
         THEN NULL
         ELSE hire_date + ((emp_no % 3000) || ' days')::interval
    END
FROM employees;

Next, turn on timing mode in psql. This will return the client side latency for the queries we run. We’ll use this to compare the performance of queries running with and without a functional index.

\timing
Timing is on.

Our database now has 300,000 employees with about 200 named ‘Facello’ and one title record per employee. Suppose we need to find employees by last name in a case-insensitive way. We can run the query below to normalize the last names to lowercase for filtering purposes but still return the original casing in the results:

SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello';

Without a Functional Index#

Let’s look at the query plan Doltgres uses when there’s no index:

employees=# EXPLAIN SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello';
                      plan
------------------------------------------------
 Filter
  ├─ lower(employees.last_name) = 'facello'
  └─ Table
      ├─ name: employees
      └─ columns: [emp_no first_name last_name]
(5 rows)

The plan is a sequential scan over the entire employees table. Doltgres reads all 300,000 rows, evaluates LOWER(last_name) on each one, and discards the ones that don’t match. On my Apple M1 Max laptop, this query takes about 150ms.

Adding the Functional Index#

Now let’s create the functional index:

employees=# CREATE INDEX idx_lower_last_name ON employees (LOWER(last_name));

Let’s look at the query plan for the same query after adding the index:

employees=# EXPLAIN SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello';
                                    plan
----------------------------------------------------------------------------
 Project
  ├─ columns: [employees.emp_no, employees.first_name, employees.last_name]
  └─ IndexedTableAccess(employees)
      ├─ index: [employees.!hidden!idx_lower_last_name!0!0]
      └─ filters: [{[facello, facello]}]
(5 rows)

Notice that the sequential scan is gone. The plan now shows an IndexedTableAccess on the employees table, using our new functional index. Doltgres goes directly to the index, jumps to the entries where the indexed expression equals 'facello', looks up each corresponding row in the main table, and returns only those rows.

When we run this query, with the timing option enabled in psql, we can see the total client-side latency for the query:

SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello';
...
...
Time: 4.133 ms

The same query now goes from running in 150ms to running in just about 4ms, a ~37x improvement on a 300K row table. As the size of the table grows, the difference between the direct lookup via the functional index, and doing a full table scan and evaluating the function expression for every single row, continues to grow – the direct lookup performance will stay relatively constant, while the full table scan performance will continue to increase in latency, resulting in a larger and larger performance gain.

Functional Indexes in Joins#

Functional indexes don’t just speed up point lookups, they can also improve queries where the filtered table is one side of a join. Using the same database we just created, let’s look at a query that finds the current title for every employee named ‘Facello’:

SELECT f.emp_no, f.first_name, f.last_name, t.title, t.from_date
FROM (SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello') f
JOIN titles t ON f.emp_no = t.emp_no
WHERE t.to_date IS NULL;

Without the Index#

First, let’s drop the index we just created and look at the plan:

employees=# DROP INDEX idx_lower_last_name;
employees=# EXPLAIN
  SELECT f.emp_no, f.first_name, f.last_name, t.title, t.from_date
  FROM (SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello') f
  JOIN titles t ON f.emp_no = t.emp_no
  WHERE t.to_date IS NULL;
                                   plan
--------------------------------------------------------------------------
 Project
  ├─ columns: [f.emp_no, f.first_name, f.last_name, t.title, t.from_date]
  └─ LookupJoin
      ├─ t.to_date IS NULL
      ├─ SubqueryAlias
      │   ├─ name: f
      │   ├─ outerVisibility: false
      │   ├─ isLateral: false
      │   ├─ cacheable: false
      │   ├─ colSet: (5-7)
      │   ├─ tableId: 2
      │   └─ Filter
      │       ├─ lower(employees.last_name) = 'facello'
      │       └─ Table
      │           ├─ name: employees
      │           └─ columns: [emp_no first_name last_name]
      └─ TableAlias(t)
          └─ IndexedTableAccess(titles)
              ├─ index: [titles.emp_no]
              ├─ columns: [emp_no title from_date to_date]
              └─ keys: f.emp_no
(21 rows)

The subquery on one side of the join does a full table scan of all 300K rows in the employees table and evalutes the LOWER() expression on each row and compares it with facello. Then, for each matching employee, it does a fast indexed lookup into the titles table. The bottleneck is that first full table scan — the query runs in about 145ms on my laptop. As the employees table gets larger, this query takes longer to run.

With the Index#

Let’s recreate the functional index:

employees=# CREATE INDEX idx_lower_last_name ON employees (LOWER(last_name));

And look at the query plan:

employees=# EXPLAIN
  SELECT f.emp_no, f.first_name, f.last_name, t.title, t.from_date
  FROM (SELECT emp_no, first_name, last_name FROM employees WHERE LOWER(last_name) = 'facello') f
  JOIN titles t ON f.emp_no = t.emp_no
  WHERE t.to_date IS NULL;
                                          plan
----------------------------------------------------------------------------------------
 Project
  ├─ columns: [f.emp_no, f.first_name, f.last_name, t.title, t.from_date]
  └─ LookupJoin
      ├─ t.to_date IS NULL
      ├─ SubqueryAlias
      │   ├─ name: f
      │   ├─ outerVisibility: false
      │   ├─ isLateral: false
      │   ├─ cacheable: false
      │   ├─ colSet: (11-13)
      │   ├─ tableId: 3
      │   └─ Project
      │       ├─ columns: [employees.emp_no, employees.first_name, employees.last_name]
      │       └─ IndexedTableAccess(employees)
      │           ├─ index: [employees.!hidden!idx_lower_last_name!0!0]
      │           └─ filters: [{[facello, facello]}]
      └─ TableAlias(t)
          └─ IndexedTableAccess(titles)
              ├─ index: [titles.emp_no]
              ├─ columns: [emp_no title from_date to_date]
              └─ keys: f.emp_no
(21 rows)

The overall join structure is the same — a lookup join between the employees table and the titles table. But there’s one major difference in the query plan… instead of doing a full table scan over employees, notice the IndexedTableAccess operation on our new idx_lower_last_name index. Inside that operation you can see the filters applied on the index, which enable the engine to jump directly to the rows with the value facello and not need to process any other rows. When we run this same query with the functional index in place, it now runs in about 5ms.

Here’s a sample of the results:

  emp_no | first_name | last_name |  title   | from_date
--------+------------+-----------+----------+------------
   1500 | Georgi     | Facello   | Engineer | 1989-02-09
   3000 | Georgi     | Facello   | Engineer | 1993-03-20
   4500 | Georgi     | Facello   | Engineer | 1997-04-28
   6000 | Georgi     | Facello   | Engineer | 2001-06-06
   7500 | Georgi     | Facello   | Engineer | 2005-07-15
   9000 | Georgi     | Facello   | Engineer | 1985-01-01
   ...
(200 rows)
Time: 5.168 ms

Functional Indexes and Doltgres Branches#

One thing worth noting for Doltgres users: functional indexes are part of your schema, and like everything else in Doltgres, your schema lives on branches. If you create a functional index on a feature branch, it exists only on that branch until you merge it. Schema changes, including functional index additions and removals, show up in diffs, so your team can review index changes just like they review any other change to the database.

This means you can safely experiment with different indexing strategies on isolated branches and only merge the ones that work the way you need them to.

How It Works#

Under the hood, Doltgres implements functional indexes by automatically creating a hidden virtual generated column that stores the result of the expression, then builds a regular index on that hidden column. The virtual column is invisible — it won’t appear in \d output or SELECT * results. Doltgres handles computing and maintaining it automatically as rows are inserted and updated.

Future Enhancements#

The initial release of functional index support in Doltgres covers the core use cases: a single functional expression per index, used in filters and joins. We have a few follow-up enhancements planned:

Multiple expressions per functional index. Each functional index currently supports exactly one functional expression. If you need indexes on multiple expressions, you’ll need a separate index for each one. We’re already working on expanding support for mixing functional expressions with column references in an index, and using multiple functional expressions in a single index.

Use functional indexes for more queries. Today, functional indexes are used in joins and filters to speed up queries. There are other places where functional indexes could also be used, such as to optimize sorting. For example, a query like SELECT * FROM users ORDER BY LOWER(email) won’t yet use the functional index to avoid a sort.

Performance testing. Our initial performance testing shows a dramatic speed up for queries that performed a full table scan before being optimized with a functional index. We’ll be expanding our benchmarks to cover more functional index scenarios and continuing to tune performance.

Wrap Up#

Functional indexes in Doltgres let you index the results of expressions on your data and use those precomputed values to speed up filters and joins — turning slow sequential scans over large tables into fast point lookups. In the examples we walked through here, a functional index on a 300K row table improved query performance by 37x. The improvement grows larger as the table size increases, because sequential scan performance degrades while indexed lookup performance stays constant.

If you haven’t tried Doltgres yet, install it here and give functional indexes a spin! If you have questions or feedback, swing by our Discord server or file an issue on GitHub. We love hearing from customers and are always happy to prioritize features or bug fixes that customers tell us they need.