90% Faster Tests With Dolt Stored Procedures

June 10, 2022

4 min read

Dolt is a relational database that uses a custom storage layer and SQL engine to version data with Git-semantics. Every layer in the stack relies on thousands of tests to maintain MySQL compatibility and correctness. The surface area of MySQL and Git features is large, and testing is something we pay close attention to.

SQL engine tests are a team favorite. They comprise sets of query strings that validate SQL input/output pairs. They are conceptually easy, convenient to run, coverage-thorough, and provide tight iteration cycles. As a result, we have added new engine tests every day for years. This graph shows the amount of seconds taken to run enginetests over the past 12 months:

The uptick has started become a developer bottleneck. A dev waits 4 minutes for enginetests to pass before committing any changes to go-mysql-server (GMS), our open source SQL engine, or Dolt.

Today we will show how we slashed the engine test runtime in the most recent release by 90% using Dolt versioning.

Dolt Stored procedures

Consider TestInsertInto, which adds rows to a table before verifying a new row is added:

{
    WriteQuery:          "INSERT INTO mytable SET s = 'x', i = 999;",
    ExpectedWriteResult: []sql.Row{{sql.NewOkResult(1)}},
    SelectQuery:         "SELECT i FROM mytable WHERE s = 'x';",
    ExpectedSelect:      []sql.Row{{int64(999)}},
},

The profile of the test suite is distressing, but obvious in hindsight:

    .     .    709:func TestInsertInto(t *testing.T, harness Harness) {
    .     .    710:	for _, insertion := range InsertQueries {
    . 3.48s    711:		e := NewEngine(t, harness)
    .  20ms    714:		TestQuery(t, harness, e, insertion.WriteQuery, insertion.ExpectedWriteResult, nil)
    .  20ms    715:		TestQuery(t, harness, e, insertion.SelectQuery, insertion.ExpectedSelect, nil)
    .     .    716:    }
    .     .    716:}

The test creates a SQL engine populated with data, runs WriteQuery, and then runs SelectQuery. Engine setup dwarfs the test runtime. The actual test is fast (40 milliseconds), rebuilding the database every time is slow (3480 milliseconds).

Reduce, Reuse, Recycle

How do Dolt stored procedures help us with enginetests? We can recycle a database with two commands that reset preexisting and new tables instead of running the entire setup steps every time.

First, dolt reset --hard rewinds uncommitted table changes. Below we show the table data alongside the dolt_status describing working set changes:

-- Observe the new changes
> select * from mytable;
+-----+---+
| i   | s |
+-----+---+
| 999 | x |
+-----+---+
> select * from dolt_status;
+------------+--------+----------+
| table_name | staged | status   |
+------------+--------+----------+
| mytable    | false  | modified |
+------------+--------+----------+

-- Rewind changes
> call dolt_reset('--hard');
+--------+
| status |
+--------+
| 0      |
+--------+

-- Database state is equal to the last commit
> select * from mytable;
+---+---+
| i | s |
+---+---+
+---+---+
> select * from dolt_status;
+------------+--------+--------+
| table_name | staged | status |
+------------+--------+--------+
+------------+--------+--------+

Resetting the database does not entirely correct the working set. Reset only affects previously versioned tables.

Consider TestCreateTableSelect, where we add a new table to the database by selecting a fraction of an existing table:

{
    WriteQuery:          "CREATE TABLE newtable AS SELECT * from mytable;",
    ExpectedWriteResult: []sql.Row{{sql.NewOkResult(0)}},
    SelectQuery:         "SELECT i FROM newtable WHERE s = 'first row';",
    ExpectedSelect:      []sql.Row{{int64(1)}},
},

After running dolt reset --hard, the new table remains:

-- Observe new table
> show tables;
+---------------+
| Tables_in_tmp |
+---------------+
| mytable       |
| newtable      |
+---------------+
> select * from dolt_status;
+------------+--------+-----------+
| table_name | staged | status    |
+------------+--------+-----------+
| newtable   | false  | new table |
+------------+--------+-----------+

-- Attempt to remove the new table
> call dolt_reset('--hard');
+--------+
| status |
+--------+
| 0      |
+--------+

-- New table is still there
> show tables;
+---------------+
| Tables_in_tmp |
+---------------+
| mytable       |
| newtable      |
+---------------+
> select * from dolt_status;
+------------+--------+-----------+
| table_name | staged | status    |
+------------+--------+-----------+
| newtable   | false  | new table |
+------------+--------+-----------+

Reset prevents Dolt from deleting unversioned tables. In order to remove untracked changes from the working set, I implemented dolt clean.

-- Delete the new table
> call dolt_clean('newtable');
+--------+
| status |
+--------+
| 0      |
+--------+

-- Database state is equal to the last commit
> select * from dolt_status;
+------------+--------+--------+
| table_name | staged | status |
+------------+--------+--------+
+------------+--------+--------+
> show tables:
+---------------+
| Tables_in_tmp |
+---------------+
| mytable       |
+---------------+

Dolt clean removes new, unstaged, and untracked tables. Or put simply, dolt clean calls drop table for new tables, restoring the second half of the working set that reset preserves.

Results

The reset and clean combo restores modified tables and deletes new tables, respectively, recycling a SQL engine. Here is a TestInsertInto profile with recycling:

   .     .    637:	t.Run(tt.WriteQuery, func(t *testing.T) {
   . 270ms    638:		e := mustNewEngine(t, harness)
   .     .    639:		ctx := NewContext(harness)
   .  20ms    641:		TestQueryWithContext(t, ctx, e, tt.WriteQuery, tt.ExpectedWriteResult, nil, nil)
   .  30ms    642:		TestQueryWithContext(t, ctx, e, tt.SelectQuery, tt.ExpectedSelect, nil, nil)
   .     .    643:	})

A reasonable initial response to this profile might be "This is terrible! 90% of the runtime is still spent on setup." But remember, we went from 3000 milliseconds to 300 milliseconds. Here is the before and after at the scale developers see for all tests:

> go test github.com/dolthub/dolt/go/libraries/doltcore/sqle/enginetest -count=1
ok  	github.com/dolthub/dolt/go/libraries/doltcore/sqle/enginetest	246.408s
ok  	github.com/dolthub/dolt/go/libraries/doltcore/sqle/enginetest	30.254s

There is room for improvement, but overall we achieved a 90% speedup. A developer might run go test ./... anywhere from one to hundreds of times for every PR. 4 minutes * 100 ~= a day, 40 seconds * 100 ~= an hour.

Next time we will talk about our CI shell script tests, bats. Bats are rarely run during active development, but remain a bottleneck for clearing GitHub PRs for merge. Here is a final side-by-side for comparison:

	bats tests	engine tests
runtime	26 minutes	30 seconds

Summary

We care about shipping software at Dolt, but every PR accumulates tests. Over time, the weight of tests themselves become a drag on development.

This blog shows how we narrowed the test feedback cycle using Dolt stored procedures. Database native Git-style versioning lets us recycle our test setup so we can do more coding and less waiting on tests. If you need to mock a MySQL database for your test suite. Dolt might be able to speed up your tests as well.

If you have any questions about Dolt, databases, or Golang performance reach out to us on Twitter, Discord, and GitHub!

Blog

Dolt Stored procedures

Reduce, Reuse, Recycle

Results

Summary

Get started with Dolt