Don't change your open-source database's storage engine

TECHNICAL
10 min read

It's all broken!

Want a technical challenge? Find an open source database and change its storage engine. You'll have to rewrite hundreds if not thousands of functions, validate your changes against test suites, and continuously run benchmarks to ensure good performance. You'll be surprised again and again how much of your database breaks and how it ever worked in the first place.

For the past year or so at Dolt, we've been working on switching Dolt to a new storage engine. We wrote the new engine to improve Dolt's performance and to pin a stable storage format for the release of Dolt 1.0. Replacing the old engine has been a huge endeavor.

Our first goal was to ship an alpha version of Dolt on the new engine with all of the SQL behavior that the old engine supported. Our SQL engine test suite was the goal line for the alpha version. We passed the engine test suite and shipped alpha about two months ago.

The second goal was to rewrite the CLI commands to work on the new engine. Our BATS test suite held the test coverage for those. Our recent work has been to unskip all of these.

BATS stands for Bash Automated Testing System. It's a TAP-compliant testing framework for Bash and it lets you test most unix programs. We've been using it for a long time. For example, here is a simple bats script that tests the unix command ls which lists files in a directory.

# In ls.bats

setup() {
  # delete the test directory if it exists
  if [ -d "./test_dir" ]; then
    rm -r ./test_dir
  fi
  # create a test directory
  mkdir test_dir && cd test_dir
}

@test "ls lists new files in a directory" {
  # assert no output from ls in an empty directory
  run ls
  [ "$status" = "0" ]
  [ "$output" = "" ]

  # create a file
  touch new_file

  # ls lists the new file
  run ls
  [ "$status" = "0" ]
  [ "$output" = "new_file" ]
}

You can run a bats test with the bats CLI tool. Here's a successful test:

$ bats test-ls.bats
 ✓ ls lists new files in a directory

1 test, 0 failures

And a failed test:

$ bats test.bats
 ✗ ls lists new files in a directory
   (in test file test.bats, line 21)
     `[ "$output" = "bad_file" ]' failed

BATS now pass with the new storage engine

There are currently have 2,165 BATS tests spread across 115 files. The tests cover all of Dolt's commands such as add, commit, pull, push, and clone including some of its SQL behavior. Most of these tests were skipped so we could focus on the engine tests first.

As we un-skipped the files, we discovered a couple missing pieces of functionality for new engine Dolt and a couple bugs.

PRs, PRs, and more PRs

One of the first set of issues we discovered was that our support for keyless tables was lacking. dolt merge and dolt diff were both broken or intentionally panicked. So we fixed that.

The second set of issues were about unique key violations. First, we discovered that we did not detect unique key violations. There was some machinery that tried to detect them, but it did not work.

Then, we discovered issues related to INSERT IGNORE which is a pretty obscure SQL verb.

INSERT IGNORE ignores bad rows. If you try to insert multiple rows in a single statement and one the rows is bad that row will not be inserted into the table.

For example:

CREATE TABLE test (pk INT PRIMARY KEY);
INSERT INTO test VALUES (1);

-- Using INSERT with duplicate PKs gets rolled back

INSERT INTO test VALUES (2), (2);
/*
duplicate primary key given: [2]
*/

SELECT * from test;
/*
+----+
| pk |
+----+
| 1  |
+----+
*/

-- Using INSERT IGNORE, ignores the second insert
INSERT IGNORE INTO test VALUES (2), (2);
/*
Query OK, 1 row affected
*/

SELECT * from test;
/*
+----+
| pk |
+----+
| 1  |
| 2  | <- PK 2 is inserted!
+----+
*/

This INSERT IGNORE functionality for duplicate unique keys was broken. Again, we fixed:

And still there were more. There was a nasty bug where if you pushed or pulled your Dolt database you would lose your conflicts:

And we also had to change the behavior of schema merges:

We caught all of these issues and fixed them thanks to our BATS tests. One year into the rewrite, we now expect the new storage engine's functionality to be very close to the old engine!

Here's are 100 random lines from a full BATS run as it stands now:

ok 1 1pk5col-ints: create a table with a schema file and examine repo
ok 37 1pk5col-ints: checkout table with branch of same name # skip Should distinguish between branch name and table name
ok 43 1pksupportedtypes: dolt table put-row with all types then examine table
ok 48 2pk5cols-ints: add a row where one of the primary keys is different, not both
ok 50 2pk5cols-ints: interact with a multiple primary key table with sql
ok 63 auto_increment: insert into auto_increment table with correct floating point rounding
ok 74 auto_increment: go forward then backwards
ok 107 blame-system-view: correct error message for table with no primary key
ok 109 blame-system-view: view is not included in show tables output
ok 110 blame-system-view: view has a deterministic order
ok 118 blame: works with HEAD as the commit ref # skip SQL views do no support AS OF queries
ok 121 blame: works with HEAD~3 as the commit ref # skip SQL views do no support AS OF queries
ok 133 case-sensitivity: capital letter column names. select with an as
ok 249 constraint-violations: ancestor contains fk, main child add, other parent remove, set null
ok 304 create-views: cannot create view referencing non-existant table
ok 351 default-values: Modify column move first forward reference default literal
ok 386 deleted-branches: can DOLT_CHECKOUT on SQL connection with existing branch revision specifier when dolt_default_branch is invalid
ok 399 diff: summary comparing two branches
ok 434 dump: SQL type - compare tables in database with tables imported file
ok 453 dump: JSON type - with filename name given
ok 490 export-tables: export a table with a string with commas to csv
ok 508 filter-branch: filter multiple branches
ok 509 filter-branch: with missing table
ok 511 filter-branch: filter until commit
ok 556 foreign-keys: Commit all
ok 559 foreign-keys: Commit --force
ok 578 foreign-keys: child violation correctly detected
ok 582 foreign-keys: Delayed foreign key resolution
ok 604 git-dolt: update updates the specified pointer file to the specified revision
ok 610 import-create-tables: create a table with json data import. bad json data.
ok 647 import-create-tables: csv files has fewer columns filled with default value
ok 700 import-update-tables: string too large for column regression
ok 705 import-update-tables: csv files has more column than schema and different order
ok 740 index-on-writes-2: delete none two_pk, >, non-pk
ok 742 index-on-writes-2: delete none two_pk, =, pk + non-pk
ok 743 index-on-writes-2: delete none one_pk, =, pk + non-pk
ok 771 index-on-writes-2: update none one_pk, >, non-pk
ok 778 index-on-writes-2: update none two_pk, >=, pk + non-pk
ok 779 index-on-writes-2: update none one_pk, >=, pk + non-pk
ok 819 index-on-writes: delete all one_pk, <, pk
ok 853 index-on-writes: update all one_pk, <=, pk
ok 865 index-on-writes: update all one_pk, <>, pk + non-pk
ok 883 index: ALTER TABLE CREATE INDEX unnamed
ok 890 index: Disallow 'dolt_' name prefix
ok 897 index: dolt table rm
ok 900 index: SELECT = Primary Key
ok 976 keyless-foreign-keys: dolt table rm
ok 982 keyless-foreign-keys: Disallow change column type when SET NULL
ok 1066 keyless: batch import with keyless unique index
ok 1074 log: properly orders merge commits
ok 1091 merge: 3way merge doesn't stomp working changes
ok 1097 merge: dolt commit fails with unmerged tables in working set
ok 1127 multiple-tables: dolt reset --hard
ok 1135 no-repo: dolt status outside of a dolt repository
ok 1151 no-repo: dolt table outside of a dolt repository
ok 1171 primary-key-changes: add an index after dropping a key, and then recreate the key
ok 1272 remotes: call a clone's remote something other than origin
ok 1283 remotes: force push to main
ok 1351 replication: pull multiple heads, one invalid branch name
ok 1371 revert: HEAD~1
ok 1378 revert: invalid hash
ok 1418 schema-export: export all tables to file
ok 1434 schema-import: --update adds new columns # skip schema import --update is currently deleting table data
ok 1495 sql-checkout: DOLT_CHECKOUT throws error on branches that don't exist
ok 1534 sql-commit: The -f parameter is properly parsed and executes on CALL
ok 1559 sql-create-database: create database IF NOT EXISTS on database that already exists doesn't throw an error
ok 1563 sql-create-database: sql drop database errors for info schema
ok 1570 sql-create-tables: create a table that uses all supported types
ok 1591 sql-create-tables: You can drop temp tables
ok 1612 sql-diff: reconciles RENAME TABLE with schema changes # skip this test is generating extra sql
ok 1637 sql-fetch: dolt_fetch rename ref
ok 1683 sql-merge: CALL DOLT_MERGE correctly merges branches with differing content in same table without conflicts
ok 1690 sql-merge: DOLT_MERGE detects conflicts, returns them in dolt_conflicts table
ok 1726 sql-multi-db: sql join tables in different databases
ok 1728 sql-multi-db: fetch multiple databases with appropriate tempdir
ok 1776 sql-push: dolt_push active branch # skip upstream state lost between sessions
ok 1789 sql-push: not specifying a branch throws an error on CALL
ok 1790 sql-push: pushing empty branch does not panic
ok 1822 sql-server-config: dolt_replicate_heads is global variable
ok 1827 sql-server-config: persist invalid global variable value during server session
ok 1855 sql-server: DOLT_MERGE ff works
ok 1879 sql-server: create and drop database with --data-dir
ok 1881 sql-server: create database with existing repo
ok 1891 sql-shell: specify data-dir
ok 1925 sql-spatial-types: allow index on non-spatial columns of spatial table
ok 1951 sql: errors do not write incomplete rows
ok 1961 sql: basic inner join
ok 1968 sql: addition on both left and right sides of comparison operator
ok 1983 sql: commit hash qualified DB name in select
ok 2010 sql: group by statements
ok 2022 sql: stored procedures creation check
ok 2030 sql: dolt diff table correctly works with IN
ok 2053 status: dolt reset ref properly manages staged changes as well
ok 2054 status: dolt reset throws errors for unknown ref/table
ok 2062 system-tables: check unsupported dolt_remote behavior
ok 2072 system-tables: join dolt_commits and dolt_commit_ancestors
ok 2100 types: BOOLEAN
ok 2142 types: SET('a','b','c')
ok 2151 types: TINYTEXT
ok 2159 verify-constraints: Ignores NULLs

Conclusion

Dolt's Git repository will be 3 years old on July 24th (although Dolt itself is older than. And since the beginning, BATS has been the go-to test tool for Dolt features. Even after all this time, there's a lot more to work to do to ensure Dolt has good test coverage. Approaches we're currently exploring to improve our coverage are fuzzing and grammar crawling.

We're super excited to wrap up this journey we've made over the last year and get some of the new performance improvements into customer hands. Follow our LinkedIn and Twitter for updates related to the Dolt 1.0 release later this Fall and drop by our Discord if you have any questions.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.