Database Performance: Dolt vs MySQL

7 min read

Dolt is a version controlled SQL database. Dolt's query interface is SQL, and it has Git-like version control features. Adding version control features to a SQL database has performance trade offs when comparing Dolt with traditional databases like MySQL. In particular relational databases use highly optimized storage layouts that are motivated query engine performance. Implementing version control features requires making compromises about the storage layout that impact query performance. We believe that the explosion of data centric business processes and applications justifies a database that makes this tradeoff.

The purpose of this post is to show how we measure those performance tradeoffs and set clear expectations on Dolt performance. Our goal is to get Dolt to the point where it is no more than 2-4 times slower than MySQL. We spend the balance of this post showing how we measuring ourselves against that goal.

Background

Database solutions, for example MySQL and Postgres, are designed to be application backing stores. More precisely they are optimized for online transaction processing (OLTP) use cases, defined loosely as as:

Processing in which the system responds immediately to user requests.

Dolt's version control features were designed with a different set of use-cases in mind, most of which are not OLTP. For example, we are currently onboarding a customer using Dolt as a backend for researcher output. They use Dolt's branching functionality to compare results submissions before combining those submissions to a master copy. We have other users who are using Dolt as an ETL ingestion point, ensuring that all third party data is robustly versioned. Dolt's commit graph acts as a "configurable boundary" for their organization, providing users with simple tools for controlling what data comes through that boundary.

Despite early adoption from non-OLTP use case, users still want fast query performance. Faster is obviously better. We think as Dolt gets faster, Dolt could make sense for some OLTP use cases that are willing to trade performance for version control features. We are really interested in supporting as many use cases as we can long term. Thus, we are committed to making Dolt as fast as possible.

Expectations

As stated at the outset, our goal is to get Dolt to the point where it is no more than 2-4 times slower than MySQL for sysbench's standard test suite of OLTP and non-OLTP tests. Additionally, we will implement custom tests that emphasize common use-cases for Dolt, which we expect to be no more than 2 times slower than MySQL. Finally, we consider anything that is 10 times slower than MySQL a bug that we will prioritize fixing.

Speed Multiple Response
2-4 acceptable
4-10 not urgent, but needs work
10-20 bug
>20 urgent bug

Approach

We recently blogged about our approach to benchmarking Dolt, though our documentation is the place to go for the latest information. To recap, we use sysbench, an industry standard tool for benchmarking databases to evaluate both Dolt and MySQL using the same set of tests.

For example, our most recent release is 0.22.6, and we wanted to compare this to 0.22.2, the prior release. We ran the following command, noting that $DOLT_CHECKOUT is a checkout of Dolt's GitHub repository:

$ cd $DOLT_CHECKOUT/benchmark/perf_tools
$ ./run_benchmarks.sh all 100000 oscarbatori 0.22.2 0.22.6

At a high level, for each tag and MySQL this:

  • runs the database in a Docker container
  • runs sysbench in a separate Docker container
  • collects the results in a file

All the results, for MySQL and each Dolt tag, are associated with a single run ID which denotes an invocation of run_benchmarks.sh. This means that we can identify rows that were run on the same hardware at the same time, giving us a degree of hardware context isolation. For example, this run was associated with the unique run ID 7dffeeb22fb11efaec478ef3, which identifies the results:

$ ls -ltr output
-rw-r--r--  1 oscarbatori  staff  6846 Dec  4 11:13 7dffeeb22fb11efaec478ef3.csv

The all parameter denotes the following list of tests, though passing a comma separated list of tests also works for more targeted benchmarking:

bulk_insert
oltp_delete
oltp_insert
oltp_point_select
oltp_read_only
oltp_read_write
oltp_update_index
oltp_update_non_index
oltp_write_only
select_random_points
select_random_ranges

This is the complete list of standard sysbench tests. You can find these results posted to DoltHub, where you can run SQL against them.

Results

Here we present the results as they appear in the Dolt documentation:

Test Dolt MySQL Multiple
bulk_insert 308783 2278717 7.4
oltp_delete 883 21847 24.7
oltp_insert 1302 6099 4.7
oltp_point_select 6401 32925 5.1
oltp_read_only 268 1894 7.1
oltp_read_write 90 1328 14.8
oltp_update_index 496 6199 12.5
oltp_update_non_index 803 5971 7.4
oltp_write_only 141 3335 23.7
select_random_points 457 4101 9
select_random_ranges 97 6109 63
mean 19.5

This analysis highlighted the select_random_ranges result as particularly bad, and it will in fact be fixed in the next release.

In the previous section we pointed at these results hosted on DoltHub. We can easily clone and analyze the results in SQL as follows to compare how performance has changed between the two most recent versions:

$ dolt clone dolthub/dolt-benchmarks && dolt-benchmarks
cloning https://doltremoteapi.dolthub.com/dolthub/dolt-benchmarks
185 of 185 chunks complete. 0 chunks being downloaded currently.
$ dolt sql
~/Documents/dolt-dbs/dolt-benchmarks/test/dolt-benchmarks|>>  dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
dolt_benchmarks> SELECT
              ->   dolt_0222.run_id as run_id,
              ->   dolt_0222.test_name as test_name,
              ->   dolt_0222.sql_transactions as dolt_0222,
              ->   dolt_0226.sql_transactions as dolt_0226,
              ->   mysql.sql_transactions as mysql,
              ->   ROUND((1.0 * mysql.sql_transactions) / dolt_0222.sql_transactions, 1) as dolt_0222_multiple,
              ->   ROUND((1.0 * mysql.sql_transactions) / dolt_0226.sql_transactions, 1) as dolt_0226_multiple
              -> FROM
              ->   sysbench_benchmark as dolt_0222
              ->   inner join sysbench_benchmark as dolt_0226
              ->     on dolt_0222.run_id = dolt_0226.run_id
              ->     and dolt_0222.test_name = dolt_0226.test_name
              ->   inner join sysbench_benchmark as mysql
              ->     on dolt_0222.run_id = mysql.run_id
              ->     and dolt_0222.test_name = mysql.test_name
              ->
              -> WHERE
              ->   dolt_0222.run_id = '7dffeeb22fb11efaec478ef3'
              ->   and dolt_0222.database = 'dolt'
              ->   and dolt_0222.committish = '0.22.2'
              ->   and dolt_0226.database = 'dolt'
              ->   and dolt_0226.committish = '0.22.6'
              ->   and mysql.database = 'mysql'
              -> ORDER BY
              ->   dolt_0222;
+-----------------------+-----------+-----------+---------+--------------------+--------------------+
| test_name             | dolt_0222 | dolt_0226 | mysql   | dolt_0222_multiple | dolt_0226_multiple |
+-----------------------+-----------+-----------+---------+--------------------+--------------------+
| oltp_insert           | 1302      | 1167      | 6099    | 4.7                | 5.2                |
| oltp_point_select     | 6401      | 4819      | 32925   | 5.1                | 6.8                |
| oltp_read_only        | 268       | 228       | 1894    | 7.1                | 8.3                |
| bulk_insert           | 308783    | 308783    | 2278717 | 7.4                | 7.4                |
| oltp_update_non_index | 803       | 767       | 5971    | 7.4                | 7.8                |
| select_random_points  | 457       | 400       | 4101    | 9                  | 10.3               |
| oltp_update_index     | 496       | 461       | 6199    | 12.5               | 13.4               |
| oltp_read_write       | 90        | 88        | 1328    | 14.8               | 15.1               |
| oltp_write_only       | 141       | 122       | 3335    | 23.7               | 27.3               |
| oltp_delete           | 883       | 789       | 21847   | 24.7               | 27.7               |
| select_random_ranges  | 97        | 93        | 6109    | 63                 | 65.7               |
+-----------------------+-----------+-----------+---------+--------------------+--------------------+

Conclusion

Our conclusion from this is that while Dolt performance has improved, there is still a long way to go. As we move further along with our implementation it will become progressively harder to find large performance improvements. This post laid out our performance commitments to our users and customers, present and future, and demonstrated our mechanism for repeatably measuring ourselves against that commitment with every release of Dolt.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt