Dolt is a Database

6 min read

Aaron, Brian and I founded DoltHub as Liquidata in 2018. Our mission was to add liquidity to the data market. How could we get more data shared? Our main hypothesis was adding branch/merge functionality as in source code to data would facilitate orders of magnitude more data sharing.

Over the past three years DoltHub has morphed into a database company. We built Dolt, the first SQL database you can clone, branch, merge, and diff just like a Git repository. It's fast becoming a drop in replacement for MySQL or any other Online Transaction Processing (OLTP) database.

This blog walks you through how we think about Dolt as an OLTP database and some of the gaps we're trying to fill over the next few months. If you were waiting for Dolt to be ready for your database use case, now is the time to give it a shot.

Dolt's SQL Server

Whenever you clone or init a Dolt database, you can start a MySQL compatible server by running dolt sql-server. This spins up a running server on port 3306 that you can connect to with any MySQL client. The name of the database is the name of the directory the dolt database is in with the - characters replaced by _. It's really that simple.

In other words, Dolt ships with a database server built in.

Here's a quick example to show you how simple this is.

dolt $ dolt clone dolthub/hospital-price-transparency
cloning https://doltremoteapi.dolthub.com/dolthub/hospital-price-transparency
7,761,699 of 7,761,699 chunks complete. 0 chunks being downloaded currently.
dolt $ cd hospital-price-transparency
hospital-price-transparency $ dolt sql-server
Starting server with Config HP="localhost:3306"|U="root"|P=""|T="28800000"|R="false"|L="info"

And then in another shell:

~ $ mysql --user root --database hospital_price_transparency --host 127.0.0.1 --port 3306 -e "show tables"
+-----------+
| Table     |
+-----------+
| cpt_hcpcs |
| hospitals |
| prices    |
+-----------+

Take a moment to marvel at that.

We were able to get 25GB, over 100M hospital prices, schema and data, and get a running MySQL compatible database in under 5 minutes.

If I want to make changes to it and give my changes back to the maintainer, I can just do it and make a Pull Request on DoltHub, just like with source code. If I want to use this data in my application or data pipeline, all I need to do to get updates is run dolt pull. Dolt is a beautiful and elegant solution to the data distribution problem.

A SQL database with Git-style version control is incredible but how much can you trust this server? Can you run it in production?

Correctness

The first thing we tackled was SQL Correctness. Do queries parse and return the right results? Do we support the required SQL grammar to be a legitimate database?

For this, we turned to the good folks at SQLite who had created sqllogictest. These tests are automatically generated using a known good MySQL instance to judge correctness. The tests are weird because of this. They produce SQL no human would dream up.

In December 2019 we celebrated getting to 90% correctness. Today, we measure our correctness at every Dolt release and publish the results in our documentation. We're currently at 98%. We expect to get over 99% by the end of the month.

Long and short here is that it is very rare to find a query in Dolt that parses in MySQL but does not in Dolt or a query that returns results that are different than MySQL. Dolt comes correct.

Features

On top of correctness, SQL databases have a lot of features. There are schema management features. There are functions. There are transactions. We've quickly been cranking through features as customers ask for them.

In schema management, Dolt has most of the MySQL types including JSON types. Dolt has secondary indexes. Dolt has foreign keys. Dolt has check constraints, a feature only added in MySQL 8.

Dolt has good SQL function coverage and supports window functions. We publish a complete list of MySQL functions in our documentation including which ones Dolt currently supports. Dolt currently supports 129 of 436 MySQL functions. We started with the most popular ones so we support more like 95% of function usage. Functions are easy to build and usually take at most 4 hours to implement so if you want one, just ask.

Lastly, the last big hurdle that we just cleared is transaction support. Transactions are behind a feature flag right now as we work out the kinks but we expect to remove the feature flag by the end of the month.

Dolt is still missing a few big features. Dolt only supports a single database in a running instance. Dolt does not support user level permissions. Dolt does not support geographical types. Encoding and collation support is minimal. These are all on the roadmap and will probably get into Dolt this year.

Dolt is pretty feature complete and we'll add features for your use case. We're fast at this. A lack of a particular feature should not stop you from adopting Dolt. Just ask for what you want and we'll give you a date, especially if you are willing to buy a support plan.

Performance

That's all well and good but will Dolt perform for your use case? We think about performance in two ways: core operations we can benchmark and complex queries which we cannot.

With every Dolt release we publish a set of performance benchmarks we create against MySQL using sysbench. Currently Dolt is 9X slower on reads and 16X slower than MySQL on writes by these benchmarks. Averaging across reads and writes, we're about 12X slower than MySQL. These measure simple operations like row or range retrieval. We think we can get performance to 2-4X MySQL by the end of the year.

Complex queries seem to come in about the same 12X slower than MySQL range. We judge these from our customer use cases. We have a couple intrepid customers who hit us with nine table JOINs and we worked over the past 6 months to make our analyzer better. Unlike core operations, with complex queries, you are more likely to hit a case that either does not return or takes 1000X MySQL. Anecdotally, these issues have gone from about 1 in 10 queries in January to about 1 in 30 queries now. We're making progress. If you find a broken query, send it our way. We need examples to help improve the query analyzer.

Our early customers have found that even though Dolt is slower, it's still fast enough that the speed difference is mostly not noticeable at the application layer. If a Dolt query takes 12ms to return vs 1ms, that latency difference is overwhelmed by the application latency. Performance is certainly close enough that you should give it a try.

Lastly, we have a read path cheat to replicate to MySQL to give MySQL performance to reads. Run Dolt as your write database and replicate to MySQL for reads. Do all your versioning operations on Dolt. We haven't implemented replicate to MySQL yet but if it interests you, please let us know.

Operations

Operations to us is about backup and replication. Beyond those features, do you have to operate the Dolt instance yourself, or will someone do it for you and give you a connection string?

Dolt only supports backup using remotes via clone, push, pull. Backups via push are very lightweight and can happen in the background while your server is running. A push will backup the state of the Dolt database to the last Dolt commit.

Replication is not implemented yet. We need to implement remotes, push, pull, and fetch as SQL functions to get a basic Dolt replication story. The idea would be to run push on writes on your master Dolt and run pull on reads for your Dolt replicas. All these Dolt servers would share the same remote. This setup combines replication and backup into the remote concept.

After the above, we could implement MySQL style replication via binlog or statement/row to be able to replicate to MySQL or other Dolt instances. This would likely be as a performance cheat but some users may prefer this style of replication to push and pull replication.

The last step is to package up the above into a hosted service a la Amazon RDS where you don't have to do any of this yourself. We're going to get to this in 2021.

We've arrived

Dolt is a database. You can run it in production just like MySQL or Postgres. With Dolt, you get Git-style version control to give to your application or make operating your database easier. Dolt has some missing features and is slower than alternatives. We'll be closing these gaps in 2021. Try it today. If you're not quite there and want to discuss or make feature requests, hit us up on our Discord.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.