Dolt is a MySQL-compatible database with
killer version control features. Its data diff, branch, and merge features give
applications version control functionality at the database layer, and its
Git-like interface makes it one of the most developer-friendly databases too.
Over the last year, the team at DoltHub has been working on a new database
storage format that improves Dolt's performance, bringing it closer to MySQL's.
The new format currently benchmarks at
just ~3x slower than MySQL, compared to 8.3x slower for the old format.
The interface is 100% compatible, so no
changes other than a migration are required. You can read more about how we
benchmark Dolt and how the new format is different in the footnotes.
Today, we're officially launching support for the new format on
DoltHub! 🎉 Any new database created through the
DoltHub UI will use the new format, and you can push existing new format
databases to DoltHub as well.
New format databases have a special
How did we add support for the new format on DoltHub?
When we swapped out the storage formats, any machinery that used the storage
representation of a row directly had to be redone. Typically, that storage
representation is a wrapper around a byte buffer with functionality to decode
the row into Golang types. Some of the machinery could be changed to use a
non-storage row representation (a slice of Go types). The rest had to be redone completely.
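To make the distinction concrete, here is a minimal sketch of the two kinds of row representation. Everything in it is hypothetical: the `storageRow` type, the `(id, name)` schema, and the byte layout are invented for illustration and are not Dolt's actual encoding. The point is only that code written against the decoded slice (`describe` below) would not need to change if the byte layout did.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// storageRow is a hypothetical wrapper around the raw bytes of one row.
// Layout assumed for this sketch only: an 8-byte little-endian int64 id,
// a 2-byte little-endian length, then that many bytes of name.
type storageRow struct {
	buf []byte
}

// encodeRow builds the byte representation of an (id, name) row.
func encodeRow(id int64, name string) storageRow {
	buf := make([]byte, 10+len(name))
	binary.LittleEndian.PutUint64(buf[0:8], uint64(id))
	binary.LittleEndian.PutUint16(buf[8:10], uint16(len(name)))
	copy(buf[10:], name)
	return storageRow{buf: buf}
}

// ID decodes the first field directly from the byte buffer.
func (r storageRow) ID() int64 {
	return int64(binary.LittleEndian.Uint64(r.buf[0:8]))
}

// Name decodes the length-prefixed second field.
func (r storageRow) Name() string {
	n := int(binary.LittleEndian.Uint16(r.buf[8:10]))
	return string(r.buf[10 : 10+n])
}

// Values converts the storage row into a non-storage representation:
// a plain slice of Go values.
func (r storageRow) Values() []interface{} {
	return []interface{}{r.ID(), r.Name()}
}

// describe depends only on the decoded slice, so it is unaffected by
// changes to how rows are laid out in bytes.
func describe(row []interface{}) string {
	return fmt.Sprintf("id=%v name=%v", row[0], row[1])
}

func main() {
	r := encodeRow(7, "dolt")
	fmt.Println(describe(r.Values()))
}
```

Code that calls `ID()` or `Name()` is tied to the byte layout, which is the kind of machinery that has to be revisited when the format changes.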
One difficulty was getting DoltHub's test suite running in the new format. As part of our tests, we have static Dolt databases checked into the repository. The first issue was that these Dolt databases were old and difficult to migrate to the new format. I had to check out old versions of Dolt and run migration scripts with custom modifications. Overall, I felt pretty dirty doing that kind of thing, but in the end it worked. 🤷
The second issue was that all of the commit hashes changed in the new format. Because the row data of each commit is represented differently, the commit hashes necessarily changed. Any test that used a commit hash had to be rewritten. For each test, a mapping between commits was created:
cmMap := mapCommitsAcrossFormats(
It was pretty painful to build these mappings. First, all commits were listed using
SELECT * from dolt_commits; then the commit messages were used to
manually identify matching commits. A possible improvement here is to automatically encode this mapping during the migration.
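The matching step could in principle be partly automated. The sketch below is not the helper used in the DoltHub tests; `commit` and `matchCommitsByMessage` are hypothetical names, the hashes are placeholders, and it assumes each commit is available as a hash plus a message. It pairs commits whose message is unique in both formats and reports the rest for manual review.

```go
package main

import "fmt"

// commit holds the two fields this sketch needs from a commit listing.
type commit struct {
	Hash    string
	Message string
}

// matchCommitsByMessage (hypothetical helper) maps old-format hashes to
// new-format hashes when a message occurs exactly once on each side.
// Old hashes that cannot be matched unambiguously are returned separately.
func matchCommitsByMessage(oldCommits, newCommits []commit) (map[string]string, []string) {
	oldCount := map[string]int{}
	for _, c := range oldCommits {
		oldCount[c.Message]++
	}
	newByMsg := map[string][]string{}
	for _, c := range newCommits {
		newByMsg[c.Message] = append(newByMsg[c.Message], c.Hash)
	}

	mapping := map[string]string{}
	var unresolved []string
	for _, c := range oldCommits {
		candidates := newByMsg[c.Message]
		if oldCount[c.Message] == 1 && len(candidates) == 1 {
			mapping[c.Hash] = candidates[0]
		} else {
			unresolved = append(unresolved, c.Hash)
		}
	}
	return mapping, unresolved
}

func main() {
	// Placeholder hashes, not real Dolt commit hashes.
	oldCommits := []commit{
		{"old-aaa", "create table"},
		{"old-bbb", "add rows"},
		{"old-ccc", "fix typo"},
		{"old-ddd", "fix typo"},
	}
	newCommits := []commit{
		{"new-111", "create table"},
		{"new-222", "add rows"},
		{"new-333", "fix typo"},
		{"new-444", "fix typo"},
	}
	mapping, unresolved := matchCommitsByMessage(oldCommits, newCommits)
	fmt.Println("matched:", mapping)
	fmt.Println("needs manual review:", unresolved)
}
```

Duplicate messages are deliberately left unresolved here, since guessing between them could silently pair the wrong commits.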
The most difficult task was reimplementing the scoreboard calculation for DoltHub's data bounties. The scoreboard compresses authorship information of the entire database. It has to track which rows were inserted by a bounty participant and who else modified individual cells of that row. All of this is done without storing additional logs or metadata. For each PR, a diff is calculated and that is used to attribute each cell to a participant. This is what a scoreboard typically looks like:
You can view the full bounty here
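As a rough illustration of diff-based attribution, the sketch below walks a sequence of per-PR diffs in merge order and credits each cell to the most recent participant who wrote it. This is not DoltHub's scoreboard implementation: the `prDiff` and `cellChange` types, the "last writer gets the cell" rule, and counting one point per cell are all assumptions made for the example.

```go
package main

import "fmt"

// cellChange identifies one cell touched by a PR (hypothetical shape).
type cellChange struct {
	RowKey string
	Column string
}

// prDiff is the set of cells a single participant's PR inserted or modified.
type prDiff struct {
	Author  string
	Changes []cellChange
}

// cellID is the map key for a single cell.
type cellID struct {
	RowKey string
	Column string
}

// attributeCells replays diffs in order; a later PR that modifies a cell
// takes over attribution for that cell only, not for the whole row.
func attributeCells(diffs []prDiff) map[cellID]string {
	owner := map[cellID]string{}
	for _, d := range diffs {
		for _, ch := range d.Changes {
			owner[cellID{ch.RowKey, ch.Column}] = d.Author
		}
	}
	return owner
}

// scoreboard counts attributed cells per participant.
func scoreboard(owner map[cellID]string) map[string]int {
	counts := map[string]int{}
	for _, author := range owner {
		counts[author]++
	}
	return counts
}

func main() {
	diffs := []prDiff{
		// alice inserts row r1 with two cells.
		{Author: "alice", Changes: []cellChange{{"r1", "name"}, {"r1", "price"}}},
		// bob later corrects one cell of alice's row.
		{Author: "bob", Changes: []cellChange{{"r1", "price"}}},
	}
	fmt.Println(scoreboard(attributeCells(diffs)))
}
```

In this example alice keeps credit for the cell she inserted and bob is credited for the cell he modified, which mirrors the row-insert versus cell-edit distinction described above.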
How to use the new format on DoltHub
If you don't have an existing database, create one here and run a SQL query or import a CSV. On the CLI, you can use
dolt init --new-format and then push the database to DoltHub.
If you have an existing database on DoltHub, follow these instructions to migrate it:
- Clone your database locally using
- Migrate the database using
- Create a new database on DoltHub to house your new format database.
- Add the remote to Dolt by following the instructions on the database page:
- Delete your old Dolt database on DoltHub and inform any colleagues of the new remote URL.
We're nearing the end of the release process for the new format. We are testing
it using the
bounty. The bounty database's size is ~85GB, which has helped uncover some issues
and made us more confident about a 1.0 release. Soon, we'll also add a migration
button to DoltHub to migrate your databases with one click.
The team at DoltHub has been eager to get the new format released. The performance wins have been super exciting to see and we can't wait for more users to switch over. Come talk to us on Discord if you have any questions!
If you'd like to learn more about Dolt's new storage format: