DoltHub is the central repository for Dolt's
version-controlled databases. We like to call it the GitHub of databases.
It lets you query, share, and collaborate on Dolt databases.
Last month, we brought Dolt's new storage format to DoltHub. It's
also the default format for new databases on the CLI. The new format is a
complete rewrite of the storage layer of Dolt. It makes your Dolt database
faster: the old format benchmarked 7.2x slower than MySQL, while the new format
benchmarks at just 2.5x slower.
Starting today, you can migrate your old format databases on DoltHub using a
Why should I migrate my database?
We will eventually deprecate the current storage format. Date TBD.
The new format is faster!
What's the worst that could happen?
How to migrate your Dolt databases on DoltHub
If you see the
badge on your database, congrats! Your database is already in the new format.
If you don't see the
badge on your database, or dolt version tells you that the format is
__LD_1__, you will need to migrate your database.
A couple of notes before you migrate your database:
The previous storage format is incompatible with the new storage format. If
you migrate your database on DoltHub, any clones will need to be migrated as
well. You can use the dolt migrate CLI command for that.
Any forks will be disconnected from the migrated database. You will be unable
to create a pull request across those forks until you make new ones.
Migrating your database on DoltHub will delete any un-merged pull requests.
To migrate your database, go to the database settings tab:
Then click on the Migrate Database header:
Finally, click on the migrate button:
How long will it take?
The migration is an expensive process. It effectively rewrites every row of all
commits in your database. It may take a while.
The duration is dependent on the number of commits and the total size of the
database. Migrating this 5.72 GB repo with 406
commits took approximately 100 minutes.
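To put that data point in perspective, here is the arithmetic as a small Python sketch. It only restates the one measurement above as per-commit and per-gigabyte rates. Your database may behave quite differently, so treat these as rough orders of magnitude rather than a prediction.

```python
# Figures taken from the single migration quoted above.
commits = 406
size_gb = 5.72
minutes = 100

# Average wall-clock time spent per commit, in seconds.
seconds_per_commit = minutes * 60 / commits

# Average amount of data processed per hour, in GB.
gb_per_hour = size_gb / (minutes / 60)

print(f"{seconds_per_commit:.1f} s per commit")  # 14.8 s per commit
print(f"{gb_per_hour:.2f} GB per hour")          # 3.43 GB per hour
```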
How does the migrate button work?
Clicking the button kicks off an asynchronous Kubernetes job that clones,
migrates, and re-uploads your database to DoltHub.
The job uses the dolt migrate CLI command to migrate the database. The migrate
command walks the commit graph from all heads, and migrates each commit found.
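As a rough illustration of that walk, here is a minimal Python sketch. It is not Dolt's implementation: the commit representation, the migrate_all function, and the use of SHA-1 over a string payload are all assumptions made for the example. The idea it shows is visiting every commit reachable from any head exactly once, with parents migrated before their children so that a rewritten commit can point at its rewritten parents.

```python
import hashlib


def migrate_all(commits, heads, fmt="new"):
    """Walk the commit graph from every head and migrate each commit once.

    commits: dict of commit id -> {"parents": [ids], "data": str} (hypothetical shape)
    heads:   iterable of commit ids, e.g. branch tips
    Returns a dict mapping each old commit id to its migrated commit id.
    """
    migrated = {}
    for head in heads:
        stack = [head]
        while stack:
            cid = stack[-1]
            if cid in migrated:
                stack.pop()
                continue
            # Parents must be migrated first, so push any that are still pending.
            pending = [p for p in commits[cid]["parents"] if p not in migrated]
            if pending:
                stack.extend(pending)
                continue
            # Stand-in for the real rewrite: the new id depends on the format,
            # the commit's data, and the already-migrated parent ids.
            new_parents = [migrated[p] for p in commits[cid]["parents"]]
            payload = "|".join([fmt, commits[cid]["data"], *new_parents])
            migrated[cid] = hashlib.sha1(payload.encode()).hexdigest()
            stack.pop()
    return migrated


# A small history with a merge commit and two heads.
commits = {
    "a1": {"parents": [], "data": "init"},
    "b2": {"parents": ["a1"], "data": "add table"},
    "c3": {"parents": ["a1"], "data": "feature work"},
    "d4": {"parents": ["b2", "c3"], "data": "merge"},
}
mapping = migrate_all(commits, heads=["d4", "c3"])
```

Because c3 is reachable from both heads, it is only migrated the first time it is encountered. The second head finds it already in the mapping and skips it.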
Once the migration is finished, a random key is generated and the database is
uploaded to S3 at that key.
To update the database on DoltHub, we essentially hot-swap the S3 storage keys
in our database. The primary key of the database's metadata object also happens
to be the S3 key, which makes changing it tricky due to foreign keys. We decided
to create a copy of the metadata object, and move any foreign key children over
to the new object. This was simple but painful to implement because there are 16
tables that need to be updated.
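The copy-and-repoint approach can be sketched with Python's built-in sqlite3 module. Everything named here is hypothetical: the db_metadata, pull_requests, and collaborators tables, the storage_key column, and the swap_storage_key helper are stand-ins for illustration, not DoltHub's actual schema. Deleting the old row at the end is also an assumption, since the description above only mentions copying the object and moving its children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE db_metadata (storage_key TEXT PRIMARY KEY, name TEXT);
CREATE TABLE pull_requests (
    id INTEGER PRIMARY KEY,
    storage_key TEXT NOT NULL REFERENCES db_metadata(storage_key)
);
CREATE TABLE collaborators (
    id INTEGER PRIMARY KEY,
    storage_key TEXT NOT NULL REFERENCES db_metadata(storage_key)
);
INSERT INTO db_metadata VALUES ('old-key', 'my-database');
INSERT INTO pull_requests VALUES (1, 'old-key'), (2, 'old-key');
INSERT INTO collaborators VALUES (1, 'old-key');
""")


def swap_storage_key(conn, old_key, new_key, child_tables):
    """Copy the metadata row under new_key, then repoint every child table."""
    with conn:  # one transaction: commits on success, rolls back on error
        # 1. Copy the parent row under the new primary key.
        conn.execute(
            "INSERT INTO db_metadata (storage_key, name) "
            "SELECT ?, name FROM db_metadata WHERE storage_key = ?",
            (new_key, old_key),
        )
        # 2. Move the foreign key children over to the copy.
        #    Table names come from a trusted, hard-coded list.
        for table in child_tables:
            conn.execute(
                f"UPDATE {table} SET storage_key = ? WHERE storage_key = ?",
                (new_key, old_key),
            )
        # 3. Assumption: drop the old row once nothing references it.
        conn.execute("DELETE FROM db_metadata WHERE storage_key = ?", (old_key,))


swap_storage_key(conn, "old-key", "new-key", ["pull_requests", "collaborators"])
```

Inserting the copy before touching the children keeps every foreign key valid at each step, and the list of child tables is where the pain comes from: missing one would leave rows pointing at the old key and make the final delete fail.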
In addition, the migration changes all of the commit hashes because the storage
format is different. Commit hashes are the output of a cryptographic hash function, so if a single bit changes, the commit hash will change. Some of our
DoltHub state, like pull requests, reference a commit hash so we need to update
them with the new commit hashes. During the job, we build a mapping of old
commit hashes to new commit hashes. We send that mapping to DoltHub at the end
of the migration.
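Applying such a mapping might look like the following Python sketch. The record shape, the from_commit and to_commit field names, and the remap_commit_refs helper are assumptions for illustration, and the hashes are placeholders rather than real Dolt commit hashes.

```python
def remap_commit_refs(records, hash_map, fields=("from_commit", "to_commit")):
    """Return copies of records with old commit hashes replaced by new ones.

    Also returns the set of referenced hashes that had no entry in hash_map,
    so the caller can decide how to handle them.
    """
    remapped = []
    missing = set()
    for rec in records:
        new_rec = dict(rec)  # leave the input untouched
        for field in fields:
            old = rec.get(field)
            if old is None:
                continue
            if old in hash_map:
                new_rec[field] = hash_map[old]
            else:
                missing.add(old)
        remapped.append(new_rec)
    return remapped, missing


# Mapping built during the migration job: old hash -> new hash (placeholders).
hash_map = {"old-aaa": "new-111", "old-bbb": "new-222"}

pull_requests = [
    {"id": 1, "from_commit": "old-aaa", "to_commit": "old-bbb"},
    {"id": 2, "from_commit": "old-bbb", "to_commit": "old-zzz"},
]

updated, missing = remap_commit_refs(pull_requests, hash_map)
print(updated[0])  # {'id': 1, 'from_commit': 'new-111', 'to_commit': 'new-222'}
print(missing)     # {'old-zzz'}
```

Reporting unmapped hashes separately, instead of failing or silently keeping them, is a design choice of this sketch. It makes it easy to spot references to commits that the migration never visited.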
Migrating all of our Dolt databases on DoltHub is one of the last steps in the
new storage format's release checklist. Once all of the databases are migrated,
we will start to deprecate the old format and officially announce Dolt 1.0.
If you would like to learn more about the new storage format, you can read these blogs: