Migrate your Dolt database to the new format on DoltHub

DoltHub is the central repository for Dolt's version-controlled databases. We like to call it the GitHub of databases. It lets you query, share, and collaborate on Dolt databases.

Last month, we brought Dolt's new storage format to DoltHub. It's also the default format for new databases on the CLI. The new format is a complete rewrite of Dolt's storage layer, and it makes your Dolt database faster: the old format benchmarked 7.2x slower than MySQL, while the new format benchmarks at just 2.5x slower.

Starting today, you can migrate your old format databases on DoltHub using a migrate button.

Why should I migrate my database?

A few reasons:

  1. We will eventually deprecate the current storage format. Date TBD.

  2. The new format is faster!

  3. What's the worst that could happen?

from xkcd.com/2261

How to migrate your Dolt databases on DoltHub

If you see the badge on your database, congrats! Your database is already in the new format.

If you don't see the badge on your database, or dolt version tells you that the format is __LD_1__, you will need to migrate your database.
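If you have many local clones to check, the test above can be scripted. Here is a minimal sketch in Python, assuming only what this post states: that dolt version reports the format, and that the old format is named __LD_1__. The helper names are made up for illustration, and the exact layout of the dolt version output is not assumed; the code only looks for the old-format name anywhere in the text.

```python
import subprocess

# Name of the old storage format, as quoted in this post.
OLD_FORMAT_MARKER = "__LD_1__"


def is_old_format(version_output: str) -> bool:
    """Return True if the given `dolt version` output mentions the old format."""
    return OLD_FORMAT_MARKER in version_output


def database_needs_migration(db_dir: str) -> bool:
    """Run `dolt version` inside db_dir and inspect what it prints.

    Hypothetical helper: requires the dolt binary on PATH and a Dolt
    database in db_dir. Raises CalledProcessError if the command fails.
    """
    result = subprocess.run(
        ["dolt", "version"],
        cwd=db_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return is_old_format(result.stdout)
```

Calling database_needs_migration on the directory of a clone returns True when that clone still reports the old format.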

A couple of notes before you migrate your database:

  1. The previous storage format is incompatible with the new storage format. If you migrate your database on DoltHub, any clones will need to be migrated as well. You can use the dolt migrate CLI command for that.

  2. Any forks will be disconnected from the migrated database. You will be unable to create a pull request across those forks until you make new ones.

  3. Migrating your database on DoltHub will delete any un-merged pull requests.

To migrate your database, go to the database settings tab.

Then click on the Migrate Database header.

Finally, click on the migrate button.

That's it.

How long will it take?

The migration is an expensive process. It effectively rewrites every row of all commits in your database. It may take a while.

The duration depends on the number of commits and the total size of the database. Migrating one 5.72 GB repo with 406 commits took approximately 100 minutes.
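For a very rough sense of scale, you can extrapolate from that single data point. The sketch below assumes the duration grows linearly with either database size or commit count. That is an assumption of this example, not something we have measured, so treat the result as a ballpark range rather than a prediction. The function name is made up for illustration.

```python
from typing import Tuple

# The one reference migration mentioned in this post.
REF_SIZE_GB = 5.72
REF_COMMITS = 406
REF_MINUTES = 100.0


def estimate_minutes(size_gb: float, commits: int) -> Tuple[float, float]:
    """Return a (low, high) ballpark range in minutes.

    Assumes linear scaling from the reference migration, once by size
    and once by commit count, and reports both as a range.
    """
    by_size = REF_MINUTES * size_gb / REF_SIZE_GB
    by_commits = REF_MINUTES * commits / REF_COMMITS
    return (min(by_size, by_commits), max(by_size, by_commits))
```

A database half the size with half the commits would come out at roughly 50 minutes under these assumptions. A small database with a very long history produces a wide range, which is a hint that the estimate is not reliable there.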

How does the migrate button work?

Clicking the button kicks off an asynchronous Kubernetes job that clones, migrates, and re-uploads your database to DoltHub.

The job uses the dolt migrate CLI command to migrate the database. The migrate command walks the commit graph from all heads, and migrates each commit found. Once the migration is finished, a random key is generated and the database is uploaded to S3 at that key.
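To make the graph walk concrete, here is a minimal sketch in Python. It is not the dolt migrate implementation. It assumes the commit graph is given as a dictionary from commit ID to parent IDs, that parents are migrated before their children so a rewritten commit can point at its rewritten parents, and that migrate_commit is a hypothetical callback that rewrites one commit and returns its new ID.

```python
from typing import Callable, Dict, Iterable, List


def migrate_all(
    heads: Iterable[str],
    parents: Dict[str, List[str]],
    migrate_commit: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Migrate every commit reachable from any head exactly once.

    Uses an iterative post-order walk so that a commit is only migrated
    after all of its parents. Returns a mapping of old ID to new ID.
    """
    new_ids: Dict[str, str] = {}
    for head in heads:
        # Each stack entry is (commit, parents_already_scheduled).
        stack = [(head, False)]
        while stack:
            commit, parents_done = stack.pop()
            if commit in new_ids:
                continue  # already migrated via another head or branch
            if parents_done:
                new_parents = [new_ids[p] for p in parents.get(commit, [])]
                new_ids[commit] = migrate_commit(commit, new_parents)
            else:
                # Revisit this commit after its parents have been handled.
                stack.append((commit, True))
                for p in parents.get(commit, []):
                    if p not in new_ids:
                        stack.append((p, False))
    return new_ids
```

Because already-migrated commits are skipped, history shared between branches is only rewritten once, no matter how many heads reach it.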

To update the database on DoltHub, we essentially hot-swap the S3 storage keys in our database. The primary key of the database's metadata object also happens to be the S3 key, which makes changing it tricky due to foreign keys. We decided to create a copy of the metadata object, and move any foreign key children over to the new object. This was simple but painful to implement because there are 16 tables that need to be updated.
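The copy-then-repoint approach looks roughly like the sketch below. It uses SQLite from Python's standard library as a stand-in, and every table and column name here (repo_metadata, storage_key, repo_key, pull_requests, issues) is hypothetical. DoltHub's real schema has 16 child tables and is not shown in this post.

```python
import sqlite3
from typing import Iterable


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory schema with one parent row and two child tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys in SQLite
    conn.executescript(
        """
        CREATE TABLE repo_metadata (storage_key TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE pull_requests (
            id INTEGER PRIMARY KEY,
            repo_key TEXT NOT NULL REFERENCES repo_metadata(storage_key)
        );
        CREATE TABLE issues (
            id INTEGER PRIMARY KEY,
            repo_key TEXT NOT NULL REFERENCES repo_metadata(storage_key)
        );
        INSERT INTO repo_metadata VALUES ('old-key', 'demo');
        INSERT INTO pull_requests (repo_key) VALUES ('old-key');
        INSERT INTO issues (repo_key) VALUES ('old-key');
        """
    )
    return conn


def swap_storage_key(
    conn: sqlite3.Connection, old_key: str, new_key: str, child_tables: Iterable[str]
) -> None:
    """Copy the metadata row under new_key, repoint children, drop the old row."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT INTO repo_metadata (storage_key, name) "
            "SELECT ?, name FROM repo_metadata WHERE storage_key = ?",
            (new_key, old_key),
        )
        for table in child_tables:
            # Table names come from a fixed, trusted list, never user input.
            conn.execute(
                f"UPDATE {table} SET repo_key = ? WHERE repo_key = ?",
                (new_key, old_key),
            )
        # Succeeds only if no child row still references the old key.
        conn.execute("DELETE FROM repo_metadata WHERE storage_key = ?", (old_key,))
```

Doing the whole swap in one transaction means readers see either the old key everywhere or the new key everywhere. Missing a child table makes the final delete fail and the transaction roll back, instead of leaving dangling references.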

In addition, the migration changes all of the commit hashes because the storage format is different. Commit hashes are the output of a cryptographic hash function, so if a single bit changes, the commit hash will change. Some of our DoltHub state, like pull requests, reference a commit hash so we need to update them with the new commit hashes. During the job, we build a mapping of old commit hashes to new commit hashes. We send that mapping to DoltHub at the end of the migration.
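Here is a small illustration of both points in Python. SHA-256 stands in for Dolt's actual hash function, and the pull request fields (from_commit, to_commit) are hypothetical names rather than DoltHub's real schema.

```python
import hashlib
from typing import Dict, List


def commit_hash(payload: bytes) -> str:
    """Hash serialized commit bytes. SHA-256 is a stand-in for illustration."""
    return hashlib.sha256(payload).hexdigest()


def remap_pull_requests(
    pull_requests: List[Dict[str, str]], hash_map: Dict[str, str]
) -> List[Dict[str, str]]:
    """Return copies of the pull requests with commit hashes translated.

    Raises KeyError if a pull request references a commit that is missing
    from the old-to-new mapping, so gaps are not silently ignored.
    """
    remapped = []
    for pr in pull_requests:
        updated = dict(pr)
        for field in ("from_commit", "to_commit"):
            updated[field] = hash_map[pr[field]]
        remapped.append(updated)
    return remapped


# The same logical commit serialized two ways differs in its bytes,
# so the two hashes have nothing in common.
old_hash = commit_hash(b"format=old;message=add table")
new_hash = commit_hash(b"format=new;message=add table")
```

Since nothing about the new hash can be derived from the old one, the mapping has to be recorded while the migration runs. It cannot be reconstructed afterwards from the hashes alone.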

Conclusion

Migrating all of our Dolt databases on DoltHub is one of the last steps in the new storage format's release checklist. Once all of the databases are migrated, we will start to deprecate the old format and officially announce Dolt 1.0.

If you would like to learn more about the new storage format, you can read these blogs:

Have even more questions or having trouble migrating your database? Come talk to us on Discord. We'd be happy to help!
