Schema Migrations in Dolt

REFERENCE
6 min read

Dolt is the world's first version controlled SQL database. You can build applications with Git-style branches and merges on top of Dolt using all the frameworks you know and love: Laravel, Ruby on Rails, Django, .NET, Knex.js, etc. These frameworks all assume your database has a single schema.

Dolt versions schema and data. Different branches can have different schema. These frameworks can be made branch aware for data but different schemas usually require different code.

Changing the schema of the database is often called a schema migration. How does Dolt handle schema migrations? Do you change the schema on every branch? What if you want different schemas on different branches? This blog outlines how most of our customers handle schema migrations in Dolt.

Schema Migrations

Schema Migrations in Traditional Databases

In practice, schema migrations work something like this. For version one of your database schema you may have something like:

create database db;

use db;

create table employees (
	int id primary key, 
	first_name varchar(100), 
	last_name varchar(100)
); 

You save this SQL as a file called schema1.sql and stick it in a version control system like Git under a directory called migrations. You run the SQL in the file on your production database after it has been thoroughly reviewed. Whenever you want to bootstrap a new database for testing, you run schema1.sql and now your database schema looks like production.

Now, let's say later on you want to add a start_date and end_date column to your employees table. You make a schema patch file with something like:

alter table employees add column start_date date;
alter table employees add column end_date date;

You save this in a file called schema2.sql and check it into Git in the same migrations directory as schema1.sql. You run schema2.sql on your production database after everything has been thoroughly reviewed and your code has been updated to handle the new schema. You tell people if they want their development database to match production, they have to run all the .sql files in your migrations directory in order.

You can see this starts to become a bit complicated and luckily there are a bunch of tools like Liquibase and Redgate that codify and organize this process for you.

Frustratingly for our search engine optimization (SEO), before we had a real version controlled database like Dolt, these tools were referred to as "database version control". Dolt is true database version control dammit! But I digress...

Schema Migrations in Dolt

The same process outlined above can work in a Dolt database with a single branch. But a Dolt database with multiple branches complicates matters. There are a few approaches that we'll outline individually.

Everything Everywhere All at Once

Most people's initial intuition is to apply migrations to every branch at the same time. Everything Everywhere All At Once. We even have an open feature request for a Dolt procedure to facilitate this approach. The basic idea here is that the user wants to keep the schema the same on every branch, likely because the code that connects to the database can't handle multiple schemas but can handle multiple branches.

You can certainly pull something like this off in a tool that handles your migrations. First, you could query the dolt_branches table to get every branch. Then, you could iterate through the list of branches applying migrations.

Depending on your use case, this may be the best approach for you. If the migration is bad, you could run a dolt_reset('--hard') on every branch to undo it. This approach does require the schema be the same on every branch which may make things like developing new features on a branch more complicated.

Pros

  • Most similar to what you would do in a traditional database.
  • Simplest approach if every branch has the same schema.

Cons

  • May require some customization to your migration tool.
  • Hard to do transactionally. What do you do if one migration fails? Branches may be in different states. Complicated error states.
  • Only works if you want all branches to have the same schema.
  • No merge commits to track schema changes. Migration changes on each branch appear as independent commits in the log.

Migrations per Branch

The second approach is to treat each branch as a separate database with its own set of migrations. This approach allows you to have different schemas on different branches. If you want every branch to have the same schema, the migrations SQL is the same. If you want branches to have different schema, the migrations SQL is different. The tool you use to apply migrations runs each migration as if it was a separate database.

This approach means that your migration tool must be aware when a branch is created or deleted. It's not ideal duplicating branch data in your database and version control. However, if the number of branches is manageable, this might be a reasonable approach.

Pros

  • Works with existing migration tools.
  • Allows you to have different schemas on different branches.
  • Each branch is a separate database is an easy model to understand.

Cons

  • Branch data is duplicated in the database and the migration tool.
  • No merge commits to track schema changes. Migration changes on each branch appear as independent commits in the log.

Schema Branch and Merge

Finally, the recommended approach. In this approach, you have a branch that has your main schema. This is the branch where you apply migrations using your standard migration tooling. This branch can be main and have the main copy of your data.

Or, preferably for most use cases, your schema branch can be a separate branch with no data. For simplicity we'll call it schema. No data on the schema branch ensures only schema changes are merged when applying a migration. No data will change. The schema branch should be made off of the init commit of the database as to avoid merge conflicts. We have an open feature request to "cherry pick" schema changes so schema branches would not require a full merge, and thus, could be created from anywhere in the tree.

Then, to get schema changes on branches, first you do a dolt_schema_diff() between the schema branch and the branch you are connected to to see if there are any "pending" schema changes. If there are not, proceed as planned. If there are schema changes, perform a merge of the schema branch using dolt_merge().

You can run this process when it works best for your application. Merges are fast in Dolt so you can even run the process inline in your database connection code which is also usually the best place to put your branch connection logic. You can run the process manually per branch, more like a traditional migration. You can run the process when your application starts. It's up to your use case.

Pros

  • Schema changes isolated to single branch.
  • You can use your standard migration tools on your schema branch.
  • Merge commits in the logs indicate schema changes.
  • Application controlled schema merges/migrations.
  • If a schema change breaks, rollback on the schema branch and use the same process to roll out to all branches.

Cons

  • Requires application or custom logic to merge migrations to branches.
  • Schema can drift between branches depending on how you implement merge logic.
  • You have an extra schema branch to maintain.

Advantages of Dolt's Schema Migration Approach

In traditional databases, schema migrations are dangerous. For instance, a migration that drops a column is irreversible. In Dolt, every change is recoverable, instantly, using dolt_reset().

Dolt's branch model adds complexity to your migration process but also adds flexibility to have multiple parallel evolving schemas and data in your database. You can track how these changes come together via merge commits. Dolt gives you the tools to manage complexity.

Schema Overriding

An additional, recently released tool in the Dolt schema migration toolbox is Schema Overriding. As the name suggests, schema overriding allows you to override the schema on a branch, having all queries return results in another branches schema. This allows you full control over a branch's schema at runtime.

The feature is particularly useful for historical queries using as of but can also be used as a "test schema migration". In the process of seeing whether a schema migration will work on this branch? Set @@dolt_override_schema to your migration branch, test, and do a full merge once you are satisfied with the schema change.

Conclusion

Because of Dolt branches, we get a lot of questions about how to do schema migrations. This blog gives you a few approaches with the recommended approach being schema branch and merge. Questions? Come by our Discord and we'd be happy to help you find an approach that works for you.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.