Dolt is the world’s first SQL database with Git-style version control. You can branch, merge, diff, and even rebase your relational data, in all the same ways Git allows you to manage your source code. Dolt’s novel data storage layer enables all these operations to be fast, while efficiently storing data versions. Dolt is compatible with MySQL tools and protocols, without actually including any MySQL source code. (If you’re looking for a Postgres flavored version, check out the DoltgreSQL project we’re working on.)
In this blog post, we’re digging into Dolt’s rebase support. We launched support for rebasing with Dolt earlier this year. It was a fun feature to build and is very useful for tidying up a branch’s commit history. It was also an exciting milestone for Dolt, as we started getting deep into Git’s advanced power tools and bringing those features to Dolt. Today, we’re excited to announce a few improvements to Dolt’s rebase support: controlling how empty commits are handled, and resolving data conflicts during an interactive rebase.
Rebasing
Rebasing is a powerful concept from Git that allows you to build a new section in a repository’s commit graph, based off of an existing section. There are lots of ways to use rebase, but one of the most common is to rebase a feature branch off of its upstream branch to include the most recent changes, while keeping the history clean and free of merge commits. Another common use of rebase is to alter the commit history by combining commits, dropping commits, changing commit messages, or reordering the sequence of commits.
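To make the replay idea concrete, here’s a minimal Python sketch of rebasing as “find where the branch diverged, then replay the branch-only commits on top of the upstream tip”. This is a toy model for illustration, not how Dolt implements rebase: the Commit type and the rebase helper are made-up names, and a history is just a Python list.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    """Toy commit: a message plus the rows it writes (hypothetical type)."""
    message: str
    changes: dict  # primary key -> row value written by this commit


def shared_prefix_length(branch, upstream):
    """Count the leading commits that both histories have in common."""
    count = 0
    for ours, theirs in zip(branch, upstream):
        if ours != theirs:
            break
        count += 1
    return count


def rebase(branch, upstream):
    """Return a linear history: all of upstream, then the branch-only commits."""
    diverged_at = shared_prefix_length(branch, upstream)
    return list(upstream) + list(branch[diverged_at:])


root = Commit("creating members table", {})
main = [root, Commit("New member: Adam Maitland",
                     {"Adam Maitland": "White River, Connecticut"})]
new_members = [root, Commit("New member: Delia Deetz",
                            {"Delia Deetz": "New York, New York"})]

# Print the rebased history, oldest commit first.
for commit in rebase(new_members, main):
    print(commit.message)
```

The result is a single straight line of commits with no merge commit. In a real rebase the replayed commits are new commits with new hashes; the sketch reuses the original objects to stay short.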
In the next two sections, we’ll take a look at two recent enhancements to Dolt’s rebase support. The first example shows how to work with empty commits when rebasing, and the second example shows how you can deal with data conflicts while rebasing.
Empty Commits
Both Git and Dolt support options for handling “empty” commits when rebasing or cherry-picking. Git defines two types of “empty” commits. In the first type, the commit starts off as empty: it doesn’t contain any changes and was created using the --allow-empty flag with git commit or dolt commit. The second type of empty commit is a commit that becomes empty when applied through a rebase or cherry-pick. This can happen when all the changes in a commit have already been included on the branch you are rebasing onto.
When rebasing, commits that start off as empty are kept by default. The reasoning is that if an empty commit is in your commit history, it was put there explicitly through the --allow-empty flag, so it should be kept when rebasing commit history. The second type of “empty” commit, a commit that becomes empty during the rebase, is handled differently. By default, those commits are dropped from the output when performing non-interactive rebasing. Dolt now supports the --empty flag, as defined by Git, to allow users to specify the handling of commits that become empty during rebasing.
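Here’s a small Python sketch of the rule described above. It is a toy model under stated assumptions, not Dolt’s implementation: a table is a dict keyed by primary key, a commit is a (message, changes) pair, and the replay function and its empty parameter are hypothetical names that only cover the keep and drop behaviors discussed in this post.

```python
def replay(base_rows, commits, empty="drop"):
    """Replay (message, changes) commits on top of base_rows.

    Returns the messages of the commits that survive, plus the final rows.
    """
    if empty not in ("drop", "keep"):
        raise ValueError("this sketch only models empty='drop' and empty='keep'")
    rows = dict(base_rows)
    kept = []
    for message, changes in commits:
        starts_empty = not changes            # like a commit made with --allow-empty
        updated = {**rows, **changes}
        becomes_empty = not starts_empty and updated == rows
        if becomes_empty and empty == "drop":
            continue                          # leave this commit out of the new history
        kept.append(message)
        rows = updated
    return kept, rows


main_rows = {"Adam Maitland": "White River, Connecticut"}
branch_commits = [
    ("New member: Delia Deetz", {"Delia Deetz": "New York, New York"}),
    ("New member: Adam Maitland", {"Adam Maitland": "White River, Connecticut"}),
    ("checkpoint", {}),  # hypothetical commit that started off empty
]

print(replay(main_rows, branch_commits, empty="drop")[0])
print(replay(main_rows, branch_commits, empty="keep")[0])
```

Note that the commit that started off empty survives in both modes; only the commit that became empty during the replay is affected by the setting. The final table contents are the same either way, so the option changes the history, not the data.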
Example
Let’s look at an example of using the new --empty option when rebasing with Dolt.
# create a new directory and initialize it as a Dolt database
mkdir club_members && cd club_members
dolt init
# Create a table to store a list of club members
dolt sql -q "create table members (name varchar(100) primary key, address varchar(100));"
dolt commit -Am "creating members table"
# Create a new branch from this point in the commit graph
dolt branch new-members
# Insert a new member on the main branch (past where the new-members branch was created)
dolt sql -q "insert into members values ('Adam Maitland', 'White River, Connecticut');"
dolt commit -am "New member: Adam Maitland"
# Checkout the new-members branch and add some new members
dolt checkout new-members
dolt sql -q "insert into members values ('Delia Deetz', 'New York, New York');"
dolt commit -am "New member: Delia Deetz"
dolt sql -q "insert into members values ('Adam Maitland', 'White River, Connecticut');"
dolt commit -am "New member: Adam Maitland"
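At this point, main and new-members have diverged: main has one commit after the branch point (New member: Adam Maitland), and new-members has two (New member: Delia Deetz, followed by New member: Adam Maitland).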
Now let’s rebase the new-members branch onto the tip of its upstream branch (main) so that we get a nice, tidy, linear history on our branch that includes the most recent changes from main. Astute readers will have noticed that both branches contain a commit that adds an identical record for Adam Maitland to the members table. Rebasing will start at the tip of the main branch and then replay each commit from the new-members branch. When it applies the commit that adds Adam Maitland, that commit won’t generate any changes in our database, since an identical record already exists. The --empty option allows us to specify if we want Dolt to keep this empty commit, or to drop it. If we don’t specify the --empty option, then commits that become empty will be dropped by default. Let’s see what happens when we rebase using --empty=keep.
Still on the new-members branch, start an interactive rebase onto the tip of main:
dolt rebase -i --empty=keep main
When your editor opens up with the rebase plan, go ahead and keep the default rebase plan and exit the editor to start the rebase. After you exit the editor, you should see a message that rebasing finished successfully.
Now, when we look at the commit log for the new-members branch, we see the most recent commits from main included, we have a linear history without any merge commits, and we can see that the second “New member: Adam Maitland” commit was kept in our commit history, even though it didn’t actually make any changes to our database. If we hadn’t specified --empty=keep when we started the rebase, that commit would not be included.
dolt log --graph --oneline
* commit fep5oe7n487688vecukabpirmtdtljgn (HEAD -> new-members) New member: Adam Maitland
* commit b6q7v61cc5eegr6583uqhu546bdi0r6b New member: Delia Deetz
* commit d5vnc2jr9eustri2790hmq979l461ct7 (main) New member: Adam Maitland
* commit h4h6npqajg7cpmpd3ldio3lmml7cnu12 creating members table
* commit d01t96q7m0ei640c99ce1l9crokqj3ts Initialize data repository
Just to double-check, let’s use dolt show to look at each of the Adam Maitland commits. In the first commit, we see it does indeed make a data change to the members table:
dolt show d5vnc2jr9eustri2790hmq979l461ct7
commit d5vnc2jr9eustri2790hmq979l461ct7 (main)
Author: Jason Fulghum <jason@dolthub.com>
Date: Thu Aug 29 13:21:29 -0700 2024
New member: Adam Maitland
diff --dolt a/members b/members
--- a/members
+++ b/members
+---+---------------+--------------------------+
|   | name          | address                  |
+---+---------------+--------------------------+
| + | Adam Maitland | White River, Connecticut |
+---+---------------+--------------------------+
And in the second “New member: Adam Maitland” commit, we see that it doesn’t contain any data changes:
dolt show fep5oe7n487688vecukabpirmtdtljgn
commit fep5oe7n487688vecukabpirmtdtljgn (HEAD -> new-members)
Author: root <root@localhost>
Date: Thu Aug 29 13:26:17 -0700 2024
New member: Adam Maitland
The --empty option is handy for controlling your commit history when a commit becomes empty during a rebase. By default, Dolt drops any commits that become empty during rebasing. If you want to keep them in your commit history, all you have to do is specify --empty=keep when you start the rebase.
Conflicts
Now that we’ve covered empty commit handling in rebase, let’s change gears and talk about resolving data conflicts while rebasing…
When replaying the commits from a branch on top of a new set of commits from the upstream branch, there’s a chance that the commits won’t apply cleanly. This is called a conflict. In Dolt, there are two types of conflicts: data conflicts and schema conflicts. In this post, we’ll be dealing only with data conflicts – schema conflicts are not supported yet in Dolt rebase. If schema conflict resolution during rebase is something you or your team needs, comment on the issue in our backlog to let us know.
Let’s consider the new-members branch from our previous example, which was originally created from a main branch. While we’ve been working on our data on the new-members branch, new changes have also been added to the main branch. If both of our branches have added a row with the same primary key, but different non-primary key values, then attempting to rebase new-members onto the latest commit from main will result in a data conflict and the rebase will be paused while we resolve the conflict. Since there are two competing changes with the same primary key, Dolt needs user input to tell it which one to accept.
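The rule in the paragraph above can be sketched in a few lines of Python. This is a simplified model that compares whole rows, not Dolt’s actual merge logic: each table is a dict keyed by primary key, find_data_conflicts is a hypothetical helper, and “ours” stands for the branch being rebased onto while “theirs” stands for the commit being replayed, matching the labels in the conflict output later in this example.

```python
def find_data_conflicts(base, ours, theirs):
    """Return the primary keys that both sides changed to different rows.

    Each argument maps primary key -> row; a missing key means "no such row".
    """
    conflicts = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        base_row = base.get(key)
        our_row = ours.get(key)
        their_row = theirs.get(key)
        both_changed = our_row != base_row and their_row != base_row
        if both_changed and our_row != their_row:
            conflicts.append(key)
    return conflicts


base = {}  # the members table was empty when the branch was created
ours = {"Adam Maitland": "White River, Connecticut"}
theirs = {
    "Delia Deetz": "New York, New York",
    "Adam Maitland": "The Neitherworld",
}

print(find_data_conflicts(base, ours, theirs))
```

Delia Deetz is only added on one side, so she merges cleanly. If both sides had added the identical Adam Maitland row, as in the empty commit example, the function would report no conflict for him either.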
Example
Let’s walk through a concrete example of resolving a data conflict. We’ll reuse the setup from the empty commit example above, but we’ll change our data slightly to create a data conflict.
# create a new directory and initialize it as a Dolt database
mkdir club_members && cd club_members
dolt init
# Create a table to store a list of club members
dolt sql -q "create table members (name varchar(100) primary key, address varchar(100));"
dolt commit -Am "creating members table"
# Create a new branch from this point in the commit graph
dolt branch new-members
# Insert a new member on the main branch (past where the new-members branch was created)
dolt sql -q "insert into members values ('Adam Maitland', 'White River, Connecticut');"
dolt commit -am "New member: Adam Maitland"
# Checkout the new-members branch and add some new members
dolt checkout new-members
dolt sql -q "insert into members values ('Delia Deetz', 'New York, New York');"
dolt commit -am "New member: Delia Deetz"
dolt sql -q "insert into members values ('Adam Maitland', 'The Neitherworld');"
dolt commit -am "New member: Adam Maitland 👻"
At this point, our commit graph has the same shape as in the previous example: main has one commit after the branch point, and new-members has two. Notice that in this setup, the main branch has added a row for Adam Maitland and the new-members branch has also added an Adam Maitland row, but the two changes conflict, since they write the same primary key with different values. The main branch uses Adam Maitland’s original address, and the new-members branch uses his more current, and spookier, address.
Let’s try running a rebase now and see what happens when Dolt encounters this data conflict.
dolt rebase -i main
Again, go ahead and accept the default rebase plan. If you’re curious about ways to customize a rebase plan, or how to use the other available rebase plan commands (e.g. drop, reword, squash, fixup), then go read our blog post from earlier this year on rebase support.
When you exit the editor, Dolt will automatically start executing the rebase plan. However, in this example, you won’t immediately see a message about rebasing finishing successfully. Instead, you’ll see a message like this:
data conflict detected while rebasing commit vh7354antpr4m2mnb73g43gvpf96ihtd (New member: Adam Maitland 👻).
Resolve the conflicts and remove them from the dolt_conflicts_<table> tables, then continue the rebase by calling dolt_rebase('--continue')
This error message tells us what commit caused the data conflict and tells us to use Dolt’s conflict resolution tools to resolve it, then continue the rebase. Let’s start by taking a look at the conflict:
dolt conflicts cat .
+-----+--------+---------------+--------------------------+------+---------+------+---------+
|     |        | name          | address                  | name | address | name | address |
+-----+--------+---------------+--------------------------+------+---------+------+---------+
|  +  | ours   | Adam Maitland | White River, Connecticut | NULL | NULL    | NULL | NULL    |
|  +  | theirs | Adam Maitland | The Neitherworld         | NULL | NULL    | NULL | NULL    |
+-----+--------+---------------+--------------------------+------+---------+------+---------+
From the command line, we can use the dolt conflicts cat . command to view all the conflicts. If we were using a SQL shell, we could see the same thing by querying the dolt_conflicts and dolt_conflicts_members system tables.
In this case, we want to keep the change from theirs, where Adam Maitland lives in The Neitherworld. We could manually update the members table and then remove the row in the dolt_conflicts_members table, but it’s easier to use the dolt conflicts resolve <tablename> command for a simple conflict like this:
dolt conflicts resolve --theirs members
Before we continue the rebase, we need to stage the members table to communicate that our changes are ready to be committed. Otherwise, trying to continue the rebase will result in an error message that tells us we need to stage our changes before rebasing can continue.
dolt add members
Now that we’ve resolved the data conflict and staged the changed tables, we’re ready to continue rebasing:
dolt rebase --continue
Successfully rebased and updated refs/heads/new-members
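The resolve, stage, continue sequence we just ran can be summarized as a tiny state machine. The Python below is only a sketch of that workflow as described in this post: PausedRebase and its methods are hypothetical names, and the error messages are illustrative rather than Dolt’s actual output.

```python
class PausedRebase:
    """Toy model of a rebase that stopped on a data conflict (hypothetical class)."""

    def __init__(self, rows, conflicts):
        self.rows = dict(rows)             # working table, currently holding "ours"
        self.conflicts = dict(conflicts)   # primary key -> (our_row, their_row)
        self.staged = False

    def resolve(self, side):
        """Settle every open conflict by taking one side, like --ours or --theirs."""
        if side not in ("ours", "theirs"):
            raise ValueError("side must be 'ours' or 'theirs'")
        pick = 0 if side == "ours" else 1
        for key, candidates in self.conflicts.items():
            self.rows[key] = candidates[pick]
        self.conflicts.clear()
        self.staged = False                # resolved rows still need to be staged

    def stage(self):
        self.staged = True

    def continue_rebase(self):
        if self.conflicts:
            raise RuntimeError("resolve conflicts before continuing")
        if not self.staged:
            raise RuntimeError("stage your changes before continuing")
        return self.rows


paused = PausedRebase(
    rows={
        "Adam Maitland": "White River, Connecticut",
        "Delia Deetz": "New York, New York",
    },
    conflicts={"Adam Maitland": ("White River, Connecticut", "The Neitherworld")},
)
paused.resolve("theirs")
paused.stage()
print(paused.continue_rebase())
```

Skipping either step makes continue_rebase raise, which mirrors why we had to both resolve the conflict and stage the members table before continuing.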
Let’s quickly verify that the members table and our commit history look like what we expect:
dolt sql -q "select * from members"
+---------------+--------------------+
| name          | address            |
+---------------+--------------------+
| Adam Maitland | The Neitherworld   |
| Delia Deetz   | New York, New York |
+---------------+--------------------+
dolt log --oneline
8m3ud6ah2bmcpakudgjl377lmoagtrr4 (HEAD -> new-members) New member: Adam Maitland 👻
kf58rjh5565a0rqfdnc6gd4vba9lb1sh New member: Delia Deetz
d5vnc2jr9eustri2790hmq979l461ct7 (main) New member: Adam Maitland
h4h6npqajg7cpmpd3ldio3lmml7cnu12 creating members table
d01t96q7m0ei640c99ce1l9crokqj3ts Initialize data repository
Sure enough, we can see that the Adam Maitland record has the address we chose when we resolved the conflict, and our commit history includes the commits from main as well as all the commits from the new-members branch.
Conclusion
Rebase is a powerful tool for rewriting a branch’s commit history, and is now one of the many Git features supported in Dolt. Rebasing can help you keep a tidy and linear commit history for your database, and it is particularly well suited for feature or development branches that aren’t being used by multiple people. Recent enhancements to Dolt’s rebase support allow you to control how empty commits are handled as well as resolve data conflicts during interactive rebases.
If you’re interested in databases, version control, and all the cool things you can do with a version-controlled database, I hope you’ll join us on Discord to talk with our engineering team and other Dolt users.