Three-Way Merges in Dolt: Modeling Schema Conflicts

FEATURE RELEASESQL
9 min read

Dolt is the world's first and only version-controlled SQL database. Its powerful branch and merge model is exposed through a familiar Git-style interface. In fact, Dolt's connection to Git goes beyond interface compatibility. At the storage layer, Dolt and Git both leverage Merkle-Trees that enable scalable diff algorithms. However, Dolt's implementation is specifically tailored to handle database tables and schemas. This unique storage engine is what sets it apart from other databases: it was built from the ground up to support the database version control use-case. Dolt's three-way merge model is designed to accommodate a wide range of workflows and scenarios, making it a versatile and powerful tool for Dolt users. Today's blog details the mechanics of three-way merge in Dolt and how we solved the problem of versioning schema and data with a performant approach.

Two Branches Diverged in a SQL Database

Let's take a look at a minimal example of three-way merge in Dolt. The examples in this blog rely heavily on Dolt's custom stored procedures, specifically dolt_merge() and dolt_checkout(). Generally, these procedures work the same way they do in Git. You can also lookup their behavior in detail in the documentation.

% dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
three_way_merge> CREATE TABLE people (id int PRIMARY KEY, last_name VARCHAR(120), first_name VARCHAR(120));
three_way_merge> INSERT INTO people VALUES (1,'Billerson','Bill');
three_way_merge> CALL dolt_commit('-Am', 'new table people');
+----------------------------------+
| hash                             |
+----------------------------------+
| 8r81j2c97aamgdpoa5n7d8mvhlqh84ue |
+----------------------------------+

three_way_merge> CALL dolt_branch('branch2');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> INSERT INTO people VALUES (3,'Sanderson','Sandy');
three_way_merge> CALL dolt_commit('-am','added Sandy on branch main');
+----------------------------------+
| hash                             |
+----------------------------------+
| uarph61mnp9pfdlpm6dv74c9an3njr6e |
+----------------------------------+

three_way_merge> CALL dolt_checkout('branch2');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> INSERT INTO people VALUES (2,'Manderson','Mandy');
three_way_merge> CALL dolt_commit('-am', 'added Mandy on branch2');
+----------------------------------+
| hash                             |
+----------------------------------+
| 52m6tolft3r1jbf88drl6oaf1ebelu6c |
+----------------------------------+

three_way_merge> SELECT * FROM people AS OF 'main';
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
| 1  | Billerson | Bill       |
| 3  | Sanderson | Sandy      |
+----+-----------+------------+

three_way_merge> SELECT * FROM people AS OF 'branch2';
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
| 1  | Billerson | Bill       |
| 2  | Manderson | Mandy      |
+----+-----------+------------+

In this setup, we have a database with one table and two branches. Both branches have made changes since the common ancestor. Now let's merge the branches and see what happens:

three_way_merge> CALL dolt_checkout('main');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> CALL dolt_merge('branch2');
+--------------+-----------+
| fast_forward | conflicts |
+--------------+-----------+
| 0            | 0         |
+--------------+-----------+

three_way_merge> SELECT * FROM people;
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
| 1  | Billerson | Bill       |
| 2  | Manderson | Mandy      |
| 3  | Sanderson | Sandy      |
+----+-----------+------------+

three_way_merge> SELECT * FROM dolt_log;
+----------------------------------+-----------+---------------+-------------------------+----------------------------------+
| commit_hash                      | committer | email         | date                    | message                          |
+----------------------------------+-----------+---------------+-------------------------+----------------------------------+
| 0vmp06tjfmb61v4hpctp73s2cad0o4ll | test      | test@test.com | 2023-04-28 22:29:34.217 | Merge branch 'branch2' into main |
| 52m6tolft3r1jbf88drl6oaf1ebelu6c | test      | test@test.com | 2023-04-28 22:29:09.517 | added Mandy on branch2           |
| uarph61mnp9pfdlpm6dv74c9an3njr6e | test      | test@test.com | 2023-04-28 22:28:09.424 | added Sandy on branch main       |
| 8r81j2c97aamgdpoa5n7d8mvhlqh84ue | test      | test@test.com | 2023-04-28 22:26:39.263 | new table people                 |
| dr2gkc6vs3ogcpl3089g7ne0rcvbqnba | test      | test@test.com | 2023-04-28 22:24:39.011 | Initialize data repository       |
+----------------------------------+-----------+---------------+-------------------------+----------------------------------+

Post-merge we can see that all the changes are reflected on the tip of branch main and the history of both branches are reflected in the commit log. And as we mentioned before, the Merkle-Tree storage engine allows this merge process to scale sub-linearly with the size of the table, regardless of how deep and complicated the commit history is. The cost of doing a three-way merge only depends on the size of the change sets being merged.

Merging Schema and Data

Now let's extend the model and see how Dolt tracks schema and data changes together. We'll return to our previous example and again make changes on two branches:

three_way_merge> CALL dolt_checkout('-b', 'branch3');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> ALTER TABLE people ADD COLUMN age integer default '0';
three_way_merge> CALL dolt_commit('-am', 'added column age on branch3');
+----------------------------------+
| hash                             |
+----------------------------------+
| n2hoel0gemo76emm7h905o9lhfmc31jh |
+----------------------------------+

three_way_merge> CALL dolt_checkout('main');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> ALTER TABLE people ADD COLUMN birthday datetime;
three_way_merge> UPDATE people SET birthday = DATE('1970-01-01') WHERE id = 1;

three_way_merge> CALL dolt_commit('-am', 'added column birthday on branch main');
+----------------------------------+
| hash                             |
+----------------------------------+
| 6jpepsoepcfuq82t24gdss6d36ufltdu |
+----------------------------------+

three_way_merge> SELECT * FROM people AS OF 'main';
+----+-----------+------------+---------------------+
| id | last_name | first_name | birthday            |
+----+-----------+------------+---------------------+
| 1  | Billerson | Bill       | 1970-01-01 00:00:00 |
| 2  | Manderson | Mandy      | NULL                |
| 3  | Sanderson | Sandy      | NULL                |
+----+-----------+------------+---------------------+

three_way_merge> SELECT * FROM people AS OF 'branch3';
+----+-----------+------------+-----+
| id | last_name | first_name | age |
+----+-----------+------------+-----+
| 1  | Billerson | Bill       | 0   |
| 2  | Manderson | Mandy      | 0   |
| 3  | Sanderson | Sandy      | 0   |
+----+-----------+------------+-----+q

In this example, each side of the merge has a new schema; they both differ from the "common ancestor" of the merge. Let's run the merge operation and see how this difference is resolved in Dolt:

three_way_merge> CALL dolt_merge('branch3');
+--------------+-----------+
| fast_forward | conflicts |
+--------------+-----------+
| 0            | 0         |
+--------------+-----------+

three_way_merge> SELECT * FROM people;
+----+-----------+------------+---------------------+------+
| id | last_name | first_name | birthday            | age  |
+----+-----------+------------+---------------------+------+
| 1  | Billerson | Bill       | 1970-01-01 00:00:00 | 0    |
| 2  | Manderson | Mandy      | NULL                | 0    |
| 3  | Sanderson | Sandy      | NULL                | 0    |
+----+-----------+------------+---------------------+------+

The new post-merge schema contains both of the new columns and all of the new data that was created on both sides. Dolt's three-way merge algorithm works by first reconciling schema differences between each side of the merge, and then merging the table data using this new schema.

Detecting Conflicts

So you may be asking yourself, "what happens if the changes overlap?" That's a great point! Let's look at how we detect and resolve conflicts. Dolt's merge algorithm is "cell-wise", meaning we can automatically merge any change sets that are disjoint at the cell level. However, if both branches change the same cell to a different value, we have no choice but to escalate this decision to the user and let them decide what the final value should be:

three_way_merge> call dolt_checkout('-b', 'branch4');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> UPDATE people SET age = 42 WHERE id = 3;
three_way_merge> CALL dolt_commit('-am', 'update Sandys age to 42');
+----------------------------------+
| hash                             |
+----------------------------------+
| gpj83imprl8ousm8vqrvqqb4fjr0qiot |
+----------------------------------+

three_way_merge> CALL dolt_checkout('main');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> UPDATE people SET age = 41 WHERE id = 3;
three_way_merge> CALL dolt_commit('-am', 'update Sandys age to 41');
+----------------------------------+
| hash                             |
+----------------------------------+
| 85qvm9st3oe5f4dmuifdv6ro20rpt83p |
+----------------------------------+

three_way_merge> SELECT to_id AS 'id', to_last_name AS 'last_name', to_first_name AS 'first_name', to_birthday AS 'birthday', to_age, from_age 
              -> FROM dolt_diff('main', 'branch4', 'people');
+----+-----------+------------+----------+--------+----------+
| id | last_name | first_name | birthday | to_age | from_age |
+----+-----------+------------+----------+--------+----------+
| 3  | Sanderson | Sandy      | NULL     | 42     | 41       |
+----+-----------+------------+----------+--------+----------+

three_way_merge> SET @@dolt_allow_commit_conflicts = 1;

three_way_merge> CALL dolt_merge('branch4');
+--------------+-----------+
| fast_forward | conflicts |
+--------------+-----------+
| 0            | 1         |
+--------------+-----------+

Notice that in this example we told the database we want it to materialize conflicts using SET @@dolt_allow_commit_conflicts = 1. By default, dolt_merge(...) will error if a conflict is detected and the merge operation will be rolled back. By setting this session variable, we're telling Dolt that we want to see the conflicts and resolve them manually. We can inspect the conflicted state of our main branch using the dolt_conflicts and dolt_conflicts_$tablename tables:

three_way_merge> select * from dolt_conflicts;
+--------+---------------+
| table  | num_conflicts |
+--------+---------------+
| people | 1             |
+--------+---------------+

three_way_merge> select * from dolt_conflicts_people;
+----------------------------------+---------+----------------+-----------------+---------------+----------+--------+---------------+----------------+--------------+---------+---------------+----------+-----------------+------------------+----------------+-----------+-----------------+------------------------+
| from_root_ish                    | base_id | base_last_name | base_first_name | base_birthday | base_age | our_id | our_last_name | our_first_name | our_birthday | our_age | our_diff_type | their_id | their_last_name | their_first_name | their_birthday | their_age | their_diff_type | dolt_conflict_id       |
+----------------------------------+---------+----------------+-----------------+---------------+----------+--------+---------------+----------------+--------------+---------+---------------+----------+-----------------+------------------+----------------+-----------+-----------------+------------------------+
| gpj83imprl8ousm8vqrvqqb4fjr0qiot | 3       | Sanderson      | Sandy           | NULL          | NULL     | 3      | Sanderson     | Sandy          | NULL         | 41      | modified      | 3        | Sanderson       | Sandy            | NULL           | 42        | modified        | VobIDfU6u4VXFsU7DUhBSA |
+----------------------------------+---------+----------------+-----------------+---------------+----------+--------+---------------+----------------+--------------+---------+---------------+----------+-----------------+------------------+----------------+-----------+-----------------+------------------------+

From these results we can see that our two branches are conflicted on the value of the age column. Thankfully, resolving this conflict is simple using the dolt_conflicts_resolve procedure. We can choose to accept "our" value (branch main) or "their" value (branch4) and automatically resolve the conflict:

three_way_merge> CALL dolt_conflicts_resolve('--theirs', 'people');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> SELECT * FROM people;
+----+-----------+------------+---------------------+-----+
| id | last_name | first_name | birthday            | age |
+----+-----------+------------+---------------------+-----+
| 1  | Billerson | Bill       | 1970-01-01 00:00:00 | 0   |
| 2  | Manderson | Mandy      | NULL                | 0   |
| 3  | Sanderson | Sandy      | NULL                | 42  |
+----+-----------+------------+---------------------+-----+

three_way_merge> CALL dolt_commit('-am', 'resolved merge conflict');
+----------------------------------+
| hash                             |
+----------------------------------+
| qd38cahnjp3v0alql18mprvkq1qvra4h |
+----------------------------------+

Resolving Un-Mergeable Schema

Finally, let's explore our most recent merge concept: schema conflicts. As with data merges, there are some types of schema merges which we cannot resolve algorithmically. When two branches both modify the same portion of a table's schema, we must escalate the resolution decision to the user and let them decide what the final schema should look like. You can find a full list of Dolt's schema conflict rules in the docs To make this concept more clear, let's look at an example:

three_way_merge> CALL dolt_checkout('-b', 'branch5');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> ALTER TABLE people MODIFY age bigint;
three_way_merge> CALL dolt_commit('-am', 'modified column age to bigint on branch5');
+----------------------------------+
| hash                             |
+----------------------------------+
| rs2deoh47bn7spthd65edjv1pj1clkrs |
+----------------------------------+

three_way_merge> CALL dolt_checkout('main');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> ALTER TABLE people MODIFY age float;
three_way_merge> CALL dolt_commit('-am', 'modified column age to float on branch main');
+----------------------------------+
| hash                             |
+----------------------------------+
| qqbi8nvfiklmrunbnnpme8d0uvm6i86j |
+----------------------------------+

three_way_merge> CALL dolt_merge('branch5');
+--------------+-----------+
| fast_forward | conflicts |
+--------------+-----------+
| 0            | 1         |
+--------------+-----------+

three_way_merge> SELECT * FROM dolt_status;
+------------+--------+-----------------+
| table_name | staged | status          |
+------------+--------+-----------------+
| people     | false  | schema conflict |
+------------+--------+-----------------+

Each branch tried to modify the type of column age and the merge operation can't resolve this schema conflict. Instead we defer this decision to the user and describe the schema conflict using the dolt_schema_conflicts table:

three_way_merge> SELECT table_name, description, base_schema, our_schema, their_schema FROM dolt_schema_conflicts;
+------------+----------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| table_name | description                                                          | base_schema                                                       | our_schema                                                        | their_schema                                                      |
+------------+----------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| people     | different column definitions for our column age and their column age | CREATE TABLE `people` (                                           | CREATE TABLE `people` (                                           | CREATE TABLE `people` (                                           |
|            |                                                                      |   `id` int NOT NULL,                                              |   `id` int NOT NULL,                                              |   `id` int NOT NULL,                                              |
|            |                                                                      |   `last_name` varchar(120),                                       |   `last_name` varchar(120),                                       |   `last_name` varchar(120),                                       |
|            |                                                                      |   `first_name` varchar(120),                                      |   `first_name` varchar(120),                                      |   `first_name` varchar(120),                                      |
|            |                                                                      |   `birthday` datetime(6),                                         |   `birthday` datetime(6),                                         |   `birthday` datetime(6),                                         |
|            |                                                                      |   `age` int DEFAULT '0',                                          |   `age` float,                                                    |   `age` bigint,                                                   |
|            |                                                                      |   PRIMARY KEY (`id`)                                              |   PRIMARY KEY (`id`)                                              |   PRIMARY KEY (`id`)                                              |
|            |                                                                      | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; |
+------------+----------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+

Fortunately for us, resolving schema conflicts is just as easy as resolving data conflicts:

three_way_merge> CALL dolt_conflicts_resolve('--ours', 'people');
+--------+
| status |
+--------+
| 0      |
+--------+

three_way_merge> SHOW CREATE TABLE people;
+--------+------------------------------------------------------------------+
| Table  | Create Table                                                     |
+--------+------------------------------------------------------------------+
| people | CREATE TABLE `people` (                                          |
|        |   `id` int NOT NULL,                                             |
|        |   `last_name` varchar(120),                                      |
|        |   `first_name` varchar(120),                                     |
|        |   `birthday` datetime(6),                                        |
|        |   `age` float,                                                   |
|        |   PRIMARY KEY (`id`)                                             |
|        | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+--------+------------------------------------------------------------------+

Modeling schema conflicts and providing automatic schema resolution is a new feature for Dolt. It's part of a recent push to make dolt_merge(...) more stable, scalable and powerful. While it is often possible to resolve schema merges with no user intervention, modeling schema conflicts in the dolt_schema_conflicts table allows users to have more control over their branch-and-merge workflows.

Wrapping Up

This approach to version control allows users to keep track of changes to database schema and data, making it easier to collaborate with others and maintain a consistent version history. Additionally, Dolt's merge model provides a robust and reliable means of resolving conflicts, ensuring that data remains accurate and up-to-date at all times.

Overall, the three-way merge model is a powerful tool for managing database changes and facilitating collaboration among team members. Whether you're working on a small project or a large-scale enterprise database, Dolt's merge model is a flexible and reliable solution that can help you keep your data organized and up-to-date. If you'd like to learn more about Dolt's version control model or have a use-case for a version-controlled database, hit us up on Discord, we're always around.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.