Merging and Resolving Conflicts Programmatically with SQL

REFERENCE
9 min read

In the first part of this two part blog we covered concurrent connection handling within dolt sql-server. We learned about session state, how to commit changes, and how to persist those changes across sessions. Today we'll talk about the explicit actions a client can take in order to get the changes made by others, and how to resolve conflicts that arise from two or more clients making changes independently. We'll be jumping right into it from where we left off so make sure you've read part one.

Setup

Our examples in part two of this blog will use a table with a more complicated schema than we used in the first part, in order to show different cases that arise when merging and handling merge conflicts. Our table will be created and seeded with the following SQL statements:

CREATE TABLE people (
    id int NOT NULL,
    first_name varchar(32) NOT NULL,
    last_name varchar(64) NOT NULL,
    age int,
    occupation varchar(32),
    PRIMARY KEY(id)
);

INSERT INTO people VALUES
    (0,'Homer','Simpson',39,'Nuclear Engineer'),
    (1,'Marge','Simpson',36,'Homemaker');

MERGE()

In a standard multi-user Git or Dolt workflow you would make changes, merge your changes with other people's changes, and then push the merged result to the remote. We'll be working directly against master for our example here, but everything covered is applicable to any kind of multi-user collaboration workflow.

The Merge() function takes a branch name as an argument. It takes the HEAD commit of that branch and merges it with the commit pointed to by @@mydb_head. It differs from dolt merge in that it automatically creates a commit and returns the hash of that commit. This hash could be saved to @@db_head or to a branch in the dolt_branches table. A second difference is that it will not warn you about conflicts that result from the merge, and you'll need to query the state of the dolt_conflicts table to determine if additional action needs to be taken. We'll look into that more later. Here is an example of how we would make our changes available to other users by merging with master and pushing our changes onto master.

# start our session by setting our branch to be on the tip of master
MySQL [test]> SET @@test_head=hashof('master');
Query OK, 1 row affected (0.001 sec)

# inspect our starting state and see our db has been setup and seeded with the data specified above
MySQL [test]> SELECT commit_hash, message FROM dolt_log ORDER BY date DESC;
+----------------------------------+----------------------------------+
| commit_hash                      | message                          |
+----------------------------------+----------------------------------+
| 4fcs2jrgrmetlov1mdohrb90o2gbp42u | Seed database with initial state |
| 3sph456kkdvhqbmsqdkt0bf1j9fla727 | Initialize data repository       |
+----------------------------------+----------------------------------+
2 rows in set (0.011 sec)

# make our edits
MySQL [test]> UPDATE people SET age=40 WHERE id = 0;
Query OK, 1 row affected (0.009 sec)
Rows matched: 1  Changed: 1  Warnings: 0

# commit our edits so that they are persisted on disc, and save the returned commit hash to @@test_head
MySQL [test]> SET @@test_head=COMMIT('-m','Update Homer`s age');
Query OK, 1 row affected (0.014 sec)
# use the named lock "branches" when reading or writing the branches table as a convention
MySQL [test]> SELECT GET_LOCK('branches', -1);
+--------------------------+
| get_lock("branches", -1) |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.001 sec)

# update @@test_head with the commit hash returned by the MERGE function
MySQL [test]> SET @@test_head=MERGE('master');
Query OK, 1 row affected (0.026 sec)

# update the master branch to point to the merge commit we saved to @@test_head
MySQL [test]> UPDATE dolt_branches SET hash = @@test_head where name = 'master';
Query OK, 0 rows affected (0.001 sec)
Rows matched: 1  Changed: 0  Warnings: 0

# release the lock
MySQL [test]> SELECT RELEASE_LOCK('branches');
+--------------------------+
| RELEASE_LOCK("branches") |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.001 sec)

# inspect the state of our commit graph. Notice two changes that were made externally.
MySQL [test]> SELECT commit_hash, message FROM dolt_log ORDER BY date DESC;
+----------------------------------+-----------------------------------------------------------------------------------------------------+
| commit_hash                      | message                                                                                             |
+----------------------------------+-----------------------------------------------------------------------------------------------------+
| o7a38p0hr8t7cmsaeaha0kfhtalg2vdm | SQL Generated commit merging l3pcctub2st06eogguhljpak3mad7ibg into 91i3u9il10mrmb7s2s00qq7q92ji1nnh |
| l3pcctub2st06eogguhljpak3mad7ibg | Update Homer`s age                                                                                  |
| 91i3u9il10mrmb7s2s00qq7q92ji1nnh | Add Moe to people                                                                                   |
| cf70lrg43vounp79b8t2kht2dndbk59m | Update Marge`s age                                                                                  |
| 4fcs2jrgrmetlov1mdohrb90o2gbp42u | Seed database with initial state                                                                    |
| 3sph456kkdvhqbmsqdkt0bf1j9fla727 | Initialize data repository                                                                          |
+----------------------------------+-----------------------------------------------------------------------------------------------------+
6 rows in set (0.001 sec)

# inspect dolt_branches and we can see master now points to the same commit that we see at the top of our log
MySQL [test]> select name, hash from dolt_branches;
+--------+----------------------------------+
| name   | hash                             |
+--------+----------------------------------+
| master | o7a38p0hr8t7cmsaeaha0kfhtalg2vdm |
+--------+----------------------------------+
1 row in set (0.001 sec)

Here we make a small edit, create a commit, and then merge that commit with master. In doing so you'll see two additional commits that were picked up "Update Marge's age" and "Add Moe to people". These were changes made by one or more clients that took place between the time we connected, and the time we merged with master. Additionally, we see a merge commit which was generated when we called the MERGE() function.

dolt_conflicts and dolt_conflicts_$tablename

In the process of getting your changes onto a shared branch, there is the possibility of merge conflicts. Databases with high write rates where users are making edits to the same rows frequently will run into this more often. Let's walk through a second merge.

# start our session by setting our branch to be on the tip of master
MySQL [test]> SET @@test_head=hashof('master');
Query OK, 1 row affected (0.001 sec)

# add Bart Simpson to the people table
MySQL [test]> INSERT INTO people VALUES (3,'Bart','Simpson',10,'Student');
Query OK, 1 row affected (0.004 sec)

# update Homer's age
MySQL [test]> Update people SET age=39 WHERE id = 0;
Query OK, 1 row affected (0.003 sec)
Rows matched: 1  Changed: 1  Warnings: 0

# update Marge's age
MySQL [test]> Update people SET age=36 WHERE id = 1;
Query OK, 1 row affected (0.003 sec)
Rows matched: 1  Changed: 1  Warnings: 0

# update Moe's age
MySQL [test]> UPDATE people SET age = 62 WHERE id = 2;
Query OK, 1 row affected (0.009 sec)
Rows matched: 1  Changed: 1  Warnings: 0

# commit changes to our session's HEAD
MySQL [test]> SET @@TEST_HEAD = COMMIT('-m', 'Update Homer, Marge, and Moe`s age, and add Bart');
Query OK, 1 row affected (0.019 sec)

Here we've made a few changes to our table that we want to get onto master. We will merge our changes with the latest from master and then inspect the dolt_conflicts table to determine if there are any conflicts that need to be resolved before we update master.

# get the lock "branches"
MySQL [test]> SELECT GET_LOCK('branches', -1);
+--------------------------+
| get_lock("branches", -1) |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.001 sec)

# update @@test_head with the commit hash returned by the MERGE function
MySQL [test]> SET @@test_head=MERGE('master');
Query OK, 1 row affected (0.026 sec)

# check for conflicts
MySQL [test]> SELECT * FROM dolt_conflicts;
+--------+---------------+
| table  | num_conflicts |
+--------+---------------+
| people |             3 |
+--------+---------------+
1 row in set (0.007 sec)

You can see the table "people" has 3 rows that are in conflict. Unlike the command line, MERGE() will allow you to create a commit that has unresolved conflicts on it. We could update master with a commit that has conflicts, but this likely will cause you problems down the road. Conflicts should be resolved before updating shared branches. Not resolving conflicts will result in our values being taken, and the conflicts remaining unresolved and will accumulate over time. At some point in the future you may need to resolve conflicts, but your conflicts table will be littered with old data which may make it difficult to understand what actually needs to be resolved. Conflicts should be resolved immediately while you have all the information needed to resolve them. Dolt provides a system table named dolt_conflicts_$TABLENAME for each user table in order to allow conflict inspection and resolution. Lets look at dolt_conflicts_people.

# inspect the columns of the dolt_conflicts_people table
MySQL [test]> DESCRIBE dolt_conflicts_people;
+------------------+-------------+------+------+---------+-------+
| Field            | Type        | Null | Key  | Default | Extra |
+------------------+-------------+------+------+---------+-------+``
| base_occupation  | varchar(32) | YES  |      |         |       |
| base_last_name   | varchar(64) | YES  |      |         |       |
| base_id          | int         | YES  |      |         |       |
| base_first_name  | varchar(32) | YES  |      |         |       |
| base_age         | int         | YES  |      |         |       |
| our_occupation   | varchar(32) | YES  |      |         |       |
| our_last_name    | varchar(64) | YES  |      |         |       |
| our_id           | int         | YES  |      |         |       |
| our_first_name   | varchar(32) | YES  |      |         |       |
| our_age          | int         | YES  |      |         |       |
| their_occupation | varchar(32) | YES  |      |         |       |
| their_last_name  | varchar(64) | YES  |      |         |       |
| their_id         | int         | YES  |      |         |       |
| their_first_name | varchar(32) | YES  |      |         |       |
| their_age        | int         | YES  |      |         |       |
+------------------+-------------+------+------+---------+-------+
15 rows in set (0.001 sec)

# inspect the data in dolt_conflicts_people
MySQL [test]> SELECT * FROM dolt_conflicts_people;
+-----------------+----------------+---------+-----------------+----------+----------------+---------------+--------+----------------+---------+------------------+-----------------+----------+------------------+-----------+
| base_occupation | base_last_name | base_id | base_first_name | base_age | our_occupation | our_last_name | our_id | our_first_name | our_age | their_occupation | their_last_name | their_id | their_first_name | their_age |
+-----------------+----------------+---------+-----------------+----------+----------------+---------------+--------+----------------+---------+------------------+-----------------+----------+------------------+-----------+
| Homemaker       | Simpson        |       1 | Marge           |       37 | Homemaker      | Simpson       |      1 | Marge          |      36 | NULL             | NULL            |     NULL | NULL             |      NULL |
| Bartender       | Szslak         |       2 | Moe             |     NULL | Bartender      | Szslak        |      2 | Moe            |      62 | Bartender        | Szslak          |        2 | Moe              |        60 |
| NULL            | NULL           |    NULL | NULL            |     NULL | Student        | Simpson       |      3 | Bart           |      10 | Student          | Simpson         |        3 | Lisa             |         8 |
+-----------------+----------------+---------+-----------------+----------+----------------+---------------+--------+----------------+---------+------------------+-----------------+----------+------------------+-----------+

# inspect what is currently in people
MySQL [test]> SELECT * FROM people;
+------+------------+-----------+------+------------------+
| id   | first_name | last_name | age  | occupation       |
+------+------------+-----------+------+------------------+
|    0 | Homer      | Simpson   |   39 | Nuclear Engineer |
|    1 | Marge      | Simpson   |   36 | Homemaker        |
|    2 | Moe        | Szslak    |   62 | Bartender        |
|    3 | Bart       | Simpson   |   10 | Student          |
+------+------------+-----------+------+------------------+
4 rows in set (0.001 sec)

A dolt_conflicts_$TABLENAME table will have columns base_$COLUMNNAME, our_$COLUMNNAME, their_$COLUMNNAME' for every column in the table. base_* columns will contain the value for this column in the most recent common ancestor, our_* columns will have the value of the column from our session, and their_* columns will have values from the branch being merged. In the case above we can see:

  1. For id 1, our_age differs from base_age and their_age. For this row all the their_* columns are NULL. This result occurred because we updated Marge's age, and they deleted Marge from the table.
  2. For id = 2 the age column differs as well. In this case Moe had no previous age. We updated his age to 62, they updated his age to 60.
  3. For id = 3 the row has NULL for all base_* columns meaning this is a new row. We added Bart Simpson, they added Lisa Simpson.

Querying the state of the table "people", we can see that for all the conflicting rows, we are showing the value of "our_*" for every value. By default our values are used, and if we are happy with that we can clear out the conlicts and continue.

-- delete every conflict from the table
DELETE FROM dolt_conflicts_people;

If instead we wanted to take their value we would need to run:

-- Replace existing rows with rows taken with their_* values as long as their_id is not null (rows deleted in theirs)
REPLACE INTO people (id,first_name,last_name,age) (
    SELECT their_id, their_first_name, their_last_name, their_age
    FROM dolt_conflicts_people
    WHERE their_id != NULL
);

-- Delete any rows that are deleted in theirs
DELETE FROM PEOPLE WHERE id IN (
    SELECT base_id
    FROM dolt_conflicts
    WHERE base_id IS NOT NULL their_id IS NULL
);

-- delete every conflict from the table
DELETE FROM dolt_conflicts_people;

It may be that you don't want either of these and instead you want to implement something programmatic based on your application and schema. You have the full power of SQL to resolve these conflicts, and you can take advantage of it in doing so. In our case above I will add their Lisa Simpson data to the people, resolve the conflicts, and update master.

# add the values of Lisa Simpson taking from the dolt_conflicts_people table with id = 4
MySQL [test]> INSERT INTO people values (4, 'Lisa', 'Simpson', 8, 'Student');
Query OK, 1 row affected (0.003 sec)

# check the state of our data
MySQL [test]> SELECT * FROM people;
+------+------------+-----------+------+------------------+
| id   | first_name | last_name | age  | occupation       |
+------+------------+-----------+------+------------------+
|    0 | Homer      | Simpson   |   39 | Nuclear Engineer |
|    1 | Marge      | Simpson   |   36 | Homemaker        |
|    2 | Moe        | Szslak    |   62 | Bartender        |
|    3 | Bart       | Simpson   |   10 | Student          |
|    4 | Lisa       | Simpson   |    8 | Student          |
+------+------------+-----------+------+------------------+
5 rows in set (0.001 sec)

# resolve all conflicts
MySQL [test]> DELETE FROM dolt_conflicts_people;
Query OK, 3 rows affected (0.011 sec)

# verify that there are no more table with conflicts
MySQL [test]> SELECT * from dolt_conflicts;
+--------+---------------+
| table  | num_conflicts |
+--------+---------------+
| people |             0 |
+--------+---------------+

# create a commit for the resolved table
MySQL [test]> SET @@test_head=COMMIT('-m','Resolve merge conflicts');
Query OK, 1 row affected (0.013 sec)

# update master to point to the newly created commit
MySQL [test]> UPDATE dolt_branches SET hash=@@test_head where name='master';
Query OK, 1 row affected (0.010 sec)
Rows matched: 1  Changed: 1  Warnings: 0

# release the lock
MySQL [test]> SELECT RELEASE_LOCK('branches');
+--------------------------+
| RELEASE_LOCK("branches") |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.001 sec)

Concluding

Dolt as an application database provides a different paradigm. The use-cases are numerous, and we are just starting to explore the possibilities here. We'd love to hear from you. Join our discord community and chat with us about how you are using Dolt. We'd love to hear from you about your use-case, and your experience trying dolt.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.