Transactions in a Database with Branches

FEATURE RELEASESQL
6 min read

Dolt is Git for data, a SQL database that you can fork, clone, branch, and merge. Today we're excited to announce the alpha release of SQL transactions!

This blog will cover how transactions work in Dolt and how this differs from more traditional RDBMS transaction implementations.

What are transactions?

Transactions are a very old concept in relational databases. They're a way to safely manage the shared state of a database across many different readers and writers, so that two people can edit the same database at the same time. They're used for a lot of things in database applications, but broadly speaking they have three primary goals:

  1. Atomic units of work. Transactions let you issue multiple INSERT, UPDATE, and DELETE statements, then guarantee that either all of them go into effect at once, or none of them do. This happens when you COMMIT or ROLLBACK the transaction, respectively.

  2. Isolation. During a long-running transaction, you don't want other clients to see your changes until after you COMMIT. And you don't want to see any changes they are making either.

  3. Locking. The database needs to keep track of who is editing which rows, so that they are applied in a consistent order. It should be impossible for someone's changes to be silently overwritten by another transaction.

Consider this table:

mysql> describe cities;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| name       | varchar(100) | NO   | PRI | NULL    |       |
| population | int          | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
2 rows in set (0.02 sec)

Let's say that I have two clients editing this table at the same time. This is what they see:

      Client #1                                Client #2

mysql> select * from cities;             mysql> select * from cities;
+---------+------------+                 +---------+------------+
| name    | population |                 | name    | population |
+---------+------------+                 +---------+------------+
| Houston |        100 |                 | Houston |        100 |
+---------+------------+                 +---------+------------+
1 row in set (0.01 sec)                  1 row in set (0.01 sec)

mysql> insert into cities values         mysql> insert into cities values
       ("Miami", 200);                          ("New York", 200);
Query OK, 1 row affected (0.01 sec)      Query OK, 1 row affected (0.01 sec)

mysql> insert into cities values         mysql> insert into cities values
       ("Boston", 300);                         ("Seattle", 300);
Query OK, 1 row affected (0.01 sec)      Query OK, 1 row affected (0.01 sec)

mysql> select * from cities;             mysql> select * from cities;
+---------+------------+                 +----------+------------+
| name    | population |                 | name     | population |
+---------+------------+                 +----------+------------+
| Boston  |        300 |                 | New York |        200 |
| Houston |        100 |                 | Houston  |        100 |
| Miami   |        200 |                 | Seattle  |        300 |
+---------+------------+                 +----------+------------+
3 rows in set (0.00 sec)                 3 rows in set (0.00 sec)

mysql> commit;                           mysql> commit;
Query OK, 0 rows affected (0.00 sec)     Query OK, 0 rows affected (0.00 sec)

mysql> select * from cities;             mysql> select * from cities;
+----------+------------+                +----------+------------+
| name     | population |                | name     | population |
+----------+------------+                +----------+------------+
| Boston   |        300 |                | New York |        200 |
| Houston  |        100 |                | Houston  |        100 |
| Miami    |        200 |                | Seattle  |        300 |
| New York |        200 |                +----------+------------+
| Seattle  |        300 |                3 rows in set (0.00 sec)
+----------+------------+

                                         mysql> start transaction;
                                         Query OK, 0 rows affected (0.00 sec)

                                         mysql> select * from cities;
                                         +----------+------------+
                                         | name     | population |
                                         +----------+------------+
                                         | Boston   |        300 |
                                         | Houston  |        100 |
                                         | Miami    |        200 |
                                         | New York |        200 |
                                         | Seattle  |        300 |
                                         +----------+------------+

At the end of the example, Client #2 actually commits first, which begins a new transaction before Client #1 commits their changes. So they don't see Client #1's changes until they start another new transaction.

Transactions in Dolt

In Dolt, transactions work a little differently. Dolt is the only SQL database you can branch and merge. And in Dolt, every transaction you commit performs a merge as well.

This is easiest to see with an example. We'll modify our table above to add one more column:

mysql> desc cities;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| name       | varchar(100) | NO   | PRI | NULL    |       |
| state      | varchar(100) | YES  |     | NULL    |       |
| population | int          | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+

When a transaction commits, it performs a merge using the current globally shared state. You can imagine this working exactly as in git branch and merge: every time you start a transaction you are checking out a new branch, and every time you commit you're merging it back to master.

dolt transaction graph

So far this isn't any different from a normal database. But let's say that two clients want to add a row with the same key to a table. What happens?

In MySQL and other RDBMS's, the second transaction blocks until the first commits, at which point you'll see an error in the second transaction:

      Client #1                                Client #2

mysql> select * from cities;             mysql> select * from cities;
+---------+------------+                 +---------+------------+
| name    | population |                 | name    | population |
+---------+------------+                 +---------+------------+
| Houston |        100 |                 | Houston |        100 |
+---------+------------+                 +---------+------------+
1 row in set (0.01 sec)                  1 row in set (0.01 sec)

mysql> insert into cities values         mysql> insert into cities values
       ("Miami", 200);                          ("Miami", 200);
Query OK, 1 row affected (0.01 sec)      -- Query blocks until Client #1 commits

mysql> commit;
Query OK, 0 rows affected (0.01 sec)     ERROR 1062 (23000): Duplicate entry 'Miami' for key 'cities.PRIMARY'

Dolt is different. If two clients add the same row to a table, there's no blocking and no error. It's same as if two branches of the database inserted the same rows and then wanted to merge back to master. Since the data is the same, there's no conflict.

dolt transaction graph, no conflict

On the other hand, what happens if two clients add the same key to a table, but the non-key columns aren't the same? That's an error, since there's no way to resolve the merge conflict without human help. The second transaction to COMMIT must roll back and retry their work.

dolt transaction graph, conflict

The same thing happens with two conflicting UPDATE statements to the same row, or when one client UPDATES a row that another client DELETES.

Finally, there's one more interesting wrinkle. If two clients both modify an existing row, but they don't modify the same columns, then there's no conflict, and both sets of modifications succeed. Dolt is the only SQL database in the world that can do this.

dolt transaction graph, two updates to same row

Compatibility

Depending on your application and its needs, the above flexibility might actually be a liability. Databases have been using transaction semantics similar to MySQL's for a long time, and many applications have built business logic around them.

Luckily, the transaction merge algorithm is flexible enough to accommodate most of these concerns, and we'll be adding additional merge modes to be more compatible with traditional RDBMS solutions for those applications that require it. We imagine setting a system variable or a custom property on the transaction itself. But for this initial alpha release, transactions will behave the same way existing command line customers expect from a dolt merge command.

If you need different semantics from your transactions, please let us know.

Other transactional statements

The above examples only deal with START TRANSACTION, COMMIT, and ROLLBACK, which are the more commonly seen transactional statements. But Dolt supports the full range of MySQL transaction statements, which means SAVEPOINT and ROLLBACK TO SAVEPOINT, if you need those too.

@@autocommit works as well, and is on by default. It works exactly the same as it does in MySQL.

For now, every transaction uses the REPEATABLE_READ isolation level, which is the default in MySQL and means that transactions don't see any changes made by other transactions. MySQL has several other isolation levels that you can read about here. These other isolation levels will be coming later, as customers need them.

Using transactions in Dolt today

For this alpha release, you'll need to take a couple extra steps to get it working. First, set an environment variable:

export DOLT_ENABLE_TRANSACTIONS=true

Then start your server with a config file that enables multiple concurrent connections by setting the max_connections field higher than 1, as described in the docs.

A database with branches

Transactions are really just table stakes for a fully functional database. Dolt does some interesting things in terms of merging units of work without conflicts or locks, but what's really exciting about Dolt as a transactional database is the prospect of running a database serving multiple branches simultaneously, from the same host.

dolt server branches

This is still a work in progress not ready to release, but it's what we're building towards in the near future. When you start a Dolt SQL server, clients can connect to a particular HEAD (a branch or other ref) in addition to the default. For example, you might use a connection string that looks like this:

"server=127.0.0.1;uid=root;pwd=12345;database=mydb:branches/feature"
"server=127.0.0.1;uid=root;pwd=12345;database=mydb:workspaces/fromWeb"

Similarly, you'll be able to change the branch you're connected to in the same sesssion by setting a session variable:

SET @@mydb_head_ref = "branches/feature";
SET @@mydb_head_ref = "workspaces/fromWeb";

Or by issuing a special SQL statement:

SELECT DOLT_CHECKOUT("feature");

And of course, you'll be able to merge one branch into another on a live server:

SELECT DOLT_MERGE("feature");

What's really exciting about this idea is that it extends the concept of transactions and makes it possible to persist them beyond a single session, without interfering with any other client's state. Simply make a branch for a user, connect them to it, and let them make as many changes as they want, on whatever schedule they want. Then when they're happy with the changes, merge them back into the production branch.

We're still very early in development for this concurrency model, and we'll have a lot more to say and show off in the coming weeks and months. Stay tuned!

Try it out today

If you haven't tried Dolt yet, download it and get started!

This is an alpha release under heavy development, and it would be great to hear from early adopters about what's missing for their own application's needs. Come join us on Discord to say hi and let us know what you think!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.