Transactions in Dolt? Hold My Beer

October 23, 2023

12 min read

Here at DoltHub, we strive to ensure the Dolt database can be a drop in replacement for MySQL. We've written a lot about this, and this particular post continues this tradition. The tool of this round is called FusionAuth, which is an identity management tool that works great with MySQL.

You can run FusionAuth on Dolt 🎉! I'm not going to tell you all the steps because I'd just be repeating all of their setup instructions.

This enthusiastic proclamation of support did not come for free. Turns out there were a few gaps and bugs in JSON support which needed to be addressed, but more interestingly we needed to add support for the SKIP LOCKED phrase - with a NoOp. And that's what the rest of this post is about.

We believe in truthful advertising, so we are going to go deep in this post about how transactions and locks work - because it's different from MySQL.

The Family Genes

Git and MySQL had a baby, and it's Dolt. This is one of those catchy things we like to say, and it comes across in the user experience of the product. Looking deeper though, we merged two of the primary concepts of Databases and Version Control: Transactions and the Three Way Merge.

Transactions are a critical part of behaving with ACID properties. Specifically, transactions mark a starting state, followed by a delta which can be applied (Atomic, Consistent, Isolated), and then finalization (Durable). Transactions have traditionally been modeled as a serial set of changes recorded in a journal. This works well enough until you have multiple queries being performed at the same time.

Q1 and Q2 have no contention. They are free to change any data with each write.
Q3 and Q4 are attempting to operate on the database at the same time. In order to allow that to happen, the database requires that the data touched by each query is isolated from the other. Ultimately one of the queries finishes first, and if all goes well then each will succeed.
After all is done, the data residing in the database is the only version accessible.

SQL has multiple ways at the application level for code to ensure that contention doesn't happen, and we'll get into that below. Most important thing to take away is that the concurrency is handled by the server implementation directly, and is not facilitated by the data model itself.

Source control systems have a different core concept in their DNA - The Three Way Merge. There is a Base state, and two parties would like to alter that state, we'll name them Left and Right. A three way merge takes changes Left wants to make and compares them to the changes of Right, and determines if there are conflicts. If there are no conflicts, the merge completes, and a new state exists.

Git popularized the use of a Directed Acyclic Graph (DAG) for commits. The DAG is what makes branching and merging in Git so effective. Any number of commits can be made between two branches, and a merge can be attempted if the two commits in question have a common base. Git even has a command called merge base specifically for this purpose (Dolt has it too!).

A is the first commit
B and C branch from it. Strictly speaking, branch names are not required. If two commits refer to the same parent, they are both branches.
The author of C continued with D. This happens in parallel to main being updated to point to B.
E is the merge between B and D. The three way merge is between A (base), B (left) and D (right). The important piece to call out is that can change any part of that data, or source code, and there is no blocking on either author. It is only at the time that an attempt to merge in those changes that the three way merge is performed between base, left, and right that we can verify that conflicts are not present - and commit E is born.
Only after E is created and written to disk can the main branch progress forward to reflect the new changes from the merge.

Here is the punch line: Dolt performs a Three Way Merge for every Transaction. The DNA of MySQL and Git really did come together to create our beautiful chimera

Let's get Sloppy

Oh concurrency and the perils of non-atomic updates. To convey the importance of this topic, let's talk about Beer. Specifically, how much beer is in the fridge. You never want to be wrong about this. In this magical IoT fridge, there is a database which orders you more beer when needed. We are going to make this a single row table. Couldn't be more simple:

beer> SELECT * FROM cold_ones;
+--------+------------+
| fridge | beer_count |
+--------+------------+
| 1      | 6          |
+--------+------------+

To remove a single beer from the fridge, you could do the following:

beer> SET @current_beers = (SELECT beer_count FROM cold_ones WHERE fridge = 1);
beer> UPDATE cold_ones SET beer_count = @current_beers - 1 WHERE fridge = 1;
beer> SELECT * FROM cold_ones;
+--------+------------+
| fridge | beer_count |
+--------+------------+
| 1      | 5          |
+--------+------------+
1 row in set (0.00 sec)

When you add beers to the fridge, you always want to get the number back up to 6, regardless of how many are in the fridge now. In order to get the number of beers up to 6, we need to know how many beers to add. So you could do something like this:

beer> SET @current_beers = (SELECT beer_count FROM cold_ones WHERE fridge = 1);
beer> UPDATE cold_ones SET beer_count = 6 WHERE fridge = 1;
beer> SELECT 6 - @current_beers AS beers_added;
+-------------+
| beers_added |
+-------------+
| 3           |
+-------------+
1 row in set (0.00 sec)

This is all fine and good right up until the point where you have housemates who also like to add and remove beers from the fridge. What happens when two people at attempting to add beers at the same time? If they both calculate @current_beers at the same moment, then one decrements the beer count while the other attempts to set it to 6, what happens?

Concurrency is hard, and the business logic of every application I've ever worked on requires having clarity about where atomic decisions are being made. Put in general terms, if we make a decision based on information that can change before we take action on the decision then we have a race. In concrete terms, if you have two independent people at the beer fridge and they both see the beer count is 3, they may both take action and add 3 beers to the fridge resulting in 9 total beer. But they both updated the beer count to 6 in the database. Oh No! Having 9 beers isn't the problem, free beer! The problem is that your fridge thinks it only has 6, so 3 are lost forever.

SELECT ... FOR UPDATE

How do we ensure that application logic is sound such that two processes can work with the data at the same time, and not run into race conditions?

SQL is 50 Years Old and there have been several ways to solve concurrency in it. One of the tried and true mechanisms in SQL is the dreaded SELECT ... FOR UPDATE. Here is how the above code would look in a SELECT FOR UPDATE:

mysql> SET autocommit = 0;
mysql> START TRANSACTION;
mysql> SELECT * FROM cold_ones where fridge = 1 FOR UPDATE;
+--------+------------+
| fridge | beer_count |
+--------+------------+
|      1 |          6 |
+--------+------------+
1 row in set (0.00 sec)

mysql> SET @current_beers = (SELECT beer_count FROM cold_ones WHERE fridge = 1);
/* More queries, expensive computations, time passes */
mysql> UPDATE cold_ones set beer_count = @current_beers - 1 where fridge = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> COMMIT;

A comment on disabling autocommit. Autocommit can be useful in many scenarios, but when it come to concurrency between your read and and a business critical write, it's a non-starter. The only situation where autocommit can be used for application purposes is when you have all business logic encoded in a single query. This example could definitely be written as one query, but many applications require dozens of queries to ensure the correct final result. For the rest of this post I'll assume autocommit = 0.

In the queries above, SELECT ... FOR UPDATE is the key protection against race conditions - and it's a blunt tool. The effect it has is that other queries running at the same time, specifically other queries which attempt to UPDATE that row will be blocked until this transaction is committed. It is nearing Halloween, and surely there is no better way to scare an old timer programmer like me than to ask to recount the number of disasters caused by blocking during a SELECT FOR UPDATE. Learn from your elders, and work towards a future where you never depend on stopping and waiting for rows in a database. Everyone will be better off, and there is already too much suffering in the world.

The introduction of SKIP LOCKED could be an answer to our prayers, maybe? What SKIP LOCKED does when added after FOR UPDATE is rather than block it will return all the found rows which are not locked, thereby granting the locks to the second query. To use this feature effectively you need a particular application model which allows you to select a subset of something, so for example if we had 10 different beer fridges to manage, and are happy to lock any fridge we could:

mysql> SELECT * FROM cold_ones LIMIT 1 FOR UPDATE SKIP LOCKED;

then make the update on whatever we selected... I'm not gonna lie, that's awkward. Your application needs to be written specifically to account for this feature of SQL, all so you can avoid blocking when doing critical work.

The model of locking the entity you care about, working on it, then completing your work and releasing the entity is the solution a first grader can understand. It's the equivalent of checking out a book from the library - everyone else just needs to wait in line until you finish reading it. SELECT FOR UPDATE has stood the test of time because of this simplicity. There are other locking mechanisms, but the blunt tool of SELECT FOR UPDATE will probably be abused for a long time to come.

But what does Dolt do? SELECT FOR UPDATE is a... NoOp. And therefore SELECT FOR UPDATE SKIP LOCKED is also a NoOp. Specifically, you have one query which does a SELECT FOR UPDATE in one transaction, and another query which does the same in another transaction, neither will block. They will both attempt to write to that row, and one will fail.

Optimistic Locking

Every modern CPU has a set of compare and swap instructions, and these are the foundation for making correct decisions in the face of our world filled with race conditions. The idea is that you can update a value if, and only if, the current value is what you expect it to be. At the application level, we call this Optimistic Locking. You see this in the atomic interfaces of many programming languages. In Go, you'd see something like this:

    currentValue := atomic.LoadInt32(&racyValue)
    // long complicated decision making... take a lot of time.
    // ...during this time, another process may have changed the racyValue
    updated := atomic.CompareAndSwapInt32(&racyValue, currentValue, 42)
    if !updated {
        // try again, error, etc.
    }

This happens on many different levels. In databases you have the transaction log, usually called a journal, and in the implementations of all of these they are using compare and swap. This may be at the disk write level or all the way up into the use of the atomic package, and multiple places in between. Compare and Swap is a critical piece of any computer which has more than one process running - which is all of them.

Very large distributed applications can get a very long way with optimistic locking. DynamoDB Does this with conditional writes. I've written SQL applications which use commit triggers to enforce optimistic locking on all writes. Compare and Swap is the way to go for responsive applications that need to scale... and Dolt uses them exclusively.

Dolt brings Compare and Swap to the table as it's transactional sentry. We call this a drop in replacement for MySQL!? Yes, yes we do.

How Dolt works

Similar to other databases, users can connect to the server and both be working on the same data. In Dolt, that's a branch, but let's not get hung up on that and just assume there is one branch that everyone uses: main.

As stated before, Dolt takes from the DNA of MySQL and Git. On the Git side, all data in your database is referred to by a tree structure called a Prolly Tree. Using this tree, we store all of your data, and we can refer to a hash key as a unique immutable identifier for a point in time. In a user's session, when you START TRANSACTION we record the hash key of the session, and we call it the WORKING SET. This is going to be your Base in the three way merge which is coming. Remember that since autocommit is disabled, you are effectively in your own isolated space until you COMMIT. Think of the WORKING SET as your private workspace.

If you perform a SELECT FOR UPDATE in this private workspace, it will never block because Dolt doesn't actually differentiate SELECT from SELECT FOR UPDATE. When you finish your work and COMMIT, if you attempt to write to a row that someone else wrote to during that time, you will get a merge conflict and the update will fail. Let's reuse the image from above.

As before, A is our starting commit
Two sessions start working, and they both start with the same working set.
Session 1 COMMITs, which results in a three way merge between the BASE, A (left) and B (right). Since there is no difference between A and BASE, there are no conflicts and HEAD is moved to B.
Meanwhile, Session 2 is hard at work and they created the D change. When they COMMIT, Dolt performs a Three Way Merge between the BASE, HEAD, and D.
Prior to the commit finishing, the merge is completed and written to disk in the form of E.
The final step is to update the HEAD (which is performed with a Compare and Swap).

The chimera in the room is conflicts. What is a conflict, and what does it mean to say that Dolt believes there are no conflicts? Our documentation has all the details, and it's particularly more complicated for schema changes. Data changes are pretty straight forward:

Conflicts are detected on a cell-level. If two operations modify the same row, column pair to be different values, a conflict is detected. Primary key values are used to identify rows across versions for the purpose of diff and merge.

Getting back to our example of two people modifying the beer count for fridge 1. In MySQL, the SELECT FOR UPDATE call for Session 2 would just block until the Session 1 was done. In Dolt, both sessions would continue and attempt to commit, and since dolt detects the conflict it will reject the second write. Simple, see?

Consequences

As a drop in replacement for MySQL, we must confess that our transaction and commit model is different. There are applications that work in MySQL which will not behave well with our approach. Examples:

Two transactions which make the same change to the data will both succeed because there are no conflicts. In the beer fridge example, when both transactions set the number of beers to 6, they both succeed. This may be ok, but it may not be either. Depends on the expectations of your application.
Starving long running queries. With MySQL, the first query to request the lock wins, and it has it until the session is terminated. If you have many short transactions and a few long transactions, the SELECT FOR UPDATE approach will ensure that FIFO is followed. With Dolt, those short transactions will win, and the long transaction may be in retry hell until it gives up.
Altering different columns. If Session 1 modifies Column A while Session 2 modifies column B, according to Dolt's conflict rules there is no conflict, and both writes will succeed. This differs from MySQL - when you lock the row you are locking the whole row.
Select for update with no updates. Sometimes at the beginning of a query you don't know if you'll make an update or not, but you need to lock the row in order to determine what's required. It's possible to leverage this as a guard in MySQL to ensure that only one process is doing something, but in Dolt everything is going to succeed because there will never conflicts if no update occurs. Subtle difference.
The most important consequence of this design is that Dolt servers never block. This is a good thing, and we believe that applications that take advantage of it will be better for it.

Drop In Replacement?

I have over 20 years of experience in high scale applications, many of which have SQL databases at their core. I spent more than a year moving Amazon services off of Oracle, and I promise it's hard work to "Drop In" a new database. Getting back to FusionAuth, it works pretty well if I drop in Dolt instead of MySQL, but have I load tested it with high contention? No. My advice to anyone building software - test test test. Verify that important pieces work. Don't just wing it.

Strictly speaking, we don't support SELECT FOR UPDATE, and we say so in our documentation. So while the SQL statements may get executed, the lock semantics are different and will produce different results.

Branch More, Conflict Less

Applications which build on Dolt should keep this optimistic locking behavior in mind to ensure that you are producing a sound application. Some users create a branch for every transaction, so they can have full isolation that they can reason about in their commit graph. It's also possible to use columns as guards, such as a last_updated column which will always result in conflicts. Given the importance of ensuring you are making sound decisions in your application you need to ensure that races against your data aren't occurring. If you have questions about how Dolt works and how best to use it, come chat with us on discord!

Blog