Introducing Dolt CLI in SQL

December 9, 2020

5 min read

Dolt is Git for data, a SQL database with version control tooling. While Dolt is nearing full MySQL compatibility, its current command line interface (CLI) functionality hasn’t been accessible in SQL. That means that you can’t currently run operations like push or commit as SQL commands. We didn’t plan to implement this functionality for a while until some of our customers started running application layer SQL with Dolt operating as a backend database. This is a powerful use case since it allows you to fully utilize Dolt’s MySQL compatibility with a traceable commit graph. But, you shouldn’t have to access the machine that’s hosting your database to run Dolt commands. In fact, we don’t think you should ever need to leave SQL to use Dolt’s version control features.

So we asked ourselves how can we bridge the gap between Dolt in the CLI layer and Dolt in SQL?

How we first approached this#

When we first thought about this feature the primary challenge was the difference between a SQL session and a command line process. When a command is executed in the shell a single process is spawned that exits on completion. But a SQL server is significantly more complex. There can be multiple readers and writers accessing the same database. Getting Dolt operations to fit this mode is challenging. For example, imagine one writer checking out to a different branch and another trying to do an INSERT statement at the same time. Yikes!

Handling Reads#

We decided that handling the case of multiple readers was easier so we started there. Our solution was custom Dolt system tables. These tables contain information on the repo state ranging from all the branches in the database to its commit history.

For example, query dolt_branches to view information about all the branches in the database:

SELECT *
FROM dolt_branches

+--------+----------------------------------+------------------+------------------------+-----------------------------------+-------------------------------+
| name   | hash                             | latest_committer | latest_committer_email | latest_commit_date                | latest_commit_message         |
+--------+----------------------------------+------------------+------------------------+-----------------------------------+-------------------------------+
| 2011   | t2sbbg3h6uo93002frfj3hguf22f1uvh | bheni            | brian@dolthub.com     | 2020-01-22 20:47:31.213 +0000 UTC | import 2011 column mappings   |
| 2012   | 7gonpqhihgnv8tktgafsg2oovnf3hv7j | bheni            | brian@dolthub.com     | 2020-01-22 23:01:39.08 +0000 UTC  | import 2012 allnoagi data     |
| 2013   | m9seqiabaefo3b6ieg90rr4a14gf6226 | bheni            | brian@dolthub.com     | 2020-01-22 23:50:10.639 +0000 UTC | import 2013 zipcodeagi data   |
| 2014   | v932nm88f5g3pjmtnkq917r2q66jm0df | bheni            | brian@dolthub.com     | 2020-01-23 00:00:43.673 +0000 UTC | update 2014 column mappings   |
| 2015   | c7h0jc23hel6qbh8ro5ertiv15to9g9o | bheni            | brian@dolthub.com     | 2020-01-23 00:04:35.459 +0000 UTC | import 2015 allnoagi data     |
| 2016   | 0jntctp6u236le9qjlt9kf1q1if7mp1l | bheni            | brian@dolthub.com     | 2020-01-28 20:38:32.834 +0000 UTC | fix allnoagi zipcode for 2016 |
| 2017   | j883mmogbd7rg3cfltukugk0n65ud0fh | bheni            | brian@dolthub.com     | 2020-01-28 16:43:45.687 +0000 UTC | import 2017 allnoagi data     |
| master | j883mmogbd7rg3cfltukugk0n65ud0fh | bheni            | brian@dolthub.com     | 2020-01-28 16:43:45.687 +0000 UTC | import 2017 allnoagi data     |
+--------+----------------------------------+------------------+------------------------+-----------------------------------+-------------------------------+

Our other system tables are:

dolt_log - Stores the commit log
dolt_branches - Stores the names and commit info of the branches
dolt_docs - Stores information about the doc files like README.md
dolt_diff - Stores how rows change over time
dolt_history - Stores a rows value at every commit
dolt_schemas - Stores SQL Schema fragments

Handling Concurrent Writes#

In the Spring of 2020, to deal with concurrent writes, we introduced a new SQL mode called detached head. Think of it like the detached head state in Git, where the user is working on a particular commit history but not directly modifying any particular branch. This is the kind of operation that our Merkle Tree like storage layer is optimized for handling efficiently. Let’s see how this looks in practice.

First we define a session variable called <dbname>_head where dbname is the name of the database that provides an interface for reading and writing the HEAD commit for a session. With the COMMIT function we are able to write a new commit in this detached head state that contains whatever changes we made. We are then able to directly update the master branch with this updated commit if no one else has updated it:

-- Set the current database for the session
USE mydb;

-- Set the HEAD commit to the latest commit to the branch "master"
SET @@mydb_head = HASHOF("master");

-- Make modifications
UPDATE table
SET column = "new value"
WHERE pk = "key";

-- Modify master if nobody else has changed it
UPDATE dolt_branches
SET hash = COMMIT("modified something")
WHERE name == "master" and hash == @@mydb_head;

-- Set the HEAD commit to the latest commit to the branch "master" which we just wrote
SET @@mydb_head = HASHOF("master");

If someone has in fact updated master while we are working in our detached head state, one can use the MERGE() function:

-- Set the current database for the session
USE mydb;

-- Set the HEAD commit to the latest commit to the branch "feature-branch"
SET @@mydb_head = HASHOF("feature-branch");

-- Make modifications
UPDATE table
SET column = "new value"
WHERE pk = "key";

-- MERGE the feature-branch into master and get a commit
SET @@mydb_head = MERGE('feature-branch');

-- Set the HEAD commit to the latest commit to the branch "master" which we just wrote
INSERT INTO dolt_branches (name, hash)
VALUES("master", @@bug_head);

Moving towards full CLI functionality#

Unfortunately this created substantial differences in our two core interfaces for executing the same operation, burdening our users with parsing the documentation for subtle differences. You really shouldn’t have to learn about system tables or session variables to start using Dolt in SQL. More importantly, if all users want to do is just create a new commit in their editor it shouldn’t take multiple steps. So we took a step back and decided that “If you know the Dolt CLI then you know the Dolt SQL Statements.”

Here’s the first one that we’ve implemented so far DOLT_COMMIT(). Let see how to use it:

SELECT DOLT_COMMIT('-a', '-m', 'This is a commit');
SELECT DOLT_COMMIT('-m', 'This is a commit');
SELECT DOLT_COMMIT('-m', 'This is a commit', '--author', 'John Doe <johndoe@example.com>');

Here we are writing a commit that directly affects whatever branch we are working on right now. That means we are no longer working in a detached head state. Any write queries will modify the in memory state, and automatically update the working set on disk.

This is an important distinction from what we did above with COMMIT(). With COMMIT(), we passed on the complexity of managing commit hashes to the user to ensure that the user did work in a detached head state.

When to use DOLT_COMMIT vs COMMIT#

For now we imagine DOLT_COMMIT to be primarily used in the SQL command line or through a SQL editor like Tableplus. Users can make updates in SQL and immediately make a commit before doing anymore changes.

If Dolt is being used as a backend database, then COMMIT syntax should be used to avoid any concurrency problems. In the future we plan to make DOLT_COMMIT implicitly handle these type of concurrency issues.

Conclusion#

Our goal is to reduce friction between Dolt in the CLI layer and Dolt in SQL. With less friction we forsee Dolt’s UX significantly improving allowing for more use cases. With this in mind, we will soon be implementing the following functions:

DOLT_MERGE
DOLT_CHECKOUT
DOLT_STATUS
DOLT_ADD

If you have any opinions on what you would like to see next, come chat with us in our Discord Channel!

Blog

PRODUCTS

KEYWORDS