Attach of the (Database) Clones! 🤖

6 min read

Dolt DB is the world's first fully versioned SQL database. It combines the distributed versioning features of Git with the power and expressiveness of a relational database.

In the same way the Git CLI allows you to branch, merge, diff, push, pull, and clone the files in your repositories, the Dolt CLI allows you to perform the same operations with the data in your Dolt databases. In addition to the Git-inspired CLI for command line access, Dolt's versioning features are also available through a SQL interface via built-in Dolt stored procedures. These stored procedures allow you to access all the powerful versioning features of Dolt in the same way you or your application would run a query against your database.

We recently added support for cloning Dolt databases through the SQL interface and in today's blog post, we're going to explain what Dolt database clones are, and show you how to use the new dolt_clone() stored procedure to create an exact clone of an existing Dolt database.

Clones

What is a Dolt database clone?

Let's start off by explaining what a Dolt database clone is and when you should use one...

A Dolt database contains all the data you've stored in it, but it also contains a record of all the versions and changes to your data across all branches, too. When you clone a Dolt database, just like when you clone a Git repository, you get the full database, along with all the branches, and all versioned history of that database – it's an exact copy of the source database and all its history.

Once we've cloned a Dolt database, the original database we cloned from is registered as a remote with the default name origin. From this point forward, we can synchronize with that origin database by using the Dolt CLI to push and pull changes, or by using the dolt_push and dolt_pull stored procedures. (Note that pushing changes to a remote database requires authentication, which currently requires using the dolt login from the command line.)

If you wanted to copy a non-versioned database, you'd likely use a tool like mysqldump to export all the current data, then load it back in on another database. This is often a cumbersome process and only gives you a "hard fork" of the database's current data. Since there's no versioning information tracked, you don't have any record of the history of your data or any audit trail showing you where the data came from, when it changed, or who changed it. There's also no connection back to the original data, so synchronizing with any changes means a full copy of all the data again, even if only a few values changed. So, in addition to being an easier process, cloning a Dolt database gives you much more information than simply making a copy of a non-versioned database.

Attaching a Database Clone

Now that we know what database clones are, let's create one!

We're going to be using Hosted Dolt to spin up a new Dolt database deployment and create a clone of an existing database stored on DoltHub, but we could do the same thing by running dolt sql or running dolt sql-server and connecting to the sql-server through any MySQL compatible database client or tool. Hosted Dolt is a natural fit for this example, since it's an environment where we only have access to a SQL interface – we don't have an option of getting to a command line shell, so it's a perfect environment for showing off the dolt_clone() stored procedure.

It's easy to launch a new Hosted Dolt database – just create a deployment and give it a minute or two to start up, then you'll see the connection information for your new database.

When we log into a SQL connection to our new deployment, we see that only the two system-created databases exist (mysql and information_schema):

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
+--------------------+
2 rows in set (0.02 sec)

At this point, you could run create database to create an empty database, but we want to start from an existing database, so we're going to use the dolt_clone() stored procedure to clone a database and all of its branches and versioned history to our new deployment.

Just like the dolt clone CLI command, the dolt_clone() stored procedure makes it trivially easy to clone a database from DoltHub. I've got a simple database hosted on DoltHub, called jfulghum/DoltDiffMagic, that I'm going to use in this example.

mysql> call dolt_clone('jfulghum/DoltDiffMagic');
+--------+
| status |
+--------+
|      0 |
+--------+
1 row in set (0.82 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| DoltDiffMagic      |
| information_schema |
| mysql              |
+--------------------+
3 rows in set (0.02 sec)

Sure enough! After dolt_clone completes, we see the DoltDiffMagic database is available! 🎉 We can use the database to work with its data, then taking a quick look at the dolt_log system table shows us that in addition to the current data, we have the full, versioned history of our database, too.

mysql> use DoltDiffMagic;
Database changed

mysql> select * from dolt_log limit 10;
+----------------------------------+-----------+-------------------------+-------------------------+--------------------------+
| commit_hash                      | committer | email                   | date                    | message                  |
+----------------------------------+-----------+-------------------------+-------------------------+--------------------------+
| pt6eptfopqpfq6qf8nrisnf45qg6l9n5 | jfulghum  | jason.fulghum@gmail.com | 2022-05-09 23:45:09.538 | Adding store hours table |
| d4q5m9ks9f8b4eq1rjskicnsaiq343cn | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 16:02:54.613 | gum is in stock now!     |
| 4fevgeq4t6r55d9epvn28a3rj3csdsfv | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:56:19.978 | updating row 1           |
| 4koq08rg8iuuefk7ssckoo7m56gvr2rj | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:56:07.131 | inserting a new row      |
| taf8r3gh7rv3ei0u6f7jio326llgtb61 | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:55:50.706 | updating row 1           |
| 19pba1s8h7esi2rqrspvarfg61ietsfc | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:55:44.64  | updating row 1           |
| o70kqfvhfr7b1pntp1t8cnn69o9r353j | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:55:37.062 | updating row 1           |
| f7gjqm5foil1436kk53iv6e47pc3g4j4 | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:55:31.51  | updating row 1           |
| gkorppmpemtnh91jrhn46ptcqphm2iaj | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:55:12.365 | inserting a new row      |
| u8cvv5pm9elftq9f5c7juovcckq7nlqb | jfulghum  | jason.fulghum@gmail.com | 2022-03-31 14:54:47.729 | creating simpletable     |
+----------------------------------+-----------+-------------------------+-------------------------+--------------------------+
10 rows in set (0.03 sec)

This versioned history is where the magic of Dolt really starts to shine through. Dolt provides several ways to inspect your data on any branch, at any commit or point in time. For example, we can query historical data using as of easily with Dolt:

mysql> select * from inventory;
+---------+------------+-------+-------+
| item_id | name       | price | count |
+---------+------------+-------+-------+
|       3 | blender    |    15 |    99 |
|       2 | nintendo   |   100 |     9 |
|       5 | walkman    |    10 |    11 |
|       4 | camera     |    50 |     6 |
|       6 | puppy chow |    15 |     8 |
|       7 | gum        |     1 |    12 |
+---------+------------+-------+-------+
6 rows in set (0.02 sec)

mysql> select * from inventory as of TIMESTAMP('2022-03-31 14:55:31.51');
+---------+------------+-------+-------+
| item_id | name       | price | count |
+---------+------------+-------+-------+
|       2 | nintendo   |   100 |     9 |
|       4 | camera     |    50 |     6 |
|       3 | blender    |    15 |    99 |
|       6 | puppy chow |    15 |     8 |
|       5 | walkman    |    10 |    11 |
+---------+------------+-------+-------+
5 rows in set (0.02 sec)

We can access the full history of our data from this clone – it's identical and indistinguishable from the source database we cloned from.

Summary

The new dolt_clone stored procedure is a powerful tool for making an exact copy of an existing Dolt database, including all the history, branches, and version info. It's particularly useful in environments like Hosted Dolt, where you only have access to a SQL interface and don't have any command line access, and in situations where you want to automate database setup via SQL scripts instead of shell scripts.

Being able to easily clone an exact replica of a database, along with all the versioned history and branches, is a powerful tool! If there's a problem in production, you can easily clone the database to another location to debug and audit the data. If you need to perform ad-hoc analysis across multiple, separate databases, you can clone each database to the same SQL server, then easily join the data from those different databases together in whatever queries you need. Because cloning a Dolt database tracks a remote to the original database, it's easy and fast to use dolt_pull() to efficiently synchronize the clone with any new changes from the original.

May data versioning be with you!

If you're curious about Dolt and want to learn more, come by and talk to us on Discord!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.