How To Use Dolt SQL Remotes

September 22, 2021

3 min read

TLDR

We recently introduced a new Dolt feature: interacting with remotes from SQL server sessions.

Dolt is a modern relational database that supports Git style fetch, pull, and push semantics. Although originally developed for the command line, we now support remote commands in application databases via SQL functions. You can view remotes info with the dolt_remotes system table, and interact with your remotes via the SQL functions dolt_pull(), dolt_push(), and dolt_fetch(). This gives Dolt applications connected to the database full control of remote interactions. Imagine the possibilities!

What are Remotes?

Git defines several objects and actions for communicating state between servers.

The objects:

Remote: a common communication endpoint for multiple clients to share state (ex: file:///Users/max-hoffman/Documents/dolthub/scratch/test/remote)
Refspec: a namespace identifier that is cognizant of remotes (ex: refs/heads/master:refs/remotes/origin/master is a refspec that maps our local master branch to a corresponding remote named origin/master)
Commit: a unit of persisted work, which in Dolt is comprised of chunk files

The actions:

Push: update the remote state with local changes
Pull: update a local version from a shared remote state
Clone: initialize a new repository from a previously saved shared state

The Dolt CLI duplicates Git's behavior for remote commands. For example, dolt push origin master pushes the local master refspec to the remote named "origin", the same as git push origin master but with data.

Dolt can also be used as an application database server, where users connect to the database with a MySQL client. Before today's change, users would have to stop a SQL-server to dolt push or dolt pull data, and resume the server afterwards.

The most recent update lets users run statements like select dolt_push(‘origin’, ‘master’) in a SQL client. The entire lifecycle of a Dolt database can now be controlled through SQL connections.

Demo of Remote Functions

dolt_push, dolt_fetch, and dolt_pull work the same as CLI counterparts. But before running these functions we need a Dolt repository and remote folder:

$ mkdir -p test/{sql-repo,remote}
$ cd test/sql-repo
$ dolt init
Successfully initialized dolt data repository.
$ dolt remote add origin file://../remote

We can view the origin remote using the dolt_remotes table:

$ dolt sql -q "select * from dolt_remotes"
+--------+-----------------------------------------------------------------+--------------------------------------+--------+
| name   | url                                                             | fetch_specs                          | params |
+--------+-----------------------------------------------------------------+--------------------------------------+--------+
| origin | file:///Users/max-hoffman/Documents/dolthub/scratch/test/remote | [refs/heads/*:refs/remotes/origin/*] | map[]  |
+--------+-----------------------------------------------------------------+--------------------------------------+--------+

And we can push changes to our remote:

$ dolt sql -q "select dolt_push('origin', 'master')"
+-------------------------------+
| dolt_push('origin', 'master') |
+-------------------------------+
| 1                             |
+-------------------------------+

To pull into our first repo, we will have to commit a change from a separate repo:

$ cd ..
$ dolt clone file://./remote cli-repo
cloning file:///Users/max-hoffman/Documents/dolthub/scratch/test/remote
2 of 2 chunks complete. 0 chunks being downloaded currently.
$ cd cli-repo
$ dolt sql -q "create table test (a int primary key)"
$ dolt commit -am “new commit”
$ dolt push origin master

Returning to our first repo, we pull the latest commit and inspect our new table:

$ cd ../sql-repo
$ dolt sql -q "select dolt_pull('origin')"
+---------------------+
| dolt_pull('origin') |
+---------------------+
| 1                   |
+---------------------+
$ dolt sql -q "show tables"
+-------+
| Table |
+-------+
| test  |
+-------+

You have to use select dolt_pull() because dolt_pull() by itself isn't valid SQL syntax. With this SQL sleight on hand, you now control the entire lifecycle of you data within a running server!

Why Are SQL Remotes Interesting?

Remotes are a first step towards replication. Transmitting commits to remotes allows production servers to safely recover from disk failures. We are using the same primitives as push and pull to deliver this soon.

Gracefully transitioning from a SQL server to a local CLI client is a related but unique feature. Consider migrating a database between versions. With a typical database you would either take the instance offline during a migration, or run two full instances that alternate between production and development. With a Dolt database, you can migrate your database locally, push changes to a remote, and then have the server pull and merge the migration without downtime. Because Dolt is versioned, the production instance can instantly rollback to old data and schema versions without interruption.

Future work

We are currently working on auto-replicating commits, an extension of this blog’s work. A production Dolt server will soon be able to backup commits, share changes wth collaborators, and recover from disk failures.

Remotes and branches are read-only during an ongoing sql session. The repo_state.json holds this configuration in a JSON file, and files live outside of a SQL session’s purview. For a database to write remote and branch updates, we would need to change how we encapsulate shared database contents with private configuration files.

Lastly, Dolt does not parse as broad a range of refspecs as Git yet. Feel free to reach on on Discord if you notice a feature you'd like to see added.

Conclusion

You can now interact with Dolt remotes from SQL server clients, previously only available in the CLI. Users can select dolt_push(), select dolt_pull(), select dolt_fetch(), and select * from dolt_remotes. This behavior is a first step towards asynchronously replicating Dolt application databases, but also allows for interesting development workflows. Users can debug and update production servers from local filesystems using Git style primitives. Those updates are versioned, reproducible, and shareable, like everything else in Dolt.

If you would like to ask us questions about Dolt, data versioning, remotes, or anything in between reach out to us on our Discord server.

Blog