Dolt is a SQL database with Git-style versioning, and like Git, when you want to make your data
available to others you need a place to push it to so that they can clone or pull it. In both Git and Dolt this is called a remote. A remote
is simply a remote storage location with a user-defined name. To add a remote in Dolt you run
dolt remote add <NAME> <URL>
specifying a name which you will use when referring to this remote, and a URL which identifies both the protocol and the location
used to store the data. Details of Dolt remotes can be listed using dolt remote -v:
$ dolt remote add origin https://doltremoteapi.dolthub.com/Dolthub/menus
$ dolt remote -v
While DoltHub is a great option for many use cases, there will be times when you want to push
your data somewhere else. In this blog I'll be covering the different types of remotes that Dolt supports.
Filesystem Based Remotes
Filesystem-based remotes allow you to push and pull data from any location that can be accessed via the filesystem. This
may be a directory on your local disk, or any other storage location that can be mounted to the filesystem. To add a
filesystem-based remote you need a URL with the file:// protocol.
Linux / OSX Examples
dolt remote add origin file:///Users/brian/datasets/menus
dolt clone file:///Users/brian/datasets/menus
Windows Examples
dolt remote add origin file:///c:/Users/brian/datasets/menus
dolt clone file:///c:/Users/brian/datasets/menus
It's important to note that a directory-based remote is not the same as the working directory of a dolt clone: the directory
listed above as a remote file URL is not a Dolt repository created or cloned with the Dolt CLI. Similarly,
a Dolt repository directory's file URL cannot be used as a remote directly.
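The URL shapes in the examples above can be sketched as a small shell helper. This is only an illustration: file_remote_url is a hypothetical function name, not part of Dolt, and it assumes the path you pass is absolute and points at a directory reserved for remote storage.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dolt): turn an absolute directory path
# into a file:// URL of the form used in the examples above.
file_remote_url() {
    dir_path=$1
    case "$dir_path" in
        /*)
            # Unix-style absolute path: /Users/... becomes file:///Users/...
            printf 'file://%s\n' "$dir_path"
            ;;
        [A-Za-z]:*)
            # Windows drive path: normalize backslashes, then add file:///
            printf 'file:///%s\n' "$(printf '%s' "$dir_path" | tr '\\' '/')"
            ;;
        *)
            echo "expected an absolute path, got: $dir_path" >&2
            return 1
            ;;
    esac
}

file_remote_url /Users/brian/datasets/menus      # file:///Users/brian/datasets/menus
file_remote_url 'c:\Users\brian\datasets\menus'  # file:///c:/Users/brian/datasets/menus
```

The result can be passed straight to Dolt, for example dolt remote add origin "$(file_remote_url /Users/brian/datasets/menus)".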
Cloud Hosted Remotes
In the case where you have cloud hosted infrastructure, it can be very convenient to set up your remotes
to be stored within your cloud project. Dolt supports both Google's and Amazon's cloud platforms currently.
Google Cloud Platform remotes use Google Cloud Storage (GCS). You can create or use an existing GCS bucket to host one or more Dolt
remotes. To add a GCP remote, provide a URL with the gs:// protocol like so:
dolt remote add origin gs://BUCKET/path/for/remote
In order to initialize Dolt to use your GCP credentials you will need to install the gcloud command
line tool and run gcloud auth login. See Google's documentation for details.
AWS remotes use a combination of DynamoDB and S3. The DynamoDB table can be created with any name but must have a primary
key with the name "db".
This single DynamoDB table can be used for multiple unrelated remote repositories. Once you have a DynamoDB table and an
S3 bucket set up, you can add an AWS remote using a URL with the protocol aws://. To add a remote named "origin" to my
"menus" repository using an S3 bucket named dolt_remotes_s3_storage and a DynamoDB table named dolt_dynamo_table
you would run:
dolt remote add origin aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus
Another user can then use this same URL to clone the database:
dolt clone aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus
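To make the structure of the aws:// URL explicit, here is a minimal sketch that assembles it from its three parts: the DynamoDB table name, the S3 bucket name, and the database name. The aws_remote_url function is a hypothetical helper written for this post, not a Dolt command, and the script only prints the Dolt commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dolt): build an AWS remote URL of the form
# aws://[<dynamo table>:<s3 bucket>]/<database>
aws_remote_url() {
    table=$1
    bucket=$2
    database=$3
    printf 'aws://[%s:%s]/%s\n' "$table" "$bucket" "$database"
}

url=$(aws_remote_url dolt_dynamo_table dolt_remotes_s3_storage menus)

# Print the commands instead of running them, so the sketch works without
# Dolt or AWS credentials being available.
echo "dolt remote add origin '$url'"
echo "dolt clone '$url'"
```

Quoting the URL when you run the real commands is a reasonable precaution, since some shells treat square brackets as glob characters.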
In order to initialize your system to be able to connect to your AWS cloud resources, see Amazon's documentation
on configuring your credential file. Dolt also provides additional parameters you may need when adding an AWS remote:
aws-creds-profile selects a profile from your credential file. If it is not provided, the default profile is used.
aws-region specifies the region in which your DynamoDB table and S3 bucket are located. If it is not provided, the default region from the current profile is used.
dolt remote add --aws-creds-profile prod-profile --aws-region us-west-2 origin aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus
dolt clone --aws-creds-profile prod-profile --aws-region us-west-2 aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus
Finally, Dolt supports remotes which use the protocol https://. Remote servers must
implement the gRPC methods defined by the ChunkStoreService interface.
This is how DoltHub itself provides remote functionality. When you add a DoltHub remote via
dolt remote add origin owner/repository, or run dolt clone owner/repository, Dolt is just providing shorthand notation for the URL.
When you run dolt remote -v you can see that Dolt adds an https:// URL with the host doltremoteapi.dolthub.com, as can be seen here:
$ dolt remote add origin Dolthub/menus
$ dolt remote -v
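The shorthand expansion described above can be mimicked in a few lines of shell. This is a sketch of the behavior as described in this post, not Dolt's actual implementation, and expand_remote is a hypothetical name. It assumes that anything already containing a protocol is left alone and that an owner/repository pair gets the DoltHub host prepended.

```shell
#!/bin/sh
# Hypothetical helper (not Dolt's real code): expand the owner/repository
# shorthand into the full DoltHub remote URL.
expand_remote() {
    case "$1" in
        *://*)
            # Already a full URL with a protocol: leave it unchanged.
            printf '%s\n' "$1"
            ;;
        */*)
            # owner/repository shorthand: prepend the DoltHub remote API host.
            printf 'https://doltremoteapi.dolthub.com/%s\n' "$1"
            ;;
        *)
            echo "expected a URL or owner/repository, got: $1" >&2
            return 1
            ;;
    esac
}

expand_remote Dolthub/menus   # https://doltremoteapi.dolthub.com/Dolthub/menus
expand_remote file:///Users/brian/datasets/menus
```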
Dolt provides a sample remote server that we use for integration testing. It could be deployed to serve your remotes
as well, though you would want to extend the sample functionality to support things like auth. In our integration
tests we install and run the remote server locally:
remotesrv --http-port 1234 --dir ./remote_storage
This starts a server listening on port 50051 for our gRPC requests, and runs a file server on port 1234 which provides
upload and download functionality similar to S3 / GCS locally. We use the URL of this server when
adding a remote or cloning from this remote server.
Give it a try
Dolt remotes unlock a powerful set of decentralized data tooling. The easiest way to try it is to clone
a repository from DoltHub. You will be able to merge in changes from others working on the same remote,
and collaborate as you do with Git. However, if you don't want your data on DoltHub, there are many other
options that let you control where your data lives and who can access it. If there are other types of remotes you want
to see supported, come talk with us on Discord.