Dolt without DoltHub: Other Dolt Remotes

Dolt is a SQL database with Git-style versioning. As with Git, when you want to make your data available to others, you need a place to push it so that they can clone or pull it. In both Git and Dolt this is called a remote. A remote is simply a remote storage location with a user-defined name. To add a remote in Dolt, run dolt remote add <NAME> <URL>, specifying the name you will use to refer to the remote and a URL that identifies both the protocol and the location used to store the data. You can list the details of your Dolt remotes with dolt remote -v.

$ dolt remote add origin https://doltremoteapi.dolthub.com/Dolthub/menus

$ dolt remote -v
origin https://doltremoteapi.dolthub.com/Dolthub/menus

While DoltHub is a great option for many use cases, there will be times when you want to push your data somewhere else. In this post I'll cover the different types of remotes that Dolt supports.

Filesystem-Based Remotes

Filesystem-based remotes allow you to push data to and pull data from any location that can be accessed via the filesystem. This may be a directory on your local disk or any other storage location that can be mounted to the filesystem. To add a filesystem-based remote, you need a URL with the file:// protocol.

Linux / OSX Examples

  • Adding a remote
dolt remote add origin file:///Users/brian/datasets/menus  
  • Cloning
dolt clone file:///Users/brian/datasets/menus

Windows Examples

  • Adding a remote
dolt remote add origin file:///c:/Users/brian/datasets/menus
  • Cloning
dolt clone file:///c:/Users/brian/datasets/menus

Note that a directory-based remote is not the same as the working directory of a Dolt clone: the directory named in the file URLs above is not a Dolt repository created or cloned with the Dolt CLI. Likewise, the file URL of a Dolt repository directory cannot be used directly as a remote.
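To make the distinction concrete, here is a minimal sketch of setting up a separate, empty directory as a remote and pointing a repository at it. The directory /tmp/dolt-remotes/menus, the file_remote_url and run helpers, and the branch name main are my own assumptions for illustration; the helper only handles absolute POSIX paths, not the Windows drive-letter form shown above.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default; set EXECUTE=1 to run them.
run() {
  if [ "${EXECUTE:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical helper: turn an absolute POSIX path into a file:// URL.
file_remote_url() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;
    *)  echo "expected an absolute path: $1" >&2; return 1 ;;
  esac
}

# The remote lives in its own directory, separate from any Dolt repository.
REMOTE_DIR=/tmp/dolt-remotes/menus   # assumed location
mkdir -p "$REMOTE_DIR"
REMOTE_URL=$(file_remote_url "$REMOTE_DIR")

# Run these from inside an existing Dolt repository.
# The branch name "main" is an assumption; use your own branch.
run dolt remote add origin "$REMOTE_URL"
run dolt push origin main
```

By default the script only prints the dolt commands it would run, so you can check the URL before pushing anything.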

Cloud Hosted Remotes

If you have cloud-hosted infrastructure, it can be very convenient to store your remotes within your cloud project. Dolt currently supports both Google's and Amazon's cloud platforms.

GCP Remotes

Google Cloud Platform remotes use Google Cloud Storage (GCS). You can create a new GCS bucket or use an existing one to host one or more Dolt remotes. To add a GCP remote, provide a URL with the gs:// protocol like so:

dolt remote add origin gs://BUCKET/path/for/remote

For Dolt to use your GCP credentials, you will need to install the gcloud command line tool and run gcloud auth login. See Google's documentation for details.

AWS Remotes

AWS remotes use a combination of DynamoDB and S3. The DynamoDB table can be created with any name but must have a primary key with the name "db".

Create a Dynamo Table with a primary key of: db

A single DynamoDB table can be used for multiple unrelated remote repositories. Once you have a DynamoDB table and an S3 bucket set up, you can add an AWS remote using a URL with the aws:// protocol. For example, to add a remote named "origin" to a "menus" repository using an S3 bucket named dolt_remotes_s3_storage and a DynamoDB table named dolt_dynamo_table, you would run:

dolt remote add origin aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus

Another user can then clone the database with the same URL:

dolt clone aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus
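The setup above can be sketched end to end as follows. This is a rough outline rather than a definitive recipe: the aws_remote_url and run helpers are hypothetical, and I'm assuming the "db" key is a string (S) hash key and that on-demand billing is acceptable for the table.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default; set EXECUTE=1 to run them.
run() {
  if [ "${EXECUTE:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical helper: compose aws://[table:bucket]/database.
aws_remote_url() {
  printf 'aws://[%s:%s]/%s\n' "$1" "$2" "$3"
}

TABLE=dolt_dynamo_table
BUCKET=dolt_remotes_s3_storage
DB=menus

# Assumption: "db" is a string (S) attribute used as the hash key.
# PAY_PER_REQUEST is an arbitrary choice for this sketch.
run aws dynamodb create-table \
  --table-name "$TABLE" \
  --attribute-definitions AttributeName=db,AttributeType=S \
  --key-schema AttributeName=db,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

# The same URL is used for adding the remote and for cloning.
REMOTE_URL=$(aws_remote_url "$TABLE" "$BUCKET" "$DB")
run dolt remote add origin "$REMOTE_URL"
run dolt clone "$REMOTE_URL"
```

Building the URL in one place keeps the table and bucket names identical between the remote add and clone commands.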

To set up your system to connect to your AWS cloud resources, see Amazon's documentation on configuring your credentials file. Dolt also accepts additional parameters that you may need when adding an AWS remote, such as aws-creds-profile and aws-region. aws-creds-profile selects a profile from your credentials file; if it is not provided, the default profile is used. aws-region specifies the region in which your DynamoDB table and S3 bucket are located; if it is not provided, the default region from the current profile is used.

dolt remote add --aws-creds-profile prod-profile --aws-region us-west-2 origin aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus

or

dolt clone --aws-creds-profile prod-profile --aws-region us-west-2 aws://[dolt_dynamo_table:dolt_remotes_s3_storage]/menus

HTTP(s) Remotes

Finally, Dolt supports remotes that use the http:// and https:// protocols. Remote servers must implement the gRPC methods defined by the ChunkStoreService interface. This is how DoltHub itself provides remote functionality. When you add a DoltHub remote via dolt remote add origin owner/repository, or run dolt clone owner/repository, Dolt is just providing shorthand notation for the URL. When you run dolt remote -v, you can see that Dolt has added an https:// URL with the host doltremoteapi.dolthub.com:

$ dolt remote add origin Dolthub/menus

$ dolt remote -v
origin https://doltremoteapi.dolthub.com/Dolthub/menus
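The shorthand expansion shown above can be mimicked with a small shell function. This is only an illustration of the behavior described in this post, not Dolt's own code, and Dolt's real parsing may treat edge cases differently; expand_remote is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: expand owner/repository into a DoltHub remote URL.
# Anything that already carries a scheme (file://, gs://, aws://, http://,
# https://) is passed through unchanged.
expand_remote() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;
    */*)   printf 'https://doltremoteapi.dolthub.com/%s\n' "$1" ;;
    *)     echo "not a URL or owner/repository: $1" >&2; return 1 ;;
  esac
}

expand_remote Dolthub/menus
# prints: https://doltremoteapi.dolthub.com/Dolthub/menus

expand_remote file:///Users/brian/datasets/menus
# prints: file:///Users/brian/datasets/menus
```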

Dolt provides a sample remote server that we use for integration testing which could be deployed to serve your remotes as well, though you would want to extend the sample functionality to support things like auth. In our integration tests we install and run the remote server locally:

remotesrv --http-port 1234 --dir ./remote_storage

This starts a server listening on port 50051 for our gRPC requests, and runs a file server on port 1234 that provides upload and download functionality similar to S3 / GCS locally. We use the URL http://localhost:50051/test-org/test-repo when adding a remote or cloning from this remote server.
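Putting those pieces together, a local session might look roughly like the sketch below. The run helper and the branch name main are my own assumptions; the ports, storage directory, and test-org/test-repo path are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default; set EXECUTE=1 to run them.
run() {
  if [ "${EXECUTE:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

GRPC_PORT=50051   # gRPC port mentioned above
HTTP_PORT=1234    # file server port passed via --http-port
REMOTE_URL="http://localhost:${GRPC_PORT}/test-org/test-repo"

# Terminal 1: start the sample server. It stays in the foreground,
# so with EXECUTE=1 run this line in its own terminal.
run remotesrv --http-port "$HTTP_PORT" --dir ./remote_storage

# Terminal 2: from inside an existing Dolt repository, add the remote
# and push. The branch name "main" is an assumption.
run dolt remote add origin "$REMOTE_URL"
run dolt push origin main

# Elsewhere: clone from the same server.
run dolt clone "$REMOTE_URL"
```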

Give it a try

Dolt remotes unlock a powerful set of decentralized data tooling. The easiest way to try them is to clone a repository from DoltHub. You will be able to merge in changes from others working on the same remote and collaborate as you do with Git. However, if you don't want your data on DoltHub, there are many other options that let you control where your data lives and who can access it. If there are other types of remotes you want to see supported, come talk with us on Discord.
