Remote Authentication Explained
Dolt is the world's first SQL Database with Version Control features. It's like MySQL and Git had a baby.
Git lacks authentication because there is no server that is Git-specific. All of the things we use to host Git repositories, GitHub, GitLab, BitBucket, and so on, all intercept traffic in the middle of git push
and gate users at that stage.
Dolt deviated from Git in this regard. Our data-sharing strategy included the creation of Dolthub.com, which required users to create credentials to push and pull. Later, when we decided to support dolt sql-server
, we had to enable authentication to be a credible drop-in replacement for MySQL. The two authentication mechanisms differ and sometimes collide. That's what we are going to discuss today.
Authentication?
Always good to make sure we're on the same page with terms. When we need to perform an action, such as updating the data of your database, we want to ensure that the individual or process is allowed to do so. This is called Authorization. Authorization is impossible if you cannot determine who is attempting to make the change in question. The only Authorization questions you can answer if you don't know who you are talking to are: "World Writable?" and "World Readable?"
We generally never want anything to be World Writable. Even within the confines of a corporate intranet where security concerns seem low, you will want to know the name of the person who dropped all of your tables by mistake.
Authentication is the art of "Who" in this equation. The identity of the user or process is a necessary first step before you can determine if they can make the action in question.
There are many ways to authenticate users in general. Username/Password is the simplest—the user knows a secret. OAuth is an open standard for delegated authentication where a third party tells you who the user is. SAML is similar and possibly a more corporate solution. There are SSH Keys, Client Side Certificates, Tokens, and many other Authentication mechanisms.
DoltHub and DoltLab
DoltHub.com was the first way we enabled people to share Dolt databases. The landscape was pretty simple: users were required to have the Dolt CLI installed, and they pushed and pulled against Dolthub.com. Anyone familiar with GitHub will be comfortable in this mode.
DoltLab too! DoltLab is like taking Dolthub.com, wrapping it up with a bow, and deploying it to your on-premise hardware. While DoltLab didn't exist in the early days, everything I say here pertains to DoltLab today. It's the same code. Anywhere we say "Dolthub.com" here, you can swap in DoltLab if that is what you're working with.
The Dolt CLI has two commands that pertain to authentication against Dolthub.com: dolt login
and dolt creds
. The login command was recently discussed in another blog. Honestly, it's just sugar on top of the dolt creds new
command, but I digress. The point is that these two commands are all about Authenticating to DoltHub and DoltLab instances.
The result of dolt login
or dolt creds new
is to place a secret—a token—on your local machine, which serves as your identity. When you set it up, you will be directed to a webpage where you specify the Token ID you want to associate with your account. This page is familiar if you've run dolt login
before.
It's worth calling out that the webpage you connect to has authenticated you as well. On Dolthub.com, you can assert your identity using Google, GitHub, or an email/password. For DoltLab we can also authenticate with OIDC. The point is that the trust that Dolthub has that you are the owner of that token is possible because you were authenticated to the website at the time you added your key.
Once you have created your personal token, dolt clone
, dolt pull
, and dolt push
will use that token whenever it talks to a remote endpoint with the correct host. The token is stored in a location in your home directory, so it's not associated with your local database copies.
With all of that, the invocation of commands like dolt clone
will simply grab the secret from your local disk drive and authenticate with the remote using that token.
sql-server
As A Go Between
There is another use of this token which is important to highlight: dolt sql-server
will use your tokens as well. dolt sql-server
is how we start our MySQL compatible server. You can read about configuring and running your server in our documentation. sql-server
has the ability to clone and push just like on the CLI, but with the use of SQL stored procedures. When you run a sql-server
instance and connect to it, then attempt to clone with the following SQL:
CALL DOLT_CLONE('timsehn/media_wiki');
The sql-server
instance will recognize that timsehn/media_wiki
is a shorthand for Dolthub.com, and when it clones the database, it will use the token that it would have used if you had simply used the CLI dolt clone
command.
Danger! sql-server
will use your tokens for anyone who can connect to the process. If you run a sql-server
instance on any host that has credentials created by dolt login
, you are effectively giving your users your keys. Well, the ability to use the keys even if they can't see them directly.
There are legitimate situations where having a sql-server
as a go between makes sense. There are some Dolt users who run a single user sql-server
locally so that their custom application has all the capabilities of a SQL server at their disposal. Only after they are done with their local changes do they push to a common shared database for review.
sql-server
and Hosted Instances
The second type of Authentication is Username/Password against a running instance of dolt sql-server
. Everything said here goes for Hosted Dolt Instances, too. Hosted instances are simply instances of dolt sql-server
that we manage on your behalf. You are welcome to run a sql-server
locally, but we'll discuss here like it's always running on another host.
dolt sql-server
, similar to MySQL, has the ability to track users and their access permissions. As an example, if you connect to your server and run the following:
CREATE USER 'alice'@'%' IDENTIFIED BY 'ApplesAndBananas';
A new user will be created. alice
will be able to connect to the host and provide ApplesAndBananas
as their password. They can't really do anything because they have no permissions yet. If we just want to make Alice super powerful, we can grant all:
GRANT ALL PRIVILEGES ON mydb.* TO 'alice'@'%';
When using Dolt's SQL shell, Alice connects using a command like the following:
$ dolt --host "yourdb.hosted.doltdb.com" --user "alice" --password "ApplesAndBananas" sql
This will open a TLS connection to the host and connect Alice to the host. This is probably the simplest model of them all. Especially since there isn't really any ability to push and pull to another source. You can add remotes, but it's only possible to use remotes which require no Authentication (eg, World readable). In this model, you are using all the branching and merging features of Dolt, but they are contained entirely within the server instance. A perfectly reasonable approach - it's what we did before we could push and pull!
What if you need to push and pull though? To push and pull from the sql-server
instance to DoltHub/DoltLab requires you to start the instance on a host where you have credentials setup. We covered this above. To push and pull to another sql-server
instance, using the first as the go between requires a few steps.
First, consider you have two hosts running dolt sql-server
. The first is yourdb
and the second is theirdb
. As just described you connect to yourdb
with the userid and password, just like you would a regular MySQL server. The trick to getting yourdb
to talk to theirdb
is you need to start yourdb
with a special environment variable which specifies the password the process will use to connect to theirdb
$ export DOLT_REMOTE_PASSWORD=<pwd to theirdb>
The userid that will be used to connect to theirdb is the same userid as you connected to yourdb
. Currently there is no way to override that, but it's a very reasonable request. File an feature request if you'd like that. We go into more detail on how to set up this scenario here.
Danger! The same risk as above exists here. If you have multiple people connecting to yourdb
, and they have the permissions to call dolt_push()
, then when they push the password they will be using is the one set with the DOLT_REMOTE_PASSWORD
environment variable.
Similar to above, there are legitimate use cases to do this. Running a sql-server
instance locally that needs to push to another sql-server
is reasonable, but take care to ensure that yourdb
is yours alone - not a multi-user server.
Local sql-server
There is a final way to access a Dolt server. When you have access to the data files managed by Dolt, you effectively have super user access. If you attempt to connect to your sql-server
instance while in the directory of your data, you will automatically connect with a super user. This directly relates to how we expect people to use Dolt servers to talk to other servers. The concerns about yourdb
being yours alone is pretty easy when you are using an instance on your local machine. It fits with the single user approach because there is only one user with access to the token or password to talk to outside instances.
To pull it all together, to push/pull to DoltHub.com and DoltLab instances, use dolt login
to create a token. To push/pull to a dolt sql-server
or Hosted instance, use the DOLT_REMOTE_PASSWORD to specify your password. You will get best results if you use a local sql-server
where you are the sole user.
Conclusion
Dolt is a little different from other databases. Your data doesn't just sit in one machine—it can be pushed and pulled to other locations. Our Authentication story to do so is simple and has room for expansion. We like building things for real users with real problems, so come to our Discord server and tell us what you'd like to see!