DoltHub and DoltLab now support SQL uploads


Today we're excited to announce that DoltHub and DoltLab now support .sql file uploads. Prior to this change, users could only upload a .csv, .psv, .xlsx, or .json file using DoltHub and DoltLab's web UI.

Although these file formats are common for uploading data on the web, they only allow creating or modifying a single table on a database branch. This means that if a user wants to upload data for three different tables in her DoltHub or DoltLab database, she'd have to upload three separate files: table1.csv, table2.csv, and table3.csv.

However, with the .sql file format now supported, users are no longer limited by this one-file one-table restriction. Instead, they can make changes to all tables on a branch with just a single file!

Users also gain the ability to execute powerful SQL statements on the web against their DoltHub and DoltLab databases. Statements that were formerly reserved for the Dolt command line, like adding table triggers with CREATE TRIGGER or adding column CHECK constraints, can now be run on the web by including them in an uploaded .sql file.

Let's look at a couple examples to see this new feature in action.

Branch-wide Changes

When uploading a .sql file to a DoltHub or DoltLab database, each of the SQL statements in the file is executed against the selected database branch. Any changes made to the branch get committed and pushed to either a pull request branch, or to the main branch of the database.

Uploading a .sql file on DoltHub or DoltLab, then, is essentially analogous to piping a .sql file into a Dolt database on the command line with:

dolt sql < example.sql
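
Expressed purely in SQL, the server-side flow is roughly equivalent to executing the file's statements and then staging, committing, and pushing the result via Dolt's stored procedures. Here's a minimal sketch; the commit message is illustrative:

-- after executing the statements in example.sql:
CALL DOLT_ADD('.');                                  -- stage all changed tables
CALL DOLT_COMMIT('-m', 'Changes from example.sql');  -- create a Dolt commit
CALL DOLT_PUSH('origin', 'main');                    -- push to the selected branch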

To see this in action, let's create a new database on DoltHub, called sql-example.

Under the "Get Started" section of our new database, let's click "File Upload" so we can add tables and data to our new database using a .sql file.

Empty db upload

Since our new database does not yet have a branch, a main branch will be created and our changes will be committed and pushed to it.

Choose branch

Next, we'll select the file we want to upload.

Choose file

Let's choose example.sql, which contains the following:

CREATE TABLE t1 (
    pk INT NOT NULL,
    c1 VARCHAR(255),
    c2 VARCHAR(255),
    c3 VARCHAR(255),
    PRIMARY KEY (pk)
);
CREATE TABLE t2 (
    pk INT NOT NULL,
    c1 VARCHAR(255),
    c2 VARCHAR(255),
    c3 VARCHAR(255),
    PRIMARY KEY (pk)
);
CREATE TABLE t3 (
    pk INT NOT NULL,
    c1 VARCHAR(255),
    c2 VARCHAR(255),
    c3 VARCHAR(255),
    PRIMARY KEY (pk)
);
INSERT INTO t1 (pk, c1, c2, c3) VALUES (1, 'foo', 'bar', 'baz');
INSERT INTO t2 (pk, c1, c2, c3) VALUES (1, 'aaa', 'bbb', 'ccc');
INSERT INTO t3 (pk, c1, c2, c3) VALUES (1, 'homer', 'jay', 'simpson');

This file will create three tables in our database and insert a single row into each table.

Review file

We can see the diff of our .sql upload in the Review step and our changes look good. Finally, we can complete our upload and commit these changes.

Commit file

After a successful .sql upload, we see that our database is no longer empty. We made branch-wide changes to our database with a single .sql file instead of uploading three separate files in a different format.

Three tables
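
Beyond the web UI, we could confirm the same state from the SQL console with a couple of quick queries (a minimal sketch):

SHOW TABLES;       -- should list t1, t2, and t3
SELECT * FROM t3;  -- should return the single row (1, 'homer', 'jay', 'simpson')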

Let's try uploading a different .sql file called update-example.sql that contains:

DROP TABLE t2;
INSERT INTO t1 (pk, c1, c2, c3) VALUES (2, 'hot', 'cross', 'buns');
UPDATE t3 SET c1 = 'marge', c2 = 'kay' WHERE pk = 1;

This file should drop table t2, insert a new row into t1, and update two cells in t3.

Two table review

And just like before, the diff shows us that all three tables are changed correctly and a new pull request is created against main with these proposed changes.
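
These cell-level changes are also queryable in SQL through Dolt's diff system tables. Here's a hedged sketch against dolt_diff_t3, assuming the standard from_/to_ column naming:

-- show the most recent change to t3: c1 'homer' -> 'marge', c2 'jay' -> 'kay'
SELECT from_c1, to_c1, from_c2, to_c2, diff_type
FROM dolt_diff_t3
ORDER BY to_commit_date DESC
LIMIT 1;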

After merging the pull request, we can see our main branch is up to date.

Two tables

Now let's try adding a check constraint to table t3 and an insert trigger to table t1. We can upload a file called constraints_triggers.sql that contains the following SQL statements:

ALTER TABLE t3 ADD CONSTRAINT simpsons_only CHECK (c3 = 'simpson');
CREATE TRIGGER add_foo_c3 BEFORE INSERT ON t1 FOR EACH ROW SET NEW.c3 = CONCAT(NEW.c3, 'foo');

After uploading this file and merging the changes into our main branch, we expect table t3 to only allow inserts of "simpson" into column c3. We also expect inserts of any value into c3 on table t1 to get "foo" appended to it.

We can see the diff of our changes in the image below: it includes the creation of the dolt_schemas system table, which holds the trigger we created. The check constraint is added as well, but it does not produce a diff.

Create trigger diff
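
We could also confirm the trigger was stored by querying dolt_schemas directly (a sketch; the exact column set of dolt_schemas can vary by Dolt version):

SELECT type, name FROM dolt_schemas;  -- should include a 'trigger' row named add_foo_c3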

With this change merged, let's try our new insert trigger on table t1. Using the DoltHub or DoltLab SQL console, we can run INSERT INTO t1 (pk, c1, c2, c3) VALUES (3, 'alpha', 'beta', 'gamma');

Before trigger query

This statement will insert a new row and the add_foo_c3 trigger should append the word "foo" to the word "gamma". After running the query and inspecting the pull request diff, we can see that our trigger worked perfectly.

Trigger success
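
For completeness, a direct query would show the transformed value (a quick sketch):

SELECT c3 FROM t1 WHERE pk = 3;  -- returns 'gammafoo'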

Finally, let's test the check constraint we added on t3. Using the SQL console once again, we are able to successfully insert a new row, adding 'lisa', 'may', 'simpson' to the table.

Constraint success

However, when we attempt to insert 'principal', 'seymour', 'skinner' ('skinner' in column c3), we get a constraint violation error, as expected, demonstrating that our check constraint is in place.

Constraint violation
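
For reference, the two inserts we just ran look roughly like this (the pk values are illustrative):

INSERT INTO t3 (pk, c1, c2, c3) VALUES (2, 'lisa', 'may', 'simpson');           -- succeeds: c3 = 'simpson'
INSERT INTO t3 (pk, c1, c2, c3) VALUES (3, 'principal', 'seymour', 'skinner');  -- violates simpsons_only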

Limitations and Future Work

Currently, .sql uploads on DoltHub and DoltLab are limited to files smaller than 10MB, with accumulated writes capped at 64MB. These maximums are pretty small, but they're temporary.

We are in the process of scaling our infrastructure to support arbitrarily large .sql file uploads on DoltHub and DoltLab, but for now, we need to prevent memory exhaustion in our API while we continue the scaling work behind the scenes.

We're also really excited about .sql file uploads because they're the first step in letting users easily version their production database snapshots on DoltHub and DoltLab. Users who want to keep their legacy databases, but also want to harness the versioning power of Dolt, can now start doing both.

These users can create SQL snapshots of their legacy databases, then upload those snapshots to a DoltHub or DoltLab database. Doing this provides them a simple way to track changes occurring in their legacy database and provides excellent diffs between snapshots.

We're also working to make this process fully automated so that versioned snapshots can simply be scheduled by users. If this interests you, stay tuned for "Dolt your snapshots" updates 🤠.

Got questions, comments, or want more information about our products?

Check out DoltHub, DoltLab, Hosted Dolt (https://hosted.doltdb.com/), and Dolt.

Also, don't hesitate to contact us here or on Discord.
