Introducing Column Defaults

8 min read

Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features, moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning features you know and love from Git applied to a database, such as branching, diffs, merging, etc. In our latest release, we added support for default values on columns, and in this blog post I'll briefly go over what they are.

What Are Column Defaults?

Default values on columns are either a literal value or an expression that evaluates to a value, which are then inserted when a column is omitted on an INSERT statement. Any columns that are defined without a default value, but allow NULL values, have an implicit default expression of (NULL). Any columns that do not allow NULL values will error if a valid value is omitted.

Using Defaults

You can download Dolt and follow along!

$ dolt init
Successfully initialized dolt data repository.

$ dolt sql <<SQL
CREATE TABLE test(
  pk BIGINT PRIMARY KEY,
  v1 BIGINT DEFAULT 1,
  v2 BIGINT DEFAULT (pk + v1)
);
INSERT INTO test (pk) VALUES (1);
INSERT INTO test (pk, v1) VALUES (2, 5);
SQL
Rows inserted: 2 Rows updated: 0 Rows deleted: 0

$ dolt sql -q "SELECT * FROM test"
+----+----+----+
| pk | v1 | v2 |
+----+----+----+
| 1  | 1  | 2  |
| 2  | 5  | 7  |
+----+----+----+

In the first INSERT statement, we only assign to the column pk, which then causes the defaults for both v1 and v2 to evaluate. On the second INSERT statement, we assign to pk and v1, meaning only v2's default value evaluates.

Default Literals

Just like MySQL, we make a distinction between literal values and expressions, which abide by a slightly different ruleset. The validity of a literal value on the defined column is evaluated at table creation. On a column that has the type INT UNSIGNED, a default literal of -1 will immediately error. The type of the default literal does not have to match the defined column, as long as it can be implicitly converted to the type. For example, a column that has type SMALLINT may have a default literal of "16", as the string represents a number. Of note, columns deriving from one of the TEXT or BLOB types (such as LONGTEXT and MEDIUMBLOB) are disallowed from having a literal default value.

Default Expressions

Expressions, on the other hand, are allowed on TEXT and BLOB columns. Expressions may contain literals, built-in functions, columns, and operators. Anything that is wrapped with a set of parentheses is automatically an expression. This means that, although DEFAULT "hi" is invalid on a TEXT column, DEFAULT ("hi") is valid. While there isn't a technical limitation to this decision on our end, we chose to abide by this restriction in favor of round-trip compatibility with MySQL. Although I just mentioned that anything wrapped in a set of parentheses is an expression, it is required that all default expressions are wrapped in a set of parentheses. For example, DEFAULT 2 + 5 is invalid as the expression 2 + 5 must be wrapped in parentheses (DEFAULT (2 + 5) is valid). The only exceptions are the NOW and CURRENT_TIMESTAMP functions, which are also only allowed on columns of type TIMESTAMP and DATETIME.

The validity of expressions are evaluated at insertion time, which means that an INSERT may fail if the expression evaluates to an invalid value for that column. As a result, it is possible to write any number of expression default values that are guaranteed to fail. For example, DEFAULT NULL will immediately error on any columns that are declared with NOT NULL, however DEFAULT (NULL) will succeed, even though an error will occur every time the default value is used. Care should be used when using any non-deterministic functions, to ensure that all insertions will succeed.

Referencing Other Columns

It should be noted that referencing columns within a default expression has a few limitations. Each insertion has two passes, with the first assigning all given values and default literals, and the second evaluating all default expressions in column order. As a result, it is illegal for a default expression to reference another column that comes after it if that column also has a default expression, as the value for that column may not have been assigned yet on any given insertion. To visualize this, let's use two sample tables:

CREATE TABLE invalid_sample (
  v1 BIGINT DEFAULT (v2),
  v2 BIGINT DEFAULT (RAND() * 10)
);
CREATE TABLE valid_sample_1 (
  v1 BIGINT DEFAULT (v2),
  v2 BIGINT DEFAULT 0
);
CREATE TABLE valid_sample_2 (
  v1 BIGINT DEFAULT (RAND() * 10),
  v2 BIGINT DEFAULT (v1)
);

For the table invalid_sample, v1 has a default expression that references v2. However since v2 also has a default expression, it is possible that when evaluating v1's default expression that v2 will be in an unassigned state, resulting in an error at table creation. For the table valid_sample_1, v1 still references v2. v2 has a literal has a default value, and thus will always be assigned during the first pass, ensuring that v1's expression will always have a value to reference. For the table valid_sample_2, v2 references v1. Although v1 has an expression for the default value, v1 is declared before v2, thus it will always evaluate before v2.

These limitations also affect statements that modify the column order. If we were to run:

ALTER TABLE valid_sample_2 MODIFY COLUMN v1 BIGINT DEFAULT (RAND() * 10) AFTER v2;

Then we would receive an error, as the column referenced in v2's default expression now comes after it.

Defaults And Versioning

As mentioned earlier, in addition to being a drop-in replacement for a MySQL server, Dolt is also a versioned database. Default values work just like all other values, in that any data that is inserted into any column either manually or from the default may be diffed across commits, or merged from other commits. As Dolt also versions the schema, we can modify the default value and track those changes through commits with diffs!

Let's walk through an example of using Dolt with our new column defaults. First we'll create a table that has a single non-primary-key column that defaults to the negative value of the primary key. Then we'll insert some data.

$ dolt init
  Successfully initialized dolt data repository.

$ dolt sql <<SQL
CREATE TABLE version_sample (
  pk BIGINT PRIMARY KEY,
  v1 BIGINT DEFAULT (-pk)
);
INSERT INTO version_sample (pk) VALUES (1), (2), (3);
SQL
Rows inserted: 3 Rows updated: 0 Rows deleted: 0

$ dolt sql -q "SELECT * FROM version_sample"
  +----+----+
  | pk | v1 |
  +----+----+
  | 1  | -1 |
  | 2  | -2 |
  | 3  | -3 |
  +----+----+

Next, let's add these changes and commit the result to our branch (which we default to master). We'll then switch over to a new branch that we're calling new_branch.

$ dolt add -A

$ dolt commit -m "Created table and inserted data"
  commit 4c60qv84n63dg2s5pog427uv153c4qe2
  Author: Daylon Wilkins <daylon@liquidata.co>
  Date:   Fri Sep 11 12:00:00 -0700 2020
  
          Created table and inserted data


$ dolt checkout -b new_branch
  Switched to branch 'new_branch'

Let's modify the default value to become the primary key plus 10. Then we'll insert a few more rows, allowing the default value to do its thing, and commit that result.

$ dolt sql <<SQL
ALTER TABLE version_sample MODIFY COLUMN v1 BIGINT DEFAULT (pk + 10);
INSERT INTO version_sample (pk) VALUES (6), (7);
SQL
Rows inserted: 2 Rows updated: 0 Rows deleted: 0

$ dolt add -A

$ dolt commit -m "Changed default value and added data"
  commit vakup16rmmjdqt6afnq2v6umg0rhk985
  Author: Daylon Wilkins <daylon@liquidata.co>
  Date:   Fri Sep 11 12:00:10 -0700 2020
  
          Changed default value and added data

Let's switch back to master and add a few more rows. Remember that this branch still has the default expression (-pk). We'll then commit that result.

$ dolt checkout master
  Switched to branch 'master'

$ dolt sql -q "INSERT INTO version_sample (pk) VALUES (4), (5)"
  Query OK, 2 rows affected

$ dolt add -A

$ dolt commit -m "Added data to the master branch"
  commit b0oqusf59t78hirmq6h7c8lejge3nhho
  Author: Daylon Wilkins <daylon@liquidata.co>
  Date:   Fri Sep 11 12:00:20 -0700 2020
  
          Added data to the master branch

Let's go back to our new branch and look at the diff. You can see that we've updated the default expression, and we can also see the difference in rows!

$ dolt checkout new_branch
  Switched to branch 'new_branch'

$ dolt diff master
  diff --dolt a/version_sample b/version_sample
  --- a/version_sample @ d3oh6eif4ie5uci7pkoaihdj5b17ujsr
  +++ b/version_sample @ cst2kj941p2v2oo33gg5vi1ka0f514iv
    CREATE TABLE version_sample (
      `pk` BIGINT NOT NULL COMMENT 'tag:3108'
  <   `v1` BIGINT DEFAULT (-pk)
  >   `v1` BIGINT DEFAULT (pk + 10)
       PRIMARY KEY (pk)
    );
  
  +-----+----+----+
  |     | pk | v1 |
  +-----+----+----+
  |  -  | 4  | -4 |
  |  -  | 5  | -5 |
  |  +  | 6  | 16 |
  |  +  | 7  | 17 |
  +-----+----+----+

As our last exercise, we'll switch back to master and merge in our changes from the new branch.

$ dolt checkout master
  Switched to branch 'master'

$ dolt merge new_branch
  Updating b0oqusf59t78hirmq6h7c8lejge3nhho..vakup16rmmjdqt6afnq2v6umg0rhk985
  version_sample | 2 ++
  1 tables changed, 2 rows added(+), 0 rows modified(*), 0 rows deleted(-)

$ dolt schema show
  version_sample @ working
  CREATE TABLE `version_sample` (
    `pk` BIGINT NOT NULL COMMENT 'tag:3108',
    `v1` BIGINT DEFAULT (pk + 10) COMMENT 'tag:14825',
    PRIMARY KEY (`pk`)
  );

$ dolt sql -q "SELECT * FROM version_sample"
  +----+----+
  | pk | v1 |
  +----+----+
  | 1  | -1 |
  | 2  | -2 |
  | 3  | -3 |
  | 4  | -4 |
  | 5  | -5 |
  | 6  | 16 |
  | 7  | 17 |
  +----+----+

Just like that, we've merged our changes! Super simple, and very powerful. Same as earlier, you can download Dolt and try out everything that you see here!

Conclusion

Default values are a powerful way of producing values for your data. There are many cases where columns are implemented with defaults specifically to keep track of items such as the time that a row was inserted. If you're interested in using default values, then you may be interested in triggers as well, which are coming soon! You can stay up-to-date on our progress by following our releases, and you can directly interact with us by joining our Discord server. We hope you'll join us for the ride!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt