Dolt Now Supports Check Constraints

4 min read

Dolt is an SQL database with Git versioning. We have come a long way since initially committing to 100% MySQL compatibility, and today we introduce our latest step in that journey: check constraints.

What Are Check Constraints?

Check constraints are a data validation mechanism for table rows introduced formally to MySQL in version 8.0.16.

Like triggers, check constraints execute logic on row inserts and updates. And like foreign keys, checks verify the correctness of rows and can short-circuit persistence if a constraint fails.

Check constraints are different and useful because they contain arbitrary expression logic, whereas foreign keys verify only presence in another table.

Simple Examples

Positive Values

Consider a data source of blood pressure readings. We can use check constraints to limit the input range of values:

$ dolt sql -q "
     CREATE TABLE bp (
         patient_id int,
         systole int,
         diastole int,
         CHECK (systole > 0),
         CHECK (diastole > 0)
     )"

Adding normal values succeeds:

$ dolt sql -q "INSERT INTO bp VALUES (1, 120, 80)"
Query OK, 1 row affected

Negative values trigger a check constraint:

$ dolt sql -q "INSERT INTO bp VALUES (2, -120, 80)"
Check constraint "chk_62j1d75m" violated

Only valid rows remain after these two inserts:

$ dolt sql -q "SELECT * from bp"
+------------+---------+----------+
| patient_id | systole | diastole |
+------------+---------+----------+
| 1          | 120     | 80       |
+------------+---------+----------+

Categorization

Checks are equally useful in scenarios with a distinct set of categories.

Consider a table recording the day of week a patient's blood pressure was most-recently taken (we will intentionally initialize the table without checks):

$ dolt sql -q "
     CREATE TABLE bp_day (
         patient_id int,
         day int
     )"

Our analytics team assumes day is a categorical feature. They use array indexing to convert between numeric and string representations in a data pipeline:

num_to_day = ["Sun, "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat"]

A physician trying to record "Sunday" with day=7 will create a bug for our data team:

$ dolt sql -q "INSERT INTO bp_day VALUES (1, 7)"
Query OK, 1 row affected

num_to_day[7] will IndexOutOfRangeException and crash our code.

Instead of running a batch job to backfill our database, or changing the processing code to check for data inconsistencies, we can alter our table with a check constraint:

$ dolt sql -q "DELETE FROM bp_day WHERE patient_id = 1"
Query OK, 1 row affected
$ dolt sql -q "ALTER TABLE bp_day ADD CHECK (day in (0, 1, 2, 3, 4, 5, 6))"
$ dolt sql -q "INSERT INTO bp_day VALUES (1, 7)"
Check constraint "chk_ko21oh4v" violated

Our database can only store valid rows now, and our workflows will be more reliable in the future.

Limitations

Most simple expressions are valid check constraints, with two notable limitations:

  1. Checks can only reference columns in the table they are

defined in:

$ dolt sql -q "
     CREATE TABLE bp_week (
         patient_id int,
         week int,
         bp.systole < 200
     )"
table "bp" does not have column "systole"
  1. Checks cannot include subexpressions or SQL functions:
$ dolt sql -q "ALTER TABLE bp ADD CHECK (AVG(systole) < 150)"
Invalid constraint expression, functions not supported: AVG(bp.systole)

There are a host of implementation details in the MySQL docs. Multiple ways to declare constraints, options for ignoring errors, and other comments on how to use checks are included. .

Implementation Challenges

Adding check constraints required changes to Dolt's data storage layer, SQL query engine and SQL parser.

One interesting technical detail involves bridging the persistence layer, where constraints are serialized, and the query engine, where constraints are executed.

The first implementation converted all check constraints to the ALTER TABLE format used above in the category example. The alter string was stored in next to the associated table in Dolt, and unwound when the query engine needed to validate rows.

An improved implementation saves only the expression string. Here we embed and parse the expression from a select statement (code simplified):

func convertCheckDefToConstraint(ctx *sql.Context, check *sql.CheckDefinition) (*sql.CheckConstraint, error) {
	parseStr := fmt.Sprintf("select %s", check.CheckExpression)
	parsed, err := sqlparser.Parse(parseStr)
	selectStmt, ok := parsed.(*sqlparser.Select)
	expr := selectStmt.SelectExprs[0]
	c, err := parse.ExprToExpression(ctx, expr)
	return &sql.CheckConstraint{
		Name:     check.Name,
		Expr:     c,
		Enforced: check.Enforced,
	}, nil
}

Versioning Concerns

Dolt versions data in addition to being an SQL database. The complexity of combining these two paradigms often complicates the story of 1-to-1 porting MySQL features, and check constraints are no exception.

First, dolt merge does not execute check constraints for every row. Second, dolt verify-constraints does not yet invoke check constraints.

Future Work

Alter check statements but do not currently update information_schema and SHOW CREATE TABLE. This affects constraint visibility in multiple ways, including dolt diff not showing which version of a check statement is active. dolt merge also chooses the most recent between two constraints with the same name. When schema metadata is updated to reflect check constraints, checks will behave the same way as other table schemas fields in versioning commands. Fixes are currently in progress, and the current behavior does not negatively affect constraint checks themselves.

Summary

It took MySQL 15 years, and now Dolt too supports check constraints! It took us about one month to bring this customer feature request into production.

Check constraints will expand Dolt's ability to natively validate data, an area of growing interest in the data engineering community.

If you are interested in learning more about Dolt reach out to us on our discord! We would love to hear about your experiences with data versioning.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt