SQL

6 min read

Dolt is a SQL database with Git-style version control. Unlike traditional databases, Dolt lets you branch, merge, clone, and diff your data, just like source code!

As the result of this dual nature, we often need to create interfaces that work on both SQL and CLI fronts, especially to provide alternatives and keep workflows consistent. This begs the question: how do you build features that work identically across two different paradigms?

Well, most recently, I faced this challenge when importing our existing --skinny flag and adding a new --include-cols option for our dolt diff command and DOLT_DIFF() SQL table function.

# CLI command
dolt diff --skinny HEAD~1 HEAD table_name

# SQL function
SELECT * FROM DOLT_DIFF ('--skinny', 'HEAD~1', 'HEAD', 'table_name');

To provide a TL;DR, these options filter diff output for wide tables to show only relevant columns. --skinny omits unchanged columns while --include-cols lets you explicitly add columns to the output for any reason. You can read more about these focused diffs in my previous blog post.

Here, we’ll more explore some helpful patterns to follow to allow easy sharing of such features between our CLI and SQL interfaces in Dolt.

Building the Bridge#

Before we implemented the --skinny feature, we needed to parse the option string. In the case of our CLI interface the option already exists. So, it might seem like a good idea to just copy the parsing logic into the SQL function. But things can easily diverge, and in the case of the CLI code we see this first hand:

func (cmd DiffCmd) ArgParser() *argparser.ArgParser {
	// Line 155
    ap := argparser.NewArgParserWithVariableArgs(cmd.Name())
    ap.SupportsFlag(DataFlag, "d", "Show only the data changes, do not show the schema changes (Both shown by default).")
    ap.SupportsFlag(SchemaFlag, "s", "Show only the schema changes, do not show the data changes (Both shown by default).")
    ap.SupportsFlag(StatFlag, "", "Show stats of data changes")
    ap.SupportsFlag(SummaryFlag, "", "Show summary of data and schema changes")
    ap.SupportsString(FormatFlag, "r", "result output format", "How to format diff output. Valid values are tabular, sql, json. Defaults to tabular.")
    ap.SupportsString(whereParam, "", "column", "filters columns based on values in the diff.  See {{.EmphasisLeft}}dolt diff --help{{.EmphasisRight}} for details.")
    ap.SupportsInt(limitParam, "", "record_count", "limits to the first N diffs.")
    ap.SupportsFlag(cli.StagedFlag, "", "Show only the staged data changes.")
    ap.SupportsFlag(cli.CachedFlag, "c", "Synonym for --staged")
    ap.SupportsFlag(SkinnyFlag, "sk", "Shows only primary key columns and any columns with data changes.")
    ap.SupportsFlag(MergeBase, "", "Uses merge base of the first commit and second commit (or HEAD if not supplied) as the first commit")
    ap.SupportsString(DiffMode, "", "diff mode", "Determines how to display modified rows with tabular output. Valid values are row, line, in-place, context. Defaults to context.")
    ap.SupportsFlag(ReverseFlag, "R", "Reverses the direction of the diff.")
    ap.SupportsFlag(NameOnlyFlag, "", "Only shows table names.")
    ap.SupportsFlag(cli.SystemFlag, "", "Show system tables in addition to user tables")
    return ap
}

This uses our internal argparser.ArgParser library to define what flags the command supports, validate user input, and extract parsed values. The SkinnyFlag is defined here along with all the other diff options, but this parser is tightly coupled to the CLI command structure.

To avoid duplicating this logic, we can instead extract the relevant parts into a shared function. In fact, an existing file serves this exact purpose, arg_parser_helpers.go:

func CreateDiffArgParser(isTableFunction bool) *argparser.ArgParser {
	ap := argparser.NewArgParserWithVariableArgs("diff")
	ap.SupportsFlag(SkinnyFlag, "sk", "Shows only primary key columns and any columns with data changes.")
	if !isTableFunction { // TODO: support for table function
		ap.SupportsFlag(DataFlag, "d", "Show only the data changes, do not show the schema changes (Both shown by default).")
		ap.SupportsFlag(SchemaFlag, "s", "Show only the schema changes, do not show the data changes (Both shown by default).")
		// ...other flags & options
	}
	return ap
}

Here we can define a function to create a parser for diff options in general for both CLI and SQL. We take in a boolean to indicate whether we’re building the parser for a table function (SQL) or not (CLI). This speeds up development and communicates intent for each interface for future maintainers, we only need to implement --skinny for SQL right now.

Now we can update our CLI command to use this shared parser, along with the SQL table function:

func (cmd DiffCmd) ArgParser() *argparser.ArgParser {
    return cli.CreateDiffArgParser(false)
}

func (dtf *DiffTableFunction) WithExpressions(expressions ...sql.Expression) (sql.Node, error) {
	newDtf := *dtf
	// TODO: For now, we will only support literal / fully-resolved arguments to the
	//       DiffTableFunction to avoid issues where the schema is needed in the analyzer
	//       before the arguments could be resolved.
	for _, expr := range expression {
	var exprStrs []string
	strToExpr := map[string]sql.Expression{}
	for _, expr := range expressions {
		if !expr.Resolved() {
			return nil, ErrInvalidNonLiteralArgument.New(dtf.Name(), expr.String())
		}
		// prepared statements resolve functions beforehand, so above check fails
		if _, ok := expr.(sql.FunctionExpression); ok {
			return nil, ErrInvalidNonLiteralArgument.New(dtf.Name(), expr.String())
		}
		strVal := expr.String()
		if lit, ok := expr.(*expression.Literal); ok { // rm quotes from string literals
			strVal = fmt.Sprintf("%v", lit.Value())
		}
		exprStrs = append(exprStrs, strVal) // args extracted from apr later to filter out options
		strToExpr[strVal] = expr
	}

The SQL implementation is more involved since we need to convert SQL expressions into strings to parse them. However, this extra work is worth it to once more tightly couple the parsing logic.

Any fields can now be set through the arg parser results objects in either interface:

if apr.Contains(cli.SkinnyFlag) {
    newDtf.showSkinny = true
}

Implementing the Feature#

Now that we have a shared parser, we can implement the actual --skinny feature. Once more, it might seem tempting to copy the logic from the CLI command into the SQL function. However, the execution contexts are fundamentally different.

The CLI implementation operates on raw diff output, filtering columns after the diff is computed:

if dArgs.skinny {
    modifiedColNames, err = getModifiedCols(sqlCtx, rowIter, unionSch, sch)
    if err != nil {
        return errhand.BuildDError("Error running diff query:\n%s", interpolatedQuery).AddCause(err).Build()
    }
    for _, col := range dArgs.includeCols {
        modifiedColNames[col] = true // ensure included columns are always present
    }
    // instantiate a new schema that only contains the columns with changes
    var filteredUnionSch sql.Schema
    for _, s := range unionSch {
        for colName := range modifiedColNames {
            if s.Name == colName {
                filteredUnionSch = append(filteredUnionSch, s)
            }
        }
    }
}

The SQL table function, however, must return a schema upfront—before any rows are generated. Here we pre-compute which columns actually changed during schema generation:

if dtf.showSkinny {
    skDelta, err := dtf.filterDeltaSchemaToSkinnyCols(ctx, &delta)
    if err != nil {
        return err
    }
    delta = *skDelta
}

diffTableSch, j, err := dtables.GetDiffTableSchemaAndJoiner(format, delta.FromSch, delta.ToSch)
if err != nil {
    return err
}

Both implementations achieve the same result, showing only changed columns, but due to the different functions and utilization in other system tables we decide to use different implementations for now. We have an open issue for this if you’d like a shot at coalescing everything!

Testing#

Even though our solutions diverged, we still want to ensure the same behavior when it comes to the columns shown. Luckily, we can make use of agents to create complicated regular expressions. By providing output from both the CLI and SQL table we can quickly generate helper functions in bash to extract useful metadata, i.e., column names, counts, etc.

_cli_header_cols() {
    awk '
        /^\s*\|\s*[-+<>]\s*\|/ && last_header != "" { print last_header; exit }
        /^\s*\|/ { last_header = $0 }
    ' <<<"$1" \
      | tr '|' '\n' \
      | sed -e 's/^[[:space:]]*//;s/[[:space:]]*$//' \
      | grep -v -E '^(<|>|)$' \
      | grep -v '^$'
}

Conclusion#

This approach can be applied to other Dolt system tables that span CLI and SQL interfaces. If you’re looking to create a similar contribution, following this guide to get a quick start on adding new options or fitting such interfaces together. For Dolt users, this means a consistent experience whether you prefer command-line workflows or SQL queries.

Ready to try Dolt’s unified CLI and SQL interfaces? Download Dolt and explore how the other version control features work across both interfaces. Have more questions about contributing or adding new features? Join us on Discord to discuss with the community.

Blog

When CLI Meets SQL: Building Unified Interfaces in Dolt

Building the Bridge#

Implementing the Feature#

Testing#

Conclusion#

Get started with Dolt