Writing zsh completions for CLIs with subcommands

November 15, 2021

Dolt is a SQL database with git-like version control capabilities, the only SQL database you can branch and merge, clone and fork, push and pull like a git repository. Very early in the development lifecycle, we committed to copying git's workflow and CLI exactly, so that anyone who knew how to use git would already know how to use dolt. This is great for users, but it means that the dolt CLI very quickly grew to have all the surface area of git, plus some additional commands. There are now 46 commands in the dolt CLI, many of which take their own set of option flags and arguments. It's a lot to keep track of! Enough that I decided I needed a little help from my shell.

Like many software engineers, I use zsh (which is also now the default login shell on macOS). It features a very robust, customizable completion system. A while back we'd gotten a customer request to generate shell completions for zsh and other shells, and I decided to tackle it. How hard could it be, right?

Getting started with zsh completion scripts

Unfortunately, it was a lot harder than I thought. The zsh completion system is famously dense, a classic beginner's graveyard of partial answers, conflicting information, bad examples and dead ends. Starting was easy: I was able to get a proof of concept working very quickly. It looked like this:

#compdef _dolt dolt

_dolt() {
    local line state

    _arguments -C \
               "1: :->cmds" \
               "*::arg:->args"

    case "$state" in
        cmds)
            _values "dolt command" \
                    "add[add a table to the staging area]" \
                    "table[table commands]"
            ;;
        args)
            case $line[1] in
                add)
                    _add_cmd
                    ;;
                table)
                    _table_cmd
                    ;;
            esac
            ;;
    esac
}

_add_cmd() {
    _arguments '(--addarg)--addarg[a value]'
}

_table_cmd() {
    _arguments '(--tablearg)--tablearg[a value]' \
               '(--tablearg2)--tablearg2[a value]'
}

This was just a skeleton for the add and table commands to prove I was on the right track. It installs a completion function for the dolt command, with two supported commands: add and table. Each of those commands accepts one or two placeholder option flags.

This is very simple as far as zsh completion scripts go. I tested it out, and it worked well enough. So far so good. But my next task was to get subcommands working, and that's where I hit a brick wall.

zsh completion for subcommands

Many dolt commands have subcommands. For example, the table command has several subcommands for working with tables:

% dolt table --help
Valid commands for dolt table are
              import - Creates, overwrites, replaces, or updates a table from the data in a file.
              export - Export a table to a file.
                  rm - Deletes a table
                  mv - Moves a table
                  cp - Copies a table

So when I have % dolt table on the line and press tab, I want zsh to suggest import, export, and so on.

Unfortunately, subcommands are not a "standard" feature of zsh completion. This isn't to say they are impossible -- far from it! There are very many ways one could implement subcommands using zsh's building blocks, but no "standard" way to do so. And if you google it, you'll find 5 guides that offer 2 different methods each.

There are of course examples you can find, including the completion script for git itself. But at over 7,000 lines of fearsome complexity, it's not exactly beginner friendly. And I didn't want to spend more than a day on this: it was supposed to be a fun task before my next big deliverable! So I read all the tutorials, started experimenting, and tried to synthesize the info into something usable.

The standard starting place for zsh completion beginners is the zsh-completions GitHub repo, which also has many examples of completion scripts. This was a good reference for getting the lay of the land, but didn't have much in the way of advice or examples for what I was trying to do. It did point me to a couple of other tutorials, though. I also found this Stack Overflow question, which has a characteristically unhelpful answer with lots of upvotes.

Most of these answers have one thing in common: they want you to manipulate the $state variable to control execution of completions for the various subcommands from the main function. I'm 100% certain that this can be made to work, but I couldn't make it work in the few hours I played with it, and without concrete examples to guide me I decided to try another approach.

Finally I found this guide, which lays out a partial strategy for getting subcommands to work, and this tutorial, which contained the following piece of wisdom:

Define a function named _program that provides the default completions. For each sub-command that the program provides define a _program_sub_command function that provides completions for that sub-command. In my experience this makes the completion script pretty straight-forward to write.

Straight-forward sounded pretty good to me after the last few hours' failed experiments. This was the key insight that allowed me to put together a solution.

The solution

Here's the key to getting subcommands working in a zsh completion script without tearing your hair out: the top-level completion function and the sub-command completion functions are symmetric. They look the same.

For dolt, the full solution looks like the following. I'm illustrating only the completions for the init command, which has no subcommands, and the table command, which does have subcommands. For the table command, I've removed all but one of its subcommands for brevity.

_dolt() {
    local line state

    _arguments -C \
               "1: :->cmds" \
               "*::arg:->args"
    case "$state" in
        cmds)
            _values "dolt command" \
                    "init[Create an empty Dolt data repository.]" \
                    ...
                    "table[Commands for copying, renaming, deleting, and exporting tables.]" \
                    ...
            ;;
        args)
            case $line[1] in
                init)
                    _dolt_init
                    ;;
                table)
                    _dolt_table
                    ;;
                ...
            esac
            ;;
    esac
}

_dolt_init() {
    _arguments -s \
               '(--name)--name[The name used in commits to this repo. If not provided will be taken from user.name in the global config.]' \
               '(--email)--email[The email address used. If not provided will be taken from user.email in the global config.]' \
               '(--date)--date[Specify the date used in the initial commit. If not specified the current system time is used.]' \
               {-b,--initial-branch}'[The branch name used to initialize this database.]'
}

_dolt_table() {
    local line state

    _arguments -C \
               "1: :->cmds" \
               "*::arg:->args"
    case "$state" in
        cmds)
            _values "dolt_table command" \
                    "import[Creates, overwrites, replaces, or updates a table from the data in a file.]" \
                    ...
            ;;
        args)
            case $line[1] in
                import)
                    _dolt_table_import
                    ;;
                ...
            esac
            ;;
    esac
}

_dolt_table_import() {
    _arguments -s \
               {-c,--create-table}'[Create a new table, or overwrite an existing table (with the -f flag) from the imported data.]' \
               {-u,--update-table}'[Update an existing table with the imported data.]' \
               {-f,--force}'[If a create operation is being executed, data already exists in the destination, the force flag will allow the target to be overwritten.]' \
               {-r,--replace-table}'[Replace existing table with imported data while preserving the original schema.]' \
               '(--continue)--continue[Continue importing when row import errors are encountered.]' \
               {-s,--schema}'[The schema for the output data.]' \
               {-m,--map}'[A file that lays out how fields should be mapped from input data to output data.]' \
               {-pk,--pk}'[Explicitly define the name of the field in the schema which should be used as the primary key.]' \
               '(--file-type)--file-type[Explicitly define the type of the file if it can''t be inferred from the file extension.]' \
               '(--delim)--delim[Specify a delimiter for a csv style file with a non-comma delimiter.]'
}

Let's break it down. The key realization is that your top-level command and each subcommand look and work the same. They each begin with the same magic incantation:

    local line state

    _arguments -C \
               "1: :->cmds" \
               "*::arg:->args"

This means (essentially) that the first positional argument, the command name, is completed by setting $state to cmds, and every argument after it is handled by setting $state to args. In the args state, $line[1] holds the command that was typed. Then lower down, we switch on the value of $state:

    case "$state" in
        cmds)
            _values "dolt command" \
                    "init[Create an empty Dolt data repository.]" \
                    ...
                    "table[Commands for copying, renaming, deleting, and exporting tables.]" \
                    ...
            ;;
        args)
            case $line[1] in
                init)
                    _dolt_init
                    ;;
                table)
                    _dolt_table
                    ;;
                ...
            esac
            ;;

Each command gets completed with the name of the command and its help text in the cmds) case. Then in the args) case, the name of the command chosen gets routed to a particular subcommand processor function. Let's look at one in detail, the one that handles dolt table.

_dolt_table() {
    local line state

    _arguments -C \
               "1: :->cmds" \
               "*::arg:->args"
    case "$state" in
        cmds)
            _values "dolt_table command" \
                    "import[Creates, overwrites, replaces, or updates a table from the data in a file.]" \
                    ...
            ;;
        args)
            case $line[1] in
                import)
                    _dolt_table_import
                    ;;
                ...
            esac
            ;;
    esac
}

Note how similar this looks to the top-level _dolt function. It starts with the same magic _arguments -C incantation, then declares all the subcommands in the cmds) case and all of their handlers in the args) case.

Finally, we get to a leaf node, _dolt_table_import, which handles the argument completion for dolt table import:

_dolt_table_import() {
    _arguments -s \
               {-c,--create-table}'[Create a new table, or overwrite an existing table (with the -f flag) from the imported data.]' \
               {-u,--update-table}'[Update an existing table with the imported data.]' \
               {-f,--force}'[If a create operation is being executed, data already exists in the destination, the force flag will allow the target to be overwritten.]' \
               {-r,--replace-table}'[Replace existing table with imported data while preserving the original schema.]' \
               '(--continue)--continue[Continue importing when row import errors are encountered.]' \
               {-s,--schema}'[The schema for the output data.]' \
               {-m,--map}'[A file that lays out how fields should be mapped from input data to output data.]' \
               {-pk,--pk}'[Explicitly define the name of the field in the schema which should be used as the primary key.]' \
               '(--file-type)--file-type[Explicitly define the type of the file if it can''t be inferred from the file extension.]' \
               '(--delim)--delim[Specify a delimiter for a csv style file with a non-comma delimiter.]'
}

Because this is a terminal command (it has no subcommands), we just spit out some arguments to complete using the special argument syntax that zsh completions use (and it can get much more complicated than this).
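Each spec in that list is a pure function of a flag's names and help text, which is what makes it easy to produce mechanically. Here is a minimal sketch in Go, the language the generator later in this post is written in. It is not Dolt's own formatOption: the option struct, its field names, and the shellQuote helper are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// option is a hypothetical stand-in for what an argument parser knows about
// one flag; the field names are invented for this sketch.
type option struct {
	Name   string // long name without the leading "--"
	Abbrev string // short name without the leading "-", or "" if none
	Desc   string // help text
}

// shellQuote wraps s in single quotes, closing and reopening the quotes
// around each embedded apostrophe so that it survives as a literal '.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// formatOption renders one _arguments spec in the two shapes used above:
// {-c,--create-table}'[desc]' when there is a short name, and
// '(--continue)--continue[desc]' when there is not.
func formatOption(o option) string {
	if o.Abbrev == "" {
		return shellQuote(fmt.Sprintf("(--%[1]s)--%[1]s[%[2]s]", o.Name, o.Desc))
	}
	return fmt.Sprintf("{-%s,--%s}%s", o.Abbrev, o.Name, shellQuote("["+o.Desc+"]"))
}

func main() {
	opts := []option{
		{Name: "create-table", Abbrev: "c", Desc: "Create a new table."},
		{Name: "continue", Desc: "Continue importing when row import errors are encountered."},
		{Name: "file-type", Desc: "Define the file type if it can't be inferred."},
	}
	for _, o := range opts {
		fmt.Println(formatOption(o))
	}
}
```

The quoting helper matters more than it looks. Inside single quotes, zsh reads '' as two adjacent quote marks (unless the RC_QUOTES option is set), so the can''t in the --file-type description above is displayed as cant. Closing and reopening the quotes around the apostrophe keeps it. This sketch does not handle a ] inside a description, which would need separate treatment.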

And that's it! One of the nicest things about this structure (besides the fact that it's easy to read and understand) is that it lends itself really well to generating from source. Let's do that next.

Generating the entire file

Dolt is under active development, so it doesn't make sense to just write this completion script once and update it manually thereafter. What we want is a way to generate it from the dolt program every time it gets updated. To that end, I wrote a hidden dolt command with exactly this purpose.

% dolt gen-zsh

Let's look at what it does under the hood, slightly edited for brevity.

func (z GenZshCompCmd) dumpZsh(wr io.Writer, cmdStr string, subCommands []cli.Command) error {

	var subCmds []string
	var subArgs []string

	for _, sub := range subCommands {
		// Function-name stem for this subcommand: the parent's stem, "_", then its name
		subCmdStr := fmt.Sprintf("%s_%s", cmdStr, sub.Name())

		subCmds = append(subCmds, fmt.Sprintf(cmdValueFmt, sub.Name(), sub.Description()))
		subArgs = append(subArgs, fmt.Sprintf(argSwitchFmt, sub.Name(), subCmdStr))

		// Commands with subcommands recurse; everything else is written as a leaf
		var err error
		if subCmdHandler, ok := sub.(cli.SubCommandHandler); ok {
			err = z.dumpZsh(wr, subCmdStr, subCmdHandler.Subcommands)
		} else {
			err = z.dumpZshLeaf(wr, subCmdStr, sub)
		}
		if err != nil {
			return err
		}
	}

	functionStr := fmt.Sprintf(subCmdFmt, cmdStr, cmdStr, strings.Join(subCmds, lineJoiner), strings.Join(subArgs, ""))

	_, err := wr.Write([]byte(functionStr))
	return err
}

func (z GenZshCompCmd) dumpZshLeaf(wr io.Writer, cmdString string, command cli.Command) error {
	ap := command.ArgParser()
	var args []string
	if len(ap.Supported) > 0 {
		for _, opt := range ap.Supported {
			args = append(args, formatOption(opt))
		}

		_, err := wr.Write([]byte(fmt.Sprintf(leafCmdFmt, cmdString, strings.Join(args, lineJoiner))))
		return err
	}

	_, err := wr.Write([]byte(fmt.Sprintf(noOptCmdFmt, cmdString)))
	return err
}

First we call dumpZsh with the dolt command itself and all of dolt's top-level subcommands. For each command, we first write completion definitions for any subcommands, then the definition for the command itself. leafCmdFmt, subCmdFmt and related constants define zsh completion templates that we simply Sprintf the relevant values into.

Final result

Now when I type dolt in my terminal and push tab, this is what I see:

% dolt
add            -- Add table changes to the list of staged table changes.
backup         -- Manage a set of server backups.
blame          -- Show what revision and author last modified each row of a table.
branch         -- Create, list, edit, delete branches.
checkout       -- Checkout a branch or overwrite a table from HEAD.
clone          -- Clone from a remote data repository.
commit         -- Record changes to the repository.
config         -- Dolt configuration.
conflicts      -- Commands for viewing and resolving merge conflicts.
constraints    -- Commands for handling constraints.
creds          -- Commands for managing credentials.
diff           -- Diff a table.
dump           -- Export all tables in the working set into a file.
fetch          -- Update the database from a remote data repository.
filter-branch  -- Edits the commit history using the provided query.
gc             -- Cleans up unreferenced data from the repository.
init           -- Create an empty Dolt data repository.
log            -- Show commit logs.
login          -- Login to a dolt remote host.
ls             -- List tables in the working set.
merge          -- Merge a branch.
merge-base     -- Find the common ancestor of two commits.
migrate        -- Executes a repository migration to update to the latest format.
pull           -- Fetch from a dolt remote data repository and merge.
push           -- Push to a dolt remote.
read-tables    -- Fetch table(s) at a specific commit into a new dolt repo
remote         -- Manage set of tracked repositories.
reset          -- Remove table changes from the list of staged table changes.
revert         -- Undo the changes introduced in a commit.
schema         -- Commands for showing and importing table schemas.
sql            -- Run a SQL query against tables in repository.
sql-client     -- Starts a built-in MySQL client.
sql-server     -- Start a MySQL-compatible server.
status         -- Show the working tree status.
table          -- Commands for copying, renaming, deleting, and exporting tables.
tag            -- Create, list, delete tags.
version        -- Displays the current Dolt cli version.

% dolt table completes like this:

% dolt table
cp      -- Copies a table
export  -- Export a table to a file.
import  -- Creates, overwrites, replaces, or updates a table from the data in a file.
mv      -- Moves a table
rm      -- Deletes a table

% dolt table import completes like this:

% dolt table import -
--continue            -- Continue importing when row import errors are encountered.
--create-table   -c   -- Create a new table, or overwrite an existing table (with the -f flag) from the imported data.
--delim               -- Specify a delimiter for a csv style file with a non-comma delimiter.
--file-type           -- Explicitly define the type of the file if it cant be inferred from the file extension.
--force          -f   -- If a create operation is being executed, data already exists in the destination, the force flag will a
--map            -m   -- A file that lays out how fields should be mapped from input data to output data.
--pk             -pk  -- Explicitly define the name of the field in the schema which should be used as the primary key.
--replace-table  -r   -- Replace existing table with imported data while preserving the original schema.
--schema         -s   -- The schema for the output data.
--update-table   -u   -- Update an existing table with the imported data.

Definitely worth spending about a day working on.

Gotchas

Being new to zsh completion development, one thing that tripped me up was how to load changes made to a completion script in development. This doesn't happen automatically, isn't mentioned in most tutorials, and was incredibly frustrating when I ran into it. You need to tell zsh to reload the completion script after you make changes, which is best done with a shell function.

reload() {
  local f
  f=(~/.zsh-completions/*(.))
  unfunction $f:t 2> /dev/null
  autoload -U $f:t
}

Future work

There's lots more that we could do here in future releases. Just a few ideas:

The current completions only consider flags, not arguments. Many dolt commands take table names as arguments, which we could generate as well.
Many options are mutually exclusive, like --update-table and --create-table for dolt table import. zsh completions can take advantage of this, only suggesting completion for one such option at a time. To generate these kinds of completions, the dolt command framework would need to be made aware that these options are mutually exclusive.
Value choices. Many dolt option flags expect one of several pre-defined values. E.g. the --result-format flag needs a value of csv, json, or tabular. There's a way to encode this into a zsh completion (of course there is!), but again the dolt command framework would need to be expanded to be made aware of this constraint.
Generate the completions using the dolt binary itself, rather than a script. This would mean writing a dolt complete command that takes the current command line and returns completions that zsh can use directly. This has the advantage of never needing to install a new completion script when new features are released, but was a bit more than I wanted to bite off on my first pass.
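As a rough illustration of the second and third ideas, the sketch below shows the kind of spec a generator could emit once the framework knows about exclusive groups and fixed value sets. The flagDef type and its fields are hypothetical, and the descriptions are shortened placeholders. The zsh syntax it targets is the parenthesized exclusion list in front of an option and the optspec:message:(a b c) form for a fixed list of values.

```go
package main

import (
	"fmt"
	"strings"
)

// flagDef is a hypothetical description of one option.
type flagDef struct {
	Long, Short string   // names without leading dashes; Short may be ""
	Desc        string   // help text (assumed free of ' and ] in this sketch)
	Choices     []string // fixed set of allowed values, if any
}

// names lists every spelling of the flag, e.g. ["-c", "--create-table"].
func (f flagDef) names() []string {
	var n []string
	if f.Short != "" {
		n = append(n, "-"+f.Short)
	}
	return append(n, "--"+f.Long)
}

// spec renders f for _arguments. group holds the flags that must not be
// combined with each other; it should include f itself.
func spec(f flagDef, group []flagDef) string {
	var excl []string
	for _, g := range group {
		excl = append(excl, g.names()...)
	}
	tail := "[" + f.Desc + "]"
	if len(f.Choices) > 0 {
		// optspec:message:(a b c) offers a fixed list of values
		tail += ":" + f.Long + ":(" + strings.Join(f.Choices, " ") + ")"
	}
	names := f.names()
	if len(names) == 1 {
		return fmt.Sprintf("'(%s)%s%s'", strings.Join(excl, " "), names[0], tail)
	}
	return fmt.Sprintf("'(%s)'{%s}'%s'", strings.Join(excl, " "), strings.Join(names, ","), tail)
}

func main() {
	create := flagDef{Long: "create-table", Short: "c", Desc: "Create a new table."}
	update := flagDef{Long: "update-table", Short: "u", Desc: "Update an existing table."}
	format := flagDef{Long: "result-format", Desc: "Output format.",
		Choices: []string{"csv", "json", "tabular"}}

	fmt.Println(spec(create, []flagDef{create, update}))
	fmt.Println(spec(update, []flagDef{create, update}))
	fmt.Println(spec(format, []flagDef{format}))
}
```

The first line printed is '(-c --create-table -u --update-table)'{-c,--create-table}'[Create a new table.]', which tells zsh to stop offering any of the four names once one of them is on the line.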

Any of these would be a great starter project for someone wanting to begin contributing to Dolt.

Conclusion

Hopefully you found this article because you were trying to get zsh subcommand completion working for your own program, and it saved you several hours of your life.

Current Dolt user? Get command line completions for dolt by running dolt gen-zsh and following the installation instructions in the generated file. New to Dolt? Download it and try it out today. Questions or comments about the article? Join us on Discord to talk to the team.
