Using dolt_ignore to Prevent Accidents

May 3, 2023

4 min read

Dolt is a MySQL compatible database with Git-style version control. Think of it as Git for Data: Git versions files, and Dolt versions tables.

In its typical use case, users commit changes to the database, and in git-like fashion, those commits act as snapshots of the database that can be checked out and explored.

But sometimes users only want to version some of their tables. There are many reasons why this would be the case: Maybe the tables change a lot and storing every version of every table would inflate the size of the database. Maybe the tables are generated from other tables that are versioned. Maybe the tables simply don’t need to be versioned at all. In any case, you might want a simple way to make sure that you don’t commit these tables by accident.

In git, you would use a special file called .gitignore to list patterns for file paths that should never be committed. We took inspiration from .gitignore to make our own dolt-shaped solution: dolt_ignore.

Introducing dolt_ignore

If .gitignore is a file that determines what other files you can commit, then dolt_ignore is a table that determines what other tables you can commit! (Makes sense, right?)

The primary purpose of dolt_ignore is to prevent you from accidentally staging tables for commit, even if (especially if) you use a command like dolt add -A, which normally stages all modified tables.

dolt_ignore is a special kind of system table that we’re calling a configuration table. Like user tables, configuration tables can be written to and even versioned. But unlike a user table, dolt_ignore follows some special rules.

Firstly, it always exists; you never have to create it (In fact, you can’t create it, or drop it, change its columns, or invoke any other Data Definition Language commands on it.) Secondly, like the other system tables, it always has the same schema. In dolt_ignore’s case, the schema looks like this:

`dolt_ignore` (
  `pattern` TEXT NOT NULL,
  `ignored` BOOL NOT NULL,
  PRIMARY KEY (`pattern`)
)

Each row of dolt_ignore is a rule the helps dolt determine what tables it can stage. Each rule contains a pattern, which is a string that describes one or more table names, and each rule also contains a true-or-false flag that tells dolt whether or not the tables that match that pattern should be ignored (in this case, ignored means that it won’t be staged for versioning).

(For the complete definition of patterns, check out our documentation for dolt_ignore.)

So for example, if you have a script that computes auxiliary tables and you don’t want those table to get versioned, you could have all the generated tables start with generated_ and then add a rule:

> dolt sql -q “INSERT INTO dolt_ignore VALUES (‘generated_%’, true);”
> dolt select * from dolt_ignore
+---------------+---------+
| pattern       | ignored |
+---------------+---------+
| generated_%  	| 1       |
+---------------+---------+

Again, you don’t need to create the dolt_ignore table in order to use it. It already exists; you can just write to it.

Once you’ve written this rule to dolt_ignore, you no longer have to worry about accidentally staging any of the generated tables; dolt won’t let you, regardless of whether your commands specify these tables by name or by calling add -A. (If you’re absolutely sure that you want to stage that table anyway, you can override this with the --force flag: dolt add -A --force will stage all your modified tables regardless of the state of dolt_ignore.)

Now in this example we used a true flag to tell dolt that yes, we want dolt to ignore these tables. If we had set that flag to false instead, then the tables would have been staged normally. This flag may feel unnecessary, but it’s important for carving out exceptions to our rules: if a table name matches multiple patterns, only the most specific pattern will apply. This means that if you only want to version specific tables, you can use two patterns: a broad pattern that ignores most or all tables (with ignored = true), and a specific pattern for the tables you want to version (with ignored = false)

Let’s look at an example like that:

Suppose that we want most of the tables in the database to be unversioned, and we only want to version specific tables, which we give names that end in _versioned. Then we might write our dolt_ignore rules like so:

dolt sql -q “INSERT INTO dolt_ignore VALUES (‘%’, true), (‘%_versioned’, false)”
> dolt select * from dolt_ignore
+---------------+---------+
| pattern       | ignored |
+---------------+---------+
| %             | 1       |
| %_versioned   | 0       |
+---------------+---------+

Now, only the tables ending in _versioned will be staged, because those are the only tables that don’t match the % rule.

Parting Wisdom

There are a couple of little things to keep in mind when using dolt_ignore:

dolt_ignore can be staged and versioned just like user tables. Of course, if you don’t want to do this you can always just keep it in your working set. (And you can even add a rule to dolt_ignore that will match the dolt_ignore table itself!)
If you somehow create a table that matches multiple rules, but none of those rules are more specific than each other, then you’ll get in error when trying to stage the table. This shouldn’t happen unless you go out of your way to make it happen though.
Commands like dolt status and dolt diff also won't display ignored tables, so you don't need to worry about too many unstaged tables bloating up the responses of these commands. If you want to include ignored tables in dolt status, add the --ignored flag.

We hope that dolt_ignore will be another helpful tool in your Dolt toolkit, especially for users writing automated scripts or procedures that perform stages and commits. If you have any unanswered questions about this feature, or about Dolt in general (or if you just want to chat!) you can always reach out to us on Twitter, Discord, and GitHub!

Blog

Introducing dolt_ignore

Parting Wisdom

Get started with Dolt