Supporting AUTO_INCREMENT

November 6, 2020

3 min read

Dolt is a database built for collaboration and data distribution. It's "Git for Data," a SQL database you can branch, merge, diff, clone, fork, push and pull. We intend to become a fully MySQL compatible database. Today, we're announcing support for AUTO_INCREMENT columns in Dolt.

The Basics

For the unfamiliar, AUTO_INCREMENT is column property that allows you to generate sequential primary keys for your data, automatically. If your data doesn't have a natural unique identifier, the database can create one for you, removing this logic from the application layer.

dolt> create table auto_inc (
   ->     auto_pk int PRIMARY KEY AUTO_INCREMENT,
   ->     myStr text
   -> );
dolt> INSERT INTO auto_inc (myStr) VALUES ("abc"),("xyz"),("dupe"),("dupe");
dolt> SELECT * FROM auto_inc;
+---------+-------+
| auto_pk | myStr |
+---------+-------+
| 1       | abc   |
| 2       | xyz   |
| 3       | dupe  |
| 4       | dupe  |
+---------+-------+

Unadultered, auto increment will generate ascending, sequential keys. If inserted values skip over a range of keys, auto increment will insert after the latest keys so that key order always matches insert order:

dolt> INSERT INTO auto_inc VALUES (10,"asdf");
dolt> INSERT INTO auto_inc (myStr) VALUES ("aoeu");
dolt> SELECT * FROM auto_inc;
+---------+-------+
| auto_pk | myStr |
+---------+-------+
| 1       | abc   |
| 2       | xyz   |
| 3       | dupe  |
| 4       | dupe  |
| 10      | asdf  |
| 11      | aoeu  |
+---------+-------+

The Implementation

At face value, auto increment seems to generate the next value in the sequence using max(val) + 1. The initial implementation of auto increment in Dolt made this assumption and generated auto increment values using a sub query. However, a closer inspection of the MySQL documentation shows a more complex and stateful behavior. MySQL auto increment values can be "burned" either by deletion or through a rolled-back transaction. If the largest key value is deleted, we still don't want to reuse it:

dolt> DELETE FROM auto_inc WHERE auto_pk = 11;
dolt> INSERT INTO auto_inc (myStr) VALUES ("newest");
dolt> SELECT * FROM auto_inc;
+---------+--------+
| auto_pk | myStr  |
+---------+--------+
| 1       | abc    |
| 2       | xyz    |
| 3       | dupe   |
| 4       | dupe   |
| 10      | asdf   |
| 12      | newest |
+---------+--------+

Properly following MySQL's auto increment semantics meant creating some table-level state to track auto increment sequences. Dolt's SQL functionality is implemented using go-mysql-server an open-source SQL engine. go-mysql-server allows queries over an arbitrary persistence layer by providing a set of golang interfaces for the persistence layer to implement. The interface for supporting AUTO_INCREMENT is:

type AutoIncrementTable interface {
	Table
	// GetAutoIncrementValue gets the next AUTO_INCREMENT value.
	GetAutoIncrementValue(*Context) (interface{}, error)
	// AutoIncrementSetter returns an AutoIncrementSetter.
	AutoIncrementSetter(*Context) AutoIncrementSetter
}

// AutoIncrementSetter provides support for altering a table's AUTO_INCREMENT sequence.
type AutoIncrementSetter interface {
	// SetAutoIncrementValue sets a new AUTO_INCREMENT value.
	SetAutoIncrementValue(*Context, interface{}) error
	// Close finalizes the set operation, persisting the result.
	Close() error
}

The persistence layer is responsible for providing and maintaining the auto increment sequence. The query engine is responsible for inserting rows correctly given the next value in the sequence. go-mysql-server even supports mutating this table-level state via an ALTER TABLE query:

blog> ALTER TABLE auto_inc AUTO_INCREMENT = 20;
blog> INSERT INTO auto_inc (myStr) VALUES ("twenty");
blog> SELECT * FROM auto_inc;
+---------+--------+
| auto_pk | myStr  |
+---------+--------+
| 1       | abc    |
| 2       | xyz    |
| 3       | dupe   |
| 4       | dupe   |
| 10      | asdf   |
| 12      | newest |
| 20      | twenty |
+---------+--------+

The Tradeoffs

Choosing MySQL compatibility was a central decision in Dolt's product evolution. It was important to emulate a known quantity in order to ease adoption and flatten Dolt's learning curve. Mixing the functionality of MySQL and Git creates a tool that feels familiar the first time you use it. However, we found plenty of gotchas as we implement more of MySQL's functionality. In some more obscure cases we've decided to deviate from MySQL's behavior when it makes sense to. While it was important to create table state for correct auto increment semantics, it's less clear whether we should implement the LAST_INSERT_ID() function. This function has unintuitive and, at times, undefined semantics. From the MySQL documentation:

For a multiple-row insert, LAST_INSERT_ID() and mysql_insert_id() actually return the AUTO_INCREMENT key from the first of the inserted rows. This enables multiple-row inserts to be reproduced correctly on other servers in a replication setup.

You should be aware that, if you mix references to LAST_INSERT_ID() and LAST_INSERT_ID(expr), the effect is undefined.

MySQL is so widely used that even a poorly defined API such as this is relied upon in some contexts. Eventually we will replicate these idiosyncrasies. Dolt users can still inspect a table's auto increment state using the information_schema database:

dolt> SELECT table_name,`auto_increment` FROM information_schema.tables WHERE table_name = 'auto_inc';
+------------+----------------+
| table_name | auto_increment |
+------------+----------------+
| auto_inc   | 20             |
+------------+----------------+

Conclusion

As we build out Dolt's SQL functionality (and correctness) the product continues to become more powerful and more usable. Auto increment is just the latest example. We're committed to solving data collaboration and distribution. We believe Dolt is the tool to do it. Let us know what you'd like to see us build next!

Blog

The Basics

The Implementation

The Tradeoffs

Conclusion

Get started with Dolt