Correctness Update

SQL
4 min read

Dolt is a version-controlled SQL database that you can fork, clone, branch, merge, push, and pull just like a git repository. It turns out building a database is difficult to get right, but we're always improving. About a month ago, we announced that Dolt had reached 100% SQL Correctness.

Since then, we've been continuing to improve Dolt's correctness.

Improved Metrics

We've added a couple new metrics to our correctness documentation. Initially, we only displayed the SQLLogicTest results. Now, the page includes enginetest coverage and function coverage.

Engine Test Coverage

As users report issues and we implement new features, we accumulate a list of enginetests. Many of these tests cover complicated edge cases that are difficult to get right. As a result, some of these tests are skipped.

Initially, we had 166 tests skipped out of a total 41549 tests, resulting in a 99.60 percent pass rate. Now, we still have the same number of skipped tests, 166, but more enginetests in total, 42021, resulting in still a 99.60 percent pass rate.

Function Coverage

We already had a list of functions that we supported, which can be found here. Now, we have a section that summarize those results and shows the percentage of functions that we support.

Initially, we had 298 functions supported out of a total 438 functions, resulting in a 68 percent coverage. Now, we have 304 functions supported out of a total 438 functions, resulting in a 69 percent coverage.

JSON Functions

This pass, there's been a focus on improving our function support, specifically JSON functions.

Here are some notable JSON functions that we've added support for:

JSON_PRETTY

JSON_PRETTY() is a function that nicely formats a JSON document.

tmp/main> SET @j = '["abc", [{"k": "10"}, "def"], {"x":"abc"}, {"y":"bcd"}]';
tmp/main> select json_pretty(@j);
+-----------------+
| json_pretty(@j) |
+-----------------+
| [               |
|   "abc",        |
|   [             |
|     {           |
|       "k": "10" |
|     },          |
|     "def"       |
|   ],            |
|   {             |
|     "x": "abc"  |
|   },            |
|   {             |
|     "y": "bcd"  |
|   }             |
| ]               |
+-----------------+
1 row in set (0.00 sec)

JSON_SEARCH() is a utility function that searches a JSON document for a given value. This can be particularly useful when you're working with large JSON documents.

tmp/main> SET @j = '["abc", [{"k": "10"}, "def"], {"x":"abc"}, {"y":"bcd"}]';
tmp/main> SELECT JSON_SEARCH(@j, 'all', 'abc');
+-------------------------------+
| JSON_SEARCH(@j, 'all', 'abc') |
+-------------------------------+
| ["$[0]", "$[2].x"]            |
+-------------------------------+
1 row in set (0.00 sec)

These are handy functions to have, and we're excited to have them supported in Dolt.

MediaWiki Integration

Dolt strives to be a drop-in replacement for MySQL, which means it has to be compatible with many different tools. We've demonstrated Dolt's compatibility with a number of tools, such as ASP.NET, GORM, and (spoilers) MediaWiki.

During the process of integrating Dolt with MediaWiki, we discovered a few bugs. These mostly revolved around Character Sets / Collations, and FullText Indexes.

Character Sets / Collations

We had some parsing issues with character sets and collations, specifically BINARY. This was a relatively simple fix to our fork of the Vitess parser.

FullText Indexes

We had some issues with FullText Indexes. One issue involved the presence of large tokens; we were panicking when we encountered tokens larger than 84 characters. It turns out, MySQL just ignores large tokens as they are unlikely to be useful. Dolt now does the same.

An optimization issue was also discovered when rebuilding FullText Indexes, resulting in a performance improvement.

All these issues have been fixed, and we are now more MySQL compatible than ever.

AUTO_INCREMENT

A customer came in with a MySQL dump that was throwing a duplicate key error. After some investigation, we discovered that the dump was using SQL_MODE='NO_AUTO_VALUE_ON_ZERO'. Typically, when inserting into an auto-increment column, MySQL will generate a new value if the value is 0 or NULL. However, NO_AUTO_VALUE_ON_ZERO allows 0 to be inserted into an auto-increment column without generating a new value.

tmp/main> create table t (i int auto_increment, index (i));
tmp/main*> select @@sql_mode;
+---------------------------------------------------------------+
| @@sql_mode                                                    |
+---------------------------------------------------------------+
| NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES |
+---------------------------------------------------------------+
1 row in set (0.00 sec)

tmp/main*> insert into t values (0), (0);
Query OK, 2 rows affected (0.00 sec)
tmp/main*> select * from t;
+---+
| i |
+---+
| 2 |
| 1 |
+---+
2 rows in set (0.00 sec)

tmp/main*> set @@sql_mode='NO_AUTO_VALUE_ON_ZERO';
tmp/main*> insert into t values (0), (0);
Query OK, 2 rows affected (0.00 sec)
tmp/main*> select * from t;
+---+
| i |
+---+
| 2 |
| 1 |
| 0 |
| 0 |
+---+
4 rows in set (0.00 sec)

Consequently, this uncovered another issue in our table_options parsing. Namely, the AUTO_INCREMENT=... option was not displaying in SHOW CREATE TABLE statements and was not being respected when creating tables. Naturally, this led to some large refactors and various bug fixes (involving other DDL queries), but Dolt is better for it.

tmp/main> create table t (i int auto_increment primary key) auto_increment = 10;
tmp/main*> show create table t;
+-------+------------------------------------------------------------------------------------+
| Table | Create Table                                                                       |
+-------+------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (                                                                 |
|       |   `i` int NOT NULL AUTO_INCREMENT,                                                 |
|       |   PRIMARY KEY (`i`)                                                                |
|       | ) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+-------+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

tmp/main*> insert into t values ();
Query OK, 1 row affected (0.00 sec)
tmp/main*> select * from t;
+----+
| i  |
+----+
| 10 |
+----+
1 row in set (0.00 sec)

tmp/main*> show create table t;
+-------+------------------------------------------------------------------------------------+
| Table | Create Table                                                                       |
+-------+------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (                                                                 |
|       |   `i` int NOT NULL AUTO_INCREMENT,                                                 |
|       |   PRIMARY KEY (`i`)                                                                |
|       | ) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+-------+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Conclusion

There is still more work to be done after reaching 100% correctness. We are always improving Dolt's correctness and compatibility with MySQL. Have a function or an issue you want us to prioritize? Make an issue on our Github Issues page. Alternatively, we're always available on our Discord.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.