Dolt and Fuzz Testing

May 12, 2021

9 min read

Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features, moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning features you know and love from Git applied to a database, such as branching, diffs, merging, etc. As a result, it is imperative that we build a stable product, and in this blog post I'll briefly go over how we're striving to achieve that through fuzz testing.

The Importance of Testing

Testing is an important part of any program, and has many useful aspects. These aspects range from ensuring program correctness by validating output from a given input, to even defining the goal of an implementation before starting it (see test-driven development). In most of these cases, the tests and results are coded by the engineer. Although one can generally write a comprehensive suite of tests for their program, they are limited by the aspect of time, as any program involving user input essentially has an infinite number of testable cases. With such a vast number of testable cases, it can be easy to miss an important test case, and therefore end up missing a bug that a user eventually finds.

As an example, let's pretend that we've written a calculator similar to an old handheld calculator. We only allow numbers between 0 and 1000 as input, and we disallow any input that will push the current result outside the aforementioned bounds. We write tests starting from the initial result of zero, and cover addition, subtraction, multiplication, and division. We use the powers of 10 as our test input (1, 10, 100, 1000, etc.) along with a collection of random numbers of our choosing. All seems well, and we release the product to customers. A week later, we get a bug report from a user who was able to reach a result of 2000, and it turns out they used addition twice in succession without typing a number. This was a case that we never considered, and although we can now fix it, due to the near infinite number of cases out there, how many more may exist? This is where fuzz testing comes in.

What is Fuzz Testing?

Fuzz testing, also known as fuzzing (using fuzzers), consists of generating a lot of random input, hoping to find bugs, unexpected behavior, or outright crashes. In the previous example, it is very likely that a fuzzer would have been able to catch the double addition bug. Programmers have a general (unintentional) tendency to test against the expected workflow of their program, while random data or inputs have no such tendency. This allows one to trade time for an increasing assurance of stability and correctness, as (depending on the implementation) the fuzzer may be able to run indefinitely, throwing all sorts of random data at the program. In addition, any issues found through the fuzzer may then become explicit tests within the test suite to prevent regressions from future changes.

Program Complexity

For Dolt, fuzzing is very important. We are building toward being a complete drop-in replacement for MySQL, and that means it is critical that we have the reliability that people expect from MySQL itself. MySQL, however, is a product that is over 25 years old, and is very complex regarding the interactions of every statement, function, variable, etc. Dolt inherits that very same complexity by design. In addition, Dolt has Git-style versioning. So for every possible MySQL database, multiply that complexity by the behavior of branching, diffing, merging, committing, etc. This gives us an absolutely enormous surface area of functionality to test, and although we do our best to test everything as thoroughly as possible, it is strictly impossible to truly test everything. Our fuzzer helps us with this, by testing many cases that we do not have the time to test, and also by being able to work nonstop.

How We Fuzz

Dolt's fuzzer is a separate program that takes in a vast number of parameters—from which SQL types are used and their frequency, to whether each table in each branch adheres to a minimum number of rows. All of these parameters are presented as a range in a config file. Although it's not strictly necessary to allow so much configuration, it allows us to tailor specific fuzzer instances to favor (or strictly use) certain code paths. For example, one instance may only use the command line interface, while another uses only the server interface, while yet another uses a combination of both. With this config file, the fuzzer runs in "cycles", where one cycle represents one generated and tested repository.

At the beginning of a cycle, we take a random value for every parameter according to the config file, which then becomes the blueprint of that cycle (how many tables will be created, how many branches made, presence of foreign keys and how many, etc.). From there, the blueprint is followed until a complete repository has been generated. For now, the fuzzer only tests valid (but random) data and statements (according to the blueprint), so any failures are bugs. If no further commands were given to the fuzzer, then the cycle was a success, and a new cycle begins. If not, the exact error is logged, all further work on the cycle is terminated, and a new cycle begins.

In addition to general repository generation, you can issue different commands to the fuzzer to further test functionality, such as merging two (or more) branches. In this case, the command may influence the repository generation, so the merge command would alter the blueprint so that the branch count is no less than two. Just as before, the merge command would test that the merged data is correct, reporting an issue if one was found.

For now, all statements and all generated data is valid. To ensure this validity, only simple SQL commands are being used for data input (no INSERT ... SELECT, common table expressions, etc.). In the future, we're going to expand the fuzzer to make use of as many statements as possible, while also allowing invalid statements to test that our errors are valid (such as verifying that a duplicate primary key error did not modify any data). As well, we will be adding more commands that specifically test our diff functionality, garbage collection, and even directly comparing output to MySQL.

Run Example

As an example, here is a log file of a small run. This is just the repository generation, with no other commands present. Single table, no branches, limited column count, limited row count, and longer types disabled (otherwise the file would be too large).

INFO: Cycle started: 2021-05-12 12:00:00
CLI:  dolt init
SQLS: CREATE TABLE `oZkCDzDiZa` (`0t11Ww` TINYINT UNSIGNED, `gU9rl9` DATE, `O6R2gZ` YEAR, `vpI6uD` VARCHAR(48) COLLATE utf8mb4_0900_ai_ci, `_Avasi` TIME, `AewVgw` MEDIUMINT, `3pxWLg` SET('-Y2wIK','E6kk3=J%-r#','hQ7ehC5u'), PRIMARY KEY (`0t11Ww`));
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (104,'2053-11-21',2069,'v:|dBhU*9KIO@%9dHZR#Jj4#5TmKvFL+bh8U#:bRh*n S','-519:26:58',-5056129,4);
SQLS: UPDATE `oZkCDzDiZa` SET `gU9rl9` = '4194-03-30',`O6R2gZ` = 2136,`vpI6uD` = '7sANaO:LD-k1v51F*YLlMKg3Tn2' WHERE `0t11Ww` = 104;
SQLS: INSERT INTO `oZkCDzDiZa` VALUES (19,'2813-08-22',1948,'RPqY+0$7yd|1#iD05Yz2PNw#v!~ 0SY2Pf89~h*','377:54:43',2775885,6);
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (76,'9810-02-22',2114,'*e','-684:53:59',5215986,7);
SQLS: UPDATE `oZkCDzDiZa` SET `gU9rl9` = '7221-04-02',`O6R2gZ` = 2073,`vpI6uD` = '|x!bT' WHERE `0t11Ww` = 104;
SQLS: INSERT INTO `oZkCDzDiZa` VALUES (149,'3448-08-20',2083,'','84:12:30',-7152662,2);
SQLS: DELETE FROM `oZkCDzDiZa` WHERE `0t11Ww` = 19;
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (201,'9349-03-13',2106,'im#Q0!Z02Td4Sb8Llx%Y!','-39:35:26',-5791957,2);
SQLS: DELETE FROM `oZkCDzDiZa` WHERE `0t11Ww` = 104;
SQLS: UPDATE `oZkCDzDiZa` SET `gU9rl9` = '2357-04-08' WHERE `0t11Ww` = 19;
SQLS: INSERT INTO `oZkCDzDiZa` VALUES (116,'9164-07-08',2000,'0%#%A#L-6N:Pm','264:29:39',-2692378,3);
SQLS: INSERT INTO `oZkCDzDiZa` VALUES (56,'1139-03-14',2150,'K#3!KR4Tk=cl+7!4x!qP4xo','31:37:14',4608127,2);
SQLS: UPDATE `oZkCDzDiZa` SET `gU9rl9` = '2050-11-06',`O6R2gZ` = 1929,`vpI6uD` = 'l#0a_1:86m3DWqiYy4FqN07QU0fcQfYTSqYpC:a69i',`_Avasi` = '-331:40:12',`AewVgw` = 4368147 WHERE `0t11Ww` = 76;
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (117,'6686-05-27',2035,'Ik!yk0dX1j:','-165:09:17',412670,2);
SQLS: DELETE FROM `oZkCDzDiZa` WHERE `0t11Ww` = 19;
SQLS: UPDATE `oZkCDzDiZa` SET `gU9rl9` = '2892-05-11',`O6R2gZ` = 1926,`vpI6uD` = '2!40gN7.|K+RT@_G=JO*=WhRo5AMW' WHERE `0t11Ww` = 149;
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (239,'8246-08-16',2145,'65Z7:%@Hv:u!J%ES%2iA~08s-cv_0.y#!','124:51:26',-4864970,5);
SQLS: INSERT INTO `oZkCDzDiZa` VALUES (36,'8210-12-07',2056,'XG2T..P|9#PjlhK-*2I$BRGG5KK','821:58:00',731367,4);
SQLS: DELETE FROM `oZkCDzDiZa` WHERE `0t11Ww` = 117;
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (39,'8084-07-09',2133,'LA~JJgSc','758:38:07',2850861,6);
SQLS: INSERT INTO `oZkCDzDiZa` VALUES (204,'5049-12-01',1979,'MWCoQUgToP^d_rH','699:32:31',-192755,3);
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (223,'2506-01-20',2055,'n3_e$|3QamUsErm=','86:55:08',-2111156,7);
SQLS: DELETE FROM `oZkCDzDiZa` WHERE `0t11Ww` = 149;
SQLS: DELETE FROM `oZkCDzDiZa` WHERE `0t11Ww` = 36;
SQLS: UPDATE `oZkCDzDiZa` SET `gU9rl9` = '3571-01-15' WHERE `0t11Ww` = 149;
SQLS: INSERT INTO `oZkCDzDiZa` VALUES (113,'1847-02-18',2084,'X@ n6QjZ15F:bO#qfdqB','-735:43:28',-1509,2);
SQLS: REPLACE INTO `oZkCDzDiZa` VALUES (194,'3262-07-12',2124,'F.I!#qYF5nNAGGe~v=x@i%Mz2@H=012cLG','-27:17:36',5447393,3);
SQLS: UPDATE `oZkCDzDiZa` SET `gU9rl9` = '7368-06-04',`O6R2gZ` = 2011,`vpI6uD` = 'dD*N6xF=aa',`_Avasi` = '-238:04:31' WHERE `0t11Ww` = 76;
SQLS: DELETE FROM `oZkCDzDiZa` WHERE `0t11Ww` = 149;
CLI:  dolt gc
INFO: Cycle finished successfully: 2021-05-12 12:00:00

Bugs Found So Far

Even in its incomplete state, the fuzzer has already caught some very obscure bugs. I'll cover two notable ones.

The first was a bug when making use of the VARBINARY type. Internally, Dolt stores VARBINARY data as a string, and the length of the string is a prefix to the data for the string. Specific to the VARBINARY type, the length was being read incorrectly, however the resulting value was somehow still correct for every test that had been written, and even for users who had VARBINARY columns in their own data. The fuzzer ended up finding a string that caused the length reading to return an incorrect result, which we could track down and fix.

The second was a bug when using a BLOB as a type in the primary key. For some inputs, an internal writer would not be passed as a parameter, which the BLOB type uniquely needs (no other SQL type so far uses this writer). Whether it was passed or not depended on the code path taken, which resulted from the input, as changing the string by even a few characters would cause the issue to not manifest. Similar to before, it took a specific string to find a bug, which is very hard to manually test for.

Conclusion

Fuzzing in Dolt has already found bugs that we've been able to fix, making the product much more stable. After each fix, it takes an increasingly longer amount of time before the next bug is found, directly showing its value. Over the coming months, we will continue to add more testing features to the fuzzer, making Dolt an overall better product. You can stay up-to-date on our progress by following our releases, and you can directly interact with us by joining our Discord server. We hope you'll join us for the ride!

Blog