Dolt is a SQL database with Git-style versioning.
With each new version of Dolt, we increase the number of supported SQL features.
We're moving toward our goal of being a complete drop-in replacement for MySQL, while adding the versioning features you know and love from Git, such as branching, diffs, and merging, applied to a database.
This creates a very large surface area that we have to cover, and handwritten tests can only cover so many possibilities.
As a result, we've added continuous fuzz testing to Dolt, and in this blog post I'll go over a bit of our infrastructure on how it's set up.
What is Fuzz Testing?
I've previously written about our fuzzer in a blog post, but to sum it up: fuzz testing, also known as fuzzing (using fuzzers), consists of generating a lot of random input, hoping to find bugs, unexpected behavior, or outright crashes.
At the time of this post, our fuzzer attempts to create a valid Dolt repository full of random data (with a random number of branches, commits, tables, columns, etc.), validate that the repository contains the expected data, and perform a merge (which, again, must produce the expected result).
By focusing on generating valid repositories, we can easily verify the expected behavior, as any error is a failure.
In the future, we'll expand the fuzzer to cover many more operations (proper behavior for errors, diff output that properly correlates to merges, etc.). Even so, our first iteration, which focuses only on merges, has produced interesting results, which I'll detail later.
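To make the "any error is a failure" idea concrete, here is a minimal sketch in Python. It is not the fuzzer's code: it fuzzes a toy key-value store rather than a Dolt repository, and `fuzz_cycle`, `make_store`, and the value ranges are all assumptions made for illustration.

```python
import random


def fuzz_cycle(seed, make_store, rows=100):
    """Write random data into a store, then check that it reads back unchanged.

    Returns None on success, or a description of the first failure.
    Only valid operations are issued, so any exception counts as a bug.
    """
    rng = random.Random(seed)   # seeded, so a failing run can be replayed
    expected = {}               # independent record of what was written
    try:
        store = make_store()
        for _ in range(rows):
            key = rng.randrange(1000)
            value = rng.randrange(-10**6, 10**6)
            store[key] = value
            expected[key] = value
        # Validation pass: the store must contain exactly what we wrote.
        for key, value in expected.items():
            if store[key] != value:
                return f"seed {seed}: key {key} holds {store[key]!r}, expected {value!r}"
    except Exception as exc:
        return f"seed {seed}: {type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # A plain dict behaves correctly, so every seed should pass.
    failures = [r for r in (fuzz_cycle(s, dict) for s in range(20)) if r]
    print(f"{len(failures)} failures")
```

Because only valid input is generated, the checker never has to decide whether an error was expected, which is what keeps the validation step this small.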
The Long-Running Job
Using Kubernetes, we have a job that oversees our continuous fuzzing.
Every 15 minutes, we build the fuzzer from the tip of its GitHub repository. We then read the hash of the Dolt version we want to test from a file in S3 and build Dolt at that commit.
With our tools in place, we read a configuration file, also stored in S3, that details all of the parameters for the run.
Keeping the configuration in S3 lets us change those parameters without touching the job itself.
From here, we run the fuzzer and generate some number of repositories (10 at the time of this post) per run, deleting all repositories that did not report any errors.
Afterward, we do a directory scan for any repositories that still exist, upload them to S3, and generate a GitHub issue with a link to the uploaded files, along with the error that was generated.
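The scan-and-report step could look roughly like the sketch below. It is written in Python purely for illustration. The `upload` and `open_issue` callables stand in for the S3 upload and the GitHub issue creation, and the `error.txt` file name is hypothetical; none of these come from the actual job.

```python
from pathlib import Path
from typing import Callable, List


def surviving_repositories(work_dir: Path) -> List[Path]:
    """Return every repository directory still present under work_dir.

    Clean repositories were deleted earlier, so anything left has an error.
    """
    return sorted(p for p in work_dir.iterdir() if p.is_dir())


def report_failures(
    work_dir: Path,
    upload: Callable[[Path], str],           # hypothetical: uploads a repo, returns a link
    open_issue: Callable[[str, str], None],  # hypothetical: files an issue (link, error text)
) -> int:
    """Upload each surviving repository and open one issue per repository."""
    repos = surviving_repositories(work_dir)
    for repo in repos:
        link = upload(repo)
        error_file = repo / "error.txt"      # hypothetical location of the recorded error
        error = error_file.read_text() if error_file.exists() else "unknown error"
        open_issue(link, error)
    return len(repos)
```

Passing the upload and issue steps in as functions keeps the scan itself easy to test without network access.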
The job that creates a new fuzzer run every 15 minutes does not wait for the previous job to end, and thus we can have multiple fuzzer instances each generating their own repositories.
At the time of this post, we are generating about 960 repositories per day.
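The daily figure follows directly from the schedule. As a quick check, assuming every scheduled run completes and produces its full 10 repositories:

```python
# Back-of-the-envelope throughput for the long-running job.
RUN_INTERVAL_MINUTES = 15   # a new fuzzer run starts every 15 minutes
REPOS_PER_RUN = 10          # repositories generated per run at the time of the post

runs_per_day = (24 * 60) // RUN_INTERVAL_MINUTES   # 96 runs
repos_per_day = runs_per_day * REPOS_PER_RUN       # 960 repositories

print(runs_per_day, repos_per_day)
```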
The Gate Keeper
As mentioned in the previous section, we do not build Dolt from the tip of its GitHub repository on every run.
Instead, whenever we push to our primary branch, we have a GitHub action that runs on that commit, and generates just 5 repositories.
We determined that the bugs the fuzzer normally finds only surface after hundreds or thousands of runs. Finding an error within 5 runs therefore means that something is horribly broken, and it would cause the fuzzer to fill our issues page with error after error.
To prevent this, the job triggered by the GitHub action updates the hash stored in S3 only after all 5 repositories have been generated and validated without error.
And, just like with our long-running job, we upload any errors that we encounter to our GitHub issues.
By taking this approach, we catch bugs soon after we introduce them, before our users see them in production.
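The gating rule itself is small. Here is a minimal sketch in Python, assuming a hypothetical `fuzz_once` callable that runs the fuzzer once against a commit and returns an error description or `None`. The function and parameter names are illustrative and are not taken from the real job.

```python
from typing import Callable, List, Optional, Tuple

GATE_RUNS = 5  # repositories generated per push, as described above


def gate_keeper(
    candidate_hash: str,
    known_good_hash: str,
    fuzz_once: Callable[[str], Optional[str]],  # hypothetical: one fuzzer run
    runs: int = GATE_RUNS,
) -> Tuple[str, List[str]]:
    """Decide which Dolt commit the long-running job should build next.

    Returns (hash to store, errors to report). The stored hash advances to
    the candidate only if every run finished without an error.
    """
    errors = []
    for _ in range(runs):
        error = fuzz_once(candidate_hash)
        if error is not None:
            errors.append(error)   # every error still gets reported
    next_hash = candidate_hash if not errors else known_good_hash
    return next_hash, errors
```

A push that fails the gate leaves the stored hash untouched, so the long-running job keeps testing the last commit that passed.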
Everything described so far is how our job works now, but it wasn't quite as simple when we first started out.
While we have found a few valid issues, anyone visiting our issues page will notice several hundred closed issues.
This was primarily caused by three problems.
The first was that our gate keeper failed to run.
We tested it on a separate branch to verify its behavior; however, we didn't properly update it for our primary branch.
As a result, pushes to our primary branch never triggered the GitHub action, so the hash in S3 never advanced and bugs that had already been fixed kept being reported.
The second was that there were bugs in the fuzzer itself.
Although it is meant to find bugs in Dolt, it is also a piece of software, and thus subject to bugs of its own.
Determining whether a bug is due to Dolt or the fuzzer is fairly easy for me (the primary developer on the fuzzer). As anyone who has had to debug an unfamiliar application can attest, though, it is far more difficult for everyone else, especially when it is unclear whether the bug lies in your own product or in the tool testing it.
This prompted an addition to the fuzzer that dumps all of its internal data for certain errors, along with a handy script that can create a Dolt repository of that data for easy consumption.
This makes it far easier for all Dolt developers to sift through issues and uncover what needs to be fixed.
Having to write that dumping logic took time, which brings us to the third issue: priority.
Knowing that the fuzzer has caught a bug does not mean that the bug is immediately fixed.
Just like all other bugs in any product, it is placed on a priority list.
This is an established practice, but it has the side effect that the fuzzer creates a new issue every time it encounters the bug.
We are still investigating the best way to handle this: multiple reports of a bug can help when debugging, but too many reports are essentially noise.
Here are a few issues that the fuzzer has found since it has been continuously running.
- https://github.com/dolthub/fuzzer/issues/39 => We made a few changes to how garbage collection works while running commands in Dolt. We mistakenly assumed that a function would only be called from a single thread; however, there were situations where multiple threads could call it, causing concurrency issues. It was easily fixed by adding a lock.
- https://github.com/dolthub/fuzzer/issues/92 => The fuzzer didn't handle DECIMAL types correctly. Internally, DECIMAL values are tracked by their string representation, which seemed to work in the majority of instances. However, strings also have a length component, and this was not being taken into account, causing smaller values to sort after larger values. This was fixed by the fuzzer using fixed-width strings for DECIMAL internally. This same bug appeared in a few other types as well.
- https://github.com/dolthub/fuzzer/issues/338 => This error is still under active investigation, but it appears that our underlying blob type causes issues with specific statements once the repository gets large enough. We've made several recent changes to our blob type, so it's likely that one of those changes is responsible. This also shows a need to improve some error messages, as the current one is misleading: the database is not actually corrupted.
- https://github.com/dolthub/fuzzer/issues/368 => Found the day of this post by the gate keeper. A user reported an issue regarding case sensitivity for some table names, and a fix was pushed that passed all of our tests. Immediately, the gate keeper reported that all 5 repositories failed on table-name errors. This also shows just how useful a fuzzer is, as our tens of thousands of tests did not catch something that was caught immediately through randomized input.
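The DECIMAL problem from issue 92 belongs to a general class of bug that is easy to reproduce. The sketch below is in Python and is not the fuzzer's actual code. It only shows why comparing variable-length decimal strings gives the wrong order and how a fixed-width representation restores it. The `fixed_width` helper and its digit counts are assumptions for illustration, and it handles non-negative values only.

```python
from decimal import Decimal


def fixed_width(value: str, int_digits: int = 10, frac_digits: int = 5) -> str:
    """Zero-pad a non-negative decimal string to a fixed width.

    With every value the same length, string order matches numeric order.
    """
    d = Decimal(value)
    if d < 0:
        raise ValueError("this sketch handles non-negative values only")
    text = f"{d:.{frac_digits}f}"                      # e.g. "2.50000"
    return text.rjust(int_digits + 1 + frac_digits, "0")


values = ["9", "10", "2.5"]

# Plain string comparison goes character by character, so "10" sorts first.
naive = sorted(values)                   # ['10', '2.5', '9']

# Sorting by the fixed-width form gives the numeric order.
fixed = sorted(values, key=fixed_width)  # ['2.5', '9', '10']

print(naive, fixed)
```

Negative values and values wider than the chosen digit counts would need extra handling that this sketch leaves out.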
Dolt's fuzzer now generates almost a thousand repositories every day, which is a huge step toward our promise of stability.
As the fuzzer is able to test more features, we'll increase the number of repositories that are generated, thereby increasing the confidence factor in our long-term stability and viability as a fully-fledged SQL database.
When you include all of our versioning features, we believe that Dolt will be a serious contender in the database market, offering the characteristics that one expects from a database, while possessing features that no other database has.
You can stay up-to-date on our progress by following our releases, and you can directly interact with us by joining our Discord server.
We're very attentive to feedback, and you can help us grow Dolt to work perfectly with your specific use case.
We hope you'll join us for the ride!