The Foreign Key Eviction

SQL
5 min read

When building a database, there are many decisions to be made regarding its implementation. Our database is Dolt, which is a SQL database with Git-style versioning. We're modeling our SQL implementation on MySQL, so much so that we're a drop-in replacement for MySQL. Our challenges and overall goal are different than MySQL though, as we believe that versioning capabilities should not be restricted to source code, as it opens up a world of possibilities when applied to data. With that goal in mind, each decision we make on how to implement a feature is considered from the aspect of versioning support. And, well, sometimes the right decision at the time is the wrong decision for the future.

The Original Implementation

When Dolt started out, it didn't support SQL at all. There were CLI commands to modify data on tables, but complex relationships between data wasn't possible at that time. The solution to this was to add SQL support, and we adopted the go-mysql-server (GMS) project from source{d}. As it was still a WIP, there were many core SQL features that had yet to be added, so we began adding them in. One of these features was foreign keys, considered by some to be one of the pillars in relational databases. In July 2020, we landed foreign key support in Dolt, however if one were to look at the source code for GMS then they'd notice a lack of real support for foreign keys. In fact, GMS only handled their parsing and display, with all of the validation logic occurring in Dolt! Why?

Well, the simple answer: data imports. Early on, the primary method of getting data into Dolt would be to have a large CSV, and use the import command to load all of that data into Dolt. import was blazingly fast compared to a SQL script with a bunch of INSERT statements (over 20 times faster in some of our old benchmarks), as it had a custom path specifically written to process rows as fast as possible. This custom path, of course, completely skipped the SQL engine, and thus foreign keys were implemented in Dolt itself.

Our Current Needs

In the 2 years since the initial addition of foreign keys, the differentiation between GMS and Dolt has become a lot clearer. Current Dolt is primarily focused on optimizing disk operations, with GMS processing queries and transforming them into a small set of commands that Dolt understands. This allows both projects to work to their strengths, similar to how an API provider and an API user work together to accomplish a task, with the API acting as the intermediary. In fact, many CLI commands have just become shortcuts for SQL queries! This distinction gives rise to many exciting possibilities for future Dolt expansions (such as having drop-in support for PostgreSQL), but for any of these ideas to ever become a reality, we truly need to have a separation between the SQL engine and Dolt. Our foreign key flavor is 100% aligned with MySQL, and thus it had to move to its rightful home.

For Dolt, the move was fairly dirty, as foreign keys had rooted themselves deeply in the codebase over the years. The three pillars of data manipulation—INSERT, UPDATE, and DELETE—each had to handle table tracking, recursive key cascades, row referencing, and much more. There was even an entire concept called session tables that were introduced in support of foreign keys, so that any changes made to one table were able to be picked up by a foreign key on any other table. For GMS, this concept of session tables was completely unnecessary, as it had all of the information needed to load any table that could potentially be touched by a foreign key. In fact, besides the data manipulation statements, all GMS needed was a way to check if any given row exists in a table. That's all, just four commands—INSERT, UPDATE, DELETE, and GET ROWS—are all that are needed to handle even the most complex foreign key operations. Even better, a form of GET ROWS already existed in Dolt due to the previous foreign key implementation, so it was reworked to be operable from GMS. A few changes later and foreign keys now reside in their rightful home, the SQL engine.

Additional Benefits

GMS was originally an open-source framework to provide a MySQL engine that can be embedded into any application. We continue this goal today, and use Dolt as the catalyst for a lot of our GMS development. By bringing proper foreign key support to GMS, this allows all other integrators of the project to use the built-in foreign key support, as we no longer require them to implement it on their end. This is a win not only for us, but for everyone!

Conclusion

We're hard at work rewriting how Dolt stores and handles data on the disk, and with such a massive rewrite, we figured it was time to move foreign keys to their rightful home in the SQL engine. Otherwise, we'd have to completely reimplement foreign keys on the new storage format as well! This goes to show that the considerations that may be made years prior may no longer apply as time goes on. SQL queries no longer have the overhead that they used to, and new statements such as LOAD DATA allow for importing through the SQL engine, naturally allowing foreign key support.

I hope that you'll continue to join us as we improve Dolt! We've got a lot of exciting changes coming in the near future. You can keep up to date with us through Twitter, or you can chat directly with us through Discord. You can even check out our ever-growing list of open source repositories at DoltHub!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.