Dolt for MySQL Database Versioning

REFERENCE
5 min read

Recently, I came across this Reddit post going over how they manage database versioning. I was particularly interested in this response

Reddit Response

To be fair it is really strong process for tracking and recording database changes. The person uses Git for versioning schema changes and Debezium for tracking data changes. Data changes are streamed to their data warehouse. Additionally, the person uses backups and archived binlogs to restore the database to a previous state if need be.

There are a few downsides to this approach as noted by the author. Large temporal moves are not possible due to the large amount of change in the database. The cost of storing backups past 7 days would be expensive. Finally, disaster recovery only allows you to restore to previous state and is not a fool proof way to get your application back quickly.

As we are in the version controlled database business, this begs the question, would it be better with Dolt? This blog will respond to each section of this post in detail.

First, What is Dolt?

A quick Dolt intro for those of you unfamiliar with it. Dolt is the first and only SQL database that you can fork, clone, branch, merge, push and pull just like a Git repository. Dolt is a version controlled database. Dolt is Git for Data. Think Git and MySQL had a baby.

Dolt implements the Git command line and associated operations on table rows instead of files. Data and schema are modified in the working set using SQL. When you want to permanently store a version of the working set, you make a commit. In SQL, Dolt implements Git read operations (ie. diff, log) as system tables and write operations (ie. commit, merge) as stored procedures. Dolt produces cell-wise diffs and merges, making data debugging between versions tractable. Dolt is the only SQL database that has branches and merges.

You can run Dolt online, like you would PostgreSQL or MySQL. Or you can run Dolt offline, treating data and schema like source code. Connect to Dolt just like any MySQL database to run queries. Use Dolt system tables, functions, or stored procedures to access version control information and features.

Check out our docs to learn more.

Back to the Question. Can Dolt make this process better?

Let's address each piece of the version control process in this response, explain how Dolt does it, and examine if the process is cleaner and easier with Dolt as your production database.

Schema Changes in Git

Where I work, we dump the schema of the database server every night and check it into a local git. We then run a git diff to see schema changes, and connect these with change tickets requesting the schema change for compliance.

Dolt really shines here as your database. Dolt versions data and schema at each commit which enables you to produce diffs of schema and data changes for review. This is the exact functionality this user is getting from Git but it's all native to the database and you get data diffs for review as well. Any rollback of schema or data changes will now instantaneous as all previous copies exist in the Dolt database.

Dolt provides an added bonus too. Dolt allows for branches of the the database so you can test any changes to schema or data using production data before making them live on production.

Data Changes streamed using Debezium

Data changes are logged by the database automatically, in the binlog. The binlog of MySQL is used for replication, but can also be consumed by other applications such as Debezium to put change records onto a Kafka bus, which in our case is then archived in Hadoop, again mostly for compliance.

Dolt tracks data changes using Git-style version control. Each change to the data is recorded as a commit, which contains a reference to the parent commit, a timestamp, and the user who made the change. The differences between each version of the data are stored as a set of deltas, or "diffs".

Dolt tracks the ancestry of each commit and is able to efficiently store the history of the data and provide a way to revert to previous versions, and compare changes over time. Dolt natively supports an audit log of every cell. Dolt is perfect for use cases requiring audit as there are no additional processes which require controls. Dolt is an immutable record of what changed, when, and why.

Nightly Backups

We also create backups of the database (nightly), and archive binlogs, which allows us to go back and forward in time at will.

Dolt makes nightly backups obsolete. If you want a copy of your data on another host, just push to a remote or use the Dolt backup commands.

Since Dolt is version controlled database, time-travel is built-in already. The magic of Dolt is the ability to look at your data and your database schema and be able to answer exactly where the data came from, when it changed, how it changed, who changed it, and hopefully a good commit message will also tell you why it changed.

Known Issues

Large temporal moves (i.e. further back than, say, 7 days) are usually not possible because the amount of change in our databases makes it impossible to keep change records (binlogs) for too long. The amount of storage for them (and the time to apply them) would be too costly.

Dolt doesn't save the whole database for each new version that you backup or commit. Dolt effectively saves the diff between the new version and old version at each commit. This allows for more efficient storage and fast retrieval from previous versions. Dolt can store orders of magnitude more history than backups and binlogs using the same storage footprint.

What's the Catch?

There are a few downsides to this:

  • You have to switch your production database to Dolt
  • Dolt is currently slower than MySQL, so you will need a bigger machine. This is not noticeable at the application layer though.
  • Dolt is not 1.0

Dolt Features without the Risk

Most of the features above are available by migrating your backup process to Dolt. You have to two options there:

  • binlog replication is coming soon and in preview. Email me if you want to try.
  • Create a dump file to load the data into Dolt. Dolt is MySQL compatible so this an easy process once you have the file.
    • mysqldump > dolt sql && dolt commit -am "This is a backup"

Conclusion

In our opinion, Dolt is the better choice to version control your MySQL database. Migrating your replication or backup process to Dolt is the best way to go about this today.

If the user above started running a regular job to store their backups in Dolt, their backup processes become much simpler, they've gained a better UI for diff'ing the entire database history in one application, and the storage cost is no longer a factor for saving backups. We are really excited about this use case for Dolt especially once we have binlog replication. This will be great way to get started with Dolt without making big changes to your environment.

Want to test this out? Send me email and drop by our Discord and we will get you setup.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.