So you want Database Version Control?
Database Version Control is a poorly ranked Google search. For me, it starts with a horizontally scroll-able section on code version control: irrelevant.
Then, next up is ads for Redgate, Liquibase, and verta.ai. The rest of the non-sponsored results are a mixture of Liquibase, Redgate, and some not very useful articles about why you need to version control your database. Liquibase and Redgate offer schema migration tools, version control for your database schema. Google maps this search term to a specific aspect of database version control. In 2021, this query response to "database version control" is both too specific and antiquated.
We have new technology! We now have true database version control, versioning both schema and data. This blog surveys the space and attempts to answer the query more aptly for the state of technology in 2021.
What do you mean by Database Version Control?
What do you mean by database? Do you mean relational databases? If relational, do you care about schema or just data (or vice versa)? For most of this blog article, we're going to assume you mean a SQL database though towards the end, a graph database will make an appearance.
What do you want to version control? Do you care about rollback? Do you want time travel? Lineage, who changed what and when? Diff, do you want to be able to quickly calculate the difference between versions of the schema or data? Branch/merge, multiple editors making concurrent long running changes? What do you want to do when a conflict happens? What's your conflict detection and resolution strategy?
The term of art for version controlling database schema is database migrations. You start with a core schema, check that into version control, and then any schema changes you do, you apply as a patch to that schema, also checking that into version control.
At DoltHub, we run a PostgreSQL database backing the DoltHub website and we check the schema and schema patches into Git. As part of our development environment set up and production deployment process we reference this schema and patches. This is common practice.
There are a bunch of tools that help with various pieces of this process if you don't want to roll your own. For this article, we focused on some of the most popular and interesting tools, Liquibase, Redgate Deploy, and Planetscale. We dive a bit deeper into what they do so you can get a clearer picture of the space.
If you want your application to be aware of data versioning, you can create some semblance of version control with schema. The term of art for this is Slowly Changing Dimension. This is not true version control. You concoct a schema using columns that indicate inactive or active or "active from" to "active to" dates. Version number or timestamp columns on rows can also be used. Then, your application uses these columns to construct present and historical views of the data. Your application never deletes, just changes state from active to inactive. I don't think anyone would consider this true version control as the application or an operator can still make a mistake and do really bad things. But, a number of databases are designed this way to give version control like features to their applications.
If you just want rollback, backups and transaction logs are supported by every database solution. Maybe you take a nightly backup and you keep transaction logs since the last backup. A mistake that requires using these backups is usually corrected in hours at best but most likely days. This AWS Elastic Load Balancer outage is case in point. True version control means rollback on the order of seconds.
We have SQL "as of" syntax. This is in the SQL standard as of 2011 (pun intended) and supported by Oracle, Microsoft SQL Server, and MariaDB. With this syntax you can configure a table to be versioned or a "temporal table" and then query a version using
as of <timestamp>. You don't get diffs and merges but you do get instant rollbacks.
We now have the technology to fully version (ie. lineage, branch, merge, diff) large data! Noms laid the groundwork with a core storage format called a prolly tree, a content addressed binary tree. This data structure allows for fast diff and merge on binary trees, the core data structure in most databases, without compromising too much read or write performance. Recently, a couple new databases have emerged using this or similar technology, TerminusDB for graph or document databases and Dolt for relational databases.
We're biased but we think database version control means full Git-style versioning, branch, merge, and diff, of schema and data.
With that introduction, let's consider our options. If we missed you, please email me at firstname.lastname@example.org. We broke the tools down into two categories: Database Migration Tools and Version Controlled Databases.
Database Migration Tools
- Version Control for Databases
- Initial Release
- April 2012
Liquibase formalizes your database migration language with a configuration language for database schema and alterations. The examples lean into XML but further digging say they support SQL, JSON, and YAML as well. Using something other than SQL makes Liquibase cross platform. You can use the same Liquibase XML descriptors to migrate from say, PostgreSQL to MySQL, which is nice.
The Liquibase magic is how it applies these changes to your database. Liquibase supports rollbacks and preview.
Liquibase has been around for a decade and enjoys high relevance in this category. But there's been a lot of innovation in this space, so keep reading and check out what's new. If you're looking for the leading database migration tool, Liquibase is for you.
- Source Control Your Database
- Initial Release
- Earliest Mention 2014
- Not Open Source
Redgate deploy is for Microsoft SQL Server only. The main selling point is its integration with the Microsoft Developer ecosystem including Visual Studio. The approach is the same as Liquibase, define your schema and alterations and apply those in a principled way.
The one feature that seems cool from a versioning perspective is that Redgate deploy allows for configurable version control of "static table data". From the documentation, this feature is "intended only where the table has fewer than a thousand rows". If you want to version more than 1000 rows, keep reading.
- The database for developers
- Initial Release
- March 2016
Planetscale is awesome and takes database migrations to the next level. Imagine if someone ran the database for you and controlled how schema patches were applied. What would be possible?
Planetscale is run by the good folks who wrote Vitess. Vitess is an open source "database clustering system for horizontal scaling of MySQL". Dolt is a heavy user of Vitess' MySQL dialect parsing code. We wouldn't be here without them.
So, on Planetscale, you get all the schema branch/merge functionality of Liquibase or Redgate. Additionally you also get a world class, modern deployment environment for your changes. On the downside, if this is a downside for you, Planetscale runs your database for you. Planetscale is MySQL only so if you have some aversion to that database format, that's also a downside.
Note, for the release date, I went with the first release of Planetscale's fork of Vitess. I'm not exactly sure when Planetscale in its current form launched and my cursory research didn't turn it up.
Version Controlled Databases
- Making Data Collaboration Easy
- Initial Release
- October 2019
TerminusDB has full schema and data versioning capability but offers a graph database interface using a custom query language called Web Object Query Language (WOQL). WOQL is schema optional. TerminusDB just released the option to query JSON directly, similar to MongoDB, giving users a more document database style interface.
The versioning syntax is exposed via TerminusDB Console or a command line interface. The versioning metaphors are similar to Git. You branch, push, and pull. See their how to documentation for more information.
TerminusDB is new but we like what we see. The company is very responsive, has an active Discord, and is well funded. If you think your database version control makes more sense in graph or document form, check them out.
- It's Git for Data
- Initial Release
- August 2019
Dolt takes “Database Version Control” rather literally. Dolt implements the Git command line and associated operations on table rows instead of files. Data and schema are modified in the working set using SQL. When you want to permanently store a version of the working set, you make a commit. In SQL, dolt implements Git read operations (ie. diff, log) as system tables and write operations (ie. commit, merge) as functions. Dolt produces cell-wise diffs and merges, making data debugging between versions tractable. That makes Dolt the only SQL database on the market that has branches and merges. You can run Dolt offline, treating data and schema like source code. Or you can run Dolt online, like you would PostgreSQL or MySQL.
We are biased but we think if you want Database Version Control for a SQL database, there is only one product that fits that label and that's Dolt.