Things that aren't Version Controlled Databases

REFERENCE
7 min read

We built Dolt and Doltgres, the world's first and only version-controlled SQL databases. These are the only SQL database products in existence that support the full range of Git version control operations -- diff, branch and merge, push and pull, fork and clone. We know this is true, because we built these products from the ground up using novel data structures and storage techniques to support these operations. You can't start with a normal relational database like MySQL or Postgres and then add version control on top. You have to start with that goal in mind, build a content-addressed storage solution to support it, and then build a SQL engine on top. That's what we did, and so far we're the only ones to have done so.

We've been at it for about 7 years now, and one of our biggest ongoing challenges is getting potential customers to understand what a version controlled database even is, or why you would want to use one. When we tell new people about Dolt in a forum or at a tradeshow, we often get one of two reactions:

  1. That's really cool, I can think of a ton of ways to use that at my company!
  2. Huh, what would you use that for?

In short, people either "get it" immediately and start generating ideas for how they would apply it, or they suffer a total failure of imagination. For the people who don't get it, we've written a lot about how our existing customers are using the product in their own domains. If you find yourself asking "what's this good for?", you should check out those blogs.

But there's another common objection we hear a lot, especially on forums like Reddit or Hacker News. This person doesn't understand what a version-controlled database is, but they think they do. And they pattern-match to other database-like features they've heard discussed in other products or contexts, and can't understand how Dolt is any different.

This blog post is for them. It's a list of database features and products that aren't version-controlled, but might have some things in common with version control.

MVCC is not version control

MVCC stands for multi-version concurrency control, and it's understandable how people get confused on this point -- it has "version" and "control" right in the name, almost next to each other! We sometimes get comments that every commercial database has already had version-control built-in for decades because of this feature.

MVCC refers to the ability of databases to support multiple simultaneous writers in their own isolated transactions, typically through the use of a journal file. MVCC means that each writer doesn't see the updates from other writers until the transaction concludes. In almost all cases, if two writers update the same rows, one will get a lock on those rows and block the other from continuing until its transaction is over.

postgres logo

But MVCC is not version control. It is a useful, even vital feature of databases. They couldn't scale up to many simultaneous writers without MVCC. But true version control means two collaborators on the data can have their own isolated, long-lived versions of the data (branches) that they can work on and read independently, and that these versions can be merged back together when the time comes. With MVCC, the different versions of the data only live as long as their transactions, which for practical reasons can't be held open for very long. By contrast, real branches in version control can go on for weeks or months, with multiple people collaborating on them, before being merged back in. Dolt lets you use branches as if they were a database transaction that goes on for months.

And of course, MVCC doesn't provide any of the other features you need from version control, like diffs or audit logs.

Time travel is not version control

Many commercial databases now give users the ability to query the database as it existed at different points in time. MySQL, Postgres, and SQL Server all implement this feature in different ways. All of them have different quirks in how the feature must be enabled and configured, and different limitations in how it must be applied. For example, typically older changes will be compressed and no longer directly queryable after a certain threshold has passed.

mysql logo

Before databases had time travel, people would build it themselves manually with a pattern called slowly changing dimension, often abbreviated SCD with a number, like SCD2. This was inferior to vendor-provided solutions in most respects, because it required you to make your application code aware of the special schema elements on each table, write different code than you usually would for an UPDATE or a DELETE, and so on. But it also made your solution vendor-agnostic, which is nice to have.

But time-travel, however it's implemented, is not version control. It satisfies a single property of version control, namely that you can inspect the data as it existed at some previous revision. That's nice to have, but it's only one part of what version control provides. You still can't have multiple different current versions of the same data (branches), efficiently find what changed between revisions, etc.

Dolt of course has time-travel, since it's version controlled. In Dolt, you can query past revisions with the AS OF syntax, like this:

SELECT * FROM myTable AS OF 'HEAD^2';
SELECT * FROM myTable AS OF TIMESTAMP('2020-01-01');

Schema migrations aren't version control

Schema migrations are a technique to manage the inevitable need to change a database's schema over the lifetime of the application it serves. Typically you manage this by storing every change to the schema in source control as a .sql file, and then use some library to manage applying those changes to the live database. We do this too, for our web property DoltHub. Here's what ours looks like:

0001_create_users.down.sql
0001_create_users.up.sql
0002_create_identity_links.down.sql
0002_create_identity_links.up.sql
0003_create_email_addresses.down.sql
0003_create_email_addresses.up.sql
0004_alter_users_add_column_primary_email_fk.down.sql
0004_alter_users_add_column_primary_email_fk.up.sql
0005_create_organizations.down.sql
0005_create_organizations.up.sql
0006_create_organization_members.down.sql
0006_create_organization_members.up.sql

Each contains some snippet of SQL to execute, like this:

CREATE TABLE users (
    id varchar(36) primary key,
    name varchar(32) collate utf8mb4_0900_ai_ci not null unique,
    display_name varchar(255) collate utf8mb4_0900_ai_ci,
    url varchar(2048),
    company varchar(255) collate utf8mb4_0900_ai_ci,
    location varchar(255),
    updated_at timestamp(6),
    created_at timestamp(6)
);

These files are source controlled, which means you get all the benefits of source control when managing them, including working on branches in isolation, doing merges, diffing with past revisions, etc. Beyond the open source libraries, lots of commercial software solutions have launched to help manage these migrations, like Liquibase or Redgate Flyway.

Redgate Flyway

But schema migrations are not version control. Yes, they are tools to manage how your database changes over time. And because they are files managed by source control, you can use tools like git diff and git log to understand those changes. However, with few exceptions, these tools only manage schema changes, not data changes. They don't provide the ability to, at runtime, examine how the schema or data have changed between revisions. They don't let you maintain two long-running branches of your database with different schemas and different subsets of users connnected to them.

Dolt of course supports schema migrations, with or without the assistance of files checked into source control. But you make schema changes live, on a branch, and then merge that schema change back into main after you test it. There are also several more complicated workflows involving making schema changes and then slowly rolling them out to different customers. Like Git, Dolt gives you a set of powerful tools that let you build workflows as complicated as they need to be to match your desired application logic.

No-merge branches are not version control

"Branches" are having a bit of a moment in the OLTP database world. Supabase has them. Planetscale has them. Neon has them. There are probably several others that we haven't heard about yet, and they all work slightly differently. But they all have one thing in common: the branches you create cannot be merged back to their origin branch when you're done making changes on them.

NeonDB

To take one example, Neon puts Postgres on a copy-on-write file system. This means when you create a branch from your production database, it's super fast, it just needs to create a new marker in that file system to denote where the new branch begins. Your new DB instance backed by that new branch can make new writes and they'll be independent from anything happening on the production server. Nobody will see those changes on prod, which is what you want in the case of running tests or lots of other workflows. But it's not what you want in lots of other workflows, basically any time when you want to experiment with something in isolation but keep your changes afterward and eventually get them back to main. If that's what you're after, you're just out of luck with all these "branching" database products.

In our view, a branch you can't merge back to main isn't a real branch. It's a fork. You obviously wouldn't accept that limitation from a source control solution -- imagine making your code changes on a branch in Git, but you couldn't open a PR at the end. If you really wanted to get those changes back on main, you would need to copy them there one by one. Hope you don't make a mistake!

Dolt of course supports merges, of both data and schema. It's as easy as a procedure call.

CALL DOLT_MERGE('feature-branch');

Conclusion

Lots of database products offer some of the benefits of version control. But Dolt and Doltgres are still the only ones in the world that provide the real deal. We think that's pretty great, and we hope you're interested enough to give it a shot. Dolt and Doltgres are free and open source, so why not try it today?

Questions about Dolt, or about version controlled databases? Come by our Discord to talk to our engineering team and meet other Dolt users.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.