Things that aren't Version Controlled Databases
We built Dolt and Doltgres, the world's first and only version-controlled SQL databases. These are the only SQL database products in existence that support the full range of Git version control operations -- diff, branch and merge, push and pull, fork and clone. We know this is true, because we built these products from the ground up using novel data structures and storage techniques to support these operations. You can't start with a normal relational database like MySQL or Postgres and then add version control on top. You have to start with that goal in mind, build a content-addressed storage solution to support it, and then build a SQL engine on top. That's what we did, and so far we're the only ones to have done so.
We've been at it for about 7 years now, and one of our biggest ongoing challenges is getting potential customers to understand what a version controlled database even is, or why you would want to use one. When we tell new people about Dolt in a forum or at a tradeshow, we often get one of two reactions:
- That's really cool, I can think of a ton of ways to use that at my company!
- Huh, what would you use that for?
In short, people either "get it" immediately and start generating ideas for how they would apply it, or they suffer a total failure of imagination. For the people who don't get it, we've written a lot about how our existing customers are using the product in their own domains. If you find yourself asking "what's this good for?", you should check out those blogs.
But there's another common objection we hear a lot, especially on forums like Reddit or Hacker News. This person doesn't understand what a version-controlled database is, but they think they do. And they pattern-match to other database-like features they've heard discussed in other products or contexts, and can't understand how Dolt is any different.
This blog post is for them. It's a list of database features and products that aren't version-controlled, but might have some things in common with version control.
MVCC is not version control
MVCC stands for multi-version concurrency control, and it's understandable how people get confused on this point -- it has "version" and "control" right in the name, almost next to each other! We sometimes get comments that every commercial database has already had version-control built-in for decades because of this feature.
MVCC refers to the ability of databases to support multiple simultaneous writers in their own isolated transactions, typically through the use of a journal file. MVCC means that each writer doesn't see the updates from other writers until the transaction concludes. In almost all cases, if two writers update the same rows, one will get a lock on those rows and block the other from continuing until its transaction is over.
But MVCC is not version control. It is a useful, even vital feature of databases. They couldn't scale up to many simultaneous writers without MVCC. But true version control means two collaborators on the data can have their own isolated, long-lived versions of the data (branches) that they can work on and read independently, and that these versions can be merged back together when the time comes. With MVCC, the different versions of the data only live as long as their transactions, which for practical reasons can't be held open for very long. By contrast, real branches in version control can go on for weeks or months, with multiple people collaborating on them, before being merged back in. Dolt lets you use branches as if they were a database transaction that goes on for months.
And of course, MVCC doesn't provide any of the other features you need from version control, like diffs or audit logs.
Time travel is not version control
Many commercial databases now give users the ability to query the database as it existed at different points in time. MySQL, Postgres, and SQL Server all implement this feature in different ways. All of them have different quirks in how the feature must be enabled and configured, and different limitations in how it must be applied. For example, typically older changes will be compressed and no longer directly queryable after a certain threshold has passed.
Before databases had time travel, people would build it themselves manually with a pattern called
slowly changing dimension, often
abbreviated SCD with a number, like SCD2. This was inferior to vendor-provided solutions in most
respects, because it required you to make your application code aware of the special schema elements
on each table, write different code than you usually would for an UPDATE
or a DELETE
, and so
on. But it also made your solution vendor-agnostic, which is nice to have.
But time-travel, however it's implemented, is not version control. It satisfies a single property of version control, namely that you can inspect the data as it existed at some previous revision. That's nice to have, but it's only one part of what version control provides. You still can't have multiple different current versions of the same data (branches), efficiently find what changed between revisions, etc.
Dolt of course has time-travel, since it's version controlled. In Dolt, you
can query past revisions with the AS OF
syntax, like this:
SELECT * FROM myTable AS OF 'HEAD^2';
SELECT * FROM myTable AS OF TIMESTAMP('2020-01-01');
Schema migrations aren't version control
Schema migrations are a technique to manage the inevitable need to change a database's schema over
the lifetime of the application it serves. Typically you manage this by storing every change to the
schema in source control as a .sql
file, and then use some library to manage applying those
changes to the live database. We do this too, for our web property
DoltHub. Here's what ours looks like:
0001_create_users.down.sql
0001_create_users.up.sql
0002_create_identity_links.down.sql
0002_create_identity_links.up.sql
0003_create_email_addresses.down.sql
0003_create_email_addresses.up.sql
0004_alter_users_add_column_primary_email_fk.down.sql
0004_alter_users_add_column_primary_email_fk.up.sql
0005_create_organizations.down.sql
0005_create_organizations.up.sql
0006_create_organization_members.down.sql
0006_create_organization_members.up.sql
Each contains some snippet of SQL to execute, like this:
CREATE TABLE users (
id varchar(36) primary key,
name varchar(32) collate utf8mb4_0900_ai_ci not null unique,
display_name varchar(255) collate utf8mb4_0900_ai_ci,
url varchar(2048),
company varchar(255) collate utf8mb4_0900_ai_ci,
location varchar(255),
updated_at timestamp(6),
created_at timestamp(6)
);
These files are source controlled, which means you get all the benefits of source control when managing them, including working on branches in isolation, doing merges, diffing with past revisions, etc. Beyond the open source libraries, lots of commercial software solutions have launched to help manage these migrations, like Liquibase or Redgate Flyway.
But schema migrations are not version control. Yes, they are tools to manage how your database
changes over time. And because they are files managed by source control, you can use tools like git
diff
and git log
to understand those changes. However, with few exceptions, these tools only
manage schema changes, not data changes. They don't provide the ability to, at runtime, examine how
the schema or data have changed between revisions. They don't let you maintain two long-running
branches of your database with different schemas and different subsets of users connnected to them.
Dolt of course supports schema migrations, with or without the assistance of files checked into
source control. But you make schema changes live, on a branch, and then merge that schema change
back into main
after you test it. There are also several more complicated
workflows involving making schema
changes and then slowly rolling them out to different customers. Like Git, Dolt gives you a set of
powerful tools that let you build workflows as complicated as they need to be to match your desired
application logic.
No-merge branches are not version control
"Branches" are having a bit of a moment in the OLTP database world. Supabase has them. Planetscale has them. Neon has them. There are probably several others that we haven't heard about yet, and they all work slightly differently. But they all have one thing in common: the branches you create cannot be merged back to their origin branch when you're done making changes on them.
To take one example, Neon puts Postgres on a copy-on-write file system. This
means when you create a branch from your production database, it's super fast, it just needs to
create a new marker in that file system to denote where the new branch begins. Your new DB instance
backed by that new branch can make new writes and they'll be independent from anything happening on
the production server. Nobody will see those changes on prod, which is what you want in the case of
running tests or lots of other workflows. But it's not what you want in lots of other workflows,
basically any time when you want to experiment with something in isolation but keep your changes
afterward and eventually get them back to main
. If that's what you're after, you're just out of
luck with all these "branching" database products.
In our view, a branch you can't merge back to main isn't a real
branch. It's a fork. You obviously wouldn't accept
that limitation from a source control solution -- imagine making your code changes on a branch in
Git, but you couldn't open a PR at the end. If you really wanted to get those changes back on
main
, you would need to copy them there one by one. Hope you don't make a mistake!
Dolt of course supports merges, of both data and schema. It's as easy as a procedure call.
CALL DOLT_MERGE('feature-branch');
Conclusion
Lots of database products offer some of the benefits of version control. But Dolt and Doltgres are still the only ones in the world that provide the real deal. We think that's pretty great, and we hope you're interested enough to give it a shot. Dolt and Doltgres are free and open source, so why not try it today?
Questions about Dolt, or about version controlled databases? Come by our Discord to talk to our engineering team and meet other Dolt users.