DoltDB is a MySQL-compatible database with Git's distributed versioning features. It's the first relational database that versions your data changes AND your schema changes. Today's blog post is a teaser for a new project we're working on... MySQL binlog replication. Replication is an important feature for databases, and Dolt already supports replicating a primary Dolt server to Dolt replicas. This new style of replication enables Dolt servers to replicate directly from a MySQL primary, using MySQL's native binlog replication protocol.
MySQL binlog replication is a powerful feature that has been around in MySQL for a long time and is widely used for replicating writes on a primary server to replica servers. Those replicas are often used to handle read-only traffic to help spread out load and to scale a system. They are also useful for executing long-running analytic queries or backups that you don't want to run on your primary database.
We're excited about this feature because it gives customers an easy way to start getting some of the benefits of versioned data without having to take on a database migration. You can configure a Dolt database as a replica of a MySQL database and automatically keep the data in sync on the replica. MySQL doesn't track data versions, but you can configure Dolt to create Dolt commits at different events or intervals, which gives you a versioned history of the data being replicated. For example, you could have Dolt configured to create a Dolt commit at midnight each night, and then you'll have daily versioned history going forward for of all the data changes from your MySQL database. From there, you can use all the Dolt diff features to diff your data and understand exactly how and when your data changed. This is super useful in a many different scenarios, such as running analysis of data changes, or recovering data that was deleted from your primary server.
DoltHub ❤️ Vitess
First off, I want to give the Vitess.io project a big shoutout. 🙌🏻 vitessio/vitess is an extremely popular Go project for deploying, scaling, and managing large clusters of MySQL databases. In particular, it's got some really cool support for data sharding. We've been relying on a fork of Vitess for several years now, primarily for the SQL parser and for the low-level MySQL wire protocol implementation. Vitess is a fantastic library and it would have been MUCH more challenging for us to build DoltDB without it.
Vitess also provides some interesting replication features built on top of MySQL's binlog protocol. Because the Vitess team built the core support for parsing binlog protocol messages, it was pretty easy for us to build a prototype of a Dolt server that replicates from a MySQL server using the standard MySQL binlog protocol.
So, big thank yous to all the maintainers and contributors of Vitess for all the great work they're doing. We hope we can find some opportunities to contribute fixes or features back upstream in the future! 🙏
Let's jump into setting up a demo of our binlog replication prototype and see it in action. If you want to try this out yourself, you can checkout my demo branch and create a dev build of dolt using the instructions below. Just remember that you will need to use the dev build of dolt that you create (at
~/go/bin/dolt), instead of any production release already on your system!
git clone https://github.com/dolthub/dolt
git checkout fulghum/binlog-replication-demo
Step 1: Configure the MySQL primary
The first thing we need to do is get our MySQL primary set up to replicate. We're going to create a fresh MySQL instance for our testing, but you could also do this with a MySQL server with existing data – you'd just need to make sure you have the same settings used below and that you copy a snapshot of the data to the Dolt replica to ensure the two databases are in sync before you start replicating data changes.
First, we need to initialize a new MySQL data directory.
Then we can start up the MySQL primary on port 54321. Note that Dolt binlog replication prototype requires a few settings on the server, such as ensuring the server is using GTIDs, enforcing GTID consistency, recording row-based replication binlog events, and not sending binlog checksums.
mysqld --datadir=$PWD/mysql_data \
--gtid-mode=ON --enforce-gtid-consistency=ON \
--port=54321 --server-id=11223344 \
The command above will start a new MySQL server and will not return until the server is stopped. (When you're ready to stop the server, from another terminal, run
mysqladmin shutdown --protocol TCP --port 54321 -uroot.)
Now let's log into the MySQL primary server and create a database so that replication events can get started.
mysql --protocol TCP -uroot --port 54321
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.31 Homebrew
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database db01;
While we're in that MySQL shell, let's also go ahead and grab the server's unique ID that we'll need in the next step when we configure our Dolt replica.
mysql> select @@server_uuid;
| @@server_uuid |
| df559c62-7cc4-11ed-b66d-04559e1bab08 |
1 row in set (0.00 sec)
Step 2: Configure the Dolt replica
Next we need to start up the Dolt replica. Start by creating a directory with the same name as the database we created in the last step and initializing it.
This is a scrappy prototype, so we haven't implemented the MySQL control statements yet to configure, start, and stop replication; instead, we're passing configuration via environment variables. As we continue building this feature, we'll add support for controlling MySQL replication through the standard commands of course.
NOTE: Remember to use your MySQL server's UUID that we found in the last step; it will NOT be the same as mine!
Next, start up a Dolt SQL server. This prototype will automatically connect to the MySQL primary configured above if we've set our env vars correctly.
dolt sql-server -uroot --port 1122
NOTE: If you're trying this out yourself, remember to use the dolt binary you built, at
~/go/bin/dolt, instead of any other versions of dolt on your system! (I'll admit I forgot that multiple times while testing this setup... 🤦)
Step 3: Create some data and watch it replicate
Okay, now on to the fun stuff! Replication is up and running between our MySQL primary and our Dolt replica, so let's make some changes to our data on the primary and see how our replica stays in sync...
In the video below, I'm logged into the MySQL primary server on the left side of the screen, and I'm logged into my Dolt replica on the right side of the screen. It shows how we can create tables, insert, update, and delete data, and then immediately see the changes synced up on our Dolt replica.
Step 4: Try out some of Dolt's versioning features
The really cool thing about replicating a MySQL database to a Dolt database is that we can create Dolt commits and track versioned history for the replicated data. This prototype currently creates a Dolt commit for every SQL transaction commit, but as we grow this feature, we'll make this more configurable.
Let's start off by looking at the log of commits on our Dolt database. I'm using the command line here because the formatting is slightly easier to read, but all of this same functionality is directly accessible from within a SQL shell, too.
dolt log --oneline
booi6sd88mcrmlu7nk9gmr3mkc5ig9ou (HEAD -> main) automatic Dolt replication commit
7nerqlpjgronql3vmh5fp5l0t2e46vpl automatic Dolt replication commit
2qllq6qvpqbbvfj95o52a5qbdirro1sb automatic Dolt replication commit
2a7b5fhr5kv8bbkcaqmpbv14oc7h36gc automatic Dolt replication commit
6bc3bl5onc0gmnmouu83da0hi6it37ds Initialize data repository
Next let's take a look at the diff of the most recent commit.
dolt diff booi6sd88mcrmlu7nk9gmr3mkc5ig9ou^ booi6sd88mcrmlu7nk9gmr3mkc5ig9ou
diff --dolt a/t1 b/t1
--- a/t1 @ lp7pd1ivm1gv51ei5m47v0789e1o2v0i
+++ b/t1 @ lmqp2ipojqr90v266cn4rg5anvjbrg7p
| | pk | color |
| < | 3 | purple |
| > | 3 | PURPLE! |
As another example, let's look at the cumulative diff across all our commits, which also includes the schema change for the table we created.
dolt diff 6bc3bl5onc0gmnmouu83da0hi6it37ds booi6sd88mcrmlu7nk9gmr3mkc5ig9ou
diff --dolt a/t1 b/t1
+CREATE TABLE `t1` (
+ `pk` int NOT NULL,
+ `color` varchar(200),
+ PRIMARY KEY (`pk`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
| | pk | color |
| + | 2 | green |
| + | 3 | PURPLE! |
Finally, to show off one more Dolt feature...
As you can see, Dolt makes it super easy to see the exact schema changes and data changes that are going into your database. These features combined with MySQL binlog replication gives you additional visibility into how your data and schema is changing and allows you to easily go back to a specific point in time, compare your data, or even extract deleted data and reinsert it back on the MySQL primary. Having your MySQL primary replicate to a Dolt replica also enables you to incorporate other Dolt features into your workflows, such as easily cloning a copy of the replica and pulling down changes to a local database for development and testing.
We're excited about MySQL binlog replication support. It's a feature we've wanted to build for a while, and as DoltDB has matured, now feels like the right time to add it. It's also a pretty fun feature to build. 😎 With just a few days of hacking, we've got a working, end-to-end prototype, thanks in large part to the awesome MySQL binlog protocol support that the Vitess project provides. 🙌🏻 We've still got a LOT left to build and test before this is ready for wide use though, so we'll be busy grinding away at it and polishing it up.
If you're interested in this work, either as a Dolt customer or if you're just into the internal of databases and MySQL, come on over to the DoltHub Discord server and chat with us!