Dolt to MySQL Replication

FEATURE RELEASE
9 min read

Introduction

Dolt DB is the world's first, and only, version-controlled SQL database. Dolt is a drop-in replacement for MySQL, so any tools or applications that work with MySQL will work with Dolt (if they don't... let us know so we can fix it!). Dolt uses its own, custom storage engine to efficiently store data, and reimplements MySQL's features, without including any MySQL source code.

One of the most powerful features that MySQL provides is the ability to replicate between databases. Dolt already provides a few options for Dolt-to-Dolt replication, but to replicate between Dolt and a MySQL database, we needed to implement MySQL's binlog replication protocol. We launched MySQL to Dolt replication last year, and this year, I've been working on support for Dolt-to-MySQL binlog replication. By the way, if you're curious about how MySQL's binlog replication protocol actually works over the wire, check out our deep dive on the topic.

There's still a bit more work left to wrap up support for binlog replication with a Dolt primary (e.g. replaying binlog events from when a replica was disconnected), but the current support as of Dolt (1.41.3) allows you to connect a MySQL replica to a Dolt server and replicate data changes, so long as the replica stays connected to the Dolt primary.

In this blog post we're going to take an early look at Dolt-to-MySQL replication, talk about why our customers have been asking for this feature, and show a simple demo of it in action.

Why replicate between Dolt and MySQL?

Dolt's initial support for MySQL's replication protocol allowed customers to replicate a MySQL database to a Dolt replica. This allowed customers to easily add a Dolt database into an existing MySQL-based system, and get a taste for some of the benefits of version controlling their data. For example, you could schedule automatic Dolt commits in the Dolt replica every hour or every day, and easily get a versioned history of your data, without having to replace your current database. This is a great way to try out Dolt with your own data, however, to really get the full benefits of Dolt (e.g. branching, merging, diffing, resetting, reverting, rebasing), you need to run Dolt as your primary database.

So, what does the new support for Dolt-to-MySQL replication enable? The two primary use cases we've been hearing our customers ask for are:

  1. Change Control Data Capture – many organizations have a requirement to record data changes to a central data platform. One great way to do this is to use a Change Data Capture tool, like Debezium, to connect to a MySQL server via the binlog replication protocol, and feed data change events back to the central data platform. We'll be diving deeper into how this use case works and taking a closer look at how Dolt works with Debezium in a future blog post.
  2. Read-only Query Host – some teams with very strict performance requirements want the data versioning features of Dolt, but the query processing speed of MySQL. Dolt's performance has improved dramatically over the past year or two, and is currently about 1.8x the latency of MySQL for a standard suite of sysbench tests. There are some cases where Dolt performs better than MySQL, but generally, the current overhead is about 1.8x. This may sound like a big difference, but keep in mind that these latencies are on the order of microseconds, so they are typically dwarfed by milliseconds of network latency anyway. For the vast majority of customers, the actual difference is not noticeable. That said, there are some customers who really do need to squeeze out every microsecond they can. In those cases, it can make sense to have Dolt manage and version your data, while replicating changes from a branch to a MySQL replica where read-only queries can be performed.

Demo

Setting up Dolt-to-MySQL replication is easy, especially if you've ever set up MySQL-to-MySQL replication before. We'll keep it simple in this example and skip backing up existing data to warm up the replica and just work with new data. We'll start up a Dolt primary server and a MySQL replica server, then test out replication, including seeing how Dolt's version control features work with replication.

Start the Dolt Primary Server

To enable binlog replication, we need to set a couple of system variables. Note that these system variables MUST be configured when the Dolt server starts up, in order for replication to be enabled. It's not enough to start up the Dolt server, then set the variables. You can start the Dolt sql-server first, persist the values, and then restart the Dolt sql-server, but in this example, we're going to initialize a database, persist the system variables, and then start the Dolt sql-server.

# Create a new directory for the Dolt database
mkdir doltPrimary && cd doltPrimary

# Initialize the new directory as a Dolt database
dolt init --fun

# Persist system variables to enable replication
dolt sql -q "SET @@PERSIST.log_bin=1;"
dolt sql -q "SET @@PERSIST.gtid_mode=ON;"
dolt sql -q "SET @@PERSIST.enforce_gtid_consistency=ON;"

# Start up a Dolt SQL server on port 11229
dolt sql-server --loglevel DEBUG --port 11229

In the log output for dolt sql-server you should see these lines, indicating that Dolt is correctly configured to replicate changes on the main branch:

DEBU[0000] Enabling binary logging
DEBU[0000] Setting binary logging branch to main

Start the MySQL Replica Server

Next we'll start up our MySQL replica. We're creating a fresh MySQL setup here, but you can use an existing install, too. Whichever way you choose, make sure these three system variables get set: gtid_mode=ON, enforce_gtid_consistency=ON, and server_id=11223344 (or any ID higher than 1, so that the primary and the replica have unique IDs).

# Create a new directory for the MySQL database
mkdir -p mysqlReplica/mysql_data && cd mysqlReplica
chmod -R 0777 . 

# Initialize the new MySQL data directory
mysqld --no-defaults --user=mysql \
	--initialize-insecure \
	--datadir=./mysql_data \
	--default-authentication-plugin=mysql_native_password

# Start up the MySQL server on port 11230
mysqld --no-defaults \
    --datadir=./mysql_data \
    --gtid-mode=ON \
    --skip-replica-start=ON \
    --enforce-gtid-consistency=ON \
    --port=11230 \
    --server-id=11223344 \
    --socket=mysql-11230.sock \
    --general_log_file=./general_log \
    --log-bin=./log_bin \
    --slow_query_log_file=./slow_query_log \
    --log-error=./log_error \
    --pid-file=pid-11230.pid

That last mysqld command starts up the MySQL server. The command will remain running in the shell, just like the dolt sql-server command, but unlike dolt sql-server, you can't kill it with CTRL+C. When you're ready to stop the MySQL server, you can run mysqladmin --protocol TCP --port 11230 -uroot shutdown to terminate it.

Start Replication

Now that we've got our primary and replica servers running, let's hop onto the MySQL replica to configure and start up replication.

-- Open up a SQL shell to the MySQL replica
mysql -uroot --protocol TCP --port 11230

-- From inside the SQL shell on the MySQL replica, configure the replication source
CHANGE REPLICATION SOURCE TO SOURCE_HOST='localhost', SOURCE_PORT=11229, SOURCE_USER='root', SOURCE_AUTO_POSITION=1;

-- Start up replication!
start replica;

-- Check replication status (use \G for vertical output)
show replica status \G

If replication connected successfully, you should see output from SHOW REPLICA STATUS similar to:

             Replica_IO_State: Waiting for source to send event
                  Source_Host: localhost
                  Source_User: root
                  Source_Port: 11229
                Connect_Retry: 60
              Source_Log_File: binlog-main.000001
		              ...

Test Replication

Now that replication is running, any changes made on our Dolt primary will be replicated over on our MySQL replica. Let's try a simple change first to sanity check that replication is working. Our initial database was created on the primary before binary logging was enabled, so we'll create a new database, a table in it, and a single row.

In a new terminal, open up a SQL shell to our Dolt primary server, using the mysql command:

-- Open up a SQL shell to the Dolt primary server
mysql -uroot --protocol TCP --port 11229 doltPrimary

-- Create a new database and a test table
CREATE DATABASE db01;
USE db01;
CREATE TABLE test1 (pk int primary key, name varchar(200));
INSERT INTO test1 VALUES (1, "Doug Fir");

Hop back over to your terminal with the open SQL shell to your MySQL replica, and let's see if our data was replicated.

-- We should see our new db01 database
show databases;
+--------------------+
| Database           |
+--------------------+
| db01               |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

-- Look for the new test1 table in the db01 database
USE db01;
SHOW TABLES;
+----------------+
| Tables_in_db01 |
+----------------+
| test1          |
+----------------+

-- The table data should be replicated, too
SELECT * FROM test1;
+----+----------+
| pk | name     |
+----+----------+
|  1 | Doug Fir |
+----+----------+

It worked! The new database, table, and row we created on the Dolt primary was correctly replicated to our MySQL replica.

Test Replication with Dolt Versioning Features

Let's jump back into the SQL shell for our Dolt primary and check out what happens when we use some of Dolt's version control features to change the contents of our table.

-- Make a Dolt commit with our new table and its one row
CALL dolt_commit('-Am', 'Adding new table: test1');

-- Checkout a new branch called "dev1", to work on some data changes
CALL dolt_checkout('-b', 'dev1');
    
-- Make some data changes on the dev1 branch
UPDATE test1 SET name="White Pine";
INSERT INTO test1 VALUES (2, "Western Cedar");

-- Here's what our data looks like on dev1
select * from test1;
+----+---------------+
| pk | name          |
+----+---------------+
|  1 | White Pine    |
|  2 | Western Cedar |
+----+---------------+

-- Commit these changes on dev1
CALL dolt_commit('-Am', 'Editing test1'); 

Because MySQL doesn't have any concept of branches, Dolt-to-MySQL replication can only replicate a single branch from the Dolt primary server. By default, this is the main branch, but this is configurable through the @@log_bin_branch system variable (Note that this system variable, like the other replication system variables, must be set before the server starts up). Since we made our data changes on the dev1 branch, we shouldn't see any changes on the replica yet. Let's test that out by querying our table on the replica.

-- Back in the MySQL replica SQL shell, check the contents of test1
SELECT * FROM test1;
+----+----------+
| pk | name     |
+----+----------+
|  1 | Doug Fir |
+----+----------+

Okay, that looks like what we expected. Let's test what happens when we merge the changes from the dev1 branch back to the main branch on our Dolt primary.

-- Back in the SQL shell connected to the Dolt primary,
-- check out the main branch and merge the dev1 branch into it
CALL dolt_checkout('main');
CALL dolt_merge('dev1');

-- Here's what's in test1 on main now
SELECT * FROM test1;
+----+---------------+
| pk | name          |
+----+---------------+
|  1 | White Pine    |
|  2 | Western Cedar |
+----+---------------+

Now let's hop back over to the SQL shell for the MySQL Replica and see what data it has.

SELECT * FROM test1;
+----+---------------+
| pk | name          |
+----+---------------+
|  1 | White Pine    |
|  2 | Western Cedar |
+----+---------------+

And sure enough, the data we merged over in our Dolt primary server has been correctly replicated to our MySQL replica. We can use other Dolt versioning features to change our data and the results will replicate correctly to our MySQL replica. Let's try one more example.

-- Back in the SQL shell for our Dolt Primary server...
-- Check out the commit log to see what commits are in our history so far 
SELECT * FROM dolt_log;
+----------------------------------+---------------+-------------------+---------------------+----------------------------+
| commit_hash                      | committer     | email             | date                | message                    |
+----------------------------------+---------------+-------------------+---------------------+----------------------------+
| vver9vgln4rn1qra310pu9re3f1kplvi | root          | root@%            | 2024-06-28 23:27:33 | Editing test1              |
| t3pk8btn9iiebqav9p75ub5k9iq3p6jv | root          | root@%            | 2024-06-28 23:26:46 | Adding new table: test1    |
| 5j02m23439t8aa92iqg73oi3ffj6k55t | Jason Fulghum | jason@dolthub.com | 2024-06-28 23:25:14 | Initialize data repository |
+----------------------------------+---------------+-------------------+---------------------+----------------------------+

-- Let's reset back to the previous commit, the one where we had just added the test1
-- table and it just had one row in it. 

-- Use dolt_reset() to make our main branch point at an older commit
-- We'll use ancestor syntax to denote the commit right before our current HEAD commit
CALL dolt_reset('--hard', 'HEAD~');

-- Here's what's in our table now
SELECT * from test1;
+----+----------+
| pk | name     |
+----+----------+
|  1 | Doug Fir |
+----+----------+

Now let's go back and look at our MySQL Replica and see what data it's showing.

-- Back in the SQL shell for our MySQL Replica server...
SELECT * FROM test;
+----+----------+
| pk | name     |
+----+----------+
|  1 | Doug Fir |
+----+----------+

As expected, when we reset the main branch on our Dolt primary to a different commit, the primary updated the replica to match the current data on the primary's main branch.

Conclusion

Replication between MySQL and Dolt databases opens up some cool new use cases for customers. Adding a Dolt replica to an existing MySQL system allows you to start testing out Dolt's versioning features with your own data, without having to migrate your entire system over to Dolt. We're wrapping up support for Dolt-to-MySQL replication now and the latest releases of Dolt are ready for you to experiment with this new feature. Stay tuned for more updates and examples of using Dolt-to-MySQL replication in future blog posts. As always, we hope you'll join us on Discord to talk to our engineering team and other Dolt users.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.