Journaling Chunk Store

Today's blog announces the 0.75.0 release of Dolt. With it comes the general release of Dolt's Chunk Journal, a new ACID-compliant persistence layer that is both faster and more reliable than our existing store. Over the last few years, Dolt's roadmap has been focused on creating a production-ready OLTP database. We built SQL transactions, rewrote our query analyzer, and shipped a new storage format.
Our goal is to create a modern replacement for MySQL that matches its performance while empowering users with version-control features. This release marks another step toward becoming the best-in-class OLTP database.

Designing a new ChunkStore

Dolt is a unique database. It's the only SQL database capable of schema and data merges. We built Dolt from the ground up to use a storage engine inspired by Git. Dolt speaks the MySQL protocol, but internally it uses a storage index that supports data versioning including diff, merge, and sync.

At the bottom of the storage stack is a persistence layer called a ChunkStore. A ChunkStore is responsible for reading and writing "chunks" of data to and from disk. We inherited our previous ChunkStore implementation from Noms, the open source project that pioneered Prolly Tree indexes. Noms wasn't designed to support an online database service and consequently neither was its persistence layer. Each new transaction against the store required writing a new file in the data directory. In order for Dolt to serve as a low-latency application server, it would need to streamline its interactions with the filesystem.

The chunk journal is a new ChunkStore implementation that continuously appends chunks to a single file. This simplified design supports faster writes and makes it easier to support ACID transactions. For Dolt users, this means their database will serve their applications with lower latency and greater reliability.

By guaranteeing durable writes, Dolt's persistence layer eliminates an entire class of failure modes. When a process makes a filesystem write, the written data is copied into kernel-space memory, but it doesn't immediately get written to disk. As a performance optimization, the OS will delay the physical disk write until it's most convenient for the system. What differentiates durable writes, finalized with fsync, is that they are synchronously flushed to disk. Consequently, a crash of the process, the OS, or the host machine after a commit returns will not affect the durability of the transaction. So long as the disk can be read, the transaction data is recoverable.

Journal Performance

Just how much faster is the new ChunkStore? Let's compare Dolt's sysbench benchmarking suite before and after the chunk journal landed:

+-------------------------+------------------------------+------------------------------+
|                         |            0.54.3            |            0.75.0            |
| read test               |   MySQL |     DOLT |   ratio |   MySQL |     DOLT |   ratio |
| ----------------------- | ------- | -------- | ------- | ------- | -------- | ------- |
|   covering_index_scan   |    1.93 |     2.66 |    1.4x |    1.93 |     2.66 |    1.4x |
|   groupby_scan          |   12.30 |    16.41 |    1.3x |   12.30 |    16.41 |    1.3x |
|   index_join            |    1.16 |     4.18 |    3.6x |    1.21 |     4.25 |    3.5x |
|   index_join_scan       |    1.12 |     2.07 |    1.8x |    1.16 |     2.07 |    1.8x |
|   index_scan            |   30.26 |    51.94 |    1.7x |   30.81 |    52.89 |    1.7x |
|   oltp_point_select     |    0.15 |     0.48 |    3.2x |    0.15 |     0.48 |    3.2x |
|   oltp_read_only        |    3.02 |     8.43 |    2.8x |    2.97 |     8.58 |    2.9x |
|   select_random_points  |    0.30 |     0.74 |    2.5x |    0.30 |     0.74 |    2.5x |
|   select_random_ranges  |    0.35 |     1.12 |    3.2x |    0.36 |     1.14 |    3.2x |
|   table_scan            |   30.81 |    52.89 |    1.7x |   31.37 |    53.85 |    1.7x |
|   types_table_scan      |   69.29 |   158.63 |    2.3x |   70.55 |   158.63 |    2.2x |
| reads mean multiple     |                         2.3x |                         2.3x |
+-------------------------+------------------------------+------------------------------+
|                         |            0.54.3            |            0.75.0            |
| write test              |   MySQL |     DOLT |   ratio |   MySQL |     DOLT |   ratio |
| ----------------------- | ------- | -------- | ------- | ------- | -------- | ------- |
|   oltp_delete_insert    |    2.61 |    12.30 |    4.7x |    5.67 |     5.88 |    1.0x |
|   oltp_insert           |    1.32 |     2.91 |    2.2x |    2.66 |     2.86 |    1.1x |
|   oltp_read_write       |    5.00 |    17.63 |    3.5x |    6.79 |    15.55 |    2.3x |
|   oltp_update_index     |    1.34 |     6.09 |    4.5x |    2.81 |     3.02 |    1.1x |
|   oltp_update_non_index |    1.34 |     6.67 |    5.0x |    2.76 |     2.86 |    1.0x |
|   oltp_write_only       |    2.14 |     8.90 |    4.2x |    3.89 |     7.43 |    1.9x |
|   types_delete_insert   |    2.76 |    13.22 |    4.8x |    5.18 |     6.67 |    1.3x |
| writes mean multiple    |                         3.7x |                         1.3x |
+-------------------------+------------------------------+------------------------------+
| overall mean multiple   |                         2.9x |                         1.9x |
+-------------------------+------------------------------+------------------------------+

On most write benchmarks, Dolt is now twice as fast. This speedup comes despite calling fsync() to durably persist each transaction commit! Faster and more durable.

Further, for the 0.75.0 release we've changed the MySQL configuration we use when doing comparative benchmarking. Back in August, we blogged about how we run our performance benchmarks and how we compare to MySQL. In order to get an "apples-to-apples" comparison, we set MySQL's durability level to match Dolt's by setting innodb_flush_log_at_trx_commit=2, which flushes the log to disk about once per second rather than at every commit. Now that Dolt supports ACID-compliant transactions by default, we can revert to MySQL's default durability setting (innodb_flush_log_at_trx_commit=1). The result is that MySQL is comparatively slower in the latest benchmarks, while Dolt is much faster!

[Figure: Dolt performance relative to MySQL over time]

Our overall performance relative to MySQL is now under 2x. Breaking the 2x barrier has been a long-term goal for us and represents the culmination of years of performance work. A 1.9x multiple might sound underwhelming to some users, but consider that when we first started running Sysbench benchmarks two years ago, our overall multiple was 15x! It's time for us to set a new performance goal: we want to be faster than MySQL. Stay tuned.

Fast-Forward to 0.75.0

You may have noticed our release version jumped forward from 0.54.2 to 0.75.0. This release is our first with ACID transactions on by default. Durably writing transactions to disk was the last major feature standing between us and our 1.0.0 release. Reaching the 1.0.0 milestone will be a major achievement for Dolt and mark our arrival as a mature database.

We believe version-control is the future of databases. Building on top of Dolt gives you access to branches, time-travel, and cell-level audit-logs out of the box. For these reasons and many more, we're excited to see Dolt become a more performant, reliable database. If you're curious about how Dolt fits into your stack or how you can use version-control in your database, come chat with us on Discord!
