Enable Dolt Archives with Automatic Garbage Collection
Dolt, the world's first SQL Database with Git-style branching and merging, needs to store all revisions of all data in your database. We take advantage of structural sharing thanks to the use of Prolly Trees, but there are further optimizations we can make. One option is to store similar pieces of data closer together with the use of delta encoding, and another is to cull history from your primary database and put it into long-term storage.
There is a storage format called Dolt Archives which we've been working on to help with this problem. Last month we discussed the progress on the Dolt Archive format, and today we can announce that we've added the ability to enable Dolt Archives with automatic garbage collection. Let's get into it!
Background
Dolt Archives are a storage format for Dolt databases which use dictionary compression to store data more efficiently. As discussed last month, we get between 25% and 50% better compression than today's default storage format, which we call Table Files. When we announced the Archive format last year, we also announced a command called dolt archive which can be used to convert your Dolt database's data into the Archive format. The database will continue to work as you expect, but with a smaller footprint on disk.
Since that time, we've been working to make Dolt Archives the default storage format for Dolt 2.0. We've added the ability to push and pull Archives to DoltHub, and we've added the ability to fetch incremental changes from a remote server when Archives are in the mix.
Dolt Archives with Automatic Garbage Collection
Today we are excited to announce that we have added the ability to enable Dolt Archives with automatic garbage collection. Both of these features are currently considered "experimental", and we are looking for feedback from the community. Rest assured we have high confidence in the stability of these features, but there isn't a need to rush to production with them just yet so we are taking it slow.
Enabling Dolt Archives with your dolt sql-server
If you run a dolt sql-server for your Dolt database, you can now configure Dolt Archives by modifying the config.yaml file. The config.yaml file is a YAML file that is used to configure your Dolt database. If you've ever run dolt sql-server, then you probably noticed that it creates a config.yaml file in the directory you ran the command from. If you have a version of Dolt which is 1.52.1 or higher, you can enable Dolt Archives by changing your config.yaml file to include the following:
- # behavior:
+ behavior:
    # read_only: false
    # autocommit: true
    # disable_client_multi_statements: false
    # dolt_transaction_commit: false
    # event_scheduler: "OFF"
-   # auto_gc_behavior:
-     # enable: false
-     # archive_level: 0
+   auto_gc_behavior:
+     enable: true
+     archive_level: 1
As you can see, we've added a new section called auto_gc_behavior. This section has two keys, enable and archive_level. The enable key is a boolean that turns on automatic garbage collection. The archive_level key is an integer that determines the level of archive compression to use when garbage collecting. Currently the only supported levels are 0 and 1. 0 is the current default, which results in the legacy Table File format. 1 will result in the new Archive format. In the future we will add more levels to the archive_level key to support chunk grouping and higher compression levels.
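For reference, here is a minimal sketch of what the relevant part of config.yaml might look like once the change is applied. It assumes that auto_gc_behavior is nested under behavior (which the hosted flag names behavior_auto_gc_behavior_enable and behavior_auto_gc_behavior_archive_level suggest) and leaves out every other setting, so treat it as an illustration rather than a complete file.

```yaml
# Sketch of the relevant portion of config.yaml after the change.
# Assumption: auto_gc_behavior nests under behavior; all other
# settings are omitted here and keep their defaults.
behavior:
  auto_gc_behavior:
    enable: true       # turn on automatic garbage collection
    archive_level: 1   # 1 = new Archive format, 0 = legacy Table Files
```

Setting archive_level back to 0 while leaving enable set to true should keep automatic garbage collection on but write the legacy Table File format.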
Enabling Dolt Archives with your Hosted Instance
We've also added the ability to enable Dolt Archives with your hosted instance of Dolt. If you have a hosted instance, you can enable Dolt Archives by going to the deployment page of your instance, scrolling down, and smashing the "Edit" button.
Then click the box next to behavior_auto_gc_behavior_enable and set it to true. Doing that will make the behavior_auto_gc_behavior_archive_level flag appear.
Then you can set the behavior_auto_gc_behavior_archive_level flag to true (which we interpret as 1). After enabling both, smash the "Save" button and you're all set. Now your hosted instance will use Dolt Archives with automatic garbage collection!
What's Next?
We are testing these features extensively, and will be making them the default behavior in our next major release: Dolt 2.0. We've already enabled Dolt Archives with automatic garbage collection in our servers, and we will continue improving the system. Compressing further with chunk grouping and moving old history into cold storage are both on our roadmap. We'll be rolling those out after Dolt 2.0 is released. If you have any feedback, or just want to tell us how much you love Dolt, please let us know on Discord!