Introducing the `dolt_branch_activity` System Table

FEATURE RELEASE
5 min read

Here at DoltHub, we love hearing from our users. Building the world’s first SQL server with version control requires us to stay close to the people who use the product to ensure we don’t build in a bubble.

A few weeks ago, I wrote a blog post about using branch-based connection caches, and I suggested that there was a missing feature. Specifically, administrators of Dolt servers had no way to determine how frequently a branch was used. I put that out there in the world, and one of our attentive users pressed us for a follow-up:

Follow Up

Today, with release 1.76.3 of Dolt, we’ve closed the gap!

The Problem

If you run a dolt sql-server, how do you determine which branches are in use and which have been abandoned? There are a variety of ways that a branch can be used without ever updating it, and writes can occur on a branch without making a commit. You can even update a branch without having it checked out! For example, it’s possible using the AS OF syntax to read a branch in a query without checking it out:

SELECT
  main.id,
  main.name,
  main.score AS main_score,
  feature.score AS feature_score
FROM users AS OF 'main' main
JOIN users AS OF 'feature' feature USING (id)
WHERE main.score != feature.score;

This query compares the states of two branches, main and feature - what they represent doesn’t really matter. What matters is that the feature branch is being used even though no user connected to the server has checked it out.

There is also the concern that server processes live for a specific time period. If your server has just started and no one has connected to it, it looks like no branches are ever used. This could be solved by storing persistent state outside of the server, which leads to a whole host of headaches. We decided that complexity is merited for this feature.

So where did we land?

The Solution

Introducing the dolt_branch_activity system table:

db5/main> describe dolt_branch_activity;
+-------------------+----------+------+-----+---------+-------+
| Field             | Type     | Null | Key | Default | Extra |
+-------------------+----------+------+-----+---------+-------+
| branch            | text     | NO   | PRI | NULL    |       |
| last_read         | datetime | YES  |     | NULL    |       |
| last_write        | datetime | YES  |     | NULL    |       |
| active_sessions   | int      | NO   |     | NULL    |       |
| system_start_time | datetime | NO   |     | NULL    |       |
+-------------------+----------+------+-----+---------+-------+

This system table can be used to determine how recently a branch has been used in the life of the running dolt sql-server.

The meaning of each column:

  1. branch: The name of the branch in question. All current branches in the server will be listed, even if they have no usage.
  2. last_read: A timestamp indicating the last time the branch was used, either in a query or as a connection directly to the branch in question. A NULL value indicates that the branch has not been read during the life of this server.
  3. last_write: The timestamp for the last time a branch state was updated or committed to. A NULL value indicates that the branch has not been read during the life of this server.
  4. active_sessions: The number of users who have the branch checked out at the moment.
  5. system_start_time: The time the server started. This value will be the same in every row, and all last_read and last_write values will be after this time.

These values are all ephemeral and exist for the lifetime of the server. Given that most organizations run their servers for months, this table should be sufficient to reason about which current branches have been used recently.

Examples

The branch deletion policy which makes sense for each server will depend on the application, but here are some examples of how this could be used.

Find all branches which have never been used, and the server has been up for more than a week:

SELECT *
FROM dolt_branch_activity
WHERE active_sessions = 0
  AND system_start_time < NOW() - INTERVAL 7 DAY
  AND last_read IS NULL
  AND last_write IS NULL;

Requiring that last_read IS NULL AND last_write IS NULL is fairly restrictive though. If your server has been up for 3 months, there will probably be plenty of branches which have those timestamps. To refine the approach, ensure those timestamps are more than 7 days old:

SELECT *
FROM dolt_branch_activity
WHERE active_sessions = 0
  AND system_start_time < NOW() - INTERVAL 7 DAY
  AND (last_read  IS NULL OR last_read  < NOW() - INTERVAL 7 DAY)
  AND (last_write IS NULL OR last_write < NOW() - INTERVAL 7 DAY);

We have another function called dolt_branch_status() which tells you how many commits are ahead or behind a given branch. For example, we have two feature branches, one of which, feature, is in sync with main, and the other, feature2, is not:

mydb/main> select * from dolt_branch_status('main','feature','feature2');
+----------+---------------+----------------+
| branch   | commits_ahead | commits_behind |
+----------+---------------+----------------+
| feature  | 0             | 0              |
| feature2 | 1             | 3              |
+----------+---------------+----------------+

By constructing a query which gets the dolt_branch_status of every branch in relation to main, we can find the union of all branches which have no active connections and have been merged into main:

SELECT GROUP_CONCAT(
         CONCAT('SELECT * FROM DOLT_BRANCH_STATUS(''main'',', QUOTE(branch), ')')
         ORDER BY branch SEPARATOR ' UNION ALL '
       )
INTO @union_sql
FROM dolt_branch_activity
WHERE active_sessions = 0 -- No active sessions.
  AND branch <> 'main';   -- Skip main (don't compare to self)

SET @sql = IF(
  @union_sql IS NULL,
  'SELECT * FROM (SELECT 1 AS noop) t WHERE 0',
  CONCAT('SELECT branch FROM (', @union_sql, ') s WHERE commits_ahead = 0') -- Are not ahead of main
);

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Pulling it All Together!

SQL is a fairly powerful language, so you can come up with the policy which makes sense for your situation. Once you do, you can use the branches returned to build a series of dolt_branch('-D', {branch}) operations to delete your branches:

SELECT IFNULL(
         GROUP_CONCAT(
           CONCAT('CALL dolt_branch(''-D'', ', QUOTE(branch), ')')
           ORDER BY branch SEPARATOR '; '
         ),
         'SELECT 1 FROM DUAL WHERE 0'  -- no-op if nothing matches
       )
INTO @sql
FROM dolt_branch_activity
WHERE active_sessions = 0
  AND system_start_time < NOW() - INTERVAL 7 DAY
  AND (last_read  IS NULL OR last_read  < NOW() - INTERVAL 7 DAY)
  AND (last_write IS NULL OR last_write < NOW() - INTERVAL 7 DAY)
  AND branch <> 'main';                -- never delete main

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

And finally, if you’d like to automatically deleting stale branches on a schedule, you can set up a scheduled job to delete them. Exercise left for the reader!

Conclusion

dolt_branch_activity gives you a little more insight into how your users are working on your server. It’s information you didn’t previously have to make administrative decisions. I suppose I’ll need to create a Part II for the demo app to show how you can use this table to help manage connection caches. Let’s see if anyone holds me accountable to write that!

Let us know how you are going to use this, or better yet, what gaps you see in Dolt that we could address. We’re always happy to chat with you on our Discord server!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.