Dolt Diff Magic: Part 2 – Viewing data diffs

SQL
9 min read

Welcome back to the second post in this blog series exploring the magic that makes Dolt so unique and powerful. Dolt DB is a MySQL-compliant SQL database that also provides the distributed versioning features of Git. The result is a powerful database that lets you fork, merge, branch, push, pull, and deeply inspect the history of your data.

In the first post in this series, we explored how the dolt_diff system table enables you to see what tables have changed, when they changed, and who changed them.
In this post we're going to see a new set of Dolt's magic tricks and explore some of the features in Dolt that show you exactly how your data has changed. We'll see how to view logs of all the changes to the rows in your tables and explore some interesting ways we can use that data to create useful queries. The third post in this series shows how to diff the full contents of a table at any two revisions, even across branches.

Dolt Magic

Diving Deeper into Dolt's Diff Magic

Let’s dive deeper and look at the changes to the actual data in our tables. Dolt provides a few ways to look at diffs of your data. In this post we're going to be focused on the dolt_diff_$tablename system table. This system table exists for every user table in a Dolt database and gives us a list of all the individual changes that commits have applied to a table. The structure of the table consists of our columns, prepended with to_ and from_, as well as to_commit and from_commit columns, to show how the values in our rows have changed at each commit. Looking at this table gives us a really nice way to see how our data has changed over the course of the current branch's commit history – it's a view of how each individual Dolt commit has mutated our data.

The following query shows all the changes on the main branch that we’ve made to a table called simpletable in the DoltDiffMagic database. This is the same database we'll be using for all the examples in this post, so go ahead and dolt clone it so you can try out the queries below, too.

select * from dolt_diff_simpletable;
+-------+------+----------------------------------+-----------------------------------+---------+--------+----------------------------------+-----------------------------------+-----------+
| to_pk | to_c | to_commit                        | to_commit_date                    | from_pk | from_c | from_commit                      | from_commit_date                  | diff_type |
+-------+------+----------------------------------+-----------------------------------+---------+--------+----------------------------------+-----------------------------------+-----------+
| 1     | 10   | 4fevgeq4t6r55d9epvn28a3rj3csdsfv | 2022-03-31 14:56:19.978 +0000 UTC | 1       | 5      | 4koq08rg8iuuefk7ssckoo7m56gvr2rj | 2022-03-31 14:56:07.131 +0000 UTC | modified  |
| 2     | 200  | 4koq08rg8iuuefk7ssckoo7m56gvr2rj | 2022-03-31 14:56:07.131 +0000 UTC | NULL    | NULL   | taf8r3gh7rv3ei0u6f7jio326llgtb61 | 2022-03-31 14:55:50.706 +0000 UTC | added     |
| 1     | 5    | taf8r3gh7rv3ei0u6f7jio326llgtb61 | 2022-03-31 14:55:50.706 +0000 UTC | 1       | 4      | 19pba1s8h7esi2rqrspvarfg61ietsfc | 2022-03-31 14:55:44.64 +0000 UTC  | modified  |
| 1     | 4    | 19pba1s8h7esi2rqrspvarfg61ietsfc | 2022-03-31 14:55:44.64 +0000 UTC  | 1       | 3      | o70kqfvhfr7b1pntp1t8cnn69o9r353j | 2022-03-31 14:55:37.062 +0000 UTC | modified  |
| 1     | 3    | o70kqfvhfr7b1pntp1t8cnn69o9r353j | 2022-03-31 14:55:37.062 +0000 UTC | 1       | 2      | f7gjqm5foil1436kk53iv6e47pc3g4j4 | 2022-03-31 14:55:31.51 +0000 UTC  | modified  |
| 1     | 2    | f7gjqm5foil1436kk53iv6e47pc3g4j4 | 2022-03-31 14:55:31.51 +0000 UTC  | 1       | 1      | gkorppmpemtnh91jrhn46ptcqphm2iaj | 2022-03-31 14:55:12.365 +0000 UTC | modified  |
| 1     | 1    | gkorppmpemtnh91jrhn46ptcqphm2iaj | 2022-03-31 14:55:12.365 +0000 UTC | NULL    | NULL   | u8cvv5pm9elftq9f5c7juovcckq7nlqb | 2022-03-31 14:54:47.729 +0000 UTC | added     |
+-------+------+----------------------------------+-----------------------------------+---------+--------+----------------------------------+-----------------------------------+-----------+

There are a few important things to notice about the returned data. In addition to the to_ and from_ columns that show the exact ways the data in a row changed, there is also a diff_type column that tells us whether the row was added, modified, or removed. Notice also that we edited the same row in multiple Dolt commits, so we see each individual delta reported in the results, not the cumulative diff of all changes across multiple Dolt commits. This makes the dolt_diff_$tablename system tables particularly good for seeing an audit log of how your data has changed over time at each individual Dolt commit.

Diffs in Action

Let’s put our new knowledge of the dolt_diff_$tablename system tables to use! In the next sections, we'll work through practical examples of using Dolt's diff metadata. Each example uses the inventory table from the DoltDiffMagic database.

Example 1: Find when each item was first introduced

Let's start reporting when each item in our inventory was originally introduced in our shop. For every row in our inventory table, how can we report when it was first added to the table?

In a non-versioned database, you would do this by explicitly adding this data to your table and including a timestamp when you insert new items. If you realized you needed this after you started tracking your data, then you'd have to find a way to backfill all the old rows, or make your application aware that not all rows will have this data available, test different scenarios, etc. This is all totally doable of course, but why not let Dolt do the lifting for you here and take advantage of Dolt's built-in history and versioning support to keep your data and your application a little bit simpler?

We can get Dolt to provide this information for us if we take our inventory table and join it on the dolt_diff_inventory system table and apply a few filters:

select inventory.*, date_format(dolt_diff_inventory.to_commit_date, '%Y-%m-%d') as first_created 
  from inventory, dolt_diff_inventory 
 where inventory.item_id=dolt_diff_inventory.to_item_id 
   and dolt_diff_inventory.diff_type='added' 
 order by first_created asc;
+---------+------------+-------+-------+---------------+
| item_id | name       | price | count | first_created |
+---------+------------+-------+-------+---------------+
| 6       | puppy chow | 15    | 8     | 2022-03-18    |
| 2       | nintendo   | 100   | 9     | 2022-03-18    |
| 3       | blender    | 15    | 99    | 2022-03-18    |
| 4       | camera     | 50    | 6     | 2022-03-18    |
| 5       | walkman    | 10    | 11    | 2022-03-18    |
| 7       | gum        | 1     | 12    | 2022-03-31    |
+---------+------------+-------+-------+---------------+

Example 2: Find items on sale

How can we identify all of the items in our inventory that have had their price reduced and see the commit message from the price change?

We'll join our inventory table with the dolt_diff_inventory system table again and this time we'll filter for price drops that match the current item price. We'll also join on the dolt_commits system table so that we can pull in the commit message.

select inventory.*, dolt_diff_inventory.from_price, dolt_commits.message 
 from inventory, dolt_diff_inventory, dolt_commits 
 where dolt_commits.commit_hash=dolt_diff_inventory.to_commit 
   and inventory.item_id=dolt_diff_inventory.to_item_id 
   and to_price < from_price 
   and to_price = inventory.price
   and from_price is not null;
+---------+---------+-------+-------+------------+--------------------------+
| item_id | name    | price | count | from_price | message                  |
+---------+---------+-------+-------+------------+--------------------------+
| 3       | blender | 15    | 99    | 20         | putting blenders on sale |
+---------+---------+-------+-------+------------+--------------------------+

Example 3: Find removed items

How can we find all the items we’ve removed from our inventory?

The query below uses the dolt_diff_inventory system table to see everything that's been removed. We could further filter this to removals in specific time periods by adding an additional filter. The results show us that we’ve only pulled one item from the shelves. Seems like customers didn't appreciate those extra chunks in our chicken noodle soup? 🍜 🤷

select from_item_id item_id, from_name name, message reason 
  from dolt_diff_inventory, dolt_commits 
 where dolt_commits.commit_hash=dolt_diff_inventory.to_commit
   and diff_type='removed'; 
+---------+------+------------------------------------------------------------------------+
| item_id | name | reason                                                                 |
+---------+------+------------------------------------------------------------------------+
| 1       | soup | removing chunky chicken noodle soup after multiple customer complaints |
+---------+------+------------------------------------------------------------------------+

Example 4: Update from historical data

Now that the big spring blender sale is over, how can we use Dolt's versioned history to reset the price of blenders back to the price right before we put it on sale?

The query below is called an update join query. Update joins with the dolt_diff_$tablename system tables are a powerful way to do partial data rollbacks with Dolt. If you're curious about how update joins are implemented in Dolt, you can read more about how we implemented support for update joins in Dolt.

update inventory 
  join dolt_diff_inventory on dolt_diff_inventory.to_name=inventory.name 
   set price=from_price
 where from_price > to_price;
Query OK, 1 row affected
Rows matched: 1  Changed: 1  Warnings: 0

It looks like the query above executed correctly and updated one row, but how can we see what actually changed?

We can use the dolt_diff_inventory table for this, too. Working set changes are tracked in the dolt_diff_$tablename system tables, too, but since they don't have a real commit hash yet, we can use the special value WORKING to see them. If we query dolt_diff_inventory and filter on to_commit='WORKING', we can see exactly how the query above changed our data. Sure enough, the blender price was changed from 15 back to 20, just like we wanted.

select * from dolt_diff_inventory where to_commit='WORKING';
+---------+----------+------------+----------+-----------+----------------+-----------+------------+--------------+------------+----------------------------------+-----------------------------------+-----------+
| to_name | to_price | to_item_id | to_count | to_commit | to_commit_date | from_name | from_price | from_item_id | from_count | from_commit                      | from_commit_date                  | diff_type |
+---------+----------+------------+----------+-----------+----------------+-----------+------------+--------------+------------+----------------------------------+-----------------------------------+-----------+
| blender | 20       | 3          | 99       | WORKING   | NULL           | blender   | 15         | 3            | 99         | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | modified  |
+---------+----------+------------+----------+-----------+----------------+-----------+------------+--------------+------------+----------------------------------+-----------------------------------+-----------+

When we're satisfied with our changes, we can use the dolt_commit() SQL function to write the changes to Dolt's commit history.

select dolt_commit('-am', 'reverting blenders to their pre-sale price');

Wrap up

That wraps up our tour of the dolt_diff_$tablename system table! We saw how the dolt_diff_$tablename system table shows you a log of all the changes to the rows in your tables, for each individual Dolt commit, and we saw a few ways you can use that historical data in your applications. Dolt has lots more tricks up its sleeves and we'll keep exploring more in future posts in this series.

How could you leverage Dolt's versioning, history, and diff features in your applications? Got more questions on what else you can do with Dolt or how any of these features work? Come join us on Discord and let us know! We’d love to hear from you and see how we can help you and your team use Dolt!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.