So you want Soft Deletes?

REFERENCE
8 min read

This recent article and corresponding Hacker News thread brought the topic of soft deletes to my attention. People have been wondering if soft deletes are a good idea for a long time.

Here at DoltHub, we built a version controlled SQL database called Dolt. In Dolt, every delete is a soft delete! Moreover, every column effectively has a Slowly Changing Dimension column associated with it, giving you an audit trail for every cell in the database. You get both of these without changing your database schema. All you have to do is CALL DOLT_COMMIT() when you want to preserve your current state.

Snuggle Softness

So, given that we've thought a lot about this topic, I thought I would weigh in on the topic of Soft Deletes. What are soft deletes? How do you perform them in standard SQL? What are the pros and cons? What tools exist to make soft deletes easier? And finally, are there new databases, like Dolt, that treat soft deletes as the default?

What are Soft Deletes?

Soft deletes are so popular there is even an entry in Wiktionary.

soft deletion - (database) An operation in which a flag is used to mark data as unusable, without erasing the data itself from the database.

Instead of deleting data from your database, which is a hard to recover from operation, you add metadata indicating the data should be considered deleted. Anywhere where you query the soft deleted data, you now must filter on the "is deleted" metadata. If you want to restore the soft deleted data, you delete the "is deleted" metadata.

Why Soft Deletes?

Pros

The main reason to implement soft deletes is to prevent data loss. Your application isn't physically deleting data, just marking it deleted. Thus, with some work you can recover any soft delete.

If you get a little fancy and implement slowly changing dimension, a form of soft deletion where you store additional metadata about the delete operation, you also get audit capability. For instance, you can see who deleted what and when.

Cons

The main downside of soft deletes is complexity. Your database is now more error prone. For instance, wherever you are querying the data you now must remember to filter the data by something like where is_deleted=1. This may be fine in the application context but it's often overlooked by data analysts working in the analytical context.

Soft deletes also cause issues with foreign keys. If you rely on foreign keys to ensure referential integrity in your database, some rules are invalid with soft deletes.

Lastly, you need more storage. If you don't delete any data your database will be bigger.

How to Soft Delete

There are a couple of ways to perform soft deletes:

  1. Column Method: Add a column to the table you want to add soft deletes to
  2. Table Method: Add a new table where you store the soft deleted data

Example

I am going to use Dolt to show off different soft delete methods. Dolt is a MySQL compatible database so the SQL you see should also work in MySQL. I'm using Dolt through the offline command line interface but you can also use Dolt as a server with any MySQL client.

For this example, I created a single table called employees and populated it with three rows.

$ dolt sql -q "create table employees (id int, last_name varchar(100), first_name varchar(100), primary key(id))"
$ dolt sql -q "insert into employees values (0, 'Sehn', 'Tim'), (1, 'Hendriks', 'Brian'), (2, 'Son', 'Aaron')"
Query OK, 3 rows affected (0.00 sec)
$ dolt add .
$ dolt commit -am "Soft deletion example table"
commit k2fmj43t50eipq9ggkb9hmhcamt1unjg (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Wed Nov 02 12:26:46 -0700 2022

        Soft deletion example table

$

I also made a Dolt commit so I could show off Dolt soft deletes later. Stay tuned.

Hard Delete

For completeness, this is an example of a hard delete.

$ dolt sql -q "delete from employees where first_name='Tim'"
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "select * from employees"
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
| 1  | Hendriks  | Brian      |
| 2  | Son       | Aaron      |
+----+-----------+------------+

Praise be! I hated my cofounders anyway.

Column Method

In the column method, you add an is_deleted column and set it to true when a column is deleted. Everywhere else in your application where you query that table, you filter using where is_deleted = 0.

First we'll add a is_deleted column with a default value of 0. Note, tinyint is the best option for booleans in MySQL.

$ dolt sql -q "alter table employees add column is_deleted tinyint default 0"
$ dolt sql -q "select * from employees"                                         
+----+-----------+------------+------------+
| id | last_name | first_name | is_deleted |
+----+-----------+------------+------------+
| 0  | Sehn      | Tim        | 0          |
| 1  | Hendriks  | Brian      | 0          |
| 2  | Son       | Aaron      | 0          |
+----+-----------+------------+------------+

Now when I delete, I don't issue a delete query. I issue an update query instead.

$ dolt sql -q "update employees set is_deleted=1 where first_name='Tim'"
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Lastly, when I query the employees table, I have to remember to filter on is_deleted.

$ dolt sql -q "select * from employees where is_deleted=0"
+----+-----------+------------+------------+
| id | last_name | first_name | is_deleted |
+----+-----------+------------+------------+
| 1  | Hendriks  | Brian      | 0          |
| 2  | Son       | Aaron      | 0          |
+----+-----------+------------+------------+

If I forget to do that, I get bad data.

$ dolt sql -q "select * from employees"                   
+----+-----------+------------+------------+
| id | last_name | first_name | is_deleted |
+----+-----------+------------+------------+
| 0  | Sehn      | Tim        | 1          |
| 1  | Hendriks  | Brian      | 0          |
| 2  | Son       | Aaron      | 0          |
+----+-----------+------------+------------+

In this trivial example, it is easy to notice the is_deleted column but you can imagine with really wide tables, the is_deleted column would be easy to miss.

Table Method

In the table method, you add a table with the same schema as the table you want to soft delete from. Then you create a trigger to insert the data you delete from the target table into the deleted table.

First let's create the table. For this we'll use CREATE TABLE...LIKE which copies the schema but not the data of the target table.

$ dolt sql -q "create table deleted_employees like employees"                                                                                                 
$ dolt sql -q "describe deleted_employees"                                                                                                              
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | int          | NO   | PRI | NULL    |       |
| last_name  | varchar(100) | YES  |     | NULL    |       |
| first_name | varchar(100) | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+

$ dolt sql -q "select * from deleted_employees"                                                                                                              

$

As you can see, the deleted_employees table is empty as intended. Now, we create an ON DELETE trigger that inserts the record into the deleted_employees table whenever the record is deleted from the employees table.

$ dolt sql -q "create trigger soft_delete 
    after delete on employees 
    for each row 
    insert into deleted_employees values (old.id, old.last_name, old.first_name)"
$

Now, when I issue a delete on the employees table, the deleted data is preserved in deleted_employees.

dolt sql -q "delete from employees where first_name='Tim'"                                                               
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "select * from  employees" 
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
| 1  | Hendriks  | Brian      |
| 2  | Son       | Aaron      |
+----+-----------+------------+

$ dolt sql -q "select * from  deleted_employees"            
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
| 0  | Sehn      | Tim        |
+----+-----------+------------+

The issue with this approach is when you want to make a schema change to the employees table, say add a column, you must make that schema change in three places: the original employees table, the deleted_employees table, and the trigger.

Tool Options

Lots of developer tools that help with database interactions, have patterns or how tos on soft deletes. Here are some articles about how to soft delete in Jmix, SpringJPA, Bun, Metabase, Lighthouse PHP, Django and Laravel. If you're interested in doing soft deletes, check if your application stack provides tools to help. Soft deletes are a very common pattern.

Immutable Databases

There is a class of databases where every delete is a soft delete: immutable databases. In this article, I surveyed the immutable database space. Today's article focused exclusively on SQL databases. There is only one immutable SQL database: Dolt.

Dolt

Dolt

Tagline
It's Git for Data
Initial Release
August 2019
GitHub
https://github.com/dolthub/dolt

In Dolt, if I've created a Dolt commit, hard deletes are soft deletes behind the scenes. Returning to the above hard delete example, to undo a delete I just made all I need to do is dolt reset --hard.

$ dolt reset --hard
$ dolt sql -q "select * from employees"
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
| 0  | Sehn      | Tim        |
| 1  | Hendriks  | Brian      |
| 2  | Son       | Aaron      |
+----+-----------+------------+

Dolt is conceptually modeled after Git. There is all manner of rollback options in a version controlled database just like in Git.

We're biased but we think if you want soft deletes in your SQL database, Dolt is the best choice. You get all the benefits of soft deletes like data loss prevention, disaster recovery, and audit with none of the downsides of the traditional soft delete solutions. Dolt is just better for databases that need soft deletes.

Curious to learn more? Come by our Discord or just go try deleting some data yourself. Dolt is free and open source and we have plenty of open databases on DoltHub for you to try soft deletes on. Clone a copy and try and break one with a delete. We dare you!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.