Working with JSON in Dolt

REFERENCE
6 min read

Document databases like MongoDB rose to prominence because of their ability to store data in a hierarchical format like Javascript Object Notation, aka JSON. Some applications just make more sense when you persist the JSON your application is using in the database. JSON is a lot more flexible. You can change the schema without making a schema change to your database.

Relational SQL databases like MySQL responded to customer demand for storing document style data with the JSON column type and a bunch of associated functions for working with JSON. In theory, JSON-typed columns and associated functions provided all the benefits of a relational database while adding the flexibility of document style data in columns where you wanted it.

Here at Dolt, we aspire to be 100% MySQL compatible. We also support the JSON column type and a subset of the associated JSON functions. If you need a function we don't support yet, please submit a GitHub issue. So, you can use Dolt to store and version JSON objects.

A few people showed up recently on our Discord and asked how to use Dolt to store JSON objects. This blog article dives into that question.

JSON Example

Do you really want JSON?

The first thing you have to decide is whether you actually want JSON in your database. Alternatively, you model the data stored in your JSON relationally (ie. in multiple related tables). In your application, you disassemble the data into tables on write and reassemble the data into JSON on read. There are a bunch of applications that help with this like GraphQL.

This sounds like a lot of work and it is. Relationally structured data has a lot of built in type and shape constraints that can enhance data quality.

The trade offs are akin to the choice between strongly typed languages like Go or Java versus weakly typed languages like Python or Javascript. In weakly typed languages, you can get started quickly because you don't have to worry about types. As the code gets big or more people work on it, there are no type constraints so it's easier to make mistakes.

A similar trade off is made when choosing document style databases over relational databases. In document databases, you get started easily but you have no schema so the stored JSON document can contain anything. Thus, your database users, including applications, need to be prepared for anything. As the database gets bigger, the permutations of JSON stored can get out of hand. Relational databases have strong schema: tables, types, keys and constraints to mitigate this problem. But changing database schema is often a pain.

Moreover, in Dolt, the versioning primitive is tables. Currently, if you have a 1,000 value JSON object and you change one value, the diff in Dolt will be the entire object, not the single value. In storage, we do a little better than this. We chunk the JSON object up so each version on disk may share some chunks. Obviously, if there is demand for better JSON diff capabilities, we can build them but currently Dolt handles relational data better than JSON data.

I really want JSON!

OK. We'll stop trying to talk you out of it. There are valid use cases for JSON columns in relational databases.

If you use JSON in a relational database like Dolt, you can kind of have the best of both worlds, you can mix relational data with JSON data. Descriptive data is stored in columns and complicated hierarchical data is stored in a JSON column in the same table.

JSON is often used to flatten one to many relationships into a single table. For instance, you may have a JSON-typed column called categories in a product table. This allows a product to be assigned to 0 to N catagories by having a varying sized JSON object. In traditional SQL, this would require a second two column table with a product id, category structure. Flattening the data into a single table requires a JOIN query.

The way to add JSON to your Dolt database is to make a table containing a column with type JSON. Once the table is created you add JSON data using INSERT queries, modify JSON data using UPDATE queries, and delete JSON data using DELETE queries. When inserting or updating JSON columns, the database even parses the JSON to make sure it is valid. Your JSON column is functionally identical to columns of other types in a relational database.

If you want to do anything more complicated with your JSON objects, you use a built in function. For instance you can query a specific field using JSON_EXTRACT using JsonPath syntax.

Just show me.

For this example, I am going to assume you have a newly created database named json_example. If you need help creating a database and getting a MySQL client connected to it, see our Getting Started, database edition.

In the example, I am going to create a books table with standard information like title and author in regular columns and categories as a hierarchical JSON column. There can be zero to many categories. Sub categories are represented as an array of categories keyed by a category.

I pushed the end result database to DoltHub if you want to check it out.

Create a table with a JSON type column

First, we need to create a table with a JSON column. We use a standard CREATE TABLE statement to create tables in Dolt.

mysql> create table books (
    id int, 
    title varchar(100), 
    author varchar(100), 
    categories json, 
    primary key(id)
);
mysql> describe books;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | int          | NO   | PRI | NULL    |       |
| title      | varchar(100) | YES  |     | NULL    |       |
| author     | varchar(100) | YES  |     | NULL    |       |
| categories | json         | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+

Now, we'll make a Dolt commit to persist our work.

mysql> call dolt_commit('-am', 'Created books table with json column');
+----------------------------------+
| hash                             |
+----------------------------------+
| 4tr1t678ff94462n4mhf7plelgma49m8 |
+----------------------------------+

Insert some books

Now we'll insert a couple books, specifying the categories json object.

mysql> insert into books values (
    0, 
    'The Tipping Point: How Little Things Can Make a Big Difference', 
    'Malcolm Gladwell',
    '["Non fiction", {"Business": ["Marketing", "Sales"]}]'
);
mysql> insert into books values (
    1, 
    'The Grapes of Wrath', 
    'John Steinbeck', 
    '["Fiction", "Politics", {"Classics": ["Great American Novel"]}]'
);
mysql> insert into books values (
    2, 
    'The Odyssey', 
    'Homer', 
    '["Fiction", "Poetry", {"Classics": ["Ancient"]}]'
);
mysql> insert into books values (
    3, 
    'The Iliad', 
    'Homer', 
    '["Fiction", "Poetry", {"Classics": ["Ancient"]}]'
);

Let's make sure I did everything correctly and then make a commit.

mysql> select * from books;
+----+----------------------------------------------------------------+------------------+-----------------------------------------------------------------+
| id | title                                                          | author           | categories                                                      |
+----+----------------------------------------------------------------+------------------+-----------------------------------------------------------------+
| 0  | The Tipping Point: How Little Things Can Make a Big Difference | Malcolm Gladwell | ["Non fiction", {"Business": ["Marketing", "Sales"]}]           |
| 1  | The Grapes of Wrath                                            | John Steinbeck   | ["Fiction", "Politics", {"Classics": ["Great American Novel"]}] |
| 2  | The Odyssey                                                    | Homer            | ["Fiction", "Poetry", {"Classics": ["Ancient"]}]                |
| 3  | The Iliad                                                      | Homer            | ["Fiction", "Poetry", {"Classics": ["Ancient"]}]                |
+----+----------------------------------------------------------------+------------------+-----------------------------------------------------------------+
mysql> call dolt_commit('-am', 'Insert 4 books');
+----------------------------------+
| hash                             |
+----------------------------------+
| pknliaea8kt80hniomvcrm3bgq304rbc |
+----------------------------------+

Retrieve the categories

Retrieving the JSON object is now a simple select query.

mysql> select id, categories from books where author='Homer';
+----+--------------------------------------------------+
| id | categories                                       |
+----+--------------------------------------------------+
| 2  | ["Fiction", "Poetry", {"Classics": ["Ancient"]}] |
| 3  | ["Fiction", "Poetry", {"Classics": ["Ancient"]}] |
+----+--------------------------------------------------+

Use JSON_EXTRACT() to group fiction and non-fiction

Let's assume you know that the first dimension in the categories array stored in your JSON object is always "Fiction" or "Non fiction". JSON_EXTRACT() allows allows you to access specific elements in your JSON object using JSON Path syntax.

mysql> select title, author, json_extract(categories,"$[0]") as fict from books order by fict;
+----------------------------------------------------------------+------------------+---------------+
| title                                                          | author           | fict          |
+----------------------------------------------------------------+------------------+---------------+
| The Grapes of Wrath                                            | John Steinbeck   | "Fiction"     |
| The Odyssey                                                    | Homer            | "Fiction"     |
| The Iliad                                                      | Homer            | "Fiction"     |
| The Tipping Point: How Little Things Can Make a Big Difference | Malcolm Gladwell | "Non fiction" |
+----------------------------------------------------------------+------------------+---------------+

Make an update

Let's remove the "Politics" category from "Grapes of Wrath". I tend to believe Grapes of Wrath is a commentary on the ills of Great Depression era capitalism. But labelling it "Politics" could be a stretch.

I update JSON columns with an update query.

mysql> update books set categories='["Fiction", {"Classics": ["Great American Novel"]}]' where title="The Grapes of Wrath";
mysql>

Now let's check and make sure it worked with a dolt_diff query.

mysql> select from_categories, to_categories from dolt_diff_books where to_commit='WORKING';
+-----------------------------------------------------------------+-----------------------------------------------------+
| from_categories                                                 | to_categories                                       |
+-----------------------------------------------------------------+-----------------------------------------------------+
| ["Fiction", "Politics", {"Classics": ["Great American Novel"]}] | ["Fiction", {"Classics": ["Great American Novel"]}] |
+-----------------------------------------------------------------+-----------------------------------------------------+

As you can see, the diff is the whole value of the JSON object. If this categories was organized as another table with id and category, I would have seen a delete of that specific category for id 1.

And finally a commit to save my work.

mysql> call dolt_commit('-am', 'Updated Grapes of Wrath Categories');
+----------------------------------+
| hash                             |
+----------------------------------+
| c4h9bh7dbnip9urvnfe7es0ab8niljda |
+----------------------------------+

Conclusion

As you can see, Dolt allows you to work with JSON columns and functions just like MySQL. Additionally, you get Dolt's versioning capabilities for those JSON columns. Questions? Come ask about JSON types on our Discord.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.