Django and Dolt

7 min read

Dolthub was started in 2018 to create a place on the internet to access interesting, maintained data. That vision drove us to build Dolt, a versioned, syncable data format. It's "Git for Data". This year we launched data bounties as the logical continuation of Dolt's data collaboration features. We believe in Dolt as the future of open data communities, but along the way it's become capable of so much more.

Today Dolt is a fully-featured relational database that's MySQL compatible and is 96.6% SQL compliant. More than a data format, it's able to host applications and give them super powers. Dolt is the only SQL database you can branch, merge, diff, push and pull. As we continue to expand its abilities, we're exploring the range of application integrations that are possible. Today we'll demo how to build Django apps with Dolt.

Django

Django is a well known and widely used python web framework. Among its most popular features is its Object-Relational Mapping (ORM). The ORM allows users to model application domain objects as python classes and have Django do the work of creating a database schema and serializing objects to and from the database. Django supports the most common SQL databases through a common API. Using Django's MySQL database backend, you can plug-and-play with Dolt and things just work! To get things up and running we'll use Django's recommended MySQL client package mysqlclient, and configure our settings.py file as follows:

DATABASES = {
    "default": {  # dolt
        "NAME": "my_database",
        "USER": "my_user",
        "PASSWORD": "my_db_password",
        "HOST": "127.0.0.1",
        "PORT": 3306,
        "ENGINE": "django.db.backends.mysql",
    }
}

Next we'll configure Dolt to run as the application database with a my_config.yaml file:

log_level: warning

behavior:
  read_only: false

user:
  name: "my_user"
  password: "my_db_password"

databases:
  - name: "my_database"
    path: "."

listener:
  host: localhost
  port: 3306
  max_connections: 128
  read_timeout_millis: 28800000
  write_timeout_millis: 28800000

Finally, we'll initialize the database and start the SQL server. Note that we export DOLT_ENABLE_TRANSACTIONS=true to turn on SQL transactions in Dolt. Transactions support is currently in beta and will be fully released in the coming weeks.

% export DOLT_ENABLE_TRANSACTIONS=true
% cd my-app-database
% dolt init
% dolt sql-server --config my_config.yaml

Our database is up and running, and we're ready to write our Django app!

Django Models

Django Models are the foundation of Django's ORM. They allow developers to work purely in an object-oriented space by abstracting away database interactions. Let's look at an example from the Django documentation:

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)

This defines a Python class that will be mapped to the following database table:

CREATE TABLE my_app_person (
    "id" int NOT NULL PRIMARY KEY AUTO_INCREMENT,
    "first_name" varchar(30) NOT NULL,
    "last_name" varchar(30) NOT NULL
);

It also creates a Django Migration that will add this table to the database's schema:

import uuid
from django.db import migrations, models

class Migration(migrations.Migration):

    initial = True

    dependencies = []

    operations = [
        migrations.CreateModel(
            name="Person",
            fields=[
                ("id", models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False, unique=True),
                ("first_name", models.CharField(max_length=30)),
                ("last_name", models.CharField(max_length=30)),
            ],
        ),
    ]

After a model is defined and its migration is run against the database, users can interact with persisted models using only class methods:

from my_app.models import Person

Person(first_name="Andy", last_name="Arthur").save()  # inserts a row into the database
aa = Person.objects.get(last_name="Arthur")  # reads a model from the database

Dolt Models in Django

To take full-advantage of the power of Dolt we need to expose its version control (VCS) features through the Django ORM. In a traditional SQL context, branch, commit and merge are accessed via system tables and dolt-specific SQL functions. The dolt_branches table contains a row for each branch in the database. dolt_log contains a row for each commit in the database, ordered by a topological traversal of the commit graph. Exposing this functionality in the ORM is relatively straight forward:

from django.db import models

class Branch(models.Model):
    """ Expose the `dolt_branches` system table """
    name = models.TextField(primary_key=True)
    hash = models.TextField()
    latest_committer = models.TextField()
    latest_committer_email = models.TextField()
    latest_commit_date = models.DateTimeField()
    latest_commit_message = models.TextField()

    class Meta:
        managed = False
        db_table = "dolt_branches"
        verbose_name_plural = "branches"


class Commit(models.Model):
    """ Expose the `dolt_log` system table """
    commit_hash = models.TextField(primary_key=True)
    committer = models.TextField()
    email = models.TextField()
    date = models.DateTimeField()
    message = models.TextField()

    class Meta:
        managed = False
        db_table = "dolt_log"
        verbose_name_plural = "commits"

One wrinkle is that Dolt system tables exist from the time the database is initialized. Inside each model's Meta class, we set managed = false and db_table = "..." to tell Django that this model represents an existing table and doesn't need a migration.

New branches in Dolt can be created in SQL by inserting new rows into the dolt_branches system table. In the Django ORM, this looks like Branch(name="my-new-branch").save(). When the save() method is called, Django knows how to take the provided data and convert it into an INSERT query. Creating new commits in SQL works a little differently, so we need to override the save() method for the Commit model.

from django.db import models, connection

class Commit(models.Model):
    ...

    def save(self, *args, **kwargs):
        with connection.cursor() as cursor:
            cursor.execute(f"SELECT dolt_commit(--all', --message', '{self.message}');")
            ch = cursor.fetchone()[0]
        return Commit.objects.get(commit_hash=ch)

While we're at it, we'll expose a few functions on the Branch model in order to make use of Dolt's VCS features in the ORM:

from django.db import models, connection

class Branch(models.Model):
    ...
    
    @staticmethod
    def active_branch():
        with connection.cursor() as cursor:
            cursor.execute("SELECT active_branch();")
            return cursor.fetchone()[0]

    def checkout_branch(self):
        with connection.cursor() as cursor:
            cursor.execute(f"""SELECT dolt_checkout('{self.name}') FROM dual;""")

    def merge(self, merge_branch):
        with connection.cursor() as cursor:
            cursor.execute(f"""SELECT dolt_merge('{merge_branch}') FROM dual;""")

With these methods in place, we have all the primitives needed to write a versioned Django application. Let's walk through an example of creating models on different Dolt branches and merging them back together:

from my_app.models import Person, Branch, Commit

# Start by adding a new Person and committing the change
Person(first_name="Andy", last_name="Arxthur").save()
Commit(message="added Andy").save()

# Now create a new branch, check it out, and make some changes
Branch(name="other-branch").save().checkout()
Person(first_name="Zach", last_name="Musgrave").save()
aa = Person.objects.get(last_name="Arxthur")
aa.last_name = "Arthur"  # fix typo
aa.save()
Commit(message="added Zach, fixed typo").save()

# Switch back to the master branch and make more changes
Branch.objects.get(name="master").checkout()
Person(first_name="Daylon", last_name="Wilkins").save()
aa = Person.objects.get(last_name="Arxthur")
aa.first_name = "Andrew"  # update
aa.save()
Commit(message="added Daylon, update to legal first name").save()
 
# Merge the changes from 'other-branch' into master
Branch.objects.get(name=Branch.active_branch()).merge("other-branch")
Commit(message="merged 'other-branch' into master").save()
list(Person.objects.order_by("last_name"))
[
    Person<"Andrew Arthur">, 
    Person<"Zach Musgrave">, 
    Person<"Daylon Wilkins">,
]

Dolt's unique data model allows us to perform cell-wise merges. Both branches touched Person<"Andy Arxthur> and Dolt merged the updates seamlessly.

Conclusion

Using Django on top of Dolt is a new use case for us. At present, it requires a bit of leg-work from developers, but real-world Django apps are already being developed against Dolt. Nautobot is a network source-of-truth application that automates network provisioning and maintenance. A plugin to run Nautobot on top of Dolt is currently being built. Adding branch and merge features to a source-of-truth application allows changes to be carefully reviewed before applying them to production data. Providing these features in the database solves an entire class of problems with relatively little work from application authors. As these patterns become more mature, we plan to release a Dolt-specific Django library, "Djolt", to package the integration code. We're breaking new ground in the world of database technology and uncovering more utility every day. If you have an idea for integrating Dolt into your application, or if you want to learn more, get in touch with us on Discord!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt