Using Dolt with ORMs

REFERENCE
14 min read

Object-Relational Mappers (ORMs) are a cornerstone of modern application development. They provide a domain-friendly interface for working with relational databases, letting developers define models in their chosen programming language and then automatically translating those into SQL queries. By abstracting away SQL boilerplate, ORMs can increase productivity, reduce errors, and let developers focus on their higher-level application code instead of spending time writing SQL queries.

Dolt is a MySQL-compatible relational database that combines the features of MySQL with the features of Git, to give you the world’s first version-controlled, relational database. Dolt lets you branch, merge, diff, and work with your SQL tables in all the same ways that Git lets you work with files.

In this post, we’re taking a look at how to use Dolt with ORMs and highlighting some specific Dolt features that can be helpful, along with a couple of gotchas to know about.

ORM Benefits#

ORMs are popular because they take over the responsibility for database interactions in an application by creating SQL queries and unmarshalling results from the database into model objects. Beyond that, there’s a lot of variance in what ORMs provide. Some ORMs are super lightweight without providing extra functionality, while others come bundled in a full-featured framework that provides all sorts of other features such as pagination support, advanced connection pooling, and lazy/eager data loading.

Generally, you can expect most ORMs to provide:

  • Model-First Development: Define your entities in code. Use classes, structs, or structs with annotations, and let the ORM handle mapping to tables and columns. This removes a lot of marshalling/unmarshalling boilerplate code from your application.
  • Migrations & Schema Evolution: Evolve your schema safely over time, alongside your application code, using ORM migration tooling. ORMs can generate migration scripts and apply those migration scripts when deploying new versions.
  • Query Building: Build database queries using fluent, type-safe query construction instead of authoring raw SQL strings.

It’s worth noting that not every database application needs to use an ORM. ORMs provide a lot of helpful functionality, but there are valid reasons why a developer may prefer to keep tight control over the SQL queries and query results processing, instead of using an ORM.

Dolt Compatibility with ORMs#

Dolt is MySQL-compatible, although it doesn’t contain any MySQL code in its implementation. We wrote Dolt from the ground up so that we could use a unique data storage engine which enables extremely efficient diff computation. Because Dolt is MySQL-compatible, you can use any tool, framework, or ORM that works with MySQL to interact with Dolt. As part of our MySQL-compatibility testing, we’ve tested over a dozen ORMs in various languages to make sure they work well with Dolt. (If you’re looking for a version-controlled Postgres database, check out Doltgres, another database we’ve been building.)

At the time of writing this post, here are the ORMs we’ve tested. Each one has a link to a blog article with a walkthrough of using that ORM with Dolt, as well as sample code in a linked repository:

Tricks and Tips for Using Dolt with an ORM#

ORMs are designed around certain assumptions: a single, stable schema for your tables, traditional schema migrations, and predictable introspection capabilities. Dolt expands what’s possible with data, and challenges some of those assumptions. The following sections share tips and tricks for using Dolt effectively with ORMs.

Using ORMs with Historical Schemas#

One of the primary benefits of Dolt is that you can go back in time and view your data at any point where you’ve made a Dolt commit. This is a powerful feature, but can present a challenge for ORMs that expect data to match the schema of your current model definitions. If you go back in time, specifically across schema changes, then your tables’ schemas won’t match what the ORM is expecting from your model.

Fortunately, Dolt provides a solution for this. Schema overriding allows you to pin your schema to a specific commit, then when you query historical data, it is automatically mapped to that target schema. This feature was designed explicitly to enable ORMs to more easily access historical data when the data’s schema has changed over time. By setting the session variable @@dolt_override_schema, all queries in that session will behave as if the database schema were the one specified, even if the underlying data comes from a commit with a different schema.

Using this feature is easy. Let’s take it for a spin…

First, create a new directory, initialize it as a Dolt database, and then launch a SQL shell:

mkdir dolt-schema-override-demo && cd dolt-schema-override-demo
dolt init
dolt sql

Next, let’s create some sample data and then extend the schema in the most recent commit:

-- Create the "old" schema (no Birthdate) and seed data
CREATE TABLE people (
    Id   INT NOT NULL PRIMARY KEY,
    Name TEXT NOT NULL
);

INSERT INTO people (Id, Name) VALUES
    (1, 'Frank'),
    (2, 'Columbia'),
    (3, 'Dr. Scott');

-- Stage all tables (-A) and commit 
CALL DOLT_COMMIT('-Am', 'Create people table (Id, Name) and seed data');

-- Tag this version so we can reference this point in our history later
CALL DOLT_TAG('v1.0');

-- Evolve the schema: add a new column and commit it
ALTER TABLE people ADD COLUMN Birthdate DATE NULL;
INSERT INTO people (Id, Name, Birthdate) VALUES (4, 'Magenta', '1981-02-16 06:00:00');
CALL DOLT_COMMIT('-Am', 'Add Birthdate column to people');

Now that we’ve set up our database, we’re ready to use the schema override feature. First let’s take a look at the problem. Dolt lets you query historical data in many ways. You could use a branch that has an older schema version, or you could check out a tag that points to an older commit, or you could use the AS OF syntax to reference an older commit. Those give us easy ways to look back in history and see how our data existed at previous points, but remember that our ORM is still using a static model to deserialize results and it’s expecting the schema to match that static model exactly. In the query below, the ORM’s static model knows about the Name and Birthdate fields, but the Birthdate field doesn’t exist at v1.0.

-- this column doesn't exist on olderData's schema
SELECT Name, Birthdate FROM people AS OF 'v1.0';

The query above fails, because the commit we’re referencing doesn’t contain the Birthdate field:

column "Birthdate" could not be found in any table in scope

To fix this, we just need to set the @@dolt_override_schema session variable with a commit reference that contains the schema we want to map to. In our case, we’re going to use the latest schema from main, since that’s what matches the static model our ORM is using.

-- Override schema to use the schema at the tip of the main branch. 
-- The hashof() function resolves the tag to a commit hash. 
SET @@dolt_override_schema = 'main';

-- Now the same query succeeds (Birthdate will be NULL for these historical rows)
-- Dolt maps old data to the newer schema and fills in missing columns with NULL.
SELECT Name, Birthdate FROM people AS OF 'v1.0';

With the @@dolt_override_schema session variable set, we can now execute that query without an error, because the data is mapped to the schema at the tip of main and fills in NULL values for the new Birthdate field. Notice in this example that the data we’re using here comes from v1.0 (because of the AS OF 'v1.0' clause we used), but the schema comes from the tip of main (because @@dolt_override_schema is set to main).

+-----------+-----------+
| Name      | Birthdate |
+-----------+-----------+
| Frank     | NULL      |
| Columbia  | NULL      |
| Dr. Scott | NULL      |
+-----------+-----------+
3 rows in set (0.00 sec) 

Schema overriding has some limits, but it’s a handy feature to map older data into a different schema.

Using ORMs with Non-Versioned Data#

Dolt allows you to create isolated branches where you can stage changes or run experiments. These branches are isolated and don’t interfere with other branches. But what happens when you have data that needs to be accessible across all branches and shouldn’t be versioned? One example of a good candidate is a table that tracks all visits to a site. If this data was versioned, then you could run into spurious merge conflicts when merging between branches that have both updated the table. For rapidly growing data like this, it could also generate noise in diffs and history logs that could make other data changes harder to see. If your app has data like this, where there’s really no need to ever go back and see the historical versions, or audit how the data changed, then it’s a good candidate for using Dolt’s nonlocal table support.

Nonlocal tables let you configure tables to behave like global tables and be accessible from any branch, without being included in versioning. To use nonlocal tables, define a dolt_nonlocal_tables configuration that maps certain table names to a target reference (like a branch). From that point on, queries for those tables will resolve to the designated reference across all branches.

Let’s test out nonlocal tables. First, create a new directory, initialize it as a Dolt database, and launch a SQL shell:

mkdir dolt-nonlocal-demo && cd dolt-nonlocal-demo
dolt init
dolt sql

Next, create the nonlocal table. In this example, we have a table that contains global configuration, and we want that configuration data to be available to all branches and not versioned with our data. We use a global_ prefix for the table name, which isn’t required, but is a good pattern to make it more obvious that this table holds global data and is accessible from all branches.

CREATE TABLE global_app_config (
  config_key   VARCHAR(100) NOT NULL PRIMARY KEY,
  config_value VARCHAR(100) NOT NULL
);

INSERT INTO global_app_config VALUES
  ('theme', 'light'),
  ('feature_x_enabled', 'false');

Now we need to tell Dolt that the new global_app_config table should be a nonlocal table by inserting a new row into the dolt_nonlocal_tables system table. Nonlocal tables are also “ignored” by default (meaning Dolt won’t version them), but it’s good practice to also explicitly add them to the dolt_ignore system table. Note also that after we make changes to dolt_nonlocal_tables and dolt_ignore, we need to stage those changes and commit them.

INSERT INTO dolt_nonlocal_tables (table_name, target_ref, options)
    VALUES ('global_*', 'main', 'immediate');

-- Optional but recommended: ensure these tables never get staged/committed by accident
INSERT INTO dolt_ignore (pattern, ignored) VALUES ('global_*', true);

-- Commit the configuration so *new branches inherit it*
CALL DOLT_ADD('dolt_nonlocal_tables', 'dolt_ignore');
CALL DOLT_COMMIT('-m', 'Configure global_* as nonlocal tables on main (immediate), and ignore them');

We’ve got a nonlocal table configured now, so we can test out how it works. We’re going to check out a new branch, make some changes to global_app_config, then verify that those changes are visible from all branches.

CALL DOLT_CHECKOUT('-b', 'feature/ui');

UPDATE global_app_config
    SET config_value = 'true'
    WHERE config_key = 'feature_x_enabled';

Here is the updated data in our global_app_config table:

SELECT * FROM global_app_config ORDER BY config_key;
+-------------------+--------------+
| config_key        | config_value |
+-------------------+--------------+
| feature_x_enabled | true         |
| theme             | light        |
+-------------------+--------------+
2 rows in set (0.00 sec) 

If we look at dolt_status, we see those changes to global_app_config are not listed as uncommitted changes, since this table isn’t included in versioning.

SELECT * FROM dolt_status;
Empty set (0.00 sec)

Finally, let’s switch to a different branch, and confirm that the data we see in global_app_config matches what we updated in the other branch:

CALL DOLT_CHECKOUT('main');

SELECT * FROM global_app_config ORDER BY config_key;
+-------------------+--------------+
| config_key        | config_value |
+-------------------+--------------+
| feature_x_enabled | true         |
| theme             | light        |
+-------------------+--------------+
2 rows in set (0.00 sec) 

Connecting to Branches#

Dolt allows you to reference a specific branch in a database by using syntax of the form <db_name>/<branch_name>. This is useful in queries (e.g. you can read from a different branch by specifying this syntax: SELECT * FROM mydb/mybranch.mytable;), and is particularly useful in database connection strings. If you want all your database connections to automatically connect to a specific branch, you can use this syntax in the connection string, and then not have to worry about calling dolt_checkout(), which also saves you a round trip to the database. This also makes it easy for you to move all your app’s connections to a different branch during testing.

How you specify that database connection string varies by the client you’re using. The general, MySQL-compatible form is:

user:password@tcp(localhost:3306)/mydb/mybranch

Here’s what it looks like in Python, with SQLAlchemy:

engine = create_engine(
    "mysql+pymysql://user:password@localhost:3306/mydb/mybranch"
)

And here’s an example with .NET’s Entity Framework Core:

options.UseMySql(
    "server=localhost;user=user;password=password;database=mydb/mydb",
    ServerVersion.AutoDetect(connectionString)
);

For more information about how to specify a branch-specific connection string in different ORMs, check out the blogs and sample code we provide for various ORMs.

Using Connection Pooling#

ORMs generally use a connection pool to avoid the costly overhead of repeatedly establishing new connections to your database. When a connection is released back to the pool, the pool is responsible for sending a message to the database over that connection to reset its state. This ensures that no session state leaks between threads, for example if one session assigned session variables with customer information, you wouldn’t want another thread to be able to read that other customer’s private data just because it’s reusing the same connection. Unfortunately, not all connection pools will send this reset session request. The Go MySQL driver maintainers won’t send this message when a connection is returned to a pool because they worry it will affect performance (although a concurrent process could reset the session asynchronously and make the performance impact minimal). For Dolt, this can be a bigger problem, because the branch you have checked out is also part of the session state. This means one thread could grab a connection, change its branch, then return it to the pool for another unsuspecting thread to pick up and be unaware that it’s not pointing at the branch it expects.

There are a few ways to deal with this. Some applications won’t need to switch branches, so they don’t need to worry about this concern. For applications that do need to work with multiple branches there are two approaches we recommend to customers:

  • Use one connection pool per branch – if your application only works with a small number of branches, then you can have a connection pool dedicated to each branch, using the branch connection approach below.
  • Run dolt_checkout() at the start of your logic – if your application dynamically creates branches, then your best bet may be to ensure you call dolt_checkout() at the start of your logic to ensure you’re on the expected branch. Alternatively, you can run SELECT active_branch() to sanity check what branch you’re on. Note that the behavior here is somewhat specific to the ORM you are using. Some ORMs let you have a connection and work with it over multiple queries until you explicitly release it, while other ORMs will automatically select a connection for you and may still choose a different connection in the pool for two queries that execute next to each other in your application logic.

Reflection and Working with Dolt’s System Tables#

Some ORMs rely heavily on schema reflection (sometimes called introspection) to understand the structure of a database at runtime. Reflection allows an ORM to discover tables, columns, types, and relationships dynamically, rather than requiring everything to be defined statically up front. Dolt supports schema reflection via information_schema, just like MySQL does. This means ORMs can reflect on tables in Dolt just as they would in MySQL.

In ORMs that support reflection, it can also be used to reflect on system tables, including Dolt system tables that expose version-control features and metadata (e.g. commit logs, diffs, branch metadata). The best approach to query these system tables depends on the capabilities of the ORM that you’re using.

In Python, for example, SQLAlchemy supports using reflection to dynamically construct table model definitions. Here’s an example with SQLAlchemy for querying the dolt_log system table, which shows you the commit history for your current branch. Notice that we don’t have to declare the structure of the dolt_log system table. Instead, we can specify autoload_with=engine to get SQLAlchemy to use reflection to discover the table’s schema.

    # Instantiate the dolt_log table. Using "autoload_with=engine" causes SQLAlchemy to 
    # load the column definitions from the database itself, using reflection.  
    dolt_log = Table("dolt_log", MetaData(), autoload_with=engine)
    stmt = select(dolt_log.c.commit_hash,
                  dolt_log.c.committer,
                  dolt_log.c.message
                  ).order_by(dolt_log.c.date.desc())

    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            # Process results... 

Other ORMs, like .NET’s Entity Framework Core, rely on static models and less on reflection. Since the dolt_log system table has a static schema, you could define a model for it, then use the standard ORM tooling to query and process results using that model type.

Here’s a sample model for the fields our app needs from the dolt_log system table:

public class DoltCommit
{
    public string CommitHash { get; set; }
    public string Committer { get; set; }
    public string Message { get; set; }
    public DateTime CommitDate { get; set; }
}

We have to register this model with the framework so it knows it’s associated with the dolt_log table:

using Microsoft.EntityFrameworkCore;

public class AppDbContext : DbContext
{
    public DbSet<DoltCommit> DoltLog { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder
            .Entity<DoltCommit>()
            .HasNoKey()
            .ToTable("dolt_log");

        base.OnModelCreating(modelBuilder);
    }
}

From here, we can use that model to query the dolt_log system table:

using (var context = new AppDbContext())
{
    var recentCommits = context.DoltLog
        .OrderByDescending(c => c.CommitDate)
        .Take(10)
        .ToList();

    foreach (var commit in recentCommits)
    {
        Console.WriteLine($"{commit.CommitHash}: {commit.Message}");
    }
}

Finally, there’s always an escape hatch if you really need to execute raw SQL without using a model. For example, you may want to execute a stored procedure, like dolt_reset() or call a function. ORMs almost always provide a way to access a connection and execute a SQL statement directly.

For more examples and other ORMs, check out our integration documentation to find blogs and source code.

Conclusion#

ORMs can make it easier to build an application that works with a database, although they can also bring in some complexity, so understanding how they work is critical. Dolt is MySQL-compatible, so any ORM that works with MySQL will work with Dolt. We’ve personally tested over a dozen ORMs with Dolt, and documented them in blog posts and with sample code. Dolt features like schema overrides and nonlocal tables can help make it easier to use Dolt’s versioning features with ORMs.

If you want to discuss the ins and outs of ORMs, suggest another ORM for us to test out, or if you’re just curious about version-controlled, relational databases, then stop by the DoltHub Discord and say hello!

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.