Getting Started: SQLAlchemy and Dolt

REFERENCE
23 min read

Here at DoltHub, we built Dolt, the world's first version controlled SQL database. Dolt is MySQL-compatible so all the tools you're used to using with MySQL work the same way with Dolt, like SQLAlchemy.

SQLAlchemy is a popular Python Object Relational Mapper (ORM) that augments standard Python MySQL connection libraries. The main features are reflection, a query builder, and a mapped object interface for writes. SQLAlchemy is extremely popular. Many Dolt customers have used it as an interface to Dolt to build a version controlled application.

So, we tend to get a lot of "How to use Dolt with SQLAlchemy?" questions on our Discord. Earlier this year, I wrote an article about how to use reflection to time travel across schema changes. But it wasn't enough. This article goes through the Getting Started example from the Dolt README but this time in Python using SQLAlchemy.

Dolt + SQLAlchemy

TLDR; The Code

If you don't want to run through the tutorial and just want the demo.py code to play with, it is available in this GitHub repository. demo.py is runnable if you have a Dolt SQL Server started on port 3306 with a database named sql_alchemy_big_demo initialized.

You must have Git, Dolt, Python, the Python MySQL Connector, and SQLAlchemy installed. Then, in one terminal run:

$ cd ~
$ mkdir sqlalchemy-demo
$ cd sqlalchemy-demo
$ mkdir sql_alchemy_big_demo
$ cd sql_alchemy_big_demo
$ dolt init --fun
Successfully initialized dolt data repository.
$ dolt sql-server
Starting server with Config HP="localhost:3306"|T="28800000"|R="false"|L="info"|S="/tmp/mysql.sock"
2023-07-06T09:21:34-07:00 INFO [no conn] Server ready. Accepting connections. {}

In another terminal run:

$ cd ~/sqlalchemy-demo
$ git clone git@github.com:timsehn/dolt-sqlalchemy-getting-started.git
Cloning into 'dolt-sqlalchemy-getting-started'...
remote: Enumerating objects: 59, done.
remote: Counting objects: 100% (59/59), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 59 (delta 17), reused 52 (delta 16), pack-reused 0
Receiving objects: 100% (59/59), 15.05 KiB | 2.15 MiB/s, done.
Resolving deltas: 100% (17/17), done.
$ cd dolt-sqlalchemy-getting-started
$ python demo.py
Using branch: main
Active branch: main
Resetting to commit: doltu614up273ideb65ntr9r8jl8q1t2
Tables in database:
	employees
	employees_teams
	teams
Created commit: dbbp9lbdprorffhv82ughtr0qndut3mi
...
...
...

You can run the demo script multiple times. It resets Dolt to the init commit and deletes all the branches so it works the same on each subsequent execution. The code shows off table creation, dolt commits, reading dolt system tables using reflection, rollback using dolt reset, branching, and merging all with SQLAlchemy flair.

Install Dolt, MySQL Connector Python, and SQLAlchemy

Dolt is a single ~103 megabyte program.

$ du -h ~/go/bin/dolt
103M	/Users/timsehn/go/bin/dolt

It's really easy to install. Download it and put it on your PATH. We have a bunch of ways to make this even easier for most platforms.

Here is a convenience script that does that for *NIX platforms. Open a terminal and run it.

sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | sudo bash'

For the rest of this blog entry, I'm going to assume you are on a *NIX based system and use bash when interacting on the command line.

I assume you have python and pip installed on your machine. There are many ways to install python and they are fraught with peril. You may not have a MySQL connection library or SQLAlchemy installed. To get the Python packages you need for this article run:

$ pip install mysql-connector-python
$ pip install sqlalchemy

You are all set. You have everything you need.

Create a new Database

Dolt needs a place to store your databases. I'm going to put my databases in ~/sqlalchemy-demo.

% cd ~
% mkdir sqlalchemy-demo
% cd sqlalchemy-demo

For this demo, we need an initialized database named sql_alchemy_big_demo. We're going to initialize the database using the Dolt CLI. If you know Git, Dolt will feel really familiar. Dolt is Git for Data. To initialize a new database on the command line, you run dolt init.

$ mkdir sql_achemy_big_demo
$ cd sql_alchemy_big_demo
$ dolt init --fun
Successfully initialized dolt data repository.

The --fun option makes a special initial commit hash. --fun is optional when using Dolt.

Start a SQL Server for SQLAlchemy to Connect To

You can work with Dolt offline in Git for Data mode or you can work with Dolt online in version controlled database mode. SQLAlchemy expects a SQL database to connect to. So, we need versioned controlled database mode. For that, we need a server.

Dolt ships with a built-in MySQL-compatible server. To start a server, we run dolt sql-server. This starts a MySQL-compatible server on port 3306. User root can connect with no password by default.

$ dolt sql-server
Starting server with Config HP="localhost:3306"|T="28800000"|R="false"|L="info"|S="/tmp/mysql.sock"

The terminal will just hang there. Any Dolt errors will be printed in this terminal. Just leave that terminal open and open a new one.

Connect to the Database

This is where we get to start using Python! I'm going to assume you are running the script using the process described in the TLDR section. In the following sections I'll include Python code snippets and explain what they do and why. I don't expect you to run this code in the Python shell. It's a bit too complicated for that.

So, in SQLAlchemy there is a concept of an engine. This is the "home base" for you database connectivity. You make an engine using the create_engine() function and a connection string. For Dolt you use the MySQL flavor connection string like so:

engine = create_engine(
    "mysql+mysqlconnector://root@127.0.0.1:3306/sql_alchemy_big_demo"
)

You'll notice in the demo script, I've wrapped engine creation in a function called dolt_checkout() that takes a branch name. This is a useful abstraction in SQLAlchemy. If you tack a /<branch name> onto the end of a connection string in Dolt, it connects to that branch. If you don't specify a branch in the connection string, it connects to the default branch, in most cases main. Thus, making the SQLAlchemy engine branch aware allows us to persist branch information across connections.

engine = dolt_checkout('main')

def dolt_checkout(branch):
    engine_base = "mysql+mysqlconnector://root@127.0.0.1:3306/sql_alchemy_big_demo"
    # Branches can be "checked out" via connection string. We make heavy use
    # of reflection in this example for system tables so passing around an
    # engine instead of a connection is best for this example.
    engine = create_engine(
        engine_base + "/" + branch
    )
    print("Using branch: " + branch)
    return engine

Once you've created an engine, it's standard SQLAlchemy to connect and run queries. The first thing we do is print the active branch. This function is a wrapper around the Dolt function active_branch().

print_active_branch(engine)

def print_active_branch(engine):
    stmt = text("select active_branch()")
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        active_branch = rows[0][0]
        print("Active branch: " + active_branch)

This prints:

Using branch: main
Active branch: main

Note, there is also a dolt_checkout() procedure that switches branches for a connection. We chose an engine based approach to switch branches because SQLAlchemy seems to encourage multiple connections in their documentation. Reflection in particular, which we use a lot in this demo, requires an engine not a connection.

Create Tables

Now, it's time to make some tables. In this example, our database will have three tables: employees, teams, and employees_teams. We can define these in the standard SQLAlchemy way: create a Metadata() object, define the tables on the Metadata object using the Table() class, and at the end run the create_all() method on our Metadata object.

We turn off auto increment using autoincrement=False on all the primary keys. In SQLAlchemy, auto increment is on by default. In Dolt, auto increment values are shared across branches to decrease the risk of conflicts. However, Dolt best practice is to have your application specify keys if it can because auto increment values are not shared across clones. More information on clones vs branches can be found here.

setup_database(engine)

def setup_database(engine):
    metadata_obj = MetaData()

    # This is standard SQLAlchemy without the ORM
    employees_table = Table(
        "employees",
        metadata_obj,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("last_name", String(255)),
        Column("first_name", String(255))
    )

    teams_table = Table(
        "teams",
        metadata_obj,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("name", String(255))
    )

    employees_teams_table = Table(
        "employees_teams",
        metadata_obj,
        Column("employee_id",
               ForeignKey("employees.id"),
               primary_key=True,
               autoincrement=False),
        Column("team_id",
               ForeignKey("teams.id"),
               primary_key=True,
               autoincrement=False)
    )

    metadata_obj.create_all(engine)

Note, Dolt supports foreign keys, secondary indexes, triggers, check constraints, and stored procedures. It's a modern, feature-rich SQL database.

Then to examine what we've done, we run a show tables and print the results.

print_tables(engine)

def print_tables(engine):
    # Raw SQL here to show what we've done
    with engine.connect() as conn:
        result = conn.execute(text("show tables"))

        print("Tables in database:")
        for row in result:
            table = row[0]
            print("\t" + table)

This outputs:

Tables in database:
	employees
	employees_teams
	teams

The employees, teams, and employees_teams tables are created.

Make a Dolt Commit

Now, it's time to make a Dolt commit. Dolt commits and SQL transaction commits are different. Git and SQL both have commits and Dolt is a combination of both so Dolt must support both. This can be confusing. A Dolt commit makes an entry in the commit log for versioning purposes. A SQL transaction commit is required to persist your database writes to disk so other connections can see them.

In order to make a Dolt commit, we need to use the dolt_commit() procedure. Dolt exposes version control write operations as procedures. The naming of these procedures follows the Git command line standard. git add on the Git command line becomes dolt add on the Dolt command line becomes dolt_add() as a Dolt SQL procedure. Arguments mimic Git as well. If you know Git, you already know how to use Dolt.

There is no SQLAlchemy native way to use procedures. In SQL, we execute procedures using call. We could grab a raw connection from the engine and use callproc(). Or, we could make a raw SQL statement using SQLAlchemy's text() method. I opted for the latter because it seemed more readable. For this method, I allowed the caller to specify a commit author and message to show off how to pass arguments into Dolt procedures. Note, Dolt has a staging area, just like Git so I need to dolt_add() all my tables before making a commit. The resulting code looks like so:

dolt_commit(engine, "Tim <tim@dolthub.com>", "Created tables")

def dolt_commit(engine, author, message):
    # Dolt exposes version control writes as procedures
    # Here, we use text to execute procedures.
    #
    # The other option is to do something like:
    #
    # conn = engine.raw_connection()
    # results = conn.cursor().callproc('dolt_commit', arguments)
    # conn.close()
    #
    # I like the text approach better.
    with engine.connect() as conn:
        # -A means all tables
        conn.execute(
            text("CALL DOLT_ADD('-A')")
        )
        # --skip-empty so this does not fail if there is nothing to commit
        result = conn.execute(
            text("CALL DOLT_COMMIT('--skip-empty', '--author', '"
                 + author
                 + "', '-m', '"
                 + message
                 + "')")
        )
        commit = None
        for row in result:
            commit = row[0]
        if ( commit ):
            print("Created commit: " + commit )

And running it results in the following output:

Created commit: 5ahj2vcifi29ibs48f7bp91sm752ub2i

Examine the Log

Let's examine the Dolt commit log. Dolt version control read operations are exposed in SQL as custom system tables or functions. The commit log can be read using the dolt_log system table named after the git log and dolt log command line equivalents. Again, if you know Git, you already know how to use Dolt.

In order to automatically load the dolt_log schema using SQLAlchemy, we use reflection. We pass autoload_with=engine to the Table constructor to create the dolt_log table object. We then use the standard SQLAlchemy query builder to select and order the log elements we want to print. The resulting code looks like so:

print_commit_log(engine)

def print_commit_log(engine):
    # Examine a dolt system table, dolt_log, using reflection
    metadata_obj = MetaData()
    print("Commit Log:")

    dolt_log = Table("dolt_log", metadata_obj, autoload_with=engine)
    stmt = select(dolt_log.c.commit_hash,
                  dolt_log.c.committer,
                  dolt_log.c.message
                  ).order_by(dolt_log.c.date.desc())

    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            commit_hash = row[0]
            author      = row[1]
            message     = row[2]
            print("\t" + commit_hash + ": " + message + " by " + author)

And it outputs the following:

Commit Log:
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Іnіtialize datа repоsitory by timsehn

Notice the first four letters of the initial commit. That's --fun.

Insert Some Data

Now, we're going to populate the tables with some data. Again, we use reflection to build table objects. Then we use standard SQLAlchemy insert syntax to insert rows.

insert_data(engine)

def load_tables(engine):
    metadata_obj = MetaData()

    employees = Table("employees", metadata_obj, autoload_with=engine)
    teams = Table("teams", metadata_obj, autoload_with=engine)
    employees_teams = Table("employees_teams",
                            metadata_obj,
                            autoload_with=engine)

    return (employees, teams, employees_teams)

def insert_data(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    # This is standard SQLAlchemy
    stmt = insert(employees).values([
        {'id':0, 'last_name':'Sehn', 'first_name':'Tim'},
        {'id':1, 'last_name':'Hendriks', 'first_name':'Brian'},
        {'id':2, 'last_name':'Son', 'first_name':'Aaron'},
        {'id':3, 'last_name':'Fitzgerald', 'first_name':'Brian'}
        ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    stmt = insert(teams).values([
        {'id':0, 'name':'Engineering'},
        {'id':1, 'name':'Sales'}
    ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    stmt = insert(employees_teams).values([
        {'employee_id':0, 'team_id':0},
        {'employee_id':1, 'team_id':0},
        {'employee_id':2, 'team_id':0},
        {'employee_id':0, 'team_id':1},
        {'employee_id':3, 'team_id':1},
    ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

Don't forget to conn.commit(). This commits the SQL transaction. SQLAlchemy has autocommit off by default so every SQL transaction must be explicitly committed. Again, both Git and SQL have the concept of a commit. Dolt supports both so it can get a little confusing.

Let's make sure our inserts worked by displaying a summary table. SQLAlchemy comes with an impressive query builder that supports all manner of complex SQL queries. In this example, I construct a three table join. I'm expecting a schema change later in the demo so I defensively coded for that.

print_summary_table(engine)

def print_summary_table(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    print("Team Summary")

    # Get all employees columns because we change the schema
    columns = []
    for column in employees.c:
        if ( column.key == "id" ):
            continue
        columns.append(column.key)

    # Must convert to tuple
    columns = tuple(columns)

    # Dolt supports up to 12 table joins. Here we do a 3 table join.
    stmt = select(teams.c.name,
                  employees.c[columns]
                  ).select_from(
                      employees
                  ).join(
                      employees_teams,
                      employees.c.id == employees_teams.c.employee_id
                  ).join(
                      teams,
                      teams.c.id == employees_teams.c.team_id
                  ).order_by(teams.c.name.asc());
    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            team_name  = row[0]
            last_name  = row[1]
            first_name = row[2]
            start_date = ''
            if ( len(row) > 3 ):
                if ( row[3] ):
                    start_date = row[3].strftime('%Y-%m-%d')

            print("\t" + team_name + ": " + first_name + " " + last_name + " " + start_date)

Which results in the following output:

Team Summary
	Engineering: Tim Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Sales: Tim Sehn
	Sales: Brian Fitzgerald

The CEO at DoltHub wears many hats.

Examine the Status and Diff

To see what tables changed, you can use the dolt_status system table. In a recurring theme, I again use reflection to automatically load the schema.

print_status(engine)

def print_status(engine):
    metadata_obj = MetaData()
    dolt_status = Table("dolt_status", metadata_obj, autoload_with=engine)

    print("Status")
    stmt = select(dolt_status.c.table_name, dolt_status.c.status)
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall();
        if ( len(rows) > 0 ):
            for row in rows:
                table  = row[0]
                status = row[1]
                print("\t" + table + ": " + status)
        else:
            print("\tNo tables modified")

Now, that I see which tables changed, I want to see what rows changed in the tables. Dolt is built from the ground up to provide fast differences (ie. diffs) between table versions even for very large tables. Let's see how we can use Dolt diff functionality with SQLAlchemy. In Dolt, there is a dolt_diff_<table> system table for each user defined table and a dolt_diff() table function. The system table works better with reflection so we'll use it here. Notice, I filter the diff table down to only WORKING changes so I only see changes that aren't staged or committed.

print_diff(engine, "employees")

def print_diff(engine, table):
    metadata_obj = MetaData()

    print("Diffing table: " + table)
    dolt_diff = Table("dolt_diff_" + table,
                      metadata_obj,
                      autoload_with=engine)

    # Show only working set changes
    stmt = select(dolt_diff).where(dolt_diff.c.to_commit == 'WORKING')
    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            # I use a dictionary here because dolt_diff_<table> is a wide table
            row_dict = row._asdict()
            # Then I use pprint to display the results
            pprint(row_dict)

The resulting output looks like so:

Status
	employees: modified
	employees_teams: modified
	teams: modified
Diffing table: employees
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Tim',
 'to_id': 0,
 'to_last_name': 'Sehn'}
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 1,
 'to_last_name': 'Hendriks'}
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Aaron',
 'to_id': 2,
 'to_last_name': 'Son'}
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 3,
 'to_last_name': 'Fitzgerald'}

I transformed the row results to a dictionary and used pprint to save myself a bunch of diff display code that would have complicated this demo. I think it looks pretty.

Before we go onto the next section let's Dolt commit our changes.

dolt_commit(engine,
            "Aaron <aaron@dolthub.com>",
            "Inserted data into tables")
print_commit_log(engine)

Resulting in the following output:

Created commit: lf30j7u2pmrna1h4ld3gg7j9gpn9etho
Commit Log:
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Іnіtialize datа repоsitory by timsehn

Oh no. I made a mistake.

Dolt has powerful rollback capabilities. Let's imagine I accidentally drop a table. The foreign keys will prevent me from dropping employees or teams but employees_teams is not safe from my wrath!

drop_table(engine, "employees_teams")

def drop_table(engine, table):
    (employees, teams, employees_teams) = load_tables(engine)

    if ( table == "employees"):
        employees.drop(engine)
    elif ( table ==  "teams" ):
        teams.drop(engine)
    elif ( table == "employees_teams" ):
        employees_teams.drop(engine)
    else:
        print(table + ": Not found")

As we can see from status and show tables it is gone.

print_status(engine)
print_tables(engine)
Status
	employees_teams: deleted
Tables in database:
	employees
	teams

In a traditional database, this could be disastrous. In Dolt, we can get it back with a simple call dolt_reset('hard'). This function takes an optional commit. If no commit is specified it resets to the HEAD commit.

dolt_reset_hard(engine, None)
print_status(engine)
print_tables(engine)

def dolt_reset_hard(engine, commit):
    if ( commit ):
        stmt = text("CALL DOLT_RESET('--hard', '" + commit + "')")
        print("Resetting to commit: " + commit)
    else:
        stmt = text("CALL DOLT_RESET('--hard')")
        print("Resetting to HEAD")

    with engine.connect() as conn:
        results = conn.execute(stmt)
        conn.commit()

The conn.commit() should not be necessary here but I uncovered a bug writing this demo. Now, the output:

Resetting to HEAD
Status
	No tables modified
Tables in database:
	employees
	employees_teams
	teams

Dolt makes operating databases less error prone. You can always back out changes you have in progress or rewind to a known good state.

Change data on a branch

Dolt is the only SQL database with branches and merges. Here I will show you how to create and switch branches. I'll make some changes and commit them. Later, I'll merge all my changes together. Think of a branch as a really long SQL transaction.

First, you need to create a branch. Creating a branch is a write so you do it with a procedure, dolt_branch(). In the Python code, I also consult the dolt_branches system table to make sure the branch does not already exist. We used the dolt_checkout() method described in the Connect to the Database section to switch branches using the engine object.

dolt_create_branch(engine, 'modify_data')
engine = dolt_checkout('modify_data')

def dolt_create_branch(engine, branch):
    # Check if branch exists
    metadata_obj = MetaData()

    dolt_branches = Table("dolt_branches", metadata_obj, autoload_with=engine)
    stmt = select(dolt_branches.c.name).where(dolt_branches.c.name == branch)
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        if ( len(rows) > 0 ):
             print("Branch exists: " + branch)
             return

    # Create branch
    stmt = text("CALL DOLT_BRANCH('" + branch + "')")
    with engine.connect() as conn:
        results = conn.execute(stmt)
        print("Created branch: " + branch)

Now that we're on a new branch, it's safe to make changes. The main branch will remain unchanged as I make these changes. This function inserts, updates, and deletes using the SQLAlchemy query builder.

modify_data(engine)

def modify_data(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    update_stmt = update(employees).where(employees.c.first_name == 'Tim'
                                          ).values(first_name='Timothy')

    insert_emp_stmt = insert(employees).values([
        {'id':4, 'last_name':'Wilkins', 'first_name':'Daylon'}
        ])
    insert_et_stmt = insert(employees_teams).values([
        {'employee_id':4, 'team_id':0}
    ])

    delete_stmt = delete(employees_teams).where(
        employees_teams.c.employee_id == 0
    ).where(employees_teams.c.team_id == 1)

    with engine.connect() as conn:
        conn.execute(update_stmt)
        conn.execute(insert_emp_stmt)
        conn.execute(insert_et_stmt)
        conn.execute(delete_stmt)
        conn.commit()

Let's inspect what we've done to make sure it looks good.

print_status(engine)
print_diff(engine, 'employees')
print_diff(engine, 'employees_teams')
print_summary_table(engine)
Using branch: modify_data
Status
	employees: modified
	employees_teams: modified
Diffing table: employees
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_first_name': 'Tim',
 'from_id': 0,
 'from_last_name': 'Sehn',
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Timothy',
 'to_id': 0,
 'to_last_name': 'Sehn'}
{'diff_type': 'added',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Daylon',
 'to_id': 4,
 'to_last_name': 'Wilkins'}
Diffing table: employees_teams
{'diff_type': 'removed',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_employee_id': 0,
 'from_team_id': 1,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_employee_id': None,
 'to_team_id': None}
{'diff_type': 'added',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_employee_id': None,
 'from_team_id': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_employee_id': 4,
 'to_team_id': 0}
Team Summary
	Engineering: Timothy Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Engineering: Daylon Wilkins
	Sales: Brian Fitzgerald

Daylon is added to the engineering team on the modify_data branch. I'm no longer on the Sales team. Engineering for life!

Finally, let's commit these changes so we can make different changes on another branch.

dolt_commit(engine, 'Brian <brian@dolthub.com>', 'Modified data on branch')
print_commit_log(engine)
Created commit: 0o07b2f3r788t9nn2vg139maviu49fd4
Commit Log:
	0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Іnіtialize datа repоsitory by timsehn

Change schema on another branch

We're going to make a schema change on another branch and make some data modifications using the SQLAlchemy Session interface. The Session interface is the true alchemy in SQLAlchemy. You bind Python objects to tables and when you add, modify, or delete Python objects, calling Session.commit() automatically updates the records in your tables. Truly lead into gold stuff.

Below, you'll see we check out the main branch so the new branch has the correct base branch. Then, we create a new branch called modify_schema. Then, we run the modify_schema() function which adds a start date column and populates it using the Session interface. We finally use status and diff to show off what changed.

engine = dolt_checkout('main')
dolt_create_branch(engine, 'modify_schema')
engine = dolt_checkout('modify_schema')
print_active_branch(engine)
modify_schema(engine)
print_status(engine)
print_diff(engine, "employees")
print_summary_table(engine)

def modify_schema(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    # SQLAlchemy does not support table alters so we use text
    stmt = text('alter table employees add column start_date date')
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    # Update using the SQL Alchemy session interface
    class Base(DeclarativeBase):
        pass

    class Employee(Base):
        __tablename__ = "employees"
        id: Mapped[int] = mapped_column(primary_key=True)
        last_name: Mapped[str] = mapped_column(String(255))
        first_name: Mapped[str] = mapped_column(String(255))
        start_date: Mapped[Date] = mapped_column(Date)

        def __repr__(self) -> str:
            return f"Employee(id={self.id!r}, last_name={self.last_name!r}, first_name={self.first_name!r}, start_date={self.start_date!r})"

    session = Session(engine)
    Tim = session.get(Employee, 0)
    Tim.start_date = "2018-08-06"

    Aaron = session.get(Employee, 1)
    Aaron.start_date = "2018-08-06"

    BHeni = session.get(Employee, 2)
    BHeni.start_date = "2018-08-06"

    Fitz = session.execute(select(Employee).filter_by(last_name="Fitzgerald")).scalar_one()
    Fitz.start_date = "2021-04-19"

    session.commit()

This outputs the following:

Using branch: main
Created branch: modify_schema
Using branch: modify_schema
Active branch: modify_schema
Status
	employees: modified
Diffing table: employees
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_first_name': 'Tim',
 'from_id': 0,
 'from_last_name': 'Sehn',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Tim',
 'to_id': 0,
 'to_last_name': 'Sehn',
 'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_first_name': 'Brian',
 'from_id': 1,
 'from_last_name': 'Hendriks',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 1,
 'to_last_name': 'Hendriks',
 'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_first_name': 'Aaron',
 'from_id': 2,
 'from_last_name': 'Son',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Aaron',
 'to_id': 2,
 'to_last_name': 'Son',
 'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2023, 7, 7, 19, 6, 18, 771000),
 'from_first_name': 'Brian',
 'from_id': 3,
 'from_last_name': 'Fitzgerald',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 3,
 'to_last_name': 'Fitzgerald',
 'to_start_date': datetime.date(2021, 4, 19)}
Team Summary
	Engineering: Tim Sehn 2018-08-06
	Engineering: Brian Hendriks 2018-08-06
	Engineering: Aaron Son 2018-08-06
	Sales: Tim Sehn 2018-08-06
	Sales: Brian Fitzgerald 2021-04-19

As you can see my defensive coding in the Insert Some Data section paid off and employee start dates are displayed. This looks good so we'll commit it.

    dolt_commit(engine, 'Tim <tim@dolthub.com>', 'Modified schema on branch')
    print_commit_log(engine)
Created commit: apf44oec5ctk3cu0e0s2ot90pgpr2f1s
Commit Log:
	apf44oec5ctk3cu0e0s2ot90pgpr2f1s: Modified schema on branch by Tim
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Іnіtialize datа repоsitory by timsehn

Merge it all together

Now for our last trick. We will merge all the branches together and show the resulting summary table. To merge, you use the procedure dolt_merge().

engine = dolt_checkout('main')
print_active_branch(engine)
print_commit_log(engine)
print_summary_table(engine)
dolt_merge(engine, 'modify_data')
print_summary_table(engine)
print_commit_log(engine)
dolt_merge(engine, 'modify_schema')
print_commit_log(engine)
print_summary_table(engine)

def dolt_merge(engine, branch):
    stmt = text("CALL DOLT_MERGE('" + branch + "')")
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        commit       = rows[0][0]
        fast_forward = rows[0][1]
        conflicts    = rows[0][2]
        print("Merge Complete: " + branch)
        print("\tCommit: " + commit)
        print("\tFast Forward: " + str(fast_forward))
        print("\tConflicts: " + str(conflicts))

This outputs the following. It's beautiful. You can see the data and schema evolving as we merge.

Using branch: main
Active branch: main
Commit Log:
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Іnіtialize datа repоsitory by timsehn
Team Summary
	Engineering: Tim Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Sales: Tim Sehn
	Sales: Brian Fitzgerald
Merge Complete: modify_data
	Commit: 0o07b2f3r788t9nn2vg139maviu49fd4
	Fast Forward: 1
	Conflicts: 0
Team Summary
	Engineering: Timothy Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Engineering: Daylon Wilkins
	Sales: Brian Fitzgerald
Commit Log:
	0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Іnіtialize datа repоsitory by timsehn
Merge Complete: modify_schema
	Commit: 2jmueascj0cnrba7pl46m9ok5et4ccmg
	Fast Forward: 0
	Conflicts: 0
Commit Log:
	2jmueascj0cnrba7pl46m9ok5et4ccmg: Merge branch 'modify_schema' into main by root
	apf44oec5ctk3cu0e0s2ot90pgpr2f1s: Modified schema on branch by Tim
	0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Іnіtialize datа repоsitory by timsehn
Team Summary
	Engineering: Timothy Sehn 2018-08-06
	Engineering: Brian Hendriks 2018-08-06
	Engineering: Aaron Son 2018-08-06
	Engineering: Daylon Wilkins
	Sales: Brian Fitzgerald 2021-04-19

Notice the first merge was a fast-forward merge just like in Git. Dolt will detect schema and data conflicts if you make them.

Conclusion

Phew. That was a lot. If you made it this far, congratulations. You are now ready to build your own Dolt application using Python and SQLAlchemy. If this wasn't enough to get you started, come by our Discord and someone will help you out. If you want to talk to me, just @timsehn.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.