Have Postgres. Want Dolt.

REFERENCEDOLTGRES
5 min read

Note: We released a Postgres version of Dolt called DoltgreSQL.

Dolt is the first version controlled SQL database. DoltHub is a place on the internet to share Dolt databases. DoltLab is a self-hosted version of DoltHub. Dolt brings Git-style version control to the relational database.

Dolt supports the MySQL wire protocol and thus MySQL flavored SQL. PostgreSQL (or Postgres for short) is another popular open source SQL database with a different wire protocol and SQL flavor. If you have a Postgres database and want Dolt this blog article is for you.

Why is Dolt MySQL flavored anyway?

To add versioning to data we needed a new storage engine that could both quickly respond to queries and provide differences between two copies. The good folks who built Noms built and tested a data structure called a prolly tree that fit the bill.

Adopting this data structure at the core of Dolt meant Dolt would need to master data. In other words, you would have to import data into Dolt. You would not be able to sit Dolt on top of a traditional database like Git sits on top of your filesystem.

Dolt was initially built as a command line tool. Dolt implements the Git command line exactly except the versioning target is tables, not files. Before Dolt supported SQL, you would create and update tables using dolt table import. This version was never released in the wild. While dogfooding initial versions of Dolt, we quickly realized that we needed both a data description language (ie. schema) and some ability to query data.

SQL was the obvious choice for both schema and querying. But how would we add SQL? We considered two options:

  1. Write a SQL parser, analyzer and engine ourselves, preferring open source options that existed.
  2. Use a Postgres foreign data wrapper.

For (1), we discovered go-mysql-server, at the time supported by now defunct source{d}. go-mysql-server, as the name suggests, implements the MySQL wire protocol and SQL flavor. No Postgres equivalent existed. go-mysql-server leverages vitess for parsing and implemented its own analyzer and engine. Because go-mysql-server was a Go library, the main advantage of using it was that Dolt could stay a command line tool. To start a MySQL compatible server, we could just add a command dolt sql-server. To run queries using STDIN or from the command line, we could add a command called dolt sql. We could still ship Dolt as a single command line tool. The downside was that go-mysql-server was not very mature and was missing a lot of SQL functionality. We've since taken over go-mysql-server and it is much more robust now.

For (2), in order to use Dolt, one would need Postgres installed. You would then load the Dolt foreign data wrapper. All the Dolt functionality would need to be accessed via that running Postgres server. At the time we had not imagined how things like merge and branch would work in SQL. The advantage of this approach, however, was that Dolt would use the full power of the actual Postgres parser, analyzer, and engine. Dolt would effectively be Postgres with a cool storage engine.

PostgreSQL vs MySQL

So weighing our options, we went with option (1). MySQL was a very popular SQL dialect. Had a Postgres equivalent of go-mysql-server existed, we likely would have gone with that given that Postgres has a little more open source cachet.

But I have Postgres and want Dolt!

We hear you. You have options and those will get better over time. For now, you'll have to import data into Dolt to take advantage of its features. We have tools to make that migration easier for you. In the near future, we have something exciting coming!

SQL Sync

Our first kick at the "Have Postgres, Want Dolt" can was SQL Sync, a module in doltpy, our Python helper library.

SQL Sync is a clever approach. It leverages the SQLAlchemy ORM to do SQL dialect conversions on individual tables. Here's some code snippets from the docs:

from doltpy.etl.sql_sync import sync_to_dolt, sync_schema_to_dolt, POSTGRES_TO_DOLT_TYPE_MAPPINGS

sync_schema_to_dolt(postgres_engine,
                    dolt,
                    {'revenue_estimates': 'revenue_estimates'},
                    POSTGRES_TO_DOLT_TYPE_MAPPINGS)

sync_to_dolt(get_postgres_source_reader(postgres_engine),
             get_dolt_target_writer(dolt),
             {'revenue_estimates': 'revenue_estimates'})

You can see to convert from Postgres types to MySQL types, SQL Sync comes with a mapping. If you're a Python user and want to mirror a single, simply schema-ed table from Postgres to Dolt, SQL Sync would work. At the time we implemented SQL Sync, Dolt did not support things like constraints, views or triggers so the approach made sense.

But the approach always bothered me. Obviously, it wouldn't work for lots of schema elements. SQL Sync is not a "sync". Looking closely at the code, you can see it effectively does an upsert on the entire table and things could be changing as sync was reading multiple tables. Schema changes are handled in a non-principled way. It kind of felt like a hack and I think any database expert would think so. We needed a better way.

pg2mysql

Our second try was pg2mysql, a pgdump to mysqldump converter. There were some unmaintained attempts at dump converters that didn't quite work. We were motivated so we thought we could get 90% of the way there. Hence, our improved version of pg2mysql.

Using dumps to do migrations from Postgres to Dolt allows for schema elements like constraints to be preserved in the migration. A dump is an out of band process so there is no worry about data changing between table imports.

After you run pgdump to get a dump file of your database, you pipe that file into pg2mysql and it does its best to convert that dump to MySQL sql that you can then pipe into dolt sql.

% pgdump --insert mydatabase mydatabase.pgdump
% ./pg2mysql.pl < mydatabase.pgdump > mydatabase.mydump
% dolt sql < mydatabase.mydump

This simple process can be automated so you can import dumps on a recurring basis, getting diffs between different versions of your database.

pg2mysql is still a little finicky. It does not support triggers, views, or stored procedures. There are some types that can't be converted like the money type or integer arrays. But it's good enough to try on your database. If it doesn't work, feel free to cut an issue and we'll see if we can fix it.

DoltgreSQL

DoltgreSQL (Doltgres for short) is what we call the PostgreSQL foreign data wrapper solution we always dreamed of. We're so serious about it, we bought the domain. We started building a prototype and even published a blog article about how to build a foreign data wrapper.

We had to pause the project but the idea is, you will be able to clone most Dolt repositories and data types permitting, be able to spin up a Postgres reader through the Dolt foreign data wrapper. Eventually, we may be able to support writes making full Doltgres a reality. This work is likely to land late 2022 or early 2023. Let us know if you're interested and we may be able to move it up.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.