Accessing Dolt Data with the Postgres mysql_fdw

FEATURE RELEASEREFERENCE
7 min read

Dolt is a full featured SQL database which supports Git-like version control features, including branch, diff, merge, clone, push and pull. It's wire protocol and supported SQL dialect are based on MySQL.

Here at DoltHub, it's not uncommon for our users to be software firms that are already heavily invested in PostreSQL. We've recently released changes to Dolt to improve its compatibility with the MySQL Foreign Data Wrapper for PostgreSQL. Using the foreign data wrapper, you can easily access your Dolt tables from your existing PostgreSQL database and do things like perform queries which join Dolt tables with Postgres tables and run DML operations against Dolt tables.

This is a short blog post to show how to get started with mysql_fdw running against a Dolt SQL server.

Overview

Foreign data wrappers are extensions to Postgres which allow a given server to access data which resides outside of the Postgres RDBMS using regular SQL queries. There are a variety of such extensions, including extensions which allow it to access various other RDBMS systems (MySQL, Postgres, SQLite, Oracle, etc.), access plain text files as if they were relational data (CSV, JSON, Parquet, etc.), access various non-relational databases (LDAP, DynamoDB, Redis, etc.), and even access arbitrary internet protocols (Git, IMAP, www, etc.). Suffice to say, it's a very powerful model and there is some really cool stuff out there.

Because Dolt speaks the MySQL wire protocol, you can utilize the MySQL foreign data wrapper to access your Dolt tables within from within your Postgres database. In order to see it in action, we will do the following things:

  1. Get a dolt database with some interesting data. We will be working with a clone of a bounties database.

  2. Run a Dolt sql-server instance to which the Postgres server will connect.

  3. Run a Postgres server with the mysql_fdw installed. We will be working with a docker image which already has postgres and the FDW installed.

  4. Run the appropriate DDL commands against the PostgreSQL server in order to tell it how to the remote Dolt server, which tables it should expect to find, and what their schema will be.

At the end of our demo, mysql_fdw is running within the postgres server to allow for access to the tables in the foreign dolt database. The running server instances look like:

mysql_fdw architecture

Let's jump in and see it all in action.

Get a Dolt database

First we will clone an existing bounties database from dolthub.com. Make certain you are working with the latest dolt release, which includes the changes to improve compatibility with mysql_fdw:

$ dolt clone dolthub/standard-charge-files
cloning https://doltremoteapi.dolthub.com/dolthub/standard-charge-files
$ cd standard-charge-files

Run dolt sql-server

Now we will run dolt sql-server, exposing the standard_charge_files database. We want the root user on the database to have a password, so we will create a small configuration file to run it against:

$ DOLT_ROOT_PASSWORD=`openssl rand -hex 16`
$ cat > config.yaml <<EOF
user:
  name: root
  password: "$DOLT_ROOT_PASSWORD"
EOF
$ dolt sql-server --config config.yaml &

To make sure everything is working, we can run a quick query:

$ echo 'select count(*) from hospitals;' | mysql -u root --password="$DOLT_ROOT_PASSWORD" standard_charge_files
mysql: [Warning] Using a password on the command line interface can be insecure.
count(*)
7232

Run Postgres and Configure mysql_fdw

For ease of demonstration, we're going to run Postgres from a docker image with mysql_fdw already installed.

$ export POSTGRES_PASSWORD=`openssl rand -hex 16`
$ docker run --publish 5432:5432 -d --rm -e POSTGRES_PASSWORD toleg/postgres_mysql_fdw:15.2_fdw2.9.0
$ PGPASSWORD="$POSTGRES_PASSWORD" psql -h 127.0.0.1 --user postgres -v DOLT_ROOT_PASSWORD="$DOLT_ROOT_PASSWORD"
Type "help" for help.

postgres=# \dt
Did not find any relations.

Now we can use mysql_fdw to connect to our dolt sql-server:

postgres=# create extension mysql_fdw;
CREATE EXTENSION

postgres=# CREATE SERVER dolt_server FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host 'host.docker.internal', port '3306');
CREATE SERVER

postgres=# CREATE USER MAPPING FOR postgres SERVER dolt_server OPTIONS (username 'root', password :'DOLT_ROOT_PASSWORD');
CREATE USER MAPPING

At this point, we can import the entire schema of the foreign tables, if we want:

postgres=# IMPORT FOREIGN SCHEMA standard_charge_files FROM SERVER dolt_server INTO public;
IMPORT FOREIGN SCHEMA

This will import foreign table definitions for every table in the standard_charge_files database. We can inspect the imported tables with psql directive \dE:

postgres=# \dE+
                                    List of relations
 Schema |      Name      |     Type      |  Owner   | Persistence |  Size   | Description
--------+----------------+---------------+----------+-------------+---------+-------------
 public | additional_npi | foreign table | postgres | permanent   | 0 bytes |
 public | hospitals      | foreign table | postgres | permanent   | 0 bytes |
 public | npi_tin        | foreign table | postgres | permanent   | 0 bytes |
(3 rows)

And we can run queries against the remote tables:

postgres=# select count(*) from npi_tin;
 count
-------
 10164
(1 row)

postgres=# select npi, tin, standard_charge_file_url from hospitals natural join npi_tin where standard_charge_file_url <> '' limit 10;
    npi     |    tin    |                                                     standard_charge_file_url
------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------
 1487866315 | 560529945 | https://chsprodrsrcegrporg385.blob.core.windows.net/cms-price-transparency/560529945_atrium-health-cabarrus_standardcharges.json
 1487866315 | 560529945 | https://chsprodrsrcegrporg385.blob.core.windows.net/cms-price-transparency/560529945_atrium-health-cabarrus_standardcharges.json
 1497859649 | 370960170 | https://www.nm.org/-/media/northwestern/resources/patients-and-visitors/billing-and-insurance/northwestern-memorial-hospital.xlsx
 1497859649 | 370960170 | https://www.nm.org/-/media/northwestern/resources/patients-and-visitors/billing-and-insurance/northwestern-memorial-hospital.xlsx
 1558463745 | 352007327 | https://www.comhs.org/-/media/Files/PriceTransparency/St%20Mary%20Medical%20Center_standardcharges2021.ashx
 1114924834 | 221487330 | https://www.saintpetershcs.com/SaintPeters/media/SaintPeters/About%20Us/SaintPeters_Hospital-Transparency-Upload-121621.pdf
 1114924834 | 221487330 | https://www.saintpetershcs.com/SaintPeters/media/SaintPeters/About%20Us/SaintPeters_Hospital-Transparency-Upload-121621.pdf
 1669424354 | 455540852 | https://www.legacyhealth.org/-/media/Files/CSV/Price-Transparency/93-0281321_legacy-silverton-medical-center_standardcharges.json
 1346753118 | 823349983 | https://kingwood247er.com/wp-content/uploads/2022/05/823349983_Kingwood-Emergency-Hospital_Standardcharges.csv
 1124021811 | 710246565 | https://irp.cdn-website.com/d9154e9e/files/uploaded/charge%20master%205.2.pdf
(10 rows)

Getting Dolt Specific

So far, all of this functionality is available against any RDBMS being accessed from a foreign data wrapper. We're excited that dolt's compatibility with mysql_fdw has recently improved, but one major reason people use Dolt is to gain access to its version control features, and those aren't being showcased above. The reality is that this approach is great for read-only use cases, but there are some existing limitations which make dolt's version control features less accessible when accessing them through Postgres + mysql_fdw:

  1. Some dolt features rely on stored procedures, like call dolt_commit(), call dolt_merge() and call dolt_checkout(). As far as we know, these are currently unaccessible from mysql_fdw.

  2. Some dolt features rely on dynamic schema (database) designations, in the form of database_name/branch_name, for example, to access different branches and revisions of the database.

Importing dolt-specific metadata tables

Depending on your use case, these constraints may be a bit of a show stopper. There are some ways to work within them a bit. For example, you can easily manually define a foreign table for a few of of the built-in Dolt tables like dolt_status and dolt_log:

postgres=# CREATE FOREIGN TABLE dolt_status (
  table_name text NOT NULL,
  staged smallint NOT NULL,
  status text NOT NULL
) SERVER dolt_server
OPTIONS (dbname 'standard_charge_files', table_name 'dolt_status');
CREATE FOREIGN TABLE

postgres=# CREATE FOREIGN TABLE dolt_log (
  commit_hash text NOT NULL,
  committer text NOT NULL,
  email text NOT NULL,
  date timestamp NOT NULL,
  message text NOT NULL
) SERVER dolt_server
OPTIONS (dbname 'standard_charge_files', table_name 'dolt_log');
CREATE FOREIGN TABLE

postgres=# SELECT * FROM dolt_status;
 table_name | staged | status
------------+--------+--------
(0 rows)

postgres=# SELECT * FROM dolt_log LIMIT 1;
           commit_hash            | committer |      email       |        date         |       message
----------------------------------+-----------+------------------+---------------------+----------------------
 e4j80uq3v2qj3jus3goajnas11ubtk0f | spacelove | alec@dolthub.com | 2023-04-05 18:44:49 | Dolthub User: rl1987+
                                  |           |                  |                     | Accepted PR: 6
(1 row)

Using mysql_fdw, we can run DML against any MySQL table where the first column is a UNIQUE or PRIMARY KEY. After we run the DML, we can see the changes reflected in dolt_status.

postgres=# INSERT INTO hospitals (enrollment_id) VALUES ('-1');
INSERT 0 1

postgres=# SELECT * FROM dolt_status;
 table_name | staged |  status
------------+--------+----------
 hospitals  |      0 | modified
(1 row)

postgres=# DELETE FROM hospitals WHERE enrollment_id = '-1';
DELETE 1

postgres=# SELECT * FROM dolt_status;
 table_name | staged | status
------------+--------+--------
(0 rows)

Importing tables from a non-default branch

We can also import the tables from a specific non-default branch or commit by specifying the branch in the database name. For example:

postgres=# CREATE FOREIGN TABLE tuopuh_npi_tin (
  npi int,
  tin varchar(16383)
) SERVER dolt_server
OPTIONS (dbname 'standard_charge_files/tuopuhsicl3i2kblhkdpdr8ik3vmjvf8', table_name 'npi_tin');
CREATE FOREIGN TABLE

postgres-# \dE
                 List of relations
 Schema |      Name      |     Type      |  Owner
--------+----------------+---------------+----------
 public | additional_npi | foreign table | postgres
 public | dolt_log       | foreign table | postgres
 public | dolt_status    | foreign table | postgres
 public | hospitals      | foreign table | postgres
 public | npi_tin        | foreign table | postgres
 public | tuopuh_npi_tin | foreign table | postgres
(6 rows)

postgres=# SELECT count(*) FROM tuopuh_npi_tin;
 count
-------
 10164
(1 row)

This same technique works for dolt_log or dolt_status against a different Dolt branch HEAD as well. If you have just a few Dolt HEADs you need to access on the read path of your integration with your existing Postgres database, this can be a very workable approach. It would perhaps be more practical to import the various branch HEADs as foreign tables in their own, HEAD specific Postgres schemas, and then to query across them within the Postgres context.

Overall we can access a number of versioned features of the Dolt database from within Postgres using the mysql_fdw, but write-heavy workloads and workloads which require access to advanced version control features like branching and diffs remain most accessible using the MySQL protocol against the Dolt sql-server instance.

We know potential customers are committed to Postgres and we will continue to work on better Dolt support for the Postgres use case. We are excited to expose normal Dolt tables and Dolt system tables to Postgres servers through the mysql_fdw.

Conclusion

We saw how to run a postgres server with mysql_fdw installed and how to use mysql_fdw to connect to a dolt sql-server instance. We imported the tables in the Dolt server as Postgres foreign tables and were table to run queries against them, including writes when the table's schema met specific constraints. We saw how to import Dolt specific metadata tables and specific branches and revisions of the foreign Dolt database as well.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.