Lately, we've been hard at work building Doltgres, our version-controlled Postgres-compatible database. Doltgres gives you the power of a SQL relational database, combined with the distributed version-control features of Git. Doltgres is currently in Alpha, and we're getting close to hitting our Beta milestone. In this post, we're focusing on one specific Beta milestone requirement: support for importing Postgres data dumps.
Postgres Data Dumps
When new customers come to try out Dolt or Doltgres, one of the very first things they usually want to do is load a data dump from an existing database and see how well Dolt and Doltgres work. If a potential customer can't get this first step working quickly, we find they typically aren't willing to spend much more time trying. They often don't reach out for help or cut issues on GitHub; they just move on. So, getting imports to load cleanly was one of our initial big areas of focus with Dolt, and we're giving the same attention to Doltgres now, too.
Supporting database dump files is challenging because database schemas often use a LOT of features across the large surface area of SQL, and they exercise DDL especially heavily. For customers to import database dumps from a wide variety of real-world schemas, we need a wide swath of support across Postgres' DDL syntax.
We've been mining dump files from the internet, testing them, and then cataloging, prioritizing, and fixing the gaps we find. We've made a LOT of progress closing those gaps, and we've now reached the point where the first of these Postgres data dumps loads cleanly into Doltgres. In the next section, we'll demo importing that dump.
If you don't have the doltgres binary on your machine yet, go ahead and install it.
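If you want a quick way to do that, something along these lines usually works (a sketch, assuming the doltgresql GitHub releases publish an install.sh the way Dolt's releases do — check the releases page for your platform's current instructions):

# Assumed installer path -- verify against the doltgresql releases page.
curl -L https://github.com/dolthub/doltgresql/releases/latest/download/install.sh | bash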
Next, start up the doltgres server:
> doltgres
INFO[0000] Creating root@localhost superuser
INFO[0000] Server ready. Accepting connections.
WARN[0000] secure_file_priv is set to "", which is insecure.
WARN[0000] Any user with GRANT FILE privileges will be able to read any file which the sql-server process can read.
WARN[0000] Please consider restarting the server with secure_file_priv set to a safe (or non-existent) directory.
Now that we've got a Doltgres server running, let's import some data! We first use wget to download the raw french-towns-communes-francaises.sql file from GitHub, then we use the Postgres SQL shell, psql, to feed those SQL statements into the Doltgres server.
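The commands look something like the following (a minimal sketch: the wget URL assumes the copy of the dump in the postgresDBSamples repo on GitHub, so point it at wherever your copy lives; the credentials match the defaults used later in this post):

wget https://raw.githubusercontent.com/morenoh149/postgresDBSamples/master/french-towns-communes-francaises/french-towns-communes-francaises.sql
PGPASSWORD=password psql -U postgres -h localhost -f french-towns-communes-francaises.sql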
If everything works correctly, you should see the following output in your terminal, indicating what statements are being executed from the SQL file:
BEGIN
CREATE TABLE
CREATE TABLE
CREATE TABLE
SET
SET
SET
COPY 26
COPY 100
COPY 36684
 pg_catalog.setval
-------------------
                26
(1 row)

 pg_catalog.setval
-------------------
               100
(1 row)

 pg_catalog.setval
-------------------
             36684
(1 row)
COMMIT
ANALYZE
ANALYZE
ANALYZE
From here, we can log into the database, look around, and run some sample queries:
PGPASSWORD=password psql -U postgres -h localhost
psql (15.12 (Homebrew), server 15.0)
Type "help" for help.
postgres=> \dt
List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | departments | table | postgres
 public | regions     | table | postgres
 public | towns       | table | postgres
(3 rows)

postgres=> select count(*) from towns;
 count
-------
 36684
(1 row)

postgres=> select * from towns limit 1;
  id   | code | article |   name    | department
-------+------+---------+-----------+------------
 16070 | 309  |         | Souprosse | 40
(1 row)
Now that we've confirmed our data import looks good, let's create a Doltgres commit. This creates a commit in our database's commit graph that we can always refer back to:
call dolt_commit('-Am', 'Initial data import for french-towns-communes-francaises dataset');
We can take a look at the dolt_log system table to see the commit log for our current branch:
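select * from dolt_log;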
And sure enough, we see our new commit is the most recent commit reported in the commit log.
What's Next?
As excited as we are about getting a first Postgres data dump from the internet to load cleanly into Doltgres, we're not stopping to celebrate just yet. We've been testing a lot of data dumps, and there are many more gaps we plan to knock out quickly.
In addition to lots of smaller features we've got in our backlog, here are a few of the larger features we're looking at next:
TEXT types in primary, secondary, and foreign keys – Postgres uses a clever approach, named TOAST, that allows efficient storage of large text or binary content by storing it inline in a row when it's small enough and moving larger content to out-of-row storage. This enables TEXT types to be stored and searched efficiently enough for them to be used in primary and secondary indexes, as well as for foreign keys. TEXT types are commonly used in Postgres indexes, so we're excited to deliver this and support more Postgres schemas in Doltgres. We've already got a working prototype for this support, so stay tuned for an announcement on this one soon.
User-defined PL/pgSQL functions – We just recently released initial support for user-defined PL/pgSQL functions. We've got many of the basic statements covered (e.g. loops, branching, assignment, selecting from queries), but there are still a few more areas we're fleshing out (e.g. exception handling, variable aliases, cursors). There's a small example of what already works after this list.
Triggers – Now that we've got support for user-defined functions, using those functions as triggers on tables is a logical next step. Triggers are a widely used and powerful feature of relational databases, so we're excited to get them working in Doltgres soon.
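To give a flavor of the PL/pgSQL support mentioned above, here's a minimal sketch of a function exercising assignment, branching, and a loop (the function itself is illustrative, not taken from a real schema):

-- Illustrative PL/pgSQL function: sums the integers from 1 to n using
-- assignment, an IF branch, and a FOR loop.
CREATE FUNCTION sum_to(n INTEGER) RETURNS INTEGER AS $$
DECLARE
    total INTEGER := 0;
BEGIN
    IF n < 1 THEN
        RETURN 0;
    END IF;
    FOR i IN 1..n LOOP
        total := total + i;
    END LOOP;
    RETURN total;
END;
$$ LANGUAGE plpgsql;

SELECT sum_to(10); -- returns 55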
We Need Your Help!
Doltgres has been making a TON of progress in the past few months, and we'll be continuing to announce lots more support for new features going forward.
You can help us out by taking Doltgres for a spin and sharing feedback on any gaps or problems you hit! Try importing your Postgres schemas to Doltgres and let us know how it goes. If you can share your schema file with us, that's even better! If you don't have an existing Postgres data dump, you can try out the steps in the getting started guide.