Creating a Postgres Foreign Data Wrapper

REFERENCE
25 min read

Here at DoltHub some of us have been working with PostgreSQL extensions recently. This is an introductory tutorial on how to get started building a PostgreSQL foreign data wrapper. We introduce the basics around setting up a project for building and installing the extension and how to implement a very basic read only scan.

Overview

PostgreSQL is a powerful open-source SQL engine with a ton of compelling features and great stability. It's also very extensible. It's extension architecture allows shared libraries developed in C to add new data types and functions, index and table storage formats, and query planning and execution functionality.

One common type of postgres extension is a foreign data wrapper. Originally designed for exposing tables and relations in a foreign SQL server so that they could be queried and processed on a different postgres server, the architecture allows for developing extensions to expose many foreign data sources as tables to the postgres backend. They can then be queried and processed using the full power of the postgres query engine, and can be easily combined with local data or other remote data using powerful query capabilities like joins and aggregations.

Postgres itself ships with two foreign data wrappers:

  • postgres_fdw — which will connect to a remote PostgreSQL server and expose tables from it.

  • file_fdw — which will allow for accessing local files and programs as if they were tables, exposing functionality similar to the COPY statement, but without importing the data into postgres tables first.

There are a multitude of third-party foreign data wrappers for connecting to other SQL databases, like MySQL, Oracle and MSSQL, and also for connecting to other data sources entirely, such as Redis and DynamoDB.

The postgres documentation is fantastic, but the surface area of the of the API is decently large. This is a guided tour for getting started with a new foreign data wrapper implementation.

Prerequisites and Some Scaffolding

To get started, you will need the following installed:

  • GNU Make
  • GCC or Clang
  • Postgres

The instructions here have been tested on macOS and Linux. On Windows, they will definitely require Cygwin or MinGW, and may require further changes as well.

We're going to create a foreign data wrapper named tutorial_fdw. To start with, we will create a directory for our project and a Makefile for building it.

$ mkdir tutorial_fdw
$ cd tutorial_fdw

And we create the following Makefile:

MODULE_big = tutorial_fdw
OBJS = tutorial_fdw.o

EXTENSION = tutorial_fdw
DATA = tutorial_fdw--1.0.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

This Makefile uses a build system that ships with Postgres called PGXS. To use it, we set some variables describing what artifacts we need to build and install, and we include build system makefile which we find using pg_config.

If we now run make, it will complain that there is no rule to make tutorial_fdw.o. We can

$ touch tutorial_fdw.c

to make a bit of progress. Eventually that file will contain our implementation.

Running make again complains about a missing tutorial_fdw.control file. The control file describes a proper Postgres extension, and includes metadata about the extension itself.

$ cat > tutorial_fdw.control <<EOF
comment = 'Tutorial FDW.'
default_version = '1.0'
module_pathname = '$libdir/tutorial_fdw'
relocatable = true

Now our extension will build successfully. make install will fail with:

install: .//tutorial_fdw--1.0.sql: No such file or directory

Our Makefile references this file as a DATA component, and PostgreSQL's extension mechanism will attempt to load it when someone runs CREATE EXTENSION tutorial_fdw;. Its purpose is to run the lower-level SQL commands that will load our shared library and point PostgreSQL at its entry points. A foreign data wrapper in postgres has one mandatory and one optional entry point:

  • A handler entry point, which returns a struct of function pointers that will implement the foreign data wrapper API. These function pointers will be called by postgres to participate in query planning and execution. They will do things like estimate the number of rows that match a given restrict clause, enumerate and estimate the costs of the different ways of accessing a given set of rows (through an index, table scan, covering index scan, etc.), and implement the table access itself where each row is returned to the execution engine. This is the meat of a FDW implementation and what we are going to build.

  • A validator entry point, which is an optional entry point that will be called with the options which have been set on the foreign server, table, user or wrapper itself.

We will leave the validator as a future addition and just implement handler for now.

$ cat > tutorial_fdw--1.0.sql <<EOF
CREATE FUNCTION tutorial_fdw_handler()
RETURNS fdw_handler
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

CREATE FOREIGN DATA WRAPPER tutorial_fdw
  HANDLER tutorial_fdw_handler;
EOF

A Small Smoke Test

Now our extension can build and install as a proper extension accessible from our postgres installation. We will now be iterating on an implementation to actually return result tuples as part of a table scan. Let's define exactly what we want our tutorial foreign data wrapper to do, and how people will interface with it.

Our FDW will be quite silly. We will implement it to support single column tables where the column is an integer. By default, we will return single element tuples sequentially from 0 to 63 inclusive. Then we will add the ability for options on the table to control where we start and where we end. Interacting with our extension should look like:

> CREATE EXTENSION tutorial_fdw;
CREATE EXTENSION
> CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE SERVER
> CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
CREATE FOREGIN TABLE
> SELECT * FROM sequential_ints;
 val
-----
0
1
2
3
4
5
...
(64 rows)

As we iterate on our implementation, we will want to test it out. So we create a small smoke test which does the above in a standalone (and newly created) postgres database. smoke_test.sh looks like:

#!/bin/bash

set -eo pipefail

make install

PGDATA=`mktemp -d -t tfdw-XXXXXXXXXXX`

trap "PGDATA=\"$PGDATA\" pg_ctl stop >/dev/null || true; rm -rf \"$PGDATA\"" EXIT

PGDATA="$PGDATA" pg_ctl initdb > /dev/null
PGDATA="$PGDATA" pg_ctl start
psql postgres -f smoke_test.sql

And smoke_test.sql is:

CREATE EXTENSION tutorial_fdw;
CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
SELECT * FROM sequential_ints;

Now after ./smoke_test.sh, we will see the server logs, which will include error log lines similar to:

ERROR:  incompatible library "/usr/local/lib/postgresql/tutorial_fdw.so": missing magic block
HINT:  Extension libraries are required to use the PG_MODULE_MAGIC macro.

That means our extension is installed and postgres attempted to load it. Great progress. Now we just need to write the implementation.

Iterative Implementation

Let's continue on our path of addressing the immediate failures we see. PG_MODULE_MAGIC is documented here and we will need to include it in our shared library. From now on, we will be making changes to tutorial_fdw.c, which is currently empty.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -0,0 +1,3 @@
+#include "postgres.h"
+#include "fmgr.h"
+PG_MODULE_MAGIC;

Changes our first error upon running smoke_test.sh to:

psql:smoke_test.sql:1: ERROR:  could not find function "tutorial_fdw_handler" in file "/usr/local/lib/postgresql/tutorial_fdw.so"

So we will need to implement the function we told postgres would exist.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -1,3 +1,14 @@
 #include "postgres.h"
 #include "fmgr.h"
+#include "foreign/fdwapi.h"
+Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
+
+Datum
+tutorial_fdw_handler(PG_FUNCTION_ARGS)
+{
+        FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+        PG_RETURN_POINTER(fdwroutine);
+}
+
 PG_MODULE_MAGIC;

This is an extension function that is callable from SQL, so it has a particular calling convention it needs to follow. We use preprocessor macros to declare its arguments, generate its return value and expose it in our shared library. You can read about these calling conventions and how to work with them further. For the purposes of a foreign data wrapper, we have a parameter-less function that always returns a pointer to a struct FdwRoutine.

When we run our smoke_test.sh now, you will see that our client gets a closed connection error and the postgres server logs that the child process died with a segmentation fault. Our implementation returned a pointer to a newly allocated and zeroed FdwRoutine struct, but that struct is meant to contain function pointers to our implementation. Our handler function did not populate any. A small number of the function pointers are necessary, but most are optional. You can find documentation on the members and see the definition of the struct.

For now, let's populate the necessary members with some stubs.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -4,11 +4,51 @@
 Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
 
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+    ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan);
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags);
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node);
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node);
+void tutorial_fdw_EndForeignScan(ForeignScanState *node);
+
 Datum
 tutorial_fdw_handler(PG_FUNCTION_ARGS)
 {
         FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+        fdwroutine->GetForeignRelSize = tutorial_fdw_GetForeignRelSize;
+        fdwroutine->GetForeignPaths = tutorial_fdw_GetForeignPaths;
+        fdwroutine->GetForeignPlan = tutorial_fdw_GetForeignPlan;
+        fdwroutine->BeginForeignScan = tutorial_fdw_BeginForeignScan;
+        fdwroutine->IterateForeignScan = tutorial_fdw_IterateForeignScan;
+        fdwroutine->ReScanForeignScan = tutorial_fdw_ReScanForeignScan;
+        fdwroutine->EndForeignScan = tutorial_fdw_EndForeignScan;
         PG_RETURN_POINTER(fdwroutine);
 }
 
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+    ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
+        return NULL;
+}
+
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
+}
+
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
+        return NULL;
+}
+
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {
+}
+
+void tutorial_fdw_EndForeignScan(ForeignScanState *node) {
+}
+

Our stubs all do absolutely nothing and, when they have to return a pointer, return NULL. Now if we run smoke_test, we get:

ERROR:  could not devise a query plan for the given query

It's time to flesh out the implementation and actually participate in query planning.

Paths and Plans

Postgres documentation describes the path of a query. After a query has been parsed and rewritten, our FDW will participate in planning and optimizing to come up with a final query plan that can be executed. The postgres query planner is going to call GetForeignRelSize, GetForeignPaths and GetForeignPlan, in that order, for each table access in a query. The purpose is as follows:

  • GetForeignRelSize — update baserel->rows, and possibly baserel->width and baserel->tuples, with an estimated result set size for a scan of baserel, accounting for the filtering done by restriction qualifiers. Planning can proceed without an accurate implementation here, but the potential for misoptimizations is high.

  • GetForeignPaths — update baserel->pathlist to include ways of accessing baserel. Typically adds ForeignPath *s created with create_foreignscan_path to baserel using add_path. Each added path will include a cost estimate, a rows estimate and potentially outer dependencies. The optimizer will work to choose a plan that minimizes costs. The error we see currently from smoke_test is indicating that this method did not add a path to baserel and thus the query cannot be planned.

  • GetForeignPlan — responsible for creating a ForeignScan * for the given ForeignPath *. base_path was created by GetForeignPaths and has been chosen by the planner as the access path for the query.

Because it is not strictly necessary, we will ignore GetForeignRelSize for now. Let's implement GetForeignPaths so that we always add a single path, representing a full enumeration of all the integers we return.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -1,6 +1,7 @@
 #include "postgres.h"
 #include "fmgr.h"
 #include "foreign/fdwapi.h"
+#include "optimizer/pathnode.h"
 Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
 
@@ -31,6 +32,16 @@ void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid
 }
 
 void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+        Path *path = (Path *)create_foreignscan_path(root, baserel,
+                NULL,              /* default pathtarget */
+                baserel->rows,     /* rows */
+                1,                 /* startup cost */
+                1 + baserel->rows, /* total cost */
+                NIL,               /* no pathkeys */
+                NULL,              /* no required outer relids */
+                NULL,              /* no fdw_outerpath */
+                NIL);              /* no fdw_private */
+        add_path(baserel, path);
 }
 
 ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,

After this, our smoke test is back to segfaulting. Our returned ForeignPath * told it to use our GetForeignPlan to access that relation, but instead of returning a Plan node that could be executed, tutorial_fdw_GetForeignPlan turned NULL. No problem, now we can just implement tutorial_fdw_GetForeignPlan.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -2,6 +2,8 @@
 #include "fmgr.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/planmain.h"
 Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
 
@@ -46,7 +48,15 @@ void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid fo
 
 ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
     ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
-        return NULL;
+        scan_clauses = extract_actual_clauses(scan_clauses, false);
+        return make_foreignscan(tlist,
+                scan_clauses,
+                baserel->relid,
+                NIL, /* no expressions we will evaluate */
+                NIL, /* no private data */
+                NIL, /* no custom tlist; our scan tuple looks like tlist */
+                NIL, /* no quals we will recheck */
+                outer_plan);
 }
 
 void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {

The extract_actual_clauses builds a List * of the clause field of the passed in scan_clauses, which are a list of RestrictInfo * nodes. By adding them all in the second parameter to make_foreignscan, we are telling the executor that we are not responsible for enforcing any of those clauses, and it is responsible for evaluating them on the tuples we return and excluding the appropriate ones.

And now we've reached a major milestone. Because if we run smoke_test.sh, we don't crash and we don't get an error! We get the following output from smoke_test.sql:

CREATE EXTENSION
CREATE SERVER
CREATE FOREIGN TABLE
 val 
-----
(0 rows)

Now we just need to generate some rows.

Begin, Iterate, End

The next time postgres calls into our FDW, it's going to be completely done planning the query and it's going to be in the execution phase. It will start one or more scans, using the ForeignScan * nodes we returned to it to construct ForeignScanState * nodes and passing them into tutorial_fdw_BeginForeignScan. Then it will call tutorial_fdw_IterateForeignScan one or more times, until either it no longer needs results from the scan or the scan is complete. Each call to tutorial_fdw_IterateForeignScan will return one new result tuple. Finally, the engine will call tutorial_fdw_EndForeignScan to allow our FDW to clean up anything it needs to.

We won't need to clean up anything for our implementation, so we only need to implement Begin and Iterate. Both of them are quite straightforward.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -59,11 +59,28 @@ ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel,
                 outer_plan);
 }
 
+typedef struct tutorial_fdw_state {
+        int current;
+} tutorial_fdw_state;
+
 void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
+        tutorial_fdw_state *state = palloc0(sizeof(tutorial_fdw_state));
+        node->fdw_state = state;
 }
 
 TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
-        return NULL;
+        TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+        ExecClearTuple(slot);
+
+        tutorial_fdw_state *state = node->fdw_state;
+        if (state->current < 64) {
+                slot->tts_isnull[0] = false;
+                slot->tts_values[0] = Int32GetDatum(state->current);
+                ExecStoreVirtualTuple(slot);
+                state->current++;
+        }
+
+        return slot;
 }
 
 void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {

ForeignScanState includes a void *fdw_state field as a place for FDW implementations to store state that is used in generating the scan results. We store our scan state there and Iterate accesses and mutates it in order to return the desired results.

On the provided ForeignScanState node we also have a TupleTableSlot which is set up for taking the values of the columns we are returning. We can store a virtual tuple directly to the slot as shown above. If we have more columns to populate, we could do it with slot->tts_values[1] = ... for example.

At this point our FDW passes our smoke test. We still have one unimplemented function to implement: tutorial_fdw_ReScanForeignScan. Because our implementation is so simple, restarting the scan from the beginning is also straightforward.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -84,6 +84,8 @@ TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
 }
 
 void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {
+        tutorial_fdw_state *state = node->fdw_state;
+        state->current = 0;
 }
 
 void tutorial_fdw_EndForeignScan(ForeignScanState *node) {

Some Simple Error Checking

Our current implementation is not robust regarding the schema of the created foreign table. Try changing smoke_test.sql to create a table with a text column, or a float column, or multiple columns, or no columns, and see what happens. We definitely don't want unallocated memory access and potential memory corruption in our postgres extension.

To address this, we will add some simple error checking in the planner phase so we can return an error if the schema doesn't match what we expect. The first time our plugin sees the remote table definition is in tutorial_fdw_GetForeignRelSize, so that seems like a natural place to check things.

We can go about this sanity check in a few ways, but here we're just going to do something simple and strict, making assertions against the attribute descriptions of the foreign table relation definition. Doing so looks like this:

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -4,6 +4,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/planmain.h"
+#include "utils/rel.h"
+#include "access/table.h"
 Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
 
@@ -31,6 +33,19 @@ tutorial_fdw_handler(PG_FUNCTION_ARGS)
 }
 
 void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+        Relation rel = table_open(foreigntableid, NoLock);
+        if (rel->rd_att->natts != 1) {
+                ereport(ERROR,
+                        errcode(ERRCODE_FDW_INVALID_COLUMN_NUMBER),
+                        errmsg("incorrect schema for tutorial_fdw table %s: table must have exactly one column", NameStr(rel->rd_rel->relname)));
+        }
+        Oid typid = rel->rd_att->attrs[0].atttypid;
+        if (typid != INT4OID) {
+                ereport(ERROR,
+                        errcode(ERRCODE_FDW_INVALID_DATA_TYPE),
+                        errmsg("incorrect schema for tutorial_fdw table %s: table column must have type int", NameStr(rel->rd_rel->relname)));
+        }
+        table_close(rel, NoLock);
 }
 
 void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {

This demonstrates a few new constructs. First, we can inspect the description of the foreign table definition in the system catalog using table_open. The RelationData struct which it returns has a number of fields related to the implementation of various postgres features, but it notably has the TupleDesc rd_att field containing the description of the relation's tuple. Above we assert that there is exactly one attribute in this tuple and that it is an int4, which corresponds to the type with which we're populating the virtual tuple stored in TupleTableSlot in our Iterate function (Int32GetDatum up above).

We also see how to report errors up above. There is a list of predefined errcodes and errmsg accepts a formatting string and arguments. Reporting an error of severity ERROR will abort the current query, and even the current C function execution, using setjmp/longjmp machinery. See elog.h for more information, including information about try/catch/finally functionality.

Adding Options

As our last piece of functionality, we will change our FDW so that tables can be annotated with options regarding the start and end of their sequences. This will let us learn about threading some state information from the planning phase to the execution phase.

The CREATE FOREIGN TABLE syntax in postgres supports arbitrary string-valued options. We're going to support the following:

CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server OPTIONS ( start '0', end '64' )

Both options will be optional. start will default to 0. end will default to 64. They will specify a range [start, end). If present, an option will be required to be parse as a base 10 non-negative integer. After options are parsed and defaults are applied, we will throw an error if end < start.

We will access our options from a ForeignTable struct that is accessible through the foreigntableoid. GetRelSize is a perfect place to initially parse and validate these options. Let's do that first.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -6,6 +6,8 @@
 #include "optimizer/planmain.h"
 #include "utils/rel.h"
 #include "access/table.h"
+#include "foreign/foreign.h"
+#include "commands/defrem.h"
 Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
 
@@ -47,6 +49,51 @@ void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid
         }
         table_close(rel, NoLock);
 
+        int start = 0, end = 64;
+
+        ForeignTable *ft = GetForeignTable(foreigntableid);
+        ListCell *cell;
+        foreach(cell, ft->options) {
+                DefElem *def = lfirst_node(DefElem, cell);
+
+                if (strcmp("start", def->defname) == 0) {
+                        char *val = defGetString(def);
+                        if (sscanf(val, "%d", &start) != 1) {
+                                ereport(ERROR,
+                                        errcode(ERRCODE_FDW_ERROR),
+                                        errmsg("invalid value for option \"start\": \"%s\" must be a decimal integer", val));
+                        }
+                } else if (strcmp("end", def->defname) == 0) {
+                        char *val = defGetString(def);
+                        if (sscanf(val, "%d", &end) != 1) {
+                                ereport(ERROR,
+                                        errcode(ERRCODE_FDW_ERROR),
+                                        errmsg("invalid value for option \"end\": \"%s\" must be a decimal integer", val));
+                        }
+                } else {
+                        ereport(ERROR,
+                                (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
+                                errmsg("invalid option \"%s\"", def->defname),
+                                errhint("Valid table options for tutorial_fdw are \"start\" and \"end\"")));
+                }
+        }
+        if (start < 0) {
+                ereport(ERROR,
+                        errcode(ERRCODE_FDW_ERROR),
+                        errmsg("invalid value for option \"start\": must be non-negative"));
+        }
+        if (end < 0) {
+                ereport(ERROR,
+                        errcode(ERRCODE_FDW_ERROR),
+                        errmsg("invalid value for option \"end\": must be non-negative"));
+        }
+        if (end < start) {
+                ereport(ERROR,
+                        errcode(ERRCODE_FDW_ERROR),
+                        errmsg("invalid values for option \"start\" and \"end\": end must be >= start"));
+        }
+
+        baserel->rows = end - start;
 }
 
 void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {

That's a bunch of code, but a lot of it is standard C or things we've seen before, such as sscanf and ereport. Take a moment to look at the foreach macro which iterates through the List *options field. In this List, the Cells all have DefElem * values, so we use lfirst_node to work on each value from within our loop. See pg_list.h for more context on the List type in postgres and a lot of convenience functions for working with it.

Now we have two populated local variables with our start and end options. We will thread those through baserel so that they are available to tutorial_fdw_GetForeignPlan and can be put on the ForeignScan node itself. Postgres provides a void *fdw_private field for exactly this purpose. It will not be touched by anything else, and so we are free to store whatever we need to in it.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -34,6 +34,10 @@ tutorial_fdw_handler(PG_FUNCTION_ARGS)
         PG_RETURN_POINTER(fdwroutine);
 }
 
+typedef struct tutorial_fdw_TableOptions {
+        int start, end;
+} tutorial_fdw_TableOptions;
+
 void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
         Relation rel = table_open(foreigntableid, NoLock);
         if (rel->rd_att->natts != 1) {
@@ -94,6 +98,11 @@ void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid
         }
 
         baserel->rows = end - start;
+
+        tutorial_fdw_TableOptions *opts = palloc(sizeof(tutorial_fdw_TableOptions));
+        opts->start = start;
+        opts->end = end;
+        baserel->fdw_private = opts;
 }
 
 void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {

Now they will arrive at GetForeignPlan without a problem. We need to get them from GetForeignPlan to Begin/Iterate. But Begin/Iterate don't have access to baserel; it is a planning construct and no longer exists during execution. The ForeignScan * plan node has a fdw_private field as well, but it is of type List *, not void *. And, in fact, because of how execution nodes are used, there are constraints on what we can put into the fdw_private field of the ForeignScan * node. It must be a decently proper node which copyObject will work on.

You can read about nodes in the README and the various source files adjacent to it. It's possible to make our own node type within an extension by using an ExtensibleNode. But that is a bit overkill for this tutorial. Instead, we will just populate a list with two value nodes, having the start and end integers respectively.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -8,6 +8,7 @@
 #include "access/table.h"
 #include "foreign/foreign.h"
 #include "commands/defrem.h"
+#include "nodes/pg_list.h"
 Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
 
@@ -120,12 +121,14 @@ void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid fo
 
 ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
     ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
+        tutorial_fdw_TableOptions *opts = baserel->fdw_private;
+        List *fdw_private = list_make2(makeInteger(opts->start), makeInteger(opts->end));
         scan_clauses = extract_actual_clauses(scan_clauses, false);
         return make_foreignscan(tlist,
                 scan_clauses,
                 baserel->relid,
                 NIL, /* no expressions we will evaluate */
-                NIL, /* no private data */
+                fdw_private, /* pass along our start and end */
                 NIL, /* no custom tlist; our scan tuple looks like tlist */
                 NIL, /* no quals we will recheck */
                 outer_plan);

list_make2 makes a two element list, and makeInteger is creating value nodes which respect copyObject, and thus the contract. Now we can use what has been passed along by accessing the plan node's fdw_private from our execution node and using it to populate our fdw_state.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -136,10 +136,14 @@ ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel,
 
 typedef struct tutorial_fdw_state {
         int current;
+        int end;
 } tutorial_fdw_state;
 
 void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
         tutorial_fdw_state *state = palloc0(sizeof(tutorial_fdw_state));
+        ForeignScan *fs = (ForeignScan *)node->ss.ps.plan;
+        state->current = intVal(linitial(fs->fdw_private));
+        state->end = intVal(lsecond(fs->fdw_private));
         node->fdw_state = state;
 }
 
@@ -148,7 +152,7 @@ TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
         ExecClearTuple(slot);
 
         tutorial_fdw_state *state = node->fdw_state;
-        if (state->current < 64) {
+        if (state->current < state->end) {
                 slot->tts_isnull[0] = false;
                 slot->tts_values[0] = Int32GetDatum(state->current);
                 ExecStoreVirtualTuple(slot);

The only new stuff here is linitial and lsecond, extracting the first and second pointer value from the cells of a List; intVal, extracting the integer of a value node; and the chain of field accesses to get from the ForeignScanState to the ForeignScan plan node that created it.

Wrapping Up and Next Steps

If you've made it this far, congratulations. Try updating the smoke_test.sql script to make use of the new functionality, and to exercise the functionality of the query engine further. Here are some fun near-at-hand exercises:

  • Update smoke_test.sql to create multiple tables with different ranges and query them in various ways — joins, unions, where clauses, subselects.

  • Add another table option, step, defaulting to 1. Require step to be non-zero and positive.

  • Remove the restriction of start and end being non-negative.

  • Remove the restriction on step being positive and end >= start. Still throw an error if step does not move from start to end.

  • Extend Iterate and the validation logic to allow for more numeric column types. bigint (INT8OID) and smallint (INT2OID) are natural extensions. bigint might require careful handling on 32-bit platforms and it's easy to trick yourself into thinking it's working if you're only testing on 64-bit architectures. Can you extend the validation logic to avoid overflow and underflow? Can you support float types as well?

After gaining familiarity with what a bare-bones postgres FDW extension looks like, the fantastic postgres documentation might seem a lot more accessible. Here are some good resources to follow up with:

And, of course, check out the implementations of the existing foreign data wrappers linked above.

PostgreSQL and Dolt

Dolt is a SQL database that supports Git-like functionality, including branch, merge, diff and clone. Dolt is written in Go and it implements its own SQL engine that aims for MySQL compatibility. So why are we blogging about how to implement PostgreSQL foreign data wrappers? Well, a lot of our customers are interested in PostgreSQL or already have a substantial investment in PostgreSQL. We think it would be great if people could use PostgreSQL to access Dolt tables and all of their related functionality such as the commit graph.

Because Dolt is compatible with MySQL, the existing mysql_fdw provides an avenue for integrating from PostgreSQL with Dolt. But we think there is also opportunity for native extensions that expose the full power of Dolt without the constraints and overhead of separate servers, network protocols, mismatched type systems, etc.

We're still exploring exactly what may be possible here, but we're definitely excited about the future. We hope you'll keep an eye on this space and let us know your thoughts on Twitter, Discord and GitHub.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.