We’re building Doltgres, the world’s first version-controlled Postgres-compatible database. We want Doltgres to be a drop-in replacement for Postgres, which means we have to support every feature that Postgres does, so that customers can port their existing database application to Doltgres without changing any code.
The only issue is… Postgres has a lot of features. Like, a lot. Most people will never use the vast majority of them, but drop-in compatibility means tackling that long tail of seldom-used features. Today we want to talk about one such feature you might never have heard of or used: table composite types.
What’s a table composite type?#
When you create a table in Postgres, Postgres automatically and implicitly creates a new type that represents the tuples that table will store. This is best illustrated with an example.
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# create table tuple_type_table(a int primary key, b varchar(100));
CREATE TABLE
Now if I query the pg_catalog.pg_class table, I can see that Postgres created a composite type for
this new table:
test=# SELECT
t.oid,
n.nspname AS type_schema,
t.typname AS type_name,
t.typtype,
t.typcategory,
t.typrelid
FROM pg_class c
JOIN pg_type t
ON t.oid = c.reltype
JOIN pg_namespace n
ON n.oid = t.typnamespace
WHERE c.relname = 'tuple_type_table';
oid | type_schema | type_name | typtype | typcategory | typrelid
-------+-------------+------------------+---------+-------------+----------
32809 | public | tuple_type_table | c | C | 32807
(1 row)
There’s my type, which has the same name as the table it corresponds to. Neat! What can we do with this type? Well, as far as Postgres is concerned, it’s just an ordinary composite type like any other, which means that we can use it anywhere we can use composite types: in function signatures, as columns in other tables, etc. Let’s look at a few examples.
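For comparison, here's what it looks like to declare a composite type explicitly; a table's implicit composite type behaves just like one of these. (This is a sketch, not part of the session above; the type name `point2d` is invented for illustration.)

```sql
-- Explicitly declare a composite type; a table's implicit type works the same way.
CREATE TYPE point2d AS (x int, y int);

-- Construct a value of the type with a ROW(...) expression and a cast,
-- then access a field with the (value).field syntax.
SELECT (ROW(3, 4)::point2d).x;  -- returns 3
```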
Table composite types in functions#
Sometimes it’s convenient to write a function that takes as input every column in a table, which makes the function easier to call. Table composite types are a great way to do this. Let’s create a table for tracking grocery inventory.
test=# CREATE TABLE groceries (id INT4 PRIMARY KEY, name TEXT NOT NULL, qty INT4 NOT NULL, price REAL NOT NULL);
CREATE TABLE
test=# INSERT INTO groceries VALUES (1, 'apple', 3, 2.5), (2, 'banana', 5, 1.2);
INSERT 0 2
Now, let’s create a function that takes a row from this table as its parameter. This function will tell us the total value of our inventory for that item.
test=# CREATE FUNCTION total(t groceries) RETURNS REAL AS $$
BEGIN RETURN t.qty * t.price; END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
test=# select name, total(groceries) from groceries;
name | total
--------+-------
apple | 7.5
banana | 6
(2 rows)
The syntax to invoke a function this way is a little unusual — you pass the table name (or its alias) as the argument in the function call. But it makes sense in context and lets you write functions that refer to a table’s columns natively, by name. This is a natural way to express functions that operate on a single row of a table.
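Because `groceries` is an ordinary composite type, you can also call the function with a value you construct yourself rather than a row from the table. A small sketch using the table and function from above:

```sql
-- Build a groceries value from scratch and pass it to the function.
-- A bare ROW(...) expression has type record, so we cast it to the composite type.
SELECT total(ROW(3, 'pear', 4, 2.0)::groceries);  -- returns 8
```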
Table composite types in other tables#
There’s a second way you can use table composite types, and it’s quite a bit weirder and probably not a good idea most of the time. Since table composite types are just composite types, you can use them as the type of a column in another table, essentially embedding a row from one table in another.
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# create table t1 (a int, b varchar(3));
CREATE TABLE
test=# create table t2(id serial, t1 t1);
CREATE TABLE
test=# insert into t2(t1) values (row(1, 'abc'));
INSERT 0 1
test=# select * from t2;
id | t1
----+---------
1 | (1,abc)
(1 row)
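A quick aside: you can reach inside an embedded composite value with the `(value).field` syntax. The parentheses are required so that Postgres doesn't parse `t1` as a table name:

```sql
-- Access individual fields of the embedded t1 value stored in t2.
SELECT id, (t1).a, (t1).b FROM t2;
```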
There’s nothing weird here except that the composite type of the t1 column in t2 is declared
implicitly by the schema of the table t1.
So what happens when that schema changes? Let’s try adding a column.
test=# alter table t1 add column c varchar(10) not null;
ALTER TABLE
test=# select * from t2;
id | t1
----+----------
1 | (1,abc,)
(1 row)
test=# insert into t2(t1) values (row(1, 'abc', '12345'));
INSERT 0 1
test=# select * from t2;
id | t1
----+---------------
1 | (1,abc,)
2 | (1,abc,12345)
(2 rows)
This is a bit weird: we declared our new column c as NOT NULL, but Postgres won’t rewrite
values already stored in another table to enforce that constraint. Instead, the existing
tuples in t2 now reflect the current schema of t1, but with NULL values for the (NOT NULL)
column added after they were inserted.
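In fact, the table’s NOT NULL constraint isn’t enforced on the composite type at all, even for new values — the Postgres documentation notes that a table’s constraints are not part of its composite type. So an insert like the one below succeeds despite c being declared NOT NULL. (A side experiment, not part of the session above, since it would add an extra row to t2.)

```sql
-- Inserting an explicit NULL for the NOT NULL column c still works,
-- because table constraints don't apply to the table's composite type.
INSERT INTO t2(t1) VALUES (ROW(2, 'def', NULL));
```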
What about if we drop a column?
test=# alter table t1 drop column b;
ALTER TABLE
test=# select * from t2;
id | t1
----+-----------
1 | (1,)
2 | (1,12345)
(2 rows)
Dropping the column works, but it causes data loss everywhere the type is embedded: Postgres now reports tuples stored before the schema change as no longer having that column, and the values they held for it are gone.
Finally, you can take this use case to its logical conclusion by inserting rows from another table
directly into the embedding table with a SELECT clause.
test=# insert into t1 values (2, 'def'), (3, 'xyzzy');
INSERT 0 2
test=# insert into t2(t1) select t1 from t1;
INSERT 0 2
test=# select * from t2;
id | t1
----+-----------
1 | (1,)
2 | (1,12345)
3 | (2,def)
4 | (3,xyzzy)
(4 rows)
Again we see the slight awkwardness of selecting the table name as a column expression, which returns the entire row as a single composite value. Here are a few similar expressions you might expect to work, but don’t.
test=# insert into t2(t1) select * from t1;
ERROR: INSERT has more expressions than target columns
LINE 1: insert into t2(t1) select * from t1;
^
test=# insert into t2(t1) select row(*) from t1;
ERROR: syntax error at or near "*"
LINE 1: insert into t2(t1) select row(*) from t1;
^
test=# insert into t2(t1) select row(a,c) from t1;
ERROR: cannot cast type record to t1
LINE 1: insert into t2(t1) select row(a,c) from t1;
Should I embed rows from one table in another?#
No, probably not. This feature has awkward syntax and some sharp edges, with potentially very surprising behavior around schema changes. Doing this goes against commonly accepted best practices in schema design, which dictate that you should usually normalize your data and store such records in their own table, recording the relationship with a foreign key constraint. But there are always exceptions, and it’s good to be aware of your options.
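If what you actually want is the relationship between the two tables, the conventional design looks something like this (a sketch only; the table and column names are illustrative):

```sql
-- The normalized alternative: store the rows in their own table
-- and reference them with a foreign key instead of embedding them.
CREATE TABLE items (a int PRIMARY KEY, b varchar(3));
CREATE TABLE containers (
    id serial PRIMARY KEY,
    item_a int REFERENCES items (a)  -- foreign key in place of an embedded row
);
```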
Doltgres can do this too#
The reason I’m writing this blog post is to better understand how this feature works in Postgres, since we recently added support for it in Doltgres. And it works the same there:
postgres=> create table t1 (a int, b varchar(3));
CREATE TABLE
postgres=> create table t2(id serial, t1 t1);
CREATE TABLE
postgres=> insert into t2(t1) values (row(1, 'abc'));
ERROR: ASSIGNMENT_CAST: target is of type t1 but expression is of type record: RECORD EXPR (errno 1105) (sqlstate HY000)
postgres=> insert into t2(t1) values (row(1, 'abc')::t1);
INSERT 0 1
Well, almost the same. Postgres automatically converts the record type to the appropriate composite type without an explicit cast, but Doltgres needs one. We’re working on that. We also have some work to do with schema changes:
postgres=> alter table t1 add column c varchar(10) not null;
ALTER TABLE
postgres=> insert into t2(t1) values (row(1, 'abc', '12345')::t1);
INSERT 0 1
postgres=> select * from t2;
id | t1
----+---------------
2 | (1,abc,12345)
1 | (1,abc)
(2 rows)
Here the older tuple is displayed without the new column at all, where Postgres would show it
with a trailing NULL value. There are some other issues we’re working through too, but the
feature is stable enough to unblock several early customers who needed it.
Conclusion#
Want to discuss the Postgres type system, or learn more about Doltgres? Visit us on the DoltHub Discord where our engineering team hangs out all day. Hope to see you there.
