Is Postgres Too Customizable?

May 14, 2024

5 min read

We're steadily making progress on DoltgreSQL, which is a version of Dolt built to be a drop-in replacement for PostgreSQL. With the goal of being a drop-in replacement, we're doing a lot of research to make sure that we're implementing the same functionality that any user would expect. Dolt is a drop-in replacement for MySQL, so we already have experience in this type of research, however I wanted to share a major design difference with Postgres compared to MySQL. I doubt anyone would expect SELECT 9 + 5; to return '4', so let's take a look at what makes Postgres so interesting.

Functions Rule The World

In MySQL, you can treat the building blocks of a statement as a set of discrete operations. For example, the + operator handles addition, and works for a specific set of types. SELECT 5 + 9; is valid and returns a number, while SELECT '042 ' + '11'; is also valid and also returns a number. Perhaps one would find it more intuitive that using + on two strings should result in concatenation (like in many programming languages), however that is just not what MySQL does here. As a result, SELECT 'hello' + 'world'; is a valid query that returns 0, since neither string may be interpreted as a number.

Postgres takes a different approach. The + operator also handles addition and behaves in a similar way to MySQL at first glance, but Postgres allows users to create their own operators using CREATE OPERATOR. Using our previous example, SELECT 'hello' + 'world';, Postgres returns an error stating that there's type ambiguity, so we can fix that by adding casts: SELECT 'hello'::text + 'world'::text;. Now Postgres returns an error saying that the operator does not exist for our two types. We can use CREATE OPERATOR to make it such that + for two text types results in their concatenation:

CREATE OPERATOR + (
  PROCEDURE = textcat,
  LEFTARG = text,
  RIGHTARG = text
);

The procedure that we're referencing is the built-in function for text concatenation, as SELECT 'hello'::text || 'world'::text; is the idiomatic way to concatenate two strings in Postgres, and textcat is the underlying function. While seemingly innocuous, this is a major difference to how MySQL operates, as Postgres is essentially composed of two things: tables and functions.

This isn't strictly true, as there are additional elements outside of these two things, but you can broadly think of the majority of Postgres' functionality using these two concepts. Operators are basically shorthand for functions. Casts between types are implemented as functions, and you can have different functions for different scenarios (for example, explicit casts or implicit casts). Even types themselves are a collection of functions for various scenarios. The query SELECT 1::int4 + 2::int4; is equivalent to SELECT int4pl(1::int4, 2::int4); from the execution's standpoint. For those familiar with Postgres, this isn't surprising, and many are already aware of this.

All of these functions are declared within the system tables, alongside tables for specific functionality such as mapping operators to functions. With these being tables though, that means that they're editable assuming you have the proper permissions.

True Postgres Customization

Let's revisit our first addition statement, SELECT 5 + 9;, and we'll throw casts on there to make our types explicit: SELECT 5::int4 + 9::int4;. As mentioned earlier, this will call the int4pl function and return an int4 result of 14. Let's change this to subtract the numbers instead, and return a text type. (I don't recommend trying this on a real instance.)

CREATE FUNCTION subtract_integers(a INT4, b INT4)
RETURNS TEXT AS $$
BEGIN
    RETURN (a - b)::text;
END;
$$ LANGUAGE plpgsql;
UPDATE pg_operator SET oprcode = 'subtract_integers', oprresult = 25 WHERE oid = 551;

Now our SELECT 5::int4 + 9::int4; query returns '4'. I've broken no rules here, and this is all very clearly defined and allowed in Postgres, which is pretty incredible. With great power comes great responsibility, and Postgres truly allows one to cripple their server in a very thorough way.

While the majority of Postgres users will never modify their server in such a way, the fact that it's allowed means that DoltgreSQL needs to also allow this behavior to truly be drop-in compatible. Someone out there relies on this behavior, and thus it needs support. While MySQL does not have this level of customizability, there are plenty of interesting decisions that do not seem intuitive, yet are relied upon by at least one of our customers. Supporting this is fairly far down the list regarding our priorities, but it's there.

Other Considerations

The breadth of functionality defined by functions is interesting, but implementing it is fairly straightforward from a philosophical standpoint. We either support it or we don't. However, what about those cases where it's not so straightforward?

Recently we were implementing support for the oid type, and we were testing against our locally-installed copies of Postgres to ensure the correct output. In particular, we were testing explicit casts from the text type to the oid type. One of my coworkers observed that SELECT ('-2147483648'::text)::oid; returned 2147483648, which seemed fairly strange. -2147483648 is the minimum value for an int32 (or int4 in Postgres), and oid is a uint32, so stripping the negative rather than erroring seemed a bit strange, but still makes sense. Trying SELECT ('-1'::text)::oid; returned a value of 4294967295, implying that the value is being converted to an int32 and then the bytes are simply being reinterpreted as a uint32. To further test this theory, SELECT ('-2147483649'::text)::oid; failed, as -2147483649 is not a valid int32 value, and therefore the query errors.

Except, the above statement is not true everywhere. Running SELECT ('-2147483649'::text)::oid; on a different machine succeeded by returning a value of 2147483647, and SELECT ('-4294967295'::text)::oid; returns a value of 1. Different behavior is observed across different machines, and it also persists across different versions (Postgres 14, 15) on these machines. It seems to be related to the compiler used while building Postgres, although this has not been thoroughly tested to confirm it. Regardless, it raises the question: which do we choose for our implementation? Do we choose the one that is more in-line with the version used by the largest number of people, or do we choose the one that we think makes the most sense? Perhaps we support both and offer a flag so that users may toggle their desired functionality, but that adds an additional layer of complexity and complication to our product, alongside deciding which one is the default if no flag is provided. There isn't a clear-cut answer for this, but it's one of many challenges that we're working through to fulfill our goal.

Conclusion

Both Dolt and DoltgreSQL are the world's first databases built from the ground up to support versioning (branch, merge, diffs, etc.) of all your data and schemas. Dolt is already used in production and supports MySQL's dialect and protocols. DoltgreSQL is in its early stages and supports Postgres' dialect and protocols, and is currently recommended as a replica of a primary Postgres server.

As we continue development on DoltgreSQL, I'm sure we'll run into many more interesting stories such as these, so stay tuned to read more in the future! You can find us on Twitter/X and chat with us on Discord. We also welcome all issue reports! Thank you for reading!

Blog

Functions Rule The World

True Postgres Customization

Other Considerations

Conclusion

Get started with Dolt