Are Golang Generics Simple or Incomplete? A Design Study
This comment on ycombinator lives rent-free in my head. Nothing before or since has captured my feelings on Go quite as succinctly.
Take for instance, generics. Or its more precise name: parametric polymorphism.
For the first ten years of Golang's life, it didn't support generics. This was a deliberate decision by the language authors, defended on the basis of avoiding unnecessary complexity.
But inevitably, Go caved to demand and added generics in March 2022, with version 1.18. This decision was met with a lot of pushback by Go devs who strongly felt that generics are unnecessary.
At DoltHub, we use generics in our code when we think that they reduce code duplication, make code clearer, or improve performance along critical paths. We think that generics can be used to write very clean code. We also know they also be used to write messy code.
And yet, generics in their current form feels like a compromise that pleases no one. I'm not the first person to feel this way either.
Go generics are very simple and limited by design, which forbids the unreadable horrors found in esoteric C++ template metaprogramming. But it's still very easy to write overwrought generic code whose added complexity and opaqueness hamper readability and maintainability. Critics point to these limitations as evidence that generics don't actually solve problems, and thus aren't actually necessary.
On the other hand, there's an argument to be made that unreadable Go generics are because of the limitations of the design. More powerful language features around generics could allow for code that is more succinct, less complex, and more maintainable.
Recently I encountered a problem that seemed like it was a perfect use case for generics, only to be held back by the language. It's an interesting case study in the "simple vs incomplete" debate.
Today, we'll explore the problem, why the obvious solution didn't work and introduced unwanted complexity, and the actual solution that still uses generics but solves it in a cleaner way. It's a problem I've seen crop up from time to time, although I don't know if the code pattern I used has an accepted name. Depending on how you feel about generics, you may see this as a clever hack, or a code smell. And personally, I'm still on the fence about it. I'm curious how readers feel about it.
The Problem
Here's a simplified version of the situation I found myself in:
Dolt is a version controlled SQL database: think git but for tables. Like git, it's content-addressed: everything in the database is a read-only chunk identified by a hash of its contents. We have types that represent different kinds of data stored in the database, and one of our most important types is Map
. Like the name suggests, Map
is a dictionary that maps keys onto values, backed by the database. Since database chunks are read-only, this means that Map
s are immutable.
But of course, the data in the database can change. In order to efficiently make changes to the database, we have another type, MutableMap
, which wraps the Map and also stores a buffer of changes. When the buffer gets too large, the changes get flushed, creating a new Map
with a new content address.
Finally, let's say we have a third type called Index
, which also wraps a Map
and enables it to be used by the database engine by implementing all those things that indexes need to do, like cursors and range lookups and such.
We're currently in the process of adding support for vector indexes. Vector indexes are structured differently than regular indexes, and support a different set of operations. But there's a lot of overlap between them, and much of the code for working with indexes doesn't need to know what type of index its using. This required us to make new versions of all three of these types, with different implementations for the same functions.
The simplest way to accomplish this would be to make all three of these types interfaces and provide multiple implementations for each. So the interfaces might look something like this:
type Map interface {
Mutate() MutableMap
}
var _ Map = BasicMap{}
var _ Map = VectorMap{}
type MutableMap interface {
Flush() Map
}
var _ MutableMap = MutableBasicMap{}
var _ MutableMap = MutableVectorMap{}
type Index interface {
GetMap() Map
SetMap(m Map)
}
var _ Index = BasicIndex{}
var _ Index = VectorIndex{}
However, doing this causes several issues:
- Each Index implementation is tied to a specific Map implementation, yet it's possible to call VectorIndex::SetMap() and pass in a BasicMap, or vice versa.
- If we know the concrete type of a
Map
, we should be able to know the concrete type returned by callingMutate
on it. But the method always returns theMutableMap
interface, creating situations where we may need to perform a cast. - Calling a method on any of these interfaces now involves a dynamic dispatch, which could have performance implications if used in a performance-critical loop.
Essentially, all of these issues of the same underlying cause: the code neither documents or asserts the relationship between the different implementations. That is, BasicMap, MutableBasicMap, and BasicIndex are all related, and VectorMap, MutableVectorMap, and VectorIndex are related, but the code is unable to assume or enforce this relationship.
More generally, the issue occurs whenever we have two or more types that collectively implement some contract, and only implement the contract together. Most languages don't have a specific notion for this, but it's a pattern that comes up frequently.
We'd really prefer not to have to add typecasts everywhere we're calling these methods. We'd also really prefer to not have to verify the runtime type within the SetMap
function. Even if we're 100% confident that these checks and casts will never fail, the whole purpose of having a type system is so that the compiler checks this for us.
There's some things we could do to work around this. For instance, each struct could define two versions of each of these methods: one which is used to implement the interface and returns another interface, and one which returns the more specific implementation. Code which knows the concrete type of an object can call the more specific one, and code that's meant to be implementation agnostic can call the general one. But not only does that double the number of methods in our API, it doesn't actually solve all our problems.
Imagine we have some hypothetical code updates the Map contained within an Index by applying a series of mutations. It might look something like this:
func ApplyEditsToIndex(index Index, edits Edits) {
oldMap := index.GetMap()
mutableMap := oldMap.Mutate()
mutableMap.ApplyEdits(edits)
newMap := mutableMap.Flush()
index.SetMap(newMap)
}
Our goal is to find some way to make the above function work without any typecasts or type checks: we want to be able to know at compile time that oldMap
and newMap
have the same underlying type, and somehow constrain the call to SetMap
to use that type.
This seems like a place where generics could come in handy.
The Failed Attempt
One way that generics allow us to establish relationships between types is by implementing generic interfaces. For instance, we could make MutableMap generic, based on the specific Map implementation that it wraps:
type MutableMap[MapType Map] interface {
Flush() MapType
...
}
var _ MutableMap[BasicMap] = MutableBasicMap{}
var _ MutableMap[VectorMap] = MutableVectorMap{}
This works just fine, and now, every implementation of MutableMap
actually implements a specialized interface MutableMap[M]
, where M is the corresponding implementation of Map.
Huzzah! Except... what about the reverse operation, where we convert a Map
into a MutableMap
? We need to do the same thing, and make Map
generic.
type Map[MutableMapType MutableMap] interface {
Mutate() MutableMapType
}
var _ Map[MutableBasicMap] = BasicMap{}
var _ Map[MutableVectorMap] = VectorMap{}
Except wait, we just made MutableMap generic, so we can't use it unparameterized like that. We have to add a type parameter to our type parameter. But what type do we use?
type Map[MutableMapType MutableMap[SOMETHING]] interface {
Mutate() MutableMapType
}
What goes in place of the SOMETHING
?
The type that we put there will be the return type of calling Mutate().Flush()
, which we want to be... the type that's implementing the interface. Basically, we want to say that if we have a type X that implements Map[Y], then Y must implement MutableMap[X]... but we can't actually express that kind of recursive relationship.
Some languages have a notion of a Self
type which can appear in type constraints and always refers to the type conforming to that constraint. Other languages like C++ don't have that, but have something called the Curiously recurring template pattern instead. But Golang doesn't support either.
But an astute observer might notice that we don't depend on any specific properties of the parameterized type. We know in practice it will always extend a specific interface, but we don't depend on that. So we can just stick an any
in it and call it a day.
type Map[MutableMapType any] interface {
Mutate() MutableMapType
...
}
type MutableMap[MapType any] interface {
Flush() MapType
...
}
type Index[MapType any] interface {
GetMap() MapType
SetMap(MapType)
}
var _ MutableMap[BasicMap] = MutableBasicMap{}
var _ MutableMap[VectorMap] = MutableVectorMap{}
var _ Map[MutableBasicMap] = BasicMap{}
var _ Map[MutableVectorMap] = VectorMap{}
var _ Index[BasicMap] = BasicIndex{}
var _ Index[VectorMap] = VectorIndex{}
This works... in the sense that these are valid interfaces that are implemented by these structs. The use of any
here isn't even that egregious.
Unfortunately, it turns out that doing it this way doesn't actually solve our problem.
In order to fix the typechecking in ApplyEditsToIndex
, it needs to itself become a generic function:
func ApplyEditsToIndex[IndexType SOMETHING](index IndexType, edits Edits) {
oldMap := index.GetMap()
mutableMap := oldMap.Mutate()
mutableMap.ApplyEdits(edits)
newMap := mutableMap.Flush()
index.SetMap(newMap)
}
What do we put in place of SOMETHING
? It needs to be a type that has a GetMap
method... so it has to be a specialization of Index
. But the return type of GetMap
needs to be a type that has a Mutate
method, and that needs to return a type that has a Flush
method, which needs to be a valid input to SOMETHING.SetMap
... which is once again a recursive type constraint, which isn't allowed.
We haven't actually solved anything. All we've done is move the problem elsewhere in the code, bringing the use of cumbersome generics along with it.
UPDATE: user /u/ar1819
on Reddit pointed out that while type parameters can't reference themselves, they can reference each other cyclically. Thus, the above function declaration can be written like so:
func ApplyEditsToIndex[MT MutableMap[M], M Map[MT], IndexType Index[M]](index IndexType, edits Edits) { ... }
Callsites to ApplyEditsToIndex
even participate in type inference, meaning that while the declaration is a bit messy, the callsites themselves are clean.
This has its limitations. /u/ar1819
points out that is has some gotchas around pointer receivers. In addition, this only works for type constraints, and not similar contexts like type aliases. For example, the following is still prohibited:
type MT = MutableMap[M]
type M = Map[MT]
Many thanks to /u/ar1819
for pointing this out!
Still, the fact that we have to parameterize it this way in the first place exposes an uncomfortable truth: the use of generics here is starting to infect its use sites, adding additional boilerplate and complexity.
Oh, and I've been hiding some of the details that make this even more complicated. Namely:
- The Map type already takes type parameters describing the key and value types.
- These types exist across multiple different packages, and if we're not careful in adding some of these type parameters, we'll create package dependency cycles.
Neither of these is impossible to deal with, but both further complicate things by forcing us to not only use parameterized types whose type parameters are themselves parameterized types, but also forcing us to carefully consider the structure of our packages to avoid cycles. The type parameters would be long, repetitive, and cumbersome... and most importantly still fail to solve the problem that we set out to solve.
At this point, it starts to feel like the generic critics have a point. Maybe generics don't actually solve problems. Maybe they don't reduce complexity, just move it around. Maybe simple is better, even if it means a bit of code reuse and explicit typecasts. Extra type safety isn't worth making your code unreadable.
On the other hand, maybe Golang generics create these problems because they're incomplete.
I eventually worked out a solution.
The Solution
Previously, we tried to express the connection between VectorIndex
, VectorMap
, and VectorMutableMap
by having them implement generic interfaces. It turns out, this doesn't work because while we can implement these interfaces, we'll still have to supply type parameters at the point of use, and their recursive definition makes that impossible. We need some other way to express their connection in the type system.
The solution isn't to express the connection via these types (and the interfaces they implement), but to create a new type whose sole responsibility is to describe how the other types are related. This new type can exist in the same package as the code that actually makes use of all the other types, and thus depend on all of them without any risk of a dependency cycle.
The best way to demonstrate this is with code. Our new interface is a "contract" of sorts that describes how this family of types is related. Each implementation of this interface describes how a set of types implement the required interfaces. Here's what it might look like:
type IndexContract[IndexType Index, MapType Map, MutableMapType MutableMap] interface {
Mutate(MapType) MutableMapType
Flush(MutableMapType) MapType
GetMapFromIndex(IndexType) MapType
SetMapOnIndex(IndexType, MapType)
}
type VectorIndexContract struct {}
func (VectorIndexContract) Mutate(m VectorMap) VectorMutableMap {
return m.Mutate()
}
func (VectorIndexContract) Flush(m VectorMutableMap) VectorMap {
return m.Flush()
}
func (VectorIndexContract) GetMapFromIndex(i VectorIndex) VectorMap {
return i.m
}
func (VectorIndexContract) SetMapOnIndex(i VectorIndex, m VectorMap) {
i.m = m
}
var _ IndexContract[VectorIndex, VectorMap, VectorMutableMap] = VectorIndexContract{}
It's not all roses. Using this pattern finally gives us a way to make our ApplyEditsToIndex
function generic, but it's not the prettiest:
func ApplyEditsToIndex[IndexType Index,
MapType Map,
MutableMapType MutableMap,
IndexContractType IndexContract[IndexType, MapType, MutableMapType],
] (contract IndexContractType, index IndexType, edits Edits) {
oldMap := contract.GetMapFromIndex(index)
mutableMap := contract.Mutate(oldMap)
mutableMap.ApplyEdits(edits)
newMap := contract.Flush(mutableMap)
contract.SetMapOnIndex(index, newMap)
}
// Elsewhere in the code:
ApplyEditsToIndex(VectorIndexContract{}, index, edits)
Fortunately, this ugliness is safely limited to this single method definition. Even calling this function is easy because of type inference: we pass in the contract and all the type parameters get inferred. What's even nicer is that we don't have to modify the implementation of these types at all. We get to cleanly separate the design of these types from the concept of them being in a "contract" together.
Could We Do Better?
Still, it shouldn't have to be this hard, or this messy. Next time, I'll discuss some language features that other languages have that Golang could adopt to improve the expressiveness of generics. We'll explore how those features could be applied to this design pattern, and whether the result would increase the complexity of the code, or if the increased expressiveness allows for code that is simpler, cleaner, and more maintainable. Telling the difference can be a useful signal on whether stronger generics would only complicate Go, or if they would make it a more complete language.
In the meantime, what do you think about this approach? Does this solution add unneeded complexity, or is it a reasonable way to type-check when you have an API that spans multiple interface types? What would you do in a situation like this? If you have strong feelings about generics in Golang, feel free to join our Discord and drop me a line.