Much Ado About Nil Things: More Go Pitfalls


Previously, I wrote an article about pitfalls befouling Go newcomers. These were each lessons that I personally learned while working on go-mysql-server, a drop-in replacement for MySQL written entirely in Go. We made it to be the best pure-Go MySQL implementation around, and it's the query engine that we use in Dolt, the world’s first version controlled database.

Some readers shared their own suggested pitfalls. /u/donatj talked about how easy it is to leak resources by forgetting to defer their Close methods. There’s a best practice that avoids this: functions that acquire resources should return a callback function that releases them. That way, if the caller binds the callback to a variable but never calls it, they get a compiler error. Unfortunately, if these functions are in a module that the user doesn’t control, there’s not much they can do.
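
Here is a minimal sketch of that pattern. The resource type and the acquire function are hypothetical names invented for illustration, not part of any real library:

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a file, connection, or lock.
type resource struct {
	open bool
}

// acquire opens a resource and returns it together with a release callback.
// A caller that binds the callback to a local variable and never uses it
// gets a "declared and not used" compile error.
func acquire() (*resource, func()) {
	r := &resource{open: true}
	release := func() { r.open = false }
	return r, release
}

// useResource shows the intended calling pattern.
func useResource() *resource {
	r, release := acquire()
	defer release() // deleting this line leaves release unused and breaks the build
	fmt.Println("resource open:", r.open)
	return r
}

func main() {
	r := useResource()
	fmt.Println("resource open after return:", r.open)
}
```

The protection isn't airtight: a caller can still discard the callback explicitly with the blank identifier.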

/u/CuteTechGuy responded with “IMHO those are not pitfalls but just RTFM situations.” And you know what? /u/CuteTechGuy is technically correct. Every pitfall in the original article is clearly documented by the language.

Some of them, such as how receiver types in methods can be either values or pointers, are even completely reasonable design decisions in a vacuum. This wouldn’t be a pitfall at all if this were someone’s first exposure to methods: in some ways it's more intuitive this way, because it makes method receivers behave much more like ordinary function parameters. But for lots of people, Go is not their first language. And someone who is coming from C++, or a similar language where method receivers are always pointers, is inevitably going to trip over this.
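
As a quick refresher, the receiver pitfall looks roughly like this. The counter type is a made-up example:

```go
package main

import "fmt"

// counter is a hypothetical type used only for illustration.
type counter struct {
	n int
}

// incByValue has a value receiver: it increments a copy of the counter.
func (c counter) incByValue() { c.n++ }

// incByPointer has a pointer receiver: it increments the caller's counter.
func (c *counter) incByPointer() { c.n++ }

// demo returns the count observed after each kind of call.
func demo() (afterValue, afterPointer int) {
	var c counter
	c.incByValue()
	afterValue = c.n
	c.incByPointer()
	afterPointer = c.n
	return afterValue, afterPointer
}

func main() {
	v, p := demo()
	fmt.Println(v, p) // prints 0 1
}
```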

In my personal opinion, a lot of the sharp edges in Go come from how the language presents itself as a C-style object oriented language, when it really isn’t. Go’s more unusual behavior is often reasonable on its own, just confusing if you assume it behaves like C++ or Java.

The trouble is, Go does little to disabuse anyone of the notion that it’s part of the C family. If anything, it’s almost deceptive in how much it tries to look and behave like C. Its attempts to hide its inner workings only serve to obscure real understanding of the language.

So if approaching Go with C’s mental model is a recipe for pain, then the best remedy for Go footguns is to peel back the curtain and understand how the language actually works. Then, we can build a new, more accurate mental model.

The pitfall in today’s article demonstrates this mindset. In fact, I wouldn’t even think of it as a pitfall. Think of it as an opportunity to improve our mental model.

Misconception: nil is the same as null

nil is an identifier for the zero value in many of the language’s types. It’s reminiscent of null in other languages, but it would be a mistake to treat them the same way. nil is not null. In fact, sometimes nil isn’t even nil. Behold:

type S struct {}
var s *S = nil
var i1 interface{} = s
fmt.Println(i1)
if i1 != nil {
    panic("nil isn’t nil? What madness is this?")
}

Output:

<nil>
panic: nil isn’t nil? What madness is this?

(Try it yourself)

This result seems baffling. Our mental model tells us that this block of code is only doing two simple things: assigning a value nil to an interface, then comparing that interface with the value we just assigned to it (nil).

Unfortunately, neither of those statements is accurate. This is a case where the language, ever helpful, tries to hide its complexity from us in order to look more like C++ or Java. It really wants us to think of nil just like we think of null. But that would be a mistake.

This isn’t hypothetical either. I got bit by this last month when writing tests for go-mysql-server. Since database rows are heterogeneous structures (if you don’t have the schema), we represent them internally as a slice of interface{}s. Many of our tests execute queries and then check that the result matches an expected value. If the row contains the SQL value NULL, the engine represents that in the results as nil. I wrote my test that expected the result to contain nil. And yet, the test failed. These two nil values were not equal to each other. But how could this be?

In languages with null, null is a single value that can be assigned to any pointer type. But nil isn’t null. While nil is a single identifier in source code, it’s not a single value. Its value is entirely dependent on whatever type is inferred for it. You can see this clearly when you try to use nil without assigning a type for it:

a := nil

Output:

error: use of untyped nil in assignment

(Try it yourself)

It turns out, there are multiple different nil values, and they aren’t equal to each other. Usually, nils of different types can’t be compared. But interfaces are a pathway to many abilities type-safe programmers consider unnatural:

var slice1 []int = nil
var slice2 []string = nil
fmt.Println(slice1)           // prints []
fmt.Println(slice2)           // prints []
fmt.Println(slice1 == slice2) // error: invalid operation: slice1 == slice2 (mismatched types []int and []string)

(Try it yourself)

var slice1 []int = nil
var slice2 []string = nil
var i1 interface{} = slice1
var i2 interface{} = slice2
fmt.Println(i1)              // prints []
fmt.Println(i2)              // prints []
fmt.Println(i1 == i2)        // prints false

(Try it yourself)

This isn’t really that surprising: a nil int slice and a nil string slice might look the same (they even have the same representation in memory) but under the type system, they’re fundamentally different things. The compiler is smart enough to reject your attempts to compare them, because the comparison is nonsensical. And if you use interfaces to skirt the compiler, you’re rightly told at runtime that they aren’t equal.

Both variables were assigned a value of nil. But those nils are not the same value because they are not the same type.
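
One way to see the types hiding behind those nils is the %T formatting verb, which prints the dynamic type stored in an interface. The describe helper below is a hypothetical name used only for this sketch:

```go
package main

import "fmt"

// describe returns the dynamic type of whatever is stored in v,
// as reported by the %T verb.
func describe(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var slice1 []int = nil
	var slice2 []string = nil
	fmt.Println(describe(slice1)) // prints []int
	fmt.Println(describe(slice2)) // prints []string
	fmt.Println(describe(nil))    // prints <nil>
}
```

Even though every argument was written as nil somewhere in the source, each one arrives with different type information, and the bare nil arrives with none at all.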

Similarly, you could use interfaces to compare two nil pointers. This time, we’ll even use unsafe.Pointer in order to look at the actual value of the variables in memory.

type S struct{}
type T struct{}
var s *S = nil
var t *T = nil
// fmt.Println(s == t)  // commented out because like before, this wouldn’t compile.
var i1 interface{} = s
var i2 interface{} = t
fmt.Println(i1)                              // prints <nil>
fmt.Println(i2)                              // prints <nil>
fmt.Println((uintptr)(unsafe.Pointer(s)))    // prints 0
fmt.Println((uintptr)(unsafe.Pointer(t)))    // prints 0
fmt.Println(i1 == i2)                        // prints false

(Try it yourself)

Again, despite the values s and t both being nil, and despite them having the same representation in memory (0), they’re not the same value because they have different types.

But how does this work? How does the runtime know that the values stored in i1 and i2 have different types? That type information needs to be stored on the interface somehow.

And here we peel back the curtain on interfaces. It turns out, the interface{} type is actually a struct, defined in the Go runtime’s source code. It looks like this:

type eface struct {
    _type *_type
    data  unsafe.Pointer
}

This struct has two fields: a pointer to the data, and a pointer to another struct containing type information. (Structs like this, which contain a pointer to data alongside other metadata, and are intended to be used like pointers, are typically nicknamed “fat pointers.”)
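
A small sanity check of the two-field layout is to measure how many pointer-sized words an interface variable occupies. This sketch assumes the standard gc toolchain; the language specification doesn't promise this layout, and interfaceWords is just an illustrative name:

```go
package main

import (
	"fmt"
	"unsafe"
)

// interfaceWords returns how many pointer-sized words an empty
// interface variable occupies.
func interfaceWords() uintptr {
	var i interface{}
	return unsafe.Sizeof(i) / unsafe.Sizeof(uintptr(0))
}

func main() {
	// With the standard gc toolchain this reports 2:
	// one word for the type pointer and one for the data pointer.
	fmt.Println(interfaceWords())
}
```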

Knowing the structure of an interface variable lets us do some unsafe dark magic to see what an interface actually looks like at runtime:

type eface struct {
    _type unsafe.Pointer
    data  unsafe.Pointer
}

type S struct{}
type T struct{}
var s *S = nil
var t *T = nil
var i1 interface{} = s
var i2 interface{} = t
fmt.Println(*(*eface)(unsafe.Pointer(&i1)))  // prints {0x47f5c0 <nil>} (the address will vary)
fmt.Println(*(*eface)(unsafe.Pointer(&i2)))  // prints {0x47f600 <nil>} (the address will vary)
fmt.Println(i1 == i2)                        // prints false

(Try it yourself)

It now makes sense why these two values would not be considered equal: we’re not just comparing data, we’re also comparing _type. The types have to be identical and the underlying values have to be equal. This is the right way to think about interfaces: not as a raw pointer to data, but as a tagged union. Two interface values can only be equal if the type tag matches.

For completeness’ sake, I’ll point out that non-empty interface types, those declared with type … interface {...} and at least one method, have their own struct:

type iface struct {
	tab  *itab
	data unsafe.Pointer
}

itab is an iTable (a vtable for interfaces). It stores not only type information for the data, but also pointers to each method declared on the interface and defined by the implementation. It’s how the runtime knows which method to execute when a method is called on the interface. The effect is the same though: the iTable acts as the type tag that distinguishes the different underlying types.
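
Because the method to call is found through the type information rather than through the data pointer, a method call on an interface that wraps a nil pointer still dispatches normally. A small sketch, where Namer and callName are hypothetical names:

```go
package main

import "fmt"

// Namer is a hypothetical non-empty interface.
type Namer interface {
	Name() string
}

type S struct{}

// Name has a pointer receiver and checks it before use,
// so it is safe to call on a nil *S.
func (s *S) Name() string {
	if s == nil {
		return "nil *S"
	}
	return "S"
}

// callName invokes the method through the interface.
func callName(n Namer) string {
	return n.Name()
}

func main() {
	var s *S // nil pointer
	var n Namer = s
	fmt.Println(n == nil)    // prints false: the interface carries type information for *S
	fmt.Println(callName(n)) // prints nil *S: the method is found even though data is nil
}
```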

This means that assigning a value to an interface isn’t just a simple assignment, but the actual construction of a new object. A consequence of this is that the interface always holds a pointer to the underlying data, even if the concrete type is a non-pointer or a scalar type like an int; in that case the value typically has to be copied somewhere the pointer can refer to. This means that storing non-pointer values in interfaces often results in a heap allocation, which can affect performance.

Now we’re armed with the intuition to finally tackle our original conundrum. Here it is again, this time with additional print statements to reveal the interface’s internals:

type eface struct {
    _type unsafe.Pointer
    data  unsafe.Pointer
}

type S struct {}
var s *S = nil
var i1 interface{} = s
var i2 interface{} = nil
fmt.Println("i1 =", i1)
fmt.Println("i2 =", i2)
fmt.Println(*(*eface)(unsafe.Pointer(&i1)))
fmt.Println(*(*eface)(unsafe.Pointer(&i2)))
if i1 != i2 {
    panic("nil isn’t nil? What madness is this?")
}

We now have a pretty good idea of what’s happening with i1:

  • The nil in the line var s *S = nil is the zero value for pointer types (a nil pointer)
  • The line var i1 interface{} = s constructs a new object, containing the nil pointer from the previous line, and type information for *S

The nil in the line var i2 interface{} = nil is the zero value for interface{}. But what is that value? Again, the use of unsafe.Pointer illuminates:

Output:

i1 = <nil>
i2 = <nil>
{0x47f5c0 <nil>}
{<nil> <nil>}
panic: nil isn’t nil? What madness is this?

(Try it yourself)

The zero value for interface{} is a struct where both the type and the data are nil pointers.

And this was the source of my original error. Although both values in the test were nil, they weren’t the same nil.
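
As a rough illustration, here is a hypothetical reconstruction of that kind of failure. It is not the actual go-mysql-server test code: the rowsEqual helper, the *string element type, and the use of reflect.DeepEqual are all assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"reflect"
)

// rowsEqual compares two rows the way a test helper might,
// using reflect.DeepEqual (an assumption for this sketch).
func rowsEqual(a, b []interface{}) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	var missing *string // a typed nil pointer standing in for SQL NULL

	actual := []interface{}{1, missing} // second element: (type *string, data nil)
	expected := []interface{}{1, nil}   // second element: (type nil, data nil)

	fmt.Println(rowsEqual(actual, expected))                // prints false
	fmt.Println(rowsEqual([]interface{}{1, nil}, expected)) // prints true
}
```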

As an aside, if you want to check an interface to see whether the underlying data is nil, you have to extract it with a type assertion (what other languages would call a cast), like so:

var s1 *S = nil
var i1 interface{} = s1
fmt.Println(i1 == nil) // false
s2 := i1.(*S)
fmt.Println(s2 == nil) // true
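
A type assertion only works when you know the concrete type ahead of time. When you don't, one alternative is the reflect package. The isNil helper below is a hypothetical sketch, not a standard library function:

```go
package main

import (
	"fmt"
	"reflect"
)

// isNil reports whether v is a nil interface, or an interface wrapping
// a nil pointer, map, slice, channel, or function.
func isNil(v interface{}) bool {
	if v == nil {
		return true // no type and no data
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Ptr, reflect.Map, reflect.Slice, reflect.Chan, reflect.Func:
		return rv.IsNil()
	default:
		return false // ints, structs, strings, and so on can never be nil
	}
}

func main() {
	type S struct{}
	var s *S
	fmt.Println(isNil(nil))   // prints true
	fmt.Println(isNil(s))     // prints true
	fmt.Println(isNil(&S{}))  // prints false
	fmt.Println(isNil(0))     // prints false
}
```

Reflection is slower than a direct comparison, so this is better suited to tests and debugging than to hot paths.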

With interfaces demystified, their behavior is less surprising. Every interaction I've described here is perfectly reasonable with the understanding that interfaces are (type, data) tuples. I have no issues with this decision.

But I do take issue with how the language tries to obscure this decision:

  • nil is a single identifier whose meaning is context-dependent.
  • A nil pointer and a nil interface both print as "<nil>", despite being non-equal values that can be assigned to the same type.
  • Assigning to an interface actually creates a new object, which is not the same as the value being assigned. To the best of my knowledge, no other assignment in Go behaves this way.
  • Casting an interface to a type (a type assertion) actually extracts the underlying value, which is distinct from the interface itself. This causes confusion when a non-nil interface wraps a nil value.
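
The last point shows up most often with the error interface. A minimal sketch, where myErr, buggy, and fixed are made-up names:

```go
package main

import "fmt"

// myErr is a hypothetical error type.
type myErr struct{}

func (e *myErr) Error() string { return "my error" }

// buggy returns a nil *myErr. The conversion to error wraps it in a
// non-nil interface: (type *myErr, data nil).
func buggy() error {
	var e *myErr // nil pointer
	return e
}

// fixed returns a literal nil, which becomes the zero value of the
// error interface: no type, no data.
func fixed() error {
	return nil
}

func main() {
	fmt.Println(buggy() != nil) // prints true, even though no error occurred
	fmt.Println(fixed() != nil) // prints false
}
```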

Go's attempts to resemble C-like languages may be a tactical move to encourage adoption by fans of those languages, but those fans will inevitably end up worse for wear when the language doesn't behave in the ways they expect. I'm not convinced that the confusion is worth the misleading sense of familiarity that these attempts engender.

Go is not Java. It should not pretend to be Java. Instead, it should focus on being Go. Go is much better at being Go than it is at being Java.

Special thanks to https://github.com/teh-cmc/go-internals for a lot of the information used in this article.

Go as a language always seems to attract strong opinions. If you love Go and feel like I'm being unfairly critical of your favorite language, come join our Discord and call me names! If, on the other hand, you hate Go and feel like I'm bending over backwards to defend a fundamentally flawed language, then you should also come join our Discord and call me names! You can also get in touch with us on Twitter.

Or maybe you want to join us to discuss how using version control for your database is a no-brainer. We'd love to chat about that too.
