They're called Slices because they have Sharp Edges: Even More Go Pitfalls

GOLANG
10 min read

On my last Golang post, I suggested that the main source of confusion in Golang is the fact that the language presents itself as a C-like object oriented language, and dresses itself in C-like syntax, while differing from C-like languages in surprising ways. I argued that many of Go’s pain points actually make sense when viewed in a vacuum, but violate the intuition of someone coming from a C-like language.

The remedy for these pitfalls, then, is to peel back the curtain and expose the inner workings of the language just enough for users to adjust their mental models. I’ve had to do a lot of that while helping develop Dolt, the version-controlled SQL database with git semantics. Sometimes it makes me appreciate Golang more. Other times it just makes me seriously question Go’s language design. It’s a mixed bag for sure.

Last time, we explored how interfaces are actually represented in memory, in order to explain some of their unexpected behavior. This time, I want to apply this same approach to another common source of Go misconceptions: slices.

When I started writing this, I assumed that demystifying slices would be shorter and simpler than demystifying interfaces. Slices are the simpler data structure, so there should be less to be confused by, less to criticize.

What a fool I was.

Misconception: Slices are heap-allocated, fixed-sized arrays.

I’m guilty of encouraging exactly this misconception in my original go pitfalls article. It has just enough right to get your feet wet in Go, and just enough wrong to mess up your day.

I suspect this misconception exists and gets propagated by people like me because there’s a common use pattern for which this mental model holds. If you’re using slices in go, and you:

  1. Only declare new slices without a value (var s []int)
  2. Extend slices by writing s = append(s, …), and never use a slice after passing it as an input to append

Then slices behave about how you would expect.

This is mostly how we use slices internally in Dolt. It’s easy to read, is sufficient for most common use cases, and has no surprises.

But step off this golden path and here be dragons. Let’s see what happens when we relax some of these constraints.

Relaxing rule 1: “Only declare new slices without a value”

Where this bit me: using if (slice == nil) to check if a slice is empty.

There are other ways to make empty slices. Here’s some of them:

var sliceA []int
sliceB := []int{}
sliceC := make([]int, 0)
sliceD := make([]int, 0, 1)

sliceA is initialized with the zero value for slices (nil) sliceB is a slice literal with zero elements. sliceC is created by the make function, with an initial size of zero. sliceD is created by the make function, with an initial size zero and an initial capacity of one.

These slices are all empty and all have the same string representation.

fmt.PrintLn(sliceA)	// prints []
fmt.PrintLn(sliceB)	// prints []
fmt.PrintLn(sliceC)	// prints []
fmt.PrintLn(sliceD)	// prints []

(Try it yourself)

But are they equal?

> fmt.Println(sliceA == sliceB)
invalid operation: sliceA == sliceB (slice can only be compared to nil)

(Try it yourself)

Huh. I guess that makes sense. The elements in slices are mutable, and comparing objects that point to mutable data can often be a source of confusion. Does the user want to check to see if they’re the same object in memory, or recursively compare the elements? Disallowing the comparison to prevent confusion is reasonable.

What’s odd is that you can still compare slices to nil, and only nil. It’s not clear what benefit this serves beyond the ability to distinguish between nil slices and non-nil empty slices. And in my experience, relying on that distinction for your program’s behavior is a trap. I once briefly considered writing a function that could return nil to indicate “no result” and a non-nil empty slice to indicate “empty result”. But that would have been begging for confusion, and I ended up adding an extra boolean return value instead.

I also want to draw attention to sliceD in this example. That last one is calling make with an optional capacity argument. If you already know the max size that the slice will grow to, then constructing it with a capacity argument can result in more performant code, because the program will pre-allocate enough space for the final size of the slice.

This is a hint toward how slices actually work. If specifying a capacity allows us to only allocate a single chunk of memory as the slice grows, then every slice returned by append must be reusing that memory. We’ll see that happen explicitly in the following section.

Relaxing rule 2: “Never use a slice after passing it as an input to append

Where this bit me: appending to a slice that was returned from a method.

This misconception is sometimes true.

sliceA := []int{1, 2, 3}
sliceB := append(sliceA, 4)
sliceC := append(sliceA, 5)
sliceC[0] = 0

fmt.Println(sliceA) // prints [1 2 3]
fmt.Println(sliceB) // prints [1 2 3 4]
fmt.Println(sliceC) // prints [0 2 3 5]

(Try it yourself)

But other times it isn’t.

sliceD := append([]int{1, 2}, 3)
sliceE := append(sliceD, 4)
sliceF := append(sliceD, 5)
sliceF[0] = 0

fmt.Println(sliceD) // prints [0 2 3]
fmt.Println(sliceE) // prints [0 2 3 5]
fmt.Println(sliceF) // prints [0 2 3 5]

(Try it yourself)

In the first case, the three final slices were all backed by separate memory and could be edited independently. In the second case, all three slices were backed by the same memory and edits to one were reflected in the others.

We see here that append only sometimes allocates new memory, and the output may reuse the memory of the old slice. This is why you should never reference a slice after it’s passed as an input to append.

Personally, I think that making the behavior of append unpredictable like this and putting the onus on the user to avoid reusing slices that have been passed into append is a mistake. Other languages make similar mistakes impossible:

Go has no such equivalent. The append function wants the performance benefits that come with move-construction or consumed parameters, but provides none of the safety. It’s entirely up to the user to avoid this unpredictable behavior, all in the name of a performance optimization.

Honorable Mention: subslicing a slice (eg. s2 := s1[1:3])

This is less of a sticking point than the other two because there’s places where it’s perfectly reasonable to subslice a slice.

For instance, Dolt represents database rows as key-value pairs in a prolly tree. Each key and value is a stream of bytes that represents multiple table columns, but we often want to extract the bytes for a single column and operate on them. Subslicing lets us do that.

Subslicing is useful but introduces a new risk: there are now two references to the underlying memory, and writes to the subslice also effectively write the original slice. Sometimes this is desired (for example, by letting us make changes to column values and have them be reflected in the backing row.) But many times, it’s not desired. Worse, it may not be obvious that this can happen once the subslice is passed across a function boundary.

And if you end up mixing subslices with append, you’re almost certainly going to get unintended results.

The following is an incredibly simplified representation of the actual Dolt code for storing a row of values as a bytestream. In the actual code, the offsets are appended to the end of the byte stream, but they’re kept separate here for simplicity.

type Tuple struct{
	bytes []byte
	offsets []int
}

func (t Tuple) getField(i int) []byte {
    var startOffset int
    if i == 0 {
   	 startOffset = 0
    } else {
   	 startOffset = t.offsets[i-1]
    }
    var endOffset int
    if i == len(t.offsets) {
   	 endOffset = len(t.bytes)
    } else {
   	 endOffset = t.offsets[i]
    }
    return t.bytes[startOffset:endOffset]
}

The getField function makes it easy to get access to the individual values that make up the bytestream. But there’s a risk involved. If the user forgets (or is unaware) that getField returns a subslice, they might assume they hold the only reference to the underlying memory and start using it for their own purposes.

t := Tuple {
	bytes: []byte{1, 2, 3}
	offsets: []int{1}
}

v0 := t.getField(0)
v1 := t.getField(1)
v2 := t.getField(2)

fmt.Println(v0) // prints [1]
fmt.Println(v1) // prints [2 3 4]
fmt.Println(v2) // prints [5 6]

// The user uses the returned slices for arbitrary computation
v0[0] = 17
v0 = append(v0, 34)

v2 = append(v2, 42)
v2[0] = 47

fmt.Println(v0) // prints [17 34]
fmt.Println(v1) // prints [34 3 4]
fmt.Println(v2) // prints [5 6 42]

fmt.Println(t.getField(0)) // prints [17]
fmt.Println(t.getField(1)) // prints [34 3 4]
fmt.Println(t.getField(2)) // prints [5 6]

(Try it yourself)

Let’s survey the damage.

  • getField(0) no longer contains its original value, but it also no longer matches the current value of v0.
  • Both v1 and getField(1) have had their values altered despite neither being written to.
  • None of the changes to v2 are reflected in t.bytes, even though all the changes to v0 were.

This behavior seems arbitrary, inconsistent, and difficult to debug. And the only way to adequately explain it is to talk about how slices actually work under the hood. So let’s do that now.

Peeling Back the Curtain

Slices are implemented in the golang runtime as structs. They’re defined in go/src/runtime/slice.go:

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

Knowing this, we can now cast the values from our previous examples to a struct in order to peek at their internals. (Note that the exactly values of the printed addresses will likely differ):

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

sliceA := []int{1, 2, 3}
sliceB := append(sliceA, 4)
sliceC := append(sliceA, 5)
sliceC[0] = 0

fmt.Println(*(*slice)(unsafe.Pointer(&sliceA))) // prints {0xc00001a018 3 3}
fmt.Println(*(*slice)(unsafe.Pointer(&sliceB))) // prints {0xc00007a000 4 6}
fmt.Println(*(*slice)(unsafe.Pointer(&sliceC))) // prints {0xc00007a030 4 6}

sliceD := append([]int{1, 2}, 3)
sliceE := append(sliceD, 4)
sliceF := append(sliceD, 5)
sliceF[0] = 0

fmt.Println(*(*slice)(unsafe.Pointer(&sliceD))) // prints {0xc00007a000 3 4}
fmt.Println(*(*slice)(unsafe.Pointer(&sliceE))) // prints {0xc00007a000 4 4}
fmt.Println(*(*slice)(unsafe.Pointer(&sliceF))) // prints {0xc00007a000 4 4}

(Try it yourself)

From this we can observe:

  • len is the number of elements in the slice (the same value you get from calling len(s).
  • cap is the slice’s total capacity (the number of elements that can fit between the start of the slice and the end of the allocated memory.)
  • Slices A, B, and C all have distinct addresses. Slices D, E, and F all have the same address: they point to the same region of memory, and changes to one are reflected in the other.

If len < cap, that means that there’s space after the end of the slice for more elements, and calling append will write to that space. Assuming that space isn’t also being used by another slice, this won’t have any observable side-effects.

If this slice was obtained by calling make, that extra space was pre-allocated and is probably unused (unless you’re reusing a slice that was passed to append, like we do above.) If this slice was obtained from sub-slicing, then that extra space is from the original slice.

The trouble is, append does not know or care about the distinction between these two cases. If there is space at the end of the slice, append will write to it. If there isn’t, append will copy the slice contents to a new allocation. It’s up to the caller to ensure that no important data gets overwritten by the operation.

One way to prevent these sort of accidents would be to have a read only slice type that can’t be written, and always creates a new allocation when appended. In fact, read only slices have been requested several times. They are not yet supported by the language.

What’s incredible is that we technically already have read-only slices in go, but only for slices of bytes: strings are effectively read-only byte slices. The Stringer tool in the standard library even leverages this fact to efficiently generate and store string representations of enums. Yet, there’s no equivalent for slices of other types.

Perhaps I’m being unfair to Golang, since other languages have to deal with this too. If you’ve used C++, you’re probably familiar with the string_view class, which functions a lot like slices: a view into a pre-allocated string.

Yet, in my experience, users tend to wrap their heads around string_view pretty quickly, despite the fact that it has more pitfalls than Go’s slices due to C++’s manual memory management. I suspect string_view causes less confusion because C++ users are already primed to think about memory ownership of their objects, whereas Go trains users to not worry about memory.

C++’s string_view is dangerous because C++ is dangerous. Go slices break with the rest of the language to be dangerous by design. It’s honestly baffling.

Conclusion

Last time I said that go interfaces were defensible once you understood exactly what they were.

I’m going to cut slices a lot less slack here. Slices have all of the concurrent memory pitfalls of C++’s string_view, but none of the protections of C++ move-only types. That makes it impossible to write code that is both efficient and safe. If you have a struct that owns a slice and an accessor method that returns a slice for reading, you have two options:

  • Return a slice of the underlying data. There’s no way to prevent the caller from modifying the data by assigning to the slice, or potentially overwriting other data by calling append.
  • Copy the data before returning it. But this requires a new memory allocation, and depending on the size could be slow and memory intensive. If you do it too often, you might trigger garbage collection more frequently, slowing your program down even more.

We could solve both these problems by returning some other type that’s indexable but can’t be passed to append... except that Go doesn’t allow user types to use the indexing syntax: only builtin types get to do that.

Ultimately, slices feel like the worst of all worlds. But anything better would require language features that simply don’t exist in go. It’s what we’re stuck with.

At least you can make a pretty fast version controlled database with it.

That’s all for now. Like always, if you feel I’m being unfair, join our Discord and let me know! You can also reach us on Twitter, or GitHub.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.