Why Are Golang Heaps So Complicated

GOLANG
6 min read

Heaps are commonly used to partially sort a set. Every insertion/deletion from the set is followed by a "fixup" to restore either min-heap or max-heap integrity. For example, a max-heap can be represented as a binary tree where every parent is "greater" than its children. It usually takes a small number of swaps to "fixup" a tree to restore the max-heap property after an insertion or deletion. Even though all of the set elements are not globally ordered, the "biggest" value is always at the top of a max-heap. As a result, heaps have a variety of practical applications.

max-heap

container/heap

Heaps can be implemented as binary trees with nodes and pointers, but most programming languages provide default heap implementations for lists.

I personally found the standard library heap confusing when I started using Golang.

This is what I was used to coming from Python:

h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')

For comparison, here is the equivalent code in Go adapted from the container/heap docs: (run here):

import (
	"container/heap"
	"fmt"
)


type Tuple struct {
	i int
	s string
}

// An TupleHeap is a min-heap of ints.
type TupleHeap []Tuple

func (h TupleHeap) Len() int { return len(h) }
func (h TupleHeap) Less(i, j int) bool {
	if h[i].i != h[j].i {
		return h[i].i < h[j].i
	}
	return h[i].s < h[j].s
}
func (h TupleHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *TupleHeap) Push(x any) {
	// Push and Pop use pointer receivers because they modify the slice's length,
	// not just its contents.
	*h = append(*h, x.(Tuple))
}

func (h *TupleHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[0 : n-1]
	return x
}

// This example inserts several ints into an TupleHeap, checks
// the minimum, and removes them in order of priority.
func main() {
	h := &TupleHeap{}
	heap.Init(h)
	heap.Push(h, Tuple{i: 5, s: "write code"})
	heap.Push(h, Tuple{i: 7, s: "release product"})
	heap.Push(h, Tuple{i: 1, s: "write spec"})
	heap.Push(h, Tuple{i: 3, s: "create tests"})
	fmt.Printf("%d ", heap.Pop(h))
// main.Tuple{i:1, s:"write spec"}
// Program exited.
}

I used heaps for several engine optimizations recently, and 3 years later I still find Go heaps confusing. So I took sometime to research why.

Default []int Heaps

One thing that stands out about collections/heap is that it is broadly generic but requires a lot of boilerplate. heap.Interface supports binary heaps, array heaps, disk-backed heaps, with any user supported type.

Much of Golang's standard library aims to be maximally simple, so this design feature is unsurprising. sort, for example, is a similar standard library package whose interfaces are similarly generic. One key difference is that sort.Ints and sort.Strings provide defaults that remove boilerplate for the most common cases.

Maximally flexible but good defaults sounds ideal. Why doesn't heap support simple defaults?

Slice Pointers

We have documented in the past how slices have sharp edges. Part of why slices are confusing is that they are implicit pointers to underlying storage arrays. One of Nick's main criticisms in that blog is how array mutations can have unpredictable effects on memory. A similar problem plagues the space of heap implementations.

To illustrate, here are two different append functions. One uses a default slice pointer, and one uses a pointer to a slice pointer (run here).

func main() {
	arr := []int{1, 2}
	append1(arr, 3)
	fmt.Printf("#%v\n", arr) // prints: [1 2]
	append2(&arr, 3)
	fmt.Printf("#%v\n", arr) // prints: [1 2 3]
}

func append1(arr []int, v int) {
	arr = append(arr, v)
}

func append2(arr *[]int, v int) {
	*arr = append(*arr, v)
}

Without going too deep into the details, slice pointers act like any other value type when passed to a function 1. Mutating the underlying array creates a new slice pointer. If that slice pointer was passed by value, all of the changes are restricted to the inner closure (by contrast, in-place modifying an array through a slice pointer maintains the integrity of the original reference). If the same slice pointer is passed by reference, the outer slice address is updated to the new array.

This is important for heap implementations because it limits the design space. We either need:

  1. An indirection layer that maintains the newest pointer when we modify memory

  2. Return new references when we modify memory, or

  3. Operate explicitly on pointers to slices so the original references remain valid

Indirect Slices Through Object Pointer

The first option, chosen by the Go standard library, outlines a minimal possible interface that abstracts unnecessary sorting details but forces a user to acknowledge and maintain the slice indirection:

type IntHeap []int

type Interface interface {
	sort.Interface
	Push(x any) // add x as element Len()
	Pop() any   // remove and return element Len() - 1.
}

Instead of passing a pointer of a pointer, we add a concrete type to make the additional reference explicit. This makes it easy for heap.Interface to highlight the slice's edges while supporting implementation flexibility.

Immutable Pointers

The second option, returning and tracking updated slice pointers for each heap operation, is one way to implement a heap that handles []int defaults:

var h []int
h = heap.Push(h, x)
h, y := heap.Pop(h)

Immutable updates force us to continuously track the newest heap ref. We also lose the ability to do the sort of context management, locking, and concurrency control that we usually use in production. Given that Go often designs around in-place updating central objects, I am not surprised the standard library avoided this.

Slice Pointers

The third pattern passes slice pointers to heap functions, similar to how the standard library encoding/json passes string pointers for JSON deserialization. I forked the standard library and implemented this last design to convince myself that it works 2:

	h := []int{5, 7, 1, 3}
	Init(&h)
	Push(&h, 3)
	min := Pop(&h)

This package supports both heap.Interface and built-in Comparable slices ([]int and []string for starters). The downside is that unsupported types panic (alternatively, heap functions could deviate from the standard library interface and return an error).

Arguably the biggest downside of the last approach is combining the dynamic and static interfaces into one package. If something goes wrong in the wrapper I would be even more confused than before doing all of this!

Generics

After I originally posted this article, one of the r/golang mods jerf pointed out a nice package that makes it easy to heap sort built-in types:

    h := binheap.EmptyMaxHeap[int]()
    h.Push(5)
    h.Push(7)
    h.Push(1)
    h.Push(3)
    min := h.Pop() // 1

Another user named twek found an old proposal from 2021 with a similar interface design. He was nice enough to create an example implementation sandbox for others to try if you are interested in running it yourself!

This diverges a bit from the current standard library by combining the helper methods and the heap.Interface into one. A heap implementation can standardize Push(), Pop(), and Swap() for all slices. And the generic object statically captures the comparable type's Less(). The result is zero effort heaps for all built in types!

The same strategy also makes it easy to heap sort arbitrary slices with an initialization callback similar to sort.Slice (run here:

import (
	"fmt"

	"github.com/lispad/go-generics-tools/binheap"
)

type Struct struct {
	Idx         int
	Description string
}

func (s Struct) LessThan(r Struct) bool {
	if s.Idx < r.Idx {
		return true
	}
	if s.Idx > r.Idx {
		return false
	}
	return s.Description < r.Description
}

func main() {
	h := binheap.EmptyHeap[Struct](func(a, b Struct) bool { return a.LessThan(b) })
	h.Push(Struct{5, "write code"})
	h.Push(Struct{7, "release product"})
	h.Push(Struct{1, "write spec"})
	h.Push(Struct{3, "create tests"})
	fmt.Println(h.Pop())
}

Summary

At the end of the day, the way slices work makes Golang's heaps a bit more confusing than other languages. Peeling back the curtain a bit helped me understand the design space given Golang's language limitations. The active community on r/golang helped provide me with more context and better implementations than I could find on my own. The generics solution3 uses a slightly different design than the standard library, but is a simple abstraction and removes most of the boilerplate compared to container/heap today. I linked the open proposal below if you are interested in following the Go core team's thoughts[^4].

I am sure I missed some things in the process of doing this writeup, whether previous design discussions or other blogs regarding this topic. If you have comments our questions feel free to reach out to us on Twitter, Discord, and GitHub!

here [^4] Proposal to add generics implementation to go standard library here


  1. More on pointer vs value receivers here.
  2. heap package with builtin pointer support here.
  3. Alternative heap implementation that uses generics

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.