Hacking Go's Runtime with Generics

GOLANG
5 min read

Today's blog is about maphash, a new open-source package for hashing arbitrary (comparable) Golang types. We'll do a deep-dive on how its implemented as well as some interesting tid-bits on the Golang runtime.

Here at DoltHub, we love Golang! We've been pretty happy with Golang as our primary development language. It has a strong open-source community, an excellent tool chain, and the simplicity of the language itself facilitates a productive developer experience. We're using it to build Dolt, a MySQL-compatible version-controlled database. This blog is the latest in our technical Golang series, check them out if you're looking for more fun Golang content.

Hashing All the Things

maphash is a minimal package meant for one purpose: put the power of Golang's builtin hash into developers hands. Any comparable Golang type can be hashed using a maphash.Hasher. The entire public API can be summarized as follows:

type Hasher[K comparable] struct {
    ...
}

func NewHasher[K comparable]() Hasher[K] {
    ...
}

func (h Hasher[K]) Hash(key K) uint64 {
    ...
}

In Golang the comparable type constraint is the union of all types that implement the operators == and !=. This includes all scalar types, channels, interfaces, arrays, and structs of other comparable types. Essentially anything you can use as the key of map you can hash with maphash

So what's the point? The motivation for this package came while implementing a concurrent, stripe-locked cache for Dolt. The design looked something like this:

type Cache[K comparable, V any] struct {
    stripes [64]map[K]V
    locks   [64]sync.RWLock
}

Using generics, we can create a flexible container class backed by a builtin map. However, as we try to implement the accessor methods for this class, we get stuck writing a generic function to choose which stripe each key belongs to:

// getStripe returns the stripe that |key| belongs to.
func (c Cache[K, V]) getStripe(key K) int8 {
    ???
}

What we need here is a hash function for a generic, comparable type K.

Background

The concept of exposing a generic hash function is not new to Golang. This idea first appeared in the core Golang repo in a 2017 issue submitted by Byran Mills of Google's Golang team. It suggests adding a builtin function for hashing comparable types as a building block for custom containers and data structures. The proposal argues that Golang developers are forced to work around language when writing hash-based data structures like Tries or concurrent hash maps. Developers who want an efficient hash-based data structure have only the builtin map to choose from.

When this issue was originally created in 2017, Golang didn't have generics, so the proposed interfaced lacked a concrete type:

func Local(interface{}) uintptr

This API seems reasonable at first, but it leaves the compiler without the type information it needs to implement this function efficiently. The secret sauce of what makes Go's builtin map so fast is that the compiler generates custom hash functions for each map type declared in the source code. These generated functions know how to traverse a specific data type and use AES instructions to hash them with special hardware support. Unfortunately, all this magic is hidden inside the runtime, and it depends on the compiler knowing a static type at build-time.

In go 1.19, the standard library added a new maphash package that exposed utilities for hashing strings and []byte slices using the AES-based runtime implementations. These are certainly welcome additions to the language, but they still lack the flexibility needed to write generic container types like Tries. Earlier this year, a new issue proposed expanding this package to hash any comparable type. But unless this proposal gets accepted, we're left with only one option: hacking.

Hacking the Runtime

As a pretext, we know the compiler is capable of generating the exact hash functions we want. And further, we know that using a map[T]V will trigger the compiler to generate a function that will hash a comparable type T. Within the runtime, these custom hash functions are accessed through a maptype:

type maptype struct {
    typ    _type
    key    *_type
    elem   *_type
    bucket *_type // internal type representing a hash bucket
    // function for hashing keys (ptr to key, seed) -> hash
    hasher     func(unsafe.Pointer, uintptr) uintptr
    keysize    uint8  // size of key slot
    elemsize   uint8  // size of elem slot
    bucketsize uint16 // size of bucket
    flags      uint32
}

Each usage of a map takes a *maptype and an instance of a map (*hmap):

// v := map[k]
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ...
}

// v, ok := map[k]
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
    ...
}

// delete(map, k)
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    ...
}

So how can we get access to this? Using Golang's unsafe package, we can work around the type system and get access to the private fields within these runtime types. This is what it looks like in our new maphash package:

func getRuntimeHasher[K comparable]() (h hashfn, seed uintptr) {
    a := any(make(map[K]struct{}))
    i := (*mapiface)(unsafe.Pointer(&a))
    h, seed = i.typ.hasher, uintptr(i.val.hash0)
    return
}

type hashfn func(unsafe.Pointer, uintptr) uintptr

type mapiface struct {
    typ *maptype
    val *hmap
}

// mirrors go/src/runtime/type.go
type maptype struct {
    typ    _type
    key    *_type
    elem   *_type
    bucket *_type
    // function for hashing keys (ptr to key, seed) -> hash
    hasher     func(unsafe.Pointer, uintptr) uintptr
    keysize    uint8
    elemsize   uint8
    bucketsize uint16
    flags      uint32
}

In getRuntimeHasher[K comparable](), start by declaring a map[K]struct{}. This declaration makes the compiler create the hash function we want. Next we cast our map to type any, forcing the compiler to add additional metadata do describe the dynamic type of the variable a. For maps, this metadata is a pointer to a maptype object, which is just what we need to access the hash function! Using an unsafe.Pointer we can cast a to mapiface (a runtime type), get our hasher, and construct a maphash.Hasher:

type Hasher[K comparable] struct {
    hash hashfn
    seed uintptr
}

func NewHasher[K comparable]() Hasher[K] {
    h, s := getRuntimeHasher[K]()
    return Hasher[K]{hash: h, seed: s}
}

Using a maphash.Hasher we can create hash-based datastructures keyed by UUIDs, time.Time, or any other comparable type. And thanks to the compiler, Hasher is able to take advantage of optimized AES instructions. Hashing a string with maphash is just as fast as the standard library version!

goos: darwin
goarch: amd64
pkg: github.com/dolthub/maphash
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkCompareStringHasher
BenchmarkCompareStringHasher/string_hasher
BenchmarkCompareStringHasher/string_hasher-12         	237970230	         5.341 ns/op	       0 B/op	       0 allocs/op
BenchmarkCompareStringHasher/std_string_hasher
BenchmarkCompareStringHasher/std_string_hasher-12     	207972664	         5.759 ns/op	       0 B/op	       0 allocs/op
PASS

A Humble Request

If you're thinking that this code is an abuse of Golang, I agree with you. The maphash package, like map itself, is tightly coupled to the current runtime implementation. Golang's map implementation has been fairly stable over the last several major releases, but any changes would need to be ported to maphash. Pretty fragile! Currently, maphash uses build tags to restrict its supported versions to go 1.18 and go 1.19. We'll expand this support with future major version releases.

If you're thinking there's a better way to do this, I agree with you again! If I could make a request, I would ask you to comment and upvote the open issue to expand the standard library version of maphash to include the following:

// Comparable returns the hash of comparable value v with the given seed
// such that Comparable(s, v1) == Comparable(s, v2) if v1 == v2.
// If v contains a floating-point NaN, then the hash is non-deterministically random.
func Comparable[T comparable](seed Seed, v T) uint64

Wrapping Up

Thanks for reading! I hope you learned something about Golang internals and I hope you give maphash a try. I'd like to give a shout-out here to Dave Cheney and his excellent blogs about Golang and its runtime. In particular his blog on maps was incredibly helpful background for this work. I also hope you get a chance to checkout Dolt a Git-like SQL database built using Golang. If you're interested in learning more about Golang or Dolt, join us on our Discord!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.