Embedding Go in C


Here at DoltHub, our centerpiece is Dolt, which fuses a MySQL-compatible database with Git-style versioning capabilities. People have found many creative uses for Dolt, but there's one area that Dolt has yet to set foot in: the library space. To be clear, it's very easy to embed Dolt within another Go application, as Dolt is built in Go. Dolt, however, cannot be used as a general library to be embedded in non-Go applications, at least not yet. Dolt offers a very competitive feature set compared to the most popular embeddable SQL library: SQLite. We are currently researching the best ways to make Dolt an embeddable library.

In the meantime, we've learned quite a few things about embedding Go applications in a foreign language, and we're creating this resource in case it helps others. We've found many references for embedding C/C++ within Go, but not nearly as many for the opposite direction. We'll be targeting a C++ application, but the general knowledge will apply to other languages that use a foreign function interface. I'll assume some familiarity with C/C++ and Go in this post.

GitHub Sample Project

We've released a small example project on GitHub that I'll be referencing throughout this post. It's public domain, so feel free to use it as a starter for your own projects.

https://github.com/dolthub/go-library-sample

The top-level project is the C++ application, while the go-library folder within is the Go library.

Compiling Go

Go can compile a package into a C-compatible library using the -buildmode flag. There are two relevant options here: c-archive and c-shared. The shared option creates a shared library that may be dynamically loaded at runtime, and the archive option creates an archive that may be statically linked into your target application, which is what we'll choose. The output will be the aforementioned archive file, along with a header file declaring all of the exported functions. Adding your Go library to your application is as easy as including the header and linking the archive!

Using CMake

The sample project goes a step further and handles the compilation of the Go library as well. Using CMake, we define a target named GO_STATIC_LIBRARY_TARGET, and add the archive file as a dependency.

add_custom_target(GO_STATIC_LIBRARY_TARGET
    DEPENDS "${PROJECT_BINARY_DIR}/go-library/${GO_STATIC_LIBRARY_NAME}")

Next, we compile our library to the project's build folder, and list the archive file as the output and the go-library folder as a dependency.

add_custom_command(OUTPUT "${PROJECT_BINARY_DIR}/go-library/${GO_STATIC_LIBRARY_NAME}"
    COMMAND ${GOLANG_BINARY} build -o "${PROJECT_BINARY_DIR}/go-library/${GO_STATIC_LIBRARY_NAME}" -buildmode=c-archive .
    WORKING_DIRECTORY "${PROJECT_SOURCE_DIR}/go-library"
    DEPENDS "${PROJECT_SOURCE_DIR}/go-library")

Finally, we define an imported library named GO_STATIC_LIBRARY, and point it to our archive file, while listing the target as a dependency.

add_library(GO_STATIC_LIBRARY STATIC IMPORTED)
set_target_properties(GO_STATIC_LIBRARY PROPERTIES IMPORTED_LOCATION ${PROJECT_BINARY_DIR}/go-library/${GO_STATIC_LIBRARY_NAME} LINKER_LANGUAGE CXX)
add_dependencies(GO_STATIC_LIBRARY GO_STATIC_LIBRARY_TARGET)

Without these explicit dependencies, we'd recompile the Go library on every build, which would also relink the library with the application every time.

Exporting Go

Exporting Functions

There are many rules and limitations when it comes to exporting functions from Go. First and foremost, such functions need to be preceded by the comment //export FUNCTION_NAME, and no further comments may follow the name. This marks the function for export, which includes it in the generated header file. Function parameters and return types may be any C type (we'll get to those later), along with the basic types: int, float64, bool (it's a uint8 in disguise), etc. In the sample project, you can see examples of this from ModifyBool to ModifyFloat64, with each function returning a mutation of the input.

//export ModifyInt64
func ModifyInt64(input int64) int64 {
	// This applies a bitwise NOT to the input
	return ^input
}

In addition, you may use the string type as a parameter and return type, although using it as a return value is a bit tricky due to memory ownership. The Go library still manages its data using garbage collection, and therefore must retain full ownership of that data. Attempting to pass a string to C that will no longer exist within Go's scope after returning will trigger a panic (assuming GODEBUG=cgocheck is set to at least 1). You work around this by returning memory that was allocated on the C side instead. cgo provides a helper for this, C.CString, which copies the Go string into memory obtained from C's malloc and returns a null-terminated char pointer. The sample project's ModifyString does exactly this.

//export ModifyString
func ModifyString(input string) *C.char {
	// This lowercases and uppercases each half of the input depending on the length of the input. The new string must
	// be freed in the calling C code.
	inputLen := len(input)
	var output string
	if inputLen % 2 == 0 {
		output = strings.ToLower(input[:inputLen/2]) + strings.ToUpper(input[inputLen/2:])
	} else {
		output = strings.ToUpper(input[:inputLen/2]) + strings.ToLower(input[inputLen/2:])
	}
	return C.CString(output)
}

This means that the C/C++ application MUST free the returned pointer, otherwise a memory leak will occur. Also, make sure that free is used rather than delete for C++ applications, as the deallocation function should match the appropriate allocation function, else you may encounter undefined behavior.

This memory ownership issue is true for all non-trivial types. This includes slices, which must be constructed around a block of memory allocated by C.malloc (or passed in from C as a parameter). As long as Go does not own the memory, then slices may be returned to C, and Go will convert them to the GoSlice structure defined in the generated header.

typedef struct { void *data; GoInt len; GoInt cap; } GoSlice;

This GoSlice uses a void* for its data field, therefore it's recommended to return a pointer and length combo to retain the benefits of static typing. The sample project defines the ToCSlice utility function to convert slices to a pointer and length combo (it's also useful to see how to work with C memory in Go).

// ToCSlice converts a slice of floats or integers into the pointer + length combo that C operates on.
func ToCSlice[T constraints.Integer | constraints.Float](input []T) (*T, int) {
	var element T
	elementSize := int(unsafe.Sizeof(element))
	// C.malloc takes a size_t, so C.size_t keeps this portable across platforms.
	allocatedMemory := C.malloc(C.size_t(len(input) * elementSize))
	for i := 0; i < len(input); i++ {
		allocatedMemoryLocation := (*T)(unsafe.Add(allocatedMemory, i * elementSize))
		*allocatedMemoryLocation = input[i]
	}
	return (*T)(allocatedMemory), len(input)
}

The utility function is used for the return values in the sample project's ModifyInt32Slice function, which also demonstrates how to manipulate a slice parameter even though it was created from C++.

//export ModifyInt32Slice
func ModifyInt32Slice(input []int32) (arrayPointer *int32, length int) {
	// This modifies the input while returning a new C-compatible array. The new array must be freed in the calling C code.
	output := make([]int32, len(input))
	for i := range input {
		input[i] = ^input[i]
		output[i] = int32(i + 100)
	}
	return ToCSlice(output)
}

Again, the return pointer must be freed by the C/C++ application, else a memory leak will occur.

As you may have noticed, the ToCSlice utility function does not handle []string, as those slices have an additional layer of complexity on top of normal slices. Strings are, themselves, specialized byte slices. A slice of strings is essentially a slice of slices (or an array of arrays), which means that either the calling C/C++ program needs to deallocate each string manually, or the strings should point to different sections of a large block of memory, with that block being deallocated. In any case, it is recommended to avoid returning string slices if possible, simply due to the additional complexity.
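If a string slice cannot be avoided, the single-block approach might look something like the sketch below. The packStrings helper is hypothetical and written in plain Go to show only the layout: every string is written into one contiguous buffer with a NUL terminator, and a second slice records where each string starts. In a real library, the assumption is that both would be copied into C-allocated memory before being returned, so the caller frees two blocks instead of one per string.

```go
package main

import "fmt"

// packStrings is a hypothetical helper that lays out every string in one
// contiguous buffer, each followed by a NUL terminator, and records the byte
// offset at which each string begins.
func packStrings(input []string) (buffer []byte, offsets []int) {
	total := 0
	for _, s := range input {
		total += len(s) + 1 // +1 for the NUL terminator
	}
	buffer = make([]byte, 0, total)
	offsets = make([]int, 0, len(input))
	for _, s := range input {
		offsets = append(offsets, len(buffer))
		buffer = append(buffer, s...)
		buffer = append(buffer, 0)
	}
	return buffer, offsets
}

func main() {
	buffer, offsets := packStrings([]string{"dolt", "go", "c"})
	fmt.Println(len(buffer), offsets) // prints: 10 [0 5 8]
}
```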

Slices are the only non-trivial built-in type that may be returned from Go. Maps and channels may not be used as return values at all, however they can still exist in C/C++ by reinterpreting them as opaque void pointers. This would result in the loss of static type safety though. They also cannot be used as parameters since they cannot exist in C.
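A common variation on the opaque pointer idea is to hand C an integer handle and keep the map itself inside Go. The sketch below uses a hand-rolled registry rather than raw pointers. NewCounterMap, Increment, and ReleaseCounterMap are hypothetical names, and they would still need //export comments to be callable from C. The standard library's runtime/cgo package provides a Handle type (Go 1.17 and newer) that serves the same purpose.

```go
package main

import (
	"fmt"
	"sync"
)

// The registry is a hypothetical sketch. It maps opaque handles to maps that
// never leave Go's memory.
var (
	registryMu sync.Mutex
	registry   = map[uintptr]map[string]int32{}
	nextHandle uintptr
)

// NewCounterMap stores a fresh map and returns an opaque handle for it.
func NewCounterMap() uintptr {
	registryMu.Lock()
	defer registryMu.Unlock()
	nextHandle++
	registry[nextHandle] = map[string]int32{}
	return nextHandle
}

// Increment bumps the counter for key and returns the new count, or -1 if the
// handle is unknown or has already been released.
func Increment(handle uintptr, key string) int32 {
	registryMu.Lock()
	defer registryMu.Unlock()
	counters, ok := registry[handle]
	if !ok {
		return -1
	}
	counters[key]++
	return counters[key]
}

// ReleaseCounterMap drops the map so the garbage collector can reclaim it.
func ReleaseCounterMap(handle uintptr) {
	registryMu.Lock()
	defer registryMu.Unlock()
	delete(registry, handle)
}

func main() {
	handle := NewCounterMap()
	fmt.Println(Increment(handle, "rows"), Increment(handle, "rows")) // prints: 1 2
	ReleaseCounterMap(handle)
	fmt.Println(Increment(handle, "rows")) // prints: -1
}
```

The handle carries no type information on the C side, so the loss of static type safety mentioned above still applies.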

Exporting Structs

You can't, at least at the time of writing. It appears to be planned, but has not yet been implemented. In the meantime, structs must be defined in C, and then modified by Go's code. This is straightforward when embedding C into Go, but our Go library does not have any visibility into the C/C++ application that it is being imported into. To work around this, you can declare a block comment containing C code immediately before the import "C" line at the top of the file (ensure there are no blank lines between the comment and the import line). This block comment will get added to the generated header, and the structs within can be referenced from Go using C.struct_STRUCTNAME. The struct_ prefix is important, and the same convention applies to union and enum. We use this block comment in the sample project to define a struct named DemoStruct.

/*
#include <stdint.h>
struct DemoStruct {
    uint8_t A;
    int32_t B;
};
*/
import "C"

Again, and it cannot be stressed enough, if a struct is created via C.malloc in Go, then it must be freed in the calling C/C++ code. This extends to any internal pointer fields within those structs, as those fields cannot reference any Go memory. It is okay to return the struct as seen in ModifyDemoStruct in the sample project, as it is returned by value, causing it to be allocated on the stack.

//export ModifyDemoStruct
func ModifyDemoStruct(input1 *C.struct_DemoStruct, input2 C.struct_DemoStruct) C.struct_DemoStruct {
	// This modifies the first input while returning a new DemoStruct based on the second input. The struct type is
	// declared in C, as there is not yet a way to pass standard Go structs.
	input1.A = ^input1.A
	input1.B = ^input1.B
	return C.struct_DemoStruct{
		A: (C.uint8_t)(^input2.A),
		B: (C.int32_t)(^input2.B),
	}
}

As an interesting detail, this is one of the reasons that map and chan cannot be returned from Go, as those are aliases for pointer types. A slice is a SliceHeader, which is generally passed around as a value.

type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

This is also why changing the length of one slice doesn't affect other slice references, while modifying a map will modify all references of that map. Strings use a special StringHeader for the same reason.

type StringHeader struct {
	Data uintptr
	Len  int
}
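The difference between the two kinds of references is easy to see in plain Go. In this small sketch, grow, setFirst, and addKey are illustrative names only. Appending through a copied slice header leaves the caller's length alone, while element writes and map writes are visible to every reference.

```go
package main

import "fmt"

// grow appends to its own copy of the slice header, so the caller's slice
// keeps its original length.
func grow(values []int32) []int32 {
	return append(values, 99)
}

// setFirst writes through the shared backing array, so the caller sees it.
func setFirst(values []int32) {
	values[0] = -1
}

// addKey modifies the map that every reference points to.
func addKey(counts map[string]int32) {
	counts["added"] = 1
}

func main() {
	values := []int32{1, 2, 3}
	grown := grow(values)
	setFirst(values)

	counts := map[string]int32{}
	addKey(counts)

	fmt.Println(len(values), len(grown), values[0], len(counts)) // prints: 3 4 -1 1
}
```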

Go Threads

One of the best features of Go is how easy it is to work with concurrency through goroutines, Go's lightweight threads. Although goroutines cannot be exposed to C/C++ directly, they can be interfaced with via exported functions. The functions StartChannel and ReadChannel in the sample project give a brief demonstration of starting a goroutine that sends values on a channel, and then reading those values from our C++ application.

// Remember to close channels once you're done with them. This is being ignored for the sake of demonstration.
var demoChannel = make(chan int32)

//export StartChannel
func StartChannel() {
	// This starts sending integers over the channel
	go func() {
		count := int32(1)
		for {
			demoChannel <- count * 10
			count++
		}
	}()
}

//export ReadChannel
func ReadChannel() int32 {
	// This returns the next integer from the channel
	return <- demoChannel
}
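The demonstration above never shuts its goroutine down. One possible way to add a clean shutdown is sketched below in plain Go. StartProducer, ReadValue, and StopProducer are hypothetical names, and they would still need //export comments to be callable from C. The sketch assumes StartProducer and StopProducer are each called at most once.

```go
package main

import "fmt"

var (
	values = make(chan int32)
	stop   = make(chan struct{})
	done   = make(chan struct{})
)

// StartProducer launches a goroutine that sends multiples of ten until it is
// told to stop.
func StartProducer() {
	go func() {
		defer close(done)   // runs second: signals that the goroutine has exited
		defer close(values) // runs first: later reads report ok == false
		for count := int32(1); ; count++ {
			select {
			case values <- count * 10:
			case <-stop:
				return
			}
		}
	}()
}

// ReadValue returns the next value, or ok == false once the producer stopped.
func ReadValue() (value int32, ok bool) {
	value, ok = <-values
	return value, ok
}

// StopProducer asks the goroutine to exit and waits until it has done so.
func StopProducer() {
	close(stop)
	<-done
}

func main() {
	StartProducer()
	for i := 0; i < 3; i++ {
		value, _ := ReadValue()
		fmt.Println(value) // prints 10, then 20, then 30
	}
	StopProducer()
	_, ok := ReadValue()
	fmt.Println(ok) // prints: false
}
```

Waiting on done inside StopProducer means that once it returns, the channel is closed and no further values can arrive.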

Importing Go

Now that our Go library is an archive, all we need to do is import the header that the Go compiler generated for us. The sample project's CMake file marks the library's build directory to check for the header file, and our src/main.cpp file includes it.

include_directories(include src "${PROJECT_BINARY_DIR}/go-library")

#include <go_library.h>

From there, it's as simple as using any other C/C++ library. The header contains aliases for all of the basic types, along with ones for slices and interfaces.

typedef signed char GoInt8;
typedef unsigned char GoUint8;
typedef short GoInt16;
typedef unsigned short GoUint16;
typedef int GoInt32;
typedef unsigned int GoUint32;
typedef long long GoInt64;
typedef unsigned long long GoUint64;
typedef GoInt64 GoInt;
typedef GoUint64 GoUint;
typedef size_t GoUintptr;
typedef float GoFloat32;
typedef double GoFloat64;
typedef struct { void *t; void *v; } GoInterface;
typedef struct { void *data; GoInt len; GoInt cap; } GoSlice;
typedef struct { const char *p; ptrdiff_t n; } GoString;

Curiously, the header also defines aliases for maps and channels even though they're unusable from C/C++.

typedef void *GoMap;
typedef void *GoChan;

Any function that returns multiple values will instead return a struct whose field types match the return types. The fields are in the same order, with the first return value having the field name r0, the second having r1, and so on. If the return values are named in Go, the generated struct still uses the rX naming scheme for its fields, with the original names added as comments on the fields.

/* Return type for ModifyInt32Slice */
struct ModifyInt32Slice_return {
	GoInt32* r0; /* arrayPointer */
	GoInt r1; /* length */
};

Even with the tuple to struct translation, you must still remember to free any data that was created within Go using C.malloc. This can be seen with ModifyInt32Slice in the sample project, where we have to free our r0 field as it holds the integer array's data.

std::vector<std::int32_t> inputInts;
...
auto outputInts = ModifyInt32Slice(GoSlice{
	.data = inputInts.data(),
	.len = GoInt(inputInts.size()),
	.cap = GoInt(inputInts.size()),
});
...
free(outputInts.r0);

Conclusion

We hope that this post is useful to those who were in the same situation that we were in when first researching how to make Dolt a library to embed in other languages. Again, the sample project is public domain for everyone to use. We hope you'll join us as we continue to iterate on Dolt! You can keep up to date with us through Twitter, or you can chat directly with us through Discord.
