Dolt Supports Cached Prepared Statements

April 20, 2022

4 min read

Last year we announced prepared statement support, a simple form of query parameterization. True prepared statements save plans to be executed later, often multiple times. But the original work only progressed as far as satisfying the handler API. The ComStmtPrepare command discarded plans that had to be recompiled for each ComStmtExecute call.

Today we are excited to reintroduce prepared statements, now with server side caching! This addition is available in the latest version of Dolt, the first application database that can diff, merge, and revert with Git versioning semantics.

Prepared statements let a SQL client peek into the server's machinery and choose how and when to compile a query plan. A client that frontloads this work benefits from query caching and direct query parameterization. Locking in a query plan before considering user input can also protect against SQL injection, one of the many reasons object relational mappers (ORM's) make heavy use of prepared statements. All of the code referenced in this blog is freely available on GitHub. Join us as we walkthrough the nuances of this advanced feature!

How Do Prepared Statements Work

Prepared statements are an advanced performance and maintainability optimization for application databases.

Starting simple, consider a query template invoked several times for an import workload:

	db.MustExec("INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4])")
	db.MustExec("INSERT INTO objects (label, bbox) VALUES ('rabbit', [1,2,3,4])")
	db.MustExec("INSERT INTO objects (label, bbox) VALUES ('dog', [1,2,3,4])")

> INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('rabbit', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('dog', [1,2,3,4]);

For each query above, we compile and then execute the same template three times with three sets of parameters. One way of improving this setup might format the parameters into a string:

	var batch []struct{
		label string
		bbox string
    } {...}
	b := strings.Builder{}
	b.WriteString("INSERT INTO objects (label, bbox) VALUES ")
	sep := ""
	for _, r := range batch {
		b.WriteString(fmt.Sprintf("%s(%s, %s)", sep, r.label, r.bbox))
		sep = ", "
	}
	db.MustExec(b.String())

> INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4]), ('rabbit', [1,2,3,4]), ('dog', [1,2,3,4]);

This is more efficient and flexible because we can parameterize a list of values. But it is also unwieldy. You have to stare at the code for awhile to understand what is happening. String substitution is also subject to SQL-injections, and the query still has to compile every time we want to insert data.

Prepared statements compile the query once, followed by executions with user specific parameters called "bindvars":

	stmt, err := db.Preparex(`INSERT INTO objects (label, bbox) VALUES (?, ?)`)
	for _, r := range batch {
		stmt.MustExec(r.label, r.bbox)
	}

> INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('rabbit', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('dog', [1,2,3,4]);

For the visually inclined, the diagram below shows how regular statements pass through the entire analyzer for each query, while prepared statements cache a partial plan.

Prepared statements offer an interface tradeoff. We push control to the SQL Server in exchange for a new backend contract for a prepared query. Whether this contract is a good or bad thing comes down to personal preference. Prepared statements sometimes require more code and more asynchronous thinking. Often a separation of concerns and repeat parameterization simplifies the codebase and query lifecycle.

Why Prepareds

ORMs as a domain favor prepared statements. The ORM that I used for the above examples, sqlx, translates between Go native structs and database rows for simple CRUD queries. But with prepareds, ORMs provide value beyond simply hiding cursor iteration and formatting result sets.

SQL Injection

Prepared statements prevent SQL injection attacks. A SQL injection happens when a porous backend application parameterizes arbitrary text. For example, a web server might string-format the query SELECT * from objects where id=%s based on a browser provided parameter. A malicious user navigating to www.myapp.com/user?uid='1;drop table users' might delete all of your data! Prepared statements enforce a query plan that can only substitute literal bindvars at execution time, preventing arbitrary execution logic.

Performance

The first version of Dolt prepared statements caches query plans, but yields mixed performance wins. The benchmarks below compare insert speed with and without prepared statements:

var Result interface{}

func BenchmarkPrepStmt(b *testing.B) {
	db := newdb()
	stmt, err := db.Preparex(`INSERT INTO objects (label, bbox) VALUES (?, ?)`)
	if err != nil {
		b.Fatal(err)
	}
	var res interface{}
	for n := 0; n < b.N; n++ {
		res = stmt.MustExec("label", "box")
	}
	Result = res
}

func BenchmarkExec(b *testing.B) {
	db := newdb()
	var res interface{}
	for n := 0; n < b.N; n++ {
		res = db.MustExec(`INSERT INTO objects (label, bbox) VALUES ('label', 'bbox')`)
	}
	Result = res
}

The three result columns below correspond to (1) test name, (2) iteration count, and (3) average execution time in nanoseconds per operation. The third value is the most useful, indicating the time it took to roundtrip a client MustExec to the server and back again as spooled results:

~/g/s/g/m/prep-stmt-tutorial > go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/dolthub/prep-stmt-tutorial
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkPrepStmt-12    	     378	   3302280 ns/op
BenchmarkExec-12        	     374	   3323406 ns/op
PASS

The execution time is equivalent, which is unfortunate given how simple and common the query is.

We can up the ante with a beefier query. Joins and subqueries take longer to compile, and caching those should should yield more impressive results:

    query := `
        insert into objects (label, bbox)
        select a.label, a.bbox
          from (select ?, ?) as a(label, bbox)
        join other
          on a.label = other.x;`
    stmt, err := db.Preparex(query)

Sure enough, the newer query is about 35% faster when prepared:

~/g/s/g/m/prep-stmt-tutorial > go test -bench=.
BenchmarkPrepStmt-12    	    1000	   1241423 ns/op
BenchmarkExec-12        	     650	   1683874 ns/op

But we can do better! Ideally, the analyzer tees up an optimized plan that stops short of bindvar substitution. The plan is cached, and every new set of parameters launches a short finalization phase. The work required finalize should be small and proportional to the number of bindvar replacements. Small prepared queries have the potential for similar performance benefits.

Summary

We discussed prepared statement caching, a new Dolt feature that enables query parameterization and prevents SQL injection. Your ORM probably uses prepared statements behind the scenes to clean and cache simple INSERT and SELECT queries.

This feature is a steady improvement since implementing the prepare API last year. We also teased the potential for even juicier performance wins in the future.

If you have any questions about Dolt, databases, or Golang performance reach out to us on Twitter, Discord, and GitHub!

Blog

How Do Prepared Statements Work

Why Prepareds

SQL Injection

Performance

Summary

Get started with Dolt