The world is starting to realize agents need branches, specifically database branches. Dolt is the only database with branches. This makes Dolt the database for agents.

You heard about database branches for agentic workflows on this blog first. Then Cursor, makers of the world’s most popular agentic integrated development environment (IDE), said it on the Lex Fridman podcast. Now, a new paper published on arXiv by the UC Berkeley Computer Science (CS) department, entitled “Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First”, joins the “databases need branches for agents” party.

You won’t be shocked to hear we strongly agree with the claims in this new paper. However, we’re sad that neither Cursor nor the Berkeley CS department had heard of Dolt yet. We’re bad at marketing. We’re trying to fix it by hiring a head of content marketing. In the meantime, we’ll shout at the internet until it hears us.

This article dives deep into what “Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First” says about database branches and acts as DoltHub’s formal response.

The Paper

“Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First” was published on arXiv on August 31, 2025. The paper has fifteen co-authors, all in the Computer Science department at UC Berkeley. It describes the authors’ vision of how data systems will evolve to support agentic workloads.

Relevant to our purposes here at DoltHub is the final section, 6.2, entitled “Performing Branched Updates”.

Let’s walk through that section in detail.

Introduction

When transforming or updating data, agents typically explore multiple “what-if” hypotheses, i.e., branches. For example, at Neon, we observed that agents created 20× more branches, and performed 50× more rollbacks, relative to humans. Traditional transactional guarantees instead operate within a linear thread of execution. Here, with agentic speculation, we instead want multi-world isolation, where each branch must be logically isolated, but may physically overlap.

Neon is a Postgres-compatible database that uses a copy-on-write filesystem to expose an interface they call branches. These branches cannot be diffed or merged. Thus, Neon “branches” are technically “forks” in version control parlance. That said, Neon is the most popular database that has branch-like functionality, resulting in a recent large acquisition by Databricks.

It’s unclear where these Neon agentic write statistics quoted in the paper come from. The citation is just to the Neon website. Regardless, these agentic statistics seem directionally correct. From personal experience with coding agents, branching and rollback are essential to coding agent effectiveness. It stands to reason that the same would be true of agentic write workflows to databases.

The comparison between traditional linear transactional isolation and “multi-world isolation” resonates with me. Traditional multi-version concurrency control (MVCC) transaction management has only the rollback mechanism to deal with conflicting concurrent writes. The longer a transaction is in flight, the greater the probability of conflicting writes. Agentic workloads are designed to be in flight for many minutes as the agent experiments with possible approaches. This almost certainly would cause conflicting writes, especially if multiple agents were tasked with the same prompt. Moreover, with rollback, the whole transaction is rolled back. An agent cannot roll back just the prior SQL statement, for instance. Agents hallucinate and should therefore use test harnesses to check their work, requiring frequent rollback. These two factors make MVCC intractable for agentic concurrency control.

Branches, on the other hand, are designed to be long lived, “logically isolated but may physically overlap” in the authors’ words. Conflict detection and resolution are supported workflows. With Git-like branches, individual SQL transactions can be rolled back as well as have the diff they generated inspected by an agent. These two features make Git-style branches ideal for agentic workflows.
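To make “diff” and “conflict detection” concrete, here is a minimal Python sketch of a row-level diff and a three-way merge between two agent branches that share a common base. This is a toy model under stated assumptions, not Dolt’s implementation: the `Table` type, the `diff` and `three_way_merge` functions, and the sample rows are all invented for illustration.

```python
from typing import Dict, Optional, Set, Tuple

Table = Dict[int, str]  # primary key -> row value (a stand-in for a real row)


def diff(old: Table, new: Table) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (old_value, new_value)} for every row that differs."""
    changed = {}
    for key in old.keys() | new.keys():
        before, after = old.get(key), new.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


def three_way_merge(base: Table, ours: Table, theirs: Table) -> Tuple[Table, Set[int]]:
    """Merge two branches that share `base`; report keys both sides changed differently."""
    our_changes, their_changes = diff(base, ours), diff(base, theirs)
    merged, conflicts = dict(base), set()
    for key in our_changes.keys() | their_changes.keys():
        in_ours, in_theirs = key in our_changes, key in their_changes
        if in_ours and in_theirs and ours.get(key) != theirs.get(key):
            conflicts.add(key)  # both branches edited the row differently
            continue            # leave the base value in place for review
        winner = ours if in_ours else theirs
        if key in winner:
            merged[key] = winner[key]
        else:
            merged.pop(key, None)  # the row was deleted on that branch
    return merged, conflicts


base = {1: "alice", 2: "bob", 3: "carol"}
agent_a = {1: "alice", 2: "bobby", 3: "carol", 4: "dave"}  # edits row 2, adds row 4
agent_b = {1: "alice", 2: "robert", 3: "carol"}            # edits row 2 differently
agent_c = {1: "alice", 2: "bob"}                           # deletes row 3

merged_ab, conflicts_ab = three_way_merge(base, agent_a, agent_b)
merged_ac, conflicts_ac = three_way_merge(base, agent_a, agent_c)
```

Merging `agent_a` with `agent_b` flags row 2 as a conflict because both branches changed it in different ways. Merging `agent_a` with `agent_c` is clean because the two agents touched disjoint rows. A long-running MVCC transaction would give the agent neither the diff nor the conflict set to reason about.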

Branch Isolation

Branch Isolation. Existing models of branching consistency, developed in the weak consistency era, e.g., in Bayou, Dynamo, or Tardis, can offer inspiration. However, agentic speculation goes further: multiple agents may create forks that must eventually reconcile—not just with the mainline, but with each other. This requires new models of multi-agent, multi-version isolation. Most branches will be similar—e.g., same schema, 90% identical data—but isolation requires that their effects remain logically separate.

I especially like the observation that in agentic workflows, branches will need to reconcile with each other, not just the main copy. This graph-like model of versions, where branches merge with one another to form a “commit graph”, will be familiar to Git aficionados. Dolt supports the same Git-like commit graph on database tables instead of files.

Indeed, most branches will be similar. Dolt uses a content-addressed storage engine based on a novel data structure called a Prolly Tree. This data structure allows for similar data to be stored only once, massively decreasing the storage required for branches.
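Here is a rough Python sketch of why content addressing makes near-identical branches cheap. It is not a Prolly Tree: it is a toy store that splits a table into fixed-size chunks and keys each chunk by the SHA-256 hash of its bytes. The `ChunkStore` class, the `write_table` method, and the sample rows are assumptions made up for this illustration.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Toy content-addressed store: each chunk is keyed by the hash of its bytes."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.chunks[address] = data  # identical content lands on the same key
        return address

    def write_table(self, rows: List[str], rows_per_chunk: int = 10) -> List[str]:
        """Store rows in fixed-size chunks and return the list of chunk addresses."""
        addresses = []
        for start in range(0, len(rows), rows_per_chunk):
            chunk = "\n".join(rows[start:start + rows_per_chunk]).encode()
            addresses.append(self.put(chunk))
        return addresses


store = ChunkStore()
main_rows = [f"{i},customer-{i}" for i in range(100)]
branch_rows = list(main_rows)
branch_rows[42] = "42,customer-42-renamed"  # the agent edits a single row

main_root = store.write_table(main_rows)
branch_root = store.write_table(branch_rows)

shared = set(main_root) & set(branch_root)
print(len(main_root), len(branch_root), len(store.chunks), len(shared))
# prints: 10 10 11 9
```

Two full copies of the table would need 20 chunks. The content-addressed store holds 11, because the branch shares 9 of its 10 chunks with main. Fixed-size chunking falls apart when a row is inserted in the middle of a table, since every later chunk shifts, which is one reason a real storage engine needs a smarter structure than this sketch.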

Efficient forking and rollbacks

Efficient forking and rollbacks. Naively duplicating entire databases per branch is prohibitively expensive and inefficient, making support for efficient forking crucial. Industrial systems like Neon [10], Aurora [15] and academic systems like Tardis [3] adopt copy-on-write approaches to lazily clone state. However, these are still far from what is needed for massive speculation. We need new concurrency mechanisms that exploit similarity across branches and preserve logical isolation (no cross-contamination), to enable massive parallel forking. This is analogous to MVCC on steroids: forking possibly thousands of near-identical snapshots and rolling back all but one. Unlike traditional data systems, where rollbacks are rare, we require ultra-fast rollbacks (i.e., fast aborts for failed branches).

I wholeheartedly agree that any branching mechanism that requires duplication will not scale to agentic workflows. Dolt is the only database with full Git-style version control. In order to achieve this, a new content-addressed storage engine is required. One cannot add version control to an existing storage engine. The database must be designed and built with version control as a first-class consideration. This is why the databases mentioned fall short and do not implement full version control.

Again, Dolt’s branching mechanism is “MVCC on steroids”. Moreover, each branch head is concurrency controlled via MVCC so one can leverage the concurrency mechanism that is right for the job at hand.

The authors keenly observe that “unlike traditional data systems, where rollbacks are rare, we require ultra-fast rollbacks”. Dolt rollbacks, branch deletions, and branch creations are instantaneous. Each Dolt commit is a pointer to a content-addressed root in storage. A rollback simply moves a pointer to another root. A branch deletion removes a pointer. A branch creation makes a new pointer to a root. These are all simple, fast operations.
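The pointer model described above can be sketched in a few lines of Python. This is an illustrative toy, not Dolt’s code: the `Repo` and `Commit` classes, the commit ids, and the root names like `root-A` are hypothetical stand-ins for real content addresses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    root: str               # address of the content-addressed root for this version
    parent: Optional[str]   # id of the previous commit, None for the first one


class Repo:
    """Toy model: a branch is a name pointing at a commit, a commit points at a root."""

    def __init__(self, initial_root: str) -> None:
        self.commits: Dict[str, Commit] = {"c0": Commit(initial_root, None)}
        self.branches: Dict[str, str] = {"main": "c0"}

    def create_branch(self, name: str, source: str = "main") -> None:
        self.branches[name] = self.branches[source]  # one new pointer, no data copied

    def commit(self, branch: str, new_root: str) -> str:
        commit_id = f"c{len(self.commits)}"
        self.commits[commit_id] = Commit(new_root, self.branches[branch])
        self.branches[branch] = commit_id            # advance the pointer
        return commit_id

    def rollback(self, branch: str) -> None:
        parent = self.commits[self.branches[branch]].parent
        if parent is None:
            raise ValueError("already at the first commit")
        self.branches[branch] = parent               # move the pointer back

    def delete_branch(self, name: str) -> None:
        del self.branches[name]                      # drop the pointer

    def root_of(self, branch: str) -> str:
        return self.commits[self.branches[branch]].root


repo = Repo("root-A")
repo.create_branch("agent-1")
repo.commit("agent-1", "root-B")
repo.commit("agent-1", "root-C")  # suppose this attempt fails the agent's tests
repo.rollback("agent-1")          # back to root-B without touching any rows
after_rollback = repo.root_of("agent-1")
repo.delete_branch("agent-1")     # abandon the experiment; main never moved
```

Every operation here is a dictionary read or write, so the cost does not depend on how much data the branch holds. That is the property that makes “forking possibly thousands of near-identical snapshots and rolling back all but one” plausible.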

Conclusion

It sounds like the Berkeley CS department really wants Dolt for agentic workflows. How would this paper be different had they known Dolt existed? We have reached out to the authors via email and have received a response. We will keep you posted. Do you know anyone else that wants Dolt but doesn’t know it exists yet? Please tell them! Or come by our Discord and tell us so we can make our best effort to contact them.