Blog

7 min read

Recently, I published a blog post describing my experience vibe coding an agentic accounting application called DoltCash. In that post, I mentioned that one of the challenges with the application was unstable accuracy and correctness, caused by the inevitable context compaction cycles that shrink the agent’s working context.

To get around this problem, I said I’d be testing out Beads, created by Steve Yegge, to offload important context and TODO items into its persistent SQLite + Git datastore. I also promised to report back on whether it improved DoltCash.

Well, today I’m here to report that using Beads for DoltCash was a huge success, and it inspired me to start using it everywhere I vibe-code.

What’s interesting is that my integration of Beads into DoltCash was very basic and naive. Even so, because the agent (Claude Code) had enough awareness to check the persistent store for additional context and tracked TODOs (called “issues” in Beads), my DoltCash robot behaved much more correctly, and for much longer, across multiple context compaction cycles. I had not seen it do that before trying Beads.

Inspired by this large leap in agentic performance, I decided to use Beads again to try a larger, harder agentic task I’d been putting off, to see if it supercharged my coding agent once again.

In today’s post I’ll explain how I used Beads to successfully refactor 315 frontend files in a single 12-hour session.

The Problem

Like some of you, I’ve been vibe-coding an application in my free time with moderate success. And while the app is technically functional, the coding agent built it out of complicated, deeply nested spaghetti code spread across 1000+ line files.

For some time I’d put off cleaning this code up, because it’s so much easier to vibe out a new feature than it is (or had been) to clean up the code of an existing feature.

Prior to my experience with Beads, my workflow for successful vibe coding was to allow my coding agent to write whatever they wanted as long as it mostly worked, then go back through their code myself and clean it up manually. I did this because it’s basically impossible to keep an agent on task for a long period of time, especially across context compaction cycles. So prior attempts at agentic refactoring left me wanting.

And my human-in-the-loop vibe-refactor workflow worked really well…until I got lazy and allowed the pile of spaghetti code to become a mountain.

But Beads offered me a way out.

I decided to add Beads to my personal project and see if I could clean up my code without me actually cleaning up the code.

Building the Issues

To start, I ran bd init in my code repository and updated AGENTS.md with instructions to use Beads for TODO and task management. Then, I came up with a plan for how to define a large refactoring task for my coding agent.

Having worked with agents for a while now, I know that they can be as lazy as I am at times (which is super annoying), so I needed to be a bit strategic about assigning this work. My plan was to turn every directory into an “epic” and every file in those directories into a “bead”.

Beads supports hierarchical issues, where an “epic” represents a group of related “beads” (“beads” and “epics” are just canonical names for tracked issues). Both can depend on one another in parent-child relationships, much like the directories and files in a file tree.

By requiring an epic per directory and a bead per file, I hoped to “force” the coding agent to actually refactor each individual file, and to discourage it from doing a less thorough refactoring pass. It’s not uncommon for agents to take shortcuts on long-running or large tasks and just call it a day. I figured that making it close an individual bead for a specific file might encourage, or even trick, the agent into actually doing the refactoring work for that file. Either would be fine by me.

To do this, I introduced the agent to my overall goal and gave it clear instructions about how to structure the beads it should work from.

~/src/web/app contains the frontend code for my application. This code is functioning correctly but contains duplication, dead code, deeply nested rendering methods, inline styles, and is generally hard to read and reason about. It also contains many files that are over 500 lines long. I want to refactor this code so that it’s simpler, modular, extensible, and generally more clean, without changing its output. I want you to iterate over every directory in ~/src/web/app and create a new epic for the directory. Then, under each epic, for every file in the directory, create a new bead task with instructions for refactoring the file.

As requested, the agent created this large task graph for my refactoring work. It did so by recursively finding the files, then copying/pasting the core ideas in the prompt above in each bead, targeted at each file. Fine. Now it was time to vibe.
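To make the shape of that task graph concrete, here is a rough Python sketch of the epic-per-directory, bead-per-file plan. To be clear, this is not what the agent ran, and it does not call Beads at all. The `plan_refactor` function, the dictionary layout, and the instruction text are all hypothetical and exist only to illustrate the structure.

```python
import os
from pathlib import Path


def plan_refactor(root):
    """Build a hypothetical plan: one 'epic' per directory, one 'bead' per file.

    This only builds plain dictionaries. It is an illustration of the
    structure, not an integration with Beads.
    """
    root = Path(root)
    epics = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        if not filenames:
            continue  # a directory with no files gets no epic
        rel_dir = Path(dirpath).relative_to(root)
        beads = []
        for name in sorted(filenames):
            rel_file = (rel_dir / name).as_posix()
            beads.append(
                {
                    "file": rel_file,
                    # Made-up instruction text, standing in for the real prompt.
                    "instructions": f"Refactor {rel_file} without changing its output.",
                    "status": "open",
                }
            )
        epics.append({"directory": rel_dir.as_posix(), "beads": beads})
    return epics
```

Calling `plan_refactor` on a frontend directory returns a list of epics, each holding one open bead per file. The point of the one-bead-per-file granularity is that "done" becomes something you can count: a directory is finished only when every bead under its epic is closed.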

Light Management on the Beach

Once the issues were created, I told my coding agent, “Next, let’s work on the bead in the first epic.”

If you’re wondering why I opted to go item-by-item to accomplish this task instead of using something like Ralph Wiggum, it’s because, in my experience, coding agents require human-in-the-loop workflows in order to produce production-quality output. In their current iteration, they are able to do the right thing about 80% of the time, but that’s still 20% bullshit that needs a human. So I always review (or vibe review) every completed task. I’ll stop this workflow when my experience informs me otherwise.

And so my agent got underway. Once we got going, it made great progress, and I was much more hands-off than I’d ever been with a coding agent before. I was able to casually monitor the agent’s work while mostly just tapping the up arrow on the keyboard to re-run “Ok, great job! Let’s do the next item now.” The agent made credible progress on the epics, and when the time came for context compaction, it was not derailed. The agent could pick up where it had left off without a complete context refresh from me!

Of course, every so often I’d have to manually intervene with the 20% bullshit that would happen. But if you vibe code, you know this is par for the course. In this case, the “bullshit” I encountered was catching the agent talking itself out of doing its assigned work or taking a shortcut, fucking with my ESLint config to disable a specific lint rule instead of modifying the code to actually respect the rule, or, oftentimes, just getting it unstuck after it had been spinning and spinning for minutes with no tokens coming in. But again, all of this is typical of coding agents whether using Beads or not.

What was atypical about this new workflow is that I was able to “work” on this refactoring task for 12 hours straight while doing a bunch of leisurely activities on the side! This was something I hadn’t experienced before. Before using Beads, if I wanted to work for such a long period of time on a repetitive agentic coding task, I would have had to write a repeatable prompt I could copy and paste into context manually after every compaction event and do much more active driving of the session. With Beads, I don’t have to.
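Why does a persistent store help here? A toy Python sketch can illustrate the idea: if the list of open and closed items lives on disk rather than in the agent's context, then "what's next?" has the same answer before and after the context is thrown away. This is not how Beads is implemented. The schema, the file name, and every function name below are hypothetical, and closing the connection merely stands in for a compaction event.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a toy issue store. The schema is made up for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issues ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'open')"
    )
    conn.commit()
    return conn


def add_issue(conn, title):
    """Record a new open issue."""
    conn.execute("INSERT INTO issues (title) VALUES (?)", (title,))
    conn.commit()


def next_open(conn):
    """Return (id, title) of the oldest open issue, or None if none remain."""
    return conn.execute(
        "SELECT id, title FROM issues WHERE status = 'open' ORDER BY id LIMIT 1"
    ).fetchone()


def close_issue(conn, issue_id):
    """Mark one issue as closed."""
    conn.execute("UPDATE issues SET status = 'closed' WHERE id = ?", (issue_id,))
    conn.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "issues.db")

        # First "session": create the work items and finish one of them.
        conn = open_store(db_path)
        for title in ["Refactor Header.tsx", "Refactor Footer.tsx"]:
            add_issue(conn, title)
        issue_id, _ = next_open(conn)
        close_issue(conn, issue_id)
        conn.close()  # stand-in for losing all in-memory context

        # Second "session": nothing is remembered, but the store still knows.
        conn = open_store(db_path)
        print("Next item after restart:", next_open(conn))
        conn.close()
```

With this shape, the only prompt needed after a restart is some variant of "do the next item", because the progress lives in the store and not in the conversation.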

Wanting More

If you haven’t used Beads yet, you’ll likely be using something similar soon, as Anthropic added a feature called Tasks to Claude Code, built right into the application. I’m curious to try this out myself and compare which implementation/interface I prefer. One major difference between Tasks and Beads is that Beads is migrating to a Dolt backend, which we hope will supercharge its utility and have major functionality implications for Steve’s latest innovation, GasTown. But more on that in the future, as it develops.

My experience with Beads has me thinking much more about agentic memory and what features would be useful to add to a coding agent or its runtime environment. There are also many Beads features that I’ve yet to try but that seem to address other pitfalls I’ve hit while working with agents. One that’s particularly interesting is the built-in ability to “prime” the agent with the same context on startup and pre-context-compaction. This enables you to define some initial context once, then have it hydrated and rehydrated at the correct times to encourage consistent agentic awareness. Something like this is precisely what I need in DoltCash to give the agent its role/persona as my “vibe-accountant”, which I currently do by copying and pasting SYSTEM_PROMPT.md into context on session start, LIKE A CAVEMAN.

Anyway, are you into agents and finding the latest ways to make them more productive? Come by our Discord and let us know.