Dueling Agents: Claude and Codex

April 22, 2026


Last July, at the dawn of the vibe code era, I compared Claude, Codex, Gemini, and Cursor, crowning Claude the champion.

A lot has changed with vibe coding tools since July 2025. As frequent readers know, I’ve joined the vibe coder side in the brewing vibe code vs trad code holy war. I threw down the gauntlet after I spent a week in Gas Town building DoltLite, a version-controlled SQLite. If you think DoltLite or vibe coding is cool, throw it a star on GitHub.

Since the initial DoltLite release a month ago, various agents and a couple of customers have been hammering out the DoltLite kinks almost non-stop. DoltLite is up to 1352 commits and 30 releases. I even merged upstream SQLite. Through this process, I learned a lot about vibe coding. I moved away from Gas Town and settled on a different agent setup: dueling Claude and Codex. This article explains why.

Leaving Gas Town

Gas Town is a coding agent orchestrator. It burst on the scene on New Year’s Day 2026. Gas Town gained instant popularity, but we here at DoltHub like to believe its adoption of Dolt as its primary database took it to the next level. Suffice it to say, DoltHub has a strong affinity for Gas Town. We’ve written three blogs about using it. We love Gas Town and are part of the Gas Town community.

Gas Town’s orchestration model is primarily hub and spoke. The hub is called the Mayor. In practice, the Mayor is a coding agent session, Claude by default, that manages other coding agent sessions. You, as the user, primarily talk to the Mayor. The spokes are called Polecats. Polecats are coding agent sessions that do specific tasks. Gas Town has a whole custom lingo loosely inspired by Mad Max. The beauty of Gas Town is you have a conversation with the Mayor and the Mayor creates tasks and farms them off to be worked by as many Polecats as the Mayor can muster. If there’s a problem with a Polecat, you ask the Mayor to fix it.
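To make the hub and spoke shape concrete, here is a toy sketch in Python. It is not how Gas Town is implemented. The `Mayor`, `polecat`, and `Task` names borrow the lingo above, but every class, function, and parameter here is hypothetical, and the "agents" are plain functions rather than real coding agent sessions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    """One independent unit of work the hub hands to a spoke."""
    name: str


def polecat(task: Task) -> str:
    """Stand-in for a worker agent session (hypothetical).

    A real worker would edit code and open a PR; this one just reports back.
    """
    return f"done: {task.name}"


class Mayor:
    """Hub: turns a request into tasks and farms them out to workers."""

    def __init__(self, max_polecats: int = 4) -> None:
        self.max_polecats = max_polecats

    def plan(self, request: str) -> list[Task]:
        # Hypothetical planning step: one task per comma-separated item.
        return [Task(part.strip()) for part in request.split(",") if part.strip()]

    def delegate(self, request: str) -> list[str]:
        tasks = self.plan(request)
        # Independent tasks run in parallel; map() returns results in task order.
        with ThreadPoolExecutor(max_workers=self.max_polecats) as pool:
            return list(pool.map(polecat, tasks))


if __name__ == "__main__":
    mayor = Mayor()
    for line in mayor.delegate("storage layer, commit graph, acceptance tests"):
        print(line)
```

The sketch only works because the three tasks do not depend on each other. Once tasks need back and forth on design, the parallel fan-out stops buying much.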

For DoltLite, Gas Town was incredible at getting started quickly. Within 4 hours, I was able to have a working DoltLite prototype with Prolly Tree storage that passed all ~100,000 SQLite acceptance tests. If you have a lot of independent, achievable tasks, Gas Town’s model really works. The Mayor delegates to a handful of Polecats. Merges or PRs land on GitHub. Gas Town will get you from zero to working faster than an individual agent.

However, even towards the end of my first week in Gas Town, the parallel execution became a burden. As I got closer to a working DoltLite, there was more back and forth with the agent discussing design, testing, and solutions. My Gas Town experience degraded to me working directly with my Mayor to fix gnarly bugs or performance problems. Delegating work to Polecats often ended with wasted work or worse, confusion. Gas Town was still getting work done, but I wasn’t using many of the features.

Moreover, Gas Town spawns background agents other than Polecats to keep the town running smoothly. These agent sessions aren’t free. So, I had the sneaking suspicion I was spending more money on Gas Town than I needed to if I wasn’t going to use the parallel execution features.

I used Gas Town for another few weeks improving DoltLite, but a couple of weeks ago, I shut down my town and went pure Claude. I wanted to see the difference. Was my session with the Mayor similar to a pure Claude Code session? The agentic coding space is evolving so quickly that I was curious to compare the latest tools.

Pure Claude

After a bit of setup, I had Claude (with --dangerously-skip-permissions on) start working on DoltLite issues. Immediately, I noticed Claude was far more verbose than my Mayor. Claude just likes to output text about what it is doing or has done. It’s not uncommon for a summary of what Claude has done to overrun my full 80-character x 92-line terminal. I’m not sure what Gas Town does to tune the Mayor, but it’s the right level of verbosity for me.

Note, Claude now has plan mode and sub-agents so you can still invoke the raw Claude versions of Gas Town features if you need them. In Gas Town, these modes are the default. When I need Claude to execute a complicated task, I use phrases like “make a plan”, “think deeply”, and “take your time”. These phrases improve Claude’s reasoning even if you have /effort max on. If I want to use sub-agents, I say “do this in the background” or “do this in parallel”. Claude will not go as wide on sub-agents as Gas Town and is worse at understanding task dependencies. So the end result is you end up driving Claude more than Gas Town.

Driving is the main benefit of using Claude over Gas Town. Claude wants to have a conversation about what we’re building. At this stage of DoltLite where I’m mostly finding and fixing bugs, this is the mode I need. I’m modifying very fine details of DoltLite’s implementation, often making non-obvious design decisions or trade-offs. I can’t trust an agent working in the background to make these decisions yet. As noted earlier, in the later stages of DoltLite, if I let a background agent cook in Gas Town I was disappointed with the result. Using Claude forced the “drive me” mode of operation that I needed.

I had worked in Claude for a few days before I read Aaron’s scathing DoltLite code review. I had the sneaking suspicion that Claude was playing fast and loose, but Aaron confirmed it. I needed to focus on code quality: refactoring duplicate code into functions, enforcing disk durability requirements befitting a database, and issues of that ilk. Previously, I had Claude review its own code and I was not very impressed with the results. I had read on X that Codex was tuned to be more of a stickler for code quality. What if I had Codex review Claude’s code?

Codex as the Foil

I purchased a $100/month Codex plan and pointed it at a fresh clone of DoltLite. I prompted Codex to review DoltLite’s code. I even put a little motivation in there about Claude having written the code. Codex code reviews have been so fruitful that I haven’t really stopped since. Two weeks later, Codex continues to find issues with a simple “review the code” prompt.

Codex finds code quality issues that Claude ignored. You can start to see the types of changes Codex makes starting in DoltLite 0.5.2. The changes are the ones where the label is missing the Claude logo. Codex was capable of and willing to centralize logic, as in this change. Codex is tuned for quality.

As I became more comfortable with Codex reviewing DoltLite code, I started to be more directed with the prompts when Codex hit an architectural flaw. The last three major breaking releases 0.6.0, 0.7.0, and 0.8.0 were driven by Codex discovering architectural issues with Claude’s (via Gas Town) original design.

I would definitely recommend adding Codex to your coding agent mix. It’s tuned differently than Claude, definitely much more of a grumpy code reviewer. Codex is at least complementary and at best good enough to be your only agent.

Where I settled

I haven’t gone pure Codex. I still run Claude in another terminal. I settled on the dueling banjos approach, Claude on the left and Codex on the right.

[Image: Dueling Claude Codex]

Codex is good at review and fine-tuning. Claude is better at features and testing frameworks. I’d still reach for Claude if I were starting something new. Let’s review the pros and cons.

Claude

Pros

Much better at tool use like navigating bash or GitHub. This smooths the entire experience.
Makes good plans. Can execute longer, more complicated tasks.
Tests code thoroughly. Test failures rarely make it to continuous integration (CI).
My go-to if I want the code explained to me.

Cons

Lazy even on effort max. Gives up quickly. Needs prompting to complete all tasks. Rewrites tests to pass.
Duplicates code. Does not naturally factor common logic into objects or functions.
Too much explanation. Quit talking and fix it.

Used For

Completely new builds.
New features.
Testing frameworks.
GitHub Actions manipulation/debugging.

Codex

Pros

Great bug finder. Will find bugs when Claude says the code is perfect.
Prefers clean code. Will expend effort to produce clean code. Will fix ugly code.
Faster. More PRs per unit time.
Tenacious. Fewer prompts to finish the task.

Cons

Very little explanation. Here’s the code.
Doesn’t test enough locally. Failures often caught in CI.
Good at tool use but not as good as Claude. Will occasionally get stuck or waste time navigating bash.

Used For

Code review.
Bug hunts.
Refactoring.
Making a project perfect.
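
If you wanted to write the two “Used For” lists down as a routing rule, a minimal sketch might look like the following. The task labels are paraphrased from the lists above. The `pick_agent` function, the `USED_FOR` table, and the choice to fall back to Claude for anything unlisted are my illustration, not a feature of either tool.

```python
# Task kinds paraphrased from the "Used For" lists; the table itself is hypothetical.
USED_FOR = {
    "claude": {"new build", "new feature", "testing framework", "github actions"},
    "codex": {"code review", "bug hunt", "refactoring", "polish"},
}


def pick_agent(task_kind: str, default: str = "claude") -> str:
    """Return which agent to hand a task to, based on the lists above.

    Falling back to Claude for unlisted kinds is an assumption for this sketch.
    """
    kind = task_kind.strip().lower()
    for agent, kinds in USED_FOR.items():
        if kind in kinds:
            return agent
    return default


if __name__ == "__main__":
    for kind in ("New feature", "Code review", "Refactoring"):
        print(f"{kind}: {pick_agent(kind)}")
```

In practice the routing happens in my head as I pick a terminal, but the table is a fair summary of where each agent earns its keep.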

Conclusion

Using Claude and Codex in tandem improved DoltLite immensely. Claude and Codex are complementary, each excelling at different types of tasks. I highly recommend using them both on the same codebase.

This experience made me wonder. Should I also run Gemini? What is it good at? Will there be hundreds of different coding agents in the future? Will I be able to switch agent tuning/personality within a single agent? Should I use aaron, the mean code reviewer agent, instead of claude? Maybe the interface is something like /personality aaron? I think some people use markdown in AGENT.md for stuff like this but I’m not sure that gets you all the way from Claude to Codex. The differences seem more fundamental to the construction of the model and agent harness.

Exciting times. Even though Claude dueling Codex is working for me now, I’m curious to try new coding agent workflows on DoltLite. If you have one you’d like me to try, find me on our Discord. The cool kids hang out in #doltlite🪶.
