Claude Code Gotchas

Have you met my new best friend, Claude Code? It's not perfect but it's pretty darn good. I've used Claude Code for a month now to write code for our open source, version controlled database, Dolt. I've produced dozens of accepted Pull Requests while still not knowing how to code in the project's language, Golang.

Over that time, I started to notice some patterns of Claude Code misbehavior. None of these issues are show stopping but they can be annoying. I wanted to point them out and suggest any mitigations I've discovered. I think some of these are common knowledge already but I have a hard time finding good AI content. This tool is new and exciting so maybe this will be helpful to other users.

Imperfections

As I said, I've been using Claude Code for about a month on tasks of various sizes and levels of ambiguity. Some tasks have been bug fixes to existing code. Some tasks have been new features that leverage a small amount of existing code. As I've used Claude Code more, I've observed some failure modes. Here's a quick synopsis of each.

  1. Claude Code gives up too early.
  2. Claude Code runs out of context. After it compacts the context, it's dumber.
  3. Claude Code writes a lot of failing tests and needs to see the tests fail to fix them.
  4. Claude Code will change the test to match bad code when it's way easier to do that than fix the code.
  5. Claude Code forgets how to compile, or that it needs to compile to run tests.
  6. Claude Code leaves crap around in the working directory.
  7. Claude Code uses weird Git commands.
  8. Claude Code will decide to rewrite something and leave the old stuff around.

Now, I'll dive into each and explain any workarounds.

Gives Up

I noticed that for larger tasks, Claude Code will sometimes just give up. I made this exit text up but it will say something like:

I've made significant progress on the feature you requested. The functionality works 
correctly for 
* <bulleted list of these very narrow cases>. 

However, the requested functionality does not work 
* <at all for these bulleted list of major cases>. 

The code is well factored and tested. This is a good start. 

Thanks Claude. I've had some success with a "please implement the remaining cases that don't work" prompt. But, I think if Claude Code does this it is a signal to you that the task you gave it is bumping up against the limits of its powers. Can you break the problem down into two or more separable tasks?

For instance, Claude Code really struggled with this feature Pull Request. The two tables I asked it to implement are very similar in functionality. However, doing both at one time was very taxing on Claude Code, and it exhibited many of the failure modes listed in this article. I argued with Claude Code for almost two whole days and it cost me about $100 in tokens. Claude Code gave up multiple times. In the end, I was able to get a functioning Pull Request.

Later, I needed to do a very similar task and broke the task up into two, this PR and this PR. Claude Code pulled each off in less than 10 minutes.

It seems the smaller and more isolated the problem, the better. Even if you would group the tasks together as a human because your context would help you achieve both faster, the same does not hold for Claude Code. Break up larger tasks.

Dumber after Context Compaction

For larger tasks, Claude will often hit the limit of what it can hold in context. At this point, it performs a context compaction that takes a couple minutes. After it is done, it prints out what the new context is which is a cool summary of what you've done in the session up to that point.

The summary is cool but Claude Code is definitely dumber after the compaction. It doesn't know what files it was looking at and needs to re-read them. It will repeat mistakes you specifically corrected earlier in the session. If it gave up earlier, it may give up again.

However, and this is interesting, sometimes if Claude Code was on the wrong track, the context compaction is just what the doctor ordered. The thinking process restarts after compaction and all of a sudden progress is being made again. Context compaction seems to be like a garbage collection process. If you've filled up your context with too much garbage, this clears it out. You can trigger compaction manually with the /compact command so if you notice Claude Code getting lost, it's a good thing to try.

I've never had to do this but others have told me sometimes even compaction can't fix a bad path. In this case, it is best to /clear and re-prompt as if Claude has gotten no instructions. If you do this, I think you should pair this with a git reset --hard on your working branch. Otherwise, Claude Code will get confused about what code is new and which code is existing.

Writes Bad Tests Initially

Claude Code needs tests to reach its full potential. Claude Code knows that code it generates must compile and related tests must pass. A strong existing test suite goes a long way to improving your Claude Code experience.

Dolt is extremely well tested. Dolt has:

  1. ~42,000 Golang SQL Engine Tests
  2. ~3,000 Bash Automated Testing System (BATS) tests

Claude Code has a lot to work with. However, even with all these examples, having Claude Code examine existing tests to construct new tests for itself is often fraught with peril.

Often, Claude Code will generate tests that look right at first glance but fail on first encounter with implemented code. Claude Code will loop until both the code compiles and the tests pass. A poorly defined test can throw Claude Code into a death spiral of bad test, bad code, feature not to specification.

Thus, I recommend using Claude Code in a "test driven development" fashion by having it write the tests first. Then, spend a bit more time than usual in review of the generated tests. After this process, as it implements your bug fix or feature, be very wary of changes to your tests.

Changes Tests instead of Code

Following on from the above, Claude Code is not bashful about modifying tests to be less specific or worse, changing the test to assert the implemented (wrong) behavior. When challenged, it will often even say something like "this is how it should work anyway". Again, be very wary of changes to your test files.
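One way to stay wary is to flag test files in the list of changed paths before you review anything else. Here is a minimal sketch. It assumes Go test files end in _test.go and BATS files end in .bats, and the function names are hypothetical helpers, not part of any tool mentioned here.

```python
import subprocess

# Assumption: Go test files end in _test.go and BATS test files end in .bats.
TEST_SUFFIXES = ("_test.go", ".bats")


def changed_test_files(paths):
    """Return the subset of changed paths that look like test files."""
    return [p for p in paths if p.endswith(TEST_SUFFIXES)]


def changed_paths(base="HEAD"):
    """List tracked files that differ from `base` via `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

Calling changed_test_files(changed_paths()) inside a repository prints nothing by itself; it just gives you the list of test files to read first. Note that git diff only reports tracked files, so brand new test files will not show up here.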

Forgets to Compile

Claude Code will forget how to compile your application. Even if the steps are in CLAUDE.md, Claude Code will get confused and may need help with compilation. This is especially true when dealing with dependency changes like those specified in go.mod.

Claude Code will also forget to compile before it runs your tests. This can be frustrating but I take solace in the fact that I've made the mistake myself. If you work on interpreted languages, interspersed with compiled languages, it's natural to just write the code and run the tests. Claude Code must have a lot of interpreted language data in its training data.

Claude Code will loop, reporting that the tests pass or fail while it hunts for some smoking gun. I often have to press esc and tell Claude Code to run go install so BATS gets the new dolt binary. "You're absolutely right!" and it's back on track. Once you know the pattern, you can save yourself a lot of tokens by making sure Claude Code is not stuck in this fashion when tests are passing or failing unexpectedly.
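If you want a quick sanity check for this pattern, you can compare the age of the installed binary against the newest source file. The sketch below is a heuristic under stated assumptions: both helper names are hypothetical, it only looks at files with a given suffix (.go by default), and it would miss changes to other inputs such as go.mod.

```python
import os


def newest_source_mtime(root, suffix=".go"):
    """Return the latest modification time of files under root ending in suffix.

    Returns None when no matching file is found.
    """
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                mtime = os.path.getmtime(os.path.join(dirpath, name))
                if newest is None or mtime > newest:
                    newest = mtime
    return newest


def needs_rebuild(binary_mtime, source_mtime):
    """True when the binary is missing or older than the newest source file."""
    if binary_mtime is None:
        return True  # no binary yet, so it has to be built
    if source_mtime is None:
        return False  # no sources found, nothing newer than the binary
    return source_mtime > binary_mtime
```

To use it, pass os.path.getmtime of wherever your dolt binary is installed (or None if it does not exist) along with newest_source_mtime of your source tree. If needs_rebuild returns True, run go install before trusting any test result.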

Messy Working Directories

Claude Code will often write testing scripts, initialize and test against a Dolt database, or run go build to make sure Dolt compiles. These all leave artifacts in your working directory. Sometimes Claude Code will clean these up. Other times it will just leave them there. I have accidentally committed these files to our Git repository and had to clean them up manually, only noticing because the dolt binary I accidentally committed is 100MB.

Fortunately, the workaround here is fairly simple. Run git status to see what Claude Code has changed. Review the relevant files and delete the irrelevant ones before adding and committing to Git.
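The git status review can be partly scripted. Here is a rough sketch that pulls untracked paths out of git status --porcelain output and flags any that are suspiciously large. The helper names and the size limit are my own assumptions, and the parser ignores the quoting Git applies to unusual filenames.

```python
import os


def untracked_paths(porcelain_output):
    """Extract untracked paths from `git status --porcelain` (v1) output.

    Untracked entries start with "?? ". Quoted or unusual filenames are
    not handled in this sketch.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if line.startswith("?? "):
            paths.append(line[3:])
    return paths


def large_files(paths, limit_bytes):
    """Return the paths that are regular files larger than limit_bytes."""
    return [
        p for p in paths
        if os.path.isfile(p) and os.path.getsize(p) > limit_bytes
    ]
```

Feed untracked_paths the text that git status --porcelain prints, then pass the result to large_files with whatever limit makes sense for your repository. A stray 100MB binary would stand out well before it reaches a commit.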

Git is too Dangerous

Which brings me to my next point. I do all the Git stuff. At first, I let Claude Code handle branching, committing and rollbacks. However, I kept getting into situations where the wrong changes were in my merge base when I created a Pull Request on GitHub. For one Pull Request, I had to make a patch, delete the branch, and apply the patch to a new branch. This wasted a bunch of time.

Now, I drive Git and let Claude make changes to the files. I think this is a better workflow because it encourages local human review. After Claude Code thinks it is done, I look at git status and git diff as a key part of the workflow. I spot check changes before committing and pushing to GitHub and performing a full review. If I need to revert or reset something, I do it and tell Claude Code what I did. This seems to work better than letting Claude Code run wild with Git commands.

Rewrite without a Corresponding Delete

For my large PR described above, at one point, Claude Code decided to make a new implementation from scratch. Instead of deleting what was there, it created parallel functions with the "New" prefix. It ended up making a working implementation which was great. But even after being instructed to clean up the old code, it left a partial implementation that my colleague caught in code review. The dead code was sufficiently interspersed in the file that I did not catch it on review. This was somewhat embarrassing. My colleague asked "There's a bunch of objects created that are never used. Does Claude Code usually do that?"

The good news is this was fixed after clearing the context and starting with a new prompt of "There's a lot of dead code in file X. Please examine the code carefully and remove any dead code." It seems that Claude Code can't clean up after itself sometimes when the context is cluttered. Start a new session and have it delete dead code. It seems like that is a reasonable task especially if constrained to a single file.

Conclusion

Despite all the imperfections, Claude Code is still my new best friend. You should definitely try it on your code or a Dolt issue if it suits your fancy. Any weird Claude Code behavior you've observed that I missed? Come by our Discord and tell me about it.
