

Here at DoltHub, we initially struggled to understand how Artificial Intelligence (AI) would impact our business. There seemed to be a lot of hype and not much substance, much like Crypto and the Metaverse before it. Then, we discovered Claude Code and started taking Waymos. All of a sudden, the hype was real.

This article walks through a brief history of AI in preparation for tomorrow’s article about how these innovations impact databases.

What Can AI Do Today?

AI stands for artificial intelligence. The big innovation in AI over the past few years is generative models. Generative models started with text and are now very good with images, audio, and video as well. An AI can generate realistic text, image, audio, and video output based on a simple prompt. Generative AI models are getting better quickly. Companies that make them, such as OpenAI and Anthropic, have become household names.

Generating content from text prompts is generally useful, but a few specific use cases have emerged. Let’s dive into a select few of those in detail, roughly ordered from oldest to newest.

[Image: AI Timeline]

Universal Similarity Search

The universal similarity search use case has been around since the earliest generative models. A generative model can take an input, whether that’s text, an image, audio, or a video, and summarize it into a fixed-length vector (i.e., a series of numbers) called an embedding. The distance between two vector embeddings indicates how similar the two inputs are in the eyes of the model that generated the embeddings. Since the model exhibits a human-like understanding of the input, the similarity returned is what a human would identify as similar. Universal similarity search can be used directly in a number of applications like search and recommendations.
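To make the idea of embedding distance concrete, here is a minimal sketch in Python. The three-dimensional vectors are hand-written, hypothetical stand-ins for what a real model would produce, and cosine similarity is just one common way to compare embeddings, not necessarily the measure any particular model or product uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, written by hand for illustration.
# A real model would emit hundreds or thousands of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "database": [0.0, 0.1, 0.9],
}


def most_similar(query, embeddings):
    """Return (name, score) of the entry closest to `query`, excluding itself."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return max(scored, key=lambda pair: pair[1])


best_match, score = most_similar("cat", embeddings)
print(best_match)  # prints "kitten"
```

Search and recommendation features are essentially this lookup run against a much larger set of stored vectors.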

Universal similarity search is also useful for generating content or answers with a generative AI model using a process called Retrieval-Augmented Generation (RAG). Generative models are trained on public data. So, if you ask a generative model questions in an enterprise setting about private data, the answer will likely be incomplete or wrong. RAG is meant to solve this problem. First, you index all your internal documents into vector embeddings using a generative model. Then, you use the same model to embed the user prompt. Finally, you run a similarity search between the user prompt and your indexed documents and augment the prompt with the top N most similar documents. In theory, the generative model will now consider your private data when composing a response.
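The RAG steps above can be sketched roughly as follows. `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts words, so it captures none of the human-like understanding a real model would), and the function names and prompt layout are assumptions for illustration only.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents):
    """Step 1: embed every internal document once, up front."""
    return [(doc, toy_embed(doc)) for doc in documents]


def retrieve(index, prompt, top_n=2):
    """Steps 2 and 3: embed the prompt, then rank documents by similarity."""
    query = toy_embed(prompt)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_n]]


def augment_prompt(index, prompt, top_n=2):
    """Prepend the top N most similar documents to the user prompt."""
    context = "\n".join(retrieve(index, prompt, top_n))
    return f"Context:\n{context}\n\nQuestion: {prompt}"


documents = [
    "Dolt is a version controlled SQL database",
    "The office coffee machine is on the second floor",
    "Dolt supports branch and merge like Git",
]
index = build_index(documents)
print(augment_prompt(index, "How does Dolt branch and merge work", top_n=1))
```

The augmented string is what would be sent to the generative model in place of the raw user prompt.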

However, in practice, data sent in context affects the generative model differently than data used to train the model. If you’ve used a coding agent, you have probably noticed that incorrect or excessive information in the context yields frustrating results. Data in the context receives hyper-focus from the model. The model assumes the data must be important. So, flooding the model context with similar documents via RAG seems risky at best. We tried RAG internally for a blog article generator, and the results were worse than without RAG. It’s not just us. The internet is not exactly full of RAG success stories. However, I’m sure in some use cases, RAG is better than nothing. My intuition is that RAG use cases will switch to post-training on private data using open or licensed proprietary models.

Summarize

Generative AI is very good at summarizing information. The model is trained on the entire internet’s worth of data and then some. A user creates a prompt requesting some piece of information in some style. The model generates a bespoke summary of the information in the model that answers the user’s prompt.

The main consumer use case for this is AI chat, with the main brand being OpenAI’s ChatGPT. Effectively, this use case is a better web search for many queries. As many of you know, Google search has long been one of the most profitable and defensible businesses on the internet. So a new challenger is exciting. Anecdotally, since ChatGPT was released in late 2022, search traffic to dolthub.com is down about 50%. We believe these search queries have been replaced by ChatGPT and other AI chat queries.

[Image: Google under siege]

Since ChatGPT was released in late 2022, there have been many improvements to these models. The models hallucinate less and produce more factually accurate information. The models are better at identifying errors or anomalies in large amounts of information. In short, the models are smarter and will likely continue to get smarter. These improvements have opened up research and editing use cases traditionally served by clerks or analysts.

Write Code

Generative AI writes summaries meant for human consumption. These summaries are not authoritative writes to the system of record. For AI to actually do “real work”, AI must make writes to the system of record. Enter AI coding, the first use case where AI reliably wrote to the system of record.

AI coding came on slowly then all at once. In 2024, AI-assisted coding was popularized by the Integrated Development Environment (IDE) Cursor. Cursor bolted an AI-connected chat window onto the side of Visual Studio Code, a popular open source IDE. To get the AI to write code, you instructed it to do tasks via the chat window, similar to a conversation with ChatGPT. I found this process very slow and frustrating, but others claim it increased their productivity.

[Image: Cursor]

Then in Q2 2025, agent-based AI coding burst onto the scene. First, Cursor added agent mode. Then, a command line tool called Claude Code was released by Anthropic. Coding agents looped on a prompt, ensuring the code the agent produced compiled and passed tests. This simple change meant that a coding agent could work for minutes in the background and reliably produce production code for small- to medium-sized tasks. Coding agents work and reliably make writes to the system of record, opening up the possibility of automating many more knowledge work tasks.
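The loop a coding agent runs can be sketched in a few lines of Python. This is a simplified illustration, not how Cursor or Claude Code is actually implemented: `propose` stands in for a call to a model and `verify` stands in for compiling and running tests. Both names, and the stub model below, are hypothetical.

```python
def agent_loop(task, propose, verify, max_attempts=5):
    """Ask for candidates until one passes verification or attempts run out."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(task, feedback)
        passed, feedback = verify(candidate)
        if passed:
            return candidate, attempt
    raise RuntimeError(f"no passing candidate after {max_attempts} attempts")


def make_stub_model(candidates):
    """Hypothetical model: replays canned answers and ignores the feedback.

    A real agent would send the task and the feedback to a generative model.
    """
    remaining = iter(candidates)

    def propose(task, feedback):
        return next(remaining)

    return propose


def verify(source):
    """'Compile' the candidate and run one test against it.

    A real agent would run the project's build and test suite in a sandbox
    rather than calling exec on generated code directly.
    """
    namespace = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as error:
        return False, f"{type(error).__name__}: {error}"
    return True, None


candidates = [
    "def add(a, b) return a + b",          # does not compile
    "def add(a, b):\n    return a - b",    # compiles, fails the test
    "def add(a, b):\n    return a + b",    # compiles and passes
]
code, attempts = agent_loop("write add(a, b)", make_stub_model(candidates), verify)
print(attempts)  # prints 3
```

The important design choice is that the loop only exits on a machine-checkable signal, which is why tasks with compilers and tests were the first to work well.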

[Image: Coding Agent Looping]

Existing version control systems for code facilitated agentic coding. Version control allowed agents to work in parallel to their human operators. Version control allowed an agent’s code to be reviewed. If the agent’s code was wrong in some way, the code could be easily discarded or rolled back using version control.

While Cursor tackled AI-assisted coding, a parallel category of tools called App Builders was introduced. App Builders use AI to build full-stack websites or mobile applications. Popular App Builders include Replit, Vercel v0, and Lovable. App Builders not only write code but also handle deployment and operations, making them very popular with non-coders and for prototyping. Being full stack, App Builders generally include a database, which is directly relevant to this article.

Self-Driving

Outside the realm of information technology, AI also has a profound physical world application. Cars can drive themselves! Self-driving cars use generative AI to predict the next few seconds of the surrounding environment and react accordingly.

DoltHub is headquartered in Santa Monica, one of the early Waymo service locations. Tesla makes cars with a self-driving option as well, and they are rolling out a Waymo-like service in Austin, Texas. Self-driving cars are comfortable and safe. In the next few years, I suspect we will see self-driving cars in most cities.

[Image: Waymo]

What Will AI Do Next?

Based on what AI is doing already, let’s speculate on what we will see AI do next.

Cursor For Everything

Based on the success of AI-assisted coding in tools like Cursor, we are going to start to see Cursor for Everything. Coding started with this interface. I think most other AI automation will start with this interface as well. Google has already added a Cursor-like UI to Google Docs and Google Sheets. I suspect most applications are going to have an AI-assisted mode powered by a chat window on the side.

Instead of manually performing tasks in an application’s user interface, a user will instruct an AI to do a task for them through the chat window, iterate, and review the results before committing the final result to the system of record. The main window of the application will look and work the same as usual with some slight tweaks to expose what the AI is doing or has done.

[Image: Cursor for Everything]

There are some challenges to be solved with version control and automated verification for these applications to work as well as Cursor. What did the AI do? Is it correct? How do I review? A database for AI could help solve some of these challenges.

Agents For Everything

Just as chat-assisted coding morphed into full agent mode, I think you’re going to see other specialized agents capable of performing other small- to medium-sized business tasks. Ideal tasks have verifiably correct output such that an agent can loop until it gets the correct result instead of needing human-in-the-loop, “Cursor for Everything” applications. Again, having a version-controlled system of record so that an agent’s changes can be done in isolation, audited, rolled back, and reviewed becomes critical.

Once we get agents for more business tasks, we’re going to get multi-agent systems. Specialized agents are going to assist in most business tasks, and whole business workflows are going to have tasks passed between specialized agents. The goal would be an agentic closed system where humans just watch work get completed and react to anomalies. For Dolt, I could imagine an agentic SQL tester finding Dolt bugs and creating issues on GitHub. Another testing agent would create a skipped test for the bug and get it reviewed by the code review agent. Finally, a coding agent would pick up the task, fix the bug, and again get it reviewed by the code review agent. Once complete, the change would be merged and a Dolt release agent would ship the change to the world. Given high enough code quality, this could all happen without a human ever reviewing any code.

Advanced Robotics

Beyond self-driving, we’re going to see a robotics revolution. Generative AI has provided the “vision” and “brain”. Cameras, sensors, batteries, and electric motors have become small and cheap enough to open up many new robotics use cases. The use case capturing most people’s imagination is humanoid robots. I’m sure we’ll see other, more specialized robots, especially in warehouse and factory settings, before we have an in-home robot to do our dishes.

Conclusion

As you can see, there’s been a lot of AI innovation in a short period of time, and we think there is more to come. I hope this article prepares you for how all this change impacts the database space. Did I miss something? Just let me know on our Discord. We always like to chat.
