Developing an AI Agent

I spent some time recently trying to get AI agents to import data. My first and second attempts were largely unsuccessful. I ran into many issues, but I walked away feeling like this is something an AI agent should be able to do. I did some more experimentation before deciding to try my hand at building an AI agent specifically to import data. I started by trying to find good resources on building AI agents. In reality, there is very little out there beyond the documentation for the various agent frameworks. In this blog, I will share what I've learned so far.

Agent Frameworks

First, let's talk about what an AI agent framework is. At its core, it's a library that allows you to send requests to LLMs. These frameworks may support many other features, such as session management, extensibility options like MCP server support, function calling, and built-in tools; however, the core functionality is sending requests to LLMs.

After doing some research, I decided to use OpenAI's Python Agent SDK. I liked how well the underlying OpenAI API was documented. Understanding how the underlying API works is really helpful when working in a new field, and that understanding provided the greatest insight into what was happening behind the scenes when using the Python Agent SDK. Using something like LangChain may have led to better results, but the details of what was happening behind the scenes would have been more opaque, and I would have learned less.

A Simple Agent

Using the OpenAI Python Agent SDK, the code to create an instance of the Agent class and run it is very simple. If you want to run the examples in this post, you will need to follow the SDK's installation instructions first.
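My setup looked roughly like the following. This is just a sketch, assuming the openai-agents package on PyPI and an OpenAI API key exported in your shell, so adjust it to your own environment:

# create and activate a virtual environment, then install the SDK
python -m venv .venv
source .venv/bin/activate
pip install openai-agents

# the SDK reads your API key from this environment variable
export OPENAI_API_KEY=<your-api-key>

With that out of the way, here is a basic example: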

import asyncio
from agents import Agent, Runner

async def main():
    agent = Agent(name="Assistant", instructions="Answer whatever questions I ask you.")
    result = await Runner.run(agent, "Generate a random number between 1 and 100.")
    print(result.final_output)

    result = await Runner.run(agent, "Now add 10 to the number you gave me.")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

As you can see, the code is trivial. It is one line of code to create the agent and one line of code to run it. The agent here is given the instruction to answer questions, and then you provide a prompt (in this case, a question for the agent) to the run function. After printing the response, we run the agent again with a follow-up question.

Let's see our output:

~/dev/test/agents-test>source .venv/bin/activate
(.venv) ~/dev/test/agents-test>python main.py
Sure! Here is a random number between 1 and 100: **47**
It appears you haven’t provided a number yet. Could you please provide the number you'd like me to add 10 to?

As you can see, the agent was able to generate a random number, but when I asked it to add 10 to the number it had given me, it had no idea what I was talking about.

Adding Context

Using an AI agent framework hides a lot of the details, so understanding what is going on at the API level is vital. LLMs are stateless. When you send a request to an LLM, it has no memory of previous requests unless you provide that context. Some APIs may allow you to send a conversation ID or a previous request ID so the service can retrieve what happened leading up to that request. However, these systems are separate from the LLM execution engine; they are a shortcut that takes away some of the control you have over what context is provided to the LLM. With that in mind, let's look at what is being sent in our requests. Our initial request looks like:

{
  "model": "gpt-4.1",
  "input": [
    {
      "role":"developer",
      "content":"Answer whatever questions I ask you"
    },
    {
      "role":"user",
      "content":"Generate a random number between 1 and 100."
    }
  ]
}

That's a pretty easy question for our LLM to answer, and it responded with Sure! Here is a random number between 1 and 100: **47**. Now our second request looks like:

{
  "model": "gpt-4.1",
  "input": [
    {
      "role":"developer",
      "content":"Answer whatever questions I ask you"
    },
    {
      "role":"user",
      "content":"Now add 10 to the number you gave me."
    }
  ]
}

Looking at the request, you can see the problem. Our LLM is stateless and we did not provide any context about the previous request. Providing that context to the OpenAI Python SDK is pretty trivial. They provide a Session interface which can be passed in when we call Runner.run. In this case we will use a SQLiteSession which is an OpenAI provided implementation.

import asyncio
from agents import Agent, Runner, SQLiteSession

async def main():
    session = SQLiteSession("conversation_name")
    agent = Agent(name="Assistant", instructions="Answer whatever questions I ask you.")
    result = await Runner.run(agent, "Generate a random number between 1 and 100.", session=session)
    print(result.final_output)

    result = await Runner.run(agent, "Now add 10 to the number you gave me.", session=session)
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

And running this code outputs:

(.venv) ~/dev/test/agents-test>python main.py
Sure! Here is a random number between 1 and 100: **57**.
Adding 10 to 57 gives you **67**.

Now the agent has access to the previous requests and responses through the session, and it was able to answer our follow-up question.
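Why does this fix it? The session stores each turn, and the SDK replays those turns as input on the next run. The exact wire format has a bit more structure than this, but conceptually our second request now looks like:

{
  "model": "gpt-4.1",
  "input": [
    {
      "role":"developer",
      "content":"Answer whatever questions I ask you"
    },
    {
      "role":"user",
      "content":"Generate a random number between 1 and 100."
    },
    {
      "role":"assistant",
      "content":"Sure! Here is a random number between 1 and 100: **57**."
    },
    {
      "role":"user",
      "content":"Now add 10 to the number you gave me."
    }
  ]
}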

If you are at all curious, the Session interface is pretty simple. An implementation simply gives read, write, and clear access to a list of messages. Here is an in-memory implementation:

from threading import Lock
from agents.memory import Session
from agents.items import TResponseInputItem

class MemorySession(Session):
    """In-memory implementation of Session protocol."""
    def __init__(self, items: list[TResponseInputItem] | None = None):
        self.items = items if items is not None else []
        self.lock = Lock()

    async def get_items(self, limit: int | None = None) -> list[TResponseInputItem]:
        """Get items from the session with optional limit."""
        with self.lock:
            return self.items[:limit] if limit is not None else self.items

    async def add_items(self, items: list[TResponseInputItem]) -> None:
        """Add items to the session."""
        with self.lock:
            self.items.extend(items)

    async def clear_items(self) -> None:
        """Clear all items from the session."""
        with self.lock:
            self.items.clear()

    def size(self):
        """Get the number of items in the session."""
        with self.lock:
            return len(self.items)

Context is one of the most important aspects of working with LLMs and building agents. Every LLM has a context size limit, and managing the data you send can be a critical part of building a successful agent.
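There is no single right way to keep an eye on this, but as a rough illustration, here is a sketch of how you might estimate how large a session has grown before each run. It assumes any Session implementation like the MemorySession above, and it uses tiktoken's o200k_base encoding as an approximation of the model's tokenizer:

import json
import tiktoken

async def approximate_session_tokens(session) -> int:
    """Very rough token estimate for everything currently stored in a session."""
    encoding = tiktoken.get_encoding("o200k_base")
    items = await session.get_items()
    # serialize each item and count its tokens; this is only an approximation
    return sum(len(encoding.encode(json.dumps(item, default=str))) for item in items)

# usage: tokens = await approximate_session_tokens(session)

This will become even clearer as we talk about tools.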

Doing Things Locally with Tools

Tools are a way of giving our LLM access to external systems. If you want your LLM to be able to read data from a file on your system, or write data to a database, you need to provide that functionality through a tool. The [OpenAI Python SDK](https://openai.github.io/openai-agents-python) supports the @function_tool decorator to make functions available to the LLM. Here is an example of a tool that reads a portion of a file:

@function_tool
def read_file(file_path:str, start:int = 0, num_bytes:int = -1) -> bytes:
    """
    Reads a portion of a local file from a specified start position for a given number of bytes.

    Args:
        file_path (str): The path to the local file.
        start (int): The starting byte position to read from.
        num_bytes (int): The number of bytes to read. If -1, reads until the end of the file.

    Returns:
        bytes: The binary content read from the file
    """
    with open(file_path, 'rb') as file:
        file.seek(start)

        if num_bytes == -1:
            content = file.read()
        else:
            content = file.read(num_bytes)

        return content

Now let's create a simple file '/Users/brian/test.csv' with the following content:

w,x,y,z
324,94,6,52
0,54,-100,8.34
23,42,99,1000

Now here is a full sample program that uses the read_file tool to read `/Users/brian/test.csv` and answer a question about it:

import asyncio
from agents import Agent, Runner, function_tool

@function_tool
def read_file(file_path:str, start:int = 0, num_bytes:int = -1) -> bytes:
    """
    Reads a portion of a local file from a specified start position for a given number of bytes.

    Args:
        file_path (str): The path to the local file.
        start (int): The starting byte position to read from.
        num_bytes (int): The number of bytes to read. If -1, reads until the end of the file.

    Returns:
        bytes: The binary content read from the file
    """
    with open(file_path, 'rb') as file:
        file.seek(start)

        if num_bytes == -1:
            content = file.read()
        else:
            content = file.read(num_bytes)

        return content

async def main():
    agent = Agent(name="Assistant", instructions="Use your ability to read files from the local filesystem to answer questions.", tools=[read_file])
    result = await Runner.run(agent, "What is the sum of columns w,x,y, and z in the file /Users/brian/test.csv?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

And running this code outputs:

(.venv) ~/dev/test/agents-test>python main.py
Here are the values from the specified columns in the file:

- w: 324, 0, 23
- x: 94, 54, 42
- y: 6, -100, 99
- z: 52, 8.34, 1000

Let's calculate the sum for each column:

- Sum of w: 324 + 0 + 23 = 347
- Sum of x: 94 + 54 + 42 = 190
- Sum of y: 6 + (-100) + 99 = 5
- Sum of z: 52 + 8.34 + 1000 = 1060.34

So, the sums are:
- w: 347
- x: 190
- y: 5
- z: 1060.34

Pretty cool. But what actually happened here? It's documented in OpenAI's Function Calling documentation. When we add a function with the @function_tool decorator, the SDK adds a description of that function to the request (the description comes from the function signature and docstring). If, while processing the request, the LLM determines that it needs to call a function to answer the question, it returns a response that indicates which function to call and what arguments to pass to it. The SDK then calls the function, adds the result of that call to the context, and re-runs the LLM with that additional context. This process can repeat multiple times until the LLM is able to answer the question.
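The real payloads carry a few more fields than this, and the call_id here is made up, but the extra context items roughly take this shape: the model returns a function_call item, the SDK executes the function and appends a function_call_output item with the result, and then the whole conversation is sent back to the LLM:

{
  "type": "function_call",
  "name": "read_file",
  "arguments": "{\"file_path\": \"/Users/brian/test.csv\", \"start\": 0, \"num_bytes\": -1}",
  "call_id": "call_123"
},
{
  "type": "function_call_output",
  "call_id": "call_123",
  "output": "w,x,y,z\n324,94,6,52\n0,54,-100,8.34\n23,42,99,1000"
}

Notice that the entire tool output, in this case the raw contents of the file, becomes part of the context for the next round trip.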

Now you can imagine how this could blow up the context size pretty quickly. Say you are importing a directory containing many CSV files. The LLM might have no problem with the amount of data in any one of the files, but the combined size of all the files could be too much for it to handle. If our function calls keep adding to the context, and we just keep adding more and more data, we will eventually exceed the LLM's context size limit.

I won't go into too much detail here, as the topic of context management is complex, but I will give you one example of a way to limit the amount of data being added to the context.

Filtering Out Old Function Calls and Responses

When you send a request to the LLM, there may be several tool calls made to answer the question. Each tool call and its response is added to the context. Once the question is answered, it's likely unnecessary for all of that data to stay in the context for subsequent requests. In the code below, we filter out tool calls and their responses once we have a response to our prompt.

from agents import Runner
from agents.memory import Session
from agents.items import TResponseInputItem

# MemorySession is the in-memory Session implementation defined earlier.

async def to_in_mem_session(session: Session) -> MemorySession:
    items = await session.get_items()
    return MemorySession(items)

async def filter_add_items(session: Session, items: list[TResponseInputItem], filter_func):
    to_add = []
    for item in items:
        if filter_func(item):
            to_add.append(item)
    if len(to_add) > 0:
        await session.add_items(to_add)

def include_non_tool_calls_and_responses(item: TResponseInputItem) -> bool:
    if 'type' in item:
        if item['type'] == 'function_call' or item['type'] == 'function_call_output':
            return False
    return True

async def run(agent, session: Session, prompt: str):
    if session is None:
        raise ValueError("Session cannot be None")
    if len(prompt) == 0:
        raise ValueError("Prompt cannot be empty")

    # Copy the existing context into an in-memory session and remember its size.
    mem_sess = await to_in_mem_session(session)
    size = mem_sess.size()

    # Run against the in-memory copy so tool calls and their outputs land there,
    # not in the original session.
    result = await Runner.run(agent, prompt, session=mem_sess)

    # Keep only the new non-tool items (typically the prompt and the final response).
    new_items = (await mem_sess.get_items())[size:]
    await filter_add_items(session, new_items, include_non_tool_calls_and_responses)

    return result

In this code, our run method copies the current session context into an in-memory session, saving the size before our agent runs. The agent then runs against that in-memory copy, during which time any number of items could be added to it. Afterwards, we take the new items and filter out any tool calls and their responses before adding what remains to the original session. Here we will likely only add the original prompt and the final response to the session. If for some reason that data is needed in the future, you can provide the same tools to the agent so that it can call them again.
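To make the shape of this concrete, here is a sketch of how the earlier file-reading example might call this wrapper instead of Runner.run directly. It reuses the read_file tool and the run function defined above, along with the same example file and prompt:

import asyncio
from agents import Agent, SQLiteSession

async def main():
    session = SQLiteSession("conversation_name")
    agent = Agent(name="Assistant",
                  instructions="Use your ability to read files from the local filesystem to answer questions.",
                  tools=[read_file])
    # tool calls made during this run are filtered out before reaching the session
    result = await run(agent, session, "What is the sum of columns w,x,y, and z in the file /Users/brian/test.csv?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())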

Agent Scope

Now that we have a basic understanding of how the agent works, we can attempt to design an agent to import data. This is where you realize how new this field is. There are no best practices, no design patterns, no established ways of doing things, and the results are non-deterministic. You can try the same thing multiple times and get different results. We understand our inputs: how our context is handled, how tools are called. But we have no idea what the LLM is going to do with that information. Managing the scope of work the LLM is being asked to do feels like more of an art than a science. If you limit the scope to something small, you are likely to get a good result, but at that point, you might as well write the code to handle it. Our initial example could simply be written as:

import random
number = random.randint(1, 100)
number += 10

Additionally, if you limit the scope too much, you don't allow the LLM to show its strengths. This is where the art comes in: you have to find the right balance, giving the LLM enough freedom to show its strengths, but not so much that it gets lost.

Conclusion

Developing an AI agent is a new experience for most software developers. There are many unknowns, and there is a lot of trial and error. However, the potential is great. I believe that AI agents will become a common tool in the software developer's toolbox. Stay tuned for a future blog where I talk specifically about d00lt (double 0 Dolt), a data importing agent I'm working on... with some success. If you have insights or suggestions, drop into our Discord server and share your thoughts.
