AI Agents in Practice: part 3
This article is part of AI Agents in Practice. In part 1 we explained what AI agents are and how function calling works. In part 2 we covered where agents do and don’t work. In this part: how do you actually build one?
Spoiler: the technology is the easy part. The real work is elsewhere.
The technical barrier is low
With tools like n8n, Make.com or LangChain, you can have a working agent in an afternoon. You choose your tools, write a prompt, connect everything, and you have something that does things. That’s not an exaggeration: the technical barrier really is low.
The bad news? That’s also exactly where the danger lies. Start with no-code anyway: not because it’s better, but because you’ll learn faster where the problems are. Once you know that, you can always switch to a framework.
Building agents isn’t the problem. Testing agents, that’s the problem. And making sure they do what you want them to do.
Every agent has three layers
Before you start building, you need to determine three things. In this order.
Layer 1: what tools does it need?
An agent’s tools live in APIs. Access to your email, CRM, Google Analytics, Slack, a database. The first question is simple: which external services does it need to perform its task?
How you expose those tools matters. With the Model Context Protocol (MCP), you can make tools available in a standardized way, so your agent can use the same toolset regardless of the underlying model. That saves work if you later switch providers or want to deploy multiple models.
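To make this concrete, here is a minimal sketch of what a provider-agnostic tool definition looks like, in the JSON Schema style that MCP and most function-calling APIs build on. The tool name and parameters are hypothetical examples, not a real API:

```python
# Hypothetical tool definition in the JSON Schema style used by MCP
# and most function-calling APIs. The model sees this description and
# decides when to call the tool; your code executes it.
get_campaign_stats = {
    "name": "get_campaign_stats",
    "description": "Fetch performance metrics for one ad campaign.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "campaign_id": {"type": "string", "description": "Internal campaign ID."},
            "date_range": {"type": "string", "description": "e.g. 'last_7_days'."},
        },
        "required": ["campaign_id"],
    },
}
```

Because the definition is just data, the same toolset can be handed to a different model or provider without rewriting anything.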
This is also the moment to think about risks. Every API you add is a potential failure mode, and every API with write permissions is a potential problem.
Layer 2: what is it allowed to do?
Here’s where it gets interesting. You have roughly two types of permissions:
Read permissions
Retrieving data. Relatively safe. The agent fetches information from your systems and does something with it. Your data goes into a model, but nothing changes.
Write permissions
Changing things. This is where it gets exciting: sending an email, creating a ticket, starting a campaign, spending budget. This is where things can go wrong.
Our advice: always start with read permissions only. Let the agent gather and analyze data. Only when that works stably do you add write permissions, one by one.
You don’t want your first experiment with agents to be an AI that’s allowed to spend money. That’s asking for trouble.
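The “read first, write one by one” advice can be enforced in code rather than in a prompt. A minimal sketch, with hypothetical tool names: an allow-list that starts read-only and only grows when you explicitly enable a write tool.

```python
# Sketch: an allow-list gate that starts read-only. Tool names are
# hypothetical examples; plug in your own handlers.
READ_TOOLS = {"fetch_report", "get_campaign_stats"}
WRITE_TOOLS = {"send_email", "create_ticket", "start_campaign"}

class ToolGate:
    def __init__(self):
        # Start with read permissions only.
        self.enabled = set(READ_TOOLS)

    def enable_write(self, tool: str) -> None:
        # Add write permissions one by one, never wholesale.
        if tool not in WRITE_TOOLS:
            raise ValueError(f"unknown write tool: {tool}")
        self.enabled.add(tool)

    def call(self, tool: str, handler, **kwargs):
        # Refuse anything that hasn't been explicitly enabled.
        if tool not in self.enabled:
            raise PermissionError(f"tool not enabled: {tool}")
        return handler(**kwargs)
```

The point of putting this outside the prompt is that the model cannot talk its way past it: a disabled tool simply raises an error.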
Layer 3: how do you instruct it?
This is the system prompt. The first instruction the agent receives before doing anything. Here you define:
- Who it is: “You are a marketing assistant for company X”
- What it should do: “You analyze campaign data and create weekly reports”
- What it may and may not do: “You may not send emails without approval”
- How it should respond: “Be concise, use numbers, no marketing speak”
The problem? You have to hope the agent follows that prompt closely enough, and that you’ve spelled it out well enough to cover every edge case. Which is nearly impossible.
Implicit knowledge is the real obstacle
This is where most agent projects fail, and almost nobody talks about it.
Your organization has a way of working. Implicit rules. Historical context. That lives in people’s heads, not in systems.
Say you ask an agent to analyze Google Ads data. But does it know which account to use if you have multiple? That campaign X is experimental and campaign Y is your core business? That there was an outage last month making the data unreliable? Or what today’s date is?
Every piece of missing context is a potential error. And the annoying thing: the agent doesn’t know it’s missing context. It makes assumptions and continues. The output looks complete, but is built on a foundation of gaps.
If people can’t transfer their knowledge to the model, the model does something. And “something” is rarely what you meant.
Before you build an agent, you first need to inventory what knowledge currently lives in heads and find a way to make that knowledge queryable. That’s often more work than building the agent itself.
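One practical way to make that knowledge queryable is to write it down as explicit facts and prepend them to every request. A minimal sketch; the facts below are hypothetical examples of what usually lives in heads:

```python
from datetime import date

# Sketch: turn implicit knowledge into an explicit context block that is
# prepended to every request. The facts are hypothetical examples.
def context_block(facts: list[str]) -> str:
    lines = [f"Today's date is {date.today().isoformat()}."]
    lines += [f"- {fact}" for fact in facts]
    return "Known context:\n" + "\n".join(lines)

block = context_block([
    "Use Google Ads account 'ACME-EU', not the legacy account.",
    "Campaign X is experimental; campaign Y is core business.",
    "Data from last month is unreliable due to a tracking outage.",
])
```

A plain text file of such facts, maintained by the people who hold the knowledge, already removes a large share of the silent wrong assumptions.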
From demo to production is where it goes wrong
A working demo is built quickly, but a reliable agent in production is a different story. There are three things that almost every agent project underestimates:
Model updates change behaviour
Models get updated, and a new model can behave differently. Better at benchmarks doesn’t mean better for your use case. When your model changes, you need to test again. And if you can’t measure how well the agent performs, you also don’t know if the update broke your workflow.
Every write permission is a risk
When you give an agent access to systems, you’re giving an AI the keys to the castle. Ask yourself four questions: what’s the worst-case scenario, is it reversible, who’s responsible, and how do you detect that it went wrong? No good answer? Don’t start with write permissions.
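Those four questions can even be made a hard requirement in code: no write action gets registered unless all four are answered. A sketch with hypothetical field names and an example action:

```python
from dataclasses import dataclass

# Sketch: a write action must answer the four questions before it is
# allowed. Field names and the example action are hypothetical.
@dataclass
class WriteAction:
    name: str
    worst_case: str    # what's the worst-case scenario?
    reversible: bool   # is it reversible? (always answered, being a bool)
    owner: str         # who's responsible?
    detection: str     # how do you detect that it went wrong?

def approve(action: WriteAction) -> bool:
    # Refuse any action with an unanswered question. Irreversible actions
    # could additionally require human sign-off.
    return bool(action.worst_case and action.owner and action.detection)

send_mail = WriteAction(
    name="send_email",
    worst_case="Wrong email reaches a customer.",
    reversible=False,
    owner="marketing lead",
    detection="Outbox review plus bounce alerts.",
)
```

Writing the answers down like this also gives you documentation for free when something does go wrong.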
Without logging you’re blind
When something goes wrong, you need to be able to look back at what the agent did: which API calls, which parameters, what conclusions. Without good logging you only see the output, not the dozens of steps that preceded it.
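Logging those steps can be as simple as a wrapper around every tool call that records the tool name, parameters and result, even when the call fails. A minimal sketch; the tool name is a hypothetical example:

```python
import time

# Sketch: record every tool call with its parameters and result so you
# can replay what the agent did. Tool names are hypothetical.
CALL_LOG: list[dict] = []

def logged_call(tool: str, handler, **params):
    entry = {"ts": time.time(), "tool": tool, "params": params}
    try:
        entry["result"] = handler(**params)
        return entry["result"]
    finally:
        # Append in `finally` so failed calls are logged too.
        CALL_LOG.append(entry)

logged_call("get_campaign_stats", lambda campaign_id: {"clicks": 42},
            campaign_id="X-123")
```

In production you’d write these entries to persistent storage instead of a list, but the shape of the record is the important part: timestamp, tool, parameters, result.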
Every time the model gets updated, you essentially have a new employee who might work subtly differently than the previous one.
Conclusion: test before you build
The hardest question isn’t how to build an agent, but how to know if it does what you want. Without a clear way to evaluate the output, your agent isn’t saving time but creating a new source of uncertainty.
Start with read permissions
Let the agent first retrieve and analyze data. Only when that works stably do you add write permissions.
Solve the context problem
Inventory what knowledge currently lives in heads. Make it queryable before you start building.
Test obsessively
Define your scope absurdly specifically and test every edge case. With broken data, missing fields and extreme values.
Log everything
Every API call, every decision. Without logging, debugging is impossible.
Before you start building, ask yourself these two questions: how will you test if the agent does what you want, and who will evaluate the output? If you don’t have answers, don’t start building but start thinking about those questions.
And if you want to know where agents do and don’t work, read part 2 first. The red button problem and the context problem are essential background for any agent you build.
Need help building agents?
We help companies build AI agents that actually work. From strategy to implementation, with the lessons we’ve learned ourselves.