Most AI Agents Should Have Been Automations (And Why That's Good News)
A pattern keeps showing up in conversations with founders and operations leads. Someone walks in asking for an AI agent. They've seen the demos, read the LinkedIn posts, maybe even told their board it's on the roadmap. By the end of the conversation, the honest answer is usually the same: what you actually need is an automation with one language model call in the middle.
That's not a failure of ambition. It's the opposite. It's the version of the project that ships, works on a Tuesday, and is still working three months later.
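To make "an automation with one model call in the middle" concrete, here is a minimal sketch of a refund-routing flow. Everything is deterministic except the one genuinely ambiguous step, which goes to a model. The function names and the $50 threshold are invented for illustration, and the model call is stubbed with a keyword rule so the sketch runs on its own:

```python
def classify_intent(message: str) -> str:
    """Stand-in for the single LLM call. In practice this would hit a model
    API; here it's faked with a keyword rule so the example is runnable."""
    return "refund" if "refund" in message.lower() else "other"

def handle_ticket(message: str, order_total: float) -> str:
    # Step 1: deterministic validation, no model needed
    if not message.strip():
        return "rejected: empty message"
    # Step 2: the one LLM call, for the one ambiguous decision
    intent = classify_intent(message)
    # Step 3: deterministic routing, with a clear rule at each branch
    if intent == "refund" and order_total <= 50:
        return "auto-approved refund"
    if intent == "refund":
        return "escalated to human reviewer"
    return "routed to general queue"
```

Every branch here is auditable and testable; the model only answers one narrow question, and the rules around it decide what happens next.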
The label problem
Walk through the AI tooling market and you'll find that a large share of what gets sold as an "agent" is really a workflow with an LLM call wired into one of the steps. The branding makes sense from a marketing angle. "Agent" implies autonomy, intelligence, a digital colleague. "Automation with a model call" sounds like something your IT team built in 2014.
But the label matters because it sets expectations. When a founder builds or buys an agent, they expect a system that can handle ambiguity, make judgement calls, and adapt to situations nobody mapped out in advance. When they get an automation dressed in agent clothing, they're disappointed even when the thing works perfectly. And when they get a real agent for a job that didn't need one, they get something brittle, unpredictable, and expensive to run.
Where agents quietly fall over
The failure mode is consistent enough to be predictable. Agents struggle in production because they're handed too many decisions at once. A well-designed automation has one decision per step, with a clear rule at each branch. An agent gets a goal and is told to figure it out. That's beautiful in a demo, where the inputs are clean and the stakes are low. It's brutal in a real customer support queue at two in the morning when someone phrases their refund request in a way nobody anticipated.
Three things tend to break:
The first is auditability. When something goes wrong — and something always goes wrong — you need to be able to trace what happened. Automations give you that for free. Agents, especially the ones reasoning across multiple tool calls, leave you reading transcripts trying to figure out why the system decided what it decided.
The second is cost. Agents that loop through a problem, calling models repeatedly until they're satisfied, can burn through tokens at rates that look fine in pilot and horrifying at scale. I've seen monthly bills that would have funded the full-time hire the agent was meant to replace.
The third is trust. A flaky agent doesn't just fail in the moment. It teaches the people around it not to rely on AI for anything. That's a much harder hole to climb out of than anything left behind by a clean automation with a known scope that stays inside it.
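On the cost point, rough arithmetic shows how the gap opens up between pilot and scale. Every number below is invented for illustration — a blended per-token price, call counts, and context sizes — not a measurement from any real deployment:

```python
PRICE_PER_1K_TOKENS = 0.01  # assumed blended model price in USD, illustrative

def monthly_cost(calls_per_task: int, tokens_per_call: int,
                 tasks_per_month: int) -> float:
    """Total monthly spend for one workload, in USD."""
    return (calls_per_task * tokens_per_call * tasks_per_month
            * PRICE_PER_1K_TOKENS / 1000)

# One scoped LLM call per task, modest context
automation = monthly_cost(1, 2_000, 50_000)
# A looping agent: many calls per task, each dragging a growing context
agent = monthly_cost(12, 6_000, 50_000)

print(f"automation: ${automation:,.0f}/mo")  # → automation: $1,000/mo
print(f"agent:      ${agent:,.0f}/mo")       # → agent:      $36,000/mo
```

Same task volume, same model price — the loop alone makes it a 36x difference. At a few hundred pilot tasks both lines look like rounding errors, which is exactly why the bill surprises people later.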
A simple test
Before committing to an agent, it helps to walk through a short set of questions. If you can answer the first one cleanly, you probably don't need an agent at all.
Can you draw the workflow as a sequence of steps? If yes, build an automation. The clarity you have on paper is a gift — don't throw it away by handing the work to a system that has to rediscover the structure every time it runs.
Does the workflow involve genuinely unpredictable inputs and more than a handful of meaningful branches? This is where agents start to earn their keep. If you can't enumerate the paths, an agent's flexibility becomes useful rather than risky.
What's the cost of the worst plausible wrong answer? If it's high — financial, regulatory, reputational — lean toward automation. Constraint is your friend when mistakes are expensive.
Will compliance ever review this? In regulated industries the answer is usually yes, and that pushes the decision firmly toward automations. SOC 2 and HIPAA reviewers want to see deterministic flows they can reason about. An automation passes that conversation. An agent turns it into a months-long exercise in documentation.
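One way to keep that checklist honest is to write the decision order down as logic. This is a sketch of the four questions above, with question one dominating; the argument names and the exact precedence are my framing of the prose, not a formal rubric:

```python
def agent_or_automation(can_draw_as_steps: bool,
                        unpredictable_inputs: bool,
                        worst_case_cost_high: bool,
                        compliance_review: bool) -> str:
    # Question 1 dominates: if you can enumerate the steps, stop here.
    if can_draw_as_steps:
        return "automation"
    # Expensive mistakes or a compliance audit push back toward
    # automation even when the inputs are messy.
    if worst_case_cost_high or compliance_review:
        return "automation (decompose the messy part into scoped steps)"
    # Unenumerable paths with a tolerable failure cost: agent territory.
    if unpredictable_inputs:
        return "agent"
    return "automation"
```

The useful property of writing it this way is that "agent" is only reachable when every earlier question has already said no — which matches how rarely the answer should come up.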
So where do agents actually shine?
The skepticism above is real, but it's not the whole picture. There are jobs where an automation can't reach and an agent is the right tool. Three patterns come up often.
Research and synthesis across messy sources. When the task is "go find me everything relevant about X across these twenty places, read it, and tell me what matters," you're describing work that's genuinely hard to specify in steps. The inputs are unstructured, the paths through the material depend on what's found along the way, and the output is a judgement, not a transaction. This is where agentic systems — ones that can plan, search, read, and revise — actually do something a workflow can't.
Long-horizon software tasks with feedback loops. Coding agents working inside a defined repository, with tests they can run and errors they can read, are a real category. The agent's ability to try something, see what broke, and adjust is exactly the loop that makes the approach work. The constraints (the codebase, the test suite, the type system) give the agent enough structure to be productive without enough freedom to wander off.
Customer-facing conversations with genuine variety. Not every chatbot needs to be an agent. Most don't. But when the inbound questions span a wide enough surface area, and the value of resolving them in one turn is high enough, an agent with access to the right tools can outperform a decision tree. The trick is scoping the tools tightly enough that the agent can't take a destructive action without a human in the loop.
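The tool-scoping trick in that last pattern can be sketched as a small registry that flags destructive tools and refuses to run them without a human sign-off. The class, the tool names, and the approval hook are all hypothetical, standing in for whatever your agent framework provides:

```python
from typing import Callable

class ToolRegistry:
    """Tools the agent may call; destructive ones need human approval."""

    def __init__(self, approve: Callable[[str], bool]):
        self.approve = approve  # human-in-the-loop hook
        self.tools: dict[str, tuple[Callable, bool]] = {}

    def register(self, name: str, fn: Callable, destructive: bool = False):
        self.tools[name] = (fn, destructive)

    def call(self, name: str, *args):
        fn, destructive = self.tools[name]
        # Read-only tools run freely; destructive ones are gated.
        if destructive and not self.approve(name):
            return f"blocked: {name} needs human approval"
        return fn(*args)

# Example wiring: approval denied by default, so refunds never fire alone.
registry = ToolRegistry(approve=lambda name: False)
registry.register("lookup_order", lambda oid: f"order {oid}: shipped")
registry.register("issue_refund", lambda oid: f"refunded {oid}",
                  destructive=True)
```

With this shape, the agent can be as chatty and flexible as it likes on the read path while every irreversible action still lands on a human's desk.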
The thread running through all three: agents work when the environment provides enough feedback for the agent to course-correct, and when the cost of a bad intermediate step is low because something downstream catches it.
The part that gets left out
Here's what most of the agent-versus-automation debate misses. Both sides talk about the technology as if it's the point. It isn't.
The point is the person whose work changes. A clinician who gets four hours back because intake routing happens automatically. A finance lead who isn't manually reconciling discrepancies anymore. A support team that stops drowning in tier-one tickets and starts having actual conversations with customers.
The question isn't "agent or automation." The question is what gets handed to the system, what stays with the human, and how the handoff between them is designed. Get that right and the underlying technology choice becomes a downstream detail. Get it wrong and the most sophisticated agent in the world won't save you, because you've automated the wrong thing or pushed something onto the human side that the system should have handled.
This is where most projects actually succeed or fail. Not in model selection. Not in framework choice. In the careful, often unglamorous work of mapping what people do, where they get stuck, what's worth their attention, and what isn't.
A good rollout starts there and lets the technology follow. The team trusts the system because it does what it said it would do. The system earns more responsibility over time because the easy wins are stacking up. The humans in the loop are doing work that uses their judgement, not work that's just feeding the machine.
That's the rollout that compounds. Not the one that ships an autonomous agent in week one and quietly rolls it back in week six.
The honest take
If you're a founder weighing this, the boring answer is usually the right one. Start with the smallest automation that solves a real problem. Ship it. Measure it. Let the team feel the difference. Then look at what's still hard, and ask whether the next step really needs an agent or whether it's another well-scoped workflow waiting to be built.
Agents have their place. That place is smaller and more specific than the marketing suggests, but it's real and it's growing. The teams that figure out which jobs go where — and keep the humans in the picture at every step — are going to build the things that actually work.
The rest will keep buying agents that should have been automations, and wondering why the demo never quite matches the deployment.