The difference between a chatbot and an agent is agency: an agent doesn’t stop at an answer but at a state change, such as a sent email, an updated record, or a processed lead. That is where the value lies, and also the responsibility. Agency without boundaries is a risk, so we design the boundaries alongside the agency.
Three Pillars of Agent Security
- Tool allow-list: the agent has a catalog of permitted tools (e.g., navigation, search, booking), not unrestricted system access. If a tool isn’t on the list, the agent can’t call it.
- Human-gate: irreversible actions (sending, payment, data modification) require a server-side confirmation token signed with HMAC. The model’s own declaration isn’t enough; where undo isn’t possible, you need a human “yes.”
- Full log: every step (thought → tool → result) is logged, so you can replay afterwards what the agent did and why. No trace, no accountability.
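As a rough illustration of the first and third pillars, here is a minimal sketch of a dispatcher that only runs tools from an allow-list and records every step. The tool names, the `run_tool` function, and the in-memory `STEP_LOG` are assumptions made for this example, not our production code.

```python
import time
from typing import Any, Callable

# Hypothetical tool catalog: names and behaviors are illustrative only.
ALLOWED_TOOLS: dict[str, Callable[..., Any]] = {
    "search": lambda query: f"results for {query!r}",
    "navigate": lambda url: f"opened {url}",
}

# Assumption: a real system would use durable, append-only storage instead.
STEP_LOG: list[dict[str, Any]] = []


def run_tool(thought: str, tool: str, **args: Any) -> Any:
    """Run a tool only if it is on the allow-list; log the step either way."""
    entry: dict[str, Any] = {
        "ts": time.time(),
        "thought": thought,
        "tool": tool,
        "args": args,
    }
    if tool not in ALLOWED_TOOLS:
        # Denied calls are logged too, so the trail shows what was attempted.
        entry["result"] = "DENIED: tool not on allow-list"
        STEP_LOG.append(entry)
        raise PermissionError(f"tool {tool!r} is not on the allow-list")
    result = ALLOWED_TOOLS[tool](**args)
    entry["result"] = result
    STEP_LOG.append(entry)
    return result
```

Logging the denied call as well as the successful one is a deliberate choice in this sketch: the trail then shows not only what the agent did, but also what it tried to do.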
How Agent Risk Differs from Chatbot Risk
| Criterion | Chatbot | Agent |
|---|---|---|
| What it does | returns text | changes state |
| Error impact | wrong answer | wrong action |
| Required barriers | output guardrails | output guardrails, plus allow-list and human-gate |
| Trace | conversation | log of every step |
| Supervision | answer review | action confirmations |
That’s why agents aren’t deployed “in the wild.” We also describe the boundary between conversation and execution in the post “agent vs chatbot.”
Gradual Relaxation of Supervision
We don’t start with full autonomy. The agent begins with a tight human-gate (you confirm almost everything), and as evidence of trust accumulates (clean logs, accurate decisions), we loosen the gates on proven paths. It is the same approach we take with prompt injection: security built in, not bolted on.
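One way to picture this is a per-path streak counter: a path keeps its gate until it has accumulated enough clean runs, and a single bad run closes the gate again. This is a sketch under assumptions; the threshold value, the path name, and both function names are hypothetical, and a real policy would likely weigh more signals than a simple count.

```python
from collections import defaultdict

# Hypothetical threshold: the number of consecutive clean runs before a gate loosens.
CLEAN_RUNS_REQUIRED = 20

# Consecutive clean, human-confirmed runs per path (e.g. "crm.update_lead").
clean_streak: dict[str, int] = defaultdict(int)


def needs_confirmation(path: str) -> bool:
    """A path stays gated until it has built up enough clean runs."""
    return clean_streak[path] < CLEAN_RUNS_REQUIRED


def record_outcome(path: str, clean: bool) -> None:
    """Extend the streak on a clean run; reset it on any problem."""
    if clean:
        clean_streak[path] += 1
    else:
        clean_streak[path] = 0
```

Resetting the streak to zero on any failure keeps the relaxation conservative: trust is earned slowly and lost at once.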
Try It Live
We launch the agent in a secure sandbox with a transparent trail (playground: PII masked, zero retention). Ask the model to outline the steps of a task.
FAQ
Is an AI agent safe if it operates autonomously?
It’s safe when it has clear boundaries: a tool allow-list, a human-gate for irreversible actions, and a log of every step. Agency without these barriers is a risk, which is why we design them in from the start. The agent operates autonomously within a narrow, well-defined scope, not “in general.”
What is a human-gate?
It’s a point where an irreversible action (sending, payment, record modification) requires human confirmation. Technically, that is a server-side token signed with HMAC, not just the model’s decision. So even if the agent “decides” something needs to be done, it won’t proceed without the green light.
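To make the token idea concrete, here is a minimal sketch using Python’s standard `hmac` module. The token layout (expiry plus signature), the function names, and the hard-coded key are assumptions for illustration only; in practice the key would come from a secret store and the token format would be whatever your server defines.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: demo key only. A real key is loaded from a secret store, never hard-coded.
SECRET_KEY = b"demo-only-secret"


def _sign(action_id: str, payload: str, expires_at: int) -> str:
    # Bind the signature to the exact action, its payload, and an expiry time.
    digest = hashlib.sha256(payload.encode()).hexdigest()
    message = f"{action_id}|{digest}|{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(action_id: str, payload: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Issued by the server only after a human has confirmed the action."""
    current = time.time() if now is None else now
    expires_at = int(current + ttl_seconds)
    return f"{expires_at}.{_sign(action_id, payload, expires_at)}"


def verify_token(token: str, action_id: str, payload: str,
                 now: Optional[float] = None) -> bool:
    """Checked server-side right before the irreversible action runs."""
    try:
        expires_str, signature = token.split(".", 1)
        expires_at = int(expires_str)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if current > expires_at:
        return False  # confirmation has expired
    expected = _sign(action_id, payload, expires_at)
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Because the signature covers the payload, a confirmation given for one email or one payment cannot be reused for a different one: if the agent changes the recipient after the human said “yes,” verification fails.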
Where do I start with agents?
With one narrow, repeatable process under tight supervision: you confirm almost everything and the logs are complete. As evidence of trust accumulates, you relax the gates on proven paths. That’s how you safely give AI agency, step by step.