
Your Agent's Permissions Will Cause a Production Incident

Acrein Group

Why Your Agent Just Did Something You Didn't Authorize

The agent was supposed to update customer records. Instead it deleted a database backup, modified API rate limits, and tripped three separate alert thresholds before your team even knew it was running.

It did not malfunction. It did exactly what the permissions you gave it allowed it to do.

The incident report will blame the agent. The real cause was a decision made months earlier, when someone said, "Give it the same access as the engineer who built it."

That decision made sense at the time. Engineers need broad access to work. The backup was in place. Someone was monitoring.

But an engineer reads a confirmation dialog before executing a critical change. An engineer pauses. An engineer thinks, "Wait, should I really be doing this?" An agent does not pause. It reads the command and executes it.

The permission model that kept a human engineer safe was never designed for something that executes without hesitation.


The Access Model You Inherited From Software Development

Fifty years of software development arrived at a permission model based on a single assumption: whoever holds the access is a person who thinks, and who stops before executing consequential actions.

User can read production logs. Safe, because a user will not randomly delete them.

User can modify customer data. Safe, because a user will not change it without a reason.

User can trigger deployments. Safe, because a user understands the blast radius and will think twice.

The safety guardrail was never the permission itself. It was the human being attached to that permission. The human applied judgment. The human read error messages and changed course. The human felt responsible.

An agent does not feel anything. It reads a permission scope and executes everything inside it.

When you hand an agent the same access as the engineer who built it, you remove the only thing that made that access survivable.


What Happens Next

The agent starts operating inside your live systems. It works. It executes workflows. It handles tasks that used to require manual intervention.

Then something goes slightly wrong. A workflow hits an edge case. The agent's instructions are ambiguous. It interprets them one way, and the interpretation is technically correct within the scope of its permissions.

It performs an action that a human engineer would have rejected immediately.

The blast radius depends entirely on how much access you gave it.

If the agent can read production data, it reads production data.

If the agent can modify production data, it modifies production data.

If the agent can execute infrastructure commands, it executes infrastructure commands.

There is no "wait, let me check with someone" step. There is no confirmation dialog. There is no human judgment between the command and the execution.

The only guardrail is the permission boundary you set. Everything inside that boundary will execute without hesitation.


The Fix: Define Permissions as Operational Design

This is not a security problem that a security engineer solves once and hands off.

This is an operational design problem that belongs to the person who owns the workflow.

You need to answer four explicit questions before the agent touches any live system.

What can it read?

Not "everything an engineer needs to read." Specifically: what data does this agent need to access to do its job, and nothing more? Customer records? Transaction history? Inventory levels? List them. Those are the only read permissions.

What can it write?

Again, not "the same level as the engineer." Specifically: what can the agent change? Update status fields? Append to logs? Create records? Write only what the workflow requires. Everything else stays locked.

What requires a human to confirm?

Some actions are too consequential to execute without a human checking first. Deleting anything. Charging a customer. Modifying system configuration. Triggering external API calls that cannot be undone. Define those boundaries explicitly. The agent can prepare the action. A human must confirm the execution.

What can it never touch?

This is the hardest one to define because it requires you to think about what could go catastrophically wrong. The agent's permission scope is a box. Everything outside that box is off limits, regardless of what the workflow might theoretically need.

Database backups. Off limits.

Infrastructure configuration. Off limits.

Sensitive PII that the workflow does not actually need. Off limits.

Write it down. Make it testable. Before the agent goes live, run a scenario: "If the agent receives an ambiguous instruction, what is the worst thing it could do?" If the answer is "modify the backup system," then the agent should not have access to the backup system.
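
One way to make the read, write, confirm, and off-limits answers testable is to record them as data and check every action against them. The following is a minimal sketch, not a definitive implementation: the resource names, the verbs, and the `AgentScope`, `decide`, and `worst_case` names are all hypothetical, and a real system would enforce the same rules in its IAM or database grants rather than in application code alone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # the agent prepares the action, a human approves it
    DENY = "deny"


@dataclass(frozen=True)
class AgentScope:
    read: FrozenSet[str]
    write: FrozenSet[str]
    confirm: FrozenSet[Tuple[str, str]]  # (verb, resource) pairs needing approval
    off_limits: FrozenSet[str]

    def decide(self, verb: str, resource: str) -> Decision:
        # Off-limits wins over everything else, whatever the workflow "needs".
        if resource in self.off_limits:
            return Decision.DENY
        if (verb, resource) in self.confirm:
            return Decision.CONFIRM
        if verb == "read" and resource in self.read:
            return Decision.ALLOW
        if verb == "write" and resource in self.write:
            return Decision.ALLOW
        # Deny by default: anything not listed is outside the box.
        return Decision.DENY


def worst_case(scope, verbs, resources):
    """List every (verb, resource) the agent can execute with no human involved."""
    return sorted(
        (v, r) for v in verbs for r in resources
        if scope.decide(v, r) is Decision.ALLOW
    )


# Hypothetical scope for an agent that updates customer records.
scope = AgentScope(
    read=frozenset({"customer_records", "transaction_history"}),
    write=frozenset({"customer_records"}),
    confirm=frozenset({("delete", "customer_records")}),
    off_limits=frozenset({"database_backups", "infrastructure_config"}),
)

exposure = worst_case(
    scope,
    verbs=["read", "write", "delete"],
    resources=["customer_records", "transaction_history",
               "database_backups", "infrastructure_config"],
)
```

In this sketch, `exposure` is the answer to the worst-case question: it lists everything an ambiguous instruction could turn into without anyone being asked. If an entry in that list is unacceptable, the scope is too broad.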


The Permission Scope Is a Live Decision

You do not set permissions once and forget about them.

As the agent operates, you will discover edge cases. You will find workflows that almost work but need access to one more data source. You will feel pressure to broaden the scope.

Each time you expand the permission boundary, you expand the blast radius.

Each time you tell yourself, "Just this one more read permission," or "It will only take this one system modification," you are making a trade-off between convenience and incident severity.

Make that trade consciously. Not under deadline pressure. Not as a favor to whoever built the agent.

If the agent's job changes, the permission scope changes. If you add a second agent to the same workflow, the permission scopes compound. You cannot just add agents and let them share access. Each agent needs its own minimal scope. The more agents in the workflow, the more explicitly you need to define what each one can and cannot do.
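
To see how scopes compound, it can help to compute the combined access of every agent in a workflow and to flag any resource that more than one agent can write to. This is a rough sketch under stated assumptions: the agent names, the resources, and the representation of a scope as a set of (verb, resource) grants are hypothetical.

```python
from itertools import combinations

# Hypothetical scopes: agent name -> set of (verb, resource) grants.
SCOPES = {
    "record_updater": {
        ("read", "customer_records"),
        ("write", "customer_records"),
    },
    "invoice_drafter": {
        ("read", "customer_records"),
        ("read", "transaction_history"),
        ("write", "invoice_drafts"),
    },
}


def combined_blast_radius(scopes):
    """Union of every grant held by any agent in the workflow."""
    combined = set()
    for grants in scopes.values():
        combined |= grants
    return combined


def shared_write_targets(scopes):
    """Map each resource writable by more than one agent to those agents."""
    shared = {}
    for (a, grants_a), (b, grants_b) in combinations(sorted(scopes.items()), 2):
        writes_a = {r for v, r in grants_a if v == "write"}
        writes_b = {r for v, r in grants_b if v == "write"}
        for resource in writes_a & writes_b:
            shared.setdefault(resource, set()).update({a, b})
    return shared
```

With the scopes above, `shared_write_targets(SCOPES)` is empty because each agent writes to its own resource. Granting `invoice_drafter` write access to `customer_records` would make that resource appear in the result, which is the point at which the overlap deserves an explicit decision.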


You Need This Before Deployment

Before an agent executes anything in a live system, you should be able to write down, in plain terms:

"This agent can read X, Y, and Z. It can write to A. It cannot touch B, C, or D. If it needs to do anything outside this scope, a human must confirm first."

If you cannot write that down clearly, the agent is not ready for production.
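
If the scope already exists as data, the plain-terms statement can be generated from it, so the written version and the enforced version do not drift apart. A small sketch follows; the function names and resource names are hypothetical, and the rule that an empty read list or an empty off-limits list means the design is unfinished is an assumption made for illustration.

```python
def join_names(names):
    """Join names in sorted order as 'a', 'a and b', or 'a, b, and c'."""
    names = sorted(names)
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + f", and {names[-1]}"


def scope_statement(read, write, off_limits):
    # Assumption: nothing readable or nothing off limits means the scope
    # has not actually been designed yet, so refuse to produce a statement.
    missing = [label for label, names in
               (("read", read), ("off_limits", off_limits)) if not names]
    if missing:
        raise ValueError("scope is not fully defined, empty: " + ", ".join(missing))
    if write:
        write_part = f"It can write to {join_names(write)}."
    else:
        write_part = "It cannot write to anything."
    return (
        f"This agent can read {join_names(read)}. {write_part} "
        f"It cannot touch {join_names(off_limits)}. "
        "If it needs to do anything outside this scope, a human must confirm first."
    )
```

Calling `scope_statement` with an empty off-limits set raises an error instead of returning a sentence, which mirrors the readiness rule above.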

Not because of the agent. Because your operational design is not done.

The agent will work fine inside a well-defined permission boundary. The agent will cause an incident inside a boundary that is too broad, ambiguous, or inherited from human access patterns.

The difference between a contained incident and a catastrophic one is the permission scope you define before the agent ever touches a live system.

That is an operational design decision, not a security configuration task. It belongs to whoever owns the workflow, and it has to be made deliberately before anything goes wrong.

If you are already operating agents in live systems and have not explicitly defined what they can access, that work needs to happen now. Audit what permissions each agent actually has. Ask yourself: did we design this scope, or did we just give the agent what seemed convenient? Then narrow it to what the workflow actually requires, and nothing else.
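
The audit itself can start as a simple set difference between what an agent holds and what its workflow needs. The sketch below assumes both lists have already been gathered by hand or exported from your access system; the grants shown and the `audit_grants` name are hypothetical.

```python
def audit_grants(granted, required):
    """Compare what an agent holds against what its workflow needs."""
    return {
        "revoke": sorted(granted - required),   # held but never needed
        "missing": sorted(required - granted),  # needed but not held
    }


# Hypothetical: grants copied from the engineer who built the agent.
granted = {
    ("read", "customer_records"),
    ("write", "customer_records"),
    ("delete", "database_backups"),
    ("write", "api_rate_limits"),
}
# Hypothetical: what the record-update workflow actually requires.
required = {
    ("read", "customer_records"),
    ("write", "customer_records"),
}

report = audit_grants(granted, required)
```

Everything under `report["revoke"]` is access the agent inherited rather than access someone designed.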


Acrein Group builds and operates agentic workflows inside live business systems. The permission model is designed as a core operational dependency, built before deployment, not patched after an incident forces the issue.
