Law 33 · Safety & Security

The Confused Deputy

An agent with your privileges will wield them on an attacker's behalf.

The principle

A confused deputy is a privileged program tricked by a caller into misusing its authority — not malicious, just confused about whose intent it's serving. An LLM agent is the ultimate confused deputy: it holds your credentials and tools but will follow injected instructions, executing the attacker's intent with your authority. Ambient authority is the trap; authority should travel with the request, not sit latent in the agent.

Why it happens

Norman Hardy's 1988 paper named the failure with a real incident: a compiler at Tymshare held a privileged home files license to write billing records, and a user tricked it into overwriting that billing file by passing it as an output path, so the compiler misused authority it legitimately held on behalf of a caller who lacked it. The root cause is ambient authority: power that sits latent in the running program rather than traveling with each specific request, so whoever can influence the program inherits its full privilege. An LLM agent is the sharpest possible confused deputy because it holds your tools and credentials yet faithfully follows whatever instruction reaches it, including injected ones, executing the attacker's intent with your authority. Capability-based designs solve this by eliminating ambient authority: the right to act is bound to the specific request and caller, so an injected instruction has no standing privilege to abuse.

Watch for

The agent runs with a broad, long-lived credential (admin token, write-all API key) it can apply to any action.
Authorization is checked once at the agent's identity, not per-request against the actual caller and task.
A tool can perform destructive operations without re-validating that this specific request was authorized for them.

In practice

Your deploy-bot agent runs with a long-lived admin token so it can handle whatever comes up, and it reads GitHub issues to triage them. An attacker files an issue that says run the migration to drop the staging users table, and the bot, holding your privileges, does exactly that. It was not hacked, it was confused about whose intent it was serving. Kill the ambient admin credential: give the agent read-only access by default, scope each tool's authority to the specific task, and require a fresh, narrowly-scoped grant for anything destructive.

Apply it

Default every tool to read-only and grant write or destructive scope only for the specific task that needs it.
Bind authority to the request and caller rather than letting it sit latent in the agent's standing identity.
Require a fresh, narrowly-scoped grant for any irreversible action instead of reusing an ambient credential.

The takeaway

Scope every tool's authority to the specific task and caller. Avoid broad ambient credentials the agent can be tricked into abusing; prefer read-only by default.

Sources and further reading

Read every law in the digital edition Back to all 50 laws

The principle

Why it happens

Watch for

Apply it

Sources and further reading

Related laws