Arcjet for AI Agents: Securing the Attack Surface Inside LLM Apps

A traditional web application firewall sits at the edge of your network. It inspects HTTP requests before they reach your code, flags injection payloads, blocks known-bad IP ranges, and rate-limits abusive clients. That model held up for a decade because the request was the attack surface — the dangerous thing was always something a user sent you directly.

AI agents break that assumption. When you hand an LLM a set of tools — a file reader, an HTTP client, a database query function — the agent decides at runtime what to do with them. The request that started the session can look completely benign. The damage shows up three or four reasoning steps later: the agent reads a file it shouldn’t, fetches a URL an attacker planted in a document, or runs a tool call that was never in your test plan. A firewall at the edge sees none of it.

Arcjet’s response is to move the guard. Arcjet is a developer security SDK — bot detection, rate limiting, email validation, and shielding against common attacks — that runs as code inside your application instead of as separate infrastructure at the perimeter. Its recent shift extends that same in-process model into the agent’s action loop itself.

The gap is structural, not a tuning problem. An edge WAF inspects what crosses the network boundary once, at the start of a request. An agent’s risky behavior is generated internally, after that request is already inside your trust zone. By the time the agent calls a tool, there is no HTTP request for an edge device to inspect — the call happens in memory, between your code and the action the model chose.

It gets worse because the agent runs with your credentials. It holds your database connection, your API keys, your service-account permissions. An attacker who can influence the agent’s instructions doesn’t need to breach anything — they borrow the agent’s authority. Security engineers call this the confused deputy problem, and agentic apps are full of deputies.

The three actions a guard inside the agent watches

Moving the guard inside means checking the agent’s decisions at the moment it acts, not when the session began. Three action types carry most of the risk.

Prompt injection. Agents read untrusted text constantly — web pages, PDFs, support tickets, code comments, the output of earlier tool calls. Any of it can carry instructions aimed at the model: ignore the current task, exfiltrate a secret, call a tool with attacker-chosen arguments. The model has no reliable way to separate data it should process from instructions it should follow when both arrive as plain text in one context window.

File reads. An agent with filesystem access is one bad instruction away from reading .env, SSH keys, or another tenant’s data. The read is a legitimate capability — you gave it the tool on purpose — so nothing flags it unless something inspects the path before the read runs.

Web fetches. An agent that fetches URLs is a server-side request forgery engine waiting for input. Plant a link to a cloud metadata endpoint such as 169.254.169.254, or to an internal admin panel, and the agent will fetch it from inside your network and hand the response back to whoever is steering it.

A guard inside the agent sits between the decision and the action: before the file read runs, check the path; before the fetch leaves, check the destination; before tool output flows back into the model’s context, scan it for injection patterns. These are deterministic checks — they don’t ask the model to police itself.

Where this fits in your stack today

Because Arcjet runs in-process — across Node.js, Next.js, Bun, and Deno — adding an in-agent guard is a code change, not an infrastructure project. There’s no new proxy to route traffic through and no separate service to operate. The guard is a function call at the point where your agent is about to act.

Treat it as one layer of defense in depth, not a replacement for anything else you run:

Keep your edge WAF. It still handles volumetric attacks, crawler traffic, and classic injection against your public routes.
Scope the agent’s credentials down. A guard is a backstop; an agent that physically cannot reach the production database is safer than one merely asked not to.
Allowlist fetch destinations instead of blocklisting bad ones. You already know the handful of domains your agent legitimately needs.
Treat every tool result as untrusted input, the same way you treat a form submission.

Cursor

If you're building the agent itself, an AI-native editor tightens the loop where these guards get wired in. Cursor keeps the agent code, the tool definitions, and the security checks in one context so you can see exactly where each check sits.

Free tier; Pro at $20/mo

Try Cursor

Affiliate link · We earn a commission at no cost to you.

The autonomy that makes agents useful is the same autonomy that makes them dangerous. Every tool you add widens what a successful injection can reach. Guards at the action boundary won’t make an agent un-hackable, but they move the security decision somewhere you control — your code, with a deterministic answer — instead of leaving it to a model that was never designed to be a security boundary.

FAQ

Does an in-agent guard replace my existing WAF?

No. They cover different surfaces. An edge WAF handles volumetric attacks, bot traffic, and injection attempts against your public HTTP routes. An in-agent guard covers the actions your agent takes after a request is already inside your trust zone — file reads, web fetches, and tool calls the edge never sees. Run both.

Can't the system prompt just tell the model to ignore injected instructions?

It helps, but it is not a control you can rely on. Models cannot cleanly separate trusted instructions from untrusted data when both sit in the same context window, and new injection phrasings keep defeating prompt-based defenses. A deterministic check outside the model — one that inspects the actual file path or URL — does not depend on the model making the right call.

What does checking every tool call cost in latency?

Very little. The checks are local function calls, not network round-trips. The dominant cost in an agent step is LLM inference, which usually runs from a few hundred milliseconds to several seconds. A path or destination check is microseconds against that, so users will not notice the guard.

Arcjet for AI Agents: Securing the Attack Surface Inside LLM Apps

Why a network WAF goes blind on agentic apps

The three actions a guard inside the agent watches

Where this fits in your stack today

Cursor

FAQ

Aider vs Continue.dev: Terminal-First vs Editor-First AI Coding in 2026

AI Code Review Tools Compared: CodeRabbit, Greptile, and Diamond in 2026

Using Claude Code Subagents for Parallel Refactoring: A Hands-On Workflow

Cline vs Roo Code: Comparing Open-Source Agentic Coding Extensions in 2026

How to Build a Skills Library for Your AI Engineering Team

Get the best tools, weekly