Anthropic Splits Agent SDK Billing: What Devs Need to Know About New Credit Pools

Anthropic is moving programmatic Agent SDK traffic to a new monthly credit pool, separate from standard Claude API billing. Here's what to audit in your integration before the split affects forecasting and rate limits.

Anthropic is changing how it bills Agent SDK traffic, and if you’ve been treating those costs as just another line on your standard Claude API invoice, that’s about to stop being accurate. The company is separating programmatic Agent SDK usage — the traffic generated by third-party apps built on the SDK — from standard Claude billing, routing it instead into a new monthly credit pool. We pulled apart what’s been announced to figure out which integrations need attention before this lands.

What actually changed

Until now, Agent SDK usage and direct Claude API calls drew from the same meter. You burned tokens, you paid per token, you watched a single number tick up on your dashboard. The new structure splits that meter in two.

Direct API calls — the ones you make from your own backend, with your own auth, hitting Claude endpoints directly — continue to behave the way they always have. Pay-as-you-go token pricing, the rate limits tied to your usage tier, the standard invoice line.

Agent SDK traffic, on the other hand, routes into a separate monthly credit pool. This is the usage generated when third-party apps wrap the SDK and your account is on the hook for the resulting calls. Credits are allocated as a monthly amount rather than billed per token as you go, and that single change cascades into a different cost model, different rate limit semantics, and different forecasting math.

If your product imports the Agent SDK in production, this is the integration to audit first. Direct API consumers don’t need to do anything yet, but anyone shipping on top of the SDK should revisit their cost, rate limit, and forecasting assumptions.

Why credit pools change your forecasting math

Pay-as-you-go billing is mostly linear: 2x the requests means 2x the bill. Credit pools are not linear. They’re bounded by a monthly allocation, which changes both your ceiling and your burst behavior.

A few things shift in practice:

Burst days eat the pool faster. A single viral day or a noisy production bug can chew through credits that you assumed would last the month. Smoothing across the calendar is no longer something the billing system does for you — it’s something you need to plan for in product behavior.

Forecasting becomes a question about pool fit, not just token spend. The right question stops being “how many tokens will we use next quarter” and becomes “which pool tier do we need, and how much headroom do we want for spikes.” Tier upgrades become a discrete decision point rather than a continuous slope.

Exhaustion behavior matters now. When you run out of pool credits, what does your app do? Return a 429 to your users? Fall back to a different model? Queue requests until next month? Under pay-as-you-go that question was theoretical (you’d just pay more); under pool billing, it’s a real one.

Internal attribution gets messier. If multiple features or teams share one Agent SDK pool, you now need to track usage at the feature level to know who’s consuming the budget. Observability that just sums tokens won’t tell you which integration is the cost driver.
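
To make the burst effect concrete, here is a minimal sketch that walks a month of daily usage against a fixed allocation. The numbers are invented for illustration (a 1,000-credit pool and a 30-credit daily baseline); they are not Anthropic's actual tiers or units, and `day_pool_runs_out` is a hypothetical helper.

```python
from typing import Optional


def day_pool_runs_out(daily_usage: list[float], monthly_credits: float) -> Optional[int]:
    """Return the 1-based day on which cumulative usage first exceeds the pool.

    Returns None if the pool lasts through every day in daily_usage.
    """
    used = 0.0
    for day, credits in enumerate(daily_usage, start=1):
        used += credits
        if used > monthly_credits:
            return day
    return None


# Illustrative numbers only, not real tier sizes.
POOL = 1000.0
steady = [30.0] * 30   # 900 credits over the month, fits inside the pool
bursty = list(steady)
bursty[9] = 400.0      # one spike on day 10, baseline on every other day

print(day_pool_runs_out(steady, POOL))  # None: the pool lasts the month
print(day_pool_runs_out(bursty, POOL))  # 22: the pool is dry before month-end
```

Under pay-as-you-go, the same spike would show up as a larger invoice. Against a fixed pool, it shows up as an outage window at the end of the month unless you throttle or upgrade.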

What to audit in your integration

Five things worth a pass before this lands:

1. Tag your Agent SDK call sites. Every code path that touches the SDK should be tagged in your observability so you can attribute spend to the right pool. If you’re running both direct API calls and SDK calls from the same service, you want to know which one is which without grepping logs.

2. Re-read your rate limit handling. Pool credits and per-minute rate limits aren’t the same thing, but they interact. Your retry logic should distinguish “rate limited, back off and retry” from “out of credits, fail fast.” Lumping them together leads to retry storms when the pool is dry.

3. Forecast against the pool, not against tokens. Build a small dashboard that shows credits consumed against credits allocated for the month, with a projection line. If your current burn rate trends past 100% before month-end, you want to know on day 10, not day 28.

4. Revisit your unit economics. If you charge customers per request and the cost basis just changed shape, your margin assumptions may not hold. Per-pool pricing rewards predictable workloads and penalizes bursty ones — your pricing should reflect that if it doesn’t already.

5. Update your runbook. What happens at 80% pool usage? At 95%? At exhaustion? The answer should not be “page someone and figure it out.” Pre-decide the throttling, fallback, or upgrade path so the on-call engineer doesn’t have to.
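
The distinction in item 2 can be sketched as a retry wrapper that backs off on rate limits but fails fast when the pool is empty. The exception names `RateLimited` and `PoolExhausted` are hypothetical; how the real service signals each condition is an assumption here, so you would map the actual error responses onto these two cases after checking the official docs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: a transient per-minute limit; retrying later can succeed."""


class PoolExhausted(Exception):
    """Hypothetical: monthly credits are gone; retrying will not help."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on RateLimited with exponential backoff; never retry PoolExhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except PoolExhausted:
            # Fail fast: retrying an empty pool only creates a retry storm.
            raise
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the rate limit to the caller
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injected only so the backoff can be tested without waiting; in production the default `time.sleep` applies.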

The pattern across all five: treat the credit pool as a finite resource with an SLA, not as a flexible meter that absorbs spikes for free.
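
Items 3 and 5 can be combined into a small sketch: a linear month-end projection plus pre-decided thresholds. The 80% and 95% cut-offs come from the runbook questions above; the action labels, function names, and the linear-burn assumption are illustrative choices, not anything Anthropic specifies.

```python
import calendar
from datetime import date


def projected_usage_fraction(used: float, allocated: float, today: date) -> float:
    """Project month-end usage as a fraction of the pool.

    Assumes the average daily burn so far continues unchanged.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_burn = used / today.day
    return daily_burn * days_in_month / allocated


def runbook_action(used: float, allocated: float) -> str:
    """Map current pool usage to a pre-decided action label (labels are hypothetical)."""
    fraction = used / allocated
    if fraction >= 1.0:
        return "exhausted"   # e.g. serve the degraded path, start the upgrade
    if fraction >= 0.95:
        return "fallback"    # e.g. route non-critical features elsewhere
    if fraction >= 0.80:
        return "throttle"    # e.g. tighten per-user limits
    return "ok"


# 400 of 1,000 credits used by June 10 projects to roughly 120% by month-end.
print(projected_usage_fraction(400.0, 1000.0, date(2025, 6, 10)))
print(runbook_action(850.0, 1000.0))  # throttle
```

A linear projection is the simplest possible model. If your traffic has strong weekly cycles, a projection that ignores them will over- or under-shoot, so treat it as an early-warning signal rather than a forecast.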

What to check before this affects you

Three concrete steps for this week, regardless of how much SDK traffic you currently run:

  • Pull your last 90 days of usage and split it by call type. If you can’t tell which calls would land in the new pool versus the standard API, that’s the first observability gap to close.
  • Find the Anthropic billing documentation for the new pool structure and read it end to end before your finance team asks. The general shape of the change is public, but the specifics (exact tiers, credit-to-token conversion, overage behavior) live in the official docs.
  • Estimate which pool tier your current usage would land in, and price the next tier up as a planning input. You want to know the cost of headroom before you need it.
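
The first step, splitting historical usage by call type, might look like the sketch below. It assumes your own logs already carry a `source` tag (`agent_sdk` or `direct_api`), a `feature` tag, and a `tokens` count per call; those field names and the sample records are hypothetical, not a format Anthropic provides.

```python
from collections import defaultdict
from typing import Iterable


def split_usage(records: Iterable[dict]) -> dict[str, dict[str, int]]:
    """Sum tokens per call source and feature; untagged calls get their own bucket."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for record in records:
        source = record.get("source", "untagged")
        feature = record.get("feature", "unknown")
        totals[source][feature] += record["tokens"]
    return {source: dict(features) for source, features in totals.items()}


# Made-up sample records for illustration.
records = [
    {"source": "agent_sdk", "feature": "triage", "tokens": 1200},
    {"source": "direct_api", "feature": "search", "tokens": 300},
    {"source": "agent_sdk", "feature": "triage", "tokens": 800},
    {"feature": "summaries", "tokens": 500},  # missing source tag
]

print(split_usage(records))
# {'agent_sdk': {'triage': 2000}, 'direct_api': {'search': 300}, 'untagged': {'summaries': 500}}
```

Anything that lands in the `untagged` bucket is the observability gap: you cannot yet say which pool those calls would draw from.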

Most teams will not be dramatically affected by this in the first month. The teams that get burned are the ones who assume billing structure changes don’t apply to them and discover the difference at quarter-end.

FAQ

Does this affect direct Claude API usage?

No. Direct API calls from your own backend continue to use standard pay-as-you-go token billing and your existing rate limit tier. The new credit pool only covers programmatic Agent SDK traffic from third-party apps.

What happens when the monthly credit pool runs out?

The exact exhaustion behavior depends on your tier configuration, as described in the official docs, but plan for it as a real failure mode. Your app should handle a “no credits available” response gracefully rather than treating it as a transient rate limit.

Should I migrate Agent SDK calls back to direct API to avoid the pool?

Only if your usage is small and predictable enough that pay-as-you-go is cheaper than the smallest pool tier. For most teams shipping production apps on the SDK, the pool is the intended path; the work is forecasting against it, not avoiding it.
