Cost troubleshooting

Fix cache token cost spikes


Direct answer

A short visible prompt can still be expensive when the client sends a large conversation, file context, or cached context behind it. Cache read and write tokens belong to the full request context, not only the words typed into the input box.

Use this page for public troubleshooting only.

Private order, key, and balance details belong in the customer portal or support. Public docs can explain the diagnostic path, not reveal account-specific state.

Error phrases this guide covers

Search tools, logs, and support tickets do not always use the same wording. Treat these phrases as the same troubleshooting family before changing unrelated settings.

  • cache token spike
  • cache write cost
  • cache read cost
  • short "continue" expensive
  • unexpected balance drop

Fast check before changing everything

Run the smallest check that isolates the failing layer. If the small request works, the problem is usually the client configuration, hidden context, permissions, or advanced feature path rather than the whole account.

Cost isolation checklist
visible_prompt_cost != total_request_cost
check: input_tokens + output_tokens + cache_read_tokens + cache_write_tokens
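The checklist above is simple arithmetic: total cost sums all four fields at their own rates. A minimal sketch in Python, assuming illustrative per-million-token rates (the numbers below are placeholders, not CorvusLLM's actual prices) and generic usage field names:

```python
# Sketch: estimate total request cost from all four usage fields.
# The per-million-token rates are PLACEHOLDERS, not real prices.
RATES_PER_MTOK = {
    "input_tokens": 3.00,        # assumed base input rate
    "output_tokens": 15.00,      # assumed output rate
    "cache_read_tokens": 0.30,   # cache reads are usually cheap...
    "cache_write_tokens": 3.75,  # ...but cache writes can exceed base input
}

def estimate_cost(usage: dict) -> float:
    """Sum every usage field, not just the visible prompt."""
    return sum(usage.get(f, 0) * r / 1_000_000 for f, r in RATES_PER_MTOK.items())

# A "short" prompt that rides on a large cached project context:
usage = {"input_tokens": 40, "output_tokens": 200,
         "cache_read_tokens": 180_000, "cache_write_tokens": 0}
print(round(estimate_cost(usage), 4))
```

The point of the sketch: a 40-token visible prompt with 180k cache-read tokens behind it is billed almost entirely by the cache field, not the typed text.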

Common causes

  • The user typed a short follow-up such as "continue", but the client resent a large prior conversation or project context.
  • The app enabled cache creation or cache reads for a long system prompt, tool schema, or file set.
  • An agent loop retried a large context after a partial failure.
  • The customer compared cost against visible chat text instead of full input, output, cache read, and cache write fields.
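The first cause, a short "continue" riding on resent history, can be made visible with a rough size comparison. A sketch assuming a 4-characters-per-token heuristic (real tokenizers differ) and a generic messages list:

```python
# Sketch: compare the visible prompt with the full payload the client sends.
# Uses a rough 4-characters-per-token heuristic; real tokenizers differ.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def payload_tokens(messages: list[dict]) -> int:
    """Token estimate for everything resent: system, history, and the new turn."""
    return sum(approx_tokens(m["content"]) for m in messages)

history = [{"role": "system", "content": "x" * 8000},    # long system prompt
           {"role": "user", "content": "y" * 40000},     # pasted file context
           {"role": "assistant", "content": "z" * 6000}]
new_turn = {"role": "user", "content": "continue"}

visible = approx_tokens(new_turn["content"])        # what the user sees
total = payload_tokens(history + [new_turn])        # what the API bills
print(visible, total)
```

Here the visible prompt is two approximate tokens while the billed payload is thousands of times larger, which is exactly the gap this page troubleshoots.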

Fix steps

  1. Check whether the client resends full chat history, files, or hidden tool schemas on every message.
  2. Use the cost calculator to estimate input, output, cache read, and cache write separately.
  3. Trim old context, summarize long threads, or start a new small session before retrying.
  4. For production apps, log token usage fields so support can separate visible prompt size from real request size.
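Step 4 above might look like the following sketch. The usage field names are assumptions; check your client or SDK response for the actual names:

```python
# Sketch: log every token field from a response so support can separate
# visible prompt size from real request size. The `usage` shape below is
# an assumption; check your client or SDK for the actual field names.
import json
import logging

logging.basicConfig(level=logging.INFO)

def log_usage(request_id: str, usage: dict) -> dict:
    record = {
        "request_id": request_id,
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cache_read_tokens": usage.get("cache_read_tokens", 0),
        "cache_write_tokens": usage.get("cache_write_tokens", 0),
    }
    logging.info("usage %s", json.dumps(record))
    return record

rec = log_usage("req-123", {"input_tokens": 40, "cache_read_tokens": 180_000})
```

Missing fields default to zero so the log line always has the same shape, which makes spikes easy to grep for later.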

Verify before retrying production traffic

  • Compare the request with and without project files or long chat history.
  • Check whether the balance drop correlates with cache write or cache read fields rather than output length.
  • Retry a new empty-session prompt to confirm the base model route is not inherently expensive for short input.

Do not use expensive retry loops as a diagnostic tool.

Use one small request first. Large retries can spend balance, hide the original cause, and create confusing logs.
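Instead of a retry loop, one A/B pair is enough: send the same short prompt once in a fresh empty session and once with full context, then compare which usage field grew. A sketch over two hypothetical usage results:

```python
# Sketch: one small A/B check instead of a retry loop. The two dicts below
# stand in for the usage fields returned by a bare-session request and a
# full-context request with the same short prompt.
def diagnose(bare_usage: dict, full_usage: dict) -> str:
    fields = ("input_tokens", "cache_read_tokens", "cache_write_tokens")
    grew = {f: full_usage.get(f, 0) - bare_usage.get(f, 0) for f in fields}
    culprit = max(grew, key=grew.get)
    return culprit if grew[culprit] > 0 else "no_context_growth"

bare = {"input_tokens": 45, "cache_read_tokens": 0, "cache_write_tokens": 0}
full = {"input_tokens": 60, "cache_read_tokens": 0, "cache_write_tokens": 190_000}
print(diagnose(bare, full))
```

If the result names a cache field, the spike comes from the resent or cached context, not from the model route itself.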

Diagnostic decision tree

Work through these checks in order. The goal is to isolate the failing layer before editing unrelated settings or sending another expensive request.

Check: Minimal request
Action: Run the smallest check from this page with the same key, endpoint shape, and one public model slug.
Pass result: The account and basic route probably work; move to client settings, hidden context, tools, or retries.
Fail result: Fix auth, base URL, balance, model slug, or current route health before testing advanced features.

Check: Client final URL
Action: Inspect the actual URL or provider profile the client sends, not only the visible settings field.
Pass result: Continue with request body, model slug, payload size, and feature compatibility checks.
Fail result: Correct host/base/full-endpoint confusion before changing keys or model families.

Check: Balance movement
Action: Compare dashboard balance before and after one tiny diagnostic request.
Pass result: If charged and no answer arrives, collect the support packet before retrying large prompts.
Fail result: If not charged, focus first on request rejection, wrong endpoint, auth, or client-side failure.

Check: Feature isolation
Action: Disable streaming, tools, images, file context, long history, and automation loops for one retry.
Pass result: Re-enable one feature at a time until the failing layer is identified.
Fail result: Keep the request small and do not use production retries as the diagnostic method.

Check: Route health
Action: Check Service Status and try a tiny prompt on one nearby public model row if your workflow allows it.
Pass result: Use a documented fallback only if quality and cost are acceptable.
Fail result: Wait, switch safely, or contact support with timestamps instead of hammering the failing route.
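The "Minimal request" check assumes a smallest-possible request body. A sketch of what that could look like for an OpenAI-compatible chat route; the base URL and model slug are placeholders, so substitute the real values from the API Overview and Models & Slugs pages:

```python
# Sketch: build the smallest diagnostic request body. The URL and slug are
# PLACEHOLDERS; use your real base URL and a public customer slug.
import json

def minimal_request(base_url: str, model_slug: str) -> dict:
    """Smallest chat body: one short message, no tools, no files,
    no streaming, no history."""
    return {
        "url": f"{base_url}/v1/chat/completions",
        "body": {
            "model": model_slug,
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 8,
        },
    }

req = minimal_request("https://api.example.invalid", "example-model-slug")
print(json.dumps(req["body"]))
```

Everything the decision tree tells you to disable (streaming, tools, images, file context, history) is simply absent from this body, so any spike it produces cannot come from hidden context.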

Prevent it next time

Budget cache-heavy workflows separately from normal chat. Teach users that a short "continue" can still carry a large hidden context if the app automatically includes the previous conversation, repo files, or tool definitions.
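One way to enforce that budget client-side is to cap resent history before every call. A sketch, again using the rough 4-characters-per-token heuristic as an approximation:

```python
# Sketch: budget context client-side so "continue" cannot silently resend
# everything. Keeps the system prompt plus the most recent turns that fit
# under a token budget (4-chars-per-token heuristic, an approximation).
def trim_history(messages: list[dict], budget_tokens: int = 2000) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) // 4 for m in system)
    for m in reversed(rest):                 # newest turns first
        cost = len(m["content"]) // 4
        if used + cost > budget_tokens:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "s" * 400},
    {"role": "user", "content": "a" * 12000},     # huge pasted file
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "continue"},
]
trimmed = trim_history(history)
print(len(trimmed))
```

The huge pasted file no longer fits the budget and is dropped, while the system prompt and the most recent turns survive, so "continue" costs what the user expects.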

Minimum support packet

Collect these details before opening support. This avoids exposing secrets while giving enough context to match logs and reproduce the public failure path.

  • Timestamp: use UTC or include the timezone so logs can be matched accurately.
  • Endpoint path: include /v1, /anthropic, or the exact client route shape involved.
  • Public model slug: send the customer-facing slug, not a private key, upstream account name, or hidden route.
  • Exact error text: include the visible cache token cost spike message and any HTTP status shown by the client.
  • Minimal request result: state whether the tiny check on this page works with the same key.
  • Balance movement: state whether balance changed after the failed request or only after retries.
  • Client and feature flags: name the tool, SDK, streaming setting, image input, tools, file context, or automation loop involved.
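The fields above can be collected mechanically. A sketch that assembles the packet without ever including the API key itself; the short hash fingerprint is a hypothetical convention for matching logs, not a CorvusLLM requirement:

```python
# Sketch: assemble the support packet fields without leaking secrets.
# Only a short key fingerprint is included, never the key itself.
import hashlib
from datetime import datetime, timezone

def support_packet(endpoint_path: str, model_slug: str, error_text: str,
                   api_key: str, minimal_request_ok: bool,
                   balance_changed: bool, client_flags: list[str]) -> dict:
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "endpoint_path": endpoint_path,
        "public_model_slug": model_slug,
        "exact_error_text": error_text,
        # fingerprint lets support correlate, without seeing the key
        "key_fingerprint": hashlib.sha256(api_key.encode()).hexdigest()[:12],
        "minimal_request_ok": minimal_request_ok,
        "balance_changed": balance_changed,
        "client_and_feature_flags": client_flags,
    }

pkt = support_packet("/v1/chat/completions", "example-model-slug",
                     "cache token cost spike", "sk-secret-demo",
                     minimal_request_ok=True, balance_changed=True,
                     client_flags=["streaming=off", "tools=off"])
```

Everything in the packet is safe to paste into a public support channel; the raw key never appears in the output.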

When to contact support

Contact support when a minimal reproducible check still fails, when the dashboard history does not match what your client received, or when usage appears charged but no usable answer reached the client.

  • Include timestamp, endpoint path, public model slug, exact error wording, and whether the same key works on a minimal request.
  • Include whether the dashboard balance changed and whether the client retried in the background.
  • Do not send secrets, full API keys, regulated data, or private production prompts in public support messages.

Open the support bot after collecting the reproducible details.

Use these pages to verify the exact base URL, model slug, billing behavior, service status, or broader troubleshooting route before changing unrelated settings.

Topic map

Continue with the right source

Open the exact setup, model, billing, and troubleshooting pages instead of guessing configuration values.

  • Billing, Balance & Cache (docs): how prepaid balance works, how same-key top-ups work, and how usage deductions and out-of-balance behavior apply. Usage bills against the customer key balance and stops at zero.
  • LLM API Cost Calculator (pricing): estimates request cost from input, output, and cache read.
  • AI API Pricing Tracker (pricing): compares official provider API prices with CorvusLLM public prepaid rates and links the model data used.
  • Overview (docs): the clean start page for setting up CorvusLLM without guessing: base URLs, model overview, environment overview, and where to begin.
  • Troubleshooting (docs): clear fixes for wrong base URLs, bad model slugs, out-of-balance errors, and delivery questions.
  • API Overview (docs): base URLs, authentication, request formats, and OpenAI-compatible vs Anthropic-native paths.
  • Models & Slugs (docs): every customer-facing model with one canonical customer slug, provider family, and pricing.
  • Environment Overview (docs): every supported environment at a glance: which base URL to use and where to paste the key.
  • Claude API Pricing Comparison (landing): public Claude-family rows at 35% of tracked official input, output, and cache-read prices.
  • GPT API Pricing Comparison (landing): public GPT-family rows through an OpenAI-compatible access layer with public prepaid rates derived from tracked prices.
  • GLM API Pricing Comparison (landing): public GLM-family rows for buyers who want cost-sensitive API options; exact row availability varies.
  • AI API Cache Token Pricing (landing): cache-heavy requests can cost very differently from short prompts because cache read and cache write fields may dominate the bill.
  • AI Models (models): the model catalog directory for current customer-facing model families, public slugs, and pricing context.
  • Trust Center (trust): explains affiliation, data handling, support, refunds, compatibility evidence, and pricing methodology.