Direct answer
A short visible prompt can still be expensive when the client sends a large conversation, file context, or cached context behind it. Cache read and cache write tokens are counted against the full request context, not only the words typed into the input box.
Private order, key, and balance details belong in the customer portal or with support. Public docs can explain the diagnostic path, but they cannot reveal account-specific state.
Error phrases this guide covers
Search tools, logs, and support tickets do not always use the same wording. Treat these phrases as the same troubleshooting family before changing unrelated settings.
Fast check before changing everything
Run the smallest check that isolates the failing layer. If the small request works, the problem is usually the client configuration, hidden context, permissions, or advanced feature path rather than the whole account.
visible_prompt_cost != total_request_cost
total_request_cost = input_tokens * input_rate + output_tokens * output_rate + cache_read_tokens * cache_read_rate + cache_write_tokens * cache_write_rate
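The check above can be made concrete by pricing each usage field on its own. This is a minimal Python sketch, not the cost calculator's actual logic: the `Usage` dataclass, the `RATES_PER_MTOK` values, and the example token counts are all hypothetical placeholders.

```python
from dataclasses import dataclass

# HYPOTHETICAL placeholder rates in USD per million tokens for one model row.
# Replace them with the real values from the cost calculator or pricing page.
RATES_PER_MTOK = {
    "input_tokens": 3.00,
    "output_tokens": 15.00,
    "cache_read_tokens": 0.30,
    "cache_write_tokens": 3.75,
}


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_write_tokens: int


def cost_breakdown(usage: Usage, rates: dict = RATES_PER_MTOK) -> dict:
    """Cost in USD contributed by each usage field, each priced at its own rate."""
    return {field: getattr(usage, field) * rate / 1_000_000 for field, rate in rates.items()}


def total_request_cost(usage: Usage, rates: dict = RATES_PER_MTOK) -> float:
    """Sum of all four fields, not just the visible prompt."""
    return sum(cost_breakdown(usage, rates).values())


# A one-word "continue" that carried a large cached context behind it.
usage = Usage(input_tokens=12, output_tokens=400,
              cache_read_tokens=150_000, cache_write_tokens=40_000)
for field, cost in cost_breakdown(usage).items():
    print(f"{field:>18}: ${cost:.4f}")
print(f"{'total':>18}: ${total_request_cost(usage):.4f}")
```

With these made-up numbers the two cache fields account for almost all of the cost, while the typed input contributes a fraction of a cent.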
Common causes
- The user typed a short follow-up such as "continue", but the client resent a large prior conversation or project context.
- The app enabled cache creation or cache reads for a long system prompt, tool schema, or file set.
- An agent loop retried a large context after a partial failure.
- The customer compared cost against the visible chat text instead of the full input, output, cache read, and cache write fields.
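The first cause is easy to underestimate because every resent turn is added to the next one. This sketch assumes a client that attaches a fixed hidden context (system prompt, tool schemas, project files) and resends the entire history on every turn; `request_sizes` and the token counts are hypothetical illustrations, not measurements.

```python
from typing import List, Tuple


def request_sizes(context_tokens: int, turns: List[Tuple[int, int]]) -> List[int]:
    """Input tokens sent on each turn when the client resends everything.

    context_tokens: hidden context attached to every request.
    turns: (user_tokens, assistant_tokens) for each exchange, in order.
    """
    sizes = []
    history = 0
    for user_tokens, assistant_tokens in turns:
        # Hidden context + all earlier exchanges + the new user message.
        sizes.append(context_tokens + history + user_tokens)
        # Both sides of this exchange are resent on every later turn.
        history += user_tokens + assistant_tokens
    return sizes


# Hypothetical session; the last user message is a 2-token "continue".
turns = [(800, 1200), (600, 2500), (900, 3000), (2, 1500)]
print(request_sizes(context_tokens=20_000, turns=turns))
```

In this example the final "continue" adds only 2 typed tokens, yet the request still sends 29,002 input tokens.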
Fix steps
- Check whether the client resends full chat history, files, or hidden tool schemas on every message.
- Use the cost calculator to estimate input, output, cache read, and cache write separately.
- Trim old context, summarize long threads, or start a new small session before retrying.
- For production apps, log token usage fields so support can separate visible prompt size from real request size.
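For the last fix step, a production app can write one structured log line per request that puts the visible prompt size next to the real usage. The sketch assumes the response carries an Anthropic-style `usage` object with `cache_read_input_tokens` and `cache_creation_input_tokens`; if your route returns a different shape, change `FIELD_MAP`. `request_id` is a hypothetical identifier from your own client.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("usage")

# Assumption: an Anthropic-style "usage" object. Other API shapes use
# different field names; map them onto this guide's four fields here.
FIELD_MAP = {
    "input_tokens": "input_tokens",
    "output_tokens": "output_tokens",
    "cache_read_input_tokens": "cache_read_tokens",
    "cache_creation_input_tokens": "cache_write_tokens",
}


def usage_record(response_body: dict, request_id: str, visible_prompt: str) -> dict:
    """Flatten one response into a log record: visible size next to real usage."""
    usage = response_body.get("usage") or {}
    record = {"request_id": request_id, "visible_prompt_chars": len(visible_prompt)}
    for source_field, guide_field in FIELD_MAP.items():
        record[guide_field] = int(usage.get(source_field) or 0)  # missing or null -> 0
    return record


def log_usage(response_body: dict, request_id: str, visible_prompt: str) -> dict:
    record = usage_record(response_body, request_id, visible_prompt)
    logger.info(json.dumps(record))
    return record


body = {"usage": {"input_tokens": 12, "output_tokens": 400,
                  "cache_read_input_tokens": 150_000,
                  "cache_creation_input_tokens": 40_000}}
log_usage(body, request_id="req-001", visible_prompt="continue")
```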
Verify before retrying production traffic
- Compare the request with and without project files or long chat history.
- Check whether the balance drop correlates with cache write or cache read fields rather than output length.
- Retry the prompt in a new, empty session to confirm that the base model route is not inherently expensive for short input.
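To check whether a balance drop tracks the cache fields rather than output length, compare two usage records for the same prompt: one from an empty session and one with project files attached. A minimal sketch; the rates, field values, and `cost_delta_by_field` helper are hypothetical.

```python
from typing import List, Tuple

# HYPOTHETICAL placeholder rates in USD per million tokens.
RATES_PER_MTOK = {
    "input_tokens": 3.00,
    "output_tokens": 15.00,
    "cache_read_tokens": 0.30,
    "cache_write_tokens": 3.75,
}


def cost_delta_by_field(baseline: dict, candidate: dict,
                        rates: dict = RATES_PER_MTOK) -> List[Tuple[str, float]]:
    """Extra cost per field (candidate minus baseline), largest contributor first."""
    deltas = [
        (field, (candidate.get(field, 0) - baseline.get(field, 0)) * rate / 1_000_000)
        for field, rate in rates.items()
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


empty_session = {"input_tokens": 20, "output_tokens": 350,
                 "cache_read_tokens": 0, "cache_write_tokens": 0}
with_project_files = {"input_tokens": 20, "output_tokens": 380,
                      "cache_read_tokens": 0, "cache_write_tokens": 60_000}

ranking = cost_delta_by_field(empty_session, with_project_files)
print(ranking[0][0])
```

Ranking by extra cost, rather than by raw token difference, matters because the fields are billed at different rates.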
Use one small request first. Large retries can spend balance, hide the original cause, and create confusing logs.
Diagnostic decision tree
Work through these checks in order. The goal is to isolate the failing layer before editing unrelated settings or sending another expensive request.
| Check | Action | Pass result | Fail result |
|---|---|---|---|
| Minimal request | Run the smallest check from this page with the same key, endpoint shape, and one public model slug. | The account and basic route probably work; move to client settings, hidden context, tools, or retries. | Fix auth, base URL, balance, model slug, or current route health before testing advanced features. |
| Client final URL | Inspect the actual URL or provider profile the client sends, not only the visible settings field. | Continue with request body, model slug, payload size, and feature compatibility checks. | Correct host/base/full-endpoint confusion before changing keys or model families. |
| Balance movement | Compare dashboard balance before and after one tiny diagnostic request. | If charged and no answer arrives, collect the support packet before retrying large prompts. | If not charged, focus first on request rejection, wrong endpoint, auth, or client-side failure. |
| Feature isolation | Disable streaming, tools, images, file context, long history, and automation loops for one retry. | Re-enable one feature at a time until the failing layer is identified. | Keep the request small and do not use production retries as the diagnostic method. |
| Route health | Check Service Status and try a tiny prompt on one nearby public model row if your workflow allows it. | Use a documented fallback only if quality and cost are acceptable. | Wait, switch safely, or contact support with timestamps instead of hammering the failing route. |
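The decision tree can be expressed as an ordered list of checks that stops at the first failure. This is an illustrative Python sketch: `Check`, `first_failing_layer`, and the hard-coded pass/fail results are hypothetical, and the balance-movement row is left out because it records an observation rather than gating the next step.

```python
from typing import Callable, List, NamedTuple, Optional


class Check(NamedTuple):
    name: str
    run: Callable[[], bool]  # returns True when the check passes
    on_fail: str             # what to fix before moving to the next layer


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks in table order and stop at the first failure.

    Stopping early means later checks never run against a layer
    that is already known to be broken.
    """
    for check in checks:
        if not check.run():
            return f"{check.name}: {check.on_fail}"
    return None  # every layer passed


# HYPOTHETICAL results: replace each lambda with your own real test.
checks = [
    Check("Minimal request", lambda: True,
          "fix auth, base URL, balance, model slug, or route health"),
    Check("Client final URL", lambda: False,
          "correct host/base/full-endpoint confusion"),
    Check("Feature isolation", lambda: True,
          "keep the request small; re-enable one feature at a time"),
    Check("Route health", lambda: True,
          "wait, switch safely, or contact support with timestamps"),
]
print(first_failing_layer(checks))
```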
Prevent it next time
Budget cache-heavy workflows separately from normal chat. Teach users that a short "continue" can still carry a large hidden context if the app automatically includes the previous conversation, repo files, or tool definitions.
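One way to budget cache-heavy workflows separately is to give each workflow its own spending limit and check an estimate before sending. `WorkflowBudget`, the workflow names, and the limits below are hypothetical; real enforcement would use the charged amounts from your usage logs.

```python
from typing import Dict


class WorkflowBudget:
    """Track spend per workflow so cache-heavy jobs cannot drain the chat budget."""

    def __init__(self, limits_usd: Dict[str, float]):
        self.limits = dict(limits_usd)
        self.spent = {name: 0.0 for name in limits_usd}

    def can_spend(self, workflow: str, estimated_usd: float) -> bool:
        """True if this request's estimate fits in the workflow's remaining budget."""
        return self.spent[workflow] + estimated_usd <= self.limits[workflow]

    def record(self, workflow: str, actual_usd: float) -> None:
        """Add the amount actually charged after the request completes."""
        self.spent[workflow] += actual_usd


budget = WorkflowBudget({"chat": 5.0, "cache_heavy": 20.0})
budget.record("cache_heavy", 19.5)
print(budget.can_spend("cache_heavy", 1.0), budget.can_spend("chat", 1.0))
```

Keeping the limits separate means an exhausted cache-heavy budget blocks only that workflow, while normal chat keeps working.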
Minimum support packet
Collect these details before opening support. This avoids exposing secrets while giving enough context to match logs and reproduce the public failure path.
| Field | Why support needs it |
|---|---|
| Timestamp | Use UTC or include timezone so logs can be matched accurately. |
| Endpoint path | Include /v1, /anthropic, or the exact client route shape involved. |
| Public model slug | Send the customer-facing slug, not a private key, upstream account name, or hidden route. |
| Exact error text | Quote the exact cache token cost spike message and any HTTP status code shown by the client. |
| Minimal request result | State whether the tiny check on this page works with the same key. |
| Balance movement | State whether balance changed after the failed request or only after retries. |
| Client and feature flags | Name the tool, SDK, streaming setting, image input, tools, file context, or automation loop involved. |
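The support packet can be assembled programmatically so that a full API key is not pasted into the message by accident. A minimal sketch: the `sk-` key pattern, the model slug, and the error text are hypothetical, and the redaction keeps only a short prefix so support can still tell keys apart.

```python
import re
from datetime import datetime, timezone
from typing import List, Optional

# HYPOTHETICAL key format ("sk-" prefix); adjust the pattern to your keys.
KEY_PATTERN = re.compile(r"\b(sk-[A-Za-z0-9_-]{4})[A-Za-z0-9_-]+")


def redact(text: str) -> str:
    """Keep a short key prefix so support can tell keys apart; hide the rest."""
    return KEY_PATTERN.sub(r"\1***", text)


def support_packet(endpoint_path: str, model_slug: str, error_text: str,
                   minimal_request_ok: bool, balance_changed: bool,
                   client_features: List[str],
                   when: Optional[datetime] = None) -> dict:
    """One key per row of the support-packet table, with a UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    return {
        "timestamp_utc": when.astimezone(timezone.utc).isoformat(),
        "endpoint_path": endpoint_path,
        "public_model_slug": model_slug,
        "exact_error_text": redact(error_text),
        "minimal_request_ok": minimal_request_ok,
        "balance_changed": balance_changed,
        "client_and_features": client_features,
    }


packet = support_packet(
    endpoint_path="/v1",
    model_slug="example-model",  # hypothetical public slug
    error_text="cache token cost spike, key sk-abcd1234efgh",
    minimal_request_ok=True,
    balance_changed=False,
    client_features=["streaming", "file context"],
    when=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(packet)
```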
When to contact support
Contact support when a minimal reproducible check still fails, when the dashboard history does not match what your client received, or when usage appears charged but no usable answer reached the client.
- Include timestamp, endpoint path, public model slug, exact error wording, and whether the same key works on a minimal request.
- Include whether the dashboard balance changed and whether the client retried in the background.
- Do not send secrets, full API keys, regulated data, or private production prompts in public support messages.
Open the support bot after collecting the reproducible details.
Related sources
Use these pages to verify the exact base URL, model slug, billing behavior, service status, or broader troubleshooting route before changing unrelated settings.