Cost troubleshooting

Fix cache token cost spikes


Direct answer

A short visible prompt can still be expensive when the client sends a large conversation, file context, or cached context behind it. Cache read and write tokens belong to the full request context, not only the words typed into the input box.

Use this page for public troubleshooting only.

Private order, key, and balance details belong in the customer portal or support. Public docs can explain the diagnostic path, not reveal account-specific state.

Error phrases this guide covers

Search tools, logs, and support tickets do not always use the same wording. Treat these phrases as the same troubleshooting family before changing unrelated settings.

  • cache token spike
  • cache write cost
  • cache read cost
  • short "continue" expensive
  • unexpected balance drop

Fast check before changing everything

Run the smallest check that isolates the failing layer. If the small request works, the problem is usually the client configuration, hidden context, permissions, or advanced feature path rather than the whole account.

Cost isolation checklist
visible_prompt_cost != total_request_cost
check: input_tokens + output_tokens + cache_read_tokens + cache_write_tokens
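The checklist above is simple arithmetic: total cost sums all four fields at their own rates. A minimal sketch in Python, assuming illustrative per-million-token rates (the numbers below are placeholders, not CorvusLLM's actual prices) and generic usage field names:

```python
# Sketch: estimate total request cost from all four usage fields.
# The per-million-token rates are PLACEHOLDERS, not real prices.
RATES_PER_MTOK = {
    "input_tokens": 3.00,        # assumed base input rate
    "output_tokens": 15.00,      # assumed output rate
    "cache_read_tokens": 0.30,   # cache reads are usually cheap...
    "cache_write_tokens": 3.75,  # ...but cache writes can exceed base input
}

def estimate_cost(usage: dict) -> float:
    """Sum every usage field, not just the visible prompt."""
    return sum(usage.get(f, 0) * r / 1_000_000 for f, r in RATES_PER_MTOK.items())

# A "short" prompt that rides on a large cached project context:
usage = {"input_tokens": 40, "output_tokens": 200,
         "cache_read_tokens": 180_000, "cache_write_tokens": 0}
print(round(estimate_cost(usage), 4))
```

The point of the sketch: a 40-token visible prompt with 180k cache-read tokens behind it is billed almost entirely by the cache field, not the typed text.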

Common causes

  • The user typed a short follow-up such as "continue", but the client resent a large prior conversation or project context.
  • The app enabled cache creation or cache reads for a long system prompt, tool schema, or file set.
  • An agent loop retried a large context after a partial failure.
  • The customer compared cost against visible chat text instead of full input, output, cache read, and cache write fields.
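The first cause, a short "continue" riding on resent history, can be made visible with a rough size comparison. A sketch assuming a 4-characters-per-token heuristic (real tokenizers differ) and a generic messages list:

```python
# Sketch: compare the visible prompt with the full payload the client sends.
# Uses a rough 4-characters-per-token heuristic; real tokenizers differ.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def payload_tokens(messages: list[dict]) -> int:
    """Token estimate for everything resent: system, history, and the new turn."""
    return sum(approx_tokens(m["content"]) for m in messages)

history = [{"role": "system", "content": "x" * 8000},    # long system prompt
           {"role": "user", "content": "y" * 40000},     # pasted file context
           {"role": "assistant", "content": "z" * 6000}]
new_turn = {"role": "user", "content": "continue"}

visible = approx_tokens(new_turn["content"])        # what the user sees
total = payload_tokens(history + [new_turn])        # what the API bills
print(visible, total)
```

Here the visible prompt is two approximate tokens while the billed payload is thousands of times larger, which is exactly the gap this page troubleshoots.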

Fix steps

  1. Check whether the client resends full chat history, files, or hidden tool schemas on every message.
  2. Use the cost calculator to estimate input, output, cache read, and cache write separately.
  3. Trim old context, summarize long threads, or start a new small session before retrying.
  4. For production apps, log token usage fields so support can separate visible prompt size from real request size.
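Step 4 above might look like the following sketch. The usage field names are assumptions; check your client or SDK response for the actual names:

```python
# Sketch: log every token field from a response so support can separate
# visible prompt size from real request size. The `usage` shape below is
# an assumption; check your client or SDK for the actual field names.
import json
import logging

logging.basicConfig(level=logging.INFO)

def log_usage(request_id: str, usage: dict) -> dict:
    record = {
        "request_id": request_id,
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cache_read_tokens": usage.get("cache_read_tokens", 0),
        "cache_write_tokens": usage.get("cache_write_tokens", 0),
    }
    logging.info("usage %s", json.dumps(record))
    return record

rec = log_usage("req-123", {"input_tokens": 40, "cache_read_tokens": 180_000})
```

Missing fields default to zero so the log line always has the same shape, which makes spikes easy to grep for later.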

Verify before retrying production traffic

  • Compare the request with and without project files or long chat history.
  • Check whether the balance drop correlates with cache write or cache read fields rather than output length.
  • Retry a new empty-session prompt to confirm the base model route is not inherently expensive for short input.

Do not use expensive retry loops as a diagnostic tool.

Use one small request first. Large retries can spend balance, hide the original cause, and create confusing logs.
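Instead of a retry loop, one A/B pair is enough: send the same short prompt once in a fresh empty session and once with full context, then compare which usage field grew. A sketch over two hypothetical usage results:

```python
# Sketch: one small A/B check instead of a retry loop. The two dicts below
# stand in for the usage fields returned by a bare-session request and a
# full-context request with the same short prompt.
def diagnose(bare_usage: dict, full_usage: dict) -> str:
    fields = ("input_tokens", "cache_read_tokens", "cache_write_tokens")
    grew = {f: full_usage.get(f, 0) - bare_usage.get(f, 0) for f in fields}
    culprit = max(grew, key=grew.get)
    return culprit if grew[culprit] > 0 else "no_context_growth"

bare = {"input_tokens": 45, "cache_read_tokens": 0, "cache_write_tokens": 0}
full = {"input_tokens": 60, "cache_read_tokens": 0, "cache_write_tokens": 190_000}
print(diagnose(bare, full))
```

If the result names a cache field, the spike comes from the resent or cached context, not from the model route itself.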

Diagnostic decision tree

Work through these checks in order. The goal is to isolate the failing layer before editing unrelated settings or sending another expensive request.

Check: Minimal request
Action: Run the smallest check from this page with the same key, endpoint shape, and one public model slug.
Pass result: The account and basic route probably work; move to client settings, hidden context, tools, or retries.
Fail result: Fix auth, base URL, balance, model slug, or current route health before testing advanced features.

Check: Client final URL
Action: Inspect the actual URL or provider profile the client sends, not only the visible settings field.
Pass result: Continue with request body, model slug, payload size, and feature compatibility checks.
Fail result: Correct host/base/full-endpoint confusion before changing keys or model families.

Check: Balance movement
Action: Compare dashboard balance before and after one tiny diagnostic request.
Pass result: If charged and no answer arrives, collect the support packet before retrying large prompts.
Fail result: If not charged, focus first on request rejection, wrong endpoint, auth, or client-side failure.

Check: Feature isolation
Action: Disable streaming, tools, images, file context, long history, and automation loops for one retry.
Pass result: Re-enable one feature at a time until the failing layer is identified.
Fail result: Keep the request small and do not use production retries as the diagnostic method.

Check: Route health
Action: Check Service Status and try a tiny prompt on one nearby public model row if your workflow allows it.
Pass result: Use a documented fallback only if quality and cost are acceptable.
Fail result: Wait, switch safely, or contact support with timestamps instead of hammering the failing route.
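The "Minimal request" check assumes a smallest-possible request body. A sketch of what that could look like for an OpenAI-compatible chat route; the base URL and model slug are placeholders, so substitute the real values from the API Overview and Models & Slugs pages:

```python
# Sketch: build the smallest diagnostic request body. The URL and slug are
# PLACEHOLDERS; use your real base URL and a public customer slug.
import json

def minimal_request(base_url: str, model_slug: str) -> dict:
    """Smallest chat body: one short message, no tools, no files,
    no streaming, no history."""
    return {
        "url": f"{base_url}/v1/chat/completions",
        "body": {
            "model": model_slug,
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 8,
        },
    }

req = minimal_request("https://api.example.invalid", "example-model-slug")
print(json.dumps(req["body"]))
```

Everything the decision tree tells you to disable (streaming, tools, images, file context, history) is simply absent from this body, so any spike it produces cannot come from hidden context.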

Prevent it next time

Budget cache-heavy workflows separately from normal chat. Teach users that a short "continue" can still carry a large hidden context if the app automatically includes the previous conversation, repo files, or tool definitions.
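One way to enforce that budget client-side is to cap resent history before every call. A sketch, again using the rough 4-characters-per-token heuristic as an approximation:

```python
# Sketch: budget context client-side so "continue" cannot silently resend
# everything. Keeps the system prompt plus the most recent turns that fit
# under a token budget (4-chars-per-token heuristic, an approximation).
def trim_history(messages: list[dict], budget_tokens: int = 2000) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) // 4 for m in system)
    for m in reversed(rest):                 # newest turns first
        cost = len(m["content"]) // 4
        if used + cost > budget_tokens:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "s" * 400},
    {"role": "user", "content": "a" * 12000},     # huge pasted file
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "continue"},
]
trimmed = trim_history(history)
print(len(trimmed))
```

The huge pasted file no longer fits the budget and is dropped, while the system prompt and the most recent turns survive, so "continue" costs what the user expects.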

Minimum support packet

Collect these details before opening support. This avoids exposing secrets while giving enough context to match logs and reproduce the public failure path.

  • Timestamp: use UTC or include the timezone so logs can be matched accurately.
  • Endpoint path: include /v1, /anthropic, or the exact client route shape involved.
  • Public model slug: send the customer-facing slug, not a private key, upstream account name, or hidden route.
  • Exact error text: include the visible cache token cost spike message and any HTTP status shown by the client.
  • Minimal request result: state whether the tiny check on this page works with the same key.
  • Balance movement: state whether balance changed after the failed request or only after retries.
  • Client and feature flags: name the tool, SDK, streaming setting, image input, tools, file context, or automation loop involved.
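The fields above can be collected mechanically. A sketch that assembles the packet without ever including the API key itself; the short hash fingerprint is a hypothetical convention for matching logs, not a CorvusLLM requirement:

```python
# Sketch: assemble the support packet fields without leaking secrets.
# Only a short key fingerprint is included, never the key itself.
import hashlib
from datetime import datetime, timezone

def support_packet(endpoint_path: str, model_slug: str, error_text: str,
                   api_key: str, minimal_request_ok: bool,
                   balance_changed: bool, client_flags: list[str]) -> dict:
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "endpoint_path": endpoint_path,
        "public_model_slug": model_slug,
        "exact_error_text": error_text,
        # fingerprint lets support correlate, without seeing the key
        "key_fingerprint": hashlib.sha256(api_key.encode()).hexdigest()[:12],
        "minimal_request_ok": minimal_request_ok,
        "balance_changed": balance_changed,
        "client_and_feature_flags": client_flags,
    }

pkt = support_packet("/v1/chat/completions", "example-model-slug",
                     "cache token cost spike", "sk-secret-demo",
                     minimal_request_ok=True, balance_changed=True,
                     client_flags=["streaming=off", "tools=off"])
```

Everything in the packet is safe to paste into a public support channel; the raw key never appears in the output.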

When to contact support

Contact support when a minimal reproducible check still fails, when the dashboard history does not match what your client received, or when usage appears charged but no usable answer reached the client.

  • Include timestamp, endpoint path, public model slug, exact error wording, and whether the same key works on a minimal request.
  • Include whether the dashboard balance changed and whether the client retried in the background.
  • Do not send secrets, full API keys, regulated data, or private production prompts in public support messages.

Open the support bot after collecting the reproducible details.

Use these pages to verify the exact base URL, model slug, billing behavior, service status, or broader troubleshooting route before changing unrelated settings.

Topic map

Continue with the right source

Open the exact setup, model, billing, and troubleshooting pages instead of guessing configuration values.

  • Billing, Balance & Cache (docs): how prepaid balance works, how same-key top-ups work, and how usage deductions and out-of-balance behavior apply. Usage bills against the customer key balance and stops at zero.
  • LLM API Cost Calculator (pricing): estimates request cost from input, output, and cache read.
  • AI API Pricing Tracker (pricing): compares official provider API prices with CorvusLLM public prepaid rates and links the model data used.
  • Overview (docs): the clean start page for setting up CorvusLLM without guessing: base URLs, model overview, environment overview, and where to begin.
  • Troubleshooting (docs): clear fixes for wrong base URLs, bad model slugs, out-of-balance errors, and delivery questions.
  • API Overview (docs): base URLs, authentication, request formats, and OpenAI-compatible vs Anthropic-native paths.
  • Models & Slugs (docs): every customer-facing model with one canonical customer slug, provider family, and pricing.
  • Environment Overview (docs): every supported environment at a glance: which base URL to use and where to paste the key.
  • Claude API Pricing Comparison (landing): public Claude-family rows at 35% of tracked official input, output, and cache-read prices.
  • GPT API Pricing Comparison (landing): public GPT-family rows through an OpenAI-compatible access layer with public prepaid rates derived from tracked prices.
  • GLM API Pricing Comparison (landing): public GLM-family rows for buyers who want cost-sensitive API options; exact row availability varies.
  • AI API Cache Token Pricing (landing): cache-heavy requests can cost very differently from short prompts because cache read and cache write fields may dominate the bill.
  • AI Models (models): the model catalog directory for current customer-facing model families, public slugs, and pricing context.
  • Trust Center (trust): explains affiliation, data handling, support, refunds, compatibility evidence, and pricing methodology.