The router lets you manually control Anthropic prompt caching through cache_control blocks in message content.
You can explicitly mark specific portions of your prompts for caching and set a TTL (time-to-live) on each,
giving you precise control over what gets cached and for how long.
How Manual Caching Works
Manual caching uses cache_control blocks embedded within message content to explicitly specify which portions of your prompt should be cached.
This approach gives you fine-grained control over:
- What gets cached: Mark specific content blocks for caching
- Cache duration: Set custom TTL (time-to-live) values
Content Structure with cache_control
To use manual caching, structure your message content as an array of content blocks, and add the cache_control field as desired:
{
  "role": "system",
  "content": [
    {
      "type": "text",
      "text": "Your prompt content here...",
      "cache_control": {
        "type": "ephemeral",
        "ttl": "1h"
      }
    }
  ]
}
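The structure above can also be assembled programmatically. The following is a minimal sketch, not part of the router's API: the helper names `text_block` and `system_message` are hypothetical and only build plain dictionaries in the shape shown above, attaching cache_control to the blocks you choose.

```python
import json


def text_block(text, ttl=None):
    """Build a text content block (hypothetical helper).

    When ttl is given, the block carries a cache_control field in the
    shape shown above; otherwise the field is left out.
    """
    block = {"type": "text", "text": text}
    if ttl is not None:
        block["cache_control"] = {"type": "ephemeral", "ttl": ttl}
    return block


def system_message(*blocks):
    """Wrap content blocks in a system message (hypothetical helper)."""
    return {"role": "system", "content": list(blocks)}


message = system_message(
    text_block("Your prompt content here...", ttl="1h"),
)

# Print the message to compare it with the JSON structure above
print(json.dumps(message, indent=2))
```

Keeping the block construction in one place makes it easier to see which parts of a prompt carry cache_control.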
TTL Field
The ttl (time-to-live) field specifies how long the content should remain cached. The default value is 5m.
You can set it to 1h only when using Anthropic or Vertex; other providers do not support the 1h value.
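The TTL rule can be expressed as a small guard. This is a sketch under stated assumptions: the function names are hypothetical, the provider is assumed to be the part of the model ID before the first "/", and the "vertex" prefix is a guess at how Vertex models are named. Check the identifiers your router actually uses.

```python
DEFAULT_TTL = "5m"
EXTENDED_TTL = "1h"

# Assumption: provider prefixes as they might appear in model IDs.
# "anthropic" matches the model IDs used in this page; "vertex" is a guess.
EXTENDED_TTL_PROVIDERS = {"anthropic", "vertex"}


def choose_ttl(model, want_extended=False):
    """Return "1h" only for providers assumed to support it, else "5m"."""
    provider = model.split("/", 1)[0].lower()
    if want_extended and provider in EXTENDED_TTL_PROVIDERS:
        return EXTENDED_TTL
    return DEFAULT_TTL


def cache_control_for(model, want_extended=False):
    """Build a cache_control dict with a TTL chosen by choose_ttl."""
    return {"type": "ephemeral", "ttl": choose_ttl(model, want_extended)}


print(cache_control_for("anthropic/claude-3-7-sonnet-latest", want_extended=True))
# "other-provider/some-model" is a placeholder, not a real model ID
print(cache_control_for("other-provider/some-model", want_extended=True))
```

Falling back to the default rather than raising an error is a design choice for this sketch; a stricter version could reject an unsupported 1h request outright.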
Example with Manual Caching
This example demonstrates how to use manual caching with the OpenAI Python and JavaScript clients to cache a large knowledge base.
Python
import openai

# Placeholder: load your real API key from a secure source rather than hardcoding it
requesty_api_key = "YOUR_REQUESTY_API_KEY"

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

# Large system prompt to be cached
knowledgebase = "YOUR ENTIRE KNOWLEDGEBASE..."  # Replace with your actual long prompt

# First request - this will cache the system prompt
system_message = {
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": f"Your role is to answer questions based on: {knowledgebase}",
            "cache_control": {"type": "ephemeral", "ttl": "1h"}
        }
    ]
}

response1 = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-latest",
    messages=[
        system_message,
        {"role": "user", "content": "What is covered in the knowledge base?"}
    ]
)

print("Response:", response1.choices[0].message.content)
print("Caching tokens (first request):", response1.usage.prompt_tokens_details.caching_tokens)
print("Cached tokens (first request):", response1.usage.prompt_tokens_details.cached_tokens)

# Second request - this will use the cached system prompt
response2 = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-latest",
    messages=[
        system_message,
        {"role": "user", "content": "Tell me more details."}
    ]
)

print("Response:", response2.choices[0].message.content)
print("Caching tokens (second request):", response2.usage.prompt_tokens_details.caching_tokens)
print("Cached tokens (second request):", response2.usage.prompt_tokens_details.cached_tokens)
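The example above reads caching_tokens and cached_tokens directly from the response. If those fields may be absent (for instance, with a provider that does not report them), a defensive reader can help. The sketch below is an assumption-labeled illustration: `cache_usage` is a hypothetical helper, and the response object is a stand-in built with SimpleNamespace using made-up numbers, not real router output.

```python
from types import SimpleNamespace


def cache_usage(response):
    """Return cache token counts from a response, defaulting to 0.

    Hypothetical helper: it reads the same attributes as the example
    above but tolerates a missing usage or prompt_tokens_details object.
    """
    usage = getattr(response, "usage", None)
    details = getattr(usage, "prompt_tokens_details", None)
    return {
        "caching_tokens": getattr(details, "caching_tokens", None) or 0,
        "cached_tokens": getattr(details, "cached_tokens", None) or 0,
    }


# Stand-in response with invented numbers, used only to exercise the helper
fake_response = SimpleNamespace(
    usage=SimpleNamespace(
        prompt_tokens_details=SimpleNamespace(caching_tokens=1200, cached_tokens=0)
    )
)

print(cache_usage(fake_response))
# A response with no usage information yields zeros instead of raising
print(cache_usage(SimpleNamespace(usage=None)))
```

With a real client, you would pass response1 or response2 from the example above in place of the stand-in.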
JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai/v1",
});

const knowledgebase = "YOUR ENTIRE KNOWLEDGEBASE...";

// First request - caches the system prompt
const systemMessage = {
  role: "system",
  content: [
    {
      type: "text",
      text: `Your role is to answer questions based on: ${knowledgebase}`,
      cache_control: { type: "ephemeral", ttl: "1h" }
    }
  ]
};

const response1 = await client.chat.completions.create({
  model: "anthropic/claude-3-7-sonnet-latest",
  messages: [
    systemMessage,
    { role: "user", content: "What is covered in the knowledge base?" }
  ]
});

console.log("Response:", response1.choices[0].message.content);
console.log("Caching tokens:", response1.usage.prompt_tokens_details.caching_tokens);

// Second request - uses cached system prompt
const response2 = await client.chat.completions.create({
  model: "anthropic/claude-3-7-sonnet-latest",
  messages: [
    systemMessage,
    { role: "user", content: "Tell me more details." }
  ]
});

console.log("Response:", response2.choices[0].message.content);
console.log("Cached tokens:", response2.usage.prompt_tokens_details.cached_tokens);