If you’re using the Anthropic SDK directly against Requesty’s /v1/messages endpoint, you can place cache_control breakpoints on individual content blocks, exactly as with the native Anthropic prompt caching API. This gives you fine-grained control over which parts of your request are cached.
cache_control can be placed on any content block type: system text, user text, images, documents, tool definitions, tool_use, and tool_result.
Python
import anthropic

# Point the Anthropic SDK at Requesty instead of api.anthropic.com.
client = anthropic.Anthropic(
    api_key="YOUR_REQUESTY_API_KEY",
    base_url="https://router.requesty.ai",
)

response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant.",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=[
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            },
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "How does the authentication middleware work?"
                }
            ]
        }
    ],
)

# The reply may mix text and tool_use blocks, so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)

# Cache usage is reported in the response:
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
TypeScript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai",
});

const response = await client.messages.create({
  model: "anthropic/claude-sonnet-4-20250514",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: "You are a helpful coding assistant.",
      cache_control: { type: "ephemeral" },
    },
    {
      type: "text",
      text: "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
      cache_control: { type: "ephemeral" },
    },
  ],
  tools: [
    {
      name: "search_code",
      description: "Search the codebase for a pattern",
      input_schema: {
        type: "object" as const,
        properties: {
          query: { type: "string" },
        },
        required: ["query"],
      },
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "How does the authentication middleware work?",
        },
      ],
    },
  ],
});

console.log(response.content[0]);
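The two usage fields printed in the Python example can be folded into a single hit ratio, which makes it easier to see whether your breakpoints are paying off. The sketch below assumes the usage object follows Anthropic's native accounting, where input_tokens counts only the uncached part of the prompt. The cache_summary helper and the sample numbers are hypothetical and are not part of the SDK or of Requesty's API.

```python
from types import SimpleNamespace


def cache_summary(usage):
    """Summarize prompt-cache usage from a response.usage-like object."""
    # The cache fields may be missing or None when caching was not involved.
    written = getattr(usage, "cache_creation_input_tokens", None) or 0
    read = getattr(usage, "cache_read_input_tokens", None) or 0
    uncached = getattr(usage, "input_tokens", None) or 0

    # Assumption: the three counters are disjoint and add up to the full prompt.
    total = uncached + written + read
    return {
        "total_input_tokens": total,
        "cache_write_tokens": written,
        "cache_read_tokens": read,
        "hit_ratio": read / total if total else 0.0,
    }


# Stand-in for response.usage with made-up numbers.
example_usage = SimpleNamespace(
    input_tokens=50,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=950,
)
summary = cache_summary(example_usage)
print(summary)
```

In a real call you would pass response.usage instead of example_usage. A hit ratio near zero on repeated requests usually means the prefix changed before the breakpoint or was too short to cache.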
In agentic flows with tool calls, place cache_control on the last content block of each turn to cache the conversation prefix:
# Reuses the client from the first example. The tools list is omitted for
# brevity; a real request that contains tool_use blocks should also pass
# the tool definitions.
response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=8192,
    system=[
        {
            "type": "text",
            "text": "You are an AI coding assistant.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Find all usages of the deprecated API."},
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "I'll search for the deprecated API usage."},
                {
                    "type": "tool_use",
                    "id": "toolu_01ABC",
                    "name": "search_code",
                    "input": {"query": "deprecated_api"},
                    # Breakpoint on the last block of the assistant turn.
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": "toolu_01ABC",
                    "content": "Found 3 usages in src/api.py, src/handler.py, src/utils.py",
                    # Breakpoint on the last block of the tool-result turn.
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {"role": "user", "content": "Now refactor all of them to use the new API."}
    ],
)
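In a long-running agent loop it is tedious to place these markers by hand on every request. A minimal sketch of one way to automate it follows, assuming messages are plain dicts as in the example above. The mark_turn_breakpoints helper and its keep parameter are hypothetical. It keeps breakpoints only on the newest turns, on the assumption that the number of breakpoints per request is limited (Anthropic's native API allows four).

```python
import copy


def mark_turn_breakpoints(messages, keep=2):
    """Return a copy of messages with a breakpoint on the last block of the newest `keep` turns."""
    updated = copy.deepcopy(messages)  # never mutate the caller's history

    for message in updated:
        # String content cannot carry cache_control, so wrap it in a text block.
        if isinstance(message["content"], str):
            message["content"] = [{"type": "text", "text": message["content"]}]
        # Drop stale breakpoints left over from earlier requests.
        for block in message["content"]:
            block.pop("cache_control", None)

    # Mark the final block of the most recent non-empty messages.
    candidates = [m for m in updated if m["content"]]
    recent = candidates[-keep:] if keep > 0 else []
    for message in recent:
        message["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return updated


history = [
    {"role": "user", "content": "Find all usages of the deprecated API."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01ABC", "name": "search_code",
         "input": {"query": "deprecated_api"}, "cache_control": {"type": "ephemeral"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01ABC", "content": "Found 3 usages"},
    ]},
]
marked = mark_turn_breakpoints(history, keep=2)
for message in marked:
    print(message["role"], "cache_control" in message["content"][-1])
```

With keep=2, the system prompt and the tool list can each carry one more breakpoint without exceeding four in total. Pass the returned list as messages and keep appending new turns to the unmarked history.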
Best practices
- Place breakpoints on the first and last system prompt blocks, so a stable first block can still be read from cache when later blocks change.
- Place a breakpoint on the last tool definition so your tool schema is cached.
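- In multi-turn conversations, place a breakpoint on the last content block of each turn to incrementally cache the conversation history. Anthropic's native API allows up to four breakpoints per request, so in long conversations keep them on the most recent turns only.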
- The cached prefix must be at least 1,024 tokens for Anthropic (2,048 for Claude 3.5 Haiku). Content shorter than that will not be cached.
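Before sending a request, it can help to count how many breakpoints it carries. The sketch below is a hypothetical pre-flight check, not part of the SDK or of Requesty's API. It assumes plain-dict request parts, counts only top-level blocks, and takes the limit of four from Anthropic's native API.

```python
MAX_BREAKPOINTS = 4  # assumption: Anthropic's native per-request limit


def _marked(blocks):
    """Count blocks in a list that carry a cache_control key."""
    if not isinstance(blocks, list):
        return 0  # plain-string system prompts and message content have none
    return sum(1 for b in blocks if isinstance(b, dict) and "cache_control" in b)


def count_breakpoints(system=None, tools=None, messages=None):
    """Count cache_control breakpoints across system, tools, and messages."""
    total = _marked(system) + _marked(tools)
    for message in messages or []:
        total += _marked(message.get("content"))
    return total


# Mirrors the shape of the first example: two system blocks and one tool.
request = {
    "system": [
        {"type": "text", "text": "You are a helpful coding assistant.",
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": "<codebase>...</codebase>",
         "cache_control": {"type": "ephemeral"}},
    ],
    "tools": [
        {"name": "search_code", "input_schema": {"type": "object"},
         "cache_control": {"type": "ephemeral"}},
    ],
    "messages": [
        {"role": "user", "content": "How does the authentication middleware work?"},
    ],
}
used = count_breakpoints(**request)
print(f"{used} of {MAX_BREAKPOINTS} breakpoints used")
if used > MAX_BREAKPOINTS:
    print("Too many breakpoints: remove markers from older turns.")
```

This check does not verify the minimum prefix length, because that requires a token count from the model's tokenizer.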