If you’re using the Anthropic SDK directly against Requesty’s /v1/messages endpoint, you can place cache_control breakpoints on individual content blocks — exactly like the native Anthropic prompt caching API. This gives you fine-grained control over which parts of your request are cached. cache_control can be placed on any content block type: system text, user text, images, documents, tool definitions, tool_use, and tool_result.

Python

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_REQUESTY_API_KEY",
    base_url="https://router.requesty.ai",
)

response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant.",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=[
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            },
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "How does the authentication middleware work?"
                }
            ]
        }
    ],
)

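# Because tools are supplied, the reply may contain tool_use blocks as well as
# text, so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)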

# Cache usage is reported in the response:
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai",
});

const response = await client.messages.create({
  model: "anthropic/claude-sonnet-4-20250514",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: "You are a helpful coding assistant.",
      cache_control: { type: "ephemeral" },
    },
    {
      type: "text",
      text: "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
      cache_control: { type: "ephemeral" },
    },
  ],
  tools: [
    {
      name: "search_code",
      description: "Search the codebase for a pattern",
      input_schema: {
        type: "object" as const,
        properties: {
          query: { type: "string" },
        },
        required: ["query"],
      },
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "How does the authentication middleware work?",
        },
      ],
    },
  ],
});

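// Content blocks are a union type, so narrow to text before reading `.text`.
const first = response.content[0];
if (first.type === "text") {
  console.log(first.text);
}

// Cache usage is reported in the response:
console.log(`Cache creation tokens: ${response.usage.cache_creation_input_tokens}`);
console.log(`Cache read tokens: ${response.usage.cache_read_input_tokens}`);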
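
The two cache counters reported in the usage object can be combined into a single hit ratio, which is handy for checking whether breakpoints are paying off. The following is a minimal sketch, not part of the Requesty or Anthropic SDKs: `summarize_cache_usage` and `CacheStats` are hypothetical names, and the code assumes Anthropic's accounting, in which `input_tokens` counts only the uncached part of the prompt. The sample numbers are made up.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class CacheStats:
    total_input_tokens: int
    cache_read_tokens: int
    cache_write_tokens: int
    hit_ratio: float


def summarize_cache_usage(usage) -> CacheStats:
    """Summarize cache activity from a Messages API usage object.

    Assumption: input_tokens excludes cached tokens, so the total prompt
    size is the sum of all three counters.
    """
    # The cache fields may be None or missing, so treat those cases as 0.
    uncached = getattr(usage, "input_tokens", 0) or 0
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    total = uncached + written + read
    ratio = read / total if total else 0.0
    return CacheStats(total, read, written, ratio)


# Stand-in for `response.usage`; the values are illustrative only.
example_usage = SimpleNamespace(
    input_tokens=100,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=900,
)
stats = summarize_cache_usage(example_usage)
print(f"{stats.hit_ratio:.0%} of input tokens were read from cache")
```

In real code, pass `response.usage` from a `client.messages.create(...)` call instead of the stand-in object. A first request typically reports cache creation tokens and no reads; later requests with the same prefix should report reads.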

Multi-turn with tool results

In agentic flows with tool calls, place cache_control on the last content block of each turn to cache the conversation prefix:

Python

response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=8192,
    system=[
        {
            "type": "text",
            "text": "You are an AI coding assistant.",
            "cache_control": {"type": "ephemeral"}
        }
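    ],
    # A history containing tool_use / tool_result blocks needs the matching
    # tool definitions to be sent with the request.
    tools=[
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"]
            }
        }
    ],
    messages=[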
        {"role": "user", "content": "Find all usages of the deprecated API."},
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "I'll search for the deprecated API usage."},
                {
                    "type": "tool_use",
                    "id": "toolu_01ABC",
                    "name": "search_code",
                    "input": {"query": "deprecated_api"},
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "tool_use_id": "toolu_01ABC",
                    "type": "tool_result",
                    "content": "Found 3 usages in src/api.py, src/handler.py, src/utils.py",
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {"role": "user", "content": "Now refactor all of them to use the new API."}
    ],
)
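
As a conversation grows, breakpoints from earlier requests pile up in the history. The sketch below shows one way to keep them on the most recent turns only. It is an illustration under assumptions, not an SDK feature: `mark_recent_turns` and its `keep` parameter are hypothetical, messages are assumed to be plain dicts as in the example above, and the budget is motivated by Anthropic's native API accepting at most 4 breakpoints per request.

```python
import copy

EPHEMERAL = {"type": "ephemeral"}


def _as_blocks(content):
    """Normalize string content into a list of content blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content


def mark_recent_turns(messages, keep=2):
    """Return a copy of `messages` with breakpoints only on the last `keep` turns."""
    marked = copy.deepcopy(messages)  # never mutate the caller's history
    for index, message in enumerate(marked):
        blocks = _as_blocks(message["content"])
        # Drop breakpoints left over from earlier requests.
        for block in blocks:
            block.pop("cache_control", None)
        # Put a breakpoint on the final block of each of the newest turns.
        if index >= len(marked) - keep and blocks:
            blocks[-1]["cache_control"] = dict(EPHEMERAL)
        message["content"] = blocks
    return marked


history = [
    {"role": "user", "content": "Find all usages of the deprecated API."},
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "I'll search for the deprecated API usage.",
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Now refactor all of them to use the new API."},
]
marked = mark_recent_turns(history, keep=1)
```

The result can be passed as `messages=marked`. Breakpoints on system blocks and tool definitions are untouched by this helper and still count toward the per-request total.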

Best practices

  • Place breakpoints on the first and last system prompt blocks, so the shared leading instructions can be served from cache even when later system blocks differ between requests.
  • Place a breakpoint on the last tool definition so your tool schema is cached.
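  • In multi-turn conversations, place a breakpoint on the last content block of each turn to incrementally cache the conversation history. Anthropic accepts at most 4 breakpoints per request, so in long conversations keep them on the most recent turns.
  • The cached prefix must be at least 1,024 tokens for Anthropic models (2,048 for Claude 3.5 Haiku). A shorter prefix is processed normally but is not cached, even if it carries cache_control.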
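
The placement rules above can be checked before a request is sent. The following sketch counts breakpoints across the system prompt, tool definitions, and messages. `count_breakpoints` and `MAX_BREAKPOINTS` are hypothetical names; the limit of 4 comes from Anthropic's native prompt caching documentation and is assumed to apply through Requesty as well. The request values are abbreviated from the first example on this page.

```python
# Assumption: Anthropic's per-request limit also applies via Requesty.
MAX_BREAKPOINTS = 4


def count_breakpoints(system=None, tools=None, messages=None) -> int:
    """Count cache_control markers across system blocks, tools, and message blocks."""
    count = 0
    # A plain-string system prompt cannot carry a breakpoint.
    if isinstance(system, list):
        count += sum("cache_control" in block for block in system)
    count += sum("cache_control" in tool for tool in tools or [])
    for message in messages or []:
        content = message["content"]
        # String content has no blocks, hence no breakpoints.
        if isinstance(content, list):
            count += sum("cache_control" in block for block in content)
    return count


request = {
    "system": [
        {
            "type": "text",
            "text": "You are a helpful coding assistant.",
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": "Here is the full codebase:\n<codebase>...</codebase>",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "tools": [
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {"type": "object"},
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "How does the authentication middleware work?"}
    ],
}

used = count_breakpoints(**request)
if used > MAX_BREAKPOINTS:
    raise ValueError(f"{used} breakpoints exceeds the limit of {MAX_BREAKPOINTS}")
print(f"{used} of {MAX_BREAKPOINTS} breakpoints used")
```

This check covers only the number of breakpoints. Whether a given prefix meets the minimum token length still depends on the model's tokenizer.
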
Last modified on May 21, 2026