Messages API Caching

If you're using the Anthropic SDK directly against Requesty's /v1/messages endpoint, you can place cache_control breakpoints on individual content blocks, exactly as with the native Anthropic prompt caching API. This gives you fine-grained control over which parts of your request are cached. A cache_control breakpoint can be placed on any content block type: system text, user text, images, documents, tool definitions, tool_use, and tool_result.

Python

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_REQUESTY_API_KEY",
    base_url="https://router.requesty.ai",
)

response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant.",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=[
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            },
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "How does the authentication middleware work?"
                }
            ]
        }
    ],
)

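# Print the text blocks; with tools defined, the model may also return tool_use blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)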

# Cache usage is reported in the response:
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
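To check whether a breakpoint is paying off, the usage counters above can be folded into a single hit ratio. The following is a minimal sketch, not part of the Requesty or Anthropic SDKs: summarize_cache_usage is a hypothetical helper, the numbers are made up, and it assumes Requesty passes the usage fields through with Anthropic's semantics, where input_tokens counts only the tokens that were neither read from nor written to the cache.

```python
from types import SimpleNamespace


def summarize_cache_usage(usage):
    """Return cached/uncached input token counts and the cache hit ratio.

    Hypothetical helper: `usage` is any object shaped like response.usage.
    """
    # The cache fields may be missing or None when no caching took place.
    created = getattr(usage, "cache_creation_input_tokens", None) or 0
    read = getattr(usage, "cache_read_input_tokens", None) or 0
    uncached = getattr(usage, "input_tokens", None) or 0
    total = created + read + uncached
    return {
        "cache_write_tokens": created,
        "cache_read_tokens": read,
        "uncached_input_tokens": uncached,
        "total_input_tokens": total,
        "cache_hit_ratio": read / total if total else 0.0,
    }


# Stand-in for response.usage, with made-up numbers for illustration.
example_usage = SimpleNamespace(
    input_tokens=50,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=1950,
)
summary = summarize_cache_usage(example_usage)
print(summary)
```

On a first request you would expect cache_write_tokens to be non-zero and cache_read_tokens to be zero; on a repeat request within the cache lifetime the two should swap.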

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai",
});

const response = await client.messages.create({
  model: "anthropic/claude-sonnet-4-20250514",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: "You are a helpful coding assistant.",
      cache_control: { type: "ephemeral" },
    },
    {
      type: "text",
      text: "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
      cache_control: { type: "ephemeral" },
    },
  ],
  tools: [
    {
      name: "search_code",
      description: "Search the codebase for a pattern",
      input_schema: {
        type: "object" as const,
        properties: {
          query: { type: "string" },
        },
        required: ["query"],
      },
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "How does the authentication middleware work?",
        },
      ],
    },
  ],
});

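// Print the text blocks; with tools defined, the model may also return tool_use blocks.
for (const block of response.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}

// Cache usage is reported in the response:
console.log(`Cache creation tokens: ${response.usage.cache_creation_input_tokens}`);
console.log(`Cache read tokens: ${response.usage.cache_read_input_tokens}`);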

Multi-turn with tool results

In agentic flows with tool calls, place cache_control on the last content block of each turn to cache the conversation prefix:

# Reuses the client created in the Python example above.
response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=8192,
    system=[
        {
            "type": "text",
            "text": "You are an AI coding assistant.",
            "cache_control": {"type": "ephemeral"}
        }
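    ],
    # Requests that contain tool_use / tool_result blocks should also define the tools.
    tools=[
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"]
            }
        }
    ],
    messages=[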
        {"role": "user", "content": "Find all usages of the deprecated API."},
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "I'll search for the deprecated API usage."},
                {
                    "type": "tool_use",
                    "id": "toolu_01ABC",
                    "name": "search_code",
                    "input": {"query": "deprecated_api"},
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "tool_use_id": "toolu_01ABC",
                    "type": "tool_result",
                    "content": "Found 3 usages in src/api.py, src/handler.py, src/utils.py",
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {"role": "user", "content": "Now refactor all of them to use the new API."}
    ],
)
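In a longer agent loop it is tedious to add cache_control by hand on every turn. One possible approach is a small helper that marks the final content block of the history just before each request. This is only a sketch: with_breakpoint_on_last_block is a hypothetical name, not an SDK function, and it assumes messages are plain dicts in the shape shown above.

```python
import copy

CACHE_CONTROL = {"type": "ephemeral"}


def with_breakpoint_on_last_block(messages):
    """Return a copy of messages with cache_control on the final content block.

    Hypothetical helper; the input list is left unmodified.
    """
    marked = copy.deepcopy(messages)
    if not marked:
        return marked
    last = marked[-1]
    content = last["content"]
    # A plain string has no block to attach cache_control to, so wrap it first.
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
        last["content"] = content
    if content:
        content[-1]["cache_control"] = dict(CACHE_CONTROL)
    return marked


history = [
    {"role": "user", "content": "Find all usages of the deprecated API."},
    {"role": "assistant", "content": [{"type": "text", "text": "Found 3 usages."}]},
    {"role": "user", "content": "Now refactor all of them to use the new API."},
]
marked = with_breakpoint_on_last_block(history)
print(marked[-1]["content"])
```

Because the helper works on a copy, the stored history stays free of markers, and each request carries only one message-level breakpoint on its newest block.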

Best practices

Place breakpoints on the first and last system prompt blocks to cache combinations of system instructions.
Place a breakpoint on the last tool definition so your tool schema is cached.
In multi-turn conversations, place a breakpoint on the last content block of each turn to incrementally cache the conversation history.
The cached prefix must be at least 1,024 tokens for Anthropic (2,048 for Claude 3.5 Haiku). Content shorter than that will not be cached.
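When following these practices it is easy to accumulate more breakpoints than intended. Anthropic's native API documents a limit of four cache breakpoints per request; whether Requesty enforces the same limit is an assumption here. The sketch below uses a hypothetical helper, count_breakpoints, to count top-level cache_control markers in a request before it is sent. It does not look inside nested content such as the blocks of a tool_result.

```python
# Anthropic's documented per-request limit; assumed to apply through Requesty too.
MAX_BREAKPOINTS = 4


def count_breakpoints(system=None, tools=None, messages=None):
    """Count system blocks, tool definitions, and message blocks with cache_control."""
    count = 0
    # system may be a plain string, which cannot carry a breakpoint.
    if isinstance(system, list):
        count += sum(1 for block in system if "cache_control" in block)
    count += sum(1 for tool in tools or [] if "cache_control" in tool)
    for message in messages or []:
        content = message.get("content")
        # String content has no blocks, so it cannot carry a breakpoint either.
        if isinstance(content, list):
            count += sum(1 for block in content if "cache_control" in block)
    return count


request = {
    "system": [
        {"type": "text", "text": "You are a helpful coding assistant.",
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": "Here is the full codebase: ...",
         "cache_control": {"type": "ephemeral"}},
    ],
    "tools": [
        {"name": "search_code", "description": "Search the codebase for a pattern",
         "input_schema": {"type": "object", "properties": {}},
         "cache_control": {"type": "ephemeral"}},
    ],
    "messages": [
        {"role": "user", "content": "How does the authentication middleware work?"},
    ],
}
used = count_breakpoints(**request)
print(f"{used} of {MAX_BREAKPOINTS} breakpoints used")
```

If the count exceeds the limit, a reasonable policy is to drop the oldest message-level breakpoints first, since the system and tool breakpoints cover the most stable part of the prefix.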
