# Messages API Caching

> Fine-grained cache control with the Anthropic SDK, using `cache_control` breakpoints on content blocks

If you're using the Anthropic SDK directly against Requesty's `/v1/messages` endpoint, you can place `cache_control` breakpoints on individual content blocks, exactly like the [native Anthropic prompt caching API](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching). This gives you fine-grained control over which parts of your request are cached.

<Frame caption="You set the cache breakpoint on the latest user message. Everything up to and including it is cached and reused on the next turn.">
  <img src="https://mintcdn.com/requesty/qjPoKXyN196jjWse/images/prompt_caching.png?fit=max&auto=format&n=qjPoKXyN196jjWse&q=85&s=a25b4276e308ce005939d5988a39503c" alt="Prompt caching in a multi-turn conversation: a cache breakpoint is set on the latest user message each turn, and the cached prefix is reused on the following turn." width="1536" height="1024" data-path="images/prompt_caching.png" />
</Frame>

<Note>
  **[View cache analytics](https://app.requesty.ai/analytics/cache)** in the Requesty Console.
</Note>

`cache_control` can be placed on any content block type: **system text**, **user text**, **images**, **documents**, **tool definitions**, **tool\_use**, and **tool\_result**.

## Python

```python theme={"dark"}
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_REQUESTY_API_KEY",
    base_url="https://router.requesty.ai",
)

response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant.",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=[
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            },
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "How does the authentication middleware work?"
                }
            ]
        }
    ],
)

print(response.content[0].text)

# Cache usage is reported in the response:
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
```
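
To check how much of a prompt was actually served from cache, the usage fields above can be combined with `input_tokens`. The sketch below assumes Anthropic's accounting, in which `input_tokens` counts only the uncached tokens, so the total prompt size is the sum of all three fields. `cache_hit_ratio` is a hypothetical helper name, not part of the SDK.

```python
def cache_hit_ratio(usage) -> float:
    """Fraction of prompt tokens read from cache, between 0.0 and 1.0.

    `usage` is any object exposing the usage fields, e.g. `response.usage`.
    Missing or None fields are treated as 0.
    """
    read = getattr(usage, "cache_read_input_tokens", None) or 0
    created = getattr(usage, "cache_creation_input_tokens", None) or 0
    uncached = getattr(usage, "input_tokens", None) or 0
    # Assumption: total prompt tokens = uncached + cache writes + cache reads.
    total = read + created + uncached
    return read / total if total else 0.0
```

On the first request you would expect a ratio near zero with a non-zero creation count; on a repeat request with the same prefix the ratio should rise, for example `print(f"{cache_hit_ratio(response.usage):.0%}")`.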

## TypeScript

```typescript theme={"dark"}
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai",
});

const response = await client.messages.create({
  model: "anthropic/claude-sonnet-4-20250514",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: "You are a helpful coding assistant.",
      cache_control: { type: "ephemeral" },
    },
    {
      type: "text",
      text: "Here is the full codebase:\n<codebase>... (large content) ...</codebase>",
      cache_control: { type: "ephemeral" },
    },
  ],
  tools: [
    {
      name: "search_code",
      description: "Search the codebase for a pattern",
      input_schema: {
        type: "object" as const,
        properties: {
          query: { type: "string" },
        },
        required: ["query"],
      },
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "How does the authentication middleware work?",
        },
      ],
    },
  ],
});

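// Content blocks are a union type; narrow to a text block before reading `.text`.
const first = response.content[0];
if (first.type === "text") {
  console.log(first.text);
}

// Cache usage is reported in the response:
console.log(`Cache creation tokens: ${response.usage.cache_creation_input_tokens}`);
console.log(`Cache read tokens: ${response.usage.cache_read_input_tokens}`);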
```

## Multi-turn with tool results

You set the cache breakpoint on the latest user message. Everything up to and including that message is cached and reused on the next turn, so only the content added after the previous breakpoint has to be processed from scratch.

In agentic flows with tool calls, place `cache_control` on the last content block of each turn to cache the conversation prefix:

```python theme={"dark"}
response = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=8192,
    system=[
        {
            "type": "text",
            "text": "You are an AI coding assistant.",
            "cache_control": {"type": "ephemeral"}
        }
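    ],
    # Declare the tool referenced by the tool_use block below; the Anthropic
    # API expects a tool definition when the history contains tool blocks.
    tools=[
        {
            "name": "search_code",
            "description": "Search the codebase for a pattern",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ],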
    messages=[
        {"role": "user", "content": "Find all usages of the deprecated API."},
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "I'll search for the deprecated API usage."},
                {
                    "type": "tool_use",
                    "id": "toolu_01ABC",
                    "name": "search_code",
                    "input": {"query": "deprecated_api"},
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "tool_use_id": "toolu_01ABC",
                    "type": "tool_result",
                    "content": "Found 3 usages in src/api.py, src/handler.py, src/utils.py",
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {"role": "user", "content": "Now refactor all of them to use the new API."}
    ],
)
```
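
In a longer conversation, the per-turn placement described above can be automated. The following is a minimal sketch, assuming messages are plain dicts as in the example (not SDK response objects); `with_rolling_breakpoint` is a hypothetical helper name. It keeps a single message-level breakpoint on the final content block and removes earlier ones. That is one possible policy, not the only one.

```python
import copy

EPHEMERAL = {"type": "ephemeral"}

def with_rolling_breakpoint(messages: list[dict]) -> list[dict]:
    """Return a copy of `messages` with one breakpoint on the final block.

    Earlier message-level breakpoints are removed so the number of
    breakpoints does not grow with the conversation.
    """
    out = copy.deepcopy(messages)  # leave the caller's history untouched
    for msg in out:
        # A plain string cannot carry cache_control, so wrap it in a text block.
        if isinstance(msg["content"], str):
            msg["content"] = [{"type": "text", "text": msg["content"]}]
        for block in msg["content"]:
            block.pop("cache_control", None)
    if out and out[-1]["content"]:
        out[-1]["content"][-1]["cache_control"] = dict(EPHEMERAL)
    return out
```

You would then pass `messages=with_rolling_breakpoint(history)` on each turn. This relies on the provider matching the previously cached prefix when it looks back from the new breakpoint; if a single turn appends many blocks, keeping the previous turn's breakpoint as well may be the safer choice.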

## Best practices

* Place breakpoints on the **first and last system prompt blocks** so that the shared leading instructions and the full system prompt are each cached as their own prefix.
* Place a breakpoint on the **last tool definition** so your tool schema is cached.
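* In multi-turn conversations, place a breakpoint on the **last content block of each turn** to cache the conversation history incrementally. Anthropic allows at most four breakpoints per request, so drop older ones as the conversation grows.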
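* The cached prefix must be at least 1,024 tokens for Anthropic models (2,048 for Claude 3.5 Haiku); minimums vary by model, so check the documentation for the one you use. Shorter content is not cached, even when marked with `cache_control`.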
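
Breakpoints on system blocks, tool definitions, and conversation turns add up, so it can help to count them before sending a request. The sketch below assumes dict-shaped request parameters like those in the examples on this page. `count_breakpoints` and `MAX_BREAKPOINTS` are hypothetical names, and the limit of four comes from Anthropic's prompt caching documentation; whether Requesty enforces the same limit is an assumption.

```python
MAX_BREAKPOINTS = 4  # per-request limit documented by Anthropic (assumed to apply here)

def count_breakpoints(system=(), tools=(), messages=()) -> int:
    """Count cache_control markers on system blocks, tools, and message blocks.

    Blocks nested inside a tool_result's own content are not inspected.
    """
    # A plain-string system prompt has no blocks and so no breakpoints.
    blocks = [] if isinstance(system, str) else list(system)
    blocks += list(tools)
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):  # string content cannot carry cache_control
            blocks += content
    return sum(1 for b in blocks if isinstance(b, dict) and b.get("cache_control"))
```

A guard such as `assert count_breakpoints(system, tools, messages) <= MAX_BREAKPOINTS` before calling `client.messages.create(...)` catches an over-marked request locally instead of at the API.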