# Manual Caching for Anthropic Models

> Fine-grained control over prompt caching with Anthropic using cache_control blocks

The router lets you control prompt caching for Anthropic models manually by adding `cache_control` blocks to message content.
You mark specific portions of a prompt for caching and set a TTL (time-to-live) on each,
so you decide exactly what is cached and for how long.

<Note>
  **[View cache analytics](https://app.requesty.ai/analytics/cache)** in the Requesty Console.
</Note>

## How Manual Caching Works

Manual caching uses `cache_control` blocks embedded within message content to explicitly specify which portions of your prompt should be cached.
This approach gives you fine-grained control over:

* **What gets cached**: Mark specific content blocks for caching
* **Cache duration**: Set custom TTL (time-to-live) values

## Content Structure with cache\_control

To use manual caching, structure your message content as an array of content blocks, and add the `cache_control` field as desired:

```json theme={"dark"}
{
  "role": "system",
  "content": [
    {
      "type": "text",
      "text": "Your prompt content here...",
      "cache_control": {
        "type": "ephemeral",
        "ttl": "1h"
      }
    }
  ]
}
```
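
Because `cache_control` is set per content block, one message can mix marked and unmarked blocks. The sketch below shows one way to build such a message in Python. The helper name `text_block` and the sample strings are hypothetical and not part of the router's API; only the block shape is taken from the JSON above.

```python
from typing import Optional


def text_block(text: str, ttl: Optional[str] = None) -> dict:
    """Build a text content block; attach cache_control only when a TTL is given.

    `text_block` is a hypothetical helper used for illustration.
    """
    block = {"type": "text", "text": text}
    if ttl is not None:
        block["cache_control"] = {"type": "ephemeral", "ttl": ttl}
    return block


system_message = {
    "role": "system",
    "content": [
        # Large, stable content: marked for caching.
        text_block("Reference material that rarely changes...", ttl="5m"),
        # Short instruction: left unmarked.
        text_block("Answer concisely."),
    ],
}
```

Putting the stable content first and the changing content after it keeps the marked portion identical across requests.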

### TTL Field

The `ttl` (time-to-live) field specifies how long the content remains cached. The default value is `5m`.
A value of `1h` is also accepted, but only when the request is served by Anthropic or Vertex; other providers do not support it.
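
If the same code targets several providers, the TTL rule can be applied in one place. The following is a minimal sketch, assuming model IDs follow a `<provider>/<model-name>` pattern and that the provider prefixes are `anthropic` and `vertex`; the `vertex` prefix, the function name `cache_control_for`, and the `other-provider/some-model` ID are assumptions made for illustration.

```python
# Providers that accept the 1h TTL, per the rule stated above.
# The exact prefix strings are assumptions about how model IDs are written.
LONG_TTL_PROVIDERS = {"anthropic", "vertex"}


def cache_control_for(model: str, prefer_long: bool = False) -> dict:
    """Return a cache_control dict, using 1h only where it is supported.

    Hypothetical helper: falls back to the default 5m for any other provider.
    """
    provider = model.split("/", 1)[0].lower()
    ttl = "1h" if prefer_long and provider in LONG_TTL_PROVIDERS else "5m"
    return {"type": "ephemeral", "ttl": ttl}


print(cache_control_for("anthropic/claude-3-7-sonnet-latest", prefer_long=True))
print(cache_control_for("other-provider/some-model", prefer_long=True))
```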

## Example with Manual Caching

This example uses the OpenAI client, in Python and in JavaScript, to cache a large knowledge base placed in the system prompt and reuse it on a second request.

### Python

```python theme={"dark"}
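import os

import openai

# Read the API key from the environment instead of hard-coding it.
# REQUESTY_API_KEY is an assumed variable name; the placeholder is only a fallback.
requesty_api_key = os.environ.get("REQUESTY_API_KEY", "YOUR_REQUESTY_API_KEY")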

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

# Large system prompt to be cached
knowledgebase = "YOUR ENTIRE KNOWLEDGEBASE..."  # Replace with your actual long prompt

# First request - this will cache the system prompt
system_message = {
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": f"Your role is to answer questions based on: {knowledgebase}",
            "cache_control": {"type": "ephemeral", "ttl": "1h"}
        }
    ]
}

response1 = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-latest",
    messages=[
        system_message,
        {"role": "user", "content": "What is covered in the knowledge base?"}
    ]
)

print("Response:", response1.choices[0].message.content)
print("Caching tokens (first request):", response1.usage.prompt_tokens_details.caching_tokens)
print("Cached tokens (first request):", response1.usage.prompt_tokens_details.cached_tokens)

# Second request - this will use the cached system prompt
response2 = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-latest",
    messages=[
        system_message,
        {"role": "user", "content": "Tell me more details."}
    ]
)

print("Response:", response2.choices[0].message.content)
print("Caching tokens (second request):", response2.usage.prompt_tokens_details.caching_tokens)
print("Cached tokens (second request):", response2.usage.prompt_tokens_details.cached_tokens)
```
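
When comparing the two requests, it can help to read both counters through one function that tolerates missing fields. This is a sketch under stated assumptions: the field names come from the example above, while reading `caching_tokens` as tokens written to the cache and `cached_tokens` as tokens read from it is an interpretation of that example. `cache_usage` is a hypothetical helper, and the stand-in response with its token count is made-up data so the block runs without a network call.

```python
from types import SimpleNamespace


def cache_usage(response) -> dict:
    """Read the cache counters from a chat completion, defaulting to 0 when absent."""
    usage = getattr(response, "usage", None)
    details = getattr(usage, "prompt_tokens_details", None)
    return {
        "caching_tokens": getattr(details, "caching_tokens", 0) or 0,
        "cached_tokens": getattr(details, "cached_tokens", 0) or 0,
    }


# Stand-in for a real response; the numbers are invented for illustration.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(
        prompt_tokens_details=SimpleNamespace(caching_tokens=0, cached_tokens=2048)
    )
)
print(cache_usage(fake_response))
```

With a real client, `cache_usage(response1)` and `cache_usage(response2)` can be compared directly to see whether the second request hit the cache.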

### JavaScript

```javascript theme={"dark"}
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai/v1",
});

const knowledgebase = "YOUR ENTIRE KNOWLEDGEBASE...";

// First request - caches the system prompt
const systemMessage = {
  role: "system",
  content: [
    {
      type: "text",
      text: `Your role is to answer questions based on: ${knowledgebase}`,
      cache_control: { type: "ephemeral", ttl: "1h" }
    }
  ]
};

const response1 = await client.chat.completions.create({
  model: "anthropic/claude-3-7-sonnet-latest",
  messages: [
    systemMessage,
    { role: "user", content: "What is covered in the knowledge base?" }
  ]
});

console.log("Response:", response1.choices[0].message.content);
console.log("Caching tokens:", response1.usage.prompt_tokens_details.caching_tokens);

// Second request - uses cached system prompt
const response2 = await client.chat.completions.create({
  model: "anthropic/claude-3-7-sonnet-latest",
  messages: [
    systemMessage,
    { role: "user", content: "Tell me more details." }
  ]
});

console.log("Response:", response2.choices[0].message.content);
console.log("Cached tokens:", response2.usage.prompt_tokens_details.cached_tokens);
```
