The router provides an auto_cache flag that lets you explicitly control caching behavior for your requests on supported providers, giving you fine-grained control over when a request's response is cached or served from the cache.

How Auto Cache Works

The auto_cache flag is a boolean parameter that can be sent within a custom requesty field in your request payload.

  • "auto_cache": true: This will instruct the router to attempt to cache the response from the provider. If a similar request has been cached previously, it might be served from the cache (depending on the provider’s caching strategy and TTL).
  • "auto_cache": false: This will instruct the router to bypass any automatic caching logic for this specific request and always fetch a fresh response from the provider.
  • If auto_cache is not provided: The router falls back to a default caching behavior, which can depend on the origin of the request (e.g., calls from Cline or Roo Code default to caching).

This flag provides an explicit override to the default caching logic determined by the request origin or other implicit factors.

How to Use Auto Cache

To use the auto_cache flag, include it within the requesty object in your request.

{
  "model": "openai/gpt-4",
  "messages": [{"role": "user", "content": "Tell me a joke."}],
  "requesty": {
    "auto_cache": true
  }
}
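
Conversely, to bypass caching for a specific request (for example, a one-off prompt where a cache write would only add cost), set the flag to false:

{
  "model": "openai/gpt-4",
  "messages": [{"role": "user", "content": "Tell me a joke."}],
  "requesty": {
    "auto_cache": false
  }
}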

Example with Auto Cache

These examples demonstrate how to set the auto_cache flag using the OpenAI client libraries. In Python, the requesty field is passed via the extra_body parameter; in JavaScript, it is included directly in the request options.

Python

import openai

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

system_prompt = "YOUR ENTIRE KNOWLEDGEBASE"  # Replace this with your actual long prompt

response = client.chat.completions.create(
    model="vertex/anthropic/claude-3-7-sonnet",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is the capital of France?"}
    ],
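    # extra_body merges non-standard fields (here, the "requesty" object) into the JSON request body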
    extra_body={
        "requesty": {
            "auto_cache": True
        }
    }
)
print("Response:", response.choices[0].message.content)

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai/v1",
});

// Make request with auto_cache enabled
const response = await client.chat.completions.create({
  model: "anthropic/claude-3-7-sonnet-latest",
  messages: [
    { role: "system", content: "YOUR ENTIRE KNOWLEDGEBASE" },
    { role: "user", content: "What is the capital of France?" }
  ],
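  // Router-specific options; not part of the standard OpenAI request schema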
  requesty: {
    auto_cache: true
  }
});

console.log("Response:", response.choices[0].message.content);

Important Notes

  1. Explicit Control: auto_cache gives you explicit, per-request control: true attempts to cache the response, while false prevents caching, which is useful for providers where cache writes incur extra costs.
  2. Default Behavior: If auto_cache is not specified in the requesty field, the caching behavior reverts to the defaults described above.
  3. Provider Support: This flag is respected by providers/models where cache writes incur extra costs, e.g. Anthropic and Gemini.
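
As a minimal sketch of note 1, the snippet below reuses the Python client from the example above and opts out of caching for a short, one-off request where a cache write would only add cost; the model name and prompt are illustrative.

# Minimal sketch: opt out of caching for a one-off request
one_off = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-latest",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize caching in one sentence."}],
    extra_body={
        "requesty": {
            "auto_cache": False  # skip the cache write for this request
        }
    },
)
print("Response:", one_off.choices[0].message.content)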