Fallback Policies automatically retry your requests with different models if one fails, ensuring your application stays reliable even when individual providers have issues.
Set up fallback policies in the Requesty Console.

How It Works

1. Request sent to primary model

Your request goes to the primary model first.

2. Automatic failover on failure

If the request fails (timeout, rate limit, provider error, and so on), the router retries according to the policy and then tries the next model in the chain.

3. Transparent response

Your application receives the first successful response; the failed attempts are not surfaced to your code.
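The three steps above can be pictured with a small client-side simulation. This is only a sketch of the idea, not Requesty's implementation: `run_chain`, `AllModelsFailed`, and `flaky_call` are hypothetical names, and the real router performs this logic server-side.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed (hypothetical name)."""


def run_chain(models: Sequence[str], call: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in priority order; return (model, response) for the first success."""
    failures: list[tuple[str, Exception]] = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # timeout, rate limit, provider error, ...
            failures.append((model, exc))
    # Nothing succeeded: surface every recorded failure to the caller.
    raise AllModelsFailed(failures)


def flaky_call(model: str) -> str:
    # Stand-in for a provider call: the primary always times out here.
    if model == "anthropic/claude-sonnet-4-5":
        raise TimeoutError("primary timed out")
    return f"response from {model}"


served_by, text = run_chain(
    ["anthropic/claude-sonnet-4-5", "bedrock/claude-sonnet-4-5@eu-central-1"],
    flaky_call,
)
print(served_by, "->", text)
```

The caller only sees the successful result; the timeout on the primary stays inside `run_chain`, which mirrors the "transparent response" step.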

Benefits

Higher Success Rates

No more failed requests due to provider issues.

Zero Downtime

Automatic failover without code changes.

Cost Optimization

Start with cheaper models, fall back to premium ones only when needed.

No Stalled Workflows

Your users never see “model unavailable” errors.

Creating a Fallback Policy

1. Create the Policy

Go to Routing Policies, click Create Policy, and select Fallback Chain as the policy type.

2. Configure Your Fallback Chain

Set up your models in priority order. For example:

| Priority | Model | Retries |
| --- | --- | --- |
| 1st | anthropic/claude-sonnet-4-5 | 1 retry |
| 2nd | bedrock/claude-sonnet-4-5@eu-central-1 | 1 retry |

The router will try each model in order, retrying the configured number of times before moving to the next.

3. Use the Policy in Your Code

Change your model parameter to reference your policy:
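```python
from openai import OpenAI

# Point the OpenAI SDK at the Requesty router.
client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="your-requesty-api-key"
)

# Pass the policy reference where a model name would normally go.
response = client.chat.completions.create(
    model="policy/sonnet",
    messages=[{"role": "user", "content": "Hello!"}]
)
```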
To find your policy reference, go to Routing Policies, click the copy button next to your policy name, and paste it directly into your model parameter.
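As a follow-up to the snippet above, the helper below wraps the call so the policy reference lives in one place. It is a minimal sketch: `ask` is a hypothetical name, and it assumes the response's `model` field names the model that actually served the request, which this page does not confirm. A stand-in client is used so the example runs offline; in real code you would pass the `OpenAI` client configured above.

```python
from types import SimpleNamespace


def ask(client, prompt: str, policy: str = "policy/sonnet") -> tuple[str, str]:
    """Send one prompt through a fallback policy; return (model, text)."""
    response = client.chat.completions.create(
        model=policy,
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumption: `response.model` reflects the model that answered.
    return response.model, response.choices[0].message.content


def _fake_create(model, messages):
    # Offline stand-in that mimics the shape of a chat completion response.
    return SimpleNamespace(
        model="bedrock/claude-sonnet-4-5@eu-central-1",
        choices=[SimpleNamespace(message=SimpleNamespace(content="Hi there!"))],
    )


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

served_by, text = ask(fake_client, "Hello!")
print(served_by, "->", text)
```

Keeping the policy name as a default argument means switching policies is a one-line change, in line with the "no code changes" goal of routing policies.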

Use Cases

Start with cheaper models and use expensive ones only if needed:

| Priority | Model | Retries |
| --- | --- | --- |
| 1st | openai/gpt-4o-mini | 2 retries |
| 2nd | openai/gpt-4o | 1 retry |
| 3rd | openai/gpt-5.2 | 1 retry |

Distribute across providers for maximum uptime:

| Priority | Model | Retries |
| --- | --- | --- |
| 1st | openai/gpt-5.2 | 1 retry |
| 2nd | anthropic/claude-sonnet-4-5 | 1 retry |
| 3rd | google/gemini-2.5-pro | 1 retry |

Try regional endpoints before falling back to global:

| Priority | Model | Retries |
| --- | --- | --- |
| 1st | bedrock/claude-sonnet-4-5@eu-central-1 | 2 retries |
| 2nd | anthropic/claude-sonnet-4-5 | 2 retries |
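To compare the chains above, it can help to estimate how many calls a single request might trigger in the worst case. The sketch below assumes that "N retries" means N attempts in addition to the initial one, which this page does not state explicitly; `chains` and `max_attempts` are hypothetical names.

```python
# Each chain is a list of (model, retries) pairs taken from the examples above.
chains = {
    "cost": [
        ("openai/gpt-4o-mini", 2),
        ("openai/gpt-4o", 1),
        ("openai/gpt-5.2", 1),
    ],
    "multi-provider": [
        ("openai/gpt-5.2", 1),
        ("anthropic/claude-sonnet-4-5", 1),
        ("google/gemini-2.5-pro", 1),
    ],
    "regional": [
        ("bedrock/claude-sonnet-4-5@eu-central-1", 2),
        ("anthropic/claude-sonnet-4-5", 2),
    ],
}


def max_attempts(chain: list[tuple[str, int]]) -> int:
    """Worst case: every model uses its first attempt plus all of its retries."""
    return sum(1 + retries for _model, retries in chain)


for name, chain in chains.items():
    print(name, max_attempts(chain))
```

A higher worst-case count means more resilience but also a longer wait before a request finally fails, so it is worth checking against your latency budget.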

How Retries Work

Each model in the chain can have 0 to 10 retries. The router uses:

| Mechanism | Description |
| --- | --- |
| Exponential backoff | Wait time increases between retries (500ms → 1s → 2s → 4s) |
| Jitter | Random variation (±10%) to prevent a thundering herd |
| Immediate failover | On non-retryable errors (invalid request, auth failure) |

Make sure all models in your fallback chain support your request parameters (context length, streaming, tool calling, etc.). If a model cannot handle the request, the policy will skip to the next model.
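The backoff schedule above (500ms doubling each time, with ±10% jitter) can be reproduced with a few lines of Python. This is an illustrative sketch rather than the router's actual code: `backoff_delay` is a hypothetical name, and whether the real delays keep doubling or are capped beyond 4s is not stated on this page.

```python
import random


def backoff_delay(retry: int, base: float = 0.5, jitter: float = 0.10, rng=random) -> float:
    """Delay in seconds before retry number `retry` (0-based).

    The nominal delay doubles with each retry; jitter then scales it by a
    random factor drawn uniformly from [1 - jitter, 1 + jitter].
    """
    nominal = base * (2 ** retry)
    return nominal * (1 + rng.uniform(-jitter, jitter))


# With jitter disabled, the schedule matches the one in the table.
nominal = [backoff_delay(n, jitter=0.0) for n in range(4)]
print(nominal)

# With jitter enabled, each delay lands within 10% of its nominal value.
print([round(backoff_delay(n), 3) for n in range(4)])
```

Jitter matters because many clients that failed at the same moment would otherwise all retry at the same moment too, hitting the provider with a synchronized burst.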

Key Selection (BYOK)

For each model, you can choose which API key to use:

| Option | Description |
| --- | --- |
| Requesty provided key | Use Requesty's managed keys (default) |
| My own key | Use your BYOK credentials |
| Requesty first, then mine | Fall back to BYOK if the Requesty key fails |
| Mine first, then Requesty | Prefer BYOK, fall back to Requesty |
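One way to read the four options above is as an ordered list of key sources to try. The sketch below is an assumption-laden illustration: `KEY_ORDER`, `pick_key`, and the `"requesty"` / `"byok"` labels are hypothetical, and what exactly counts as a key "failing" is not defined on this page.

```python
# Each console option maps to the order in which key sources would be tried.
KEY_ORDER = {
    "Requesty provided key": ["requesty"],
    "My own key": ["byok"],
    "Requesty first, then mine": ["requesty", "byok"],
    "Mine first, then Requesty": ["byok", "requesty"],
}


def pick_key(option: str, is_working) -> "str | None":
    """Return the first key source that works for this option, or None."""
    for source in KEY_ORDER[option]:
        if is_working(source):
            return source
    return None


# Example: the Requesty-managed key is failing, the BYOK key is healthy.
only_byok_works = lambda source: source == "byok"
print(pick_key("Requesty first, then mine", only_byok_works))
print(pick_key("Requesty provided key", only_byok_works))
```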

Monitoring and Debugging

1. Open Analytics

Go to Analytics.

2. Filter by policy

Filter by your policy name to see which models succeeded, which failed, and how often fallback occurred.

FAQ

If every model in the chain fails, the request returns an error with details about the last model attempted. All of the failures appear in your request logs.

Policies can be nested: a fallback policy can reference another policy as one of its fallback options. For example, your second priority could be policy/multi-provider-backup instead of a single model.

Failed attempts are not billed. You pay only for successful requests that return tokens.

To change a policy, click the edit icon next to it on the Routing Policies page. Changes take effect immediately, with no code deployment needed.
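Nested policies, as mentioned in the FAQ, can be thought of as a chain that expands into a flat list of models. The sketch below illustrates that mental model only: `flatten` is a hypothetical helper, the contents of `policy/multi-provider-backup` are invented for the example, and how the router actually resolves nested or circular references is not documented on this page.

```python
def flatten(policy: str, policies: dict[str, list[str]], _seen: tuple = ()) -> list[str]:
    """Expand a policy into the ordered list of concrete models it would try."""
    if policy in _seen:
        raise ValueError(f"circular policy reference: {policy}")
    models: list[str] = []
    for entry in policies[policy]:
        if entry.startswith("policy/"):
            # Nested policy: splice its models in at this position.
            models.extend(flatten(entry, policies, _seen + (policy,)))
        else:
            models.append(entry)
    return models


# Hypothetical policy definitions for illustration.
policies = {
    "policy/sonnet": ["anthropic/claude-sonnet-4-5", "policy/multi-provider-backup"],
    "policy/multi-provider-backup": ["openai/gpt-5.2", "google/gemini-2.5-pro"],
}

order = flatten("policy/sonnet", policies)
print(order)
```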
Last modified on May 26, 2026