Enable latency routing in the Requesty Console.
How It Works
Benefits
Fastest Responses
Always use the quickest model available.
Automatic Adaptation
Router adjusts when model performance changes.
No Manual Tuning
Latency optimization happens automatically.
Regional Optimization
Automatically prefer nearby endpoints.
Creating a Latency-Based Policy
Create the Policy
Go to Routing Policies, click Create Policy, and select Latency as the policy type.

Select Models
Add your models. For example:
The router will automatically choose whichever is faster at request time.
| Model | Description |
|---|---|
anthropic/claude-sonnet-4-5 | Direct API access |
bedrock/claude-sonnet-4-5@eu-central-1 | Regional endpoint |
How Latency Tracking Works
Requesty measures time-to-first-token (TTFT) for streaming requests and total response time for non-streaming.| Metric | Description |
|---|---|
| Streaming | Time from request sent to first token received |
| Non-streaming | Time from request sent to complete response |
| Scope | Per-model, organization-scoped (your traffic, not global) |
| Window | Rolling average of recent requests (last ~1 hour) |
Models with no recent latency data are tried occasionally to gather performance metrics. After 5 to 10 requests, the router has enough data for optimal routing.
Key Selection Strategies
For each model, you can configure which API key to try first:| Strategy | Description |
|---|---|
| Requesty provided key | Use Requesty’s managed keys only (default) |
| My own key | Use your BYOK credentials only |
| Requesty first, then BYOK | Try Requesty’s key first, fall back to BYOK |
| BYOK first, then Requesty | Try your key first, fall back to Requesty |
Use Cases
Regional Optimization
Regional Optimization
Route to the fastest regional endpoint:
Users in Europe automatically get
| Model | Region |
|---|---|
anthropic/claude-sonnet-4-5 | Global |
bedrock/claude-sonnet-4-5@us-east-1 | US East |
bedrock/claude-sonnet-4-5@eu-central-1 | Europe |
bedrock/claude-sonnet-4-5@ap-southeast-1 | Asia Pacific |
eu-central-1, users in Asia get ap-southeast-1.Provider Performance
Provider Performance
Let the router pick the fastest provider:
If OpenAI is experiencing slowdowns, traffic shifts to Anthropic or Google automatically.
| Model |
|---|
openai/gpt-5.2 |
anthropic/claude-sonnet-4-5 |
google/gemini-2.5-pro |
Cost + Speed Optimization
Cost + Speed Optimization
Combine similar-priced models and route to fastest:
All three are low-cost. Requesty picks whichever responds fastest.
| Model |
|---|
openai/gpt-4o-mini |
anthropic/claude-3-5-haiku |
google/gemini-1.5-flash |
Combining with Other Policies
Latency routing works great with load balancing and fallback.| Combination | How It Works |
|---|---|
| Latency + Load Balancing | Each sub-policy uses latency routing, parent policy does A/B testing |
| Latency + Fallback | Try latency-optimized policy first, fall back to known-good model if all fail |
Monitoring Latency
Open Performance Monitoring
Go to Performance Monitoring.
FAQ
What if a model has no latency data?
What if a model has no latency data?
Models without recent data are assigned max latency. They will be tried occasionally (~5 to 10% of traffic) to gather data. Once they have metrics, they compete fairly.
Does latency routing consider cost?
Does latency routing consider cost?
No. Latency routing only considers speed. If you want cost optimization, use load balancing to prefer cheaper models, or manually order a fallback chain by price.
Can I force a specific model for some requests?
Can I force a specific model for some requests?
Yes. Instead of using the latency policy, pass a direct model name (e.g.,
openai/gpt-5.2) for requests where you need a specific model.How often does latency data update?
How often does latency data update?
Continuously. Latency metrics are updated after every request. The router uses a rolling average of recent requests to smooth out spikes.
What happens if the fastest model fails?
What happens if the fastest model fails?
Latency routing tries models in speed order. If the fastest model fails, it tries the second-fastest, and so on.
Can I see which model was selected?
Can I see which model was selected?
Yes. Check the response headers or request logs in Analytics. You will see which model handled each request.
Unlike load balancing, latency routing does not guarantee the same user gets the same model. If your use case requires consistency, use load balancing with
trace_id instead.