Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.requesty.ai/llms.txt

Use this file to discover all available pages before exploring further.

Monitor the performance of every AI model you use, latency, error rates, throughput, and reliability, all from the Requesty analytics dashboard.
Monitor model performance in the Requesty Console.

Latency Tracking

The General tab shows a real-time latency chart with three views:
MetricWhat it measures
AverageMean response time across all requests
P50Median. 50% of requests are faster than this
P9090th percentile, only 10% of requests are slower
Switch between Average, P50, and P90 using the latency selector on the chart.

What Latency Includes

Total request latency measures the full round-trip: your request hitting Requesty → routed to the provider → model inference → response streamed back. This is the real end-to-end time your users experience.

Advanced Performance Analysis

Use the Advanced tab for deeper analysis:

Latency by Model

  • Set Metric to latency_ms
  • Set Group By to model
  • Set Calculation to P50, P90, P95, or P99
This shows you which models are fastest and which have the worst tail latency.

Latency Over Time

  • Set Time Grouping to hour or day
  • Watch for latency spikes that correlate with peak traffic or provider issues

Error Rate Analysis

  • Set Metric to requests
  • Filter by error status to see failure patterns
  • Group by model or provider to identify unreliable providers

Using Performance Data to Optimize

Set Up Latency-Based Routing

If you see that one provider is consistently faster, create a Latency Routing Policy to automatically use the fastest provider:
Model
anthropic/claude-sonnet-4-5
bedrock/claude-sonnet-4-5-v2@us-east-1
bedrock/claude-sonnet-4-5-v2@eu-central-1
Requesty automatically routes to whichever is fastest at request time.

Set Up Fallback for Reliability

If a provider has high error rates, create a Fallback Policy to automatically retry with another provider:
PriorityModelRetries
1stanthropic/claude-sonnet-4-52 retries
2ndbedrock/claude-sonnet-4-5-v2@eu-central-12 retries

Reduce Latency with Caching

Auto Caching can eliminate latency entirely for repeated requests. Check the Savings tab to see your cache hit rate, cached responses return in single-digit milliseconds.

Use EU Routing for European Users

If your users are in Europe, route through the EU endpoint (https://router.eu.requesty.ai/v1) to reduce network latency by 30-50%.

Export Performance Data

From the Advanced tab:
  1. Set Metric to latency_ms, Calculation to P90, Group By to model
  2. Set time range and grouping
  3. Click Export CSV to download the data

Integration

Last modified on May 26, 2026