Monitor the performance of every AI model you use — latency, error rates, throughput, and reliability — all from the Requesty analytics dashboard.

Latency Tracking

The General tab shows a real-time latency chart with three views:
Metric    What it measures
Average   Mean response time across all requests
P50       Median — 50% of requests are faster than this
P90       90th percentile — only 10% of requests are slower
Switch between Average, P50, and P90 using the latency selector on the chart.
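These three views can also be reproduced offline, for example to spot-check an export. A minimal sketch using the nearest-rank percentile method (the sample data is hypothetical):

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    # Index of the value at or above the p-th percentile.
    idx = max(0, int(round(p / 100 * len(ranked))) - 1)
    return ranked[idx]

latencies_ms = [120, 135, 150, 180, 210, 250, 400, 900]  # hypothetical samples

average = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)  # median: half of requests are faster
p90 = percentile(latencies_ms, 90)  # tail: only 10% of requests are slower
```

Note how the 900 ms outlier pulls the average well above the median: that gap is exactly why the P50/P90 views exist.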

What Latency Includes

Total request latency measures the full round-trip: your request hitting Requesty → routed to the provider → model inference → response streamed back. This is the real end-to-end time your users experience.
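You can cross-check the dashboard numbers by timing the same round trip on the client side. A minimal sketch that wraps any request function (the request itself is a stand-in here):

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_ms) covering the full round trip."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in for a real Requesty request:
result, ms = timed_call(lambda: "response")
```

Client-side timing will also include your own network hop to Requesty, so expect it to read slightly higher than the dashboard's figures.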

Advanced Performance Analysis

Use the Advanced tab for deeper analysis:

Latency by Model

  • Set Metric to latency_ms
  • Set Group By to model
  • Set Calculation to P50, P90, P95, or P99
This shows you which models are fastest and which have the worst tail latency.
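The same grouping can be reproduced offline from raw records. A sketch using the nearest-rank P90 (the records and latency values are hypothetical):

```python
from collections import defaultdict

def p90(samples):
    """Nearest-rank 90th percentile of latency samples (ms)."""
    ranked = sorted(samples)
    return ranked[max(0, int(round(0.9 * len(ranked))) - 1)]

# Hypothetical (model, latency_ms) records, e.g. parsed from an export:
records = [
    ("anthropic/claude-sonnet-4-5", 140),
    ("anthropic/claude-sonnet-4-5", 160),
    ("anthropic/claude-sonnet-4-5", 800),
    ("bedrock/claude-sonnet-4-5-v2", 90),
    ("bedrock/claude-sonnet-4-5-v2", 110),
    ("bedrock/claude-sonnet-4-5-v2", 130),
]

by_model = defaultdict(list)
for model, latency in records:
    by_model[model].append(latency)

tail = {model: p90(samples) for model, samples in by_model.items()}
```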

Latency Over Time

  • Set Time Grouping to hour or day
  • Watch for latency spikes that correlate with peak traffic or provider issues
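Hourly bucketing works the same way on raw data. A sketch (the timestamps and values are hypothetical):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, latency_ms) pairs:
samples = [
    (datetime(2026, 4, 8, 9, 15), 120),
    (datetime(2026, 4, 8, 9, 45), 480),
    (datetime(2026, 4, 8, 10, 5), 130),
]

# Truncate each timestamp to its hour to form the bucket key.
by_hour = defaultdict(list)
for ts, latency in samples:
    by_hour[ts.replace(minute=0, second=0, microsecond=0)].append(latency)

# Average latency per hour makes spikes easy to spot:
hourly_avg = {hour: sum(v) / len(v) for hour, v in by_hour.items()}
```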

Error Rate Analysis

  • Set Metric to requests
  • Filter by error status to see failure patterns
  • Group by model or provider to identify unreliable providers
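Per-provider error rates can also be computed from raw request records. A sketch (the records and provider names are hypothetical):

```python
from collections import Counter

# Hypothetical request records: (provider, succeeded)
requests = [
    ("anthropic", True), ("anthropic", True), ("anthropic", False),
    ("bedrock", True), ("bedrock", True), ("bedrock", True), ("bedrock", True),
]

totals, errors = Counter(), Counter()
for provider, ok in requests:
    totals[provider] += 1
    if not ok:
        errors[provider] += 1

# Fraction of failed requests per provider:
error_rate = {p: errors[p] / totals[p] for p in totals}
```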

Using Performance Data to Optimize

Set Up Latency-Based Routing

If you see that one provider is consistently faster, create a Latency Routing Policy to automatically use the fastest provider:
Policy: fastest-claude
├─ anthropic/claude-sonnet-4-5
├─ bedrock/claude-sonnet-4-5-v2@us-east-1
└─ bedrock/claude-sonnet-4-5-v2@eu-central-1
Requesty automatically routes to whichever is fastest at request time.
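Requesty performs this selection server-side; the idea can be illustrated with a client-side sketch that picks the provider with the lowest recently observed P50 (all numbers are hypothetical):

```python
# Recent P50 latency per provider in ms (hypothetical measurements):
recent_p50 = {
    "anthropic/claude-sonnet-4-5": 310,
    "bedrock/claude-sonnet-4-5-v2@us-east-1": 290,
    "bedrock/claude-sonnet-4-5-v2@eu-central-1": 420,
}

# Route to whichever provider currently has the lowest observed latency:
fastest = min(recent_p50, key=recent_p50.get)
```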

Set Up Fallback for Reliability

If a provider has high error rates, create a Fallback Policy to automatically retry with another provider:
Policy: reliable-claude
├─ anthropic/claude-sonnet-4-5 (2 retries)
└─ bedrock/claude-sonnet-4-5-v2@eu-central-1 (2 retries)
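The fallback behavior above can be sketched as an ordered retry loop; the transport function here is simulated, and the retry counts mirror the policy:

```python
def call_with_fallback(providers, send):
    """Try each provider in order, retrying up to its retry budget.

    `providers` is a list of (name, retries); `send(name)` raises on failure.
    """
    last_error = None
    for name, retries in providers:
        for _ in range(retries + 1):  # initial attempt + retries
            try:
                return send(name)
            except Exception as exc:
                last_error = exc
    raise last_error

policy = [
    ("anthropic/claude-sonnet-4-5", 2),
    ("bedrock/claude-sonnet-4-5-v2@eu-central-1", 2),
]

# Simulated transport: the first provider always fails, the second succeeds.
def send(name):
    if name.startswith("anthropic"):
        raise RuntimeError("provider error")
    return f"ok from {name}"

result = call_with_fallback(policy, send)
```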

Reduce Latency with Caching

Auto Caching can eliminate latency entirely for repeated requests. Check the Savings tab to see your cache hit rate — cached responses return in single-digit milliseconds.
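Auto Caching is managed by Requesty; the effect can be illustrated with a hash-keyed cache sketch (the function names and request body are hypothetical):

```python
import hashlib
import json

cache = {}

def cache_key(payload):
    """Stable key for a request body; identical requests hash identically."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def cached_request(payload, send):
    key = cache_key(payload)
    if key in cache:           # cache hit: no provider round trip at all
        return cache[key]
    response = send(payload)   # cache miss: pay the full latency once
    cache[key] = response
    return response

calls = []
def send(payload):
    calls.append(payload)
    return "model response"

body = {"model": "anthropic/claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "hi"}]}
first = cached_request(body, send)
second = cached_request(body, send)  # served from cache; send() not called again
```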

Use EU Routing for European Users

If your users are in Europe, route through the EU endpoint (https://router.eu.requesty.ai/v1) to reduce network latency by 30-50%.
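Switching endpoints is just a base-URL change in an OpenAI-compatible client. A minimal sketch that assembles the request against the EU router (the `/chat/completions` path, header shape, and API key are assumptions based on OpenAI-compatible conventions):

```python
import os

EU_BASE_URL = "https://router.eu.requesty.ai/v1"

def chat_request(model, messages, api_key):
    """Assemble an OpenAI-style chat completion request against the EU router."""
    return {
        "url": f"{EU_BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"model": model, "messages": messages},
    }

req = chat_request(
    "anthropic/claude-sonnet-4-5",
    [{"role": "user", "content": "Hallo!"}],
    api_key=os.environ.get("REQUESTY_API_KEY", "sk-placeholder"),
)
```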

Export Performance Data

From the Advanced tab:
  1. Set Metric to latency_ms, Calculation to P90, Group By to model
  2. Set time range and grouping
  3. Click Export CSV to download the data
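Once downloaded, the CSV can be analyzed with standard tooling. A minimal sketch (the column names and values are hypothetical, since the actual export shape depends on your selections):

```python
import csv
import io

# Hypothetical export shape: one P90 latency row per model.
exported = """model,p90_latency_ms
anthropic/claude-sonnet-4-5,740
bedrock/claude-sonnet-4-5-v2@us-east-1,610
"""

rows = list(csv.DictReader(io.StringIO(exported)))

# Find the model with the worst tail latency:
slowest = max(rows, key=lambda r: float(r["p90_latency_ms"]))
```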

Last modified on April 8, 2026