Load Balancing
Distribute your requests across multiple models
- Define your models and decide how many requests go to each.
- Set percentages or weights to control the distribution.
- Send a single request, and let the Requesty router handle the rest.
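On the client side, a load-balanced policy needs no special handling: you send one request and the router picks a model for you. Here is a minimal sketch, assuming an OpenAI-compatible Requesty endpoint and a placeholder policy identifier; check your dashboard for the exact base URL and the value to pass as `model`:

```python
# Minimal sketch: one request through the Requesty router.
# Assumptions to verify against your dashboard/docs: the router is
# OpenAI-compatible, the base URL below is correct, and your policy
# name can be passed as the model identifier.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["REQUESTY_API_KEY"],    # your Requesty API key
    base_url="https://router.requesty.ai/v1",  # assumed router endpoint
)

response = client.chat.completions.create(
    model="your-load-balancing-policy",  # hypothetical: policy name as model id
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of load balancing LLM traffic."}
    ],
)
print(response.choices[0].message.content)
```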
How It Works
- Create a Load Balancing Policy in your Requesty dashboard.
- Configure your model distribution with assigned weights (e.g., 50% to Model A, 30% to Model B, 20% to Model C).
- Each incoming request gets routed to one of the models based on the distribution you set.
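Conceptually, the per-request choice behaves like weighted random selection over the models in your policy. The sketch below is purely illustrative (not Requesty's actual routing code) and uses the 50/30/20 split from the example above:

```python
import random
from collections import Counter

# Illustrative only: weighted random selection over a policy's models,
# mirroring the 50/30/20 example distribution above.
policy = {"model-a": 50, "model-b": 30, "model-c": 20}

def pick_model(policy: dict[str, int]) -> str:
    models, weights = zip(*policy.items())
    return random.choices(models, weights=weights, k=1)[0]

# Over many requests the observed split converges to the configured weights.
counts = Counter(pick_model(policy) for _ in range(10_000))
for model, n in counts.most_common():
    print(f"{model}: {n / 10_000:.1%}")
```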
How Does This Help?
- A/B Testing of model configurations
- Smarter resource usage
- Improved overall performance
Get Started
- Go to Manage API
- Add a Load Balancing Policy (see screenshot).
- Specify how you want to split the traffic among your models.
Here’s an example setup:
- Policy Name: Reasoning-experiment
- Distribution:
  - openai/o1: 50%
  - openai/o3-mini: 25%
  - deepseek/reasoner: 25%
Double-check that each model in your Load Balancing Policy can handle your specific request parameters (e.g., context length, token limits). If a model is incompatible, requests routed to it may fail without a fallback.
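As a rough pre-flight guard, you can estimate a prompt's size and compare it against each model's context window before adding the model to a policy. The context limits and the token heuristic below are placeholder assumptions, not official figures; check each provider's documentation for real numbers:

```python
# Rough pre-flight check: flag models in a policy whose (assumed)
# context window is smaller than the estimated request size.
# The limits below are placeholders, not official figures.
ASSUMED_CONTEXT_LIMITS = {
    "openai/o1": 200_000,
    "openai/o3-mini": 200_000,
    "deepseek/reasoner": 64_000,
}

def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); use a real tokenizer
    # for accurate counts.
    return len(text) // 4

def incompatible_models(prompt: str, max_output_tokens: int) -> list[str]:
    needed = estimate_tokens(prompt) + max_output_tokens
    return [m for m, limit in ASSUMED_CONTEXT_LIMITS.items() if needed > limit]

prompt = "..." * 100_000  # a deliberately large prompt for illustration
print(incompatible_models(prompt, max_output_tokens=8_000))
```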