Load Balancing
Distribute your requests across multiple models
- Define your models and decide how many requests go to each.
- Set percentages or weights to control the distribution.
- Send a single request, and let the Requesty router handle the rest.
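On the client side, a load-balanced policy needs no special handling: you send one request and the router picks a model for you. Here is a minimal sketch, assuming an OpenAI-compatible Requesty endpoint and a placeholder policy identifier; check your dashboard for the exact base URL and the value to pass as `model`:

```python
# Minimal sketch: one request through the Requesty router.
# Assumptions to verify against your dashboard/docs: the router is
# OpenAI-compatible, the base URL below is correct, and your policy
# name can be passed as the model identifier.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["REQUESTY_API_KEY"],    # your Requesty API key
    base_url="https://router.requesty.ai/v1",  # assumed router endpoint
)

response = client.chat.completions.create(
    model="your-load-balancing-policy",  # hypothetical: policy name as model id
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of load balancing LLM traffic."}
    ],
)
print(response.choices[0].message.content)
```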
How It Works
- Create a Load Balancing Policy in your Requesty dashboard.
- Configure your model distribution with assigned weights (e.g., 50% to Model A, 30% to Model B, 20% to Model C).
- Each incoming request gets routed to one of the models based on the distribution you set.
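Conceptually, the per-request choice behaves like weighted random selection over the models in your policy. The sketch below is purely illustrative (not Requesty's actual routing code) and uses the 50/30/20 split from the example above:

```python
import random
from collections import Counter

# Illustrative only: weighted random selection over a policy's models,
# mirroring the 50/30/20 example distribution above.
policy = {"model-a": 50, "model-b": 30, "model-c": 20}

def pick_model(policy: dict[str, int]) -> str:
    models, weights = zip(*policy.items())
    return random.choices(models, weights=weights, k=1)[0]

# Over many requests the observed split converges to the configured weights.
counts = Counter(pick_model(policy) for _ in range(10_000))
for model, n in counts.most_common():
    print(f"{model}: {n / 10_000:.1%}")
```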
How Does This Help?
- A/B Testing of model configurations
- Smarter resource usage
- Improved overall performance
Get Started
- Go to Manage API
- Add a Load Balancing Policy (see screenshot).
- Specify how you want to split the traffic among your models.
Here’s an example setup:
- Policy Name: Reasoning-experiment
- Distribution:
  - openai/o1: 50%
  - openai/o3-mini: 25%
  - deepseek/reasoner: 25%
Double-check that each model in your Load Balancing Policy can handle your specific request parameters (e.g., context length, token limits). If a model is incompatible, requests routed to it may fail without a fallback.
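As a rough pre-flight guard, you can estimate a prompt's size and compare it against each model's context window before adding the model to a policy. The context limits and the token heuristic below are placeholder assumptions, not official figures; check each provider's documentation for real numbers:

```python
# Rough pre-flight check: flag models in a policy whose (assumed)
# context window is smaller than the estimated request size.
# The limits below are placeholders, not official figures.
ASSUMED_CONTEXT_LIMITS = {
    "openai/o1": 200_000,
    "openai/o3-mini": 200_000,
    "deepseek/reasoner": 64_000,
}

def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); use a real tokenizer
    # for accurate counts.
    return len(text) // 4

def incompatible_models(prompt: str, max_output_tokens: int) -> list[str]:
    needed = estimate_tokens(prompt) + max_output_tokens
    return [m for m, limit in ASSUMED_CONTEXT_LIMITS.items() if needed > limit]

prompt = "..." * 100_000  # a deliberately large prompt for illustration
print(incompatible_models(prompt, max_output_tokens=8_000))
```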