• Define your models and decide how many requests go to each.
  • Set percentages or weights to control the distribution.
  • Send a single request, and let the Requesty router handle the rest (see the request sketch below).
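
Here is a minimal sketch of what that single request might look like from the client side, using the OpenAI-compatible Python SDK. The base URL, API key placeholder, and model identifier are illustrative assumptions; use the exact values shown in your Requesty dashboard.

  # Minimal sketch: one request sent through the Requesty router.
  # The base_url and model identifier are placeholders -- confirm the
  # real values in your Requesty dashboard.
  from openai import OpenAI

  client = OpenAI(
      base_url="https://router.requesty.ai/v1",  # placeholder router endpoint
      api_key="YOUR_REQUESTY_API_KEY",
  )

  response = client.chat.completions.create(
      model="openai/o1",  # or the identifier tied to your Load Balancing Policy
      messages=[{"role": "user", "content": "Summarize load balancing in one sentence."}],
  )
  print(response.choices[0].message.content)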

How It Works

  1. Create a Load Balancing Policy in your Requesty dashboard.
  2. Configure your model distribution with assigned weights (e.g., 50% to Model A, 30% to Model B, 20% to Model C).
  3. Each incoming request is routed to one of the models based on the distribution you set (illustrated in the sketch after this list).
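
Conceptually, the router makes a weighted random choice over the models in your policy for each request. The sketch below is not Requesty's internal implementation, just a way to see how configured weights translate into traffic shares; the model names and weights mirror the 50/30/20 example above.

  import random
  from collections import Counter

  # Illustrative only: how weighted routing distributes requests.
  policy = {"model-a": 50, "model-b": 30, "model-c": 20}

  def pick_model(policy: dict[str, int]) -> str:
      """Choose a model at random, proportionally to its configured weight."""
      models = list(policy.keys())
      weights = list(policy.values())
      return random.choices(models, weights=weights, k=1)[0]

  # Over many requests, the observed split converges to the configured weights.
  counts = Counter(pick_model(policy) for _ in range(10_000))
  print(counts)  # roughly 5000 / 3000 / 2000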

How Does This Help?

  • A/B testing of different model configurations
  • Smarter resource usage, since traffic is spread across multiple models
  • Improved overall performance, because no single model becomes a bottleneck

Get Started

  1. Go to Manage API.
  2. Add a Load Balancing Policy (see screenshot).
  3. Specify how you want to split the traffic among your models.

Here’s an example setup:

  • Policy Name: Reasoning-experiment
  • Distribution:
    • openai/o1: 50%
    • openai/o3-mini: 25%
    • deepseek/reasoner: 25%

Double-check that each model in your Load Balancing Policy can handle your specific request parameters (e.g., context length, token limits). If a model is incompatible, requests sent to that model may fail without fallback.
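
If you want to guard against this programmatically, a simple pre-flight check like the sketch below can help. The per-model context limits shown are made-up placeholders, not real model specifications; substitute the actual limits of the models in your policy.

  # Rough pre-flight check: verify a request fits the most constrained model
  # in the policy. The limits below are placeholder values, not real specs.
  POLICY_MODEL_LIMITS = {
      "openai/o1": 200_000,
      "openai/o3-mini": 200_000,
      "deepseek/reasoner": 64_000,
  }

  def fits_all_models(prompt_tokens: int, max_output_tokens: int) -> bool:
      """Return True only if every model in the policy can handle the request."""
      smallest_window = min(POLICY_MODEL_LIMITS.values())
      return prompt_tokens + max_output_tokens <= smallest_window

  if not fits_all_models(prompt_tokens=50_000, max_output_tokens=20_000):
      raise ValueError("Request exceeds the context window of at least one model in the policy.")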