Specify how you want to split the traffic among your models.
Here’s an example setup:
Policy Name: Reasoning-experiment
Distribution:
openai/o1: 50%
openai/o3-mini: 25%
deepseek/reasoner: 25%
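
A distribution like this amounts to a weighted random choice over the listed models: each incoming request is routed to one model in proportion to its weight. Below is a minimal Python sketch of that selection logic; the policy dict layout and function name are illustrative assumptions, not a specific gateway's schema or API.

```python
import random

# Illustrative policy mirroring the example above; this dict layout
# is an assumption for the sketch, not a real configuration schema.
policy = {
    "name": "Reasoning-experiment",
    "distribution": {
        "openai/o1": 50,
        "openai/o3-mini": 25,
        "deepseek/reasoner": 25,
    },
}

def pick_model(policy: dict) -> str:
    """Select a model for one request, proportionally to its weight."""
    models = list(policy["distribution"].keys())
    weights = list(policy["distribution"].values())
    return random.choices(models, weights=weights, k=1)[0]

# Over many requests, roughly 50% route to openai/o1 and 25% to each
# of the other two models.
print(pick_model(policy))
```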
Double-check that each model in your Load Balancing Policy can handle your specific request parameters (e.g., context length, token limits). If a model is incompatible, requests sent to that model may fail without fallback.
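
One way to catch this ahead of time is a pre-flight check that compares a request's expected token count against each model's context window. The sketch below assumes hypothetical context limits for illustration; verify the real limits against each provider's current documentation before relying on them.

```python
# Assumed context windows (tokens); illustrative only. Confirm the
# real limits with each provider before using values like these.
CONTEXT_WINDOWS = {
    "openai/o1": 200_000,
    "openai/o3-mini": 200_000,
    "deepseek/reasoner": 64_000,
}

POLICY_MODELS = ["openai/o1", "openai/o3-mini", "deepseek/reasoner"]

def incompatible_models(models: list[str], expected_tokens: int) -> list[str]:
    """Return the models whose assumed context window is smaller than
    the tokens a request is expected to need."""
    return [m for m in models if CONTEXT_WINDOWS.get(m, 0) < expected_tokens]

# Under these assumed limits, a 100k-token request would fail on
# deepseek/reasoner, and there is no automatic fallback to catch it.
print(incompatible_models(POLICY_MODELS, expected_tokens=100_000))
```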