Requesty offers two methods to control and limit spending: project-based limits (recommended) and per-API key limits. Choose the method that best fits your organization’s setup.
Looking for rate limits? Requesty does not impose its own rate limits on your requests. If you hit a rate limit from an upstream provider (HTTP 429), the best solution is to create a Routing Policy that automatically fails over to another model or provider.
Project-Based Spend Limits (Recommended)

Use this method when: Your team members have access to the Requesty web platform (they have accounts on https://requesty.ai and are part of your organization).

How it works:

  • Each user gets a ‘Private’ project where they can create their own API keys
  • Admins can create shared projects; regular users cannot
  • Organization admins can set spend limits per project, effectively controlling the overall spend per user/project
  • This provides better visibility and control over spending at the user level

Setting up project-based limits:

  1. Go to the Projects Page in your organization dashboard
  2. Select the project you want to limit (or a user’s Private project)
  3. Set the monthly spending limit for that project
  4. All API keys created within that project will be subject to this limit

Per-API Key Spend Limits

Use this method when: Your team members do NOT have access to the Requesty web platform, and you need to distribute API keys directly.

How it works:

  • Organization admins generate API keys and share them with users
  • Each API key has its own monthly spend cap
  • Spending can be monitored via the dashboard or management API endpoints
  • This method is ideal for external integrations or when you don’t want to give users platform access

Setting up per-key limits:

  1. Go to API Keys Page
  2. Create a new API key or edit an existing one
  3. Set a monthly spending limit for that specific API key
  4. Share the API key with the intended user

Monitoring and Management

Both methods allow you to:
  • Monitor spending in real-time through the dashboard
  • Receive alerts when limits are approached
  • Use the Management API to programmatically check usage
  • Adjust limits as needed based on usage patterns
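For the Management API check above, the idea looks roughly like this. Note that the endpoint URL, response fields, and the `budget_alert` helper are hypothetical illustrations; consult the Requesty Management API reference for the real paths and schema:

```python
import json
import urllib.request

def fetch_usage(api_key: str) -> dict:
    """Fetch current usage figures (hypothetical endpoint, for illustration)."""
    req = urllib.request.Request(
        "https://api.requesty.ai/v1/usage",  # placeholder URL, not the documented path
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def budget_alert(spend_usd: float, limit_usd: float, threshold: float = 0.8) -> bool:
    """True once spending crosses `threshold` of the monthly limit."""
    return limit_usd > 0 and spend_usd / limit_usd >= threshold

# e.g. alert once a $100/month project has spent $85
alerting = budget_alert(spend_usd=85.0, limit_usd=100.0)
```

A cron job running a check like this can warn users before a key or project hits its cap, rather than after requests start failing.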

Handling Provider Rate Limits

When an upstream provider (OpenAI, Anthropic, Google, etc.) returns a 429 rate limit error, Requesty can automatically retry with a different model or provider. To enable this, create a Routing Policy.

Option 1: Fallback Policy

Create a Fallback Policy that tries the same model on a different provider, or falls back to an alternative model:
Policy: rate-limit-safe
├─ anthropic/claude-sonnet-4-5 (2 retries)
├─ bedrock/claude-sonnet-4-5-v2@eu-central-1 (2 retries)
└─ openai/gpt-4.1 (1 retry)
If the first model is rate limited, Requesty automatically tries the next one — your application never sees the 429 error.
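Conceptually, the routing loop behaves like the sketch below. This is an illustrative model of the fallback behavior, not Requesty's actual implementation; `send` stands in for a provider request:

```python
def route_with_fallback(policy, send):
    """Try each (model, retries) entry in order; return the first
    non-429 response. The caller never sees the intermediate 429s."""
    last = None
    for model, retries in policy:
        for _ in range(retries + 1):
            last = send(model)
            if last["status"] != 429:
                return last
    return last  # every model in the policy was rate limited

# Example: the first model is always rate limited, the second succeeds
policy = [("anthropic/claude-sonnet-4-5", 2), ("openai/gpt-4.1", 1)]
fake_send = lambda m: {"status": 429 if "anthropic" in m else 200, "model": m}
result = route_with_fallback(policy, fake_send)
```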

Option 2: Load Balancing Policy

Spread your traffic across multiple providers to stay under each provider’s rate limits with a Load Balancing Policy:
Policy: spread-traffic
├─ anthropic/claude-sonnet-4-5: 50%
└─ bedrock/claude-sonnet-4-5-v2@us-east-1: 50%
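A 50/50 split amounts to weighted random selection per request. A minimal sketch of the idea (illustrative only; the actual balancing happens inside the Requesty router):

```python
import random

def pick_model(weights: dict[str, float]) -> str:
    """Choose a model with probability proportional to its weight."""
    models = list(weights)
    return random.choices(models, weights=[weights[m] for m in models])[0]

weights = {
    "anthropic/claude-sonnet-4-5": 50,
    "bedrock/claude-sonnet-4-5-v2@us-east-1": 50,
}
picks = [pick_model(weights) for _ in range(10_000)]
share = picks.count("anthropic/claude-sonnet-4-5") / len(picks)  # ~0.5
```

Because each provider only sees its share of the traffic, you can stay under each provider's per-minute limits even when your total throughput exceeds any single provider's cap.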

Option 3: Latency Routing

Use Latency-Based Routing to automatically pick the fastest available provider — rate-limited providers will have higher latency and be deprioritized.
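One way to picture this (a simplified sketch under assumed behavior, not Requesty's actual algorithm): track a moving average of observed latency per provider and route each request to the lowest.

```python
class LatencyRouter:
    """Route to the provider with the lowest exponential moving average
    latency. A rate-limited provider responds slowly (or only after
    retries), so its average climbs and it gets deprioritized."""

    def __init__(self, providers, alpha=0.3):
        self.avg = {p: 0.0 for p in providers}
        self.alpha = alpha

    def record(self, provider, latency_ms):
        prev = self.avg[provider]
        # Seed with the first observation, then blend in new samples
        self.avg[provider] = (
            (1 - self.alpha) * prev + self.alpha * latency_ms if prev else latency_ms
        )

    def pick(self):
        return min(self.avg, key=self.avg.get)

router = LatencyRouter(["anthropic", "bedrock"])
router.record("anthropic", 400)   # healthy provider
router.record("bedrock", 2500)    # rate limited, slow after retries
choice = router.pick()            # prefers the faster provider
```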

How to Create a Routing Policy

  1. Go to Routing Policies in the Requesty dashboard
  2. Click Create Policy
  3. Choose your policy type: Fallback, Load Balancing, or Latency
  4. Give it a name (e.g., rate-limit-safe)
  5. Add models — search and select from 300+ models, then drag to reorder
  6. For fallback: set retry counts per model. For load balancing: set weight percentages (must total 100%)
  7. Save the policy
Then use the policy in your API calls by setting the model to policy/your-policy-name:
from openai import OpenAI

client = OpenAI(api_key="<REQUESTY_API_KEY>",
                base_url="https://router.requesty.ai/v1")  # Requesty's OpenAI-compatible router
response = client.chat.completions.create(
    model="policy/rate-limit-safe",  # Use your routing policy
    messages=[{"role": "user", "content": "Hello!"}],
)
You can also create policies scoped to a specific API key from the API Keys page. Organization-wide policies are managed from the Routing Policies page.

Best Practices

  • For internal teams: Use project-based limits to give users autonomy while maintaining control
  • For external partners: Use per-API key limits for simpler distribution and management
  • Set reasonable buffers: Consider setting limits slightly above expected usage to avoid interruptions
  • Regular monitoring: Check usage patterns monthly to optimize limit settings
  • For rate limits: Create fallback policies across multiple providers to maximize throughput
Last modified on April 8, 2026