Requesty offers two methods to control and limit spending: project-based limits (recommended) and per-API key limits. Choose the method that best fits your organization’s setup.
Configure spend and rate limits in the Requesty Console.
Looking for rate limits? Requesty does not impose its own rate limits on your requests. If you hit a rate limit from an upstream provider (HTTP 429), the best solution is to create a Routing Policy that automatically fails over to another model or provider.

Project-Based Spend Limits (Recommended)

Use this method when: Your team members have access to the Requesty web platform (they have accounts on https://requesty.ai and are part of your organization).

How it works:

  • Each user gets a ‘Private’ project where they can create their own API keys
  • Only admins can create shared projects; regular users cannot
  • Organization admins can set spend limits per project, effectively controlling the overall spend per user/project
  • This provides better visibility and control over spending at the user level

Setting up project-based limits:

  1. Go to the Projects Page in your organization dashboard
  2. Select the project you want to limit (or a user’s Private project)
  3. Set the monthly spending limit for that project
  4. All API keys created within that project will be subject to this limit
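
To make step 4 concrete, here is a minimal sketch of how a project-level cap applies to the combined spend of every key in the project. It is an illustrative model only, not Requesty's implementation: the `Project` class, its fields, the key labels, and the dollar figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a project with a shared monthly cap (USD)."""
    name: str
    monthly_limit: float
    # Month-to-date spend per API key, keyed by a made-up key label.
    spend_by_key: dict[str, float] = field(default_factory=dict)

    def total_spend(self) -> float:
        return sum(self.spend_by_key.values())

    def can_spend(self, amount: float) -> bool:
        # The cap is checked against the project total, not per key.
        return self.total_spend() + amount <= self.monthly_limit


project = Project("alice-private", monthly_limit=100.0,
                  spend_by_key={"key-a": 60.0, "key-b": 35.0})
print(project.can_spend(4.0))   # True: 99.0 stays under the 100.0 cap
print(project.can_spend(10.0))  # False: 105.0 would exceed the cap
```

The point of the sketch is that creating more keys inside a project does not raise the ceiling; all keys draw from the same monthly budget.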

Per-API Key Spend Limits

Use this method when: Your team members do NOT have access to the Requesty web platform, and you need to distribute API keys directly.

How it works:

  • Organization admins generate API keys and share them with users
  • Each API key has its own monthly spend cap
  • Spending can be monitored via the dashboard or management API endpoints
  • This method is ideal for external integrations or when you don’t want to give users platform access

Setting up per-key limits:

  1. Go to the API Keys Page
  2. Create a new API key or edit an existing one
  3. Set a monthly spending limit for that specific API key
  4. Share the API key with the intended user

Monitoring and Management

Both methods allow you to:
  • Monitor spending in real-time through the dashboard
  • Receive alerts when limits are approached
  • Use the Management API to programmatically check usage
  • Adjust limits as needed based on usage patterns
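
Spotting limits that are being approached comes down to comparing month-to-date spend with the configured cap. The sketch below assumes you have already collected usage figures (for example by reading them off the dashboard or from the Management API). The record shape, the key labels, and the 80% threshold are hypothetical choices for illustration, not a documented Requesty format.

```python
def keys_near_limit(usage, threshold=0.8):
    """Return (label, fraction_used) pairs at or above `threshold`.

    `usage` maps a key label to (spend_usd, monthly_limit_usd).
    Entries with no positive limit are skipped.
    """
    flagged = []
    for label, (spend, limit) in usage.items():
        if limit <= 0:
            continue
        fraction = spend / limit
        if fraction >= threshold:
            flagged.append((label, round(fraction, 2)))
    # Highest usage first so the most urgent keys are listed at the top.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


usage = {
    "partner-a": (45.0, 50.0),   # 90% used
    "partner-b": (10.0, 100.0),  # 10% used
    "partner-c": (80.0, 80.0),   # 100% used
}
print(keys_near_limit(usage))
# [('partner-c', 1.0), ('partner-a', 0.9)]
```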

Handling Provider Rate Limits

When an upstream provider (OpenAI, Anthropic, Google, etc.) returns a 429 rate limit error, Requesty can automatically retry with a different model or provider. The solution is to create a Routing Policy.

Option 1: Fallback Policy

Create a Fallback Policy that tries the same model on a different provider, or falls back to an alternative model:
Priority | Model | Retries
1st | anthropic/claude-sonnet-4-5 | 2 retries
2nd | bedrock/claude-sonnet-4-5-v2@eu-central-1 | 2 retries
3rd | openai/gpt-4.1 | 1 retry
If the first model is rate limited, Requesty automatically tries the next one, so your application never sees the 429 error.
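
The fallback behavior can be pictured with a small client-side simulation. This is a sketch of the idea only; the routing happens inside Requesty, and its actual retry logic is not described here. `RateLimited`, `call_with_fallback`, and `fake_send` are hypothetical names, and reading "N retries" as one initial attempt plus N retries is an assumption.

```python
class RateLimited(Exception):
    """Stand-in for an upstream HTTP 429 response."""


def call_with_fallback(chain, send):
    """Try each (model, retries) entry in order until one succeeds.

    `send(model)` is a caller-supplied function that returns a response
    or raises RateLimited. Each model gets 1 + retries attempts
    (an assumed reading of the retry count).
    """
    for model, retries in chain:
        for _ in range(1 + retries):
            try:
                return model, send(model)
            except RateLimited:
                continue  # retry the same model, then move down the chain
    raise RateLimited("every model in the chain was rate limited")


chain = [
    ("anthropic/claude-sonnet-4-5", 2),
    ("bedrock/claude-sonnet-4-5-v2@eu-central-1", 2),
    ("openai/gpt-4.1", 1),
]


def fake_send(model):
    # Simulate the first provider being rate limited.
    if model.startswith("anthropic/"):
        raise RateLimited(model)
    return "ok"


print(call_with_fallback(chain, fake_send))
# ('bedrock/claude-sonnet-4-5-v2@eu-central-1', 'ok')
```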

Option 2: Load Balancing Policy

Spread your traffic across multiple providers to stay under each provider’s rate limits with a Load Balancing Policy:
Model | Weight
anthropic/claude-sonnet-4-5 | 50%
bedrock/claude-sonnet-4-5-v2@us-east-1 | 50%
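
Weighted load balancing can be sketched as a weighted random choice per request. This illustrates the concept under stated assumptions, not how Requesty selects a provider internally: `pick_model` is a hypothetical helper, and the check that weights total 100 mirrors the requirement for load balancing policies.

```python
import random
from collections import Counter

# Weights from the table above, as percentages.
weights = {
    "anthropic/claude-sonnet-4-5": 50,
    "bedrock/claude-sonnet-4-5-v2@us-east-1": 50,
}


def pick_model(weights, rng=random):
    """Pick one model at random, in proportion to its weight."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must total 100%")
    models = list(weights)
    return rng.choices(models, weights=[weights[m] for m in models], k=1)[0]


rng = random.Random(0)  # seeded so the run is repeatable
counts = Counter(pick_model(weights, rng) for _ in range(10_000))
for model, n in counts.items():
    print(f"{model}: {n / 10_000:.0%}")
```

Over many requests each model receives roughly its configured share, which is what keeps traffic to any single provider below the level it would see on its own.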

Option 3: Latency Routing

Use Latency-Based Routing to automatically pick the fastest available provider. Rate-limited providers tend to show higher latency, so they are deprioritized.
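
As a rough illustration of the selection idea, the sketch below picks the provider with the lowest average of recent latency samples. The criteria Requesty actually uses are not specified here; using a simple mean, the `fastest_provider` helper, and the sample values are all assumptions.

```python
from statistics import mean


def fastest_provider(samples):
    """Return the provider with the lowest mean latency (in seconds).

    `samples` maps a provider name to recent latency measurements.
    Providers with no samples are ignored.
    """
    averages = {name: mean(vals) for name, vals in samples.items() if vals}
    if not averages:
        raise ValueError("no latency samples available")
    return min(averages, key=averages.get)


samples = {
    "anthropic": [0.8, 0.9, 1.1],
    # Requests that hit rate limits and get retried take longer overall.
    "bedrock": [2.5, 3.0, 2.8],
}
print(fastest_provider(samples))  # anthropic
```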

How to Create a Routing Policy

  1. Go to Routing Policies in the Requesty dashboard
  2. Click Create Policy
  3. Choose your policy type: Fallback, Load Balancing, or Latency
  4. Give it a name (e.g., rate-limit-safe)
  5. Add models: search and select from 300+ models, then drag to reorder
  6. For fallback: set retry counts per model. For load balancing: set weight percentages (must total 100%)
  7. Save the policy
Then use the policy in your API calls by setting the model to policy/your-policy-name:
# Assumes `client` is an OpenAI-compatible SDK client already configured for Requesty
response = client.chat.completions.create(
    model="policy/rate-limit-safe",  # Use your routing policy
    messages=[{"role": "user", "content": "Hello!"}]
)
You can also create policies scoped to a specific API key from the API Keys page. Organization-wide policies are managed from the Routing Policies page.
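
For readers who are not using an SDK, the sketch below shows where the `policy/your-policy-name` value ends up in a raw HTTP request. It only builds the request and does not send it. The base URL, the `/chat/completions` path, the `REQUESTY_API_KEY` environment variable name, and the `build_chat_request` helper are assumptions for illustration; confirm the actual endpoint in your Requesty console.

```python
import json
import os
import urllib.request

# Assumption: an OpenAI-compatible chat completions endpoint. Replace with
# the base URL shown in your Requesty console.
BASE_URL = "https://router.requesty.ai/v1"


def build_chat_request(policy_name, prompt, api_key):
    """Build (but do not send) a chat completion request for a policy."""
    payload = {
        "model": f"policy/{policy_name}",  # route via the named policy
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical environment variable; a placeholder is used if it is unset.
api_key = os.environ.get("REQUESTY_API_KEY", "sk-placeholder")
request = build_chat_request("rate-limit-safe", "Hello!", api_key)
print(request.full_url)
print(json.loads(request.data)["model"])  # policy/rate-limit-safe
# To send it: urllib.request.urlopen(request) (requires a valid API key).
```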

Best Practices

  • For internal teams: Use project-based limits to give users autonomy while maintaining control
  • For external partners: Use per-API key limits for simpler distribution and management
  • Set reasonable buffers: Consider setting limits slightly above expected usage to avoid interruptions
  • Regular monitoring: Check usage patterns monthly to optimize limit settings
  • For rate limits: Create fallback policies across multiple providers to maximize throughput
Last modified on May 26, 2026