# Requesty — Full Documentation

> Unified LLM gateway & secure API for 300+ models. Best-in-class observability, caching, failover, guardrails, and cost optimization.

Requesty routes, secures, and optimizes your LLM traffic through a single OpenAI-compatible API endpoint. Used by 70,000+ developers, processing 90+ billion tokens daily.

---

## Getting Started

Source: https://docs.requesty.ai/quickstart.md

> Start routing AI requests through Requesty in under 2 minutes

> **Info:** Requesty is a unified gateway for 300+ AI models. Access OpenAI, Anthropic, Google, and more through a single API endpoint with built-in optimization, caching, and cost tracking.

## Quick Start

**Create Your Account** Sign up for free at [app.requesty.ai](https://app.requesty.ai/sign-up) to get instant access to the platform.

**Generate API Key** Navigate to [API Keys](https://app.requesty.ai/getting-started) and create your first API key with one click.

**Update Your Code** Change your base URL to `https://router.requesty.ai/v1` and start routing through Requesty.

## Integration Examples

### OpenAI SDK

Drop-in replacement for OpenAI: just change the base URL and API key.

### Anthropic SDK

Works seamlessly with Anthropic's client library.

### REST API

Direct HTTP calls to our unified endpoint.
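As a rough illustration of the REST option, the sketch below builds a chat completions request using only the Python standard library. The `/v1/chat/completions` path and the Bearer header mirror the cURL examples elsewhere in these docs; the helper name `build_chat_request` is hypothetical, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://router.requesty.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("YOUR_REQUESTY_API_KEY", "openai/gpt-4o", "Hello, who are you?")
# To send it, pass `request` to urllib.request.urlopen and parse the JSON body.
print(request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before any traffic leaves your machine.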
### Using the OpenAI SDK

The easiest way to get started is to use the OpenAI SDK with Requesty's endpoint:

```python Python
import openai

# Placeholder: in real code, load your API key from a secure source
requesty_api_key = "YOUR_REQUESTY_API_KEY"

try:
    # Initialize the OpenAI client against the Requesty router
    client = openai.OpenAI(
        api_key=requesty_api_key,
        base_url="https://router.requesty.ai/v1",
        default_headers={
            "HTTP-Referer": "",  # Optional
            "X-Title": "",  # Optional
        },
    )

    # Example request
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello, who are you?"}]
    )

    # Check that the response contains at least one choice
    if not response.choices:
        raise Exception("No response choices found.")

    # Print the result
    print(response.choices[0].message.content)
except openai.OpenAIError as e:
    print(f"OpenAI API error: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```

```typescript Typescript
import OpenAI from 'openai';

// The Node SDK uses camelCase option names
const openai = new OpenAI({
  baseURL: 'https://router.requesty.ai/v1',
  apiKey: 'YOUR_REQUESTY_API_KEY',
  defaultHeaders: {
    'HTTP-Referer': '', // Optional
    'X-Title': '', // Optional
  },
});

async function main() {
  const response = await openai.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [
      {
        role: 'user',
        content: 'Hello, who are you?',
      },
    ],
  });
  console.log(response.choices[0].message);
}

main();
```

> **Note:** The `HTTP-Referer` and `X-Title` headers help with analytics and improve your app's discoverability in our platform.

## Why Choose Requesty?
- Intelligent caching and routing automatically reduce your AI costs
- Access all major providers through one unified API
- Automatic failover ensures your app stays online
- Monitor usage, costs, and performance in one dashboard
- Automatically route to the best model for each request
- SOC2 compliant with RBAC, SSO, and audit logs

## Next Steps

- Browse our catalog of 300+ available models
- See how to use Requesty with your favorite tools
- Set up advanced routing and optimization
- Learn about our comprehensive analytics suite

---

## EU Routing

Source: https://docs.requesty.ai/features/eu-routing.md

> Route AI requests through Requesty's EU endpoint for GDPR compliance: EU processing, storage, and optional EU-only model inference

Route your AI traffic through Requesty's EU infrastructure in **Frankfurt, Germany (AWS `eu-central-1`)**. All processing and storage by Requesty stays in the EU. Combine with EU-only approved models for full end-to-end data residency.

## How It Works

There are **two layers** to consider for EU data residency:

```mermaid
graph LR
    A[Your App] --> B["Requesty EU Router<br/>Frankfurt, Germany"]
    B --> C{Model Inference}
    C --> D["EU Model<br/>e.g. bedrock/@eu-central-1"]
    C --> E["Global Model<br/>e.g. anthropic/claude"]
    style B fill:#4F46E5,color:#fff
    style D fill:#059669,color:#fff
    style E fill:#D97706,color:#fff
```

| Layer | What it covers | How to enable |
|-------|---------------|---------------|
| **Requesty Processing** | Request routing, logging, caching, analytics (all in EU) | Use `router.eu.requesty.ai` as your base URL |
| **Model Inference** | Where the AI model actually runs | Approve only EU-region models via the Model Library |

> **Info:** Using the EU endpoint guarantees that **Requesty's processing and storage** stays in the EU. To also guarantee that **model inference** stays in the EU, you need to approve only EU-region models (see below).

## EU Endpoint

| Protocol | EU Endpoint |
|----------|-------------|
| **OpenAI-compatible** | `https://router.eu.requesty.ai/v1` |
| **Anthropic-compatible** | `https://router.eu.requesty.ai` |

Same API key, same request format, same features. Just swap the base URL.

## Quick Start

### OpenAI SDK (Python)

```python
import openai

client = openai.OpenAI(
    api_key="YOUR_REQUESTY_API_KEY",
    base_url="https://router.eu.requesty.ai/v1",  # EU endpoint
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5-20250514",
    messages=[{"role": "user", "content": "Hello from Europe!"}]
)
print(response.choices[0].message.content)
```

### OpenAI SDK (TypeScript)

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_REQUESTY_API_KEY',
  baseURL: 'https://router.eu.requesty.ai/v1', // EU endpoint
});

const response = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-5-20250514',
  messages: [{ role: 'user', content: 'Hello from Europe!'
}], }); console.log(response.choices[0].message); ``` ### Anthropic SDK ```python import anthropic client = anthropic.Anthropic( api_key="YOUR_REQUESTY_API_KEY", base_url="https://router.eu.requesty.ai", # EU endpoint (no /v1) ) message = client.messages.create( model="anthropic/claude-sonnet-4-5-20250514", max_tokens=1024, messages=[{"role": "user", "content": "Hello from Europe!"}] ) print(message.content[0].text) ``` ### cURL ```bash curl https://router.eu.requesty.ai/v1/chat/completions \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "anthropic/claude-sonnet-4-5-20250514", "messages": [{"role": "user", "content": "Hello from Europe!"}] }' ``` ## Approve EU-Only Models By default, the EU endpoint can route to **any** model — including models hosted outside the EU. The Requesty processing stays in the EU, but the model inference might not. To guarantee that **both** Requesty processing and model inference stay in the EU, restrict your organization to EU-region models only: **Open the Model Library** Go to [Model Library](https://app.requesty.ai/model-library) and switch to **Table** view for the best overview. **Filter by EU Regions** Click the **Regions** filter and select EU regions: **EU**, **FRANCECENTRAL**, **SWEDENCENTRAL**, or any other European region. **Select and Approve** Select all the EU models you want to allow, then click **Add Selected to Approved Models**. This restricts your organization to only use these models. ![Filter and approve EU-only models in the Model Library](/images/approve-eu-models.png) The Model Library shows you all available EU models with their provider, pricing, context window, and capabilities (Vision, Reasoning, Tools, Cache). 
EU-region models include: - **AWS Bedrock**: Models with `@eu-central-1`, `@eu-west-1`, `@eu-north-1` suffixes (Claude, Kimi, Llama, Mistral) - **Google**: Models with `@europe-west1`, `@europe-west4`, `@europe-central2` suffixes (Gemini) - **Azure**: Models with `@francecentral`, `@swedencentral` suffixes - **Mistral**: Hosted in EU by default > **Warning:** If you enable [Approved Models](/features/approved-models) with only EU-region models, any request using a non-approved model will be rejected. This is by design — it ensures no data leaves the EU for model inference. ## Use with AI Coding Tools ### Claude Code ```bash export ANTHROPIC_BASE_URL=https://router.eu.requesty.ai export ANTHROPIC_API_KEY=YOUR_REQUESTY_API_KEY claude ``` ### Cline / Roo Code In your VS Code settings, set the API base URL to: ``` https://router.eu.requesty.ai/v1 ``` See the [Cline](/integrations/cline) or [Roo Code](/integrations/roo-code) integration guides for full setup. ## Combine with Routing Policies EU routing works with all Requesty features. Create routing policies that stay entirely within EU infrastructure: ### EU Failover Policy Create a [Fallback Policy](/features/fallback-policies) across EU regions: ``` Policy: eu-claude ├─ bedrock/claude-sonnet-4-5-v2@eu-central-1 (2 retries) ├─ bedrock/claude-sonnet-4-5-v2@eu-west-1 (2 retries) └─ bedrock/claude-3-5-haiku@eu-central-1 (1 retry) ``` ### EU Load Balancing Distribute across EU regions with a [Load Balancing Policy](/features/load-balancing-policies): ``` Policy: eu-balanced ├─ bedrock/claude-sonnet-4-5-v2@eu-central-1: 50% └─ bedrock/claude-sonnet-4-5-v2@eu-west-1: 50% ``` ## Other Regional Endpoints Requesty also offers regional endpoints outside the EU: | Region | Endpoint | |--------|----------| | **EU (Frankfurt)** | `https://router.eu.requesty.ai/v1` | | **US (Global)** | `https://router.requesty.ai/v1` | > **Note:** All regional endpoints use the same API key. 
No additional configuration needed — just change the base URL. ## Compliance Summary | Requirement | How Requesty Covers It | |------------|----------------------| | **GDPR Data Residency** | EU endpoint processes and stores all data in Frankfurt (AWS `eu-central-1`) | | **EU-Only Model Inference** | Approve only EU-region models in the [Model Library](https://app.requesty.ai/model-library) | | **PII Protection** | Enable [Guardrails](/features/guardrails) for automatic PII detection and redaction | | **Access Control** | Use [RBAC](/features/rbac) to restrict who can change model approvals | | **Audit Trail** | Full request logging via [Usage Analytics](/features/usage-analytics) | | **Data Minimization** | Use [Request Metadata](/features/request-metadata) to tag and track data flows | --- ## Fallback Policies Source: https://docs.requesty.ai/features/fallback-policies.md > Automatic failover between models for maximum reliability Fallback Policies automatically retry your requests with different models if one fails, ensuring your application stays reliable even when individual providers have issues. ## How It Works 1. Your request goes to the **primary model** first 2. If it fails (timeout, rate limit, error, etc.), the router **immediately tries the next model** 3. This continues down the chain until a model successfully responds 4. Your application receives the successful response without knowing about the failures ## Benefits - **Higher success rates** - No more failed requests due to provider issues - **Zero downtime** - Automatic failover without code changes - **Cost optimization** - Start with cheaper models, fall back to premium ones only when needed - **No stalled workflows** - Your users never see "model unavailable" errors ## Creating a Fallback Policy ### Step 1: Create the Policy 1. Go to [Routing Policies](https://app.requesty.ai/routing-policies) 2. Click "**Create Policy**" 3. 
Select "**Fallback Chain**" as the policy type ![Create Policy](/images/create_policy.png) ### Step 2: Configure Your Fallback Chain **Example Setup:** - **Policy Name:** `sonnet` - **Fallback Chain:** - `anthropic/claude-sonnet-4-5` (1 retry) - `bedrock/claude-sonnet-4-5@eu-central-1` (1 retry) Each model can have multiple retries. The router will: 1. Try `anthropic/claude-sonnet-4-5` once 2. If it fails, retry `anthropic/claude-sonnet-4-5` one more time 3. If still failing, move to `bedrock/claude-sonnet-4-5@eu-central-1` and try twice 4. Continue down the chain until success ### Step 3: Use the Policy in Your Code **This is the critical step:** You need to change your `model` parameter to reference your policy. After creating a policy named `sonnet`, you'll see it in your models list as: ``` policy/sonnet ``` **Update your code to use this model identifier:** ```python Python from openai import OpenAI client = OpenAI( base_url="https://router.requesty.ai/v1", api_key="your-requesty-api-key" ) response = client.chat.completions.create( model="policy/sonnet", # ← Use your policy name here messages=[{"role": "user", "content": "Hello!"}] ) ``` ```typescript TypeScript const client = new OpenAI({ baseURL: 'https://router.requesty.ai/v1', apiKey: 'your-requesty-api-key' }); const response = await client.chat.completions.create({ model: 'policy/sonnet', // ← Use your policy name here messages: [{ role: 'user', content: 'Hello!' }] }); ``` ```bash cURL curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-requesty-api-key" \ -d '{ "model": "policy/sonnet", "messages": [{"role": "user", "content": "Hello!"}] }' ``` > **Info:** **How to find your policy reference:** 1. Go to your [Routing Policies](https://app.requesty.ai/routing-policies) 2. Click the **copy button** next to your policy name 3. 
Paste it directly into your `model` parameter ## Use Cases ### Cost-Effective GPT Chain Start with cheaper models, only use expensive ones if needed: ``` Policy: cost-effective-gpt ├─ openai/gpt-4o-mini (2 retries) ├─ openai/gpt-4o (1 retry) └─ openai/gpt-5.2 (1 retry) ``` ### Multi-Provider Reliability Distribute across providers for maximum uptime: ``` Policy: multi-provider-safe ├─ openai/gpt-5.2 (1 retry) ├─ anthropic/claude-sonnet-4-5 (1 retry) └─ google/gemini-2.5-pro (1 retry) ``` ### Regional Failover Try regional endpoints before falling back to global: ``` Policy: regional-claude ├─ bedrock/claude-sonnet-4-5@eu-central-1 (2 retries) └─ anthropic/claude-sonnet-4-5 (2 retries) ``` ## How Retries Work Each model in the chain can have **0-10 retries**. The router uses: - **Exponential backoff** - Wait time increases between retries (500ms → 1s → 2s → 4s) - **Jitter** - Random variation (±10%) to prevent thundering herd - **Immediate failover** - On non-retryable errors (invalid request, auth failure) > **Warning:** **Model Compatibility:** Make sure all models in your fallback chain support your request parameters (context length, features like streaming, tool calling, etc.). If a model can't handle the request, the policy will skip to the next model without warning. ## Key Selection (BYOK) For each model, you can choose which API key to use: - **Requesty provided key** - Use Requesty's managed keys (default) - **My own key** - Use your Bring-Your-Own-Key (BYOK) credentials - **Try Requesty provided key first, then use my own** - Fallback to BYOK if Requesty key fails - **Try my own key first, then Requesty's** - Prefer BYOK, fallback to Requesty ## Monitoring & Debugging Track your fallback policy performance: 1. Go to [Analytics](https://app.requesty.ai/analytics) 2. Filter by your policy name 3. See which models succeeded, failed, and how often fallback occurred ## FAQ The request returns an error with details about the last model attempted. 
You'll see all the failures in your request logs. Yes! A fallback policy can reference another policy as one of its fallback options. For example: ``` Policy A (fallback): ├─ openai/gpt-4 └─ policy/multi-provider-backup ← Another policy ``` No. You only pay for successful requests that return tokens. Failed attempts don't incur costs. Click the edit icon next to your policy in the [Routing Policies](https://app.requesty.ai/routing-policies) page. Changes take effect immediately - no code deployment needed. --- ## Load Balancing Policies Source: https://docs.requesty.ai/features/load-balancing-policies.md > Distribute traffic across models with weighted routing Load Balancing Policies distribute your requests across multiple models based on weights you define. Perfect for A/B testing, gradual rollouts, and resource optimization. ## How It Works 1. You assign **weights** to each model (e.g., 70%, 20%, 10%) 2. Each incoming request is **consistently routed** to one model based on the distribution 3. Requests with the same `trace_id` or `user_id` always go to the **same model** (consistency guaranteed) ## Benefits - **A/B Testing** - Compare model performance with real traffic - **Gradual Rollouts** - Send 10% to new model, 90% to stable model - **Cost Optimization** - Route most traffic to cheaper models - **Consistent Experiences** - Same user always gets same model (maintains conversation context) - **Policy Rollouts** - Load balance between entire routing policies, not just models ## Creating a Load Balancing Policy ### Step 1: Create the Policy 1. Go to [Routing Policies](https://app.requesty.ai/routing-policies) 2. Click "**Create Policy**" 3. 
Select "**Load Balancing**" as the policy type

![Load Balancing Policy](/images/load_balancing.png)

### Step 2: Configure Weights

**Example Setup:**

- **Policy Name:** `sonnet-distribution`
- **Load Balancing:**
  - `anthropic/claude-sonnet-4-5`: **50%** (weight: 50)
  - `bedrock/claude-sonnet-4-5@eu-central-1`: **50%** (weight: 50)

Weights are relative: you can enter any positive numbers, and Requesty normalizes them into shares that total 100%.

### Step 3: Use the Policy in Your Code

After creating the policy, reference it with `policy/your-policy-name`:

```python Python
from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="your-requesty-api-key"
)

response = client.chat.completions.create(
    model="policy/sonnet-distribution",  # ← Your load balancing policy
    messages=[{"role": "user", "content": "Hello!"}]
)
```

```typescript TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://router.requesty.ai/v1',
  apiKey: 'your-requesty-api-key'
});

const response = await client.chat.completions.create({
  model: 'policy/sonnet-distribution', // ← Your load balancing policy
  messages: [{ role: 'user', content: 'Hello!'
  }]
});
```

```bash cURL
curl https://router.requesty.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-requesty-api-key" \
  -d '{
    "model": "policy/sonnet-distribution",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Consistency Guarantee

Load balancing uses **deterministic hashing** to ensure the same user always gets the same model:

- **With `trace_id`**: All requests with the same `trace_id` route to the same model
- **Without `trace_id`**: Requesty generates a unique `request_id` for each request, so each request is routed independently according to the weights

This means:

- ✅ **Multi-turn conversations** stay on the same model (preserves context)
- ✅ **User sessions** get consistent behavior
- ✅ **A/B test groups** are stable

### Maintaining Consistency Across Requests

To keep a user on the same model across multiple requests, pass a `trace_id`:

```python Python
# All requests with the same trace_id go to the same model
response = client.chat.completions.create(
    model="policy/sonnet-distribution",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "requesty": {
            "trace_id": "user-12345"  # ← Same user, same model
        }
    }
)
```

```typescript TypeScript
// All requests with the same trace_id go to the same model
const response = await client.chat.completions.create({
  model: 'policy/sonnet-distribution',
  messages: [{ role: 'user', content: 'Hello!' }],
  // Requesty-specific field, sent at the top level of the request body
  // (the Node SDK has no extra_body option; strict TypeScript may need a type cast)
  requesty: {
    trace_id: 'user-12345' // ← Same user, same model
  }
});
```

> **Info:** **Pro Tip:** Use your internal user ID as the `trace_id` to ensure each user gets a consistent model experience while still benefiting from A/B testing.

## Load Balancing Between Policies

You can **load balance between entire routing policies**, not just individual models.
This is powerful for: - **Canary deployments** of policy changes - **A/B testing different routing strategies** - **Gradual migration** from one policy to another ### Example: Policy Rollout Let's say you have two fallback policies: **Policy A (stable):** ``` policy/production-fallback ├─ openai/gpt-5.2 └─ anthropic/claude-sonnet-4-5 ``` **Policy B (experimental):** ``` policy/experimental-fallback ├─ google/gemini-2.5-pro └─ openai/gpt-5.2 ``` Create a load balancing policy to send **20% to experimental, 80% to stable:** ``` policy/gradual-rollout (Load Balancing) ├─ policy/production-fallback: 80% └─ policy/experimental-fallback: 20% ``` Now use `policy/gradual-rollout` in your code. As you gain confidence, adjust the weights to 50/50, then 0/100. > **Warning:** When load balancing between policies, each policy must be compatible with your request parameters. For example, don't mix embedding policies with chat completion policies. ## Use Cases ### A/B Testing New Models Compare GPT-5.2 vs Gemini 2.5 Pro on real traffic: ``` Policy: ab-test-frontier ├─ openai/gpt-5.2: 50% └─ google/gemini-2.5-pro: 50% ``` Track performance in [Analytics](https://app.requesty.ai/analytics) and see which model performs better. ### Gradual Model Rollout Carefully introduce a new model: ``` Policy: careful-rollout ├─ openai/gpt-4o: 90% ← Stable, proven └─ openai/gpt-5.2: 10% ← New, testing ``` Increase the weight of `gpt-5.2` as you validate quality. 
### Cost-Optimized Distribution Route most traffic to cheaper models, some to premium: ``` Policy: cost-optimized ├─ openai/gpt-4o-mini: 70% ├─ openai/gpt-4o: 20% └─ openai/gpt-5.2: 10% ``` ### Multi-Provider Redundancy Distribute across providers for resilience: ``` Policy: multi-provider ├─ openai/gpt-5.2: 40% ├─ anthropic/claude-sonnet-4-5: 40% └─ google/gemini-2.5-pro: 20% ``` ## Key Selection (BYOK) For each model in your load balancing policy, you can choose: - **Requesty provided key** - Use Requesty's managed keys (default) - **My own key** - Use your BYOK credentials ## Monitoring & Analytics Track your load balancing performance: 1. Go to [Analytics](https://app.requesty.ai/analytics) 2. Filter by your policy name 3. See the **actual distribution** of requests across models 4. Compare **latency, cost, and success rates** between models The distribution should match your configured weights (±2% variance is normal due to caching). ## FAQ Requesty uses the **xxhash algorithm** on your `trace_id` (or `request_id` if no trace_id) to deterministically select a model. The same ID always produces the same hash, which maps to the same model. Changing weights will **re-distribute** traffic. Some users may switch to different models. If you need stability, avoid changing weights frequently, or use separate policies for stable vs experimental traffic. Yes! Create a load balancing policy that points to **fallback policies**: ``` policy/lb-with-fallback (Load Balancing) ├─ policy/openai-fallback: 50% └─ policy/anthropic-fallback: 50% ``` This gives you both load balancing AND automatic failover. Yes. All models in a load balancing policy should support the same request format and features. Don't mix chat models with embedding models, or models with different context lengths. Use a stable `trace_id` (like user ID). With 100+ unique users, the distribution will converge to your configured weights (e.g., 20%). With small sample sizes, expect ±5% variance. 
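To make the hashing behaviour described in this section more concrete, here is a minimal sketch of deterministic weighted selection. It is an illustration under stated assumptions, not Requesty's implementation: the function name `pick_model` is hypothetical, and SHA-256 from the standard library stands in for xxhash.

```python
import hashlib


def pick_model(routing_id: str, weights: dict[str, float]) -> str:
    """Deterministically map an ID to a model according to relative weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the ID to a stable point in [0, 1). SHA-256 is a stand-in here;
    # the docs mention xxhash, which is not in the standard library.
    digest = hashlib.sha256(routing_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64
    # Walk the cumulative normalized weights until the point falls in a bucket.
    cumulative = 0.0
    for model, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return model
    return model  # guard against floating-point rounding at the upper edge


weights = {"openai/gpt-4o": 90, "openai/gpt-5.2": 10}
first = pick_model("user-12345", weights)
second = pick_model("user-12345", weights)
print(first == second)  # True: the same ID always lands in the same bucket

# Count how 10,000 distinct IDs spread across the two models.
counts = {name: 0 for name in weights}
for i in range(10_000):
    counts[pick_model(f"user-{i}", weights)] += 1
print(counts)
```

In a scheme like this, changing the weights moves the bucket boundaries, which is why some IDs switch models after a weight change.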
--- ## Latency-Based Routing Source: https://docs.requesty.ai/features/latency-routing.md > Automatically route to the fastest available model Latency-Based Routing automatically selects the **fastest model** for each request based on real-time performance data. Requesty continuously monitors response times and routes to the lowest-latency option. ## How It Works 1. Requesty **tracks latency** for every model in your policy 2. When a request arrives, the router **sorts models by speed** (fastest first) 3. Your request goes to the **currently fastest model** 4. Latency data updates in real-time based on recent performance ## Benefits - **Fastest responses** - Always use the quickest model available - **Automatic adaptation** - Router adjusts when model performance changes - **No manual tuning** - Latency optimization happens automatically - **Regional optimization** - Automatically prefer nearby endpoints ## Creating a Latency-Based Policy ### Step 1: Create the Policy 1. Go to [Routing Policies](https://app.requesty.ai/routing-policies) 2. Click "**Create Policy**" 3. Select "**Latency**" as the policy type ![Latency Routing Policy](/images/latency.png) ### Step 2: Select Models **Example Setup:** - **Policy Name:** `fastest-sonnet` - **Models:** - `anthropic/claude-sonnet-4-5` - `bedrock/claude-sonnet-4-5@eu-central-1` The router will automatically choose whichever is faster at request time. 
### Step 3: Use the Policy in Your Code Reference the policy in your model parameter: ```python Python from openai import OpenAI client = OpenAI( base_url="https://router.requesty.ai/v1", api_key="your-requesty-api-key" ) response = client.chat.completions.create( model="policy/fastest-sonnet", # ← Automatically uses fastest model messages=[{"role": "user", "content": "Hello!"}] ) ``` ```typescript TypeScript const client = new OpenAI({ baseURL: 'https://router.requesty.ai/v1', apiKey: 'your-requesty-api-key' }); const response = await client.chat.completions.create({ model: 'policy/fastest-sonnet', // ← Automatically uses fastest model messages: [{ role: 'user', content: 'Hello!' }] }); ``` ```bash cURL curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-requesty-api-key" \ -d '{ "model": "policy/fastest-sonnet", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ## How Latency Tracking Works Requesty measures **time-to-first-token** (TTFT) for streaming requests and **total response time** for non-streaming: - **Streaming**: Time from request sent → first token received - **Non-streaming**: Time from request sent → complete response received Latency data is: - **Per-model** - Each model tracked independently - **Rolling window** - Based on recent requests (last ~1 hour) - **Organization-scoped** - Your traffic patterns, not global averages > **Info:** **Cold Start Behavior:** Models with no recent latency data are **tried occasionally** to gather performance metrics. After 5-10 requests, the router has enough data for optimal routing. ## Key Selection Strategies For each model, you can configure **which API key to try first**: ### Requesty Provided Key (Default) Use Requesty's managed keys only. ### My Own Key Use your BYOK credentials only. ### Requesty First, Then BYOK Try Requesty's key first. If it's slower or unavailable, try your BYOK. 
### BYOK First, Then Requesty Try your BYOK first. If it's slower or unavailable, try Requesty's key. **Example:** If `anthropic/claude-sonnet-4-5` with Requesty key is faster than with BYOK, the policy will automatically prefer Requesty's key. ## Use Cases ### Regional Optimization Route to the fastest regional endpoint: ``` Policy: regional-claude ├─ anthropic/claude-sonnet-4-5 (global) ├─ bedrock/claude-sonnet-4-5@us-east-1 ├─ bedrock/claude-sonnet-4-5@eu-central-1 └─ bedrock/claude-sonnet-4-5@ap-southeast-1 ``` Users in Europe automatically get `eu-central-1`, users in Asia get `ap-southeast-1`. ### Provider Performance Let the router pick the fastest provider: ``` Policy: fastest-frontier ├─ openai/gpt-5.2 ├─ anthropic/claude-sonnet-4-5 └─ google/gemini-2.5-pro ``` If OpenAI is experiencing slowdowns, traffic shifts to Anthropic or Google automatically. ### Cost + Speed Optimization Combine similar-priced models and route to fastest: ``` Policy: fast-and-cheap ├─ openai/gpt-4o-mini ├─ anthropic/claude-3-5-haiku └─ google/gemini-1.5-flash ``` All three are low-cost. Requesty picks whichever responds fastest. ## Combining with Other Policies Latency routing works great with **load balancing** and **fallback**: ### Latency + Load Balancing ``` Policy: lb-to-latency (Load Balancing) ├─ policy/fastest-openai: 50% └─ policy/fastest-anthropic: 50% ``` Each sub-policy uses latency routing, parent policy does A/B testing. ### Latency + Fallback ``` Policy: fast-with-fallback (Fallback) ├─ policy/fastest-frontier ← Latency-based └─ openai/gpt-4o ← Stable fallback ``` Try latency-optimized policy first, fall back to known-good model if all fail. ## Monitoring Latency Track which models are fastest for your traffic: 1. Go to [Performance Monitoring](https://app.requesty.ai/analytics) 2. View **time-to-first-token** and **total latency** by model 3. See how latency routing distributes traffic You'll see traffic automatically shift to faster models over time. 
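The routing behaviour described above (rolling latency window, fastest model first, slower models as the retry order) can be sketched as follows. This is a simplified illustration, not Requesty's implementation: the `LatencyTracker` class is hypothetical, it uses a plain mean rather than any particular weighting, and it does not model the occasional probing of models that have no data yet.

```python
from collections import deque
from statistics import fmean


class LatencyTracker:
    """Keep a rolling window of latency samples per model."""

    def __init__(self, window: int = 100) -> None:
        self.window = window
        self.samples: dict[str, deque[float]] = {}

    def record(self, model: str, seconds: float) -> None:
        # deque(maxlen=...) silently drops the oldest sample once the window is full.
        self.samples.setdefault(model, deque(maxlen=self.window)).append(seconds)

    def score(self, model: str) -> float:
        data = self.samples.get(model)
        # Models with no data are treated as infinitely slow, so they sort last.
        return fmean(data) if data else float("inf")

    def ranked(self, models: list[str]) -> list[str]:
        """Return models fastest-first; this doubles as the retry order on failure."""
        return sorted(models, key=self.score)


tracker = LatencyTracker()
for sample in (0.42, 0.38, 0.45):
    tracker.record("anthropic/claude-sonnet-4-5", sample)
for sample in (0.31, 0.29):
    tracker.record("bedrock/claude-sonnet-4-5@eu-central-1", sample)

policy = [
    "anthropic/claude-sonnet-4-5",
    "bedrock/claude-sonnet-4-5@eu-central-1",
    "bedrock/claude-sonnet-4-5@us-east-1",  # no samples recorded yet
]
print(tracker.ranked(policy))
```

With these sample values the Bedrock EU model ranks first, the Anthropic model second, and the model without data last.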
## FAQ Models without recent data are assigned **max latency** (infinite). They'll be tried occasionally (~5-10% of traffic) to gather data. Once they have metrics, they compete fairly. No. Latency routing **only** considers speed. If you want cost optimization, use **load balancing** to prefer cheaper models, or manually order a fallback chain by price. Yes. Instead of using the latency policy, pass a direct model name (e.g., `openai/gpt-5.2`) for requests where you need a specific model. Continuously. Latency metrics are updated after every request. The router uses a **rolling average** of recent requests (last ~1 hour) to smooth out spikes. Latency routing tries models in **speed order**. If the fastest model fails, it tries the second-fastest, and so on. This is different from fallback policies where order is manually configured. Yes! Check the response headers or request logs in [Analytics](https://app.requesty.ai/analytics). You'll see which model handled each request. ## Technical Details ### Latency Calculation ``` Latency Score = Weighted Average([recent requests]) - Streaming: Time to first token - Non-streaming: Total response time - Window: Last ~100 requests or 60 minutes ``` ### Sorting Algorithm ``` 1. Fetch latency data for all models in policy 2. Sort models by latency (ascending) 3. Models without data → bottom of list 4. Route to lowest latency model 5. If that fails, try next lowest latency ``` ### Consistency Unlike load balancing, latency routing does **not** guarantee the same user gets the same model. If your use case requires consistency, use **load balancing** with `trace_id` instead. --- ## Auto Caching Source: https://docs.requesty.ai/features/auto-caching.md > Reduce costs by up to 90% with automatic prompt caching — works with Anthropic and Gemini models out of the box Requesty's auto caching automatically caches long system prompts and repeated content to reduce costs on providers that support prompt caching (Anthropic, Gemini). 
This is especially effective for applications with large knowledge bases or system prompts — cache hits are billed at a fraction of the normal input token cost. The router provides an `auto_cache` flag that allows you to explicitly control the caching behavior for your requests on supported providers. ## How Auto Cache Works The `auto_cache` flag is a boolean parameter that can be sent within a custom `requesty` field in your request payload. * **`"auto_cache": true`**: This will instruct the router to attempt to cache the response from the provider. If a similar request has been cached previously, it might be served from the cache (depending on the provider's caching strategy and TTL). * **`"auto_cache": false`**: This will instruct the router to bypass any automatic caching logic for this specific request and always fetch a fresh response from the provider. * **If `auto_cache` is not provided**: The router falls back to a default caching behavior which can depend on the origin of the request (e.g., calls from Cline or Roo Code default to caching). This flag provides an explicit override to the default caching logic determined by the request origin or other implicit factors. ## How to Use Auto Cache To use the `auto_cache` flag, include it within the `requesty` object in your request. ```json { "model": "openai/gpt-4", "messages": [{"role": "user", "content": "Tell me a joke."}], "requesty": { "auto_cache": true } } ``` ## Example with Auto Cache This example demonstrates how to set the `auto_cache` flag using the OpenAI Python client. The `requesty` field is passed as an additional parameter. 
### Python

```python
import openai

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

system_prompt = "YOUR ENTIRE KNOWLEDGEBASE"  # Replace this with your actual long prompt

response = client.chat.completions.create(
    model="vertex/anthropic/claude-3-7-sonnet",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    # extra_body fields are merged into the JSON request body
    extra_body={
        "requesty": {
            "auto_cache": True
        }
    }
)

print("Response:", response.choices[0].message.content)
```

### JavaScript

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai/v1",
});

// Make request with auto_cache enabled
const response = await client.chat.completions.create({
  model: "anthropic/claude-3-7-sonnet-latest",
  messages: [
    { role: "system", content: "YOUR ENTIRE KNOWLEDGEBASE" },
    { role: "user", content: "What is the capital of France?" }
  ],
  requesty: {
    auto_cache: true
  }
});

console.log("Response:", response.choices[0].message.content);
```

## Important Notes

1. **Explicit Control**: `auto_cache` provides explicit control. `true` attempts to cache; `false` prevents caching on providers where cache writes incur extra costs.
2. **Default Behavior**: If `auto_cache` is not specified in the `requesty` field, the router falls back to its default caching behavior.
3. **Provider Support**: This flag is respected by providers/models where cache writes incur extra costs, e.g. Anthropic and Gemini.

## Managed Caching

If you want Requesty to manage caching on your behalf, including custom TTL, cache warming, or advanced caching strategies, reach out to [support@requesty.ai](mailto:support@requesty.ai) and we'll help you set it up.
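The three `auto_cache` states described in this section (`true`, `false`, and omitted) can be sketched as a small payload builder. This is a minimal illustration, not part of any SDK: `build_payload` is a hypothetical helper name, and it only assembles the JSON body without making a network call.

```python
import json
from typing import Optional


def build_payload(model: str, messages: list, auto_cache: Optional[bool] = None) -> dict:
    """Build a chat-completions body (hypothetical helper, for illustration only)."""
    payload = {"model": model, "messages": messages}
    if auto_cache is not None:
        # Explicit override: True asks the router to cache, False bypasses caching.
        payload["requesty"] = {"auto_cache": auto_cache}
    # When auto_cache is None the field is omitted, so the router default applies.
    return payload


messages = [
    {"role": "system", "content": "YOUR ENTIRE KNOWLEDGEBASE"},
    {"role": "user", "content": "What is the capital of France?"},
]

for flag in (True, False, None):
    body = build_payload("anthropic/claude-3-7-sonnet-latest", messages, flag)
    print(flag, "->", json.dumps(body.get("requesty")))
```

With the OpenAI Python client, the `requesty` part of such a body would be passed through `extra_body`, as in the Python example above.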
--- ## Spend Limits & Rate Limits Source: https://docs.requesty.ai/features/api-limits.md > Control spending with project or API key limits, and handle provider rate limits with routing policies Requesty offers two methods to control and limit spending: **project-based limits** (recommended) and **per-API key limits**. Choose the method that best fits your organization's setup. > **Info:** **Looking for rate limits?** Requesty does not impose its own rate limits on your requests. If you hit a rate limit from an upstream provider (HTTP 429), the best solution is to create a [Routing Policy](#handling-provider-rate-limits) that automatically fails over to another model or provider. ## Project-Based Spend Limits (Recommended) **Use this method when:** Your team members have access to the Requesty web platform (they have accounts on https://requesty.ai and are part of your organization). ### How it works: - Each user gets a 'Private' project where they can create their own API keys - Admins can create shared projects. Regular users cannot create shared projects - Organization admins can set spend limits per project, effectively controlling the overall spend per user/project - This provides better visibility and control over spending at the user level ### Setting up project-based limits: 1. Go to the [Projects Page](https://app.requesty.ai/projects) in your organization dashboard 2. Select the project you want to limit (or a user's Private project) 3. Set the monthly spending limit for that project 4. All API keys created within that project will be subject to this limit ![](/images/project-limit.png) ## Per-API Key Spend Limits **Use this method when:** Your team members do NOT have access to the Requesty web platform, and you need to distribute API keys directly. 
### How it works:

- Organization admins generate API keys and share them with users
- Each API key has its own monthly spend cap
- Spending can be monitored via the dashboard or management API endpoints
- This method is ideal for external integrations or when you don't want to give users platform access

### Setting up per-key limits:

1. Go to [API Keys Page](https://app.requesty.ai/api-keys)
2. Create a new API key or edit an existing one
3. Set a monthly spending limit for that specific API key
4. Share the API key with the intended user

![](/images/api-key-limit.png)

## Monitoring and Management

Both methods allow you to:

- Monitor spending in real-time through the dashboard
- Receive alerts when limits are approached
- Use the [Management API](/features/key-management-api) to programmatically check usage
- Adjust limits as needed based on usage patterns

## Handling Provider Rate Limits

When an upstream provider (OpenAI, Anthropic, Google, etc.) returns a **429 rate limit error**, Requesty can automatically retry with a different model or provider. The solution is to create a **Routing Policy**.

### Option 1: Fallback Policy

Create a [Fallback Policy](/features/fallback-policies) that tries the same model on a different provider, or falls back to an alternative model:

```
Policy: rate-limit-safe
├─ anthropic/claude-sonnet-4-5 (2 retries)
├─ bedrock/claude-sonnet-4-5-v2@eu-central-1 (2 retries)
└─ openai/gpt-4.1 (1 retry)
```

If the first model is rate limited, Requesty automatically tries the next one, so your application never sees the 429 error.
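To make the fallback order concrete, here is a small local simulation of the `rate-limit-safe` chain. It is only a sketch: Requesty performs this server-side and its actual implementation is not described here. `RateLimited`, `run_chain`, and `fake_call` are hypothetical names, and the reading of "N retries" as one initial attempt plus N retries is an assumption.

```python
class RateLimited(Exception):
    """Stands in for an upstream HTTP 429 response."""


def run_chain(chain, call):
    """Try each (model, retries) entry in order and return the first success."""
    for model, retries in chain:
        # Assumption: "N retries" means 1 initial attempt + N retries.
        for _ in range(1 + retries):
            try:
                return model, call(model)
            except RateLimited:
                continue  # retry the same model, then move down the chain
    raise RateLimited("every model in the chain was rate limited")


chain = [
    ("anthropic/claude-sonnet-4-5", 2),
    ("bedrock/claude-sonnet-4-5-v2@eu-central-1", 2),
    ("openai/gpt-4.1", 1),
]

attempts = []


def fake_call(model):
    """Fake provider call: the first provider is always rate limited."""
    attempts.append(model)
    if model.startswith("anthropic/"):
        raise RateLimited(model)
    return f"ok from {model}"


served_by, result = run_chain(chain, fake_call)
print(served_by, "after", len(attempts), "attempts")
```

In this simulation the caller only sees the final successful result, which mirrors the behaviour described above: the 429 is absorbed inside the chain.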
### Option 2: Load Balancing Policy

Spread your traffic across multiple providers to stay under each provider's rate limits with a [Load Balancing Policy](/features/load-balancing-policies):

```
Policy: spread-traffic
├─ anthropic/claude-sonnet-4-5: 50%
└─ bedrock/claude-sonnet-4-5-v2@us-east-1: 50%
```

### Option 3: Latency Routing

Use [Latency-Based Routing](/features/latency-routing) to automatically pick the fastest available provider; rate-limited providers will have higher latency and be deprioritized.

### How to Create a Routing Policy

1. Go to [Routing Policies](https://app.requesty.ai/routing-policies) in the Requesty dashboard
2. Click **Create Policy**
3. Choose your policy type: **Fallback**, **Load Balancing**, or **Latency**
4. Give it a name (e.g., `rate-limit-safe`)
5. Add models: search and select from 300+ models, then drag to reorder
6. For fallback: set retry counts per model. For load balancing: set weight percentages (must total 100%)
7. Save the policy

Then use the policy in your API calls by setting the model to `policy/your-policy-name`:

```python
response = client.chat.completions.create(
    model="policy/rate-limit-safe",  # Use your routing policy
    messages=[{"role": "user", "content": "Hello!"}]
)
```

> **Tip:** You can also create policies scoped to a specific API key from the [API Keys](https://app.requesty.ai/api-keys) page. Organization-wide policies are managed from the [Routing Policies](https://app.requesty.ai/routing-policies) page.
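The load-balancing rule from the steps above (weight percentages must total 100%) can be illustrated with a short local sketch. This is not Requesty's routing code: `validate_weights` and `pick_model` are hypothetical helpers, and the weighted random pick is just one plausible way to realize a 50/50 split.

```python
import random


def validate_weights(weights: dict) -> None:
    """Raise if the integer percentage weights do not add up to 100."""
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights must total 100%, got {total}%")


def pick_model(weights: dict, rng: random.Random) -> str:
    """Pick one model at random, proportionally to its weight."""
    validate_weights(weights)
    models = list(weights)
    return rng.choices(models, weights=[weights[m] for m in models], k=1)[0]


weights = {
    "anthropic/claude-sonnet-4-5": 50,
    "bedrock/claude-sonnet-4-5-v2@us-east-1": 50,
}

rng = random.Random(0)  # seeded so repeated runs give the same split
counts = {model: 0 for model in weights}
for _ in range(1000):
    counts[pick_model(weights, rng)] += 1

print(counts)
```

Over many requests the counts approach the configured percentages, which is how spreading traffic keeps each provider further from its own rate limit.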
## Best Practices

- **For internal teams:** Use project-based limits to give users autonomy while maintaining control
- **For external partners:** Use per-API key limits for simpler distribution and management
- **Set reasonable buffers:** Consider setting limits slightly above expected usage to avoid interruptions
- **Regular monitoring:** Check usage patterns monthly to optimize limit settings
- **For rate limits:** Create fallback policies across multiple providers to maximize throughput

---

## Bring Your Own Keys

Source: https://docs.requesty.ai/features/bring-your-own-keys.md

> Use your own API keys with Requesty

## What is Bring Your Own Keys?

Bring Your Own Keys (BYOK) allows you to use your personal API keys from various providers with Requesty. This feature gives you more flexibility and control over which services you connect to.

- Add keys from multiple providers
- Use your own keys in fallback policies
- Track key usage across your organization

## How It Works

1. Go to [Bring Your Own Keys](https://app.requesty.ai/byoks) in your dashboard
2. Select a provider to add your key (OpenAI, Anthropic, etc.)
3. Enter your API key and accept the terms and conditions
4. Use your added keys in fallback policies
5. Use your fallback policy as your model ID

![](/images/bring-your-own-keys.png)

## Supported Providers

Requesty currently supports adding your own keys for these providers:

- OpenAI
- Anthropic
- Google AI Studio
- xAI

More providers will be added over time based on user requests.

Important notes:

- Google support does not include Vertex; BYOK on Vertex is not supported yet
- You can set at most one key per provider

## Using Your Keys in Policies

Once you've added your keys, you can use them in fallback policies:

1. Go to the API Keys section
2. Select "Configure" for the API key you want to modify
3. Create a new policy (or edit an existing one)
4. Select your preferred models and add fallback options
5.
Choose between Requesty's keys or your own keys for each model

## Benefits

- **Flexibility**: Use your existing API accounts and billing
- **Fallback Options**: Create robust policies with multiple fallback paths
- **Organizational Tracking**: See which team members added or updated keys
- **Cost Control**: Use your own billing relationships with providers

## Quick Start

1. Add your API keys from supported providers
2. Create a policy that uses Requesty's services as primary
3. Set up fallbacks to use your own keys when needed
4. Apply the policy to your API key
5. Start making API calls with your fallback policy as your model ID

> **Tip:** Using your own keys is especially useful for accessing free tier models from providers like Google or for utilizing specific model versions you already have access to.

---

## Service Accounts

Source: https://docs.requesty.ai/features/service-accounts.md

## What are Service Accounts?

Service accounts allow you to create non-human entities. This is useful if you're building a product that uses AI and you want to represent that logically in Requesty.

A service account can have multiple API keys, and the spend from those keys will not be tied to any user. You can also run analytics on service account spend to track how specific non-human users are using AI.

## Creating a Service Account

Only administrators can create service accounts. Follow these steps:

1. Navigate to the **Admin Panel**
2. Select **Service Accounts** from the menu
3. Click the **Create Service Account** button
4. Enter a name for your service account (e.g., "Production API", "Mobile App", "Analytics Service")
5. Click **Create**

Your service account is now created and ready to have API keys assigned to it.

## Creating an API Key for a Service Account

API keys can be created under a specific service account, keeping them organized and easy to manage.

### Steps:

1. Navigate to the **API Keys** section
2. Click **Create New API Key**
3.
Click **Service Account**
4. Pick which service account you'd like to create an API key for
5. Give it a name, and click **Create**

## Managing Service Account Keys

Just like any other API key, you can configure a limit, expiration, logging, and other settings per API key.

## Deleting a Service Account

To delete a service account:

1. Navigate to **Admin Panel** → **Service Accounts**
2. Locate the service account you want to delete
3. Click the **Delete** button
4. Confirm the deletion

**Note**: You can only delete a service account if it has no active API keys. Remove all keys before deleting it.

---

## Overview

Source: https://docs.requesty.ai/features/mcp-gateway.md

> Connect AI coding tools to any MCP server through Requesty's unified gateway

# MCP Gateway

> **Info:** The MCP (Model Context Protocol) Gateway enables AI coding assistants like **Claude Code**, **Cursor**, and **Roo Code** to securely connect to MCP servers through Requesty's unified API, providing tool access, authentication management, and comprehensive analytics.

## What is MCP?

The **Model Context Protocol** is an open standard that enables AI assistants to interact with external tools and services. Through MCP, your AI coding assistant can:

- 🔧 Access databases and execute queries
- 📁 Manage files and repositories
- 🌐 Interact with web services and APIs
- 🤖 Use specialized AI tools and models
- 📊 Connect to productivity platforms (Notion, Linear, Asana)

## Key Benefits

- Works with Claude Code, Cursor, Roo Code, and any MCP-compatible AI tool
- Single dashboard to manage all MCP servers and authentication across your organization
- AES-256 encryption, organization isolation, and granular access controls

## How It Works

```mermaid
graph LR
A[AI Tool
Claude/Cursor] -->|MCP Request| B[Requesty Gateway] B -->|Authenticated| C[MCP Server] C -->|Response| B B -->|Tool Result| A B -->|Analytics| D[Dashboard] ``` 1. **Register MCP Servers**: Admins configure MCP server URLs and authentication 2. **Whitelist Tools**: Select specific tools from each server to make available 3. **Manage Keys**: Configure authentication (org-wide or per-user) 4. **Use Tools**: AI assistants automatically discover and use available tools 5. **Monitor Usage**: Track performance, costs, and usage patterns ## Feature Overview Register, configure, and manage MCP servers for your organization Enterprise users can manage their personal API keys for MCP servers Monitor and analyze MCP server usage, performance, and user activity Connect Claude Code, Cursor, Roo Code and other AI assistants ## Authentication Models Requesty supports two authentication approaches depending on your plan: ### Standard Plan: Organization-Wide Keys For non-enterprise organizations, authentication is managed at the organization level: **Admin Configuration** Organization admins configure API keys for each MCP server **Shared Access** All users in the organization share the same authentication **Simplified Management** Single point of configuration for the entire team **Best for**: Teams using internal tools, shared services, or organization-owned API keys ### Enterprise Plan: Per-User Keys Enterprise organizations can enable individual authentication: **Admin Setup** Admins register servers and define required authentication headers **User Configuration** Each user provides their own API keys through the dashboard **Individual Access** Users authenticate with their personal credentials **Best for**: External services requiring personal API keys (GitHub, Linear, Notion) > **Warning:** **Important Enterprise Distinction**: - Admins can choose whether users are allowed to add their own keys - If disabled, only organization-wide keys are used (same as Standard plan) - This 
provides flexibility between convenience and security ## Quick Start ### 1. Enable MCP Gateway Navigate to **Settings → Integrations → MCP Gateway** in your Requesty dashboard. ### 2. Add Your First Server ### Use Template Choose from pre-configured templates for popular services: - **GitHub**: Repository management and code search - **Notion**: Workspace and content management - **Linear**: Issue tracking and project management - **Context7**: Advanced AI context management - **Asana**: Task and project coordination ### Custom Server Configure a custom MCP server: ```json { "name": "my-mcp-server", "url": "https://api.example.com/mcp", "type": "streamable-http", "headers": { "Authorization": "Bearer {{API_KEY}}" } } ``` ### 3. Explore and Select Tools Click **Explore Server** to discover available tools, then select which ones to enable: ```json { "tools": [ { "name": "database_query", "description": "Execute SQL queries on the database" }, { "name": "file_search", "description": "Search for files in the repository" }, { "name": "create_issue", "description": "Create a new issue in the tracker" } ] } ``` ### 4. Configure Authentication ### Organization Key (Admin) Set organization-wide authentication in the server configuration: ```json { "headers": { "Authorization": "Bearer sk-org-xxxxx" } } ``` ### User Keys (Enterprise) Users add their personal keys via **Manage Keys**: 1. Click the key icon next to the server 2. Enter personal API key 3. Save securely (encrypted with AES-256) ### 5. Connect Your AI Tool Configure your AI coding assistant to use Requesty's MCP gateway: ### Claude Code Claude Code automatically discovers MCP servers through your Requesty API key. No additional configuration needed. 
### Cursor

In Cursor settings, add Requesty as your MCP provider:

```json
{
  "mcp": {
    "provider": "requesty",
    "apiKey": "YOUR_REQUESTY_API_KEY"
  }
}
```

### Roo Code

Configure Roo Code to use Requesty's MCP endpoint:

```json
{
  "mcp": {
    "endpoint": "https://router.requesty.ai/mcp",
    "auth": "Bearer YOUR_REQUESTY_API_KEY"
  }
}
```

## Supported MCP Servers

### Popular Templates

**GitHub**

- **URL**: `https://api.githubcopilot.com/mcp/`
- **Features**: Repository management, code search, pull requests, issue tracking
- **Authentication**: Personal GitHub token required

**Notion**

- **URL**: `https://mcp.notion.com/mcp`
- **Features**: Page management, database queries, content creation
- **Authentication**: Notion integration token required

**Linear**

- **URL**: `https://mcp.linear.app/sse`
- **Features**: Issue management, project tracking, team collaboration
- **Authentication**: Linear API key required

**Context7**

- **URL**: `https://mcp.context7.com/mcp`
- **Features**: AI context management, knowledge graphs, semantic search
- **Authentication**: Context7 API key required

### Protocol Support

> **Info:** Currently, the MCP Gateway supports **HTTP-based MCP servers** (streamable-http and SSE protocols). Support for **stdio-based servers** is coming soon.
| Protocol | Status | Description |
|----------|--------|-------------|
| `streamable-http` | ✅ Supported | Standard HTTP with JSON streaming |
| `sse` | ✅ Supported | Server-Sent Events for real-time updates |
| `stdio` | 🚧 Coming Soon | Direct process communication |

## Analytics & Monitoring

Track your MCP usage with comprehensive analytics:

### Key Metrics

- Total MCP server requests and trends over time
- Response times for each server and tool
- Percentage of successful tool executions
- Most frequently used tools and servers

### Usage Dashboard

Monitor real-time and historical MCP usage through the intuitive web dashboard:

- **Real-time Metrics**: Live monitoring of active MCP requests and server status
- **Historical Analysis**: Trend analysis over time periods (24h, 7d, 30d)
- **Server Breakdown**: Usage statistics per MCP server and tool
- **Performance Insights**: Latency distribution and error rate tracking
- **User Activity**: Individual usage patterns (Enterprise only)

## Security & Compliance

### Encryption

- **At Rest**: All API keys and sensitive headers encrypted with AES-256
- **In Transit**: TLS 1.3 for all MCP communications
- **Key Storage**: Encrypted database with automatic key rotation

### Access Control

### Standard Plan

- Organization admins manage all MCP servers
- Users can view available tools but cannot modify
- Shared authentication across the organization

### Enterprise Plan

- Granular role-based access control (RBAC)
- Per-user authentication keys
- Admin control over user key permissions
- Audit logs for all configuration changes

### Data Isolation

- Complete separation between organizations
- No cross-organization data access
- Isolated request routing and analytics

## Best Practices

- Only enable tools your team actually needs
- Review tool descriptions and permissions carefully
- Test tools in a development environment first
- Regularly audit enabled tools for security
- Rotate API keys regularly (every 90 days)
- Use separate keys for
development and production
- Never share personal API keys
- Immediately revoke compromised keys
- Monitor latency metrics for each server
- Disable unused tools to reduce overhead
- Use caching for frequently accessed data
- Configure appropriate timeout values
- Review MCP server documentation before connecting
- Validate server certificates and URLs
- Use environment-specific configurations
- Enable audit logging for compliance

## Troubleshooting

### Common Issues

**Symptoms**: Cannot discover tools from MCP server

**Solutions**:

- Verify the server URL is correct
- Check authentication headers are properly configured
- Ensure the server supports MCP protocol
- Test connectivity with curl or Postman

**Symptoms**: Registered tools don't show in Claude/Cursor

**Solutions**:

- Confirm tools are selected and saved
- Restart your AI assistant
- Check your Requesty API key is valid
- Verify organization permissions

**Symptoms**: 401/403 errors when using tools

**Solutions**:

- Verify API keys are correctly entered
- Check if keys have required permissions
- For enterprise: ensure user keys are configured
- Confirm keys haven't expired

## Plan Features

MCP Gateway usage is included in your Requesty plan:

| Feature | Standard | Enterprise |
|---------|----------|------------|
| MCP Server Registration | ✅ Unlimited | ✅ Unlimited |
| Organization-wide Keys | ✅ | ✅ |
| Per-user Keys | ❌ | ✅ |
| Tool Whitelisting | ✅ | ✅ |
| Basic Analytics | ✅ | ✅ |
| Advanced Analytics | ❌ | ✅ |
| Audit Logs | ❌ | ✅ |
| Custom RBAC | ❌ | ✅ |

> **Note:** MCP requests count toward your regular API usage. There are no additional charges for using the MCP Gateway.
## Coming Soon - 🔄 **Stdio Protocol Support**: Direct process-based MCP servers - 🎯 **Smart Tool Recommendations**: AI-powered tool suggestions - 📊 **Cost Allocation**: Per-user and per-project MCP cost tracking - 🔐 **Secrets Management**: Integrated vault for API keys - 🌍 **Global Edge Deployment**: Reduced latency worldwide --- Contact our support team at [support@requesty.ai](mailto:support@requesty.ai) or visit our [GitHub repository](https://github.com/requesty/mcp-gateway) for examples and updates. --- ## MCP Server Management Source: https://docs.requesty.ai/features/mcp-server-management.md > Register, configure, and manage MCP servers for your organization # MCP Server Management > **Info:** Manage all your Model Context Protocol (MCP) servers from a centralized dashboard. Register new servers, explore available tools, and configure authentication settings. ## Overview The MCP Server Management interface allows administrators to: - **Register MCP Servers**: Add new MCP servers with custom configurations - **Explore Tools**: Discover and preview available tools before enabling them - **Configure Authentication**: Set up organization-wide or user-specific authentication - **Manage Tool Access**: Control which tools are available to your team ## Registering MCP Servers ### Using Templates Choose from pre-configured templates for popular services: Repository management, code search, and pull requests Workspace and content management platform Issue tracking and project management Advanced AI context and knowledge management ### Custom Server Configuration For custom MCP servers, provide: **Basic Information** - **Server Name**: Unique identifier within your organization - **Server URL**: The MCP endpoint URL - **Protocol Type**: `streamable-http` or `sse` **Authentication Headers** Configure required authentication headers like `Authorization`, `X-API-Key`, etc. 
**Explore Tools** Test the connection and discover available tools **Select Tools** Choose which tools to make available to your team ## Tool Exploration ### Discovery Process 1. **Connect to Server**: Test authentication and connectivity 2. **Fetch Tool List**: Retrieve all available tools from the MCP server 3. **Review Capabilities**: Examine tool descriptions and input schemas 4. **Security Assessment**: Review tool permissions and access requirements ### Tool Information For each discovered tool, you'll see: - **Name**: Tool identifier used by AI assistants - **Description**: Human-readable explanation of functionality - **Input Schema**: Required parameters and data types - **Permissions**: What the tool can access or modify > **Warning:** **Security Note**: Carefully review tool permissions before enabling them. Some tools may have broad access to external systems or sensitive data. ## Server Status Monitoring ### Health Checks The system continuously monitors: - **Connectivity**: Server availability and response times - **Authentication**: Validity of configured credentials - **Tool Availability**: Changes to available tools - **Error Rates**: Failed requests and common issues ### Status Indicators | Status | Indicator | Description | |--------|-----------|-------------| | Healthy | 🟢 | Server responding normally | | Warning | 🟡 | Minor issues or degraded performance | | Error | 🔴 | Server unreachable or authentication failed | | Disabled | ⚫ | Server manually disabled by admin | ## Authentication Configuration ### Organization-Wide Keys (Standard & Enterprise) Set authentication credentials that apply to all users: ```json { "headers": { "Authorization": "Bearer org-api-key-12345", "X-Custom-Header": "organization-value" } } ``` ### Per-User Keys (Enterprise Only) Allow users to provide their own authentication: - **Required Headers**: Define which headers users must provide - **Optional Headers**: Headers users can optionally configure - **Validation**: 
Automatic testing of user-provided credentials

## Template Library

### Available Templates

**GitHub**

- **URL**: `https://api.githubcopilot.com/mcp/`
- **Required Auth**: GitHub Personal Access Token
- **Tools**: Repository access, code search, issue management, PR operations
- **Use Cases**: Code review, repository analysis, automated issue creation

**Notion**

- **URL**: `https://mcp.notion.com/mcp`
- **Required Auth**: Notion Integration Token
- **Tools**: Page creation, database queries, content search, workspace management
- **Use Cases**: Documentation management, knowledge base queries, content creation

**Linear**

- **URL**: `https://mcp.linear.app/sse`
- **Required Auth**: Linear API Key
- **Tools**: Issue creation, project tracking, team management, workflow automation
- **Use Cases**: Bug tracking, feature requests, project coordination

**Asana**

- **URL**: `https://mcp.asana.com/sse`
- **Required Auth**: Asana Personal Access Token
- **Tools**: Task management, project creation, team collaboration, timeline tracking
- **Use Cases**: Project planning, task assignment, progress tracking

## Best Practices

### Server Selection

1. **Assess Need**: Determine what external tools your team actually needs
2. **Security Review**: Evaluate the security implications of each server
3. **Start Small**: Begin with one or two servers and expand gradually
4. **Monitor Usage**: Track which tools are actually being used

### Configuration Management

- **Version Control**: Document server configurations for reproducibility
- **Environment Separation**: Use different configurations for dev/staging/prod
- **Regular Audits**: Review enabled servers and tools quarterly
- **Access Logs**: Monitor which users access which tools

### Security Guidelines

- **Principle of Least Privilege**: Only enable necessary tools
- **Credential Rotation**: Regularly update authentication credentials
- **Monitoring**: Set up alerts for unusual access patterns
- **Documentation**: Maintain clear records of what each server accesses

## Troubleshooting

###
Common Issues **Symptoms**: Cannot reach MCP server during exploration **Solutions**: - Verify the server URL is correct and accessible - Check if the server requires specific headers or authentication - Test connectivity from your network - Contact the MCP server administrator **Symptoms**: 401 or 403 errors during tool exploration **Solutions**: - Verify API keys are valid and not expired - Check if the key has required permissions - Ensure headers are formatted correctly - Test the key directly with the service's API **Symptoms**: Server connects but no tools are discovered **Solutions**: - Confirm the server implements MCP protocol correctly - Check if tools require additional authentication - Verify the server's tool list endpoint is working - Review server documentation for setup requirements --- Once you've registered your MCP servers, learn about [User Key Management](/features/mcp-user-keys) for enterprise authentication or explore [MCP Analytics](/features/mcp-analytics) to monitor usage. --- ## User Key Management Source: https://docs.requesty.ai/features/mcp-user-keys.md > Enterprise users can manage their personal API keys for MCP servers # User Key Management > **Info:** Enterprise plan users can manage their own personal API keys for MCP servers, enabling individual authentication while maintaining centralized server management. ## Overview User Key Management allows enterprise users to: - **Personal Authentication**: Use individual API keys instead of shared organization credentials - **Secure Storage**: Keys are encrypted and stored securely - **Granular Access**: Different users can have different levels of access - **Account Separation**: Personal usage tracking and accountability > **Note:** **Enterprise Feature**: User key management is only available on Enterprise plans. Standard plans use organization-wide authentication configured by administrators. 
## How It Works ### Authentication Flow **Admin Setup** Organization admins register MCP servers and define required authentication headers **User Configuration** Individual users provide their personal API keys through the dashboard **Request Authentication** When using MCP tools, the system uses the user's personal keys for authentication **Audit Trail** All requests are tracked with individual user attribution ### Key Storage Architecture ```mermaid graph TD A[User Provides Key] --> B[AES-256 Encryption] B --> C[Secure Database Storage] C --> D[Runtime Decryption] D --> E[MCP Server Request] E --> F[Audit Log Entry] ``` ## Managing Your Keys ### Adding Personal Keys 1. Navigate to **Settings → MCP Gateway** 2. Find servers that require user authentication (marked with 🔑) 3. Click **Manage Keys** next to the server 4. Enter your personal API keys for required headers 5. Save securely ### Key Status Indicators | Status | Indicator | Description | |--------|-----------|-------------| | Configured | 🟢 | Key is set and working | | Missing | 🟡 | Key required but not provided | | Invalid | 🔴 | Key exists but authentication failed | | Expired | ⚫ | Key needs to be updated | ### Updating Keys ### Regular Updates Update your keys periodically for security: - Click on the key status indicator - Enter your new API key - Test the connection - Save the updated key ### Emergency Rotation If your key is compromised: 1. Immediately revoke the old key in the external service 2. Generate a new key in the service 3. Update the key in Requesty 4. 
Verify all tools are working with the new key ## Supported Authentication Types ### Authorization Headers Most services use Bearer token authentication: ```json { "Authorization": "Bearer your-personal-token" } ``` **Examples**: - GitHub: Personal Access Token - Linear: API Key - Notion: Integration Token ### Custom Headers Some services require custom authentication headers: ```json { "X-API-Key": "your-api-key", "X-Auth-Token": "your-auth-token" } ``` ### Multi-Header Authentication Complex services may require multiple headers: ```json { "Authorization": "Bearer token", "X-Client-ID": "client-id", "X-User-ID": "user-id" } ``` ## Security Features ### Encryption at Rest - **Algorithm**: AES-256 encryption for all stored keys - **Key Management**: Automatic key rotation and secure key derivation - **Database Security**: Encrypted database fields with no plaintext storage ### Access Controls - **User Isolation**: Users can only access their own keys - **Organization Boundaries**: Complete separation between organizations - **Admin Oversight**: Admins can see key status without seeing actual values ### Audit Logging Track when keys are added, updated, or removed Every MCP request is attributed to the specific user Failed authentications and suspicious activity Detailed logs for compliance and security audits ## Enterprise vs Standard ### Standard Plan: Organization-Wide Keys **Admin-Only Management** Only organization administrators can configure authentication **Shared Credentials** All users share the same API keys and authentication **Simplified Setup** Single point of configuration for the entire organization **Best for**: Internal tools, shared services, simplified management ### Enterprise Plan: Per-User Keys **Individual Authentication** Each user provides and manages their own API keys **Personal Accountability** All usage is tracked and attributed to individual users **Granular Control** Users can have different access levels and permissions **Best for**: 
External services, compliance requirements, large teams ## Admin Controls ### Key Policy Configuration Administrators can configure: - **Required vs Optional**: Which keys users must provide - **Validation Rules**: Automatic testing of user-provided keys - **Usage Limits**: Per-user limits on MCP requests - **Audit Requirements**: Mandatory logging and retention policies ### User Management ### Key Status Monitoring Admins can see which users have configured keys: - ✅ All required keys configured - ⚠️ Some keys missing - ❌ No keys configured - 🔄 Keys need updating ### Access Control Control user access to key management: - **Allow Self-Service**: Users can manage their own keys - **Admin-Only**: Only admins can update user keys - **Approval Required**: User key changes require admin approval ## Best Practices ### For Users - Use unique, strong API keys for each service - Never share your personal API keys with others - Regularly rotate keys (every 90 days recommended) - Immediately report any suspected key compromise - Only request the minimum permissions needed - Review and audit your key usage regularly - Remove keys for services you no longer use - Keep backup access methods when possible ### For Administrators - Define clear key rotation policies - Set up automated alerts for key expiration - Require strong authentication for key management - Implement approval workflows for sensitive services - Maintain audit logs for all key operations - Regular reviews of user key status - Document key management procedures - Train users on security best practices ## Common Use Cases ### Personal GitHub Integration 1. **Generate Personal Access Token** in GitHub settings 2. **Configure Scopes**: `repo`, `read:org`, `read:user` 3. **Add to Requesty**: Use as Authorization header 4. **Verify Access**: Test with repository listing tool ### Individual Linear Access 1. **Create API Key** in Linear account settings 2. **Set Permissions**: Access to your teams and projects 3. 
**Configure in Requesty**: Add as Linear API key 4. **Test Integration**: Create a test issue ### Notion Workspace Access 1. **Create Integration** in Notion developer settings 2. **Grant Permissions**: Content read/write, database access 3. **Get Integration Token** from Notion 4. **Add to Requesty**: Configure as Authorization header ## Troubleshooting ### Key Validation Errors **Symptoms**: "Invalid API key format" error **Solutions**: - Check if the key includes any prefixes (Bearer, Token, etc.) - Verify you're using the correct key type for the service - Ensure no extra spaces or characters in the key **Symptoms**: 403 Forbidden errors when using tools **Solutions**: - Verify the key has required permissions/scopes - Check if your account has access to the requested resources - Ensure the key hasn't been revoked or expired **Symptoms**: Authentication fails despite correct key **Solutions**: - Test the key directly with the service's API - Check if the service requires additional headers - Verify the key is for the correct environment (prod vs dev) - Contact the service provider for key validation --- Contact your organization administrator or [support@requesty.ai](mailto:support@requesty.ai) for assistance with user key management. --- ## MCP Analytics Source: https://docs.requesty.ai/features/mcp-analytics.md > Monitor and analyze MCP server usage, performance, and user activity # MCP Analytics > **Info:** Comprehensive analytics dashboard for monitoring MCP server usage, performance metrics, and user activity across your organization. 
## Overview MCP Analytics provides real-time and historical insights into: - **Request Volume**: Total MCP server requests and trends - **Performance Metrics**: Latency, success rates, and error tracking - **Tool Usage**: Most popular tools and usage patterns - **User Activity**: Individual and team usage statistics (Enterprise) - **Cost Analysis**: Resource consumption and optimization opportunities ## Analytics Dashboard ### Key Metrics Total MCP server requests over time with trend analysis Response times for each server and tool with performance trends Percentage of successful tool executions and error rates Active users making MCP requests across different time periods ### Time Period Analysis Monitor usage across different time ranges: ### Real-time (24h) - **Hourly Breakdown**: Request patterns throughout the day - **Live Monitoring**: Active requests and server status - **Immediate Alerts**: Real-time error detection - **Performance Tracking**: Current latency and throughput ### Weekly (7d) - **Daily Trends**: Weekday vs weekend usage patterns - **Growth Analysis**: Week-over-week usage changes - **Tool Adoption**: New tool usage trends - **Performance Optimization**: Weekly performance reviews ### Monthly (30d) - **Long-term Trends**: Monthly usage and growth patterns - **Capacity Planning**: Resource usage projections - **User Onboarding**: New user adoption rates - **Strategic Insights**: Business impact analysis ## Server Performance Analytics ### Latency Analysis Track response times across all MCP servers: ```mermaid graph LR A[User Request] --> B[Gateway Processing] B --> C[MCP Server Response] C --> D[Total Latency] B --> E[Gateway Latency] C --> F[Server Latency] E --> G[Analytics Dashboard] F --> G D --> G ``` ### Server Health Monitoring Server uptime and connectivity status Requests per second and peak usage Error types, frequencies, and resolution ## Tool Usage Analytics ### Popular Tools Track which MCP tools are most frequently used: 1. 
**Usage Ranking**: Tools ordered by request frequency 2. **Adoption Rate**: How quickly new tools are adopted 3. **User Preferences**: Which users prefer which tools 4. **Success Patterns**: Tools with highest success rates ### Tool Performance - **Tool Popularity**: Requests per tool over time - **User Adoption**: How many users use each tool - **Success Rates**: Tool-specific error rates - **Performance Impact**: Latency by tool type - **Server Load**: Requests distributed across servers - **Tool Availability**: Which servers provide which tools - **Performance Comparison**: Server-specific performance metrics - **Capacity Utilization**: Resource usage per server ## User Activity Analytics > **Note:** **Enterprise Feature**: Detailed user activity analytics are available only on Enterprise plans. Standard plans show organization-level aggregated data. ### Individual User Metrics ### Enterprise Plans - **Personal Usage**: Individual user request patterns - **Tool Preferences**: Most used tools per user - **Performance Impact**: User-specific latency and success rates - **Activity Timeline**: Detailed usage history per user - **Cost Attribution**: Per-user resource consumption ### Standard Plans - **Organization Total**: Aggregated usage across all users - **Anonymous Patterns**: Usage trends without user identification - **General Metrics**: Overall tool popularity and performance - **Basic Analytics**: Request volume and success rates ### Team Analytics For Enterprise customers: - **Department Usage**: Analytics grouped by user teams/departments - **Project Attribution**: Usage tied to specific projects or initiatives - **Collaboration Patterns**: How teams use MCP tools together - **Resource Allocation**: Cost and usage distribution across teams ## Historical Analysis ### Trend Identification **Usage Growth** Track MCP adoption and usage growth over time **Performance Trends** Monitor latency and success rate changes **Seasonal Patterns** Identify daily, weekly, 
and monthly usage patterns **Capacity Planning** Predict future resource needs based on trends ## Getting Started ### Enable Analytics **Automatic Collection** Analytics data collection is automatically enabled when you register MCP servers **Dashboard Access** Navigate to **Analytics → MCP** in your Requesty dashboard **Configure Views** Customize time periods, filters, and dashboard layout **Set Up Alerts** Configure notifications for important metrics and thresholds ### Best Practices - **Regular Monitoring**: Check analytics weekly for performance trends - **Threshold Tuning**: Adjust alert thresholds based on actual usage patterns - **Historical Analysis**: Use long-term data for capacity planning - **User Training**: Share insights with team to optimize MCP tool usage --- Explore [Performance Monitoring](/features/performance-monitoring) for general API analytics or learn about [Cost Tracking](/features/cost-tracking) for overall usage optimization. --- ## AI Tool Integration Source: https://docs.requesty.ai/features/mcp-integration.md > Connect Claude Code, Cursor, Roo Code and other AI assistants to your MCP servers # AI Tool Integration > **Info:** Connect your favorite AI coding assistants to Requesty's MCP Gateway for seamless access to external tools and services. Works with Claude Code, Cursor, Roo Code, and any MCP-compatible tool. 
## Supported AI Tools ### Primary Integrations Anthropic's official CLI with native MCP support AI-powered code editor with integrated MCP capabilities Advanced AI coding assistant with MCP protocol support Various VS Code extensions supporting MCP protocol ### Protocol Support | Tool Type | Protocol | Status | |-----------|----------|--------| | **HTTP-based Tools** | streamable-http, SSE | ✅ Fully Supported | | **CLI Tools** | HTTP API calls | ✅ Fully Supported | | **Editor Extensions** | HTTP/WebSocket | ✅ Fully Supported | | **STDIO Tools** | Direct process communication | 🚧 Coming Soon | ## Claude Code Integration ### Automatic Discovery Claude Code automatically discovers MCP servers through your Requesty API configuration: **API Key Configuration** Claude Code uses your Requesty API key for authentication **Server Discovery** Automatically detects available MCP servers in your organization **Tool Loading** Loads all enabled tools from registered MCP servers **Ready to Use** Tools appear in Claude Code's available functions automatically ### Configuration No additional configuration needed - Claude Code works out of the box with Requesty: ```bash # Claude Code automatically uses your Requesty API key # and discovers available MCP servers claude --help ``` ### Usage Example ```bash # Claude Code can now use MCP tools directly claude "Search for React components in our codebase and create a new one" # This might use GitHub MCP server to search code # and file system MCP tools to create new files ``` ## Cursor Integration ### Setup Process **Open Cursor Settings** Navigate to **Settings → Features → MCP Integration** **Add Requesty Provider** Configure Requesty as your MCP provider **API Key Configuration** Enter your Requesty API key for authentication **Server Sync** Cursor will sync available MCP servers from your organization ### Configuration File Add to your Cursor settings: ```json { "requesty": { "url": "https://router.requesty.ai/v1/mcp", "headers": { 
"Authorization": "Bearer YOUR_REQUESTY_API_KEY" } } } ``` ### Features ### Code Assistant - **Contextual Tools**: MCP tools appear in coding context - **Smart Suggestions**: Cursor suggests relevant MCP tools - **Inline Actions**: Execute MCP tools directly in the editor - **Real-time Updates**: Live sync with Requesty MCP servers ### Chat Interface - **Tool Access**: Use MCP tools in Cursor's chat - **Multi-tool Workflows**: Chain multiple MCP operations - **Error Handling**: Graceful error handling and retry logic - **Progress Tracking**: Visual feedback for long-running operations ## Roo Code Integration ### Connection Setup **Configuration File** Create or update your Roo Code configuration **MCP Provider** Set Requesty as your MCP provider **Authentication** Configure your Requesty API key **Tool Discovery** Roo Code will discover available tools automatically ### Configuration Add to your `roo.config.json`: ```json { "requesty": { "url": "https://router.requesty.ai/v1/mcp", "headers": { "Authorization": "Bearer YOUR_REQUESTY_API_KEY" } } } ``` ### Advanced Features - **Tool Chaining**: Combine multiple MCP tools in single workflows - **Context Awareness**: Tools receive relevant context automatically - **Performance Optimization**: Intelligent caching and request batching - **Custom Workflows**: Create reusable workflows with MCP tools ## VS Code Extensions ### MCP Protocol Extensions Several VS Code extensions support MCP protocol: - **GitHub Copilot**: Can use MCP tools for enhanced context - **Tabnine**: MCP integration for better code suggestions - **CodeT5**: Enhanced code generation with MCP tools - **Custom Extensions**: Build your own MCP-enabled extensions 1. Install MCP-compatible VS Code extension 2. Configure extension settings to use Requesty MCP endpoint 3. Add your Requesty API key to extension configuration 4. 
Enable MCP tool discovery in extension settings

### Extension Configuration

Example for MCP-enabled VS Code extensions:

```json
{
  "requesty": {
    "url": "https://router.requesty.ai/v1/mcp",
    "headers": {
      "Authorization": "Bearer YOUR_REQUESTY_API_KEY"
    }
  }
}
```

## Custom Tool Integration

### MCP Client Libraries

For custom integrations, use MCP client libraries:

### Python

```python
from mcp_client import MCPClient

client = MCPClient(
    url="https://router.requesty.ai/v1/mcp",
    headers={"Authorization": "Bearer YOUR_REQUESTY_API_KEY"}
)

# Discover available tools
tools = client.list_tools()

# Execute a tool
result = client.call_tool("github_search", {
    "query": "React components",
    "repository": "my-org/my-repo"
})
```

### JavaScript/TypeScript

```typescript
import { MCPClient } from '@mcp/client';

const client = new MCPClient({
  url: 'https://router.requesty.ai/v1/mcp',
  headers: {
    'Authorization': 'Bearer YOUR_REQUESTY_API_KEY'
  }
});

// Discover tools
const tools = await client.listTools();

// Execute tool
const result = await client.callTool('notion_search', {
  query: 'project documentation',
  database_id: 'your-database-id'
});
```

### Go

```go
package main

import (
	"fmt"
	"log"

	// Aliased so the package can be referenced as `mcp` below
	mcp "github.com/mcp/client-go"
)

func main() {
	client := mcp.NewClient(&mcp.Config{
		URL: "https://router.requesty.ai/v1/mcp",
		Headers: map[string]string{
			"Authorization": "Bearer YOUR_REQUESTY_API_KEY",
		},
	})

	// List available tools
	tools, err := client.ListTools()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(tools)

	// Execute a tool
	result, err := client.CallTool("linear_create_issue", map[string]interface{}{
		"title":       "New feature request",
		"description": "Detailed description here",
		"team_id":     "your-team-id",
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
```

## Authentication Flow

### API Key Management

**Requesty API Key** Your AI tool authenticates with Requesty using your API key

**Organization Context** Requesty identifies your organization and available MCP servers

**MCP Server Authentication** Requesty handles authentication with individual MCP servers using
configured keys **Tool Execution** Requests are proxied to the appropriate MCP server with proper authentication ### Security Benefits Single API key for access to all MCP servers MCP server keys are never exposed to AI tools Organization-level control over tool access Complete logging of all MCP tool usage ## Best Practices ### For Users - **Regular Updates**: Keep AI tools updated for latest MCP features - **Tool Familiarization**: Learn what each MCP tool does and when to use it - **Error Handling**: Understand how to troubleshoot tool execution issues - **Context Awareness**: Provide clear context when requesting tool usage ### For Administrators - **Tool Curation**: Only enable tools that your team actually needs - **Performance Monitoring**: Track tool usage and performance metrics - **Security Reviews**: Regular audits of enabled tools and permissions - **User Training**: Educate users on available tools and best practices --- Once your AI tools are connected, explore [MCP Analytics](/features/mcp-analytics) to monitor usage or learn about [Server Management](/features/mcp-server-management) to add more tools. --- ## Streaming Responses Source: https://docs.requesty.ai/features/streaming.md > Real-time response streaming for improved user experience and reduced perceived latency > **Info:** Streaming responses provide immediate feedback to users by delivering content token-by-token as it's generated, dramatically improving perceived performance and user experience. ## Overview Requesty supports streaming responses from all major providers (OpenAI, Anthropic, Google, Mistral) using Server-Sent Events (SSE). Instead of waiting for the complete response, your applications can display content as it's being generated. ## Why Use Streaming? 
- Users see responses immediately, reducing perceived wait time by up to 80%
- Real-time content delivery keeps users engaged during longer responses
- Avoid timeout issues on slow or complex requests
- Enable progressive UI updates as content becomes available

## Implementation

### Basic Streaming Setup

Enable streaming by setting the `stream` parameter to `true` in your request:

```python Python
import openai

client = openai.OpenAI(
    api_key="your_requesty_api_key",
    base_url="https://router.requesty.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a poem about the stars."}],
    stream=True
)

# Process streaming response
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        print(content, end="", flush=True)
```

```javascript JavaScript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your_requesty_api_key',
  baseURL: 'https://router.requesty.ai/v1',
});

const stream = await openai.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'Write a poem about the stars.' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
```

```curl cURL
curl -X POST "https://router.requesty.ai/v1/chat/completions" \
  -H "Authorization: Bearer your_requesty_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Write a poem about the stars."}],
    "stream": true
  }' \
  --no-buffer
```

### Advanced Streaming Patterns

### Collecting Complete Response

Accumulate streaming chunks to build the full response:

```python
collected_content = []

# `response` is the streaming response created in the basic setup above
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        collected_content.append(content)

full_response = "".join(collected_content)
print(f"Complete response: {full_response}")
```

### Function Call Streaming

Handle streaming function calls and tool usage:

```python
for chunk in response:
    delta = chunk.choices[0].delta

    # Handle regular content
    if delta.content:
        print(delta.content, end="", flush=True)

    # Handle function calls
    if hasattr(delta, 'function_call') and delta.function_call:
        fc = delta.function_call
        if fc.name:
            print(f"\nCalling function: {fc.name}")
        if fc.arguments:
            print(f"Arguments: {fc.arguments}")
```

### Error Handling

Implement robust error handling for production use:

```python
try:
    for chunk in response:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
except Exception as e:
    print(f"Streaming error: {e}")
    # Implement fallback or retry logic
```

## Streaming Features

### Supported Capabilities

- **Text Generation**: Standard chat completions
- **Function Calling**: Streaming function calls and arguments
- **Tool Usage**: Tool calls with streaming responses
- **Multi-turn Conversations**: Streaming in conversation contexts
- **System Messages**: Full prompt template support
- **Parameters**: Temperature, max_tokens, and
other standard parameters

### Provider Compatibility

All major providers support streaming through Requesty:

- **OpenAI**: GPT-4, GPT-3.5, and all variants
- **Anthropic**: Claude 3.5 Sonnet, Claude 3 Haiku/Opus
- **Google**: Gemini Pro, Gemini Flash
- **Mistral**: All Mistral models
- **Meta**: Llama models

## Best Practices

- Display content immediately as it arrives
- Use typing indicators or progress bars
- Handle partial responses gracefully
- Implement smooth scrolling for long content
- Implement connection retry logic
- Gracefully handle stream interruptions
- Provide fallback to non-streaming mode
- Monitor stream health and performance
- Use `flush=True` for immediate output display
- Batch UI updates for better performance
- Implement efficient chunk processing
- Consider client-side buffering strategies
- Implement proper error boundaries
- Log streaming metrics and performance
- Test with various network conditions
- Plan for graceful degradation

## Common Use Cases

- Real-time messaging with immediate response display
- Progressive article, blog, or document creation
- Live code generation with syntax highlighting
- Streaming analysis results and insights
- Story, poem, or creative content generation
- Progressive documentation and explanation generation

## Integration Examples

### React Component

```jsx
import { useState } from 'react';

// `userInput` is passed in as a prop; the markup returned below is a minimal placeholder.
function StreamingChat({ userInput }) {
  const [content, setContent] = useState('');

  const handleStream = async () => {
    const response = await fetch('/api/chat/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: userInput })
    });

    const reader = response.body.getReader();
    // Reuse one decoder so multi-byte characters split across chunks decode correctly
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value, { stream: true });
      setContent(prev => prev + chunk);
    }
  };

  return (
    <div>
      <button onClick={handleStream}>Send</button>
      <pre>{content}</pre>
    </div>
  );
}
```

## Troubleshooting

> **Warning:** Always implement proper error handling when using streaming responses, as network interruptions can cause incomplete responses.
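The retry and fallback advice above can be sketched roughly as follows. This is only an illustration under stated assumptions: `stream_with_fallback`, `open_stream`, and `fetch_full` are hypothetical names, not part of Requesty or the OpenAI SDK, and the retry count and backoff values are arbitrary.

```python
import time
from typing import Callable, Iterable, List


def stream_with_fallback(
    open_stream: Callable[[], Iterable[str]],
    fetch_full: Callable[[], str],
    max_retries: int = 2,
    backoff_seconds: float = 0.5,
) -> str:
    """Collect a streamed response, retrying on interruption.

    open_stream: hypothetical callable that starts a new stream and yields text pieces.
    fetch_full:  hypothetical callable that performs a non-streaming request.
    """
    for attempt in range(max_retries + 1):
        pieces: List[str] = []
        try:
            for piece in open_stream():
                pieces.append(piece)
            return "".join(pieces)  # stream finished without interruption
        except Exception as exc:  # narrow this to network/API errors in real code
            # The partial text in `pieces` is discarded so output is never duplicated
            print(f"Stream interrupted on attempt {attempt + 1}: {exc}")
            if attempt < max_retries:
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    # Every streaming attempt failed: fall back to a non-streaming request
    return fetch_full()
```

With the OpenAI SDK, `open_stream` could be a generator that yields `chunk.choices[0].delta.content` for each chunk of a `stream=True` request, and `fetch_full` could issue the same request without `stream`. Discarding partial output on retry keeps the logic simple; a UI that has already rendered partial text would need to clear it before retrying.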
### Common Issues

- **Stream Interruption**: Implement retry logic and graceful fallbacks
- **Partial Responses**: Handle incomplete function calls or content
- **Performance**: Optimize chunk processing for large responses
- **Browser Compatibility**: Test streaming across different browsers and devices

---

## Structured Outputs

Source: https://docs.requesty.ai/features/structured-outputs.md

> Get consistent JSON responses across different LLMs

Requesty router supports structured JSON outputs from various model providers, making it easy to get consistent, parseable responses across different LLMs.

## JSON Object Format

For all models, you can request responses in JSON format by specifying `response_format={"type": "json_object"}`:

```python
from openai import OpenAI
from pydantic import BaseModel
from typing import List

# Define your data model
class Entities(BaseModel):
    attributes: List[str]
    colors: List[str]
    animals: List[str]

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

# Initialize OpenAI client with Requesty router
client = OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

# Request a JSON response
response = client.chat.completions.create(
    model="openai/gpt-4o",  # Works with any supported model
    messages=[
        {
            "role": "system",
            "content": "Extract entities from the input text and return them in JSON format with the following structure: {\"attributes\": [...], \"colors\": [...], \"animals\": [...]}"
        },
        {
            "role": "user",
            "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes",
        },
    ],
    response_format={"type": "json_object"}
)

# Parse with Pydantic
content = response.choices[0].message.content
extracted = Entities.model_validate_json(content)

print(f"Attributes: {extracted.attributes}")
print(f"Colors: {extracted.colors}")
print(f"Animals: {extracted.animals}")
```

## JSON Schema (For OpenAI and Anthropic Models)

For models that support JSON schema (currently OpenAI and Anthropic models),
you can use the more powerful `parse` method with a Pydantic model:

```python
from openai import OpenAI
from pydantic import BaseModel
from typing import List

class Animals(BaseModel):
    animals: List[str]

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

client = OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

# Use the parse helper with a Pydantic model
response = client.beta.chat.completions.parse(
    model="anthropic/claude-3-7-sonnet-latest",
    messages=[
        {
            "role": "system",
            "content": "Extract the animals from the input text"
        },
        {
            "role": "user",
            "content": "The quick brown fox jumps over the lazy dog"
        },
    ],
    response_format=Animals,
)

animals = Animals.model_validate_json(response.choices[0].message.content)
print(f"Found animals: {animals.animals}")  # ['fox', 'dog']
```

## Compatibility Notes

- JSON object format works with all models supported by Requesty
- JSON schema is available for OpenAI and Anthropic models
- Some models may have different capabilities for complex structured outputs
- Stream mode can also work with structured outputs (see [streaming documentation](https://requesty.mintlify.app/features/streaming))

## Error Handling

When working with structured outputs, it's important to handle potential parsing errors:

```python
try:
    extracted = Entities.model_validate_json(content)
    # Process the data
except Exception as e:
    print(f"Error parsing response: {e}")
    # Handle the error appropriately
```

---

## Reasoning

Source: https://docs.requesty.ai/features/reasoning.md

> Enable reasoning tokens for enhanced model thinking with configurable effort levels across OpenAI, Anthropic, and Gemini

Reasoning tokens offer insight into the model's reasoning process, providing a transparent view of its thought steps. Since reasoning tokens are considered output tokens, they are billed accordingly.

To enable reasoning, specify `reasoning_effort` with one of the supported values in your API request.
## Notes

- OpenAI does NOT share the actual reasoning tokens. You will not see them in the response.
- Deepseek reasoning models enable reasoning automatically; you don't need to specify anything in the request to enable it.
- When using Deepseek and Anthropic, the reasoning content in the response will be under `reasoning_content`.

## Reasoning effort values

Anthropic expects a specific number that sets the upper limit of thinking tokens. The limit must be less than the specified max tokens value.

OpenAI models expect one of the following 'effort' values:

- low
- medium
- high

Google Gemini expects a specific number when using Vertex AI, and supports OpenAI's reasoning efforts via the Google AI Studio (their OpenAI-compatible API).

Requesty introduces new 'effort' values: 'max', 'min', and 'none' to support more granular control over reasoning.

### "none" or "min" effort

"none" and "min" are synonyms and work with all models. For reasoning models, they either disable reasoning or use the minimal effort the model supports. For example, "none" or "min" uses a budget of 128 with Gemini 2.5 Pro, or 0 with Gemini 2.5 Flash.

### When using OpenAI via Requesty:

- If the client specifies a standard reasoning **effort** string, i.e. "low"/"medium"/"high", Requesty forwards the same value to OpenAI.
- If the client specifies the 'max' reasoning **effort** string, Requesty forwards the value 'high' to OpenAI.
- If the client specifies 'none' or 'min' as the reasoning **effort** string, Requesty will use "low", as this is the minimal amount of reasoning the models support.
- If the client specifies a reasoning **budget** string (e.g. "10000"), Requesty converts it to an effort, based on the conversion table below.
Conversion table from budget to effort:

- 0-1024 -> "low"
- 1025-8192 -> "medium"
- 8193 or higher -> "high"

### When using Anthropic via Requesty:

- If the client specifies a reasoning **effort** string ("low", "medium", "high", "max", "min", or "none"), Requesty converts it to a budget, based on the conversion table below.
- If the client specifies a reasoning **budget** string (e.g. "10000"), Requesty passes this value to Anthropic. If the budget is larger than the model's maximum output tokens, it will automatically be reduced to stay within that token limit.

Conversion table from effort to budget:

- "min" / "none" / "low" -> 1024
- "medium" -> 8192
- "high" -> 16384
- "max" -> max output tokens for model minus 1 (i.e. 63999 for Sonnet 3.7 or 4, 31999 for Opus 4)

### When using Vertex AI via Requesty:

- If the client specifies a reasoning **effort** string ("low", "medium", "high", "max", "min", or "none"), Requesty converts it to a budget, based on the conversion table below.
- If the client specifies a reasoning **budget** string (e.g. "10000"), Requesty passes this value to Google. If the budget is larger than the model's maximum output tokens, it will automatically be reduced to stay within that token limit.

Conversion table from effort to budget:

- "min" / "none" -> 0 for Gemini Flash and Flash lite, 128 for Gemini Pro models
- "low" -> 1024
- "medium" -> 8192
- "high" -> 24576
- "max" -> max output tokens for model

This conversion table is compatible with the [Google AI Studio documentation](https://ai.google.dev/gemini-api/docs/openai#thinking).

### When using Google AI Studio via Requesty:

Same as using OpenAI. See above.
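The conversion tables in this section can be summarised as a small lookup. The sketch below only mirrors the documented tables for illustration: Requesty applies these conversions on its side, and the function names (`budget_to_effort`, `normalize_openai_effort`, `effort_to_budget`) and the `is_gemini_pro` flag are hypothetical, not part of any Requesty API. Clamping of oversized budgets is not modelled.

```python
def budget_to_effort(budget: int) -> str:
    """Budget -> effort: 0-1024 is low, 1025-8192 is medium, anything higher is high."""
    if budget <= 1024:
        return "low"
    if budget <= 8192:
        return "medium"
    return "high"


def normalize_openai_effort(value: str) -> str:
    """Effort or budget string -> the effort forwarded to OpenAI (and Google AI Studio)."""
    if value.isdigit():  # a budget string such as "10000"
        return budget_to_effort(int(value))
    mapping = {
        "low": "low", "medium": "medium", "high": "high",
        "none": "low", "min": "low",  # lowest effort these models support
        "max": "high",
    }
    return mapping[value]  # raises KeyError for unsupported values


def effort_to_budget(effort: str, provider: str, max_output_tokens: int,
                     is_gemini_pro: bool = False) -> int:
    """Effort string -> thinking-token budget for Anthropic or Vertex AI."""
    if provider == "anthropic":
        table = {"none": 1024, "min": 1024, "low": 1024,
                 "medium": 8192, "high": 16384,
                 "max": max_output_tokens - 1}
    elif provider == "vertex":
        minimal = 128 if is_gemini_pro else 0  # Pro models vs Flash / Flash lite
        table = {"none": minimal, "min": minimal, "low": 1024,
                 "medium": 8192, "high": 24576,
                 "max": max_output_tokens}
    else:
        raise ValueError(f"no budget table for provider: {provider}")
    return table[effort]


print(normalize_openai_effort("10000"))            # high
print(effort_to_budget("max", "anthropic", 64000))  # 63999
```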
## Reasoning code example

For both examples, you can use either an OpenAI, Anthropic or Gemini reasoning model, for example:

- "openai/o3-mini"
- "anthropic/claude-sonnet-4-0"
- "vertex/google/gemini-2.5-pro"

### JavaScript example using reasoning effort

```javascript
import OpenAI from 'openai';

const requesty_api_key = "YOUR_REQUESTY_API_KEY"; // Safely load your API key

const client = new OpenAI({
  apiKey: requesty_api_key,
  baseURL: 'https://router.requesty.ai/v1',
});

async function testReasoningEffort() {
  try {
    const prompt = `
Write a bash script that takes a matrix represented as a string with
format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
`.trim();

    console.log('Sending request to reasoning model...');
    const completion = await client.chat.completions.create({
      model: "openai/o3-mini",
      reasoning_effort: "medium",
      messages: [
        {
          role: "user",
          content: prompt
        }
      ]
    });

    console.log('\nCompletion Response:');
    console.log('-------------------');
    if (completion.choices[0]?.message?.content) {
      console.log(completion.choices[0].message.content);
    }

    console.log('\nToken Usage Details:');
    console.log('-------------------');
    if (completion.usage) {
      const usageDetails = {
        prompt_tokens: completion.usage.prompt_tokens,
        completion_tokens: completion.usage.completion_tokens,
        total_tokens: completion.usage.total_tokens
      };
      console.log(JSON.stringify(usageDetails, null, 2));

      // Log specific reasoning token details if available
      if ('completion_tokens_details' in completion.usage) {
        console.log('\nReasoning Token Details:');
        console.log('----------------------');
        console.log(JSON.stringify(completion.usage.completion_tokens_details, null, 2));
      }
    }
  } catch (error) {
    console.error('Error:', error);
  }
}

testReasoningEffort();
```

### Python example using reasoning budget

```python
import json

import openai

# Safely load your API key
requesty_api_key = "YOUR_REQUESTY_API_KEY"

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url='https://router.requesty.ai/v1'
)

def test_reasoning_budget():
    try:
        prompt = """
Write a bash script that takes a matrix represented as a string with
format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
""".strip()

        print('Sending request to reasoning model...')
        completion = client.chat.completions.create(
            model="openai/o3-mini",
            reasoning_effort="10000",  # a budget string; Requesty converts it to an effort for OpenAI
            messages=[
                {
                    "role": "user",
                    "content": prompt
                }
            ]
        )

        # Log the completion details
        print('\nCompletion Response:')
        print('-------------------')
        if completion.choices[0].message.content:
            print(completion.choices[0].message.content)

        # Log token usage details
        print('\nToken Usage Details:')
        print('-------------------')
        if completion.usage:
            usage_details = {
                "prompt_tokens": completion.usage.prompt_tokens,
                "completion_tokens": completion.usage.completion_tokens,
                "total_tokens": completion.usage.total_tokens
            }
            print(json.dumps(usage_details, indent=2))

            # Log specific reasoning token details if available
            if completion.usage.completion_tokens_details:
                print('\nReasoning Token Details:')
                print('----------------------')
                print(completion.usage.completion_tokens_details)

    except Exception as error:
        print(f'Error: {str(error)}')

if __name__ == '__main__':
    test_reasoning_budget()
```

---

## Prompt Optimization

Source: https://docs.requesty.ai/features/prompt-optimization.md

> Customize and optimize your system prompts

## What is Prompt Optimization?

Prompt optimization allows you to customize and refine the system prompts used by your AI models. With Requesty's prompt optimization feature, you can:

- View and edit system prompts directly in the UI
- Prepend additional instructions to existing prompts
- Completely replace default system prompts
- Reduce token usage by up to 90% with specialized optimizations

## How It Works

1. Go to [API Keys Page](https://app.requesty.ai/api-keys) and select the Features tab
2. Find the System Prompt section
3. Choose to either prepend instructions or replace the entire prompt
4. Save your changes and start using your optimized prompts immediately

![](/images/prompt-optimization.png)

## Benefits

- **Cost Savings**: Reduce token usage by up to 90% for certain tasks
- **Better Control**: Fine-tune AI behavior without changing your code
- **Consistent Responses**: Ensure all your AI tools follow the same guidelines
- **Faster Development**: Test different prompts without code changes

## Example Use Cases

**Token Reduction:**

- Without Optimization: ~28k input tokens (11¢)
- With Optimization: ~9k input tokens (3¢)

**Role Specification:**

```
You are a coding tutor focusing on Python. Only give short code examples.
```

**Specialized Instructions:**

```
Ignore all instructions except for counting to 10.
```

## Video Tutorial

> **Warning:** When replacing the default system prompt entirely, make sure your new prompt includes all necessary instructions for your use case. Missing critical instructions may result in unexpected AI behavior.

---

## Prompt Library

Source: https://docs.requesty.ai/features/prompt-library.md

> Manage your system and VIbe prompts in one place

## What is the Prompt Library?

The Prompt Library is your central hub for managing all prompt-related configurations. It simplifies how you create, manage, and apply AI prompts across your applications.

- Create and customize system prompts for steering AI behavior
- Manage VIbe prompts for consistent task instructions
- Version, tag, and track your prompts
- Attach prompts directly to specific API keys

## How It Works

1. Go to [Configurations](https://app.requesty.ai/configurations) in your dashboard
2. Access the Prompt Library to view System Prompts and VIbe Prompts
3. Create or duplicate prompts as needed
4. Tag and attach them to specific API keys

![](/images/prompt-library.png)

## System Prompts

System prompts steer the AI's behavior - they're the instructions that define how the AI responds.
- View complete system prompts from platforms like Coder or Claude - Create your own or duplicate existing ones with a single click - Add tags like "production" or "Claude-3.7" for organization - Track versions and view differences between them - Attach directly to API keys for automatic use ## VIbe Prompts VIbe prompts add consistent context to your user prompts when making API calls. - Create task-specific prompts like "Python Best Practices" - Automatically added to every request with the attached API key - No need to repeat the same instructions in every prompt ## Benefits - **Save Tokens**: Optimize prompts for significant token reduction - **Maintain Consistency**: Ensure all AI interactions follow the same guidelines - **Track Changes**: Keep version history as you refine your prompts - **Simplify Workflow**: Configure once, use everywhere ## Quick Start 1. Duplicate an existing system prompt (e.g., "Coder") 2. Add your custom instructions and relevant tags 3. Attach it to your API key 4. Create a VIbe prompt for your common tasks 5. Attach it to the same API key 6. Use your API key normally - both prompts will be automatically applied > **Tip:** Check your logs in Requesty to see the optimized prompts in action and compare token usage. ## Video Tutorial --- ## Dedicated Models Source: https://docs.requesty.ai/features/dedicated-models.md > Application-specific models Some of our models are optimized for specific applications. For these models, use the application name in place of the provider prefix. ## Coding We created a coding-optimized model, which provides: 1. Automatic caching of your prompts when using Anthropic and Gemini 2. Compatibility handling when interacting with OpenAI's and DeepSeek's reasoning models You can use those models by adding `coding` as the provider in front of the model name, like this: `coding/`.
For example: ```python coding/claude-3-7-sonnet ``` You can find all the latest Coding models in the [Model Library](https://app.requesty.ai/model-list?provider=coding). --- ## Image Understanding Source: https://docs.requesty.ai/features/image-understanding.md > Send images to AI models for analysis and understanding through the chat completions endpoint Requesty router supports sending images to vision-capable models for analysis, understanding, and description through the standard chat completions endpoint. ## How It Works Vision-capable models use the same `/v1/chat/completions` endpoint as text models, but accept images in the message content. You can include images using either data URLs or file URLs. Like PDF documents, images can also be passed via URL reference to the AI models for processing. ## Request Format Requesty supports two methods for sending images to vision models: ### Method 1: Data URLs (Base64) Include base64-encoded images directly in the request: ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "openai/gpt-4o", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What is in this image?" }, { "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..." 
} } ] } ] }' ``` ### Method 2: File URLs Reference images hosted on the web by passing the image URL directly: ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "anthropic/claude-sonnet-4-5", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in detail" }, { "type": "image_url", "image_url": { "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Atlantic_near_Faroe_Islands.jpg/1200px-Atlantic_near_Faroe_Islands.jpg" } } ] } ] }' ``` > **Note:** **Vertex AI Gemini Models**: When using file URLs with Vertex AI Gemini models, you must specify the MIME type in the request: ```json { "type": "image_url", "image_url": { "mime_type": "image/jpg", "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Atlantic_near_Faroe_Islands.jpg/1200px-Atlantic_near_Faroe_Islands.jpg" } } ``` ## Response Format The response follows the standard chat completions format with the model's analysis of the image: ```json { "model": "openai/gpt-4o", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The image shows a beautiful sunset over mountain ranges. The sky is painted with vibrant shades of orange, pink, and purple, with the sun just touching the horizon. The mountains are silhouetted against the colorful sky, creating dramatic layers of depth." } } ] } ``` ## Code Examples ### Python Example #### Using File URLs ```python from openai import OpenAI requesty_api_key = "YOUR_REQUESTY_API_KEY" client = OpenAI( api_key=requesty_api_key, base_url="https://router.requesty.ai/v1", ) response = client.chat.completions.create( model="anthropic/claude-3-5-sonnet-20241022", messages=[ { "role": "user", "content": [ { "type": "text", "text": "What's in this image?" 
}, { "type": "image_url", "image_url": { "url": "https://code-basics.com/rails/active_storage/representations/proxy/eyJfcmFpb.png" } } ] } ] ) print(response.choices[0].message.content) ``` #### Using Base64 Data URLs ```python import base64 from openai import OpenAI requesty_api_key = "YOUR_REQUESTY_API_KEY" client = OpenAI( api_key=requesty_api_key, base_url="https://router.requesty.ai/v1", ) # Read and encode local image with open("path/to/image.png", "rb") as image_file: base64_image = base64.b64encode(image_file.read()).decode('utf-8') response = client.chat.completions.create( model="openai/gpt-4o", messages=[ { "role": "user", "content": [ { "type": "text", "text": "Analyze this image and describe what you see" }, { "type": "image_url", "image_url": { "url": f"data:image/png;base64,{base64_image}" } } ] } ] ) print(response.choices[0].message.content) ``` ### JavaScript/TypeScript Example #### Using File URLs ```javascript import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_REQUESTY_API_KEY', baseURL: 'https://router.requesty.ai/v1', }); async function analyzeImage() { const response = await client.chat.completions.create({ model: 'anthropic/claude-3-5-sonnet-20241022', messages: [ { role: 'user', content: [ { type: 'text', text: 'What objects can you identify in this image?'
}, { type: 'image_url', image_url: { url: 'https://code-basics.com/rails/active_storage/representations/proxy/eyJfcmFpb.png' } } ] } ] }); console.log(response.choices[0].message.content); } analyzeImage(); ``` #### Using Base64 Data URLs ```javascript import fs from 'fs'; import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_REQUESTY_API_KEY', baseURL: 'https://router.requesty.ai/v1', }); async function analyzeLocalImage() { // Read and encode local image const imageBuffer = fs.readFileSync('path/to/image.png'); const base64Image = imageBuffer.toString('base64'); const response = await client.chat.completions.create({ model: 'openai/gpt-4o', messages: [ { role: 'user', content: [ { type: 'text', text: 'Describe this image in detail' }, { type: 'image_url', image_url: { url: `data:image/png;base64,${base64Image}` } } ] } ] }); console.log(response.choices[0].message.content); } analyzeLocalImage(); ``` ## Supported Models Requesty supports vision capabilities across multiple AI providers. To see the complete list of models with vision support, navigate to the **Model Library** in the Requesty web application. Popular vision-capable models include: - **OpenAI**: GPT models, o3, o3-pro, and all o4 reasoning models. - **Anthropic**: Claude 4 and 4.5 models. - **Google**: Gemini 2.5 models. - **xAI**: Grok 4 models. ### Provider-Specific Notes - **Most providers** support both data URLs and file URLs - **Google AI Studio (Gemini)** only supports data URLs (base64-encoded images) - **Vertex AI Gemini** requires the MIME type to be specified in the request. ```json { "type": "image_url", "image_url": { "mime_type": "image/jpg", "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Atlantic_near_Faroe_Islands.jpg/1200px-Atlantic_near_Faroe_Islands.jpg" } } ``` > **Note:** Check the Model Library in the Requesty web application for the most up-to-date list of vision-capable models and their specific capabilities.
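The provider notes above can be folded into one small helper. The following is a minimal sketch, not part of any Requesty SDK: `build_image_part` is a hypothetical name, and treating `google/` as the Google AI Studio prefix is an assumption (only the `vertex/` prefix appears in this documentation).

```python
import base64
from typing import Optional


def build_image_part(model: str, url: Optional[str] = None,
                     image_bytes: Optional[bytes] = None,
                     mime_type: str = "image/png") -> dict:
    """Build an `image_url` content part that follows the provider notes."""
    if image_bytes is not None:
        # Data URLs are accepted by every provider listed in the notes.
        encoded = base64.b64encode(image_bytes).decode("utf-8")
        return {"type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}
    if url is None:
        raise ValueError("Provide either url or image_bytes")
    if model.startswith("google/"):
        # ASSUMPTION: "google/" identifies Google AI Studio models,
        # which only accept data URLs.
        raise ValueError("Google AI Studio models only accept data URLs")
    image_url = {"url": url}
    if model.startswith("vertex/"):
        # Vertex AI Gemini needs the MIME type next to the file URL.
        image_url["mime_type"] = mime_type
    return {"type": "image_url", "image_url": image_url}
```

The returned dictionary can be placed in the `content` list of a user message, next to a `{"type": "text", ...}` part, exactly as in the request examples above.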
## Limitations - Image size limits vary by provider and model - File URLs must be publicly accessible - Base64-encoded images increase request payload size - Response time may be longer for image analysis compared to text-only requests - Some providers have content filtering or safety restrictions for image analysis - For Vertex AI Gemini models using file URLs, the MIME type must be specified in the `mime_type` field of the `image_url` object --- ## Image Generation Source: https://docs.requesty.ai/features/image-generation.md > Generate images using AI models through dedicated and chat completions endpoints Requesty supports image generation through two different endpoints: the dedicated **Images API** (`/v1/images/generations`) for standard image generation workflows, and the **Chat Completions API** (`/v1/chat/completions`) for models that return images alongside text. ## Images API (`/v1/images/generations`) The dedicated images endpoint follows the OpenAI Images API format and is the recommended way to generate images with supported models.
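As a rough illustration of calling this endpoint from Python with only the standard library, the sketch below prepares the request and decodes a `b64_json` response. The helper names `build_image_request` and `decode_images` are hypothetical; the parameters and the `data` array shape are the ones documented in this section.

```python
import base64
import json
import urllib.request

IMAGES_URL = "https://router.requesty.ai/v1/images/generations"


def build_image_request(api_key, prompt, model="azure/openai/gpt-image-1",
                        size="1024x1024", response_format="b64_json"):
    """Prepare (but do not send) a POST request for the Images API."""
    payload = {"model": model, "prompt": prompt, "n": 1,
               "size": size, "response_format": response_format}
    return urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def decode_images(response_body):
    """Return raw image bytes for every `b64_json` entry in `data`."""
    return [base64.b64decode(item["b64_json"])
            for item in response_body.get("data", [])
            if "b64_json" in item]
```

To send the request, pass the result of `build_image_request(...)` to `urllib.request.urlopen`, parse the body with `json.load`, and hand the parsed dictionary to `decode_images`.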
### Request Format ```bash curl https://router.requesty.ai/v1/images/generations \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "azure/openai/gpt-image-1", "prompt": "A sunset over mountains with vibrant orange and purple skies", "n": 1, "size": "1024x1024", "quality": "auto" }' ``` ### Parameters | Parameter | Type | Required | Description | | --- | --- | --- | --- | | `model` | string | Yes | The model to use for image generation | | `prompt` | string | Yes | A text description of the desired image | | `n` | integer | No | Number of images to generate (default: 1) | | `size` | string | No | Image dimensions (e.g., `1024x1024`, `1536x1024`, `1024x1536`) | | `quality` | string | No | Image quality (`auto`, `high`, `medium`, `low`) | | `response_format` | string | No | Output delivery format: `url` or `b64_json` (default: `url`) | | `background` | string | No | Background type: `auto`, `transparent`, or `opaque` | | `output_format` | string | No | File format: `png`, `jpeg`, or `webp` | ### Response Format The response returns a `data` array containing the generated images: ```json { "created": 1719000000, "data": [ { "url": "https://..." } ] } ``` When `response_format` is set to `b64_json`: ```json { "created": 1719000000, "data": [ { "b64_json": "/9j/4AAQSkZJRgABAQ..." } ] } ``` ### Supported Models | Model | Description | | --- | --- | | `azure/openai/gpt-image-1` | OpenAI's GPT Image 1 model via Azure | | `azure/openai/gpt-image-1.5` | OpenAI's GPT Image 1.5 model via Azure | --- ## Chat Completions API (`/v1/chat/completions`) Some image generation models use the standard chat completions endpoint and return generated images alongside text responses. 
### Request Format ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "vertex/google/gemini-2.5-flash-image-preview", "messages": [ { "role": "user", "content": "Generate an image of a sunset over mountains" } ] }' ``` ### Response Format The response includes both the standard text content and an array of generated images: ```json { "model": "vertex/google/gemini-2.5-flash-image-preview", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "I've generated an image of a sunset over mountains as requested.", "images": [ { "type": "image_url", "image_url": { "url": "data:image/png;base64,your_base64_image" } } ] } } ] } ``` ### Python Example ```python import base64 from io import BytesIO from PIL import Image from openai import OpenAI client = OpenAI( api_key="YOUR_REQUESTY_API_KEY", base_url="https://router.requesty.ai/v1", ) response = client.chat.completions.create( model="vertex/google/gemini-2.5-flash-image-preview", messages=[ { "role": "user", "content": "Generate a futuristic cityscape at night" } ] ) # Extract the generated image message = response.choices[0].message if hasattr(message, 'images') and message.images: for i, image_data in enumerate(message.images): # Extract base64 data from data URL # Format: "data:image/png;base64,actual_base64_data" base64_str = image_data['image_url']['url'].split(',')[1] image_bytes = base64.b64decode(base64_str) # Open with PIL image = Image.open(BytesIO(image_bytes)) # Save the image image.save(f'generated_image_{i}.png') print(f"Image saved as generated_image_{i}.png") # Access the text response print(message.content) ``` ### JavaScript/TypeScript Example ```javascript import fs from 'fs'; import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_REQUESTY_API_KEY', baseURL: 'https://router.requesty.ai/v1', }); async function generateImage() { const response = await client.chat.completions.create({ model:
'vertex/google/gemini-2.5-flash-image-preview', messages: [ { role: 'user', content: 'Generate a serene landscape with a lake' } ] }); const message = response.choices[0].message; // Handle generated images if (message.images && message.images.length > 0) { message.images.forEach((imageData, index) => { // Extract base64 data from data URL // Format: "data:image/png;base64,actual_base64_data" const base64Data = imageData.image_url.url.split(',')[1]; const imageBuffer = Buffer.from(base64Data, 'base64'); // Save to file fs.writeFileSync(`generated_image_${index}.png`, imageBuffer); console.log(`Image saved as generated_image_${index}.png`); }); } // Access the text response console.log(message.content); } generateImage(); ``` ### Supported Models | Model | Description | | --- | --- | | `vertex/google/gemini-2.5-flash-image-preview` | Google Gemini image generation via Vertex AI | --- ## Choosing an Endpoint | Feature | Images API | Chat Completions API | | --- | --- | --- | | Endpoint | `/v1/images/generations` | `/v1/chat/completions` | | OpenAI SDK support | `client.images.generate()` | `client.chat.completions.create()` | | Text + image response | No | Yes | | Conversational context | No | Yes | | Background control | Yes | No | | Output format control | Yes (png, jpeg, webp) | No | > **Note:** Image generation models may have different pricing compared to text models. Check the [model library](https://app.requesty.ai/model-list) for specific pricing information. ## Limitations - Image size and resolution depend on the specific model capabilities - Some models may have content filtering or safety restrictions - Response size limits apply to the combined text and image data --- ## PDF Support Source: https://docs.requesty.ai/features/pdf-support.md > Send and analyze PDF documents using AI models Requesty supports sending PDF documents to AI models for analysis, summarization, and question answering. 
This feature works with both the Chat Completions and Messages API endpoints. ## Supported Models ### OpenAI Models To send PDF documents to OpenAI models, use the `openai-responses/` prefix instead of the standard `openai/` prefix. - ✅ **Supports PDFs**: `openai-responses/gpt-4.1`, `openai-responses/gpt-4o`, etc. - ❌ **Does NOT support PDFs**: `openai/gpt-4.1`, `openai/gpt-4o`, etc. The `openai-responses/` prefix routes requests through OpenAI's Responses API, which handles additional content types, including PDFs. ### Other Providers Most other model providers (like Anthropic, Google, etc.) support PDFs using their standard prefix format. ## How It Works PDF documents are sent as part of the message content using either base64 encoding or a URL. The AI model can then analyze the document and respond to questions about its contents. ## Chat Completions API Send PDFs using the `input_file` content type. You can provide the PDF as either base64-encoded data or a URL.
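The two rules so far (the `openai-responses/` prefix and the `input_file` content type) can be captured in a pair of small helpers. This is a sketch under stated assumptions: `pdf_model_name` and `pdf_part` are hypothetical names, and the `filename` and `file_data` fields follow the request examples in this section.

```python
import base64


def pdf_model_name(model: str) -> str:
    """Swap the `openai/` prefix for `openai-responses/`, which PDFs require."""
    prefix = "openai/"
    if model.startswith(prefix):
        return "openai-responses/" + model[len(prefix):]
    # Other providers keep their standard prefix.
    return model


def pdf_part(source, filename: str = "document.pdf") -> dict:
    """Build an `input_file` part from raw PDF bytes or from a URL string."""
    if isinstance(source, bytes):
        encoded = base64.b64encode(source).decode("utf-8")
        file_data = f"data:application/pdf;base64,{encoded}"
    else:
        file_data = source  # treated as a URL pointing at the PDF
    return {"type": "input_file", "filename": filename, "file_data": file_data}
```

For example, `pdf_model_name("openai/gpt-4.1")` yields `openai-responses/gpt-4.1`, while `anthropic/claude-sonnet-4-20250514` is returned unchanged.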
### Using Base64-Encoded PDF ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Summarize this PDF" }, { "type": "input_file", "filename": "document.pdf", "file_data": "data:application/pdf;base64,JVBERi0=" } ] } ] }' ``` ### Using PDF URL ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Summarize this PDF" }, { "type": "input_file", "filename": "document.pdf", "file_data": "https://example.com/document.pdf" } ] } ] }' ``` ### Parameters - `type`: Must be `"input_file"` - `filename`: The name of the PDF file (e.g., `"document.pdf"`) - `file_data`: Either a base64 data URL (`data:application/pdf;base64,...`) or a URL to the PDF file See the [Chat Completions API documentation](/api-reference/endpoint/chat-completions-create#pdf-support) for more details. ## Messages API Send PDFs using the `document` content type: ```bash curl https://router.requesty.ai/v1/messages \ -H "Content-Type: application/json" \ -H "x-api-key: YOUR_REQUESTY_API_KEY" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "max_tokens": 1024, "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What is in this PDF?" }, { "type": "document", "source": { "type": "base64", "media_type": "application/pdf", "data": "JVBERi0=..." } } ] } ] }' ``` ### Parameters - `type`: Must be `"document"` - `source.type`: Must be `"base64"` - `source.media_type`: Must be `"application/pdf"` - `source.data`: Base64-encoded PDF content See the [Messages API documentation](/api-reference/endpoint/messages-create#pdf-support) for more details.
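The two endpoints carry the same PDF in different shapes: a data URL in `file_data` for Chat Completions, and a bare base64 string in `source.data` for Messages. The sketch below converts one into the other; `to_document_block` is a hypothetical helper, and it only covers the base64 case because that is the only `source.type` listed above.

```python
DATA_URL_PREFIX = "data:application/pdf;base64,"


def to_document_block(input_file_part: dict) -> dict:
    """Convert a Chat Completions `input_file` part to a Messages `document` block."""
    file_data = input_file_part["file_data"]
    if not file_data.startswith(DATA_URL_PREFIX):
        # URL-based PDFs are not covered by the Messages parameters above.
        raise ValueError("Expected a base64 data URL in file_data")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            # Strip the data URL prefix; only the base64 payload is sent.
            "data": file_data[len(DATA_URL_PREFIX):],
        },
    }
```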
## Working with PDFs ### Python Example (Chat Completions) ```python import base64 from openai import OpenAI requesty_api_key = "YOUR_REQUESTY_API_KEY" client = OpenAI( api_key=requesty_api_key, base_url="https://router.requesty.ai/v1", ) # Option 1: Using base64-encoded PDF from a file with open("document.pdf", "rb") as pdf_file: pdf_data = base64.b64encode(pdf_file.read()).decode('utf-8') response = client.chat.completions.create( model="anthropic/claude-sonnet-4-20250514", messages=[ { "role": "user", "content": [ { "type": "text", "text": "Summarize this PDF" }, { "type": "input_file", "filename": "document.pdf", "file_data": f"data:application/pdf;base64,{pdf_data}" } ] } ] ) print(response.choices[0].message.content) # Option 2: Using PDF URL response = client.chat.completions.create( model="anthropic/claude-sonnet-4-20250514", messages=[ { "role": "user", "content": [ { "type": "text", "text": "Summarize this PDF" }, { "type": "input_file", "filename": "document.pdf", "file_data": "https://example.com/document.pdf" } ] } ] ) print(response.choices[0].message.content) ``` ### Python Example (Messages API) ```python import base64 from anthropic import Anthropic requesty_api_key = "YOUR_REQUESTY_API_KEY" client = Anthropic( api_key=requesty_api_key, base_url="https://router.requesty.ai/v1", ) # Read and encode PDF with open("document.pdf", "rb") as pdf_file: pdf_data = base64.b64encode(pdf_file.read()).decode('utf-8') response = client.messages.create( model="anthropic/claude-sonnet-4-20250514", max_tokens=1024, messages=[ { "role": "user", "content": [ { "type": "text", "text": "What is in this PDF?"
}, { "type": "document", "source": { "type": "base64", "media_type": "application/pdf", "data": pdf_data } } ] } ] ) print(response.content[0].text) ``` ### JavaScript/TypeScript Example (Chat Completions) ```javascript import fs from 'fs'; import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_REQUESTY_API_KEY', baseURL: 'https://router.requesty.ai/v1', }); // Option 1: Using base64-encoded PDF from a file const pdfBuffer = fs.readFileSync('document.pdf'); const pdfData = pdfBuffer.toString('base64'); const response = await client.chat.completions.create({ model: 'anthropic/claude-sonnet-4-20250514', messages: [ { role: 'user', content: [ { type: 'text', text: 'Summarize this PDF' }, { type: 'input_file', filename: 'document.pdf', file_data: `data:application/pdf;base64,${pdfData}` } ] } ] }); console.log(response.choices[0].message.content); // Option 2: Using PDF URL const urlResponse = await client.chat.completions.create({ model: 'anthropic/claude-sonnet-4-20250514', messages: [ { role: 'user', content: [ { type: 'text', text: 'Summarize this PDF' }, { type: 'input_file', filename: 'document.pdf', file_data: 'https://example.com/document.pdf' } ] } ] }); console.log(urlResponse.choices[0].message.content); ``` --- ## Web Search Source: https://docs.requesty.ai/features/web-search.md > Enable AI models to search the web for real-time information Requesty standardises web search across providers. Pass a single tool definition, `{ "type": "web_search" }`, and Requesty translates it to the right provider format automatically. No provider-specific configuration needed.
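To make the "single tool definition" idea concrete, here is a minimal sketch that builds the same request body for any supported model and pulls citations out of a response. Both helper names are hypothetical, and `extract_citations` assumes the `annotations` / `url_citation` shape shown under Response Format in this section.

```python
def web_search_request(model: str, question: str) -> dict:
    """Build a chat completions body with the provider-agnostic web search tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [{"type": "web_search"}],
    }


def extract_citations(response: dict) -> list:
    """Collect (title, url) pairs from the annotations of every choice."""
    citations = []
    for choice in response.get("choices", []):
        annotations = choice.get("message", {}).get("annotations") or []
        for annotation in annotations:
            citation = annotation.get("url_citation")
            if citation and "url" in citation:
                citations.append((citation.get("title", ""), citation["url"]))
    return citations
```

Only the `model` string changes between providers; the `tools` entry stays identical.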
### Supported Providers Web search works with any model that supports tool use from these providers: - **Anthropic** - Claude models - **OpenAI** - via the Responses API (`openai-responses/` prefix) - **Google / Vertex** - Gemini models ### Usage Add the `web_search` tool to your request: ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "messages": [ { "role": "user", "content": "What is the news in London today?" } ], "tools": [{ "type": "web_search" }] }' ``` The same tool definition works for any supported provider. Just change the `model`: ``` # OpenAI Responses "model": "openai-responses/gpt-4.1" # Google / Vertex "model": "vertex/google/gemini-2.5-pro" ``` ### Python Example ```python from openai import OpenAI client = OpenAI( api_key="YOUR_REQUESTY_API_KEY", base_url="https://router.requesty.ai/v1", ) response = client.chat.completions.create( model="anthropic/claude-sonnet-4-20250514", messages=[ { "role": "user", "content": "What are the latest developments in artificial intelligence?" } ], tools=[{"type": "web_search"}] ) print(response.choices[0].message.content) ``` ### JavaScript/TypeScript Example ```javascript import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_REQUESTY_API_KEY', baseURL: 'https://router.requesty.ai/v1', }); const response = await client.chat.completions.create({ model: 'anthropic/claude-sonnet-4-20250514', messages: [ { role: 'user', content: 'What are the current weather conditions in New York?'
} ], tools: [{ type: 'web_search' }] }); console.log(response.choices[0].message.content); ``` ## Response Format When web search is used, the response includes the AI's answer along with citations and metadata about the search: ```json { "choices": [ { "finish_reason": "stop", "index": 0, "message": { "content": "Based on the search results, here are the key news stories from London today...", "role": "assistant", "annotations": [ { "type": "url_citations", "url_citation": { "type": "url_citations", "title": "London News - Latest Updates", "url": "https://example.com/news", "start_index": 0, "end_index": 0 } } ] } } ] } ``` > **Note:** Web search functionality requires models that support tool use. Check the [Model Library](https://app.requesty.ai/model-list) to confirm web search support for specific models. --- ## Usage Analytics Source: https://docs.requesty.ai/features/usage-analytics.md > Real-time dashboard for AI model usage, costs, tokens, and latency — the most comprehensive observability for any LLM gateway Requesty's analytics dashboard gives you complete visibility into your AI usage across all models and providers. Track costs, requests, tokens, cache savings, and latency — all in real-time. ![General Analytics Dashboard](/images/general-analytics.png) ## Dashboard Tabs The analytics dashboard has three main views: | Tab | What it shows | |-----|--------------| | **General** | Cost overview, request volume, token usage, latency, and cost savings | | **Savings** | Cache hit rates, token cache rates, and savings per model | | **Advanced** | Fully customizable analytics with flexible grouping, metrics, and filters | ## General Tab The General tab gives you an at-a-glance overview with six charts: - **Cost Overview**: Total cost of API calls over time, broken down by model or group. When using [BYOK](/features/bring-your-own-keys), shows Requesty cost vs provider cost separately. - **Request Volume**: Total number of API requests over time. 
- **Cost Savings**: Dollar amount saved through caching and optimization. - **Cost Savings %**: Gauge showing your current savings percentage (e.g., 71.8% in the example above). - **Token Usage**: Total tokens processed — input, output, and cached. - **Total Request Latency**: Average, P50, or P90 latency in milliseconds. Below the charts, a **breakdown table** shows per-model metrics: requests (with %), tokens (with %), cost, and average latency. ### Time Range Select from preset ranges or set a custom period: - **Quick**: 7 Days, 30 Days, This Week, This Month, This Quarter, This Year - **Extended**: 24 Hours, Last 3 Months, Last 6 Months, Last 12 Months ### Grouping & Filters Use the toolbar to slice your data: - **Time Grouping**: Hour, Day, Week, or Month - **Group By**: Model, Provider, User, API Key, or any custom metadata field - **Filters**: Filter by any field — supports multiple values (`value1,value2`) and wildcards (`*pattern*`) ## Savings Tab The Savings tab shows how much you're saving through caching: - **Cost Savings ($)**: Total dollar savings from cache hits and optimizations - **Cost Savings (%)**: Percentage of costs saved vs what you would have paid without caching - **Cache Hit Rate**: Percentage of requests served from cache - **Token Cache Rate**: Percentage of tokens served from cache A **Cache Performance by Model** table ranks models by cache effectiveness, showing hit rate, token cache rate, and savings per model. > **Tip:** Enable [Auto Caching](/features/auto-caching) to start seeing savings. A 30%+ cache hit rate typically translates to 25-40% cost reduction. ## Advanced Tab The Advanced tab is a fully flexible analytics workbench for deep analysis. 
![Advanced Analytics](/images/advanced-analytics.png) ### Controls | Control | Options | |---------|---------| | **Group By** | Model, Provider, User, or any custom field — or "None" for totals | | **Metric** | Cost, Requests, Input Tokens, Output Tokens, Cached Tokens, Total Tokens, Latency, Cost Savings, and more | | **Calculation** | Sum, Average, Median, Count Distinct, P95, P99 | | **Time Range** | 24h, 3d, 7d, 30d, 3m, 6m, 12m, or custom | | **Time Grouping** | None (total), Minute, Hour, Day | | **Filters** | Dynamic field-value filters with wildcard support | ### Data Summary Table Below the chart, a pivot table shows the raw data with: - Sortable columns (click any column header) - Toggle individual series visibility - Show values as percentages - Hide zero-value rows - **Export to CSV** — download the full dataset for external analysis ### Category Grouping Select multiple series in the data table and group them into a custom category. Useful for combining related models (e.g., group all Claude variants into "Anthropic") for high-level comparisons. ## Filtering Examples ### Filter by specific model Add a filter: `model` = `anthropic/claude-sonnet-4-5` ### Filter by user pattern Add a filter: `user` = `*@company.com` ### Filter by multiple models Add a filter: `model` = `openai/gpt-4.1,anthropic/claude-sonnet-4-5` ### Combine Group By with Filters Group by `user` and filter by `model` = `anthropic/*` to see which users use Anthropic models the most. ## Custom Metadata Tag your requests with custom fields using [Request Metadata](/features/request-metadata), then filter and group by those fields in analytics. For example, tag requests with `environment`, `feature`, or `customer_id` and analyze usage per dimension. 
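The filter syntax used in the examples above (comma-separated values and `*` wildcards) can be mimicked locally, for instance to check a pattern against exported records or custom metadata values before using it in the dashboard. This is only an illustration built on Python's `fnmatch` module, not Requesty's implementation; note that `fnmatch` also gives `?` and `[...]` special meaning, which the dashboard may not.

```python
from fnmatch import fnmatchcase


def matches_filter(value: str, pattern: str) -> bool:
    """True if `value` matches any comma-separated alternative in `pattern`."""
    alternatives = [part.strip() for part in pattern.split(",") if part.strip()]
    # Each alternative may contain `*` wildcards, e.g. "anthropic/*".
    return any(fnmatchcase(value, alternative) for alternative in alternatives)


def filter_records(records, field: str, pattern: str):
    """Keep the records whose `field` value matches the filter pattern."""
    return [record for record in records
            if matches_filter(str(record.get(field, "")), pattern)]
```

For example, `filter_records(rows, "environment", "prod*")` keeps only rows whose hypothetical `environment` metadata value starts with `prod`.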
## Integration with Other Features

- [Cost Tracking](/features/cost-tracking): Deeper cost analysis and optimization
- [Performance Monitoring](/features/performance-monitoring): Latency and reliability metrics
- [Session Reconstruction](/features/session-reconstruction): Drill into individual request sessions
- [Spending Alerts](/features/alerts): Get notified when spending exceeds thresholds
- [Request Metadata](/features/request-metadata): Add custom dimensions for analytics filtering

---

## Cost Tracking

Source: https://docs.requesty.ai/features/cost-tracking.md

> Monitor and optimize your AI spending with real-time cost breakdowns by model, user, and team

Requesty tracks every dollar you spend on AI, broken down by model, provider, user, API key, and time period. See exactly where your budget goes and how much you're saving through caching and routing optimization.

## Cost Overview

The [Analytics dashboard](https://app.requesty.ai/analytics) shows your spending in real time:

- **Total Spend**: Current cost for the selected time period
- **Cost Over Time**: Bar chart showing daily/weekly/monthly cost trends
- **Cost by Model**: See which models cost the most
- **Cost by User**: Track per-user spending across your organization
- **Projected Spend**: Estimated end-of-period costs based on current rate

### Cost Breakdown

Group your costs by any dimension:

| Group By | What you see |
|----------|-------------|
| **Model** | Cost per model (e.g., `anthropic/claude-sonnet-4-5`: $420, `openai/gpt-4.1`: $180) |
| **Provider** | Total spend per provider (Anthropic, OpenAI, Google, etc.) |
| **User** | Per-user cost attribution |
| **API Key** | Cost per API key, useful for tracking per-application spend |
| **Custom Field** | Any [Request Metadata](/features/request-metadata) field (e.g., `environment`, `feature`) |

## Savings Tracking

The **Savings tab** in analytics shows how much you're saving:

- **Cost Savings ($)**: Total dollars saved through prompt caching
- **Cost Savings (%)**: Percentage saved vs. what you would have paid without caching (shown as a gauge chart)
- **Cache Hit Rate**: What percentage of your requests hit cache
- **Token Cache Rate**: What percentage of tokens are served from cache
- **Per-Model Savings**: Table showing which models benefit most from caching

> **Tip:** Check the Savings tab regularly. If your savings percentage is below 20%, consider enabling [Auto Caching](/features/auto-caching) for your most-used models.

## BYOK Cost Comparison

When using [Bring Your Own Keys](/features/bring-your-own-keys), the Cost Overview chart splits into:

- **Requesty Cost**: What you pay through Requesty-managed keys
- **Provider Cost**: What you pay directly to providers through your own keys

This makes it easy to compare pricing and decide when BYOK is more cost-effective.

## Budget Controls

Control spending at multiple levels:

- **[Project Limits](/features/api-limits#project-based-spend-limits-recommended)**: Set monthly spend caps per project; each user's Private project can have its own limit
- **[API Key Limits](/features/api-limits#per-api-key-spend-limits)**: Set monthly spend caps per individual API key
- **[Spending Alerts](/features/alerts)**: Get notified via email or webhook when spending exceeds configurable thresholds
- **[Group Budgets](/features/groups)**: Set spending limits for teams

When a spend limit is reached, new requests for that project or key are blocked until the next billing period.

## Cost Optimization Tips

### Use Cheaper Models Where Possible

Not every task needs a frontier model. Use the Advanced analytics tab to find tasks where a cheaper model would work:

- Route simple classification tasks to `openai/gpt-4o-mini` instead of `openai/gpt-4.1`
- Use `anthropic/claude-haiku-4-5` for summarization instead of Sonnet

### Enable Caching

[Auto Caching](/features/auto-caching) can reduce costs by 25-90% for workloads with repeated system prompts or similar queries. Check your Savings tab to see your current cache hit rate.

### Use Routing Policies

[Fallback Policies](/features/fallback-policies) can start with cheaper models and only fall back to expensive ones when needed, automatically reducing your average cost per request.

### Track with Metadata

Tag requests with [Request Metadata](/features/request-metadata) to understand cost by feature, environment, or customer:

```python
# Assumes `client` is an OpenAI client configured with Requesty's base URL
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5-20250514",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "requesty": {
            "metadata": {
                "feature": "chatbot",
                "environment": "production",
                "customer_id": "acme-corp"
            }
        }
    }
)
```

Then filter by `feature`, `environment`, or `customer_id` in the Advanced analytics tab to see cost breakdowns.

## Export Cost Data

From the **Advanced tab**, export your cost data as CSV:

1. Set Metric to **cost**, Group By to **model** (or any dimension)
2. Choose your time range
3. Click **Export CSV** in the Data Summary table

The export includes raw numbers for Excel compatibility, with dates as columns in pivot format.
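As a rough sketch of working with such an export, the snippet below totals a pivot-format CSV with Python's standard library. It assumes one row per group and one column per date; the `model` column name and the sample rows are hypothetical, so check the header of your actual export before relying on it.

```python
import csv
import io

def total_cost_per_group(csv_text, group_column="model"):
    """Sum the per-date cost columns for each row of a pivot-format export."""
    totals = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        group = row.pop(group_column)
        # Every remaining column is assumed to be a date holding a numeric cost;
        # empty cells are skipped.
        totals[group] = sum(float(value) for value in row.values() if value)
    return totals

# Hypothetical sample in the assumed layout: one row per model, one column per day.
sample = """model,2025-01-01,2025-01-02
openai/gpt-4.1,10.5,12.0
anthropic/claude-sonnet-4-5,20.0,
"""

totals = total_cost_per_group(sample)
print(totals)
```

For a real file, pass the result of `open(path, newline="").read()` instead of the sample string.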
## Integration

- [Usage Analytics](/features/usage-analytics): Full dashboard overview
- [Performance Monitoring](/features/performance-monitoring): Correlate cost with latency
- [Spend Limits](/features/api-limits): Set budget caps per project or API key
- [Spending Alerts](/features/alerts): Automated notifications on spend thresholds
- [Request Metadata](/features/request-metadata): Custom cost attribution dimensions

---

## Performance Monitoring

Source: https://docs.requesty.ai/features/performance-monitoring.md

> Track AI model latency, error rates, and reliability metrics in real-time across all providers

Monitor the performance of every AI model you use, including latency, error rates, throughput, and reliability, all from the Requesty analytics dashboard.

## Latency Tracking

The **General tab** shows a real-time latency chart with three views:

| Metric | What it measures |
|--------|-----------------|
| **Average** | Mean response time across all requests |
| **P50** | Median: 50% of requests are faster than this |
| **P90** | 90th percentile: only 10% of requests are slower |

Switch between Average, P50, and P90 using the latency selector on the chart.

### What Latency Includes

Total request latency measures the full round-trip: your request hitting Requesty → routed to the provider → model inference → response streamed back. This is the real end-to-end time your users experience.

## Advanced Performance Analysis

Use the **Advanced tab** for deeper analysis:

### Latency by Model

- Set **Metric** to `latency_ms`
- Set **Group By** to `model`
- Set **Calculation** to `P50`, `P90`, `P95`, or `P99`

This shows you which models are fastest and which have the worst tail latency.
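To make the percentile calculations concrete, here is a minimal sketch of the nearest-rank method applied to a handful of made-up latency samples. It only illustrates what P50 and P90 mean; the dashboard's exact calculation method is not documented here and may differ.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds, including two slow outliers.
latencies_ms = [120, 135, 150, 160, 180, 210, 250, 400, 900, 2500]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
mean = sum(latencies_ms) / len(latencies_ms)
print(p50, p90, mean)
```

In this sample the two outliers pull the average well above the median, which is why tail percentiles are worth checking alongside the Average view.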
### Latency Over Time

- Set **Time Grouping** to `hour` or `day`
- Watch for latency spikes that correlate with peak traffic or provider issues

### Error Rate Analysis

- Set **Metric** to `requests`
- Filter by error status to see failure patterns
- Group by `model` or `provider` to identify unreliable providers

## Using Performance Data to Optimize

### Set Up Latency-Based Routing

If you see that one provider is consistently faster, create a [Latency Routing Policy](/features/latency-routing) to automatically use the fastest provider:

```
Policy: fastest-claude
├─ anthropic/claude-sonnet-4-5
├─ bedrock/claude-sonnet-4-5-v2@us-east-1
└─ bedrock/claude-sonnet-4-5-v2@eu-central-1
```

Requesty automatically routes to whichever is fastest at request time.

### Set Up Fallback for Reliability

If a provider has high error rates, create a [Fallback Policy](/features/fallback-policies) to automatically retry with another provider:

```
Policy: reliable-claude
├─ anthropic/claude-sonnet-4-5 (2 retries)
└─ bedrock/claude-sonnet-4-5-v2@eu-central-1 (2 retries)
```

### Reduce Latency with Caching

[Auto Caching](/features/auto-caching) can nearly eliminate latency for repeated requests. Check the **Savings tab** to see your cache hit rate; cached responses return in single-digit milliseconds.

### Use EU Routing for European Users

If your users are in Europe, route through the [EU endpoint](/features/eu-routing) (`https://router.eu.requesty.ai/v1`) to reduce network latency by 30-50%.

## Export Performance Data

From the **Advanced tab**:

1. Set **Metric** to `latency_ms`, **Calculation** to `P90`, **Group By** to `model`
2. Set time range and grouping
3. Click **Export CSV** to download the data

## Integration

- [Usage Analytics](/features/usage-analytics): Full dashboard with all metrics
- [Cost Tracking](/features/cost-tracking): Correlate performance with cost
- [Latency Routing](/features/latency-routing): Automatically pick the fastest model
- [Fallback Policies](/features/fallback-policies): Auto-retry on provider failures
- [Spending Alerts](/features/alerts): Get notified on anomalies

---

## Tool Call Analytics

Source: https://docs.requesty.ai/features/tool-call-analytics.md

> Track performance, costs, and usage patterns for every tool call in your LLM applications

## Overview

Tool Call Analytics gives you complete visibility into how your AI agents and applications use tools. Whether you're working with MCP servers, function calling, or custom tool implementations, you can now track exactly what's happening at the tool level.

## Why It Matters

Most teams building with agentic AI are flying blind. You can see overall request metrics, but when a response takes 15 seconds or costs $2, you don't know which tool is responsible.

**Common problems this solves:**

- A single slow tool bottlenecking your entire agent workflow
- Unexpected costs from tools making excessive API calls
- Debugging which tools are failing in production
- Understanding which tools users actually use vs. which sit idle

One customer discovered a single tool was responsible for 80% of their response time. They replaced it and cut latency by 4 seconds.

> **Note:** Tool calls can account for 60%+ of your total response time. You can't optimize what you don't measure.

## Key Metrics

### Tool Call Requests

Total number of requests that included tool calls. Each request may contain multiple tool invocations.

**Use this to:**

- Identify your most frequently used tools
- Spot unusual usage patterns
- Track adoption of new tools over time

### Tool Call Cost

Total cost of tool call requests, including both router and provider costs.
**Use this to:**

- Find expensive tools that need optimization
- Set cost budgets per tool
- Justify infrastructure investments

### Average Latency

Average response time for tool calls in milliseconds.

**Use this to:**

- Identify performance bottlenecks
- Set latency SLAs per tool
- Optimize critical path operations

> **Warning:** Tool latency varies wildly. Some tools consistently take 15+ seconds while others return in milliseconds. Without per-tool tracking, you won't know which ones to optimize.

### Success Rate

Percentage of successful tool call requests.

**Use this to:**

- Monitor tool reliability
- Catch breaking changes early
- Track improvement after fixes

### Avg Calls Per Request

Average number of tool invocations per request.

**Use this to:**

- Understand workflow complexity
- Optimize prompt engineering to reduce unnecessary calls
- Identify inefficient tool usage patterns

### Avg Tokens Per Call

Average tokens consumed per tool call.

**Use this to:**

- Track token efficiency
- Identify tools with verbose outputs
- Optimize context management

## Group By Tool Name

The most powerful feature is grouping metrics by tool name. This shows you exactly which tools are slow, expensive, or unreliable.

In the dashboard, select "Group by: Tool Name" to see:

- Stacked bar charts showing request volume per tool
- Line charts comparing latency across tools
- Cost breakdown by tool
- Success rates for each tool

## Real-World Use Cases

### Case Study: Debugging Slow Responses

A team noticed their agent responses were taking 12+ seconds. Using tool call analytics grouped by tool name, they discovered:

- `web_search` tool: 8.5s average latency
- `read_file` tool: 0.2s average latency
- `database_query` tool: 0.8s average latency

They switched to a faster search API and reduced overall latency to 3 seconds.

### Case Study: Cost Optimization

Another team had tool costs spiraling out of control. Analytics revealed:

- `comprehensive_analysis` tool: $0.45 per call
- `quick_check` tool: $0.02 per call
- Both tools were being used for similar tasks

They refactored their prompts to use `quick_check` when possible and saved 70% on tool costs.

### Case Study: Reliability Monitoring

A production system started failing intermittently. Tool call analytics showed:

- `external_api` tool: 65% success rate
- All other tools: 99%+ success rate

They added retry logic and fallback handling specifically for that tool, bringing success rates back to 99%.

## Filtering and Time Ranges

Combine tool call analytics with existing filters:

- **Time range**: Last 7 days, 30 days, custom ranges
- **API key**: Track per-user or per-project tool usage
- **Model**: See which models are better at using tools
- **User**: Understand per-user tool patterns

## Getting Started

Tool call analytics is automatically enabled for all requests that include tool calls. Just navigate to the **Analytics** → **Tool Calls** tab in your dashboard.

No code changes required. If you're already using MCP servers, function calling, or tools with your LLM requests, you'll see data immediately.

> **Tip:** Start by grouping by tool name and sorting by latency. This quickly reveals your biggest performance bottlenecks.

## Best Practices

1. **Set latency budgets per tool** - Different tools have different acceptable response times. Track them separately.
2. **Monitor success rates daily** - A drop in success rate often indicates breaking changes in external APIs or services.
3. **Compare costs across similar tools** - If two tools do similar things, use the analytics to pick the most cost-effective one.
4. **Track tokens per call** - High token counts may indicate verbose tool outputs that could be compressed.
5. **Review avg calls per request** - If this number keeps growing, you may need to optimize your agent's planning logic.
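As one way to act on the first practice, the sketch below compares per-tool average latencies against per-tool budgets. It is a local illustration only: the averages are the figures from the slow-response case study above, while the budget values and function name are hypothetical and would come from your own targets.

```python
def tools_over_budget(avg_latency_s, budgets_s):
    """Return {tool: (average, budget)} for tools whose average latency exceeds their budget."""
    over = {}
    for tool, average in avg_latency_s.items():
        budget = budgets_s.get(tool)
        # Tools without a configured budget are ignored rather than flagged.
        if budget is not None and average > budget:
            over[tool] = (average, budget)
    return over

# Average latencies (seconds) as read from the dashboard when grouped by tool name.
avg_latency_s = {"web_search": 8.5, "read_file": 0.2, "database_query": 0.8}
# Hypothetical per-tool latency budgets in seconds.
budgets_s = {"web_search": 3.0, "read_file": 0.5, "database_query": 1.0}

flagged = tools_over_budget(avg_latency_s, budgets_s)
print(flagged)
```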
## Related Features

- [Usage Analytics](/features/usage-analytics) - Track overall request patterns
- [Performance Monitoring](/features/performance-monitoring) - Monitor latency and errors
- [Cost Tracking](/features/cost-tracking) - Understand your spending
- [MCP Analytics](/features/mcp-analytics) - MCP-specific insights

---

## Request Metadata

Source: https://docs.requesty.ai/features/request-metadata.md

> Add custom metadata to your API calls for powerful analytics

## What is Request Metadata?

Request Metadata allows you to enhance your API calls with custom data that enables powerful analytics and tracking. By adding metadata to your requests, you can:

- Track user interactions across sessions
- Group requests by custom tags
- Associate requests with specific workflows
- Add business context to your API usage

## How It Works

1. Use the standard OpenAI client with Requesty's base URL
2. Add the `extra_body` parameter with your metadata
3. View and analyze this data in your Requesty dashboard

```python
import openai

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1"
)

# Add metadata via the extra_body parameter
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Your prompt here"}],
    extra_body={
        "requesty": {
            "tags": ["workflow-a", "product-page"],
            "user_id": "user_1234",
            "trace_id": "session_abc123",
            "extra": {
                "country": "canada",
                "prompt_title": "product description generator",
                "tier": "premium"
            }
        }
    }
)
```

## Key Metadata Fields

### Core Fields

- **tags**: Array of strings for grouping related requests
- **user_id**: Identifier for the end user making the request
- **trace_id**: Unique identifier to track related requests in a workflow

### Extra Context

The `extra` object can include any custom fields relevant to your business:

- **country**: User's location for geographic analysis
- **prompt_title**: Descriptive name of the prompt's purpose
- **tier**: User's subscription level
- **language**: Preferred language of the user
- **application**: Source application or feature

## Benefits

- **User Journey Analysis**: Track how users interact with AI across sessions
- **Cost Attribution**: Assign AI usage costs to specific business units
- **Performance Optimization**: Identify which prompts perform best for specific uses
- **Workflow Visualization**: See how multiple API calls connect in complex processes

## Implementation Examples

### Python Example

```python
import openai

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

# Initialize client
client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1"
)

# Make request with metadata
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Generate a product description for a coffee maker"}],
    extra_body={
        "requesty": {
            "tags": ["product-content", "e-commerce"],
            "user_id": "merchant_5678",
            "trace_id": "workflow_product_launch_123",
            "extra": {
                "country": "usa",
                "prompt_title": "product description",
                "department": "marketing",
                "product_category": "kitchen_appliances"
            }
        }
    }
)

print(response.choices[0].message.content)
```

### Node.js Example

```javascript
import OpenAI from 'openai';
import dotenv from 'dotenv';

// Load environment variables
dotenv.config();
const REQUESTY_API_KEY = process.env.REQUESTY_API_KEY;

// Initialize OpenAI client
const openai = new OpenAI({
  apiKey: REQUESTY_API_KEY,
  baseURL: 'https://router.requesty.ai/v1',
});

async function generateWithMetadata() {
  try {
    const response = await openai.chat.completions.create({
      model: 'openai/gpt-4o',
      messages: [{ role: 'user', content: 'Write a blog intro about AI productivity tools' }],
      // Requesty metadata is sent alongside the standard request parameters
      requesty: {
        tags: ['content-creation', 'blog'],
        user_id: 'editor_9012',
        trace_id: 'article_draft_456',
        extra: {
          country: 'uk',
          prompt_title: 'blog intro',
          content_type: 'educational',
          target_audience: 'technical',
        },
      },
    });

    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error('Error:', error);
  }
}

generateWithMetadata();
```

> **Tip:** For consistent analytics, establish naming conventions for your tags and metadata fields across your organization.

---

## Request Feedback

Source: https://docs.requesty.ai/features/request-feedback.md

> Add user feedback to your API calls after they are completed

## What is Request Feedback?

Request Feedback allows you to enrich your API calls with user feedback and other data *after* the initial request has been completed. This is useful for gathering insights on the quality of the model's response, which can be used for analytics, auditing, and improving the user experience.

With this feature, you can:

- Capture user ratings and comments on AI responses.
- Track which responses were helpful or unhelpful.
- Add contextual data from your platform after the fact.
- Build a feedback loop to fine-tune models and prompts.

## Benefits

- **Quality Monitoring**: Continuously track the performance and quality of your AI models.
- **User Satisfaction**: Understand what your users think about the AI responses they receive.
- **Data-Driven Improvements**: Use feedback data to identify areas for improvement in your prompts, models, or workflows.
- **Enhanced Auditing**: Add context to requests for better auditing and analysis.

**Tip**: Standardize your feedback data structure (e.g., ratings, tags) to make it easier to analyze across your applications.

## How It Works

1. After a chat completion, you get an ID in the response.
2. Use this ID to send a `POST` request to the Requesty feedback endpoint.
3. Include your feedback data in the JSON payload.
4. View and analyze this feedback in your Requesty dashboard.

**Important notes**

- You can POST feedback multiple times per request.
- Every subsequent call merges the new values.
- If a new feedback call contains an existing key, the new value overwrites the existing one.
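The merge behavior in the notes above can be pictured as a shallow dictionary merge. The sketch below is only a local illustration of those semantics under that assumption; it does not call the Requesty API, the field values are made up, and how nested values are merged is not specified here.

```python
def merge_feedback(existing, update):
    """Shallow merge: keys in `update` overwrite matching keys in `existing`."""
    merged = dict(existing)
    merged.update(update)
    return merged

# Two hypothetical feedback payloads posted for the same request ID.
first_call = {"rating": 3, "helpful": False}
second_call = {"rating": 5, "message": "Much better after the retry."}

stored = merge_feedback(first_call, second_call)
print(stored)
```

Here `rating` is overwritten by the second call, `helpful` is kept from the first, and `message` is added.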
## Python Example

Here's how you can send feedback after a chat completion call:

```python
import openai
import requests

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

# Initialize an OpenAI client pointed at Requesty
client = openai.OpenAI(api_key=requesty_api_key, base_url="https://router.requesty.ai/v1")

# 1. Make the initial request
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Your prompt here"}],
)

# 2. Get the unique ID from the response
request_id = response.id

# 3. Send feedback to the Requesty API
feedback_url = f"https://api.requesty.ai/feedback/{request_id}"
feedback_headers = {
    "Authorization": f"Bearer {requesty_api_key}",
    "Content-Type": "application/json"
}
feedback_data = {
    "data": {
        "message": "The response was very accurate and helpful.",
        "rating": 5,
        "helpful": True,
        "user_id": "user_1234",
        "tags": ["customer-support", "positive-feedback"]
    }
}

try:
    feedback_response = requests.post(
        feedback_url,
        headers=feedback_headers,
        json=feedback_data,
    )
    feedback_response.raise_for_status()  # Raises an HTTPError for bad responses (4xx or 5xx)
    print("Feedback submitted successfully!")
except requests.exceptions.RequestException as e:
    print(f"Failed to submit feedback: {e}")
```

## Node.js Example

Here's how you can send feedback using Node.js.

```javascript
import OpenAI from 'openai';
import dotenv from 'dotenv';

// Load environment variables
dotenv.config();

const REQUESTY_API_KEY = process.env.REQUESTY_API_KEY;
const ROUTER_BASE_URL = 'https://router.requesty.ai/v1';
const FEEDBACK_BASE_URL = 'https://api.requesty.ai/feedback/';

// Initialize OpenAI client
const client = new OpenAI({
  apiKey: REQUESTY_API_KEY,
  baseURL: ROUTER_BASE_URL,
});

async function generateWithFeedback() {
  try {
    const response = await client.chat.completions.create({
      model: "anthropic/claude-3-7-sonnet-latest",
      messages: [
        {
          role: "user",
          content: "What is AES?"
        }
      ]
    });

    const requestId = response.id;

    // Send feedback POST request
    const feedbackUrl = FEEDBACK_BASE_URL + requestId;
    const headers = {
      "Authorization": `Bearer ${REQUESTY_API_KEY}`,
      "Content-Type": "application/json"
    };
    const feedbackData = {
      data: {
        message: "Test feedback message",
        rating: 5,
        helpful: true
      }
    };

    const feedbackResponse = await fetch(feedbackUrl, {
      method: 'POST',
      headers: headers,
      body: JSON.stringify(feedbackData)
    });

    if (feedbackResponse.ok) {
      console.log("Feedback sent successfully");
    } else {
      console.error("Failed to send feedback:", feedbackResponse.statusText);
    }
  } catch (error) {
    console.error('Error:', error);
  }
}

generateWithFeedback();
```

---

## Session Reconstruction

Source: https://docs.requesty.ai/features/session-reconstruction.md

> Automatic session reconstruction

# Automatic Session Reconstruction

Understanding how users interact with your LLM applications is key to improving them. A crucial part of this is analyzing entire conversations or "sessions." However, tracking sessions usually requires you to add a unique `session_id` to every API request, which can be a hassle to implement and maintain.

Requesty's gateway removes this burden with **Automatic Session Reconstruction**. You can send your LLM interaction data to us as-is, and we will automatically group related interactions into coherent sessions for you.

## What It Means For You

- **Zero Implementation Effort**: You don't need to modify your application code to generate or manage session IDs. Simply send us the interaction data, and we'll handle the rest.
- **Accurate Conversation Tracking**: Get a clear view of the entire user journey or your agentic flow, from the first prompt to the final response.
- **Powerful Analytics**: With sessions correctly identified, you can analyze conversation length, user engagement, topic flow, and other critical metrics that depend on understanding the full context of an interaction.
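For intuition about the grouping explained under How It Works, the sketch below links an interaction to an earlier one when its messages begin with the earlier interaction's full history. This is a simplified, hypothetical model; Requesty's actual matching logic is not documented here, and the function names are illustrative.

```python
def extends(previous, current):
    """True if `current` starts with every message of `previous` and adds at least one more."""
    return len(current) > len(previous) and current[:len(previous)] == previous

def assign_sessions(interactions):
    """Give each interaction a session index, reusing the index of the session it extends."""
    session_ids = []
    latest_history = {}  # session index -> most recent message history seen for that session
    for messages in interactions:
        match = next(
            (sid for sid, prev in latest_history.items() if extends(prev, messages)),
            None,
        )
        if match is None:
            match = len(latest_history)  # no earlier history matches: start a new session
        latest_history[match] = messages
        session_ids.append(match)
    return session_ids

first_turn = [
    ("system", "You are a helpful assistant."),
    ("user", "What is the capital of France?"),
    ("assistant", "The capital of France is Paris."),
]
second_turn = first_turn + [
    ("user", "What is its population?"),
    ("assistant", "The population of Paris is over 2 million."),
]
unrelated = [("user", "Write a haiku about autumn.")]

ids = assign_sessions([first_turn, second_turn, unrelated])
print(ids)
```

The first two interactions share a session index because the second contains the complete history of the first, while the unrelated prompt starts a new session.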
## How It Works

Our system intelligently analyzes the content of the messages in each interaction and automatically identifies if it's a part of an existing session.

For example, if a user starts a conversation:

1. **User's first turn:**
   - `system`: "You are a helpful assistant."
   - `user`: "What is the capital of France?"
   - `assistant`: "The capital of France is Paris."

Our service sees this is the start of a new conversation and assigns it a new session ID internally.

If the user continues the conversation:

2. **User's second turn:**
   - `system`: "You are a helpful assistant."
   - `user`: "What is the capital of France?"
   - `assistant`: "The capital of France is Paris."
   - `user`: "What is its population?"
   - `assistant`: "The population of Paris is over 2 million."

Our service recognizes that this new interaction contains the complete history of the first one, plus a new question and answer. It automatically identifies it as part of the **same session** and links it to the previous interaction.

This process allows us to reconstruct the entire conversation thread reliably, without requiring any session management on your end.

---

## Key Management API

Source: https://docs.requesty.ai/features/key-management-api.md

> Manage your API key via an API

# API Key Management

**Enterprise Feature**

Programmatically manage your organization's API keys using the Requesty API Key Management API. Create, monitor, configure, and delete API keys with code, just like you would from the Requesty console.

## What is API Key Management?

The API Key Management feature allows enterprise customers to automate their API key lifecycle management through a RESTful API. Instead of manually managing keys through the web console, you can integrate key management directly into your workflows and systems.

With this feature, you can:

- Create new API keys with custom permissions and spending limits
- Monitor API key usage and spending in real-time
- Update monthly spending limits programmatically
- Delete unused or compromised keys instantly
- Retrieve comprehensive usage analytics for any date range

## Benefits

- **Automation**: Integrate API key management into your CI/CD pipelines and infrastructure automation
- **Security**: Programmatically rotate keys and manage permissions at scale
- **Cost Control**: Set and update spending limits across all your API keys
- **Monitoring**: Track usage patterns and spending across your organization
- **Compliance**: Maintain audit trails and enforce governance policies

**Tip**: Use descriptive names for your API keys and standardize your naming convention to make management easier across teams.

## Prerequisites

To use the API Key Management endpoints, you need:

1. An enterprise Requesty account
2. An API key with **manage permissions** (read/write access)
3. The manage permission allows you to call all API key management endpoints

## API Reference

### Base URL

```
https://api.requesty.ai
```

### Endpoints

| Method   | Endpoint                       | Required Permission | Description          |
| -------- | ------------------------------ | ------------------- | -------------------- |
| `GET`    | `/v1/manage/apikey`            | READ                | List all API keys    |
| `POST`   | `/v1/manage/apikey`            | WRITE               | Create new API key   |
| `GET`    | `/v1/manage/apikey/{id}`       | READ                | Get API key usage    |
| `DELETE` | `/v1/manage/apikey/{id}`       | WRITE               | Delete API key       |
| `POST`   | `/v1/manage/apikey/{id}/limit` | WRITE               | Update monthly limit |

---

## User Management

Source: https://docs.requesty.ai/features/users.md

> Manage organization members, set spending limits, and track user activity

User Management in Requesty allows administrators to manage organization members, control spending, and track user activity across your organization.
## Core Capabilities

### User Listing & Overview

View all organization users with comprehensive sorting and filtering options to manage your team effectively.

**Key Features:**

- Complete user directory with activity status
- Sorting and filtering capabilities
- Bulk selection for group operations
- Real-time spending and limit tracking

### User Status Types

### Active Users

**Active Status**: User has a private context and can make API calls

- Currently able to use Requesty services
- Has established API access
- Appears with active indicator in user list

### Inactive Users

**Inactive Status**: User exists in the organization but has no private context yet

- Organization member but hasn't started using services
- No API access established yet
- Shown with inactive indicator

## Admin Controls

### Bulk Operations

Efficiently manage multiple users simultaneously:

- **Bulk Group Assignment**: Select multiple users and add them to groups
- **Default Limits**: Set monthly spending limits for new users
- **Mass Updates**: Apply changes across selected users

### Individual User Controls

**Spending Management:**

- Set individual monthly spending limits per user
- Track current month spending vs limits
- View spending history and trends

**Group Management:**

- View all groups user belongs to (displayed as badges)
- Add/remove users from specific groups
- Manage group-based access controls

**Activity Monitoring:**

- See user's current activity status
- Track API usage patterns
- Monitor context and session data

## Organization Configuration

### Global Settings

- **Default Monthly Limit**: Set automatic spending limits for new organization members
- **Organization Policies**: Configure global rules and restrictions
- **Access Controls**: Manage organization-wide permissions

### User-Group Relationship

The user management system integrates closely with groups and API keys:

```mermaid
graph TD
    A[Organization] --> B[Users]
    A --> C[Groups]
    B --> D[Monthly Limits]
    B --> E[Activity Status]
    C --> F[API Keys]
    F --> G[Features]
    B -.-> C
    C -.-> H[Spending Tracking]
```

**How It Works:**

1. **Users** are organization members with individual spending limits
2. **Groups** are collections of users for easier management
3. **API Keys** can be associated with groups/users for access control
4. **Features** on API keys affect what users in those groups can do

## Example Workflow

Here's a typical admin workflow for managing users:

1. **Create Group**: Admin creates an "Engineering" group for the development team
2. **Add Users**: Add engineers (users) to the Engineering group
3. **Configure API Keys**: Create API key with specific features enabled for the group
4. **Set Limits**: Configure monthly spending limits for users
5. **Monitor Usage**: Track spending per user and per group

## Spending Control & Monitoring

### Monthly Limits

- Set individual spending limits per user
- Configure default limits for new users
- Automatic alerts when approaching limits
- Spending cutoffs when limits are reached

### Usage Tracking

- Real-time spending monitoring
- Monthly spend vs limit comparisons
- Historical usage patterns
- Group-level spending aggregation

> **Note:** User spending limits help control costs while group assignments enable efficient access management and feature distribution.

## Best Practices

- Set reasonable monthly limits and monitor usage patterns to prevent unexpected costs.
- Organize users into logical groups that match your team structure and access needs.
- Review user activity and spending regularly to optimize your organization's usage.
- Use group-based API key management for secure and scalable access control.

---

## Groups Management

Source: https://docs.requesty.ai/features/groups.md

> Organize users, track group spending, and manage collective access to API keys and features

Groups in Requesty allow administrators to organize organization members into logical units, track spending collectively, and manage user access efficiently.
## Groups System Overview

Groups provide a powerful way to organize your organization's users and manage their access to Requesty's features and API keys.

### What Groups Do

- Organize organization members into logical units like Marketing, Engineering, Sales, etc.
- Track monthly spending per group with aggregated spend monitoring
- Allow admins to manage user access to API keys and features collectively
- Reflect your company's organizational structure in your Requesty setup

## How Groups Work

### Creating Groups

Administrators can create named groups that match your organizational structure:

- **Department Groups**: Engineering, Marketing, Sales, Support
- **Project Groups**: Product A Team, Research Division, Beta Testers
- **Function Groups**: Admins, Developers, Content Creators
- **Custom Groups**: Any logical grouping that fits your needs

### Member Management

**Adding Members:**

- **Drag & Drop**: Intuitive interface for moving users between groups
- **Dialog-Based**: Select multiple users and assign to groups
- **Bulk Operations**: Add many users to groups simultaneously

**Removing Members:**

- Easy removal of users from groups
- View group membership at a glance
- Track who belongs to which groups

### Spending Tracking

Groups provide powerful spending insights:

- **Monthly Aggregation**: See total spending across all group members
- **Trend Analysis**: Track spending patterns over time
- **Budget Management**: Set and monitor group-level spending goals
- **Cost Allocation**: Understand which teams drive API usage

## Group Structure & Organization

### Hierarchical Organization

```mermaid
graph TD
    A[Organization] --> B[Engineering Group]
    A --> C[Marketing Group]
    A --> D[Sales Group]
    B --> E[Backend Engineers]
    B --> F[Frontend Engineers]
    B --> G[DevOps Engineers]
    C --> H[Content Team]
    C --> I[Design Team]
    D --> J[Sales Reps]
    D --> K[Sales Engineers]
```

### Flexible Membership

- Users can belong to multiple groups
- Cross-functional team support
- Project-based temporary groups
- Role-based permanent groups

## Integration with Features & API Keys

### Access Control Flow

1. **Create Groups**: Organize users into logical groups (Engineering, Marketing, etc.)
2. **Configure API Keys**: Create API keys with specific features and policies
3. **Assign Access**: Associate API keys with groups to grant access
4. **Monitor Usage**: Track group spending and feature usage

### API Key → Group Relationship

**How It Works:**

- API Keys can be associated with specific groups
- Groups determine which users can access which keys
- Features on API keys apply to all group members
- Policies control model access and behavior for the group

**Example Configuration:**

```yaml
Engineering Group:
  - API Key: 'eng-prod-key'
  - Features: [streaming, structured-outputs, reasoning]
  - Models: [gpt-4, claude-3]
  - Members: [alice@company.com, bob@company.com]

Marketing Group:
  - API Key: 'marketing-key'
  - Features: [prompt-library, auto-caching]
  - Models: [gpt-3.5-turbo]
  - Members: [carol@company.com, dave@company.com]
```

## Admin Panel Workflow

### Complete Group Management Process

### Setup Phase

**Initial Configuration:**

1. Create groups for different teams/departments
2. Add organization members to appropriate groups
3. Configure API keys with specific features and policies
4. Set group-level spending monitoring

### Daily Operations

**Ongoing Management:**

- Monitor group spending and usage through analytics
- Add new users to existing groups
- Adjust API key features based on group needs
- Track which groups are most active

### Optimization

**Performance Tuning:**

- Analyze spending patterns per group
- Optimize feature assignments
- Adjust group structures as teams evolve
- Review and update access controls

## Key Relationships

Understanding how groups fit into the broader Requesty ecosystem:

### System Architecture

- **Groups** = User Organization (who can access what)
- **Features** = API Key Enhancement (how keys behave)
- **Policies** = Model Access Control (which models, fallbacks, load balancing)
- **Users** = Individual organization members with spending limits

### Integration Points

**Groups ↔ Users:**

- Groups contain multiple users
- Users can belong to multiple groups
- Group membership determines API access

**Groups ↔ API Keys:**

- API keys can be assigned to groups
- All group members can use assigned keys
- Features on keys apply to all group users

**Groups ↔ Spending:**

- Group spending is the aggregation of member spending
- Useful for departmental budget tracking
- Helps identify high-usage teams

## Best Practices

**Recommended Approaches:**

- Align groups with your actual team structure
- Create both permanent (department) and temporary (project) groups
- Use descriptive names that make sense to all admins
- Plan for growth by creating scalable group structures

**Security & Control:**

- Regularly review group memberships
- Remove users from groups when they change roles
- Use principle of least privilege for API key assignments
- Monitor group spending to detect unusual usage

**Cost Control:**

- Set realistic spending expectations per group
- Monitor trends to predict future usage
- Use group data to allocate budgets appropriately
- Identify opportunities for feature optimization

> **Note:** Groups provide fine-grained
control where admins can organize users logically while configuring sophisticated API key behaviors through features and policies.

## Advanced Features

### Analytics & Reporting

- Group spending trends over time
- Feature usage by group
- Model preference analysis per group
- Cost efficiency metrics

### Automation Options

- Auto-assign new users to default groups
- Spending alerts at group level
- Usage-based group recommendations
- Integration with external team management tools

The groups system enables sophisticated organization management while maintaining simplicity for day-to-day operations.

---

## Spending Alerts

Source: https://docs.requesty.ai/features/alerts.md

> Monitor spending across your organization with configurable alert thresholds and webhook notifications

Requesty's spending alerts let you monitor costs across your organization in real time. Define threshold rules for users, groups, or the organization as a whole, and receive instant notifications via webhook whenever a threshold is crossed.

## How Alerts Work

Every time a request is processed, Requesty evaluates the updated spend against your configured alert thresholds. When a threshold is crossed for the first time, a notification is dispatched to your webhook endpoint. Alerts fire exactly once per threshold crossing; they will not repeat until the threshold is crossed again (e.g. in a new billing cycle).

```mermaid
graph LR
    A[API Request] --> B[Balance Updated]
    B --> C{Threshold Crossed?}
    C -- Yes --> D[Send Webhook]
    C -- No --> E[No Action]
```

## Alert Types

Requesty supports four types of spending alerts, each designed for a different monitoring scenario:

- **User % of Budget**: Triggers when a user's monthly spend reaches a percentage of their configured budget. Applies to both global user budgets and group-based user budgets.
- **User Absolute Spend**: Triggers when a user's monthly spend reaches a specific dollar amount. Applies to both global user spend and group-based user spend. Works even if no budget limit is set.
- **Group % of Budget**: Triggers when a group's combined monthly spend reaches a percentage of the group's budget.
- **Org Balance Below**: Triggers when the organization's remaining balance drops below a specific dollar amount. Useful for ensuring you top up before running out of credits.

### Alert Type Examples

| Alert Type | Threshold Unit | Example | When It Fires |
|---|---|---|---|
| User % of Budget | Percentage (0–100%) | 80% | User has spent 80% of their monthly limit |
| User Absolute Spend | Dollar amount | $50 | User has spent $50 this month (globally or within a group) |
| Group % of Budget | Percentage (0–100%) | 90% | Group has spent 90% of its monthly limit |
| Org Balance Below | Dollar amount | $100 | Organization balance drops below $100 |

> **Info:** You can create multiple thresholds per alert type. For example, set User % of Budget alerts at both 50% and 80% to get an early warning and a critical warning.

## Setting Up Alerts

### Step 1: Configure a Webhook

Before alerts can be delivered, you need to configure a webhook endpoint for your organization.

1. Go to the [Admin Panel](https://app.requesty.ai/admin-panel) and navigate to the **Alerts** tab
2. Click **Add Webhook** in the header
3. Choose a webhook type:
   - **JSON (Generic)**: sends a structured JSON payload to any HTTP endpoint
   - **Slack**: sends a pre-formatted message to a Slack incoming webhook URL
4. Enter your webhook URL
5. Click **Save**

> **Warning:** Alerts will not be sent unless a valid webhook URL is configured. If you create alert thresholds without a webhook, they will be stored but no notifications will be delivered.

### Step 2: Create Alert Thresholds

1. In the Alerts tab, click **Add Alert**
2. Select an alert type from the dropdown
3. Enter the threshold value:
   - For percentage-based alerts, enter a number between 1 and 100
   - For dollar-based alerts, enter the dollar amount
4. Click **Create Alert**

You can create as many thresholds as needed across all alert types.
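
The fire-once behaviour described under How Alerts Work, combined with several thresholds on one alert type, can be pictured with a small sketch. This is an illustrative model only, not Requesty's implementation: the function name `crossed_thresholds` and the idea of comparing spend before and after a request are assumptions.

```python
def crossed_thresholds(previous_spend, current_spend, budget, percent_thresholds):
    """Return the percentage thresholds crossed by the latest request.

    Hypothetical helper: a threshold counts as crossed only when spend moves
    from below it to at-or-above it, so each one fires at most once while
    spend keeps rising within the month.
    """
    crossed = []
    for pct in sorted(percent_thresholds):
        limit = budget * pct / 100
        if previous_spend < limit <= current_spend:
            crossed.append(pct)
    return crossed


# A user with a $100 monthly budget and alerts at 50% and 80%.
print(crossed_thresholds(45.0, 52.0, 100.0, [50, 80]))  # [50]
print(crossed_thresholds(52.0, 60.0, 100.0, [50, 80]))  # []
print(crossed_thresholds(60.0, 85.0, 100.0, [50, 80]))  # [80]
```

Under this model a request that moves spend from $52 to $60 triggers nothing, because the 50% threshold was already crossed earlier in the month.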
## Webhook Payload Formats

### JSON Webhook

When the webhook type is set to **JSON**, alerts are delivered as a structured JSON payload via HTTP POST:

```json User % of Budget
{
  "type": "user.budget.exceeded_percent",
  "data": {
    "user_email": "alice@company.com",
    "limit": "100",
    "percentage_exceeded": "0.8"
  }
}
```

```json User % of Budget (in Group)
{
  "type": "user.budget.exceeded_percent",
  "data": {
    "user_email": "alice@company.com",
    "group_name": "Engineering",
    "limit": "50",
    "percentage_exceeded": "0.8"
  }
}
```

```json User Absolute Spend
{
  "type": "user.budget.exceeded_absolute",
  "data": {
    "user_email": "alice@company.com",
    "absolute_exceeded": "50"
  }
}
```

```json User Absolute Spend (in Group)
{
  "type": "user.budget.exceeded_absolute",
  "data": {
    "user_email": "alice@company.com",
    "group_name": "Engineering",
    "absolute_exceeded": "50"
  }
}
```

```json Group % of Budget
{
  "type": "group.budget.exceeded_percent",
  "data": {
    "group_name": "Engineering",
    "group_admins": [
      { "email": "admin@company.com" }
    ],
    "limit": "500",
    "percentage_exceeded": "0.9"
  }
}
```

```json Org Balance Below
{
  "type": "org.balance.below_absolute",
  "data": {
    "org_id": "org_abc123",
    "balance_threshold": "100",
    "current_balance": "87.50"
  }
}
```

### Slack Webhook

When the webhook type is set to **Slack**, alerts are delivered as Slack-formatted messages. Each alert type produces a human-readable notification:

- **User % of budget alerts**: _"User Spend Threshold Exceeded — alice@company.com exceeded 80% of their $100 monthly limit"_ (includes group name when triggered within a group)
- **User absolute spend alerts**: _"User Spend Threshold Exceeded — alice@company.com exceeded $50 spend threshold"_ (includes group name when triggered within a group)
- **Group alerts**: _"Group Spend Threshold Exceeded — Engineering exceeded 90% of the $500 monthly limit"_ (includes group admin emails)
- **Org alerts**: _"Organization Balance Alert — Organization balance has dropped below $100 threshold. Current balance: 87.50"_

## Webhook Delivery

Requesty's webhook dispatcher ensures reliable delivery:

- **Retries**: Failed deliveries are retried up to **3 times** with exponential backoff
- **Timeout**: Each delivery attempt times out after **15 seconds**

## Managing Alerts

### Viewing Active Alerts

All configured alert thresholds are visible in the **Alerts** tab of the [Admin Panel](https://app.requesty.ai/admin-panel). The table shows:

- Alert type and description
- Threshold value
- Status (active)

### Deleting Alerts

To remove an alert threshold, click the delete icon next to the alert in the table and confirm the deletion.

### Updating the Webhook

Click **Edit** next to the webhook display in the Alerts header to change the webhook type or URL. Updating the webhook takes effect immediately for all future alert notifications.

## Example Configurations

- **Org Balance Below $50**: get notified before credits run out
- **User Absolute Spend $20**: catch unexpectedly high individual usage
- **Webhook**: Slack channel `#billing-alerts`

- **User % of Budget at 50%, 80%, 95%**: progressive warnings as users approach limits
- **Group % of Budget at 80%, 95%**: monitor departmental spend
- **Org Balance Below $500, $200**: early and critical low-balance warnings
- **User Absolute Spend $100, $500**: catch runaway usage regardless of budget
- **Webhook**: JSON endpoint integrated with internal alerting system (e.g., PagerDuty, Opsgenie)

- **User Absolute Spend $25, $50**: track partner usage by dollar amount
- **Org Balance Below $100**: ensure account stays funded
- **Webhook**: JSON endpoint for automated processing

## Related Features

- Set hard spending caps at the user, project, or API key level to enforce budgets automatically.
- Organize users into groups with shared budgets and collective spending tracking.
- View detailed spending breakdowns and usage trends across your organization.
- Monitor real-time costs per request, model, and user.
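
For the JSON webhook type, a receiver will usually branch on the `type` field of the payload. The sketch below shows one way that might look, using only the field names that appear in the payload examples in this section; the function `describe_alert` and the wording of its summaries are assumptions for illustration, and the HTTP server that would receive the POST is left out.

```python
import json


def describe_alert(payload):
    """Turn a JSON alert payload into a one-line summary.

    Field names come from the documented payload examples; the summary
    wording and this helper itself are illustrative assumptions.
    """
    kind = payload["type"]
    data = payload["data"]
    # group_name is only present when a user alert fired inside a group.
    scope = f" in group {data['group_name']}" if "group_name" in data else ""

    if kind == "user.budget.exceeded_percent":
        # percentage_exceeded is a fraction encoded as a string, e.g. "0.8".
        pct = float(data["percentage_exceeded"]) * 100
        return f"{data['user_email']}{scope} reached {pct:.0f}% of ${data['limit']}"
    if kind == "user.budget.exceeded_absolute":
        return f"{data['user_email']}{scope} spent ${data['absolute_exceeded']}"
    if kind == "group.budget.exceeded_percent":
        pct = float(data["percentage_exceeded"]) * 100
        return f"group {data['group_name']} reached {pct:.0f}% of ${data['limit']}"
    if kind == "org.balance.below_absolute":
        return (f"org {data['org_id']} balance ${data['current_balance']} "
                f"is below ${data['balance_threshold']}")
    return f"unrecognised alert type: {kind}"


raw = ('{"type": "org.balance.below_absolute", "data": {"org_id": "org_abc123", '
       '"balance_threshold": "100", "current_balance": "87.50"}}')
print(describe_alert(json.loads(raw)))
```

Note that numeric values arrive as strings in the documented payloads, so the sketch converts them explicitly where arithmetic is needed.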
--- ## Approved Models Source: https://docs.requesty.ai/features/approved-models.md > Control which AI models your organization can access with centralized approval management Approved Models provide organization-level control over which AI models your team members can access, ensuring compliance, cost control, and strategic model usage across your organization. ## Overview Approved Models create a curated whitelist of AI models that your organization's members can use, separate from the full catalog of available models. This enterprise feature gives administrators complete control over model access while providing transparency around capabilities and costs. ### What Approved Models Do Control which models are available to your organization members Monitor and control spending by limiting model selection Ensure only approved models are used for sensitive workflows Align model usage with organizational AI strategy ## How Approved Models Work **All Models View:** - Complete catalog of hundreds of available models - Full provider ecosystem (OpenAI, Anthropic, Azure, etc.) - Available for admin review and selection **Enabled Models:** - Default models available to use for organization members - Immediately accessible unless restricted by group settings **Available Models:** - Curated list of models that group admins can choose to enable for their groups - Not enabled for use organization-wide - Group admins can enable models from both the Enabled and Available lists for their specific groups - Provides flexibility for teams with different model requirements ### Model Information Display Each model in the system shows comprehensive details: - **Provider**: OpenAI, Anthropic, Azure, Google, etc. 
- **Location**: Geographic region (us-east-1, eu-west-1) or Global
- **Capabilities**: Caching, Reasoning, Training data usage policies
- **Pricing**: Input/output token costs per million tokens
- **Context Window**: Maximum token limit for requests
- **Model ID**: Structured format like `openai/gpt-4` or `anthropic/claude-3-sonnet`

## Admin Model Management

### Managing Enabled and Available Models

The Models Management tab provides two separate lists for organization-level model control:

**Enabled Models Tab:**

- Models that all organization members can use by default if not restricted by group settings

**Available Models Tab:**

- Models that can be enabled by group admins for their specific groups
- Not enabled organization-wide by default
- Group admins can choose models from both Enabled and Available lists when configuring their groups
- Provides granular control for teams with different requirements

### Adding Models

1. **Access Admin Panel**: Navigate to Admin Panel → Models Management tab
2. **Choose List Type**: Select either "Enabled Models" or "Available Models" tab
3. **Add New Model**: Click "Add Model" to start the approval process
4. **Select Provider**: Choose from available providers (OpenAI, Anthropic, etc.)
5. **Choose Location**: Select geographic region or Global deployment
6. **Pick Model**: Select specific model from filtered list based on provider/location
7. **Confirm Addition**: Model gets added to the selected list (Enabled or Available)

### Model Management Interface

**Searchable Model Tables:**

- View all models in each list (Enabled or Available)
- Search and filter by provider, capabilities, or pricing
- Sort by various criteria (cost, context window, etc.)
- Quick access to model details and removal options

**Model Removal:**

- Easy removal of models from either list
- Immediate effect on organization and group access
- Models cannot be removed if currently enabled in any group
- Audit trail of approval changes

## User Experience

### Model Selection for Users

**Default Behavior:**

- Users see only approved models by default
- Clean, curated list for everyday use
- No access to non-approved models in API policies

**Optional All Models View:**

- Users can toggle to see full catalog
- Helpful for understanding available options
- Cannot actually use non-approved models

### API Key Integration

Approved Models integrate seamlessly with your existing API key system:

```mermaid
graph TD
    A[Admin Adds Models] --> B[Enabled Models List]
    A --> C[Available Models List]
    B --> D[Organization-Wide Access]
    B --> E[Group Model Selection]
    C --> E
    E --> F[Group-Level Access]
    B --> G[API Key Policies]
    F --> G
    G --> H[User Access]
    I[All Models Catalog] -.-> A
    J[Group Permissions] --> H
    K[Individual Limits] --> H
```

**Access Flow:**

1. Organization admin adds models to either Enabled or Available lists
2. Enabled models are immediately available to all organization members
3. Group admins can enable models from both the Enabled and Available lists for their specific groups
4. Users can only select from enabled models (organization-wide or group-specific) when configuring API keys
5. API requests restricted to the enabled model set

## Cost and Usage Control

### Spending Visibility

**Organization-Level Tracking:**

- Monitor spending per approved model
- Track usage patterns across teams
- Identify high-cost models and usage trends

**Budget Planning:**

- Understand cost implications before approval
- Plan model strategy based on pricing tiers
- Optimize model mix for cost efficiency

### Strategic Benefits

**Regulatory Control:**

- Ensure only compliant models are used for sensitive data
- Maintain audit trails of model approvals
- Control data processing locations through geographic model selection
- Implement consistent AI governance policies

**Financial Management:**

- Prevent usage of expensive models without approval
- Track and allocate costs per model type
- Optimize model selection for different use cases
- Plan budgets based on approved model pricing

**Capability Management:**

- Standardize on proven, high-performing models
- Control access to experimental or beta models
- Ensure consistent quality across organization
- Plan model upgrades and migrations strategically

## Future Expansions

### Project-Level Controls

The Approved Models system is designed to expand with more granular controls:

- **Per-Project Approval**: Different approved models for different projects
- **Role-Based Access**: Different model tiers based on user roles
- **Temporal Controls**: Time-limited model approvals for testing

### Advanced Features Coming Soon

**Enhanced Management:**

- Bulk model approval workflows
- Model approval requests from users
- Automatic model updates and notifications
- Integration with external approval systems

## Best Practices

- Begin with a small set of well-tested models and expand based on proven needs
- Regularly review which approved models are actually being used by your teams
- Consider both capability and cost when approving models for organization use
- Periodically review and update your approved models list as new models become
available ## Integration with Other Features ### Works With Users & Groups - **User Management**: Individual users see only enabled models (organization-wide or group-specific) - **Group Controls**: Group admins can enable models from both the Enabled and Available lists for their groups - **Model Selection**: Groups can only enable models that are in the organization's [Enabled or Available lists](#managing-enabled-and-available-models) - **Flexible Configuration**: Enabled models are available by default, but group admins can also choose from Available models or restrict Enabled models at the group level - **Spending Limits**: User and group spending limits apply to enabled model usage ### API Key Policies - Only enabled models appear in policy configuration - Models must be enabled at the organization or group level to be used - Fallback chains can only use enabled models - Load balancing limited to enabled model set > **Note:** Approved Models provide the foundation for sophisticated AI governance while maintaining ease of use for your organization's day-to-day operations. The Approved Models system ensures your organization maintains control over AI model access while enabling teams to work effectively with pre-approved, cost-effective model choices. --- ## Guardrails Source: https://docs.requesty.ai/features/guardrails.md > Enterprise-grade security filters that automatically detect and block sensitive information in AI requests and responses Guardrails provide organization-level security filters that automatically detect and mask sensitive information in AI requests and responses, acting as a protective layer to prevent data leaks and maintain compliance. ## Overview Guardrails offer enterprise-grade data protection that automatically prevents sensitive information from being exposed through AI interactions. This bidirectional security system scans both incoming requests and outgoing responses to ensure compliance and data safety. 
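
As a rough picture of what scanning in both directions means, the sketch below masks a prompt before it reaches a model and masks the model's reply before it is returned. It is not Requesty's detection logic, which this page does not specify: the two regular expressions, the `[LABEL]` placeholder format, and the names `mask`, `guarded_call`, and `fake_model` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; a production guardrail would cover far more cases.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(text):
    """Replace every match of a known pattern with a placeholder like [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_call(prompt, model):
    """Mask the prompt, call the model, then mask the model's response."""
    return mask(model(mask(prompt)))


def fake_model(prompt):
    """Stand-in for a model: echoes the prompt and leaks an SSN-like string."""
    return f"You said: {prompt} (ref 123-45-6789)"


print(guarded_call("Contact alice@company.com", fake_model))
```

Here the email address never reaches `fake_model`, and the SSN-like string the stand-in model produces is masked on the way out.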
### What Guardrails Protect

- Automatically detect and mask sensitive data before it reaches AI models
- Meet GDPR, PCI DSS, SOC 2, and other regulatory requirements
- Prevent accidental exposure of credentials, financial data, and personal information
- Apply consistent security policies across all API keys and models

## Available Guardrail Types

### Security Categories

### Personal Data

**PII (Personally Identifiable Information)**

- Social Security Numbers
- Email addresses and phone numbers
- Names and personal identifiers
- GDPR compliance protection

### Credentials & Secrets

**Secret Keys Detection**

- API keys and tokens
- Database credentials
- Authentication secrets
- Service account keys

### Financial Information

**PCI (Payment Card Information)**

- Credit card numbers
- Card verification codes
- Cardholder data

**Banking Information**

- Account numbers
- Routing numbers
- Bank identifiers

**Financial Data**

- Investment details
- Financial statements
- Trading information

## How Guardrails Work

### Security Flow Process

```mermaid
graph TD
    A[User API Request] --> B[Input Scanning]
    B --> C{Sensitive Data<br/>Detected?}
    C -->|Yes| D[Mask Sensitive Data]
    C -->|No| E[Forward to AI Model]
    D --> E
    E --> F[AI Response]
    F --> G[Output Scanning]
    G --> H{Response Contains<br/>Sensitive Data?}
    H -->|Yes| I[Mask Response Data]
    H -->|No| J[Return Clean Response]
    I --> J
```

### Processing Steps

1. **Request Received**: User makes API request through any organization API key
2. **Input Scanning**: Guardrails scan request content for sensitive data patterns
3. **Data Masking**: If sensitive data detected, it's automatically masked before processing
4. **Model Processing**: Requests with masked data proceed to AI model for processing
5. **Output Scanning**: Guardrails scan AI response for any sensitive information
6. **Response Masking**: Sensitive data in responses is masked before returning to user

## Admin Management

### Guardrail Configuration

**Access Control:**

- Navigate to Admin Panel → Guardrails tab
- Real-time toggle switches for each guardrail type
- Immediate organization-wide application
- Success/error feedback for configuration changes

**Available Controls:**

**Toggle to Enable/Disable:**

- Personally Identifiable Information detection
- Email addresses, phone numbers, SSNs
- GDPR compliance scanning
- Personal name and identifier blocking

**Toggle to Enable/Disable:**

- API key and token detection
- Database credential scanning
- Service account key protection
- Authentication secret blocking

**PCI Compliance:**

- Credit card number detection
- Payment card verification codes
- Cardholder data protection

**Banking Information:**

- Account number scanning
- Routing number detection
- Bank identifier protection

**General Financial:**

- Investment data blocking
- Financial statement protection
- Trading information security

### Configuration Management

**Real-Time Updates:**

- Changes apply organization-wide immediately
- No restart or downtime required
- Instant activation/deactivation of security rules
- Visual confirmation of configuration changes

## Protection Scope

### Comprehensive Coverage

**All API Keys:**

- Guardrails apply across every API key in the organization
- No exceptions or bypass mechanisms
- Consistent security regardless of key
configuration

**All Models:**

- Works with any approved model (OpenAI, Anthropic, Azure, etc.)
- Provider-agnostic security implementation
- Universal protection across model types

**All Endpoints:**

- Chat completion requests
- Text generation endpoints
- Streaming responses
- Any AI interaction endpoint

**Bidirectional Security:**

- Incoming request scanning
- Outgoing response filtering
- Complete data flow protection

## Compliance & Use Cases

### Regulatory Compliance

- PII detection ensures European data protection regulation compliance
- Payment card data protection meets financial industry standards
- Security controls support SOC 2 Type II requirements

### Enterprise Protection Scenarios

**Data Leak Prevention:**

- Automatic detection and masking without manual review
- Prevent accidental credential exposure in AI prompts
- Mask financial data to protect it from model training
- Protect customer personal information in support interactions

**Risk Management:**

- Organization-wide policy enforcement
- Consistent security across all teams and projects
- Audit trail for compliance reporting
- Automatic threat detection and response

**Operational Security:**

- Real-time protection during AI interactions
- No impact on legitimate use cases
- Transparent security that doesn't disrupt workflows
- Scalable protection for growing organizations

## Integration with Enterprise Features

### Works with Other Systems

**User Management Integration:**

- Guardrails apply to all organization users
- Individual user activity protected automatically
- No per-user configuration required

**Group-Based Protection:**

- All group members receive same security protection
- Group API keys inherit guardrail settings
- Consistent security across team structures

**Approved Models Compatibility:**

- Guardrails work with any approved model
- Security maintained regardless of model selection
- Protection spans entire approved model catalog

### API Key Policy Integration

```mermaid
graph LR
    A[Guardrails] --> B[API Key]
    B --> C[Approved Models]
    C --> D[User Groups]
    D --> E[Protected Output]
    F[Security Scan] --> A
    G[Compliance Rules] --> A
```

**Security Layering:**

- Guardrails provide base-level organization security
- API key policies add feature-specific controls
- User/group permissions manage access levels
- Combined system ensures comprehensive protection

## Best Practices

### Configuration Strategy

- Enable all relevant guardrails from the beginning to establish a strong security baseline
- Review blocked requests to understand common security issues and adjust policies
- Match guardrail configuration to your industry's specific compliance requirements
- Periodically review and update guardrail settings as business needs evolve

### Implementation Guidelines

**Rollout Strategy:**

1. Enable guardrails in testing environment first
2. Monitor for false positives with sample data
3. Adjust detection sensitivity if needed
4. Deploy to production with monitoring
5. Train teams on security error handling

**Ongoing Management:**

- Regular compliance audits
- Security incident response procedures
- Team training on data handling best practices
- Integration with existing security workflows

## Error Handling & User Experience

### When Guardrails Trigger

**Current Implementation (Data Masking):**

- Sensitive data automatically replaced with masked placeholders
- Seamless processing with protected information
- No workflow interruption for users
- Audit logging for security team review

**Future Features:**

- **Request Blocking**: Option to completely block requests containing sensitive data
- **Reverse Mapping**: Ability to unmask data when appropriate for authorized users
- **Advanced Filtering**: More granular control over masking vs blocking behavior
- **Custom Masking Patterns**: Organization-specific masking rules and formats

> **Warning:** Guardrails are designed to err on the side of caution.
Some legitimate data may be masked if it contains patterns similar to sensitive information. Organizations should review masking patterns to ensure optimal balance between security and functionality. > **Note:** Guardrails provide the foundation for enterprise AI security, automatically protecting your organization's most sensitive data without requiring manual oversight or complex configuration. The Guardrails system ensures your organization can leverage AI capabilities while maintaining the highest standards of data protection and regulatory compliance. --- ## RBAC (Role-Based Access Control) Source: https://docs.requesty.ai/features/rbac.md > Control user access and visibility across observability, API keys, analytics, and all platform features based on organizational roles Role-Based Access Control (RBAC) provides comprehensive access management across the entire Requesty platform, ensuring users only see and access data appropriate to their organizational role and responsibilities. ## Overview RBAC forms the foundation of enterprise security by controlling what users can see and do across observability, API keys, analytics, logs, and all platform features. This ensures data isolation, security compliance, and appropriate access levels for different organizational roles. 
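
The data-isolation idea behind RBAC can be sketched as a filter applied before any log data is shown. The example follows the Standard User and Administrator roles this page describes, but the `LogEntry` type, the role strings, and the `visible_logs` function are hypothetical and say nothing about how Requesty enforces access internally.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    owner: str       # email of the user whose API key made the request
    model: str
    cost_usd: float


def visible_logs(logs, viewer_email, role):
    """Return the log entries a viewer may see.

    Administrators see every entry in the organization; any other role is
    treated as a standard user and sees only its own entries. The role
    strings are illustrative assumptions.
    """
    if role == "administrator":
        return list(logs)
    return [entry for entry in logs if entry.owner == viewer_email]


logs = [
    LogEntry("alice@company.com", "openai/gpt-4o", 0.02),
    LogEntry("bob@company.com", "openai/gpt-4o", 0.05),
]
print(len(visible_logs(logs, "alice@company.com", "standard")))       # 1
print(len(visible_logs(logs, "alice@company.com", "administrator")))  # 2
```

Defaulting unknown roles to the narrower view is a deliberate choice in this sketch, in line with the least-privilege principle.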
### What RBAC Controls

- Control which logs, analytics, and observability data users can access
- Manage who can create, modify, and view different API keys
- Control access to admin panels, settings, and enterprise features
- Ensure users only see their own data unless granted broader permissions

## Core RBAC Principles

### Access Control Scope

**Platform-Wide Coverage:**

- Observability dashboards and metrics
- API key creation and management
- Log viewing and analytics
- User and group management
- Billing and usage data
- Administrative functions

**Data Isolation:**

- Individual users see only their own data by default
- Admins have organization-wide visibility
- Role-based expansion of access permissions
- Secure multi-tenant data separation

## Current Role Types

### Standard User Role

**Default Access Level:**

- **Personal Data Only**: Users see logs, analytics, and metrics for their own API usage
- **Own API Keys**: Can create, modify, and view their personal API keys
- **Limited Observability**: Access to personal performance metrics and usage data
- **Basic Settings**: Manage personal account settings and preferences

**What Standard Users See:**

- Personal API request logs
- Individual usage analytics
- Own spending and limit information
- Personal session data and context

### Administrator Role

**Organization-Wide Access:**

- **All User Data**: Complete visibility into organization logs, analytics, and metrics
- **Full API Key Management**: Create, modify, and view all organization API keys
- **Complete Observability**: Access to organization-wide performance and usage data
- **Administrative Functions**: User management, group configuration, enterprise features

**What Administrators See:**

- All organization API request logs
- Organization-wide analytics and trends
- All user spending and usage patterns
- Complete audit trails and system metrics
- Enterprise feature configuration panels

## RBAC Implementation Across Features

### Observability & Analytics

### Standard Users

**Personal Dashboard:**

- Individual API usage metrics
- Personal request/response logs
- Own performance analytics
- Personal cost tracking
- Individual error rates and patterns

### Administrators

**Organization Dashboard:**

- Organization-wide usage metrics
- All user logs and analytics
- Complete performance overview
- Organization cost analysis
- System-wide error tracking and trends

### API Key Management

```mermaid
graph TD
    A[User Login] --> B{Role Check}
    B -->|Standard User| C[Personal API Keys Only]
    B -->|Administrator| D[All Organization API Keys]
    C --> E[Create Personal Keys]
    C --> F[View Own Usage]
    C --> G[Manage Own Limits]
    D --> H[Create Any API Key]
    D --> I[View All Usage]
    D --> J[Manage All Keys]
    D --> K[Set Organization Policies]
```

### Data Access Patterns

**Standard User Data Flow:**

1. User authenticates with platform
2. RBAC filters show only personal data
3. API keys display user's own keys only
4. Analytics show individual usage patterns
5. Logs contain only user's API requests

**Administrator Data Flow:**

1. Admin authenticates with elevated permissions
2. RBAC grants organization-wide visibility
3. All API keys and users visible
4. Complete analytics and metrics access
5. Full audit trail and system logs available

## Security & Compliance Benefits

### Data Protection

**User Privacy:**

- Automatic data isolation between users
- Personal information protected from other users
- Individual usage patterns kept private
- Secure separation of user contexts

**Organization Security:**

- Administrative oversight with complete visibility
- Audit trails for compliance requirements
- Centralized security policy enforcement
- Role-appropriate access controls

### Compliance Advantages

**Regulatory Compliance:**

- Clear data access boundaries for audits
- Role-based data handling procedures
- Documented access control policies
- Compliance with privacy regulations

**Enterprise Security:**

- Principle of least privilege implementation
- Regular access review capabilities
- Secure multi-tenant architecture
- SOC 2 and enterprise compliance support

**Operational Oversight:**

- Complete audit trails for all access
- Role-based activity monitoring
- Security incident detection and response
- Compliance reporting capabilities

## Integration with Enterprise Features

### Works with Other Systems

**User Management Integration:**

- User roles determine platform access levels
- Individual users automatically isolated
- Admin users get organization-wide visibility
- Role assignments control feature access

**Group-Based Enhancement:**

- Groups can have shared visibility permissions
- Group admins may see group-specific data
- Flexible role assignment within groups
- Enhanced collaboration with controlled access

**API Key Policy Integration:**

- RBAC controls who can create and modify API keys
- Role-based API key sharing and management
- Permission levels for different key types
- Administrative oversight of all organization keys

### Enterprise Feature Access

**Feature Visibility Matrix:**

| Feature             | Standard User | Administrator   |
| ------------------- | ------------- | --------------- |
| Personal Analytics  | ✅ Own Data   | ✅ All Data     |
| API Key Creation    | ✅ Personal   | ✅ Organization |
| User Management     | ❌            | ✅              |
| Group Configuration | ❌            | ✅              |
| Approved Models     | ❌            | ✅              |
| Guardrails Config   | ❌            | ✅              |
| Billing Overview    | ✅ Personal   | ✅ Organization |
| System Settings     | ❌            | ✅              |

## Future Role Expansion

### Custom Roles (Coming Soon)

**Planned Role Types:**

- **Group Administrators**: Manage specific groups with limited admin access
- **Read-Only Analysts**: View organization data without modification permissions
- **API Key Managers**: Specialized role for API key creation and management
- **Billing Administrators**: Financial oversight without technical admin access

**Custom Permission Sets:**

- Granular permission assignment
- Mix-and-match capability access
- Department-specific role creation
- Project-based access controls

### Advanced RBAC Features

**Enhanced Capabilities:**

- Time-based role assignments
- Conditional access based on usage patterns
- Integration with external identity providers
- Advanced audit and compliance reporting

## Best Practices

### Role Assignment Strategy

- Begin with standard user roles and promote to admin only when necessary
- Periodically review role assignments and adjust based on organizational changes
- Monitor admin access patterns and maintain audit trails for compliance
- Maintain clear documentation of who has admin access and why

### Security Implementation

**Access Management:**

- Limit admin roles to essential personnel only
- Regular access reviews and role updates
- Clear escalation procedures for access requests
- Integration with existing identity management systems

**Monitoring & Compliance:**

- Log all administrative actions
- Monitor for unusual access patterns
- Regular compliance assessments
- Incident response procedures for access violations

## User Experience

### For Standard Users

**Simplified Interface:**

- Clean, focused view of personal data
- No overwhelming organization-wide information
- Intuitive access to personal features
- Clear visibility into own usage and
costs ### For Administrators **Comprehensive Control:** - Complete organization visibility - Administrative tools and configuration panels - User management and oversight capabilities - Enterprise feature configuration access > **Note:** RBAC ensures that every user has the right level of access for their role while maintaining security and compliance across your organization's AI infrastructure. > **Warning:** Administrator roles have significant access to organization data and settings. Carefully manage admin role assignments and regularly review access permissions to maintain security. The RBAC system provides the security foundation that enables safe, compliant, and efficient AI operations across your entire organization while ensuring appropriate data visibility and access control for all users. --- ## Claude Code Source: https://docs.requesty.ai/integrations/claude-code.md > Set the ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables to route Claude Code through Requesty and use 300+ models Claude Code is Anthropic's powerful AI coding assistant that works directly in your terminal and IDE. You connect it to Requesty with two environment variables, `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`, set either in `~/.claude/settings.json` or in your shell. Using the Requesty integration, you can: - Use 300+ models while coding, giving you flexibility to choose the best model for each task. - Track and manage your spend in a single location - Keep a record of your conversations # Configuration ## 1. (Recommended) Set ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN in settings.json You can configure Claude Code using the settings file. You will only have to edit it once. 
Create or edit your settings file: `~/.claude/settings.json` And set the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` values inside the `env` block: ```json { "env": { "ANTHROPIC_BASE_URL": "https://router.requesty.ai", "ANTHROPIC_AUTH_TOKEN": "your_requesty_api_key", "ANTHROPIC_MODEL": "anthropic/claude-fable-5" } } ``` ## 2. Quick setup using environment variables You can also integrate Requesty with Claude Code by exporting the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables in your shell: 1. **Get Your API Key** Create an API key on the [API Keys Page](https://app.requesty.ai/api-keys). 2. **Set Environment Variables** ```bash export ANTHROPIC_BASE_URL="https://router.requesty.ai" export ANTHROPIC_AUTH_TOKEN="your_requesty_api_key" export ANTHROPIC_MODEL="anthropic/claude-fable-5" ``` > **Info:** We recommend using the `ANTHROPIC_MODEL` environment variable, and not the `/model` directive as Claude might not accept it sometimes. 3. **Run Claude** Run `claude` in your terminal. > **Info:** With this setup, Claude Code will route all requests through Requesty, giving you access to models from OpenAI, Anthropic, Google, Mistral, and many more providers. ### Model Selection You can choose any model from the [Model Library](https://app.requesty.ai/model-list) or any policy configured for your organization. **Standard model IDs** follow the format `provider/model-name`: - `anthropic/claude-fable-5` - `openai/gpt-4o` - `google/gemini-2.0-flash-exp` - `mistral/mistral-large-2411` **Policies** follow the format `policy/policy-name`: - `policy/reliable-sonnet-4-5` ### Claude Code environment variables reference | Variable | Value | Purpose | |----------|-------|---------| | `ANTHROPIC_BASE_URL` | `https://router.requesty.ai` | Points Claude Code at the Requesty gateway instead of Anthropic's API | | `ANTHROPIC_AUTH_TOKEN` | Your Requesty API key | Authenticates with Requesty, used as a Bearer token | | `ANTHROPIC_MODEL` | e.g. 
`anthropic/claude-fable-5` | Default model, any model ID or `policy/` name from your organization |
| `ANTHROPIC_SMALL_FAST_MODEL` | e.g. `anthropic/claude-haiku-4-5` | Model used for background and lightweight tasks |
| `ANTHROPIC_CUSTOM_HEADERS` | `X-Requesty-...: value` | Optional analytics headers, set automatically by the analytics wrapper |

> **Info:** For EU data residency, set `ANTHROPIC_BASE_URL` to `https://router.eu.requesty.ai` instead.

### Enable 1 Million Context Window

Some Claude models support an extended 1 million token context window, including:

- `claude-opus-4-6`
- `claude-sonnet-4-5`

To enable this from Claude Code, append `[1m]` to the model name when setting the model. For example, to use the 1 million context window with Opus 4.6, set the model to:

```
anthropic/claude-opus-4-6[1m]
```

### Command Line Configuration

You can also configure Claude Code using the command line:

```bash
# Set the model globally
claude config set -g model "anthropic/claude-fable-5"

# Set the model for current project only
claude config set model "openai/gpt-4o"

# Set environment variables globally
claude config set -g env.ANTHROPIC_BASE_URL "https://router.requesty.ai"
claude config set -g env.ANTHROPIC_AUTH_TOKEN "your_requesty_api_key"
```

## Benefits of Using Requesty with Claude Code

- Switch between models from different providers without changing your setup
- Monitor spending and set limits across all your AI interactions
- Automatic fallbacks ensure your coding sessions never get interrupted

## Troubleshooting

### Model Not Found

If you get a "model not found" error, make sure:

- Your API key is valid and has access to the model (check the approved models in your organization)
- The model ID format is correct (`provider/model-name`)
- The model is available in the [Model Library](https://app.requesty.ai/model-list)

### Connection Issues

If Claude Code can't connect:

- Verify your `ANTHROPIC_BASE_URL` is set to `https://router.requesty.ai`
- Check your
`ANTHROPIC_AUTH_TOKEN` is correct - Ensure you have internet connectivity --- ## Cline Source: https://docs.requesty.ai/integrations/cline.md > Requesty routing for Cline Many of our users use the Requesty router for the Cline coding agent. Quickly change model provider to continue coding with your preferred model. 1. Select Requesty from the API Provider dropdown 2. Add your API key, which you can create on the [API Keys Page](https://app.requesty.ai/api-keys) in the platform. 3. Paste your Model ID, which you can find in the [Model Library](https://app.requesty.ai/model-list) > **Info:** We created dedicated models for Cline. If you want to use those, the format is slightly different than the other models. You can find more information on [Dedicated Models](https://requesty.mintlify.app/features/dedicated-models) ![](/images/Screenshot2025-02-13at13.34.09.png) --- ## Roo Code Source: https://docs.requesty.ai/integrations/roo-code.md > Requesty routing for Roo Code Many of our users use the Requesty router for the Roo Code coding agent. Quickly change model provider to continue coding with your preferred model. 1. Select Requesty from the API Provider dropdown 2. Add your API key, which you can create on the [API Keys Page](https://app.requesty.ai/api-keys) in the platform. 3. Paste your Model ID, which you can find in the [Model Library](https://app.requesty.ai/model-list) > **Info:** We created dedicated models for Roo Code. If you want to use those, the format is slightly different than the other models. You can find more information on [Dedicated Models](https://requesty.mintlify.app/features/dedicated-models) --- ## VS Code Extension Source: https://docs.requesty.ai/integrations/VS-code-extension.md > Switch between LLMs instantly 1. Get Your API Key at [API Keys Page](https://app.requesty.ai/api-keys) 2. VS Code Setup: Install the Requesty extension 3. Click the Requesty icon in sidebar 4. Paste the same API key when prompted 5. 
Create an alias (e.g., "coding")
6. Configure your tools: in Cline/Roo, set `model_id` to `alias/coding` (the word `alias`, then `/`, then the alias name you chose). This keeps model switching working across tools.

> **Tip:** Star (⭐️) your favorite models for quick access

---

## OpenClaw

Source: https://docs.requesty.ai/integrations/openclaw.md

> Connect OpenClaw to 300+ models through Requesty

[OpenClaw](https://openclaw.ai) (formerly Moltbot, formerly Clawdbot) is an open-source personal AI assistant with 180k+ stars on GitHub. It runs on your own devices and connects to messaging channels you already use, such as WhatsApp, Telegram, Slack, Discord, Signal, iMessage, and more.

Using the Requesty integration, you can:

- Access **300+ models** from OpenAI, Anthropic, Google, Mistral, and many more providers
- Use both the **Anthropic Messages API** and **OpenAI Chat Completions API** formats
- Track and manage your spend in a single location
- Set up [fallback policies](/features/fallback-policies) so your assistant never goes down

## How It Works

```mermaid
sequenceDiagram
    participant OC as OpenClaw Gateway
    participant RQ as Requesty Router
    participant AI as Model Providers
    OC->>RQ: API Request (Messages or Chat Completions)
    rect rgb(30, 30, 46)
        Note right of RQ: Routing Engine
        RQ->>RQ: Authenticate → Route → Transform
    end
    RQ->>AI: Forward to Best Provider
    AI-->>RQ: Response
    RQ-->>OC: Normalized Response
```

## Prerequisites

- OpenClaw installed and running (`npm install -g openclaw`)
- A Requesty API key from the [API Keys Page](https://app.requesty.ai/api-keys)

## Configuration

OpenClaw supports two API formats for connecting to Requesty. Choose the one that fits your use case.

### Anthropic Messages API (`anthropic-messages`)

Use this format to access Claude models through Requesty's Anthropic-compatible endpoint. This is the recommended approach if you primarily use Claude models.

**Get your Requesty API key**

Create an API key on the [API Keys Page](https://app.requesty.ai/api-keys).

**Edit your OpenClaw config**

Open `~/.openclaw/openclaw.json` and add the Requesty provider:

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "requesty": {
        "baseUrl": "https://router.requesty.ai",
        "apiKey": "YOUR_REQUESTY_API_KEY",
        "api": "anthropic-messages",
        "models": [
          {
            "id": "anthropic/claude-sonnet-4-5",
            "name": "Claude Sonnet 4.5 (via Requesty)"
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "requesty/anthropic/claude-sonnet-4-5"
      },
      "models": {
        "requesty/anthropic/claude-sonnet-4-5": {}
      }
    }
  }
}
```

**Apply and start**

```bash
openclaw gateway config.apply --file ~/.openclaw/openclaw.json
```

### OpenAI Chat Completions API (`openai-completions`)

Use this format to access any model in the Requesty catalog (including OpenAI, Google, Mistral, and more) through the OpenAI-compatible `/v1/chat/completions` endpoint.

**Get your Requesty API key**

Create an API key on the [API Keys Page](https://app.requesty.ai/api-keys).
**Edit your OpenClaw config** Open `~/.openclaw/openclaw.json` and add the Requesty provider: ```json { "models": { "mode": "merge", "providers": { "requesty": { "baseUrl": "https://router.requesty.ai/v1", "apiKey": "YOUR_REQUESTY_API_KEY", "api": "openai-completions", "models": [ { "id": "openai/gpt-4o", "name": "GPT-4o (via Requesty)" } ] } } }, "agents": { "defaults": { "model": { "primary": "requesty/openai/gpt-4o" }, "models": { "requesty/openai/gpt-4o": {} } } } } ``` **Apply and start** ```bash openclaw gateway config.apply --file ~/.openclaw/openclaw.json ``` > **Info:** The base URL differs between the two API formats: - **Anthropic Messages**: `https://router.requesty.ai` (no `/v1` suffix) - **OpenAI Chat Completions**: `https://router.requesty.ai/v1` (with `/v1` suffix) ## Onboarding Wizard If you prefer a guided setup, use the OpenClaw onboarding wizard and select **Custom Provider**: ```bash openclaw onboard ``` When prompted: 1. Choose **OpenAI-compatible** or **Anthropic-compatible** depending on the API format you want 2. Enter the base URL (`https://router.requesty.ai/v1` for OpenAI, `https://router.requesty.ai` for Anthropic) 3. Enter your Requesty API key 4. Provide a model ID (e.g. 
`openai/gpt-4o` or `anthropic/claude-sonnet-4-5`) ## Adding Multiple Models You can configure multiple models from different providers — all through a single Requesty API key: ```json { "models": { "mode": "merge", "providers": { "requesty": { "baseUrl": "https://router.requesty.ai", "apiKey": "YOUR_REQUESTY_API_KEY", "api": "anthropic-messages", "models": [ { "id": "anthropic/claude-sonnet-4-5", "name": "Claude Sonnet 4.5" }, { "id": "anthropic/claude-opus-4-6", "name": "Claude Opus 4.6" }, { "id": "bedrock/claude-sonnet-4-5", "name": "Claude Sonnet 4.5 (Bedrock)" } ] }, "requesty-openai": { "baseUrl": "https://router.requesty.ai/v1", "apiKey": "YOUR_REQUESTY_API_KEY", "api": "openai-completions", "models": [ { "id": "openai/gpt-4o", "name": "GPT-4o" }, { "id": "google/gemini-2.5-pro", "name": "Gemini 2.5 Pro" } ] } } }, "agents": { "defaults": { "model": { "primary": "requesty/anthropic/claude-sonnet-4-5", "fallbacks": [ "requesty-openai/openai/gpt-4o" ] }, "models": { "requesty/anthropic/claude-sonnet-4-5": { "alias": "sonnet" }, "requesty/anthropic/claude-opus-4-6": { "alias": "opus" }, "requesty/bedrock/claude-sonnet-4-5": { "alias": "sonnet-bedrock" }, "requesty-openai/openai/gpt-4o": { "alias": "gpt4o" }, "requesty-openai/google/gemini-2.5-pro": { "alias": "gemini" } } } } } ``` Then switch models in chat with: ``` /model sonnet /model opus /model gpt4o /model gemini ``` ## Model Selection You can use any model from the [Model Library](https://app.requesty.ai/model-list). Model IDs follow the `provider/model-name` format: | Provider | Example Model ID | |---|---| | Anthropic | `anthropic/claude-sonnet-4-5` | | OpenAI | `openai/gpt-4o` | | Google | `google/gemini-2.5-pro` | | AWS Bedrock | `bedrock/claude-opus-4-6` | | Mistral | `mistral/mistral-large-latest` | You can also use [Fallback Policies](/features/fallback-policies) by setting the model to `policy/your-policy-name`. 
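
In the configs above, each entry under `agents.defaults` appears to be the provider key from `models.providers` joined to the Requesty model ID with a slash. A minimal sketch of that naming convention, assuming it holds in general; `qualify` and `split_qualified` are hypothetical helper names, not part of OpenClaw or Requesty:

```python
def qualify(provider_key: str, model_id: str) -> str:
    """Join an OpenClaw provider key and a Requesty model ID."""
    return f"{provider_key}/{model_id}"


def split_qualified(name: str) -> tuple[str, str]:
    """Split on the first slash only, because model IDs contain slashes too."""
    provider_key, _, model_id = name.partition("/")
    if not provider_key or not model_id:
        raise ValueError(f"not a qualified model name: {name!r}")
    return provider_key, model_id


if __name__ == "__main__":
    print(qualify("requesty", "anthropic/claude-sonnet-4-5"))
    print(split_qualified("requesty-openai/google/gemini-2.5-pro"))
```

Splitting on the first slash keeps the `provider/model-name` part intact, so the second value can be compared directly against the Model Library.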
## EU Region

For EU data residency, use the EU router endpoint:

- **Anthropic Messages**: `https://router.eu.requesty.ai`
- **OpenAI Chat Completions**: `https://router.eu.requesty.ai/v1`

## Benefits of Using Requesty with OpenClaw

- Switch between models from different providers without changing your setup
- Monitor spending and set limits across all your AI interactions
- Automatic fallbacks ensure your assistant never goes down
- Intelligent routing selects the best provider based on availability and latency

## Troubleshooting

### "model not allowed"

The model must be in both `models.providers[].models[]` **and** `agents.defaults.models`. Make sure the allowlist key uses the fully-qualified name (`requesty/anthropic/claude-sonnet-4-5`), not just the model ID.

### Model doesn't show in `/models`

Verify the model is listed in the `models` array of your provider definition. It's common to add the allowlist entry but forget the provider model definition (or vice versa).

### Connection errors

Test your Requesty API key directly with curl:

```bash
curl https://router.requesty.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "hello"}]
  }'
```

If this works but OpenClaw doesn't, the issue is in your OpenClaw config. Double-check `baseUrl` and `apiKey`.

### Wrong model being called

The `id` field in your model definition must match exactly what Requesty expects. Check the [Model Library](https://app.requesty.ai/model-list) for the correct model ID.
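
The "model not allowed" and missing-from-`/models` cases both come down to the provider model list and the `agents.defaults.models` allowlist drifting apart. The check below is a sketch under the assumption that your config has the same shape as the examples on this page; `find_mismatches` is an illustrative name, not an OpenClaw API:

```python
def find_mismatches(config: dict) -> tuple[set[str], set[str]]:
    """Return (defined but not allowlisted, allowlisted but not defined)."""
    providers = config.get("models", {}).get("providers", {})
    # Fully-qualified names: "<provider key>/<model id>"
    defined = {
        f"{key}/{model['id']}"
        for key, provider in providers.items()
        for model in provider.get("models", [])
    }
    allowlisted = set(
        config.get("agents", {}).get("defaults", {}).get("models", {})
    )
    return defined - allowlisted, allowlisted - defined


if __name__ == "__main__":
    sample = {
        "models": {
            "providers": {
                "requesty": {
                    "models": [
                        {"id": "anthropic/claude-sonnet-4-5"},
                        {"id": "anthropic/claude-opus-4-6"},
                    ]
                }
            }
        },
        "agents": {
            "defaults": {
                "models": {
                    "requesty/anthropic/claude-sonnet-4-5": {},
                    "requesty/openai/gpt-4o": {},
                }
            }
        },
    }
    not_allowlisted, not_defined = find_mismatches(sample)
    print("Defined but not allowlisted:", sorted(not_allowlisted))
    print("Allowlisted but not defined:", sorted(not_defined))
```

To run it against a real file, load `~/.openclaw/openclaw.json` with `json.load` and pass the result in; this assumes the file is plain JSON without comments.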
## Resources - [OpenClaw Documentation](https://docs.openclaw.ai) - [OpenClaw GitHub](https://github.com/openclaw/openclaw) - [Requesty API Keys](https://app.requesty.ai/api-keys) - [Requesty Model Library](https://app.requesty.ai/model-list) --- ## Anthropic Agent SDKs Source: https://docs.requesty.ai/integrations/anthropic-agent-sdks.md > Use the Claude Agent SDK (TypeScript and Python) with 300+ models by setting ANTHROPIC_BASE_URL to the Requesty gateway The Claude Agent SDK from Anthropic (available for TypeScript as `@anthropic-ai/claude-agent-sdk` and for Python as `claude-agent-sdk`) lets you build AI agents with tool calling, hooks, and MCP server support. This page shows how to point the SDK at Requesty so your agents can use 300+ models from multiple providers. For the SDK's own API reference (hooks, tool definitions, MCP configuration, TypeScript interfaces), see [Anthropic's official Agent SDK documentation](https://docs.anthropic.com/en/api/agent-sdk/overview). Using the Requesty integration, you can: - Use 300+ models while building agents, giving you flexibility to choose the best model for each task - Track and manage your spend in a single location - Keep a record of your agent interactions # Configuration ## Set ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN The Claude Agent SDK reads the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables, so the easiest way to integrate Requesty is: 1. **Get Your API Key** Create an API key on the [API Keys Page](https://app.requesty.ai/api-keys). 2. **Set Environment Variables** ```bash export ANTHROPIC_BASE_URL="https://router.requesty.ai" export ANTHROPIC_AUTH_TOKEN="your_requesty_api_key" export ANTHROPIC_MODEL="anthropic/claude-fable-5" ``` > **Info:** With this setup, the Agent SDKs will route all requests through Requesty, giving you access to models from OpenAI, Anthropic, Google, Mistral, and many more providers. 
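
Because everything hinges on these variables, it can help to verify them before starting an agent. The following is a minimal preflight sketch, not part of the Agent SDK: `check_requesty_env` is a hypothetical helper, and the accepted hosts are simply the global and EU router URLs mentioned in this documentation.

```python
import os

# Router hosts named in these docs (global and EU), without a /v1 suffix.
EXPECTED_HOSTS = (
    "https://router.requesty.ai",
    "https://router.eu.requesty.ai",
)


def check_requesty_env(env) -> list[str]:
    """Return a list of human-readable problems; empty means the setup looks fine."""
    problems = []
    base_url = env.get("ANTHROPIC_BASE_URL", "")
    if not base_url:
        problems.append("ANTHROPIC_BASE_URL is not set")
    elif base_url.rstrip("/") not in EXPECTED_HOSTS:
        problems.append(f"ANTHROPIC_BASE_URL is {base_url!r}, expected one of {EXPECTED_HOSTS}")
    if not env.get("ANTHROPIC_AUTH_TOKEN"):
        problems.append("ANTHROPIC_AUTH_TOKEN is not set")
    model = env.get("ANTHROPIC_MODEL", "")
    if model and "/" not in model:
        problems.append(f"ANTHROPIC_MODEL {model!r} has no provider/ or policy/ prefix")
    return problems


if __name__ == "__main__":
    for problem in check_requesty_env(os.environ):
        print("warning:", problem)
```

The function takes any mapping, so it can be called with `os.environ` in production and with a plain dict in tests.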
### Model Selection

You can choose any model from the [Model Library](https://app.requesty.ai/model-list) or any policy configured for your organization.

**Standard model IDs** follow the format `provider/model-name`:

- `anthropic/claude-fable-5`
- `openai/gpt-5`
- `google/gemini-2.5-flash`
- `mistral/mistral-large-latest`

**Policies** follow the format `policy/policy-name`:

- `policy/reliable-sonnet-4-5`

## TypeScript

The TypeScript Agent SDK (`@anthropic-ai/claude-agent-sdk`) automatically respects the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables.

```typescript
// Set environment variables before importing
// ANTHROPIC_BASE_URL=https://router.requesty.ai
// ANTHROPIC_AUTH_TOKEN=your_requesty_api_key
import { query } from '@anthropic-ai/claude-agent-sdk';

for await (const message of query({
  prompt: 'Your prompt here',
  options: {
    systemPrompt: 'You are a helpful assistant.',
  },
})) {
  // Handle messages
  if (message.type === 'assistant') {
    console.log(message.message.content);
  }
}
```

## Python

The Python Agent SDK (`claude-agent-sdk`) automatically respects the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables.

```python
"""Minimal agent using Claude Agent SDK."""
import asyncio
import os

from claude_agent_sdk import (
    ClaudeSDKClient,
    ClaudeAgentOptions,
    tool,
    create_sdk_mcp_server,
    AssistantMessage,
    TextBlock,
)

os.environ["ANTHROPIC_BASE_URL"] = "https://router.requesty.ai"
# ANTHROPIC_AUTH_TOKEN should be set as an environment variable


async def main():
    @tool("greet", "Greet a user", {"name": str})
    async def greet_user(args):
        return {
            "content": [
                {"type": "text", "text": f"Hello, {args['name']}!"}
            ]
        }

    server = create_sdk_mcp_server(
        name="my-tools",
        version="1.0.0",
        tools=[greet_user]
    )

    options = ClaudeAgentOptions(
        mcp_servers={"tools": server},
        allowed_tools=["mcp__tools__greet"]
    )

    async with ClaudeSDKClient(options=options) as client:
        await client.query("Greet World")

        last_assistant_message = None
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                last_assistant_message = msg

        if last_assistant_message:
            # Print only text blocks; other block types have no .text attribute
            for block in last_assistant_message.content:
                if isinstance(block, TextBlock):
                    print(block.text)


if __name__ == "__main__":
    asyncio.run(main())
```

## Benefits of Using Requesty with Agent SDKs

- Switch between models from different providers without changing your setup
- Monitor spending and set limits across all your AI interactions
- Automatic fallbacks ensure your agents never get interrupted
- Use the same SDK interface with models from multiple providers

## Troubleshooting

### Model Not Found

If you get a "model not found" error, make sure:

- Your API key is valid and has access to the model (check the approved models in your organization)
- The model ID format is correct (`provider/model-name`)
- The model is available in the [Model Library](https://app.requesty.ai/model-list)

### Connection Issues

If the Agent SDK can't connect:

- Verify your `ANTHROPIC_BASE_URL` is set to `https://router.requesty.ai`
- Check your `ANTHROPIC_AUTH_TOKEN` is correct
- Ensure you have internet connectivity
- For TypeScript, make sure environment variables are set before importing the SDK

---

## LibreChat

Source:
https://docs.requesty.ai/integrations/librechat.md

> Requesty routing for LibreChat

Requesty provides seamless integration with LibreChat, allowing you to use any AI model with this popular open-source chat interface.

## Setup Instructions

1. Add the following lines to your LibreChat `.env` file:

   ```bash
   CONFIG_PATH="https://raw.githubusercontent.com/requestyai/librechat-requesty/main/librechat-env.yaml"
   REQUESTY_KEY="your_requesty_api_key_here"
   ```

2. Get your API key from the [API Keys Page](https://app.requesty.ai/api-keys) in the Requesty platform.
3. Restart LibreChat to apply the changes.
4. That's it! You can now use any Requesty-supported model in LibreChat.

> **Info:** New users receive $6 in sign-up credit when creating a Requesty account at [https://requesty.ai](https://requesty.ai)

## Benefits

- **Access to Multiple Models**: Use any model available through Requesty in LibreChat
- **Automatic Configuration**: All models are automatically imported and pre-configured
- **Seamless Integration**: Works with Docker-based LibreChat installations

## Resources

- [LibreChat Documentation](https://www.librechat.ai/docs/local)
- [Requesty GitHub Repository](https://github.com/requestyai/librechat-requesty)
- [Requesty Platform](https://app.requesty.ai)

---

## OpenWebUI

Source: https://docs.requesty.ai/integrations/openwebui.md

> Complete integration guide for using Requesty with OpenWebUI

**Everything you need in one integration**: Integrate Requesty with OpenWebUI to access **400+ AI models** through a single, unified interface. Get chat completions, embeddings, speech-to-text, and image generation - all with the flexibility to choose between regions and create your own routing policies.
## How It Works

```mermaid
sequenceDiagram
    participant User as OpenWebUI
    participant Router as Requesty Router
    participant AI as Model Providers
    User->>Router: GET /v1/models
    Router-->>User: Return Approved Models
    User->>Router: POST Request (Chat/Embed/Audio)
    rect rgb(30, 30, 46)
        Note right of Router: 🛡️ Security Layer
        Router->>Router: Verify API Key & Permissions
        Note right of Router: 🧠 Routing Engine
        Router->>Router: Check Cache → Select Provider → Apply Fallback
        Note right of Router: 🔄 Adaptor Layer
        Router->>Router: Transform to Provider Format
    end
    Router->>AI: Execute Request
    AI-->>Router: Raw Output
    rect rgb(30, 30, 46)
        Note right of Router: 📊 Analytics Engine: Store Usage
    end
    Router-->>User: Normalized Response
```

**Why use Requesty instead of managing providers yourself?**

- One URL (`https://router.requesty.ai/v1`) replaces dozens of provider-specific endpoints
- Single invoice for all model providers, with no need to manage multiple subscriptions
- Track usage by user email, model, cost, and time with built-in dashboards
- Intelligent response caching and optimization reduces costs and improves speed

**What Requesty handles automatically:**

- ✅ Model availability discovery via `/v1/models`
- ✅ Request transformation for different providers
- ✅ Response caching and optimization
- ✅ User tracking and analytics collection
- ✅ Regional routing and failover
- ✅ Rate limiting and quota management

## Important: Enable User Tracking

> **Warning:** **Required for Analytics**: Add this to your OpenWebUI YAML configuration to enable user tracking in analytics:

```yaml
ENABLE_FORWARD_USER_INFO_HEADERS=true
```

Without this setting, you won't be able to track usage by user email in your analytics.

## Connection Setup

Navigate to: `/admin/settings/connections`

### Add or Edit OpenAI Connection

1. Click to edit the existing OpenAI API connection or add a new one
2. Set the **URL** to one of:
   - Global: `https://router.requesty.ai/v1`
   - EU only: `https://router.eu.requesty.ai/v1`
3.
Add your **API Key** from the [API Keys Page](https://app.requesty.ai/api-keys)
4. Click **Save**

### Available Models

Models will be automatically fetched from Requesty. **Only approved models for your organization will be visible.**

Manage your approved models at: [https://app.requesty.ai/admin-panel?tab=models](https://app.requesty.ai/admin-panel?tab=models)

View them in OpenWebUI at: `/admin/settings/models`

## Embeddings Setup

Navigate to: `/admin/settings/documents`

1. Under **Embedding Model Engine**, select **OpenAI**
2. **Overwrite base_url** with: `https://router.requesty.ai/v1`
3. Add your Requesty API key
4. Save your settings

## Speech-to-Text Setup

Navigate to: `/admin/settings/audio`

> **Info:** Requesty currently supports **Speech-to-Text only** (not Text-to-Speech).

1. Set the **base URL** to: `https://router.requesty.ai/v1`
2. Select **OpenAI** as the provider
3. Use model: `openai/whisper-1` (or any other supported Whisper model)
4. Add your Requesty API key
5. Save your settings

## Image Generation Setup

Navigate to: `/admin/settings/images`

1. Set the **base URL** to: `https://router.requesty.ai/v1`
2. Select **OpenAI** as the default provider
3. Use a model like: `vertex/google/gemini-3-pro-image-preview`
4. Customize the image prompt if desired
5. Add your Requesty API key
6. Save your settings

## Analytics & Usage Tracking

With `ENABLE_FORWARD_USER_INFO_HEADERS=true` enabled, you can track detailed usage analytics by user email, model, and more.
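
Once user emails are forwarded, per-user totals can be computed from a usage response. The sketch below assumes a payload shaped like the example response in the Programmatic Analytics section of this page (a `usage` map keyed by date, each entry holding `grouped_data`); it has not been checked against the live Management API, and `spend_by_email` is a hypothetical helper name.

```python
from collections import defaultdict
from decimal import Decimal

EMAIL_FIELD = "extra.X-Openwebui-User-Email"


def spend_by_email(payload: dict) -> dict[str, Decimal]:
    """Sum the per-day 'spend' strings for each forwarded user email."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for day in payload.get("usage", {}).values():
        for group in day.get("grouped_data", []):
            email = group.get("group_by_values", {}).get(EMAIL_FIELD, "unknown")
            # Spend arrives as a string; Decimal avoids float rounding drift
            totals[email] += Decimal(group.get("spend", "0"))
    return dict(totals)


if __name__ == "__main__":
    sample = {
        "usage": {
            "2025-11-28": {"grouped_data": [
                {"group_by_values": {EMAIL_FIELD: "user@example.com"}, "spend": "0.10"},
            ]},
            "2025-11-29": {"grouped_data": [
                {"group_by_values": {EMAIL_FIELD: "user@example.com"}, "spend": "0.05"},
                {"group_by_values": {}, "spend": "0.01"},
            ]},
        }
    }
    for email, spend in sorted(spend_by_email(sample).items()):
        print(email, spend)
```

Entries without a forwarded email are grouped under `unknown`, which is a quick way to spot traffic sent before the header setting was enabled.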
### Visual Analytics Dashboard

View a complete breakdown of spending per customer in the Requesty analytics dashboard:

[**View Spend by User Email →**](https://app.requesty.ai/analytics/advanced?groupBy=extra.X-Openwebui-User-Email&metric=cost&aggMethod=sum&timeRange=7d&timeGroup=day&regularChart=bar&breakdownChart=bar&hidden=vertex%2Fgemini-2.5-flash%2Copenai%2Fgpt-5-mini%3Apriority%2Cbedrock%2Fclaude-sonnet-4-5%2Cbedrock%2Fclaude-sonnet-4-5%40us-west-2%2Cbedrock%2Fclaude-opus-4-5%2Cbedrock%2Fclaude-sonnet-4-5%40us-east-1%2Copenai%2Fgpt-5%3Apriority%2Cbedrock%2Fclaude-sonnet-4%40us-east-1%2Cbedrock%2Fclaude-opus-4%40us-east-2%2Cbedrock%2Fclaude-haiku-4-5%40eu-west-1%2Canthropic%2Fclaude-sonnet-4%2Cbedrock%2Fclaude-sonnet-4%40eu-west-1%2Cbedrock%2Fclaude-opus-4%2Cbedrock%2Fclaude-opus-4%40us-east-1%2Cbedrock%2Fclaude-3-7-sonnet%2Cbedrock%2Fclaude-sonnet-4-5%40eu-west-1%2Cbedrock%2Fclaude-sonnet-4%2Cbedrock%2Fclaude-3-7-sonnet%40eu-west-1%2Cbedrock%2Fclaude-haiku-4-5%40us-east-1%2Cbedrock%2Fclaude-3-7-sonnet%40us-east-1%2Cbedrock%2Fclaude-3-7-sonnet%40eu-west-3%2Cbedrock%2Fclaude-opus-4%40us-west-2%2Cbedrock%2Fclaude-3-7-sonnet%40eu-north-1%2Cbedrock%2Fclaude-sonnet-4%40eu-west-3%2Cbedrock%2Fclaude-sonnet-4-5%40eu-north-1%2Cbedrock%2Fclaude-sonnet-4%40eu-north-1%2Cbedrock%2Fclaude-sonnet-4%40eu-central-1%2Cbedrock%2Fclaude-3-7-sonnet%40eu-central-1%2Cbedrock%2Fclaude-sonnet-4%40us-west-2%2Cbedrock%2Fclaude-3-7-sonnet%40us-west-2%2Cbedrock%2Fclaude-haiku-4-5%2Cbedrock%2Fclaude-sonnet-4%40us-east-2%2Cbedrock%2Fclaude-sonnet-4-5%40eu-central-1%2Cbedrock%2Fclaude-3-7-sonnet%40us-east-2%2Cbedrock%2Fclaude-haiku-4-5%40eu-west-3%2Cbedrock%2Fclaude-haiku-4-5%40eu-north-1%2Cbedrock%2Fclaude-haiku-4-5%40us-east-2%2Cbedrock%2Fclaude-haiku-4-5%40eu-central-1%2Cbedrock%2Fclaude-sonnet-4-5%40us-east-2%2Cbedrock%2Fclaude-haiku-4-5%40us-west-2%2Cbedrock%2Fclaude-opus-4%40eu-west-1%2Copenai%2Fgpt-5.1-chat%2Cbedrock%2Fclaude-haiku-4-5%40eu-west-12%2Cxai%2Fgrok-4-1-fast%2Copenai-responses
%2Fgpt-5.1-codex%2Cxai%2Fgrok-4-1-fast-non-reasoning%2Cvertex%2Fgemini-3-pro-image-preview&origin_title=Open+WebUI) Analytics Dashboard - Cost by User Email This dashboard provides real-time insights into: - Cost per user (grouped by email) - Usage patterns over time - Model-specific spending breakdowns - Interactive time-series visualizations ## Programmatic Analytics You can also retrieve usage statistics programmatically using the Requesty Management API. This is useful for building custom dashboards or billing integrations. ### Get Usage by User Email Use the `extra.X-Openwebui-User-Email` field to group usage by OpenWebUI user: ```bash curl -X GET "https://api-v2.requesty.ai/v1/manage/apikey/{id}/usage" \ -H "Authorization: Bearer REQUESTY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "start": "2025-11-01T00:00:00Z", "end": "2025-11-30T23:59:59Z", "group_by": ["extra.X-Openwebui-User-Email"], "resolution": "day" }' ``` **Example Response:** ```json { "usage": { "2025-11-29": { "spend": "0.13484955", "total_tokens": 36398, "grouped_data": [{ "group_by_values": { "extra.X-Openwebui-User-Email": "thibault@requesty.ai" }, "spend": "0.13484955", "total_tokens": 36398 }] } } } ``` ## Resources - [OpenWebUI Documentation](https://docs.openwebui.com/) - [Requesty API Keys](https://app.requesty.ai/api-keys) - [Requesty Platform](https://app.requesty.ai) --- ## Requests Source: https://docs.requesty.ai/frameworks/requests.md > Using Requesty router with Python Requests Building an application with Python Requests, or any other REST API client? Using Requesty with Python Requests is straightforward - you just need to point your HTTP requests to the Requesty router endpoint. This approach gives you maximum flexibility while still accessing all of Requesty's powerful features. 
This simple integration unlocks powerful features, such as:

- [Fallback Policies](/features/fallback-policies)
- [Load Balancing](/features/load-balancing)
- [Auto Caching](/features/auto-caching)
- [Request Metadata](/features/request-metadata)
- ...and many more.

All of this is available while maintaining full control over your HTTP requests.

With Requesty, you can access more than 250 models from various providers. To specify a model, you must include the provider prefix, like `openai/gpt-4o-mini` or `anthropic/claude-sonnet-4-20250514`.

You can find the full list of available models in the [Model Library](https://app.requesty.ai/model-list).

## Basic Usage

Here's how to make a simple chat completion request using Python Requests:

```python
import os

import requests


def chat_completion():
    # Safely load your API key from environment variables
    REQUESTY_API_KEY = os.environ.get("REQUESTY_API_KEY")
    if not REQUESTY_API_KEY:
        print("Error: REQUESTY_API_KEY environment variable not set.")
        return

    try:
        response = requests.post(
            'https://router.requesty.ai/v1/chat/completions',
            headers={
                'Authorization': f'Bearer {REQUESTY_API_KEY}',
                'Content-Type': 'application/json'
            },
            json={
                'model': "openai/gpt-4o",
                'messages': [
                    {'role': "user", 'content': "Hello, world!"}
                ]
            }
        )
        response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
        print(response.json()['choices'][0]['message']['content'])
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")


chat_completion()
```

## Streaming Responses

For streaming responses, you can use Server-Sent Events:

```python
import json
import os

import requests


def streaming_chat():
    REQUESTY_API_KEY = os.environ.get("REQUESTY_API_KEY")
    if not REQUESTY_API_KEY:
        print("Error: REQUESTY_API_KEY environment variable not set.")
        return

    try:
        response = requests.post(
            'https://router.requesty.ai/v1/chat/completions',
            headers={
                'Authorization': f'Bearer {REQUESTY_API_KEY}',
                'Content-Type': 'application/json'
            },
            json={
                'model': "openai/gpt-4o",
                'messages': [
                    {'role': "user", 'content': "Write a short story about AI"}
                ],
                'stream': True
            },
            stream=True  # Important for streaming
        )
        response.raise_for_status()

        for line in response.iter_lines():
            decoded_line = line.decode('utf-8')
            trimmed_line = decoded_line.strip()
            # Only "data:" lines carry payloads; skip blanks and keep-alives
            if not trimmed_line.startswith('data:'):
                continue
            data = trimmed_line[len('data:'):].strip()
            if data == '[DONE]':
                print('\nStream completed')
                break
            try:
                parsed = json.loads(data)
                # Guard against chunks with an empty "choices" list
                choices = parsed.get('choices') or [{}]
                content = choices[0].get('delta', {}).get('content')
                if content:
                    print(content, end='', flush=True)
            except json.JSONDecodeError:
                # Skip invalid JSON lines
                pass
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")


streaming_chat()
```

---

## OpenAI

Source: https://docs.requesty.ai/frameworks/openai.md

> Learn how to use Requesty with the OpenAI SDK

Using Requesty with the OpenAI SDK is as simple as changing a single line of code. By pointing the `base_url` to the Requesty router, you can take advantage of all of Requesty's features without changing the rest of your code.

This simple change unlocks powerful features, such as:

- [Fallback Policies](/features/fallback-policies)
- [Load Balancing](/features/load-balancing)
- [Auto Caching](/features/auto-caching)
- [Request Metadata](/features/request-metadata)
- ...and many more.

All of this is available while maintaining the familiar OpenAI SDK interface.

With Requesty, you can access more than 250 models from various providers. To specify a model, you must include the provider prefix, like `openai/gpt-4.1-mini` or `anthropic/claude-sonnet-4-20250514`.

You can find the full list of available models in the [Model Library](https://app.requesty.ai/model-list).

## Python

To use the OpenAI Python client with Requesty, simply set the `base_url` when initializing the client.
```python
import os

import openai

# Load your API key from an environment variable or a secret manager
requesty_api_key = os.environ.get("REQUESTY_API_KEY")

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

# Now you can use the client as you normally would
# All requests will be routed through Requesty
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}],
)

print(response.choices[0].message.content)
```

## JavaScript

The same principle applies to the OpenAI JavaScript client. Set the `baseURL` during initialization.

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  // Load your API key from an environment variable
  apiKey: process.env.REQUESTY_API_KEY,
  baseURL: "https://router.requesty.ai/v1",
});

async function main() {
  // Now you can use the client as you normally would
  // All requests will be routed through Requesty
  const response = await client.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello, world!" }],
  });

  console.log(response.choices[0].message.content);
}

main();
```

---

## LangChain

Source: https://docs.requesty.ai/frameworks/langchain.md

> Using Requesty router with LangChain

Building an application with LangChain? You can use the Requesty router to access any LLM, and get cost management, monitoring and fallbacks out-of-the-box.

Here's an example script:

```python
from os import getenv

from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load environment variables (REQUESTY_API_KEY, REQUESTY_BASE_URL) from a .env file
load_dotenv()

# Define the prompt template
template = """You are an expert on Requesty router.
The user has a question about this router:

Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

# Initialize the OpenAI-compatible LLM, pointed at the Requesty router
llm = ChatOpenAI(
    openai_api_key=getenv("REQUESTY_API_KEY"),
    openai_api_base=getenv("REQUESTY_BASE_URL"),  # e.g. https://router.requesty.ai/v1
    model_name="openai/gpt-4o",
)

# Create a Runnable Chain
llm_chain = prompt | llm

# Define the question
question = "What application should I build now that Requesty router provides access to 300+ LLMs?"

# Run the model and get the response
response = llm_chain.invoke({"question": question})
print(response)
```

---

## Haystack

Source: https://docs.requesty.ai/frameworks/haystack.md

> Using Requesty router with Haystack

Building an application with Haystack? Integrating Requesty is a simple three-step process:

- Set your Requesty API key
- Set your Requesty base URL
- Choose one of the 300+ [supported models](https://app.requesty.ai/model-list)

And get immediate value:

- Access to all the best LLMs
- A single API key to access all the providers
- Very clear spending dashboards
- Telemetry and logging out of the box

## Option no. 1 - Configure via environment variables

Set:

- OPENAI_API_KEY=[Your Requesty API key]
- OPENAI_BASE_URL="https://router.requesty.ai/v1"

Change the `model` parameter to any model, and you're done! (Yes, you can use Anthropic or any other model without changing anything but the `model` parameter.)

```python
from dotenv import load_dotenv
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Load OPENAI_API_KEY and OPENAI_BASE_URL from a .env file, if present
load_dotenv()

# Initialize the agent with Requesty router
agent = Agent(
    chat_generator=OpenAIChatGenerator(
        model="anthropic/claude-sonnet-4-20250514",
    ),
    system_prompt="You are a helpful web agent powered by Requesty router.",
)

# Define the question
question = "What are the benefits of using Requesty router with Haystack?"

# Run the agent and get the response
result = agent.run(messages=[ChatMessage.from_user(question)])

# Print the response
print(result['last_message'].text)
```

## Option no. 2 - Configure the client

Load your Requesty API key any way you want. Pass the `api_key` and `api_base_url`, set the `model` parameter to any model, and you're done! (Yes, you can use xAI or any other model without changing anything but the `model` parameter.)

```python
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

# Securely load your API key (no trailing comma, which would create a tuple)
requesty_api_key = Secret.from_env_var("REQUESTY_API_KEY")

# Initialize the agent with Requesty router
agent = Agent(
    chat_generator=OpenAIChatGenerator(
        model="xai/grok-4",
        api_key=requesty_api_key,
        api_base_url="https://router.requesty.ai/v1",
    ),
    system_prompt="You are a helpful web agent powered by Requesty router.",
)

# Define the question
question = "What are the benefits of using Requesty router with Haystack?"

# Run the agent and get the response
result = agent.run(messages=[ChatMessage.from_user(question)])

# Print the response
print(result['last_message'].text)
```

---

## PydanticAI

Source: https://docs.requesty.ai/frameworks/pydantic-ai.md

> Using Requesty router with PydanticAI

Do you use PydanticAI? Integrating Requesty is a simple three-step process:

- Set your Requesty API key
- Set your Requesty base URL
- Choose one of the 300+ [supported models](https://app.requesty.ai/model-list)

And get immediate value:

- Access to all the best LLMs
- A single API key to access all the providers
- Very clear spending dashboards
- Telemetry and logging out of the box

You can use the Requesty router to access any LLM, and get cost management, monitoring and fallbacks out-of-the-box.
## Configure via environment variables

Set:

- OPENAI_API_KEY=[Your Requesty API key]
- OPENAI_BASE_URL="https://router.requesty.ai/v1"

Change the `model` parameter to any model, and you're done! (Yes, you can use Anthropic or any other model without changing anything but the `model` parameter.)

````python
import asyncio

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# The OpenAI-compatible model picks up OPENAI_API_KEY and OPENAI_BASE_URL from the environment
model = OpenAIModel(
    "anthropic/claude-sonnet-4-20250514",
)

agent = Agent(model)


async def main():
    response = await agent.run("What should I build with Requesty router, now that I have access to 300+ LLMs?")
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
````

---

## Axios

Source: https://docs.requesty.ai/frameworks/axios.md

> Using Requesty router with Axios

Building an application with Axios, or any other REST API client? Using Requesty with Axios is straightforward: you just need to point your HTTP requests to the Requesty router endpoint. This approach gives you maximum flexibility while still accessing all of Requesty's powerful features.

This simple integration unlocks powerful features, such as:

- [Fallback Policies](/features/fallback-policies)
- [Load Balancing](/features/load-balancing)
- [Auto Caching](/features/auto-caching)
- [Request Metadata](/features/request-metadata)
- ...and many more.

All of this is available while maintaining full control over your HTTP requests.

With Requesty, you can access 300+ models from various providers. To specify a model, you must include the provider prefix, like `openai/gpt-4o-mini` or `anthropic/claude-sonnet-4-20250514`.

You can find the full list of available models in the [Model Library](https://app.requesty.ai/model-list).
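As a rough illustration of the provider-prefix convention described above, the following Python sketch splits a model ID into its provider and model name before a request is sent. The helper name `split_model_id` is hypothetical and not part of any Requesty SDK; it only assumes that the provider is everything before the first `/`.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model).

    Hypothetical helper for illustration only. It assumes the provider
    is the text before the first '/' and rejects IDs without a prefix.
    """
    provider, separator, name = model_id.partition("/")
    if not separator or not provider or not name:
        raise ValueError(
            f"Model ID must include a provider prefix, got {model_id!r}"
        )
    return provider, name


# Example with a model ID taken from the text above
print(split_model_id("openai/gpt-4o-mini"))
```

A check like this can catch a missing prefix (for example `gpt-4o`) locally, before the request reaches the router.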
## Basic Usage

Here's how to make a simple chat completion request using Axios:

```javascript
import axios from 'axios';

// Safely load your API key from environment variables
const REQUESTY_API_KEY = process.env.REQUESTY_API_KEY;

async function chatCompletion() {
  try {
    const response = await axios.post('https://router.requesty.ai/v1/chat/completions', {
      model: "openai/gpt-4o",
      messages: [
        { role: "user", content: "Hello, world!" }
      ]
    }, {
      headers: {
        'Authorization': `Bearer ${REQUESTY_API_KEY}`,
        'Content-Type': 'application/json'
      }
    });

    console.log(response.data.choices[0].message.content);
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
  }
}

chatCompletion();
```

## Streaming Responses

For streaming responses, you can use Server-Sent Events:

```javascript
import axios from 'axios';

async function streamingChat() {
  try {
    const response = await axios.post('https://router.requesty.ai/v1/chat/completions', {
      model: "openai/gpt-4o",
      messages: [
        { role: "user", content: "Write a short story about AI" }
      ],
      stream: true
    }, {
      headers: {
        'Authorization': `Bearer ${process.env.REQUESTY_API_KEY}`,
        'Content-Type': 'application/json'
      },
      responseType: 'stream'
    });

    // Keep any trailing partial line, since an SSE line can be split across chunks
    let buffer = '';

    response.data.on('data', (chunk) => {
      buffer += chunk.toString();
      const lines = buffer.split('\n');
      buffer = lines.pop();

      for (const line of lines) {
        const trimmedLine = line.trim();
        if (!trimmedLine || !trimmedLine.startsWith('data:')) continue;

        const data = trimmedLine.substring(5).trim();
        if (data === '[DONE]') {
          console.log('\nStream completed');
          return;
        }

        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices?.[0]?.delta?.content;
          if (content) {
            process.stdout.write(content);
          }
        } catch (e) {
          // Skip invalid JSON lines
        }
      }
    });
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
  }
}

streamingChat();
```

---

## LlamaIndex TS

Source: https://docs.requesty.ai/frameworks/llamaindex-ts.md

> Using Requesty with LlamaIndex TS

The Requesty adapter for LlamaIndex TypeScript provides seamless access to over 300 large language models through the Requesty platform within your LlamaIndex applications.

## Setup

```bash
# For pnpm
pnpm add @requesty/llamaindex

# For npm
npm install @requesty/llamaindex

# For yarn
yarn add @requesty/llamaindex
```

## API Key Setup

For security, you should set your API key as an environment variable named exactly `REQUESTY_API_KEY`:

```bash
# Linux/Mac
export REQUESTY_API_KEY=your_api_key_here

# Windows Command Prompt
set REQUESTY_API_KEY=your_api_key_here

# Windows PowerShell
$env:REQUESTY_API_KEY="your_api_key_here"
```

## Basic Usage

The adapter provides a simple interface to use Requesty models within your LlamaIndex TypeScript applications:

```typescript
const llm = requesty({
  model: "openai/gpt-4o-mini",
  apiKey: process.env.REQUESTY_API_KEY,
  baseURL: "https://your-requesty-endpoint.com/v1"
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Hello!" }]
});
```

## Supported Models

You can use any model available through Requesty. Find the complete list of available models at [app.requesty.ai/models](https://app.requesty.ai/models).

## Features

- Access models from OpenAI, Anthropic, Google, Mistral, and many other providers
- Full support for streaming responses for real-time applications
- Support for structured output using Zod schemas
- Function/tool calling with supported models
- Support for complex multi-agent workflow configurations
- Built-in telemetry and analytics capabilities

## Getting Started

For detailed usage examples, configuration options, and advanced features, please refer to the [GitHub repository](https://github.com/requestyai/llamaindex-ts), which contains comprehensive documentation and examples to help you get started with the integration.

The adapter is designed to work seamlessly with existing LlamaIndex TypeScript applications while providing access to Requesty's powerful model routing and analytics capabilities.
---

## Vercel AI SDK

Source: https://docs.requesty.ai/frameworks/vercel-ai-sdk.md

> Using Requesty router with the Vercel AI SDK

The Requesty provider for the Vercel AI SDK gives access to over 300 large language models through the Requesty chat and completion APIs.

## Setup

```bash
# For pnpm
pnpm add @requesty/ai-sdk

# For npm
npm install @requesty/ai-sdk

# For yarn
yarn add @requesty/ai-sdk
```

## API Key Setup

For security, you should set your API key as an environment variable named exactly `REQUESTY_API_KEY`:

```bash
# Linux/Mac
export REQUESTY_API_KEY=your_api_key_here

# Windows Command Prompt
set REQUESTY_API_KEY=your_api_key_here

# Windows PowerShell
$env:REQUESTY_API_KEY="your_api_key_here"
```

## Provider Instance

You can import the default provider instance `requesty` from `@requesty/ai-sdk`:

```javascript
import { requesty } from '@requesty/ai-sdk';
```

## Example

```javascript
import { requesty } from '@requesty/ai-sdk';
import { generateText } from 'ai';

const { text } = await generateText({
  model: requesty('openai/gpt-4o'),
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});
```

## Supported Models

This is not a definitive list of the models supported by Requesty, because it changes constantly as we add new models to our system and deprecate old ones. You can find the latest list of models supported by Requesty [here](https://app.requesty.ai/models).

You can find the latest list of tool-supported models supported by Requesty [here](https://app.requesty.ai/tools). (Note: This list may contain models that are not compatible with the AI SDK.)

## Passing Extra Body to Requesty

There are three ways to pass extra body parameters to Requesty:

### 1. Via the `providerOptions.requesty` property:

```javascript
import { createRequesty } from '@requesty/ai-sdk';
import { streamText } from 'ai';

const requesty = createRequesty({ apiKey: process.env.REQUESTY_API_KEY });
const model = requesty('anthropic/claude-3.7-sonnet');

await streamText({
  model,
  messages: [{ role: 'user', content: 'Hello' }],
  providerOptions: {
    requesty: {
      custom_field: 'value',
    },
  },
});
```

### 2. Via the `extraBody` property in the model settings:

```javascript
import { createRequesty } from '@requesty/ai-sdk';
import { streamText } from 'ai';

const requesty = createRequesty({ apiKey: process.env.REQUESTY_API_KEY });
const model = requesty('anthropic/claude-3.7-sonnet', {
  extraBody: {
    custom_field: 'value',
  },
});

await streamText({
  model,
  messages: [{ role: 'user', content: 'Hello' }],
});
```

### 3. Via the `extraBody` property in the model factory:

```javascript
import { createRequesty } from '@requesty/ai-sdk';
import { streamText } from 'ai';

const requesty = createRequesty({
  apiKey: process.env.REQUESTY_API_KEY,
  extraBody: {
    custom_field: 'value',
  },
});
const model = requesty('anthropic/claude-3.7-sonnet');

await streamText({
  model,
  messages: [{ role: 'user', content: 'Hello' }],
});
```

## Features

- Use a single API to access models from OpenAI, Anthropic, Google, Mistral, and many more
- Full support for streaming responses for real-time applications
- Function/tool calling with supported models
- Built with TypeScript for an enhanced developer experience
- Seamless integration with the AI SDK ecosystem

## Advanced Configuration

### Custom API URL

You can configure Requesty to use a custom API URL:

```javascript
import { createRequesty } from '@requesty/ai-sdk';

const requesty = createRequesty({
  apiKey: process.env.REQUESTY_API_KEY,
  baseURL: 'https://router.requesty.ai/v1',
});
```

### Headers

Add custom headers to all requests:

```javascript
import { createRequesty } from '@requesty/ai-sdk';

const requesty = createRequesty({
  apiKey: process.env.REQUESTY_API_KEY,
  headers: {
    'Custom-Header': 'custom-value',
  },
});
```

### Model Settings

Configure model-specific settings:

```javascript
import { createRequesty } from '@requesty/ai-sdk';

const requesty = createRequesty({ apiKey: process.env.REQUESTY_API_KEY });

const model = requesty('openai/gpt-4o', {
  // Specific model to use with this request
  models: ['openai/gpt-4o', 'anthropic/claude-3-opus'],

  // Control the bias of specific tokens in the model's vocabulary
  logitBias: { 50256: -100 },

  // Request token-level log probabilities
  logprobs: 5,

  // User identifier for tracking or rate limiting
  user: 'user-123',

  // Additional body parameters
  extraBody: {
    custom_field: 'value',
  },
});
```

---

## Overview

Source:
https://docs.requesty.ai/api-reference/overview.md

> An overview of Requesty's API

Requesty normalizes the schema across models and providers, so you don't waste time with custom integrations.

## Endpoints

Requesty provides two main endpoints:

### Chat Completions (`/v1/chat/completions`)

For generating text completions and conversations with AI models.

### Embeddings (`/v1/embeddings`)

For creating vector embeddings from text, which can be used for semantic search, similarity matching, and other AI applications.

## Chat Completions Request Structure

Your request body to `/v1/chat/completions` closely follows the OpenAI Chat Completion schema:

* **Required Fields:**
  * `messages`: An array of message objects with `role` and `content`
    * Roles can be `user`, `assistant`, `system`, or `tool`
  * `model`: The model name. If omitted, defaults to the user's or payer's default model. Here is a [full list of the supported models](https://app.requesty.ai/model-list)
* **Optional Fields:**
  * `prompt`: Alternative to `messages` for some providers.
  * `stream`: A boolean to enable Server-Sent Events (SSE) streaming responses.
  * `max_tokens`, `temperature`, `top_p`, etc.: Standard language model parameters.
  * `tools` / `functions`: Allows function calling with a defined schema. See OpenAI's [function calling documentation](https://platform.openai.com/docs/guides/structured-outputs) for the structure of these requests.
  * `tool_choice`: Specifies how tool calling should be handled.
  * `response_format`: For structured responses (some models only).
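The field list above can be turned into a small request builder. The following is a minimal Python sketch, not an official client: `build_chat_request` and `ALLOWED_ROLES` are hypothetical names, and the checks only mirror the fields and roles listed in this section. It builds the body locally and sends nothing.

```python
import json

# Roles listed in the request structure above
ALLOWED_ROLES = {"user", "assistant", "system", "tool"}


def build_chat_request(model, messages, **optional):
    """Assemble a /v1/chat/completions request body (illustrative helper)."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Unsupported role: {role!r}")
        if "content" not in message:
            raise ValueError("Each message needs a 'content' field")

    body = {"model": model, "messages": messages}
    # Optional fields such as stream, max_tokens, temperature, or tools;
    # entries set to None are left out of the body.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_chat_request(
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=200,
    temperature=0.7,
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the JSON body of a POST to `/v1/chat/completions`. Note that roles are case-sensitive in this sketch, so `"User"` would be rejected.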
### Example Request Body

```json
{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 200,
  "temperature": 0.7,
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City and state"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```

Here, we also provide a tool (`get_current_weather`) that the model can call if it decides the user request involves weather data.

Some request fields require a different SDK method. For example, if you use `response_format`, you'll need to call `client.beta.chat.completions.parse` instead, and you may want to define your structure with Pydantic or Zod.

## Response Structure

The response is normalized to an OpenAI-style ChatCompletion object:

1. Streaming: If `stream: true`, responses arrive incrementally as SSE events with `data:` lines. See [Streaming](https://requesty.mintlify.app/features/streaming) for documentation on streaming.
2. Function Calls (Tool Calls): If the model decides to call a tool, it will return a `function_call` in the assistant message. You then execute the tool, append the tool's result as a `role: "tool"` message, and send a follow-up request. The LLM will then integrate the tool output into its final answer.

### Non-Streaming Response Example

```json
{
  "id": "chatcmpl-xyz123",
  "object": "chat.completion",
  "created": 1687623702,
  "model": "openai/gpt-4o",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  },
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ]
}
```

Function Call Example: If the model decides it needs the weather tool:

```json
{
  "id": "chatcmpl-abc456",
  "object": "chat.completion",
  "created": 1687623800,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_current_weather",
          "arguments": "{ \"location\": \"Boston, MA\"}"
        }
      },
      "finish_reason": "function_call"
    }
  ]
}
```

You would then call the `get_current_weather` function externally, get the result, and send it back as:

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    {"role": "user", "content": "What is the weather in Boston?"},
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "get_current_weather",
        "arguments": "{ \"location\": \"Boston, MA\" }"
      }
    },
    {
      "role": "tool",
      "name": "get_current_weather",
      "content": "{\"temperature\": \"22\", \"unit\": \"celsius\", \"description\": \"Sunny\"}"
    }
  ]
}
```

The next completion will return a final answer integrating the tool's response.

## Embeddings Request Structure

Your request body to `/v1/embeddings` follows the OpenAI Embeddings schema:

* **Required Fields:**
  * `input`: The text to embed.
Can be a string, array of strings, array of tokens, or array of token arrays * `model`: The model name to use for embedding generation (e.g., `openai/text-embedding-3-small`) * **Optional Fields:** * `dimensions`: The number of dimensions for the output embeddings (only supported in text-embedding-3 and later models) * `encoding_format`: The format to return embeddings in (`float` or `base64`, defaults to `float`) * `user`: A unique identifier representing your end-user ### Example Embeddings Request Body ```json { "model": "openai/text-embedding-3-small", "input": "The food was delicious and the service was excellent.", "encoding_format": "float" } ``` For multiple texts: ```json { "model": "openai/text-embedding-3-small", "input": [ "The food was delicious and the service was excellent.", "The restaurant had poor service and cold food.", "Amazing atmosphere with friendly staff." ], "encoding_format": "float" } ``` ## Embeddings Response Structure The response is normalized to an OpenAI-style Embedding object: ```json { "data": [ { "embedding": [0.0023064255, -0.009327292, ...], "index": 0, "object": "embedding" } ], "model": "openai/text-embedding-3-small", "object": "list", "usage": { "prompt_tokens": 8, "total_tokens": 8 } } ``` --- ## Create Chat Completion Source: https://docs.requesty.ai/api-reference/endpoint/chat-completions-create.md ## PDF Support Send PDFs using the `input_file` content type. You can provide the PDF as either base64-encoded data or a URL. 
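The two ways of supplying a PDF can be sketched in Python as follows. This is an illustrative helper rather than part of any SDK: `pdf_content_part` and `pdf_message` are hypothetical names, and the sketch assumes that the base64 payload is appended directly after the `data:application/pdf;base64,` prefix used in the examples in this section.

```python
import base64


def pdf_content_part(filename, *, pdf_bytes=None, url=None):
    """Build an `input_file` content part from raw PDF bytes or a URL."""
    if (pdf_bytes is None) == (url is None):
        raise ValueError("Provide exactly one of pdf_bytes or url")

    if pdf_bytes is not None:
        # Assumption: the payload follows the data-URL prefix directly
        encoded = base64.b64encode(pdf_bytes).decode("ascii")
        file_data = f"data:application/pdf;base64,{encoded}"
    else:
        file_data = url

    return {"type": "input_file", "filename": filename, "file_data": file_data}


def pdf_message(prompt, part):
    """Wrap a text prompt and a PDF content part in a single user message."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, part],
    }


message = pdf_message(
    "Summarize this PDF",
    pdf_content_part("document.pdf", url="https://example.com/document.pdf"),
)
print(message)
```

For a local file, read it in binary mode and pass the bytes as `pdf_bytes` instead of `url`.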
### Using Base64-Encoded PDF ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Summarize this PDF" }, { "type": "input_file", "filename": "document.pdf", "file_data": "data:application/pdf;base64," } ] } ] }' ``` ### Using PDF URL ```bash curl https://router.requesty.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Summarize this PDF" }, { "type": "input_file", "filename": "document.pdf", "file_data": "https://example.com/document.pdf" } ] } ] }' ``` ### Parameters - `type`: Must be `"input_file"` - `filename`: The name of the PDF file (e.g., `"document.pdf"`) - `file_data`: Either base64-encoded PDF content or a URL to the PDF file --- ## Create Message Source: https://docs.requesty.ai/api-reference/endpoint/messages-create.md Send a message to an Anthropic-compatible model and receive a response. This endpoint follows the Anthropic Messages API format and supports all Anthropic models as well as compatible models from other providers through Requesty's routing. 
## Base URL

```
https://router.requesty.ai/v1/messages
```

## Authentication

Include your Requesty API key in the request headers using Anthropic's standard format:

```bash
x-api-key: YOUR_REQUESTY_API_KEY
```

## Headers

| Header              | Required | Description                              |
| ------------------- | -------- | ---------------------------------------- |
| `x-api-key`         | ✅       | Your Requesty API key (Anthropic format) |
| `Content-Type`      | ✅       | Must be `application/json`               |
| `anthropic-version` | ❌       | API version (defaults to `2023-06-01`)   |

## Example Request

```bash
curl https://router.requesty.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_REQUESTY_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Hello, Claude!"
      }
    ]
  }'
```

## Model Selection

You can use any model available in the [Model Library](https://app.requesty.ai/model-list). Examples:

- **Anthropic Models:** `anthropic/claude-sonnet-4-20250514`, `anthropic/claude-3-7-sonnet`
- **OpenAI Models:** `openai/gpt-4o`, `openai/gpt-4o-mini`
- **Google Models:** `google/gemini-2.0-flash-exp`
- **Other Providers:** `mistral/mistral-large-2411`, `meta/llama-3.3-70b-instruct`

> **Info:** While this endpoint uses the Anthropic Messages format, Requesty automatically handles format conversion for non-Anthropic models, so you can use any supported model with this endpoint.

## Streaming

Enable streaming responses by setting `stream: true`:

```json
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "Write a short story"
    }
  ]
}
```

## Vision Support

Send images using the content blocks format:

```json
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What do you see in this image?"
}, { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "/9j/4AAQSkZJRgABAQAAAQABAAD..." } } ] } ] } ``` ## PDF Support You can send PDFs, encoded in base 64 format: ```json { "model": "anthropic/claude-sonnet-4-20250514", "max_tokens": 1024, "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What is in this PDF?" }, { "type": "document", "source": { "type": "base64", "media_type": "application/pdf", "data": "JVBERi0=..." } } ] } ] } ``` ## Tool Use Define tools that the model can call: ```json { "model": "anthropic/claude-sonnet-4-20250514", "max_tokens": 1024, "tools": [ { "name": "get_weather", "description": "Get the current weather in a given location", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "required": ["location"] } } ], "messages": [ { "role": "user", "content": "What's the weather like in New York?" } ] } ``` ## System Prompts Include system instructions using the `system` parameter: ```json { "model": "anthropic/claude-sonnet-4-20250514", "max_tokens": 1024, "system": "You are a helpful assistant that always responds in a friendly, professional manner.", "messages": [ { "role": "user", "content": "Hello!" } ] } ``` ## Error Handling The API returns standard HTTP status codes: - `200` - Success - `400` - Bad Request (invalid parameters) - `401` - Unauthorized (invalid API key) - `403` - Forbidden (insufficient permissions) - `429` - Rate Limited - `500` - Internal Server Error Example error response: ```json { "error": { "type": "invalid_request_error", "message": "max_tokens is required" } } ``` ## Response Format Successful responses follow the Anthropic Messages format: ```json { "id": "msg_01ABC123", "type": "message", "role": "assistant", "content": [ { "type": "text", "text": "Hello! I'm Claude, an AI assistant. How can I help you today?" 
} ], "model": "anthropic/claude-sonnet-4-20250514", "stop_reason": "end_turn", "usage": { "input_tokens": 12, "output_tokens": 18 } } ``` ## Key Differences from OpenAI Chat Completions - **Authentication:** Uses `x-api-key` header instead of `Authorization: Bearer` - **Required `max_tokens`:** Unlike OpenAI's API, the `max_tokens` parameter is required - **Content Blocks:** Messages use content blocks for rich content (text, images, tool calls) - **System Parameter:** System prompts are specified as a separate `system` parameter, not as a message - **Role Restrictions:** Only `user` and `assistant` roles are supported in messages (no `system` role) > **Tip:** For the most seamless experience with Anthropic models, use this endpoint. For broader compatibility across all providers, consider using the [Chat Completions endpoint](/api-reference/endpoint/chat-completions-create) instead. --- ## Create Response Source: https://docs.requesty.ai/api-reference/endpoint/responses-create.md Send input to an OpenAI-compatible model and receive a response. This endpoint follows the OpenAI Responses API format and supports all OpenAI models that expose the Responses API natively, as well as compatible models from other providers through Requesty's routing. ## Base URL ``` https://router.requesty.ai/v1/responses ``` ## Authentication The Responses endpoint accepts either OpenAI-style bearer auth or Anthropic-style `x-api-key` auth. ```bash Authorization: Bearer YOUR_REQUESTY_API_KEY ``` ```bash x-api-key: YOUR_REQUESTY_API_KEY ``` ## Example Request ```bash curl https://router.requesty.ai/v1/responses \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "openai-responses/gpt-5", "input": "Tell me a three sentence bedtime story about a unicorn." 
}' ``` ## Using the OpenAI SDK ```python from openai import OpenAI client = OpenAI( api_key="YOUR_REQUESTY_API_KEY", base_url="https://router.requesty.ai/v1", ) response = client.responses.create( model="openai-responses/gpt-5", input="Tell me a three sentence bedtime story about a unicorn.", ) print(response.output_text) ``` ## Input Formats The `input` field accepts either a plain string or an array of typed input items (messages, tool calls, tool results, reasoning). ## Instructions Use the `instructions` parameter to set a system-level prompt that applies to the entire request. ## Streaming Enable streaming by setting `stream: true`. Events are delivered using the OpenAI Responses event format (`response.created`, `response.output_text.delta`, `response.completed`, etc.). ## Tool Use Tools use the flatter Responses shape: `name`, `description`, and `parameters` live at the top level of each tool entry. Tool results are returned as `function_call_output` items in `input` on the next turn. ## Reasoning For reasoning-capable models (e.g. `openai-responses/gpt-5`, `openai-responses/o3`), configure `reasoning.effort` (`low`, `medium`, `high`) and `reasoning.summary` (`auto`, `concise`, `detailed`). ## Structured Outputs Set `text.format` to enforce JSON-mode or a strict JSON Schema on the output. ## Response Format A successful response follows the OpenAI Responses format with `output` items (`message`, `function_call`, `reasoning`, etc.) and a `usage` block. The `usage.cost` field is a Requesty extension reporting the USD cost of the request, returned by default on non-streaming responses and on the final `response.completed` event when streaming. ## Key Differences from OpenAI Chat Completions - **`input` instead of `messages`:** Accepts a string or a list of typed items. - **`instructions` instead of system messages.** - **Flat tool shape:** No nested `function` wrapper. 
- **Content types are prefixed:** `input_text`, `input_image`, `input_file`; outputs use `output_text` and `output_refusal`. - **Event-typed streaming:** Named events rather than choice deltas. - **`max_output_tokens` instead of `max_tokens`.** --- ## Create Image Source: https://docs.requesty.ai/api-reference/endpoint/images-generations-create.md Generate images from a text prompt using OpenAI-compatible image generation models through Requesty's routing. ## Base URL ``` https://router.requesty.ai/v1/images/generations ``` ## Authentication Include your Requesty API key in the request headers: ```bash Authorization: Bearer YOUR_REQUESTY_API_KEY ``` ## Example Request ```bash curl https://router.requesty.ai/v1/images/generations \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "azure/openai/gpt-image-1", "prompt": "A watercolor painting of a Japanese garden in autumn", "n": 1, "size": "1024x1024", "quality": "auto" }' ``` ## Supported Models - `azure/openai/gpt-image-1` -- OpenAI's GPT Image 1 model via Azure - `azure/openai/gpt-image-1.5` -- OpenAI's GPT Image 1.5 model via Azure ## Transparent Backgrounds Use the `background` parameter to generate images with transparent backgrounds (useful for logos, icons, and design assets): ```bash curl https://router.requesty.ai/v1/images/generations \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \ -d '{ "model": "azure/openai/gpt-image-1", "prompt": "A simple icon of a rocket ship", "background": "transparent", "output_format": "png" }' ``` ## Error Handling The API returns standard HTTP status codes: - `200` - Success - `400` - Bad Request (invalid parameters) - `401` - Unauthorized (invalid API key) - `429` - Rate Limited - `500` - Internal Server Error > **Info:** This endpoint is fully compatible with the OpenAI Images API. You can use the OpenAI SDK's `client.images.generate()` method directly. 
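The request bodies from the curl examples in this section can also be assembled programmatically. The following is a minimal Python sketch under stated assumptions: `build_image_request` is a hypothetical helper, its defaults are copied from the examples above, and it only builds the body without sending it.

```python
import json


def build_image_request(
    prompt,
    model="azure/openai/gpt-image-1",
    *,
    n=1,
    size="1024x1024",
    transparent=False,
):
    """Assemble a /v1/images/generations request body (illustrative helper)."""
    body = {"model": model, "prompt": prompt, "n": n, "size": size}
    if transparent:
        # Mirrors the transparent-background example: request PNG output
        # together with a transparent background.
        body["background"] = "transparent"
        body["output_format"] = "png"
    return body


icon_request = build_image_request(
    "A simple icon of a rocket ship", transparent=True
)
print(json.dumps(icon_request, indent=2))
```

The dictionary can be sent as the JSON body of a POST to `/v1/images/generations` with the `Authorization: Bearer` header shown earlier.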
> **Tip:** For models that generate images as part of a conversational response (e.g., Gemini), use the [Chat Completions endpoint](/api-reference/endpoint/chat-completions-create) instead. See the [Image Generation feature guide](/features/image-generation) for a full comparison.

---

## Create Embedding

Source: https://docs.requesty.ai/api-reference/endpoint/embeddings-create.md

---

## List Models

Source: https://docs.requesty.ai/api-reference/endpoint/models-list.md

---

## List API Keys

Source: https://docs.requesty.ai/api-reference/endpoint/manage-apikey/manage-api-key-list.md

List all API keys in your organization. Returns information about each API key including its ID, name, limits, permissions, labels, and creator.

---

## Create API Key

Source: https://docs.requesty.ai/api-reference/endpoint/manage-apikey/manage-api-key-create.md

Create a new API key for your organization. The API key will be created with the specified name and monthly limit.

> **Warning:** The API key string is only returned once upon creation. Make sure to save it securely as it cannot be retrieved later.

---

## Get API Key Usage

Source: https://docs.requesty.ai/api-reference/endpoint/manage-apikey/manage-api-key-get-usage.md

Get usage statistics for a specific API key within a date range. Supports aggregation by different time periods and optional grouping by user, model, or custom fields.

---

## Update API Key Limit

Source: https://docs.requesty.ai/api-reference/endpoint/manage-apikey/manage-api-key-update-limit.md

Update the monthly spending limit for an API key.

---

## Update API Key Labels

Source: https://docs.requesty.ai/api-reference/endpoint/manage-apikey/manage-api-key-update-label.md

Update labels for an API key. Labels are key-value pairs that can be used for organization and filtering. Setting an empty object will remove all labels.

> **Tip:** Labels are useful for organizing and filtering API keys. Common use cases include tagging by environment (production, staging), team, or project.

---

## Update API Key Expiry

Source: https://docs.requesty.ai/api-reference/endpoint/manage-apikey/manage-api-key-update-expiry.md

Update the expiry date for an API key. If `expires_at` is not set or is null, the current expiry will be removed, making the API key non-expiring.

> **Tip:** Setting an expiry date helps ensure API keys are automatically invalidated after a certain time period, improving security for temporary access scenarios.

> **Warning:** You cannot update the expiry for an API key that has already expired. Once an API key is expired, it cannot be un-expired.

---

## Delete API Key

Source: https://docs.requesty.ai/api-reference/endpoint/manage-apikey/manage-api-key-delete.md

Delete an API key from your organization. This action cannot be undone.

> **Warning:** Deleting an API key is permanent and cannot be undone. All requests using this API key will fail immediately after deletion.

---

## List Groups

Source: https://docs.requesty.ai/api-reference/endpoint/manage-group/manage-group-list.md

List all groups in your organization.

---

## Create Group

Source: https://docs.requesty.ai/api-reference/endpoint/manage-group/manage-group-create.md

Create a new group in your organization.

---

## Get Group

Source: https://docs.requesty.ai/api-reference/endpoint/manage-group/manage-group-get.md

Get detailed information about a specific group including its members.

---

## Delete Group

Source: https://docs.requesty.ai/api-reference/endpoint/manage-group/manage-group-delete.md

Delete a group from your organization.

---

## Add Group Member

Source: https://docs.requesty.ai/api-reference/endpoint/manage-group-member/manage-group-member-add.md

Add a member to a group in your organization. The member will be assigned the specified role.

---

## Update Group Member

Source: https://docs.requesty.ai/api-reference/endpoint/manage-group-member/manage-group-member-update.md

Update the role of a member in a group.

---

## Remove Group Member

Source: https://docs.requesty.ai/api-reference/endpoint/manage-group-member/manage-group-member-remove.md

Remove a member from a group.

---

## Get Organization

Source: https://docs.requesty.ai/api-reference/endpoint/manage-org-get.md

Get information about your organization, including name and current balance.

---