
Switch from OpenAI in 2 lines

If you’re already using the OpenAI SDK, point it at Requesty and you’re done. No SDK changes, no new client to learn.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["REQUESTY_API_KEY"],           # was: OPENAI_API_KEY
    base_url="https://router.requesty.ai/v1",         # was: https://api.openai.com/v1
    default_headers={
        "HTTP-Referer": "https://yourapp.com",        # Optional – your site URL for analytics
        "X-Title": "My App",                          # Optional – your app name for analytics
    },
)
Every SDK and framework that speaks OpenAI (LangChain, Vercel AI SDK, LlamaIndex, Haystack, Pydantic AI) works with Requesty out of the box. Same for the Anthropic SDK against /anthropic/v1/messages.
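For reference, a minimal sketch of the same switch with the Anthropic SDK. The base URL stops at /anthropic because the SDK appends /v1/messages itself; the model ID here is an illustrative assumption, not a confirmed catalog name.

import os

from anthropic import Anthropic

# Point the Anthropic SDK at Requesty's Anthropic-compatible route.
# The SDK appends /v1/messages, so the base URL stops at /anthropic.
client = Anthropic(
    api_key=os.environ["REQUESTY_API_KEY"],
    base_url="https://router.requesty.ai/anthropic",
)

message = client.messages.create(
    model="anthropic/claude-3-5-sonnet",  # illustrative model ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(message.content[0].text)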

Three steps to your first request

1

Get your API key

Sign up at app.requesty.ai and create a key on the API Keys page. New accounts include free credits so you can start routing immediately. Export the key so the snippets below just work:
export REQUESTY_API_KEY="sk-..."
2

Install the SDK

pip install openai
3

Make your first request

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["REQUESTY_API_KEY"],
    base_url="https://router.requesty.ai/v1",
    default_headers={
        "HTTP-Referer": "https://yourapp.com",  # Optional – your site URL
        "X-Title": "My App",                    # Optional – your app name
    },
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)

print(response.choices[0].message.content)
Requesty returns an OpenAI-compatible response, plus a few extra response headers so you can see which provider served the request and whether it hit the cache.
Response body
{
  "id": "chatcmpl-9pV...",
  "object": "chat.completion",
  "created": 1738956032,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm an AI assistant made by OpenAI, served to you through Requesty."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 13, "completion_tokens": 17, "total_tokens": 30 }
}
Response headers
x-requesty-provider: openai
x-requesty-cache: MISS
x-requesty-latency-ms: 412
x-requesty-request-id: req_01HYZ...
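To read those headers programmatically, the OpenAI Python SDK's with_raw_response wrapper exposes them alongside the parsed body. A minimal sketch, assuming the client from step 3 and the header names shown above:

raw = client.chat.completions.with_raw_response.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)

# Requesty's routing metadata rides along as HTTP response headers.
print(raw.headers.get("x-requesty-provider"))  # e.g. "openai"
print(raw.headers.get("x-requesty-cache"))     # "HIT" or "MISS"

response = raw.parse()  # the usual ChatCompletion object
print(response.choices[0].message.content)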
The HTTP-Referer and X-Title headers are optional but recommended. Requesty uses them to tag your requests in analytics: HTTP-Referer identifies your site URL and X-Title gives your app a human-readable name. Both appear in your analytics dashboards so you can filter traffic by origin.
4

Make it production-ready (bonus)

Two upgrades turn this from a toy into something you can ship. Neither requires new infra.

Add metadata so every request is attributable in analytics. Tag by feature, user, or trace ID to slice spend and latency the way you already think about your product.

Route to a policy instead of a single model. Create a Fallback Policy once, then reference it by name. If the primary model errors or times out, Requesty tries the next one. No retry logic in your app.
response = client.chat.completions.create(
    model="policy/sonnet-with-fallback",  # set up once in the dashboard
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    extra_body={
        "requesty": {
            "tags": ["quickstart", "chat"],
            "user_id": "user_1234",
            "trace_id": "session_abc123",
            "extra": {
                "feature": "onboarding",
                "environment": "production",
            },
        },
    },
)
Learn more: Request Metadata · Fallback Policies · Load Balancing.

Pick a model

Every model lives behind one endpoint. Swap model in the request to switch providers. No other code changes, no new SDK, no new auth; the sketch after this list shows the swap in practice.

Frontier

Claude Opus, GPT-5, Gemini 2.5 Pro. Maximum capability for hard tasks.

Fast & cheap

Haiku, GPT-5 mini, Gemini Flash. Sub-second latency, pennies per million tokens.

Open source

Llama, Qwen, DeepSeek. Hosted or bring-your-own endpoints.
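A minimal sketch of the swap, reusing the client from step 3. The non-OpenAI model IDs below are illustrative assumptions; check the model catalog for exact slugs.

# Same client, same call shape: only the model string changes.
for model in [
    "openai/gpt-4o",                # from the example above
    "anthropic/claude-3-5-sonnet",  # illustrative ID
    "deepseek/deepseek-chat",       # illustrative ID
]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(model, "->", response.choices[0].message.content)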

What Requesty adds on top

Fallback routing

Auto-reroute failed requests to backup models. No more 5xx surprises.

Auto-caching

Cut costs up to 80% on repeated prompts with zero configuration.

Usage analytics

Track spend, latency, and errors per key, user, model, or project.

Load balancing

Distribute traffic across providers by cost, latency, or custom weights.

Bring your own keys

Use your own provider accounts and keep existing pricing.

Guardrails & RBAC

Content filtering, approved-model lists, and role-based access.

Use your favorite framework

LangChain

Vercel AI SDK

LlamaIndex

Haystack

Pydantic AI

Axios

Requests

Anthropic SDK

Common questions

Why use Requesty instead of calling providers directly?

Direct provider calls give you one model, one auth method, one point of failure. Requesty gives you one endpoint across 300+ models, automatic fallback, shared caching, unified analytics, and one bill. For teams running production AI, that’s the difference between a side project and an SLA.

Does Requesty mark up model prices?

No. You pay provider prices, and you can see exact per-request costs in the analytics dashboard. Use Bring Your Own Keys to keep provider discounts and committed-use pricing.

Does Requesty store or train on my data?

No. Requesty is a pass-through gateway. We don’t train on your requests or responses. See our data handling and EU routing for residency options.

Can I use the Anthropic SDK?

Point the Anthropic SDK at https://router.requesty.ai/anthropic/v1/messages. See the Anthropic Agent SDKs guide for code samples.

Next steps

API Reference

Full endpoint documentation with an interactive playground.

Use with Claude Code

Point Claude Code at Requesty for unified billing and routing.

Configure routing

Set up fallbacks, load balancing, and latency-aware routing.

Join the community

Ask questions, share builds, and meet the team on Discord.