Requesty Router
The Requesty Router provides a universal interface to multiple LLM providers. It includes many additional benefits out of the box, such as logging, analytics, security, and function calling.
Requesty Router acts as a universal interface to multiple LLM providers, similar to traditional LLM routers, but comes with substantial additional value:
Analytics & Logging: Every request and response is tracked, and insights can be sent to a dedicated endpoint for usage analytics, latency measurements, and performance tuning.
Auto-Tagging: Automatically tags requests with metadata (such as function calls used, latencies, topics, tone of voice, and more) to help with observability and future optimization.
Function Calling & Tool Use: Similar to OpenAI’s function calling and OpenRouter’s tool interface, our router supports tool invocation to augment LLM capabilities.
All of this is done through a simple, OpenAI-compatible API structure. If you're already using the OpenAI Python SDK or curl, you can integrate our router by just changing the base_url and providing your Requesty API key.
API Key: Sign up at https://app.requesty.ai/sign-up and create an API key at https://app.requesty.ai/insight-api.
Once you have your API key, change the base_url of your OpenAI client to https://router.requesty.ai/v1.
This endpoint adheres to a request structure similar to the OpenAI Chat Completions API, but with extended features.
You can use the standard OpenAI Python client by simply changing the base_url and passing your Requesty API key:
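For example, here is a minimal setup sketch using the official openai Python package (v1.x); the model ID below is a placeholder, so substitute any model from the supported-models list.

```python
# Minimal sketch: point the official OpenAI SDK (v1.x) at the Requesty Router.
from openai import OpenAI

client = OpenAI(
    api_key="<REQUESTY_API_KEY>",               # key from app.requesty.ai/insight-api
    base_url="https://router.requesty.ai/v1",   # route requests through Requesty
)

response = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID; pick one from the supported-models list
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what an LLM router does in one sentence."},
    ],
)
print(response.choices[0].message.content)
```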
This makes a request to the router, which proxies it to the appropriate model provider and returns a completion response in OpenAI format. You can also add optional headers (such as HTTP-Referer and X-Title) to help with analytics and app discoverability.
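A sketch of passing those attribution headers per request via the SDK's extra_headers parameter, reusing the client configured above; the header values are illustrative.

```python
# Sketch: optional attribution headers sent per request via extra_headers.
response = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "HTTP-Referer": "https://your-app.example.com",  # your app's URL
        "X-Title": "Your App Name",                      # a human-readable app name
    },
)
```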
Your request body to /v1/chat/completions closely follows the OpenAI Chat Completion schema (a combined example is sketched after the field list below):
Required Fields:
messages: An array of message objects with role and content. Roles can be user, assistant, system, or tool.
model: The model name. If omitted, defaults to the user's or payer's default model. Here is a full list of the supported models.
Optional Fields:
prompt: Alternative to messages for some providers.
stream: A boolean to enable Server-Sent Events (SSE) streaming responses.
max_tokens, temperature, top_p, etc.: Standard language model parameters.
tools / functions: Allows function calling with a schema defined. See OpenAI's function calling documentation for the structure of these requests.
tool_choice: Specifies how tool calling should be handled.
response_format: For structured responses (some models only).
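As a sketch, a request combining several of these fields might look like the following (reusing the client from the setup above; the model ID and values are illustrative).

```python
# Sketch: a request body exercising several optional fields.
response = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise SSE streaming in two sentences."},
    ],
    max_tokens=200,     # standard language model parameters pass through unchanged
    temperature=0.3,
    top_p=0.9,
    stream=False,       # set to True for SSE streaming (see the streaming section)
)
```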
Security Level (optional):
You can include a custom header X-Security-Level to enforce content security validation. Levels include none, basic, advanced, enterprise, etc. This ensures that your prompts and responses adhere to domain-specific compliance and security checks. More information about security can be found at the end of this document.
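A sketch of setting that header with the Python SDK, assuming the level names listed above.

```python
# Sketch: enforce a content security level via the custom header.
response = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[{"role": "user", "content": "Draft a reply to a customer complaint."}],
    extra_headers={"X-Security-Level": "advanced"},  # one of the levels listed above
)
```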
Here, we also provide a tool (get_current_weather) that the model can call if it decides the user request involves weather data.
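A sketch of how such a tool might be declared using OpenAI's tool schema; the parameter shape for get_current_weather is illustrative, not a fixed contract.

```python
# Sketch: declare a get_current_weather tool in OpenAI's tool schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g. 'Paris, France'",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)
```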
Some request fields require a different client method. For example, if you use response_format you'll need to call client.beta.chat.completions.parse instead, and you may want to use Pydantic (Python) or Zod (TypeScript) to define your structure.
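A sketch of a structured-output request using the SDK's beta parse helper with a Pydantic model; WeatherReport is a hypothetical schema, and availability depends on the model.

```python
# Sketch: structured output via the SDK's beta parse helper and a Pydantic model.
from pydantic import BaseModel

class WeatherReport(BaseModel):  # hypothetical response schema
    location: str
    temperature_c: float
    summary: str

parsed = client.beta.chat.completions.parse(
    model="openai/gpt-4o",  # placeholder model ID; must support structured outputs
    messages=[{"role": "user", "content": "Give me a short weather report for Paris."}],
    response_format=WeatherReport,
)
print(parsed.choices[0].message.parsed)  # a WeatherReport instance
```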
The response is normalized to an OpenAI-style ChatCompletion object:
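As a sketch, these are the fields you would typically read off that object (names follow the standard OpenAI schema).

```python
# Sketch: reading the normalized ChatCompletion object.
completion = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[{"role": "user", "content": "Say hello."}],
)

print(completion.id)                          # e.g. "chatcmpl-..."
print(completion.model)                       # model that served the request
print(completion.choices[0].message.role)     # "assistant"
print(completion.choices[0].message.content)  # generated text
print(completion.choices[0].finish_reason)    # "stop", "length", "tool_calls", ...
print(completion.usage.total_tokens)          # prompt + completion tokens
```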
Streaming: If stream: true, responses arrive incrementally as SSE events with data: lines.
Function Calls (Tool Calls): If the model decides to call a tool, it will return a function_call in the assistant message. You then execute the tool, append the tool's result as a role: "tool" message, and send a follow-up request. The LLM will then integrate the tool output into its final answer.
If the model decides it needs the weather tool:
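Continuing the weather example above, here is a sketch of detecting the tool call in the assistant message; depending on the model, the call may surface as tool_calls (newer schema) rather than the legacy function_call field.

```python
# Sketch: inspect the assistant message for a tool call.
message = response.choices[0].message
if message.tool_calls:              # newer schema; older models may use function_call
    call = message.tool_calls[0]
    print(call.function.name)       # "get_current_weather"
    print(call.function.arguments)  # e.g. '{"location": "Paris, France"}'
```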
You would then call the get_current_weather function externally, get the result, and send it back as:
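A sketch of that follow-up request, where get_current_weather is your own function and the message shapes follow the standard OpenAI tool-calling flow.

```python
# Sketch: run the tool yourself, then send its result back as a "tool" message.
import json

args = json.loads(call.function.arguments)
tool_result = get_current_weather(**args)   # your own implementation of the tool

follow_up = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[
        {"role": "user", "content": "What's the weather in Paris right now?"},
        message,  # the assistant message that contains the tool call
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(tool_result),
        },
    ],
    tools=tools,
)
print(follow_up.choices[0].message.content)
```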
The next completion will return a final answer integrating the tool’s response.
The router supports streaming responses from all providers (OpenAI, Anthropic, Mistral) using Server-Sent Events (SSE). Streaming allows you to receive and process the response token by token instead of waiting for the complete response.
Enable streaming by setting stream=True in your request.
The response will be a stream of chunks that you need to iterate over.
Each chunk contains a delta of the response in the same format as the OpenAI API.
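A minimal streaming sketch, reusing the client configured earlier; the model ID is a placeholder.

```python
# Sketch: stream tokens as they arrive and print them immediately.
stream = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[{"role": "user", "content": "Write a haiku about routers."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:          # some chunks (e.g. usage-only) carry no choices
        continue
    delta = chunk.choices[0].delta
    if delta.content is not None:  # content can be None on some chunks
        print(delta.content, end="", flush=True)
print()
```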
Important Notes
Content Access:
Always check that delta.content is not None before using it.
Content comes in small chunks that you may want to collect into a full response.
Function Calls:
Function calls are also streamed and come through the delta.function_call property.
Check for both the name and the arguments, as they might come in separate chunks.
Error Handling:
Wrap streaming code in try/except to handle potential connection issues.
The stream might end early if there are errors.
Best Practices:
Use flush=True when printing to see output immediately.
Consider collecting chunks if you need the complete response.
For production, implement proper error handling and retry logic. A combined sketch applying these notes follows.
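Here is a sketch that applies these notes together, with None-checks, chunk collection, and a try/except around the stream.

```python
# Sketch: collect the full response and guard against interrupted streams.
collected = []
try:
    stream = client.chat.completions.create(
        model="openai/gpt-4o",  # placeholder model ID
        messages=[{"role": "user", "content": "Explain SSE in one paragraph."}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content is not None:
            collected.append(delta.content)
            print(delta.content, end="", flush=True)
except Exception as exc:  # the stream may end early on connection errors
    print(f"\nStream interrupted: {exc}")

full_response = "".join(collected)
```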
Every request and response is logged, along with metadata such as latencies. You can attach additional information such as user_id, location, or any other custom metadata by sending it in the meta field (a sketch follows below), enabling you to:
Measure end-to-end latency.
Track usage and cost.
Inspect tool calls and security violations.
Optimize prompt design based on user behavior.
No additional configuration is needed if you’re okay with default behavior.
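If you do want to attach custom metadata, here is a sketch; because meta is not part of the standard OpenAI schema, it is passed here via the SDK's extra_body parameter (an assumption about the transport), and the keys shown are illustrative.

```python
# Sketch: attach custom metadata in a "meta" field via the SDK's extra_body
# (assumption about how the field is transported); keys are illustrative.
response = client.chat.completions.create(
    model="openai/gpt-4o",  # placeholder model ID
    messages=[{"role": "user", "content": "Hi!"}],
    extra_body={
        "meta": {
            "user_id": "user-1234",
            "location": "Berlin, DE",
            "experiment": "onboarding-v2",
        }
    },
)
```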
Our router provides a familiar, OpenAI-like interface enriched with analytics, safety, and advanced routing. With minimal changes (just switch the base_url and provide the right API keys and headers), you can leverage multiple LLM providers, function calling, logging, auto-tagging, and security checks.
Key Points:
Drop-in replacement for OpenAI endpoints.
Integrate tool calls easily.
Receive detailed analytics for every request.
Use streaming SSE responses for real-time token generation.
Enjoy multimodal and structured response format support.
Try it out, explore advanced settings, and build secure, observable, and powerful LLM-driven applications with ease!