Two new ways to control which models your teams can reach. Mark an access list as restricted so only org admins can attach it to groups or API keys, while group admins stay free to attach the rest. Set a group default access list that applies to any group without its own explicit list, giving every team a baseline without per-group setup. The resolution order now runs through four layers: API key list, then group list, then the group default, then your organization’s approved models.Access Lists docs →
Spending alert webhooks now include the resource ID alongside the name in every payload. User, group, and API key alerts carry
user_id, group_id, and api_key_id fields, and the Slack, Teams, and JSON formats all show the ID next to the resource name. No more guessing which “Production” key or “Engineering” group an alert refers to when names collide or change.Spending Alerts docs →Organizations that need a hard data residency guarantee can now enforce EU routing server-side. With strict EU enforcement enabled, every request from your organization must go through
https://router.eu.requesty.ai, and requests to non-EU endpoints are rejected. Contact [email protected] to activate it for your organization.EU Routing docs →The Requesty n8n package (
@requesty/n8n-nodes-requesty v1.1.0) now includes a Requesty Image Generation node. Generate images from text prompts with models like azure/openai/gpt-image-1, with control over size, quality, background (including transparent), and output format. Get binary image data with a preview in the n8n output panel, or URLs for downstream use. The node also works as an AI Agent tool, so an agent can generate images on its own when a user asks.Set up n8n →Four models are now free to use through the gateway:
nvidia/nemotron-3-ultra-550b-a55b, nvidia/nemotron-3-super-120b-a12b, poolside/laguna-xs.2, and poolside/laguna-m.1. New organizations get 50 free requests per day and paying organizations get 200, shared across all free models. Free for now, any pricing change will be announced here first.Free Models docs →Automate member offboarding alongside your existing API-based onboarding. Call
DELETE /v1/manage/org/member/{user_id} with a write-manage API key to remove a member from your organization: they lose access to the organization, all API keys they created in it are invalidated, and they are removed from all of its groups. Look up user IDs with the List Organization Members endpoint.Delete Organization Member reference →Models now carry a
retires_at date so you can plan migrations before a model goes away. The List Models response includes the field for any model with a scheduled retirement, and the date is shown on each model card in the Model Library. Once a model’s retires_at date passes, requests to it fail, so switch to a successor or add a fallback policy ahead of time.List Models reference →Enable real-time web search with a single tool definition that works across the Messages, Chat Completions, and Responses APIs. Pass
{ "type": "web_search" } and Requesty translates it to each provider’s native web search format automatically, normalizing citations and results behind one interface. Look for supports_web_search: true in the List Models response to find compatible models.Web Search docs →Get guaranteed, schema-valid JSON from supported models. Set
response_format to { "type": "json_schema" } with your schema and the model output conforms to it exactly. The List Models response now includes a supports_json_schema flag so you can find compatible models programmatically.Structured Outputs docs →Build AI workflows in n8n with 300+ models through Requesty. Install the Requesty Chat Model community node (
@requesty/n8n-nodes-requesty) and drop it into any AI Agent or Basic LLM Chain. Get strict JSON Schema structured output, native web search, reasoning effort control, and routing policies, all behind one API key.Set up n8n →Use GitHub Copilot Chat in VS Code with 300+ models through Requesty. Add Requesty as a Bring Your Own Key (BYOK) Custom Endpoint provider to get model routing, cost tracking, and fallback policies for Copilot Chat, tools, and MCP servers. Requires VS Code 1.122+.Set up GitHub Copilot →
A comprehensive reference for every router error code — what it means, where it originates, and how to fix it. Plus, you can now group by
status_code in Advanced Analytics to track error trends and costs in your dashboards.Error Codes docs →Pick an exact start and end date for any analytics view. Use relative presets or switch to Date Range mode for full control over the time window.Usage Analytics docs →
Inspect every LLM request in a searchable table with click-to-filter, configurable columns, and pagination. Switch to Traces mode to see multi-step agent runs grouped by
trace_id. Click any row to open a detail panel with the full message timeline, tool call arguments, metadata, guardrail violations, and a model arena for side-by-side comparison.Logs & Traces docs →Download your advanced analytics data as a professional PDF report or CSV spreadsheet. The PDF includes your Requesty logo, date range, and a formatted data table, perfect for sharing cost reports with your team or keeping monthly records.Analytics Exports docs →
Use OpenAI Codex with 300+ models through Requesty. Get model routing, cost tracking, and fallback policies for your Codex coding agent.Set up Codex →
The Get API Key and List API Keys endpoints now include a
group field showing which group each key belongs to, making it easier to manage keys programmatically.API Keys reference →Route requests through the EU endpoint with the newest Gemini models:
vertex/gemini-3.5-flash@eu and vertex/gemini-3.1-flash-lite@eu. Full data residency when combined with the EU endpoint.EU routing docs →Lock down access with Entra ID (Azure AD), Okta, or any OIDC/SAML provider. Members authenticate through your identity provider and land directly in Requesty.Set up SSO →
Create named model allow-lists and attach them to individual API keys or groups. Control exactly which models each key can call without touching your org-wide settings.Create an access list →
Route OpenAI
/v1/responses calls through the gateway with full analytics, fallback, and cost tracking. Custom tool types are supported.Responses API reference →API responses now include a
usage.cost field with the exact dollar amount. For streaming, set stream_options.include_usage to get cost on the final chunk.Cost tracking docs →A new Management API endpoint returns aggregated spend and token counts across your entire org, with the same time filters available on key-level usage.Org Usage API →
The
/v1/models endpoint now returns geolocation data for each model. The Model Library shows EU/US region chips so you can pick the right model before routing.EU routing docs →Connect the Pi coding agent for model routing, cost tracking, and fallback policies across your coding workflows.Set up Pi →
Context length overflows, unsupported image formats, and other provider errors are now translated into plain, actionable messages instead of generic errors.
Two new endpoints:
/v1/audio/speech for text-to-speech and /v1/audio/transcriptions for speech-to-text. Multiple providers including OpenAI and Mistral with automatic fallback.Speech API → · Transcription API →Send image edit requests through
/v1/images/edits with the same multi-provider routing and fallback as generation.Image Edits API →Pass
HTTP-Referer and X-Title headers to label requests by app or site. Filter your analytics dashboard by these values to see cost and latency per integration.Analytics headers docs →Set dollar thresholds on API keys and receive Slack or Microsoft Teams webhooks when spend crosses them. Configurable trigger percentages give you time to act.Configure alerts →
Use Requesty as the backend for the Claude Cowork desktop assistant. All traffic gets unified analytics, cost controls, and model policies.Set up Claude Cowork →
Route OpenCode terminal agent traffic through Requesty. A one-line installer adds analytics tracking to your setup.Set up OpenCode →
The admin panel now lets you set each guardrail policy to Disabled, Report, or Mask individually. A new violations column and detail tab in logs shows exactly what fired.Guardrails docs →
JSON Schema and json_object modes are now available on
/v1/responses, matching the Chat Completions feature set.Structured outputs docs →Azure deployments across European regions are now automatically recognized under the EU filter in the Model Library and routing engine.EU routing docs →
Provider grouping with expand/collapse, region and capability filters, bulk approve/remove, preset quick-filters, and a “New” tab that surfaces recently released models per provider.Manage approved models →
Pick two models, send the same prompt, and see which responds better. The redesigned chat playground also supports image attachments and markdown rendering.
Expandable table rows now show each service account’s API keys, monthly spend, and creator at a glance.Service accounts docs →
The gateway extracts and formats PDF content across providers that support document input. Just include the file in your chat completions request.PDF support docs →
Dedicated model aliases like
coding/ select the right model, provider, and parameters for your workload automatically.Dedicated models docs →Generate embeddings with Google’s latest Gemini Embedding 2.0 model through Requesty, with automatic provider selection.
The analytics dashboard now supports P95 and P99 latency percentiles, pivot tables for multi-dimensional breakdowns, and flexible time ranges including This Week, Month, Quarter, and Year.Usage analytics → · Performance monitoring →
The models list now shows which models support tool calling. Use this to filter models by capability before routing or to build smarter model selection.
Route OpenClaw autonomous agent workloads through Requesty for unified analytics and cost controls.Set up OpenClaw →
Groups can now have their own approved model list, independent of the org-wide setting. Regional model approval handles providers with location-specific deployments correctly.Groups docs →
Set dollar thresholds on your organization and receive Slack notifications when spend crosses them.Alerts docs →
The API keys table is easier to scan with cleaner columns, hover tooltips for long values, and one-click copy. The same improved layout appears in both admin and user views.
Select any two requests in the logs table and open a JSON diff viewer. Added, modified, and removed fields are highlighted with one-click filtering.
Filter the traces page by trace ID, API key name, or user email. Cached percentage is now visible per trace.Session reconstruction docs →
Groups now show spend percentage and budget overrides directly in the table. Admins can adjust limits without navigating away.Groups docs →
Org admins can change member roles at both the organization and group level. Safety checks prevent admins from accidentally demoting themselves.Users and roles docs →
A new column shows how many reasoning tokens each request consumed, giving visibility into model “thinking” costs.
Scope cost, latency, and usage breakdowns to specific API key labels for more targeted reporting.Usage analytics →