Changelog - Requesty Docs

July 10, 2026

New

Prompt Library response formats for reusable structured outputs

Prompts can now carry their own response format. In the Prompt Library, use Response Format to set free-form JSON output or a named JSON Schema with optional description and strict mode. Requests that reference the prompt with prompt_id inherit the saved response_format automatically, and prompt-level response formats override caller-provided response formats. The editor’s JSON view and version sidebar now surface response format settings alongside model parameters. Prompt Library docs →
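The precedence rule can be pictured with a small local model. This is only a sketch of the behavior described in this entry, not gateway code: the function name and the sample formats are hypothetical, and the nested json_schema layout follows the common OpenAI-style convention, which is an assumption here.

```python
from typing import Optional


def effective_response_format(prompt_format: Optional[dict],
                              caller_format: Optional[dict]) -> Optional[dict]:
    """Return the format that applies to a request.

    Models the rule stated in this entry: a response format saved on the
    prompt overrides one supplied by the caller. Hypothetical helper.
    """
    if prompt_format is not None:
        return prompt_format
    return caller_format


# Sample values (assumed shapes, for illustration only).
saved_on_prompt = {
    "type": "json_schema",
    "json_schema": {"name": "ticket", "strict": True, "schema": {"type": "object"}},
}
sent_by_caller = {"type": "json_object"}
```

With both formats present the saved one applies; the caller's format is used only when the prompt has none.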

July 3, 2026

New

Prompt Library with model parameters, versioning, and diffs

The Prompt Library now supports model parameters, version diffs, and a redesigned editor. Attach temperature, reasoning effort, max tokens, top_p, and n directly to a prompt so every request inherits them automatically. Compare any two versions with a line-by-line diff viewer that highlights added, removed, and modified messages. The new prompts list includes search, sortable columns, tags, and copyable prompt IDs. Reference any prompt in your API requests with a single prompt_id and optional prompt_variables for runtime customization. Prompt Library docs →
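A request that references a saved prompt might be assembled as follows. This is a sketch under assumptions: placing prompt_id and prompt_variables at the top level of the request body is a guess, and the helper name and sample values are hypothetical. Check the Prompt Library docs for the exact shape.

```python
from typing import Mapping, Optional


def build_prompt_request(prompt_id: str,
                         variables: Optional[Mapping[str, str]] = None,
                         model: Optional[str] = None) -> dict:
    """Build a request body that references a saved prompt (assumed layout)."""
    body: dict = {"prompt_id": prompt_id}
    if variables:
        # Runtime values substituted into the saved prompt.
        body["prompt_variables"] = dict(variables)
    if model:
        # Whether a model must accompany prompt_id is not stated in this entry.
        body["model"] = model
    return body


example_body = build_prompt_request("prompt_123", {"customer": "Ada"})
```

Temperature, max tokens, and the other saved parameters are left out on purpose, since the prompt supplies them.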

June 23, 2026

API

Access list and group member management endpoints in the Management API

Full programmatic control over access lists and group member settings is now available in the Management API. Create, read, update, and delete access lists, set the organization’s default group access list, update a member’s role within a group, and set per-member budget overrides — all via API key with manage permissions.

June 30, 2026

Integrations

Hermes Agent integration with automatic prompt caching

Hermes Agent by Nous Research now integrates with Requesty out of the box. Configure Hermes to route through the gateway using the native Anthropic Messages format, which enables automatic prompt caching on multi-turn conversations. Hermes sends a large system prompt with tool definitions on every turn, and Requesty’s auto-caching means subsequent turns reuse the cached prefix instead of reprocessing it from scratch. One config file gets you 300+ models, fallback routing, and cost tracking across all your Hermes sessions. Set up Hermes →

June 12, 2026

New

Restricted access lists and a group default access list

Two new ways to control which models your teams can reach. Mark an access list as restricted so only org admins can attach it to groups or API keys, while group admins stay free to attach the rest. Set a group default access list that applies to any group without its own explicit list, giving every team a baseline without per-group setup. The resolution order now runs through four layers: API key list, then group list, then the group default, then your organization’s approved models. Access Lists docs →
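The four-layer order can be sketched as a first-match lookup. This is an illustration, not the gateway's implementation: it assumes the first configured layer wins outright, whereas the real resolver may also intersect with the organization's approved models. The function and argument names are hypothetical.

```python
from typing import Optional, Sequence


def resolve_allowed_models(api_key_list: Optional[Sequence[str]],
                           group_list: Optional[Sequence[str]],
                           group_default: Optional[Sequence[str]],
                           org_approved: Sequence[str]) -> list:
    """Walk the layers in the documented order and return the first one set."""
    for layer in (api_key_list, group_list, group_default):
        if layer is not None:
            return list(layer)
    # Nothing more specific is configured: fall back to org-approved models.
    return list(org_approved)
```

For a key with no list in a group with no list, the group default applies; if that is also unset, the organization's approved models do.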

June 12, 2026

New

Resource IDs in spending alert webhooks

Spending alert webhooks now include the resource ID alongside the name in every payload. User, group, and API key alerts carry user_id, group_id, and api_key_id fields, and the Slack, Teams, and JSON formats all show the ID next to the resource name. No more guessing which “Production” key or “Engineering” group an alert refers to when names collide or change. Spending Alerts docs →
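A receiver for the JSON format could key on the new ID fields rather than the display name. The sketch below uses only the three field names given in this entry. The rest of the payload shape, the helper name, and the sample values are assumptions.

```python
from typing import Optional, Tuple

# Field names come from this entry; the mapping to a "kind" label is ours.
ID_FIELDS = {"user_id": "user", "group_id": "group", "api_key_id": "api_key"}


def alert_resource(payload: dict) -> Optional[Tuple[str, str]]:
    """Return (kind, id) for the first resource ID found in the payload."""
    for field, kind in ID_FIELDS.items():
        value = payload.get(field)
        if value:
            return kind, value
    return None


# Hypothetical payload: two keys may share the name, but not the ID.
sample_alert = {"name": "Production", "api_key_id": "key_abc"}
```

Keying on the ID keeps alert routing stable when a key or group is renamed.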

June 12, 2026

New

Strict EU enforcement for organizations

Organizations that need a hard data residency guarantee can now enforce EU routing server-side. With strict EU enforcement enabled, every request from your organization must go through https://router.eu.requesty.ai, and requests to non-EU endpoints are rejected. Contact [email protected] to activate it for your organization. EU Routing docs →
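Because enforcement rejects non-EU traffic, a client-side guard can catch a misconfigured base URL before any request is sent. This is a minimal sketch: only the EU host name comes from this entry, and the helper name is hypothetical.

```python
from urllib.parse import urlparse

EU_HOST = "router.eu.requesty.ai"  # host taken from this changelog entry


def is_eu_endpoint(url: str) -> bool:
    """True if the URL uses HTTPS and targets the EU router host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname == EU_HOST
```

A startup check such as `assert is_eu_endpoint(base_url)` fails fast instead of surfacing as rejected requests later.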

June 10, 2026

Integrations

Image generation in the n8n community node

The Requesty n8n package (@requesty/n8n-nodes-requesty v1.1.0) now includes a Requesty Image Generation node. Generate images from text prompts with models like azure/openai/gpt-image-1, with control over size, quality, background (including transparent), and output format. Get binary image data with a preview in the n8n output panel, or URLs for downstream use. The node also works as an AI Agent tool, so an agent can generate images on its own when a user asks. Set up n8n →

June 10, 2026

New

Free models on the gateway

Four models are now free to use through the gateway: nvidia/nemotron-3-ultra-550b-a55b, nvidia/nemotron-3-super-120b-a12b, poolside/laguna-xs.2, and poolside/laguna-m.1. New organizations get 50 free requests per day and paying organizations get 200, shared across all free models. These models are free for now; any pricing change will be announced here first. Free Models docs →
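Since the daily allowance is shared across all free models, usage has to be summed before comparing against the limit. The sketch below uses the two limits from this entry. Treating every non-paying organization as "new" is an assumption, and the helper names are hypothetical.

```python
from typing import Mapping

FREE_DAILY_LIMIT_NEW = 50      # from this entry
FREE_DAILY_LIMIT_PAYING = 200  # from this entry


def remaining_free_requests(used_today: Mapping[str, int], is_paying: bool) -> int:
    """Requests left today, counting usage across all free models together."""
    limit = FREE_DAILY_LIMIT_PAYING if is_paying else FREE_DAILY_LIMIT_NEW
    return max(0, limit - sum(used_today.values()))


usage_today = {"poolside/laguna-m.1": 30, "poolside/laguna-xs.2": 15}
```

With 45 requests already spread over two models, a new organization has 5 left and a paying one has 155.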

June 10, 2026

New · API

Member offboarding via the Management API

Automate member offboarding alongside your existing API-based onboarding. Call DELETE /v1/manage/org/member/{user_id} with a write-manage API key to remove a member from your organization: they lose access to the organization, all API keys they created in it are invalidated, and they are removed from all of its groups. Look up user IDs with the List Organization Members endpoint. Delete Organization Member reference →
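The call can be prepared with the Python standard library. In this sketch the request is built but not sent. The path and method come from this entry; the base URL is a placeholder you must supply, and the Bearer authorization header is an assumption about how the write-manage key is passed.

```python
import urllib.request
from urllib.parse import quote


def build_offboard_request(base_url: str, user_id: str,
                           api_key: str) -> urllib.request.Request:
    """Prepare DELETE /v1/manage/org/member/{user_id} without sending it."""
    # Percent-encode the ID so it cannot alter the path.
    path = f"/v1/manage/org/member/{quote(user_id, safe='')}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )


# Placeholder host and credentials, for illustration only.
req = build_offboard_request("https://management.example.invalid", "user_42", "KEY")
```

Passing `req` to `urllib.request.urlopen` would execute the removal, which invalidates the member's keys, so run it only against an account you intend to offboard.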

June 9, 2026

API

Model retirement dates in the API

Models now carry a retires_at date so you can plan migrations before a model goes away. The List Models response includes the field for any model with a scheduled retirement, and the date is shown on each model card in the Model Library. Once a model’s retires_at date passes, requests to it fail, so switch to a successor or add a fallback policy ahead of time. List Models reference →
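A periodic check over the List Models response can flag upcoming retirements. The sketch assumes retires_at is an ISO 8601 string beginning with YYYY-MM-DD and that each entry has an id field. Both are assumptions, as are the helper name and the sample model IDs.

```python
from datetime import date, timedelta
from typing import Iterable, List


def retiring_within(models: Iterable[dict], days: int, today: date) -> List[str]:
    """IDs of models whose retires_at falls on or before today + days."""
    cutoff = today + timedelta(days=days)
    flagged = []
    for model in models:
        raw = model.get("retires_at")
        if not raw:
            continue  # no scheduled retirement, so the field is absent
        if date.fromisoformat(raw[:10]) <= cutoff:
            flagged.append(model["id"])
    return flagged


# Hypothetical entries shaped like a List Models response.
sample_models = [
    {"id": "provider/old-model", "retires_at": "2026-07-01"},
    {"id": "provider/current-model"},
    {"id": "provider/later-model", "retires_at": "2027-01-01T00:00:00Z"},
]
```

Running this on a schedule with a 30-day window leaves time to switch models or add a fallback policy before requests start failing.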

June 8, 2026

New

Unified web search across all inference endpoints

Enable real-time web search with a single tool definition that works across the Messages, Chat Completions, and Responses APIs. Pass { "type": "web_search" } and Requesty translates it to each provider’s native web search format automatically, normalizing citations and results behind one interface. Look for supports_web_search: true in the List Models response to find compatible models. Web Search docs →
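Adding the tool and finding compatible models are both small operations on plain dictionaries. The tool definition and the supports_web_search flag come from this entry. Carrying the tool in a top-level tools array, the id field on model entries, and the helper names are assumptions.

```python
from typing import Iterable, List

WEB_SEARCH_TOOL = {"type": "web_search"}  # definition given in this entry


def with_web_search(body: dict) -> dict:
    """Return a copy of the request body with the web search tool added once."""
    tools = list(body.get("tools", []))
    if WEB_SEARCH_TOOL not in tools:
        tools.append(dict(WEB_SEARCH_TOOL))
    return {**body, "tools": tools}


def web_search_models(models: Iterable[dict]) -> List[str]:
    """IDs of models that advertise supports_web_search: true."""
    return [m["id"] for m in models if m.get("supports_web_search") is True]
```

The same body works unchanged whichever provider serves the request, since translation to the native format happens in the gateway.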

June 6, 2026

New

Strict JSON Schema structured outputs

Get guaranteed, schema-valid JSON from supported models. Set response_format to { "type": "json_schema" } with your schema and the model output conforms to it exactly. The List Models response now includes a supports_json_schema flag so you can find compatible models programmatically. Structured Outputs docs →
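A response_format value might be built like this. Only the "type": "json_schema" discriminator appears in this entry. The nested json_schema object with name, schema, strict, and description follows the widely used OpenAI-style layout and is an assumption, as is the helper name.

```python
from typing import Optional


def json_schema_format(name: str, schema: dict, strict: bool = True,
                       description: Optional[str] = None) -> dict:
    """Build a response_format value for schema-constrained output (assumed layout)."""
    spec: dict = {"name": name, "schema": schema, "strict": strict}
    if description is not None:
        spec["description"] = description
    return {"type": "json_schema", "json_schema": spec}


# Illustrative schema for a support ticket.
ticket_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "priority": {"type": "integer"}},
    "required": ["title", "priority"],
    "additionalProperties": False,
}
ticket_format = json_schema_format("ticket", ticket_schema)
```

Checking supports_json_schema in the List Models response before sending avoids relying on a model that cannot honor the schema.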

June 4, 2026

Integrations

Requesty community node for n8n

Build AI workflows in n8n with 300+ models through Requesty. Install the Requesty Chat Model community node (@requesty/n8n-nodes-requesty) and drop it into any AI Agent or Basic LLM Chain. Get strict JSON Schema structured output, native web search, reasoning effort control, and routing policies, all behind one API key. Set up n8n →

May 31, 2026

Integrations

Connect GitHub Copilot to Requesty

Use GitHub Copilot Chat in VS Code with 300+ models through Requesty. Add Requesty as a Bring Your Own Key (BYOK) Custom Endpoint provider to get model routing, cost tracking, and fallback policies for Copilot Chat, tools, and MCP servers. Requires VS Code 1.122+. Set up GitHub Copilot →

May 28, 2026

New · Analytics

Error codes reference page

A comprehensive reference for every router error code: what it means, where it originates, and how to fix it. You can also now group by status_code in Advanced Analytics to track error trends and costs in your dashboards. Error Codes docs →

May 27, 2026

Analytics

Custom time ranges in analytics

Pick an exact start and end date for any analytics view. Use relative presets or switch to Date Range mode for full control over the time window. Usage Analytics docs →

May 26, 2026

New

Logs & Traces view for full request visibility

Inspect every LLM request in a searchable table with click-to-filter, configurable columns, and pagination. Switch to Traces mode to see multi-step agent runs grouped by trace_id. Click any row to open a detail panel with the full message timeline, tool call arguments, metadata, guardrail violations, and a model arena for side-by-side comparison. Logs & Traces docs →

May 22, 2026

Analytics

Export analytics as PDF or CSV

Download your advanced analytics data as a professional PDF report or CSV spreadsheet. The PDF includes your Requesty logo, date range, and a formatted data table, perfect for sharing cost reports with your team or keeping monthly records. Analytics Exports docs →

May 21, 2026

Integrations

Connect OpenAI Codex to Requesty

Use OpenAI Codex with 300+ models through Requesty. Get model routing, cost tracking, and fallback policies for your Codex coding agent. Set up Codex →

May 20, 2026

API

API keys now return group information

The Get API Key and List API Keys endpoints now include a group field showing which group each key belongs to, making it easier to manage keys programmatically. API Keys reference →

May 19, 2026

New

EU routing with Gemini models

Route requests through the EU endpoint with the newest Gemini models: vertex/gemini-3.5-flash@eu and vertex/gemini-3.1-flash-lite@eu. Combined with the EU endpoint, these models give you full data residency. EU routing docs →

May 16, 2026

Security

Enforce SSO for your organization

Lock down access with Entra ID (Azure AD), Okta, or any OIDC/SAML provider. Members authenticate through your identity provider and land directly in Requesty. Set up SSO →

May 14, 2026

Security

Restrict models per API key with Access Lists

Create named model allow-lists and attach them to individual API keys or groups. Control exactly which models each key can call without touching your org-wide settings. Create an access list →

May 12, 2026

API

Use the Responses API through Requesty

Route OpenAI /v1/responses calls through the gateway with full analytics, fallback, and cost tracking. Custom tool types are supported. Responses API reference →

May 9, 2026

API

See the cost of every request inline

API responses now include a usage.cost field with the exact dollar amount. For streaming, set stream_options.include_usage to get cost on the final chunk. Cost tracking docs →
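Reading the cost works the same way for both modes once responses are parsed into dictionaries. The usage.cost field and the stream_options.include_usage setting come from this entry. The helper names and the sample chunk shapes are assumptions.

```python
from typing import Iterable, Optional


def response_cost(response: dict) -> Optional[float]:
    """Dollar cost of a non-streaming response, or None if it is absent."""
    usage = response.get("usage") or {}
    return usage.get("cost")


def stream_cost(chunks: Iterable[dict]) -> Optional[float]:
    """Dollar cost from a stream; expected on the final chunk carrying usage."""
    cost = None
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage and "cost" in usage:
            cost = usage["cost"]
    return cost


# Request-side setting named in this entry; without it no usage chunk arrives.
streaming_options = {"stream": True, "stream_options": {"include_usage": True}}
```

Scanning every chunk rather than only the last one keeps the helper correct if trailing chunks without usage ever follow.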

May 7, 2026

API

Query organization-level usage

A new Management API endpoint returns aggregated spend and token counts across your entire org, with the same time filters available on key-level usage. Org Usage API →

May 5, 2026

API

Filter models by deployment region

The /v1/models endpoint now returns geolocation data for each model. The Model Library shows EU/US region chips so you can pick the right model before routing. EU routing docs →

May 2, 2026

Integrations

Route Pi through Requesty

Connect the Pi coding agent for model routing, cost tracking, and fallback policies across your coding workflows. Set up Pi →

May 1, 2026

Improved

Clearer error messages from every provider

Context length overflows, unsupported image formats, and other provider errors are now translated into plain, actionable messages instead of generic errors.

Apr 28, 2026

API

Generate speech and transcribe audio

Two new endpoints: /v1/audio/speech for text-to-speech and /v1/audio/transcriptions for speech-to-text. Multiple providers, including OpenAI and Mistral, are supported with automatic fallback. Speech API → · Transcription API →

Apr 25, 2026

API

Edit images through the gateway

Send image edit requests through /v1/images/edits with the same multi-provider routing and fallback as generation. Image Edits API →

Apr 23, 2026

Analytics

Tag traffic by app with analytics headers

Pass HTTP-Referer and X-Title headers to label requests by app or site. Filter your analytics dashboard by these values to see cost and latency per integration. Analytics headers docs →
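The two headers can be merged into whatever headers a client already sends. The header names come from this entry. The helper name and the sample values are hypothetical.

```python
from typing import Dict, Optional


def analytics_headers(site_url: Optional[str] = None,
                      app_title: Optional[str] = None) -> Dict[str, str]:
    """Headers that label a request by site and app for analytics filtering."""
    headers: Dict[str, str] = {}
    if site_url:
        headers["HTTP-Referer"] = site_url
    if app_title:
        headers["X-Title"] = app_title
    return headers


# Merge with existing headers; values here are placeholders.
request_headers = {
    "Content-Type": "application/json",
    **analytics_headers("https://app.example.com", "Support Bot"),
}
```

Using one consistent X-Title per integration keeps the dashboard filter values from fragmenting.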

Apr 21, 2026

New

Get alerted before budgets run out

Set dollar thresholds on API keys and receive Slack or Microsoft Teams webhooks when spend crosses them. Configurable trigger percentages give you time to act. Configure alerts →

Apr 18, 2026

Integrations

Route Claude Cowork through your org

Use Requesty as the backend for the Claude Cowork desktop assistant. All traffic gets unified analytics, cost controls, and model policies. Set up Claude Cowork →

Apr 16, 2026

Integrations

Connect OpenCode with one-liner analytics

Route OpenCode terminal agent traffic through Requesty. A one-line installer adds analytics tracking to your setup. Set up OpenCode →

Apr 14, 2026

Security

Control guardrail actions per policy

The admin panel now lets you set each guardrail policy to Disabled, Report, or Mask individually. A new violations column and detail tab in logs show exactly what fired. Guardrails docs →

Apr 10, 2026

API

Structured outputs work with the Responses API

JSON Schema and json_object modes are now available on /v1/responses, matching the Chat Completions feature set. Structured outputs docs →

Apr 7, 2026

New

Azure EU regions auto-detected

Azure deployments across European regions are now automatically recognized under the EU filter in the Model Library and routing engine. EU routing docs →

Mar 27, 2026

Improved

Redesigned model management

Provider grouping with expand/collapse, region and capability filters, bulk approve/remove, preset quick-filters, and a “New” tab that surfaces recently released models per provider. Manage approved models →

Mar 24, 2026

New

Compare models side by side in the playground

Pick two models, send the same prompt, and see which responds better. The redesigned chat playground also supports image attachments and markdown rendering.

Mar 20, 2026

Improved

See service account details at a glance

Expandable table rows now show each service account’s API keys, monthly spend, and creator. Service accounts docs →

Mar 17, 2026

New

Send PDFs in your requests

The gateway extracts and formats PDF content across providers that support document input. Just include the file in your chat completions request. PDF support docs →

Mar 12, 2026

New

Pick a use case, skip the model selection

Dedicated model aliases like coding/ select the right model, provider, and parameters for your workload automatically. Dedicated models docs →

Mar 6, 2026

New

Google Gemini Embedding 2.0 support

Generate embeddings with Google’s latest Gemini Embedding 2.0 model through Requesty, with automatic provider selection.

Feb 26, 2026

Analytics

Deeper analytics with percentiles and pivot tables

The analytics dashboard now supports P95 and P99 latency percentiles, pivot tables for multi-dimensional breakdowns, and flexible time ranges including This Week, Month, Quarter, and Year. Usage analytics → · Performance monitoring →

Feb 24, 2026

New

See which models support tool calling

The models list now shows which models support tool calling. Use this to filter models by capability before routing or to build smarter model selection.

Feb 20, 2026

Integrations

Connect OpenClaw agents

Route OpenClaw autonomous agent workloads through Requesty for unified analytics and cost controls. Set up OpenClaw →

Feb 17, 2026

Improved

Restrict models per group

Groups can now have their own approved model list, independent of the org-wide setting. Regional model approval handles providers with location-specific deployments correctly. Groups docs →

Feb 12, 2026

New

Spending alerts with Slack webhooks

Set dollar thresholds on your organization and receive Slack notifications when spend crosses them. Alerts docs →

Feb 6, 2026

Improved

Polished API keys table

The API keys table is easier to scan with cleaner columns, hover tooltips for long values, and one-click copy. The same improved layout appears in both admin and user views.

Jan 29, 2026

New

Compare two requests side by side

Select any two requests in the logs table and open a JSON diff viewer. Added, modified, and removed fields are highlighted with one-click filtering.

Jan 24, 2026

Analytics

Filter traces by key, user, or ID

Filter the traces page by trace ID, API key name, or user email. Cached percentage is now visible per trace. Session reconstruction docs →

Jan 21, 2026

Improved

Manage group budgets inline

Groups now show spend percentage and budget overrides directly in the table. Admins can adjust limits without navigating away. Groups docs →

Jan 16, 2026

Improved

Update member roles from the dashboard

Org admins can change member roles at both the organization and group level. Safety checks prevent admins from accidentally demoting themselves. Users and roles docs →

Jan 12, 2026

Analytics

Reasoning tokens visible in logs

A new column shows how many reasoning tokens each request consumed, giving visibility into model “thinking” costs.

Jan 6, 2026

Analytics

Filter analytics by API key label

Scope cost, latency, and usage breakdowns to specific API key labels for more targeted reporting. Usage analytics →