Send input to an OpenAI-compatible model and receive a response. This endpoint follows the OpenAI Responses API format and supports all OpenAI models that expose the Responses API natively, as well as compatible models from other providers through Requesty's routing.
Authentication uses either a Bearer token in the `Authorization` header or the `x-api-key` header. Use whichever your client library expects.

| Header | Required | Description |
|---|---|---|
| Authorization | ✅ * | Bearer token with your Requesty key |
| x-api-key | ✅ * | Your Requesty API key (alternative) |
| Content-Type | ✅ | Must be application/json |

\* Provide either `Authorization` or `x-api-key`.
Point your client's `base_url` at Requesty.
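A minimal client setup can be sketched with only the standard library. The router base URL (`https://router.requesty.ai/v1`) and the `/responses` path are assumptions based on Requesty's OpenAI-compatible routing; substitute your own API key.

```python
import json
import urllib.request

BASE_URL = "https://router.requesty.ai/v1"  # assumed Requesty router URL

def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request against the Requesty router."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request(
    "/responses",
    {"model": "openai-responses/gpt-5",
     "input": "Tell me a three sentence bedtime story about a unicorn."},
    "<REQUESTY_API_KEY>",
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

Any OpenAI-compatible SDK works the same way: point its base URL at the router and pass your Requesty key as the API key.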
Example models:

- openai-responses/gpt-5, openai-responses/gpt-5-mini, openai-responses/gpt-4.1, openai-responses/gpt-4o
- anthropic/claude-sonnet-4-5, anthropic/claude-opus-4
- google/gemini-2.5-pro, google/gemini-2.5-flash
- mistral/mistral-large-2411, meta/llama-3.3-70b-instruct

To route OpenAI models through their native Responses API (and receive the `response.*` event stream), use the `openai-responses/` prefix. The standard `openai/` prefix routes through Chat Completions under the hood.

The `input` field accepts either a plain string or an array of input items. Use the array form for multi-turn conversations, tool results, and rich content.
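For example, the array form of `input` can carry a short multi-turn exchange. This is a sketch; the item shapes follow the OpenAI Responses API format (`input_text` for user content, `output_text` for prior assistant content):

```python
# Sketch of the array form of `input` for a multi-turn conversation.
payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "What is the capital of France?"}]},
        {"role": "assistant", "content": [
            {"type": "output_text", "text": "Paris."}]},
        {"role": "user", "content": [
            {"type": "input_text", "text": "And roughly how many people live there?"}]},
    ],
}
```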
Use the `instructions` parameter to set a system-level prompt that applies to the entire request. It is equivalent to a system or developer message at the start of the conversation.
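A minimal sketch of a request body using `instructions` (the prompt text here is illustrative):

```python
# `instructions` acts as a system-level prompt for the whole request.
payload = {
    "model": "openai-responses/gpt-5",
    "instructions": "You are a concise assistant. Answer in one sentence.",
    "input": "Why is the sky blue?",
}
```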
To stream, set `stream: true`. The response is delivered as Server-Sent Events using the OpenAI Responses event format (`response.created`, `response.output_text.delta`, `response.completed`, etc.).
No additional parameter is required to receive a `usage` block with cost on streaming requests: the `response.completed` event includes the full usage object.
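Consuming the stream then amounts to concatenating text deltas and reading usage from the final event. A sketch, with event shapes following the Responses streaming format described above (the sample events are illustrative):

```python
# Assemble streamed text and pull usage/cost from the final event.
def collect_text(events):
    """Concatenate output_text deltas; return (text, usage)."""
    text, usage = [], None
    for event in events:
        if event["type"] == "response.output_text.delta":
            text.append(event["delta"])
        elif event["type"] == "response.completed":
            usage = event["response"].get("usage")
    return "".join(text), usage

events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Once upon"},
    {"type": "response.output_text.delta", "delta": " a time."},
    {"type": "response.completed",
     "response": {"usage": {"total_tokens": 12, "cost": 0.0003}}},
]
text, usage = collect_text(events)
```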
Send images with the `input_image` content type. You can pass an image URL or a base64 data URL.
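A sketch of an image input; the URL is a placeholder, and a base64 data URL works in the same field:

```python
# Image input via the `input_image` content type.
payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "Describe this image."},
            {"type": "input_image",
             "image_url": "https://example.com/photo.jpg"},
        ]}
    ],
}
```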
Send PDFs with the `input_file` content type. You can provide the PDF as either a base64 data URL or a remote URL.
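A sketch of the data-URL variant; the placeholder bytes stand in for a real PDF, and the `filename`/`file_data` fields follow the Responses API file-input format:

```python
import base64

pdf_bytes = b"%PDF-1.4 minimal placeholder"  # stand-in, not a real document
data_url = ("data:application/pdf;base64,"
            + base64.b64encode(pdf_bytes).decode("ascii"))

payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "Summarize this document."},
            {"type": "input_file", "filename": "report.pdf",
             "file_data": data_url},
        ]}
    ],
}
```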
Tool definitions are flat: `name`, `description`, and `parameters` live at the top level of each tool entry.
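A sketch of the flat tool format (the `get_weather` tool is hypothetical); note there is no nested `function` wrapper as in Chat Completions:

```python
# Flat tool entry: name/description/parameters at the top level.
tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

payload = {
    "model": "openai-responses/gpt-5",
    "input": "What is the weather in Paris?",
    "tools": [tool],
}
```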
Return tool results by appending a `function_call_output` item to `input`:
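A sketch of a follow-up request carrying a tool result: the model's `function_call` item is echoed back, then a `function_call_output` with the matching `call_id` is appended (the IDs and the `get_weather` tool are illustrative):

```python
# Tool result round-trip: echo the call, append its output.
payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "What is the weather in Paris?"}]},
        {"type": "function_call", "call_id": "call_123",
         "name": "get_weather", "arguments": '{"city": "Paris"}'},
        {"type": "function_call_output", "call_id": "call_123",
         "output": '{"temp_c": 18, "conditions": "cloudy"}'},
    ],
}
```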
For reasoning-capable models (e.g. `openai-responses/gpt-5`, `openai-responses/o3`), configure reasoning effort and the optional summary:
- `effort`: `low`, `medium`, or `high`. Lower effort produces faster responses with fewer reasoning tokens.
- `summary`: `auto`, `concise`, or `detailed`. Controls whether the model returns a reasoning summary alongside the final answer.

Use `text.format` to enforce JSON mode or a strict JSON Schema on the output.
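A sketch combining both: reasoning configuration plus a strict JSON Schema via `text.format` (the `schedule` schema is illustrative; the `json_schema` format shape follows the OpenAI Responses API):

```python
# Reasoning config plus structured output enforcement.
payload = {
    "model": "openai-responses/gpt-5",
    "input": "Plan a 3-step study schedule.",
    "reasoning": {"effort": "low", "summary": "auto"},
    "text": {"format": {
        "type": "json_schema",
        "name": "schedule",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "steps": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["steps"],
            "additionalProperties": False,
        },
    }},
}
```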
cost field inside usage is a Requesty extension and reports the USD cost of the request. It is returned by default on non-streaming responses, and on the final response.completed event when streaming. See Cost Tracking.
Status codes:

- `200` — Success
- `400` — Bad Request (invalid parameters)
- `401` — Unauthorized (invalid API key)
- `403` — Forbidden (insufficient permissions)
- `429` — Rate Limited
- `500` — Internal Server Error

Differences from Chat Completions:

- `input` instead of `messages`: accepts a string or a list of typed items (messages, tool calls, tool results, reasoning).
- `instructions` instead of system messages: system prompts are passed via the top-level `instructions` field.
- Flat tool definitions: tools define `name`, `description`, and `parameters` directly, without the nested `function` wrapper.
- Typed content parts: `input_text`, `input_image`, `input_file` for user inputs; `output_text` and `output_refusal` for model outputs.
- Typed streaming events (`response.created`, `response.output_text.delta`, `response.completed`) rather than choice deltas.
- `max_output_tokens` instead of `max_tokens`: caps the total of visible and reasoning tokens.

Headers

- `x-api-key` — Your Requesty API key. Alternative to the standard `Authorization: Bearer` header.
Body

- `model` — The model to use for the response. To route OpenAI models through their native Responses API, use the `openai-responses/` prefix (e.g. `openai-responses/gpt-5`). Example: `"openai-responses/gpt-5"`
- `input` — Text, image, or file inputs to the model. Either a plain string or an array of typed input items. Example: `"Tell me a three sentence bedtime story about a unicorn."`
- `instructions` — Inserts a system (or developer) message as the first item in the model's context.
- `max_output_tokens` — Upper bound for the number of tokens that can be generated, including visible output tokens and reasoning tokens. Range: `x >= 1`.
- `stream` — If true, the response is streamed to the client as it is generated using server-sent events.
- `temperature` — Sampling temperature between 0 and 2. Higher values produce more random output. Range: `0 <= x <= 2`.
- `top_p` — Nucleus sampling: consider tokens with cumulative probability mass up to `top_p`. Range: `0 <= x <= 1`.
- `parallel_tool_calls` — Whether to allow the model to run tool calls in parallel.
- `tool_choice` — Controls which (if any) tool is called by the model. One of `auto`, `none`, `required`.
- `tools` — Tools the model may call.
- `reasoning` — Reasoning configuration for reasoning-capable models.
- `text` — Output text configuration, including structured output format.
- `include` — Specify additional output data to include in the model response.
- `metadata` — Set of key-value pairs that can be attached to the request.
- `store` — Whether to store the generated model response for later retrieval via API.
- `truncation` — The truncation strategy to use for the model response.
- `user` — A unique identifier representing your end-user.
Response

- `id` — Unique identifier for this response.
- `object` — Object type. Always `response`.
- `created_at` — Unix timestamp (in seconds) of when the response was created.
- `model` — Model ID used to generate the response.
- `status` — Status of the response generation. One of `completed`, `failed`, `in_progress`, `incomplete`.
- `output` — Output items from the model. Typically one or more `message`, `function_call`, or `reasoning` items.