Send input to an OpenAI-compatible model and receive a response. This endpoint follows the OpenAI Responses API format and supports all OpenAI models that expose the Responses API natively, as well as compatible models from other providers through Requesty's routing.
Authentication uses either a Bearer token in the `Authorization` header or the `x-api-key` header. Use whichever your client library expects.

| Header | Required | Description |
|---|---|---|
| Authorization | ✅ * | Bearer token with your Requesty key |
| x-api-key | ✅ * | Your Requesty API key (alternative) |
| Content-Type | ✅ | Must be application/json |

\* Provide either `Authorization` or `x-api-key`.
Point your client's `base_url` at Requesty.
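A minimal client setup can be sketched with only the standard library. The router base URL (`https://router.requesty.ai/v1`) and the `/responses` path are assumptions based on Requesty's OpenAI-compatible routing; substitute your own API key.

```python
import json
import urllib.request

BASE_URL = "https://router.requesty.ai/v1"  # assumed Requesty router URL

def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request against the Requesty router."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request(
    "/responses",
    {"model": "openai-responses/gpt-5",
     "input": "Tell me a three sentence bedtime story about a unicorn."},
    "<REQUESTY_API_KEY>",
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

Any OpenAI-compatible SDK works the same way: point its base URL at the router and pass your Requesty key as the API key.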
Example models:

- openai-responses/gpt-5, openai-responses/gpt-5-mini, openai-responses/gpt-4.1, openai-responses/gpt-4o
- anthropic/claude-sonnet-4-5, anthropic/claude-opus-4
- google/gemini-2.5-pro, google/gemini-2.5-flash
- mistral/mistral-large-2411, meta/llama-3.3-70b-instruct

To route OpenAI models through their native Responses API (and receive the `response.*` event stream), use the `openai-responses/` prefix. The standard `openai/` prefix routes through Chat Completions under the hood.

The `input` field accepts either a plain string or an array of input items. Use the array form for multi-turn conversations, tool results, and rich content.
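For example, the array form of `input` can carry a short multi-turn exchange. This is a sketch; the item shapes follow the OpenAI Responses API format (`input_text` for user content, `output_text` for prior assistant content):

```python
# Sketch of the array form of `input` for a multi-turn conversation.
payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "What is the capital of France?"}]},
        {"role": "assistant", "content": [
            {"type": "output_text", "text": "Paris."}]},
        {"role": "user", "content": [
            {"type": "input_text", "text": "And roughly how many people live there?"}]},
    ],
}
```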
Use the `instructions` parameter to set a system-level prompt that applies to the entire request. It is equivalent to a system or developer message at the start of the conversation.
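A minimal sketch of a request body using `instructions` (the prompt text here is illustrative):

```python
# `instructions` acts as a system-level prompt for the whole request.
payload = {
    "model": "openai-responses/gpt-5",
    "instructions": "You are a concise assistant. Answer in one sentence.",
    "input": "Why is the sky blue?",
}
```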
To stream, set `stream: true`. The response is delivered as Server-Sent Events using the OpenAI Responses event format (`response.created`, `response.output_text.delta`, `response.completed`, etc.).
No additional parameter is required to receive a `usage` block with cost on streaming requests: the `response.completed` event includes the full usage object.
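Consuming the stream then amounts to concatenating text deltas and reading usage from the final event. A sketch, with event shapes following the Responses streaming format described above (the sample events are illustrative):

```python
# Assemble streamed text and pull usage/cost from the final event.
def collect_text(events):
    """Concatenate output_text deltas; return (text, usage)."""
    text, usage = [], None
    for event in events:
        if event["type"] == "response.output_text.delta":
            text.append(event["delta"])
        elif event["type"] == "response.completed":
            usage = event["response"].get("usage")
    return "".join(text), usage

events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Once upon"},
    {"type": "response.output_text.delta", "delta": " a time."},
    {"type": "response.completed",
     "response": {"usage": {"total_tokens": 12, "cost": 0.0003}}},
]
text, usage = collect_text(events)
```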
Send images with the `input_image` content type. You can pass an image URL or a base64 data URL.
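A sketch of an image input; the URL is a placeholder, and a base64 data URL works in the same field:

```python
# Image input via the `input_image` content type.
payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "Describe this image."},
            {"type": "input_image",
             "image_url": "https://example.com/photo.jpg"},
        ]}
    ],
}
```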
Send PDFs with the `input_file` content type. You can provide the PDF as either a base64 data URL or a remote URL.
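A sketch of the data-URL variant; the placeholder bytes stand in for a real PDF, and the `filename`/`file_data` fields follow the Responses API file-input format:

```python
import base64

pdf_bytes = b"%PDF-1.4 minimal placeholder"  # stand-in, not a real document
data_url = ("data:application/pdf;base64,"
            + base64.b64encode(pdf_bytes).decode("ascii"))

payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "Summarize this document."},
            {"type": "input_file", "filename": "report.pdf",
             "file_data": data_url},
        ]}
    ],
}
```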
Tool definitions are flat: `name`, `description`, and `parameters` live at the top level of each tool entry.
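A sketch of the flat tool format (the `get_weather` tool is hypothetical); note there is no nested `function` wrapper as in Chat Completions:

```python
# Flat tool entry: name/description/parameters at the top level.
tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

payload = {
    "model": "openai-responses/gpt-5",
    "input": "What is the weather in Paris?",
    "tools": [tool],
}
```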
Return tool results by appending a `function_call_output` item to `input`:
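A sketch of a follow-up request carrying a tool result: the model's `function_call` item is echoed back, then a `function_call_output` with the matching `call_id` is appended (the IDs and the `get_weather` tool are illustrative):

```python
# Tool result round-trip: echo the call, append its output.
payload = {
    "model": "openai-responses/gpt-5",
    "input": [
        {"role": "user", "content": [
            {"type": "input_text", "text": "What is the weather in Paris?"}]},
        {"type": "function_call", "call_id": "call_123",
         "name": "get_weather", "arguments": '{"city": "Paris"}'},
        {"type": "function_call_output", "call_id": "call_123",
         "output": '{"temp_c": 18, "conditions": "cloudy"}'},
    ],
}
```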
For reasoning-capable models (e.g. `openai-responses/gpt-5`, `openai-responses/o3`), configure reasoning effort and the optional summary:
- `effort`: `low`, `medium`, or `high`. Lower effort produces faster responses with fewer reasoning tokens.
- `summary`: `auto`, `concise`, or `detailed`. Controls whether the model returns a reasoning summary alongside the final answer.

Use `text.format` to enforce JSON mode or a strict JSON Schema on the output.
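A sketch combining both: reasoning configuration plus a strict JSON Schema via `text.format` (the `schedule` schema is illustrative; the `json_schema` format shape follows the OpenAI Responses API):

```python
# Reasoning config plus structured output enforcement.
payload = {
    "model": "openai-responses/gpt-5",
    "input": "Plan a 3-step study schedule.",
    "reasoning": {"effort": "low", "summary": "auto"},
    "text": {"format": {
        "type": "json_schema",
        "name": "schedule",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "steps": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["steps"],
            "additionalProperties": False,
        },
    }},
}
```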
cost field inside usage is a Requesty extension and reports the USD cost of the request. It is returned by default on non-streaming responses, and on the final response.completed event when streaming. See Cost Tracking.
Status codes:

- `200` — Success
- `400` — Bad Request (invalid parameters)
- `401` — Unauthorized (invalid API key)
- `403` — Forbidden (insufficient permissions)
- `429` — Rate Limited
- `500` — Internal Server Error

Differences from Chat Completions:

- `input` instead of `messages`: accepts a string or a list of typed items (messages, tool calls, tool results, reasoning).
- `instructions` instead of system messages: system prompts are passed via the top-level `instructions` field.
- Flat tool definitions: tools define `name`, `description`, and `parameters` directly, without the nested `function` wrapper.
- Typed content parts: `input_text`, `input_image`, `input_file` for user inputs; `output_text` and `output_refusal` for model outputs.
- Typed streaming events (`response.created`, `response.output_text.delta`, `response.completed`) rather than choice deltas.
- `max_output_tokens` instead of `max_tokens`: caps the total of visible and reasoning tokens.

Headers

- `x-api-key` — Your Requesty API key. Alternative to the standard `Authorization: Bearer` header.
Body

- `model` — The model to use for the response. To route OpenAI models through their native Responses API, use the `openai-responses/` prefix (e.g. `openai-responses/gpt-5`). Example: `"openai-responses/gpt-5"`
- `input` — Text, image, or file inputs to the model. Either a plain string or an array of typed input items. Example: `"Tell me a three sentence bedtime story about a unicorn."`
- `instructions` — Inserts a system (or developer) message as the first item in the model's context.
- `max_output_tokens` — Upper bound for the number of tokens that can be generated, including visible output tokens and reasoning tokens. Range: `x >= 1`.
- `stream` — If true, the response is streamed to the client as it is generated using server-sent events.
- `temperature` — Sampling temperature between 0 and 2. Higher values produce more random output. Range: `0 <= x <= 2`.
- `top_p` — Nucleus sampling: consider tokens with cumulative probability mass up to `top_p`. Range: `0 <= x <= 1`.
- `parallel_tool_calls` — Whether to allow the model to run tool calls in parallel.
- `tool_choice` — Controls which (if any) tool is called by the model. One of `auto`, `none`, `required`.
- `tools` — Tools the model may call.
- `reasoning` — Reasoning configuration for reasoning-capable models.
- `text` — Output text configuration, including structured output format.
- `include` — Specify additional output data to include in the model response.
- `metadata` — Set of key-value pairs that can be attached to the request.
- `store` — Whether to store the generated model response for later retrieval via API.
- `truncation` — The truncation strategy to use for the model response.
- `user` — A unique identifier representing your end-user.
Response

- `id` — Unique identifier for this response.
- `object` — Object type. Always `response`.
- `created_at` — Unix timestamp (in seconds) of when the response was created.
- `model` — Model ID used to generate the response.
- `status` — Status of the response generation. One of `completed`, `failed`, `in_progress`, `incomplete`.
- `output` — Output items from the model. Typically one or more `message`, `function_call`, or `reasoning` items.