POST /v1/audio/speech
Create speech
curl --request POST \
  --url https://router.requesty.ai/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "openai/gpt-4o-mini-tts",
  "input": "The quick brown fox jumped over the lazy dog.",
  "voice": "alloy",
  "instructions": "Speak in a warm, friendly tone.",
  "response_format": "mp3",
  "speed": 1,
  "stream_format": "<string>"
}
'
"<string>"


Synthesize natural-sounding speech from text using OpenAI's text-to-speech models through Requesty's routing.

Base URL

https://router.requesty.ai/v1/audio/speech

Authentication

Include your Requesty API key in the request headers:
Authorization: Bearer YOUR_REQUESTY_API_KEY

Example Request

curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  --output speech.mp3 \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy",
    "response_format": "mp3"
  }'
The response body is the raw audio stream, written directly to speech.mp3.
The response is a binary audio payload, so the API playground on this page renders it as an unreadable byte stream. Save the response to a file (or use the OpenAI SDK examples below) to actually hear it.

OpenAI SDK

The endpoint is fully compatible with the OpenAI SDK. Just point the client at Requesty's base URL:

Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="YOUR_REQUESTY_API_KEY",
)

with client.audio.speech.with_streaming_response.create(
    model="openai/gpt-4o-mini-tts",
    input="The quick brown fox jumped over the lazy dog.",
    voice="alloy",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")
Node.js:

import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  baseURL: "https://router.requesty.ai/v1",
  apiKey: process.env.REQUESTY_API_KEY,
});

const response = await client.audio.speech.create({
  model: "openai/gpt-4o-mini-tts",
  input: "The quick brown fox jumped over the lazy dog.",
  voice: "alloy",
  response_format: "mp3",
});

const buffer = Buffer.from(await response.arrayBuffer());
await fs.promises.writeFile("speech.mp3", buffer);

Supported Models

Browse the full catalog on the Speech model library. Today the available speech models are all from OpenAI:
Model | Best for | Notes
openai/gpt-4o-mini-tts | Most use cases | Highest quality. Supports instructions and SSE streaming.
openai/tts-1 | Real-time, low latency | Lightweight; no instructions, no SSE.
openai/tts-1-hd | Higher-fidelity offline use | No instructions, no SSE.
Date pinned snapshots (for example openai/gpt-4o-mini-tts-2025-12-15) are also available when you need a stable model version.

Voices

The following voices are available across the supported models (audio previews are on the OpenAI text-to-speech guide): alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

Voice Steering with instructions

Use instructions to steer tone, accent, pacing, and emotion. Only openai/gpt-4o-mini-tts supports this field. It is ignored by openai/tts-1 and openai/tts-1-hd.
curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  --output greeting.mp3 \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "Welcome aboard. Sit back and enjoy the flight.",
    "voice": "nova",
    "instructions": "Speak in a calm, reassuring flight attendant voice."
  }'

Output Formats

Set response_format to control the audio container of the returned bytes.
Format | Content-Type | Notes
mp3 (default) | audio/mpeg | Compressed. Good for storage and general playback.
opus | audio/opus | Compressed, very low latency. Good for streaming.
aac | audio/aac | Compressed, broad device compatibility.
flac | audio/flac | Lossless compression.
wav | audio/wav | Uncompressed. Easy to decode.
pcm | audio/pcm | Raw 24 kHz, 16-bit, mono PCM samples. Lowest latency for real-time pipelines.
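The pcm format returns bare samples with no container header, so most players cannot open the file directly. A minimal sketch, using only the Python standard library, that wraps raw PCM bytes in a WAV container (the 24 kHz / 16-bit / mono values come straight from the table above):

```python
import wave

def pcm_to_wav(pcm_bytes: bytes, path: str) -> None:
    """Wrap raw PCM samples (24 kHz, 16-bit, mono, as returned for
    response_format "pcm") in a WAV container so ordinary players can open it."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples = 2 bytes each
        wav.setframerate(24_000)  # 24 kHz sample rate
        wav.writeframes(pcm_bytes)
```

If you request wav as the response_format, the header is already included; this conversion only matters when you chose pcm for latency and later want a playable file.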

Streaming with Server-Sent Events

stream_format is optional and most clients should omit it: without it, every supported model returns raw audio bytes in the requested response_format. Set stream_format to sse (supported only by openai/gpt-4o-mini-tts) to receive a Server-Sent Events stream of speech.audio.delta events carrying base64-encoded audio chunks, terminated by a speech.audio.done event with usage information. Requesting sse with openai/tts-1 or openai/tts-1-hd, or audio with openai/gpt-4o-mini-tts, returns a 400.
curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -N \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "Streaming speech, chunk by chunk.",
    "voice": "alloy",
    "stream_format": "sse"
  }'
Each event looks like:
event: speech.audio.delta
data: {"type":"speech.audio.delta","audio":"<base64 audio chunk>"}

event: speech.audio.done
data: {"type":"speech.audio.done","usage":{"input_tokens":12,"output_tokens":48,"total_tokens":60}}

data: [DONE]
Decode each delta.audio field with base64 and concatenate the bytes to get the full audio payload.
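That decode step can be sketched as follows (standard library only; the parser assumes the exact `data:` line shape shown above):

```python
import base64
import json

def decode_sse_audio(sse_text: str) -> bytes:
    """Collect and concatenate the base64 audio chunks carried by
    speech.audio.delta events in an SSE response body."""
    audio = bytearray()
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") == "speech.audio.delta":
            audio.extend(base64.b64decode(event["audio"]))
    return bytes(audio)
```

In a real client you would feed this incrementally from the HTTP stream rather than buffering the whole body, but the framing logic is the same.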

Speed

Use speed to scale playback (0.25 to 4.0, default 1.0).
{
  "model": "openai/gpt-4o-mini-tts",
  "input": "Reading at one and a half times speed.",
  "voice": "alloy",
  "speed": 1.5
}

Pricing

Speech models are priced per input character for character-billed models and per input token for token-billed models. Per-model rates are listed on the Speech model library. Charges appear in your usage dashboard immediately after the request completes.
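As a back-of-the-envelope sketch of character billing (the rate below is a placeholder, not a real price; look up the actual per-model rate on the Speech model library):

```python
def estimate_character_cost(text: str, usd_per_million_chars: float) -> float:
    """Estimate the cost of one synthesis request for a character-billed
    model. The rate argument is a placeholder you must look up per model."""
    return len(text) / 1_000_000 * usd_per_million_chars
```

For token-billed models the same shape of arithmetic applies to the token counts reported in the usage block of an SSE response.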

Error Handling

The API returns standard HTTP status codes:
  • 200 Success
  • 400 Bad Request (invalid parameters, unsupported response_format, or unsupported stream_format for the chosen model)
  • 401 Unauthorized (invalid API key)
  • 404 Model not found or not approved for your organization
  • 429 Rate limited
  • 500 Internal Server Error
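A small sketch of how a client might branch on these codes (the function name and the retry policy are illustrative, not part of the API):

```python
def classify_status(status: int) -> str:
    """Map the documented status codes to a coarse client action.
    Illustrative policy only: retry what is transient, fix what is not."""
    if status == 200:
        return "ok"
    if status in (400, 404):
        return "fix-request"      # invalid params or unknown/unapproved model
    if status == 401:
        return "fix-credentials"  # invalid or missing API key
    if status == 429 or status >= 500:
        return "retry"            # rate limited or transient server error
    return "unexpected"
```

Pair the retry branch with exponential backoff so a 429 does not immediately hit the rate limiter again.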
This endpoint is fully compatible with the OpenAI Audio Speech API. You can use the OpenAI SDK's client.audio.speech.create() method directly.
To go the other direction and turn audio into text, use the Create Transcription endpoint.

Authorizations

Authorization
string
header
required

API key for authentication

Body

application/json
model
string
required

The text-to-speech model to use, prefixed with the provider slug. Currently only OpenAI models are supported.

Example:

"openai/gpt-4o-mini-tts"

input
string
required

The text to synthesize into speech. Maximum length is 4096 characters.

Maximum string length: 4096
Example:

"The quick brown fox jumped over the lazy dog."

voice
enum<string>
required

The voice to use when generating the audio.

Available options:
alloy,
ash,
ballad,
coral,
echo,
fable,
onyx,
nova,
sage,
shimmer,
verse
Example:

"alloy"

instructions
string

Additional steering for the voice (tone, accent, pacing). Supported by openai/gpt-4o-mini-tts only. Ignored by openai/tts-1 and openai/tts-1-hd.

Example:

"Speak in a warm, friendly tone."

response_format
enum<string>
default:mp3

The audio container format for the synthesized output.

Available options:
mp3,
opus,
aac,
flac,
wav,
pcm
Example:

"mp3"

speed
number<float>
default:1

Playback speed of the generated audio. 1.0 is normal speed.

Required range: 0.25 <= x <= 4
Example:

1

stream_format
string

Optional and not recommended for most clients. Omit this field to get the default response shape: raw audio bytes in the requested response_format. Set to sse only with openai/gpt-4o-mini-tts to receive a Server-Sent Events stream of speech.audio.delta and speech.audio.done events with base64-encoded audio chunks. The router rejects sse with openai/tts-1 or openai/tts-1-hd, and rejects audio with openai/gpt-4o-mini-tts.

Response

Audio bytes stream (the default, when stream_format is omitted) or Server-Sent Events stream (when stream_format is sse).

The response is of type file.

Last modified on May 2, 2026