Create Speech

Synthesize natural-sounding speech from text using OpenAI’s text-to-speech models through Requesty’s routing.

Base URL

https://router.requesty.ai/v1/audio/speech

Authentication

Include your Requesty API key in the request headers:

Authorization: Bearer YOUR_REQUESTY_API_KEY

Example Request

curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  --output speech.mp3 \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy",
    "response_format": "mp3"
  }'

The response body is the raw audio stream, written directly to speech.mp3.

The response is a binary audio payload, so the API playground on this page renders it as an unreadable byte stream. Save the response to a file (or use the OpenAI SDK examples below) to actually hear it.
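The same request can be sketched in Python with only the standard library, which makes the header and body layout explicit. This is a minimal sketch, not an official client: `build_speech_request` and `save_speech` are hypothetical helper names, and only the URL, headers, and body fields come from the curl example above.

```python
import json
import urllib.request

SPEECH_URL = "https://router.requesty.ai/v1/audio/speech"


def build_speech_request(api_key, text, voice="alloy",
                         model="openai/gpt-4o-mini-tts", response_format="mp3"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(SPEECH_URL, data=body, headers=headers, method="POST")


def save_speech(request, path):
    """Send the request and write the binary audio body to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `save_speech(build_speech_request("YOUR_REQUESTY_API_KEY", "Hello"), "speech.mp3")` would perform the network call; `urlopen` raises `urllib.error.HTTPError` for non-2xx responses.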

OpenAI SDK

The endpoint is fully compatible with the OpenAI SDK. Just point the client at Requesty’s base URL:

# Python: stream the audio straight to a file.
from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="YOUR_REQUESTY_API_KEY",
)

with client.audio.speech.with_streaming_response.create(
    model="openai/gpt-4o-mini-tts",
    input="The quick brown fox jumped over the lazy dog.",
    voice="alloy",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")

// JavaScript (Node.js, ES module): buffer the audio, then write it to disk.
import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  baseURL: "https://router.requesty.ai/v1",
  apiKey: process.env.REQUESTY_API_KEY,
});

const response = await client.audio.speech.create({
  model: "openai/gpt-4o-mini-tts",
  input: "The quick brown fox jumped over the lazy dog.",
  voice: "alloy",
  response_format: "mp3",
});

const buffer = Buffer.from(await response.arrayBuffer());
await fs.promises.writeFile("speech.mp3", buffer);

Supported Models

Browse the full catalog on the Speech model library. Today the available speech models are all from OpenAI:

Model	Best for	Notes
`openai/gpt-4o-mini-tts`	Most use cases	Highest quality. Supports `instructions` and SSE streaming.
`openai/tts-1`	Real-time, low-latency use	Lightweight, no `instructions`, no SSE.
`openai/tts-1-hd`	Higher fidelity offline use	No `instructions`, no SSE.

Date-pinned snapshots (for example openai/gpt-4o-mini-tts-2025-12-15) are also available when you need a stable model version.

Voices

The following voices are available across the supported models. Audio previews are on the OpenAI text-to-speech guide.

alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

Voice Steering with `instructions`

Use instructions to steer tone, accent, pacing, and emotion. Only openai/gpt-4o-mini-tts supports this field. It is ignored by openai/tts-1 and openai/tts-1-hd.

curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  --output greeting.mp3 \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "Welcome aboard. Sit back and enjoy the flight.",
    "voice": "nova",
    "instructions": "Speak in a calm, reassuring flight attendant voice."
  }'
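Because the other models ignore `instructions`, a client that switches models may want to leave the field out when it has no effect. The sketch below shows one way to do that; `speech_payload` and `INSTRUCTION_MODELS` are hypothetical names, and date-pinned snapshots are not handled here.

```python
# Models that honor `instructions`, per the section above.
INSTRUCTION_MODELS = {"openai/gpt-4o-mini-tts"}


def speech_payload(model, text, voice, instructions=None):
    """Build a request body, including `instructions` only where it is supported."""
    payload = {"model": model, "input": text, "voice": voice}
    if instructions and model in INSTRUCTION_MODELS:
        payload["instructions"] = instructions
    return payload
```

Dropping the field is a design choice rather than a requirement: the section above states that unsupported models ignore it, not that they reject it.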

Output Formats

Set response_format to choose the audio format of the returned bytes.

Format	Content-Type	Notes
`mp3` (default)	`audio/mpeg`	Compressed. Good for storage and general playback.
`opus`	`audio/opus`	Compressed, very low latency. Good for streaming.
`aac`	`audio/aac`	Compressed, broad device compatibility.
`flac`	`audio/flac`	Lossless compression.
`wav`	`audio/wav`	Uncompressed. Easy to decode.
`pcm`	`audio/pcm`	Raw 24 kHz, 16-bit, mono PCM samples. Lowest latency for real-time pipelines.
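Raw `pcm` output has no header, so most players cannot open it directly. A minimal sketch of wrapping it in a WAV container with Python's standard `wave` module follows. The sample rate, bit depth, and channel count come from the table above; the helper name `pcm_to_wav` is hypothetical, and the samples are assumed to be signed little-endian, which the table does not state.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, sample_width=2, channels=1):
    """Wrap raw PCM samples in a WAV header (assumes signed little-endian samples)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)       # mono
        wav_file.setsampwidth(sample_width)   # 2 bytes = 16-bit
        wav_file.setframerate(sample_rate)    # 24 kHz
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

For example, `open("speech.wav", "wb").write(pcm_to_wav(raw))` would turn a saved `pcm` response into a playable file.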

Streaming with Server-Sent Events

Set stream_format to sse to receive a Server-Sent Events stream of speech.audio.delta events with base64 encoded audio chunks, terminated by a speech.audio.done event with usage information. Only openai/gpt-4o-mini-tts supports SSE.

stream_format is optional and most clients should omit it. Without it, every supported model returns the raw audio bytes in the requested response_format. Set stream_format to sse only with openai/gpt-4o-mini-tts to opt in to the SSE event stream. Setting sse with openai/tts-1 or openai/tts-1-hd, or audio with openai/gpt-4o-mini-tts, returns a 400.
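The rules in the paragraph above can be checked on the client before sending a request. The following is a sketch under stated assumptions: `stream_format_problem` and `SSE_MODELS` are hypothetical names, date-pinned snapshots are not handled, and treating `audio` as accepted for `openai/tts-1` and `openai/tts-1-hd`, as well as flagging unknown values, is an inference rather than documented behavior.

```python
# Models that support the SSE event stream, per the section above.
SSE_MODELS = {"openai/gpt-4o-mini-tts"}


def stream_format_problem(model, stream_format=None):
    """Return None if the combination should be accepted, else a reason string."""
    if stream_format is None:
        return None  # omitted: every supported model returns raw audio bytes
    if stream_format == "sse":
        if model in SSE_MODELS:
            return None
        return f"{model} does not support stream_format 'sse' (expect a 400)"
    if stream_format == "audio":
        if model in SSE_MODELS:
            return f"{model} rejects stream_format 'audio' (expect a 400); omit the field"
        return None  # assumption: accepted by the remaining models
    return f"unrecognized stream_format {stream_format!r}"  # assumption
```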

curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -N \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "Streaming speech, chunk by chunk.",
    "voice": "alloy",
    "stream_format": "sse"
  }'

The event stream looks like this:

event: speech.audio.delta
data: {"type":"speech.audio.delta","audio":"<base64 audio chunk>"}

event: speech.audio.done
data: {"type":"speech.audio.done","usage":{"input_tokens":12,"output_tokens":48,"total_tokens":60}}

data: [DONE]

Base64-decode the audio field of each speech.audio.delta event and concatenate the resulting bytes to get the full audio payload.
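As a minimal sketch of that decoding step, the function below consumes SSE text lines in the shape shown above and returns the concatenated audio bytes together with the usage object. The name `decode_speech_sse` is hypothetical, and the parser assumes each `data:` payload fits on a single line, as in the sample stream.

```python
import base64
import json


def decode_speech_sse(lines):
    """Collect audio bytes and usage info from an iterable of SSE text lines."""
    audio = bytearray()
    usage = None
    for raw_line in lines:
        line = raw_line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(data)
        if event.get("type") == "speech.audio.delta":
            audio.extend(base64.b64decode(event["audio"]))
        elif event.get("type") == "speech.audio.done":
            usage = event.get("usage")
    return bytes(audio), usage
```

The function accepts any iterable of strings, so it can be fed decoded lines from whichever HTTP client is used for the streaming request.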

Speed

Use speed to scale playback (0.25 to 4.0, default 1.0).

{
  "model": "openai/gpt-4o-mini-tts",
  "input": "Reading at one and a half times speed.",
  "voice": "alloy",
  "speed": 1.5
}
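Checking the range on the client avoids a round trip for an out-of-range value. This is a small sketch; `validate_speed` is a hypothetical helper, and only the bounds and default come from the text above.

```python
MIN_SPEED = 0.25
MAX_SPEED = 4.0


def validate_speed(speed=1.0):
    """Return speed as a float, or raise ValueError if it is outside 0.25 to 4.0."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return float(speed)
```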

Pricing

Speech models are billed either per input character or per token, depending on the model. The exact rate for each model is listed on the Speech model library. Charges appear in your usage dashboard immediately after the request completes.

Error Handling

The API returns standard HTTP status codes:

200 Success
400 Bad Request (invalid parameters, unsupported response_format, or unsupported stream_format for the chosen model)
401 Unauthorized (invalid API key)
404 Model not found or not approved for your organization
429 Rate limited
500 Internal Server Error
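One way to act on these codes is to retry only the statuses that may be transient and fail fast on the rest. The sketch below is an assumption, not documented router guidance: treating 429 and 500 as retryable, the exponential backoff schedule, and the names `call_with_retries` and `send` are all illustrative. `send` is any callable that performs the request and returns a `(status, body)` pair.

```python
import time

# Assumption: only rate limits and server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500}


def call_with_retries(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns 200, retrying transient failures with backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            raise RuntimeError(f"speech request failed with HTTP {status}")
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Client errors such as 400, 401, and 404 are raised immediately, since repeating the same request would not change the outcome.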

This endpoint is fully compatible with the OpenAI Audio Speech API. You can use the OpenAI SDK’s client.audio.speech.create() method directly.

To go the other direction and turn audio into text, use the Create Transcription endpoint.

Authorizations

Authorization

string

header

required

API key for authentication

Body

application/json

model

string

required

The text-to-speech model to use, prefixed with the provider slug. Currently only OpenAI models are supported.

Example:

"openai/gpt-4o-mini-tts"

input

string

required

The text to synthesize into speech. Maximum length is 4096 characters.

Maximum string length: 4096
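Longer text has to be split across several requests. The following sketch splits on spaces where possible so that words stay intact; `split_for_speech` is a hypothetical helper, and it assumes the limit is counted the same way as Python's `len()` on a string, which this reference does not specify.

```python
MAX_INPUT_CHARS = 4096


def split_for_speech(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring space boundaries."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space found: fall back to a hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own `input`, and the resulting audio segments joined in order by the caller.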

Example:

"The quick brown fox jumped over the lazy dog."

voice

enum<string>

required

The voice to use when generating the audio.

Available options: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

Example:

"alloy"

instructions

string

Additional steering for the voice (tone, accent, pacing). Supported by openai/gpt-4o-mini-tts only. Ignored by openai/tts-1 and openai/tts-1-hd.

Example:

"Speak in a warm, friendly tone."

response_format

enum<string>

default:mp3

The audio container format for the synthesized output.

Available options: mp3, opus, aac, flac, wav, pcm

Example:

"mp3"

speed

number<float>

default:1

Playback speed of the generated audio. 1.0 is normal speed.

Required range: 0.25 <= x <= 4

Example:

1

stream_format

string

Optional and not recommended for most clients. Omit this field to get the default response shape: raw audio bytes in the requested response_format. Set to sse only with openai/gpt-4o-mini-tts to receive a Server-Sent Events stream of speech.audio.delta and speech.audio.done events with base64-encoded audio chunks. The router rejects sse with openai/tts-1 or openai/tts-1-hd, and rejects audio with openai/gpt-4o-mini-tts.

Response

Audio byte stream (when stream_format is omitted or audio) or Server-Sent Events stream (when stream_format is sse).

The response is of type file.
