Synthesizes audio from input text using a text-to-speech model. By default, the response is a binary audio stream in the requested format. When `stream_format` is `sse`, the response is a Server-Sent Events stream of `speech.audio.delta` and `speech.audio.done` events with base64-encoded audio chunks.
Synthesize natural-sounding speech from text using OpenAI's text-to-speech models through Requesty's routing.
With the defaults, the response body is MP3 audio that you can write directly to a file such as `speech.mp3`.
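A minimal sketch of a default (non-streaming) request using only the Python standard library. The endpoint URL, the `Bearer` authorization scheme, and the `input` field name are assumptions modeled on OpenAI's speech API, not details stated on this page; `build_speech_request` and `save_speech` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: Requesty's OpenAI-compatible speech endpoint and Bearer auth.
# Neither is stated on this page; confirm both before relying on them.
SPEECH_URL = "https://router.requesty.ai/v1/audio/speech"


def build_speech_request(api_key: str, text: str,
                         model: str = "openai/gpt-4o-mini-tts",
                         voice: str = "alloy") -> urllib.request.Request:
    """Build a POST request; with no response_format the server returns mp3."""
    # `input` as the name of the text field is an assumption (OpenAI's name).
    body = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(api_key: str, text: str, path: str = "speech.mp3") -> None:
    """Send the request and write the raw audio bytes to `path`."""
    request = build_speech_request(api_key, text)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `save_speech("YOUR_API_KEY", "The quick brown fox jumped over the lazy dog.")` would write the returned MP3 bytes to `speech.mp3`; only `build_speech_request` runs without network access.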
| Model | Best for | Notes |
|---|---|---|
| `openai/gpt-4o-mini-tts` | Most use cases | Highest quality. Supports `instructions` and SSE streaming. |
| `openai/tts-1` | Real-time, low-latency use | Lightweight. No `instructions`, no SSE. |
| `openai/tts-1-hd` | Higher-fidelity offline use | No `instructions`, no SSE. |
Dated snapshots such as `openai/gpt-4o-mini-tts-2025-12-15` are also available when you need a stable model version.
Available voices: `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, `verse`.
Use `instructions` to steer tone, accent, pacing, and emotion. Only `openai/gpt-4o-mini-tts` supports this field; `openai/tts-1` and `openai/tts-1-hd` ignore it.
Use `response_format` to control the audio container of the returned bytes.
| Format | Content-Type | Notes |
|---|---|---|
| `mp3` (default) | `audio/mpeg` | Compressed. Good for storage and general playback. |
| `opus` | `audio/opus` | Compressed, very low latency. Good for streaming. |
| `aac` | `audio/aac` | Compressed, broad device compatibility. |
| `flac` | `audio/flac` | Lossless compression. |
| `wav` | `audio/wav` | Uncompressed. Easy to decode. |
| `pcm` | `audio/pcm` | Raw 24 kHz, 16-bit, mono PCM samples. Lowest latency for real-time pipelines. |
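Because `pcm` output has no header, many players cannot open it directly. The sketch below wraps the samples in a WAV container with Python's standard `wave` module, using the 24 kHz, 16-bit, mono parameters from the format table. It assumes the samples are signed little-endian (the byte order WAV expects), which this page does not state; `pcm_to_wav` and `pcm_duration_seconds` are hypothetical helper names.

```python
import io
import wave

# Parameters of the `pcm` format: 24 kHz, 16-bit, mono.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
CHANNELS = 1


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap headerless PCM samples in a WAV container.

    Assumes signed little-endian samples; the byte order is not documented here.
    """
    frame_size = SAMPLE_WIDTH_BYTES * CHANNELS
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)  # header sizes are patched when the writer closes
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """48,000 bytes of 16-bit mono audio at 24 kHz is one second."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```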
Set `stream_format` to `sse` to receive a Server-Sent Events stream of `speech.audio.delta` events with base64-encoded audio chunks, terminated by a `speech.audio.done` event with usage information. Only `openai/gpt-4o-mini-tts` supports SSE.
`stream_format` is optional, and most clients should omit it. Without it, every supported model returns the raw audio bytes in the requested `response_format`. Set `stream_format` to `sse` only with `openai/gpt-4o-mini-tts` to opt in to the SSE event stream. Setting `sse` with `openai/tts-1` or `openai/tts-1-hd`, or `audio` with `openai/gpt-4o-mini-tts`, returns a 400. When streaming, decode the base64 `audio` field of each `speech.audio.delta` event and concatenate the bytes to get the full audio payload.
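A sketch of the decode-and-concatenate step for the SSE stream. It assumes each event arrives as one `data:` line holding JSON with a `type` field, that `speech.audio.delta` events carry base64 audio in an `audio` field, and that `speech.audio.done` carries a `usage` object; the exact event shape is an assumption, and `collect_sse_audio` is a hypothetical helper.

```python
import base64
import json
from typing import Iterable, Optional, Tuple, Union


def collect_sse_audio(lines: Iterable[Union[str, bytes]]) -> Tuple[bytes, Optional[dict]]:
    """Concatenate decoded audio from speech.audio.delta events.

    Returns the audio bytes and the `usage` object from speech.audio.done
    (None if the stream ended without a done event).
    """
    chunks = []
    usage = None
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        # Skip blank separators, comments (":"), and "event:"/"id:" lines.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        if event.get("type") == "speech.audio.delta":
            chunks.append(base64.b64decode(event["audio"]))
        elif event.get("type") == "speech.audio.done":
            usage = event.get("usage")
            break
    return b"".join(chunks), usage
```

With the `requests` library, you could pass `response.iter_lines()` from a streamed response (`stream=True`) as `lines`; the helper accepts both `str` and `bytes` lines.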
Use `speed` to scale playback (0.25 to 4.0, default 1.0).
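The model restrictions above can be checked client-side before a request is sent, so mistakes surface as a local error instead of a 400. This is a minimal sketch with a hypothetical `build_speech_body` helper; it uses the limits documented on this page and assumes the text field is named `input` and that dated snapshots of `openai/gpt-4o-mini-tts` behave like the base model.

```python
from typing import Optional

VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
          "onyx", "nova", "sage", "shimmer", "verse"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
# Only this model supports `instructions` and SSE. Treating dated snapshots
# (same prefix) the same way is an assumption.
STEERABLE_PREFIX = "openai/gpt-4o-mini-tts"


def build_speech_body(model: str, text: str, voice: str = "alloy", *,
                      instructions: Optional[str] = None,
                      response_format: str = "mp3",
                      speed: float = 1.0,
                      stream_format: Optional[str] = None) -> dict:
    """Return a JSON-ready request body, raising ValueError on documented misuse."""
    steerable = model.startswith(STEERABLE_PREFIX)
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if stream_format not in (None, "sse", "audio"):
        raise ValueError(f"unknown stream_format: {stream_format}")
    # The router answers these two combinations with a 400.
    if stream_format == "sse" and not steerable:
        raise ValueError("stream_format='sse' requires openai/gpt-4o-mini-tts")
    if stream_format == "audio" and steerable:
        raise ValueError("omit stream_format instead of sending 'audio' to this model")

    body = {"model": model, "input": text, "voice": voice,
            "response_format": response_format, "speed": speed}
    # Other models ignore `instructions`, so only send it where it has an effect.
    if instructions is not None and steerable:
        body["instructions"] = instructions
    if stream_format is not None:
        body["stream_format"] = stream_format
    return body
```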
Status codes:

- `200` Success
- `400` Bad Request (invalid parameters, unsupported `response_format`, or unsupported `stream_format` for the chosen model)
- `401` Unauthorized (invalid API key)
- `404` Model not found or not approved for your organization
- `429` Rate limited
- `500` Internal Server Error

If you use the OpenAI SDK, you can call its `client.audio.speech.create()` method directly.

Authenticate each request with your API key.
- `model`: The text-to-speech model to use, prefixed with the provider slug. Currently only OpenAI models are supported. Example: `"openai/gpt-4o-mini-tts"`.
- `input`: The text to synthesize into speech. Maximum length: 4096 characters. Example: `"The quick brown fox jumped over the lazy dog."`
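Text longer than 4096 characters has to be sent as several requests. One possible approach, sketched below with a hypothetical `split_for_tts` helper, splits on sentence boundaries and falls back to a hard split for a single oversized sentence. The regex treats `.`, `!`, or `?` followed by whitespace as a sentence end, which is a simplification.

```python
import re
from typing import List

MAX_INPUT_CHARS = 4096  # documented limit for the input text


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers sentence boundaries; a sentence longer than `limit` is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each piece can then be synthesized in its own request and the resulting audio played or stored in order.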
- `voice`: The voice to use when generating the audio. One of `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, `verse`. Example: `"alloy"`.
- `instructions`: Additional steering for the voice (tone, accent, pacing). Supported by `openai/gpt-4o-mini-tts` only; ignored by `openai/tts-1` and `openai/tts-1-hd`. Example: `"Speak in a warm, friendly tone."`
- `response_format`: The audio container format for the synthesized output. One of `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`. Default: `"mp3"`.
- `speed`: Playback speed of the generated audio; 1.0 is normal speed. Range: 0.25 <= `speed` <= 4.0. Default: 1.0.
- `stream_format`: Optional and not recommended for most clients. Omit this field to get the default response shape: raw audio bytes in the requested `response_format`. Set it to `sse` only with `openai/gpt-4o-mini-tts` to receive a Server-Sent Events stream of `speech.audio.delta` and `speech.audio.done` events with base64-encoded audio chunks. The router rejects `sse` with `openai/tts-1` or `openai/tts-1-hd`, and rejects `audio` with `openai/gpt-4o-mini-tts`.
Response: an audio byte stream (when `stream_format` is omitted or set to `audio`) or a Server-Sent Events stream (when `stream_format` is `sse`). The response is of type file.
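Of the status codes listed earlier, `400`, `401`, and `404` point at something the caller must fix (parameters, API key, or model approval), while `429` and `500` are plausibly transient. A sketch of a retry policy under that assumption follows; the names and default delays are hypothetical, and this page does not document a recommended retry strategy.

```python
import random
from typing import Iterator

# Assumed policy, not documented by the API: only these codes are retried.
RETRYABLE_STATUS = {429, 500}


def should_retry(status: int) -> bool:
    """True for rate limiting (429) and internal server errors (500)."""
    return status in RETRYABLE_STATUS


def backoff_delays(attempts: int = 5, base: float = 0.5,
                   cap: float = 8.0) -> Iterator[float]:
    """Yield exponential backoff delays in seconds with full jitter."""
    for attempt in range(attempts):
        yield random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would sleep for each yielded delay (for example with `time.sleep`) between attempts for as long as `should_retry` returns `True`.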