# Create Speech

> Synthesizes audio from input text using a text-to-speech model. By default the response is a binary audio stream in the requested format. When `stream_format` is `sse`, the response is a Server-Sent Events stream of `speech.audio.delta` and `speech.audio.done` events with base64-encoded audio chunks.

Synthesize natural-sounding speech from text using OpenAI's text-to-speech models through Requesty's routing.

<Frame caption="Sample generated with openai/gpt-4o-mini-tts and voice=&#x22;alloy&#x22;.">
  <video controls src="https://mintcdn.com/requesty/ZayBvo71EieKO2sP/images/audio/speech-sample-alloy.mp3?fit=max&auto=format&n=ZayBvo71EieKO2sP&q=85&s=365095236349a54b12d8cf74c012a15e" preload="metadata" style={{ width: '100%', height: '54px' }} data-path="images/audio/speech-sample-alloy.mp3" />
</Frame>

## Base URL

```
https://router.requesty.ai/v1/audio/speech
```

## Authentication

Include your Requesty API key in the request headers:

```bash theme={"dark"}
Authorization: Bearer YOUR_REQUESTY_API_KEY
```

## Example Request

```bash theme={"dark"}
curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  --output speech.mp3 \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy",
    "response_format": "mp3"
  }'
```

The response body is the raw audio stream, written directly to `speech.mp3`.

<Tip>
  The response is a binary audio payload, so the API playground on this page renders it as an unreadable byte stream. Save the response to a file (or use the OpenAI SDK examples below) to actually hear it.
</Tip>

### OpenAI SDK

The endpoint is fully compatible with the OpenAI SDK. Just point the client at Requesty's base URL:

```python theme={"dark"}
from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="YOUR_REQUESTY_API_KEY",
)

with client.audio.speech.with_streaming_response.create(
    model="openai/gpt-4o-mini-tts",
    input="The quick brown fox jumped over the lazy dog.",
    voice="alloy",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")
```

```typescript theme={"dark"}
import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  baseURL: "https://router.requesty.ai/v1",
  apiKey: process.env.REQUESTY_API_KEY,
});

const response = await client.audio.speech.create({
  model: "openai/gpt-4o-mini-tts",
  input: "The quick brown fox jumped over the lazy dog.",
  voice: "alloy",
  response_format: "mp3",
});

const buffer = Buffer.from(await response.arrayBuffer());
await fs.promises.writeFile("speech.mp3", buffer);
```

## Supported Models

Browse the full catalog on the [Speech model library](https://app.requesty.ai/model-library/speech). Today the available speech models are all from OpenAI:

| Model                    | Best for                    | Notes                                                       |
| ------------------------ | --------------------------- | ----------------------------------------------------------- |
| `openai/gpt-4o-mini-tts` | Most use cases              | Highest quality. Supports `instructions` and SSE streaming. |
| `openai/tts-1`           | Real-time, low-latency      | Lightweight, no `instructions`, no SSE.                     |
| `openai/tts-1-hd`        | Higher fidelity offline use | No `instructions`, no SSE.                                  |

Date-pinned snapshots (for example `openai/gpt-4o-mini-tts-2025-12-15`) are also available when you need a stable model version.

## Voices

The following voices are available across the supported models. Audio previews are on the [OpenAI text-to-speech guide](https://platform.openai.com/docs/guides/text-to-speech).

`alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, `verse`

## Voice Steering with `instructions`

Use `instructions` to steer tone, accent, pacing, and emotion. Only `openai/gpt-4o-mini-tts` supports this field. It is ignored by `openai/tts-1` and `openai/tts-1-hd`.

```bash theme={"dark"}
curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  --output greeting.mp3 \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "Welcome aboard. Sit back and enjoy the flight.",
    "voice": "nova",
    "instructions": "Speak in a calm, reassuring flight attendant voice."
  }'
```
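The same request can be made through the OpenAI Python SDK. The sketch below assumes an SDK version recent enough to accept the `instructions` keyword. `speak_with_instructions` is a hypothetical helper name, and the client is passed in as an argument so the function does not depend on how the client was configured.

```python
def speak_with_instructions(client, text, instructions, path, voice="nova"):
    """Synthesize `text` with a steering prompt and save the audio to `path`.

    `client` is expected to be an `openai.OpenAI` instance whose base URL is
    https://router.requesty.ai/v1, as in the SDK example earlier on this page.
    """
    with client.audio.speech.with_streaming_response.create(
        model="openai/gpt-4o-mini-tts",  # the only listed model that honors `instructions`
        input=text,
        voice=voice,
        instructions=instructions,
    ) as response:
        response.stream_to_file(path)
    return path
```

A call might look like `speak_with_instructions(client, "Welcome aboard.", "Speak in a calm, reassuring flight attendant voice.", "greeting.mp3")`.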

## Output Formats

Set `response_format` to choose the audio format of the returned bytes.

| Format          | Content-Type | Notes                                                                         |
| --------------- | ------------ | ----------------------------------------------------------------------------- |
| `mp3` (default) | `audio/mpeg` | Compressed. Good for storage and general playback.                            |
| `opus`          | `audio/opus` | Compressed, very low latency. Good for streaming.                             |
| `aac`           | `audio/aac`  | Compressed, broad device compatibility.                                       |
| `flac`          | `audio/flac` | Lossless compression.                                                         |
| `wav`           | `audio/wav`  | Uncompressed. Easy to decode.                                                 |
| `pcm`           | `audio/pcm`  | Raw 24 kHz, 16-bit, mono PCM samples. Lowest latency for real-time pipelines. |
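Because `pcm` output has no header, most players cannot open it directly. The sketch below wraps the raw samples in a WAV container using Python's standard `wave` module. It assumes little-endian samples: the table gives the sample rate, bit depth, and channel count, but not the byte order. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave

# Values taken from the `pcm` row of the table above.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
CHANNELS = 1            # mono


def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm_bytes)
    # `wave` leaves a caller-supplied buffer open, so its contents are still readable.
    return buffer.getvalue()
```

Writing the returned bytes to a `.wav` file should produce something ordinary audio players can open.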

## Streaming with Server-Sent Events

Set `stream_format` to `sse` to receive a Server-Sent Events stream of `speech.audio.delta` events with base64-encoded audio chunks, terminated by a `speech.audio.done` event with usage information. Only `openai/gpt-4o-mini-tts` supports SSE.

<Info>
  `stream_format` is optional and most clients should omit it. Without it, every supported model returns the raw audio bytes in the requested `response_format`. Set `stream_format` to `sse` only with `openai/gpt-4o-mini-tts` to opt in to the SSE event stream. Setting `sse` with `openai/tts-1` or `openai/tts-1-hd`, or `audio` with `openai/gpt-4o-mini-tts`, returns a 400.
</Info>
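The combinations above can be checked on the client before a request is sent. This is a minimal sketch, not the router's actual validation logic: `check_stream_format` is a hypothetical helper, and the model sets mirror only the three models listed on this page, so date-pinned snapshots are not covered.

```python
from typing import Optional

# Assumption: these sets reflect the model table on this page only.
SSE_MODELS = {"openai/gpt-4o-mini-tts"}
NON_SSE_MODELS = {"openai/tts-1", "openai/tts-1-hd"}


def check_stream_format(model: str, stream_format: Optional[str]) -> Optional[str]:
    """Return a message for a combination the page says yields a 400, else None."""
    if stream_format is None:
        return None  # field omitted: every model returns raw audio bytes
    if stream_format == "sse" and model in NON_SSE_MODELS:
        return f"{model} does not support stream_format='sse'"
    if stream_format == "audio" and model in SSE_MODELS:
        return f"{model} rejects stream_format='audio'; omit the field instead"
    return None
```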

```bash theme={"dark"}
curl https://router.requesty.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -N \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "Streaming speech, chunk by chunk.",
    "voice": "alloy",
    "stream_format": "sse"
  }'
```

The stream looks like this:

```text theme={"dark"}
event: speech.audio.delta
data: {"type":"speech.audio.delta","audio":"<base64 audio chunk>"}

event: speech.audio.done
data: {"type":"speech.audio.done","usage":{"input_tokens":12,"output_tokens":48,"total_tokens":60}}

data: [DONE]
```

Base64-decode the `audio` field of each `speech.audio.delta` event and concatenate the resulting bytes in order to reconstruct the full audio payload.
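The decoding step can be sketched as follows. `collect_sse_audio` is a hypothetical helper that takes the already-decoded text lines of the stream, so it works with any HTTP client that can iterate over response lines. It assumes each `data:` payload fits on a single line, as in the sample above.

```python
import base64
import json
from typing import Iterable, Optional, Tuple


def collect_sse_audio(lines: Iterable[str]) -> Tuple[bytes, Optional[dict]]:
    """Return (audio_bytes, usage) assembled from the lines of an SSE stream."""
    audio = bytearray()
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip `event:` lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        if event.get("type") == "speech.audio.delta":
            audio.extend(base64.b64decode(event["audio"]))
        elif event.get("type") == "speech.audio.done":
            usage = event.get("usage")
    return bytes(audio), usage
```

With the `requests` library, for example, the lines could come from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`.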

## Speed

Use `speed` to scale playback (`0.25` to `4.0`, default `1.0`).

```json theme={"dark"}
{
  "model": "openai/gpt-4o-mini-tts",
  "input": "Reading at one and a half times speed.",
  "voice": "alloy",
  "speed": 1.5
}
```
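A client can reject out-of-range values before sending the request. The sketch below is one way to do that. `speech_payload` is a hypothetical helper, and omitting `speed` when it equals the default is a design choice of this example rather than an API requirement.

```python
MIN_SPEED = 0.25
MAX_SPEED = 4.0


def speech_payload(model: str, text: str, voice: str, speed: float = 1.0) -> dict:
    """Build a request body, rejecting speeds outside the documented range."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    payload = {"model": model, "input": text, "voice": voice}
    if speed != 1.0:
        payload["speed"] = speed  # leave the field out when it equals the default
    return payload
```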

## Pricing

Speech models are billed either per input character or per token, depending on the model. The exact rate per model is on the [Speech model library](https://app.requesty.ai/model-library/speech). Charges appear in your [usage dashboard](https://app.requesty.ai/analytics) immediately after the request completes.

## Error Handling

The API returns standard HTTP status codes:

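* `200` Success
* `400` Bad Request (malformed payload, invalid parameters, unsupported `response_format`, or unsupported `stream_format` for the chosen model)
* `401` Unauthorized (missing or empty `Authorization` header)
* `402` Payment Required (organization balance exhausted)
* `403` Forbidden (invalid token or model not in your access list)
* `404` Not Found (provider or model not supported)
* `429` Rate limited (retry after the `Retry-After` header value)
* `500` Internal Server Error
* `502` Bad Gateway (upstream provider returned an invalid response)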
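The OpenAPI schema at the end of this page describes error bodies as a JSON object with an `error` field containing `origin` (`router` or `provider`) and `message`. A minimal sketch of summarizing such a response follows. `describe_error` is a hypothetical helper, and treating `429`, `500`, and `502` as retryable is a client-side policy assumed for this example, not something the API mandates.

```python
import json
from typing import Optional

# Assumption: which statuses are worth retrying is a choice made for this sketch.
RETRYABLE_STATUSES = {429, 500, 502}


def describe_error(status: int, body: str, retry_after: Optional[str] = None) -> dict:
    """Summarize a non-200 response as a plain dict."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None  # body was not JSON
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {}
    wait_seconds = None
    if status == 429 and retry_after is not None:
        try:
            wait_seconds = float(retry_after)
        except ValueError:
            wait_seconds = None  # header held an HTTP date; not handled here
    return {
        "status": status,
        "origin": error.get("origin", "unknown"),
        "message": error.get("message", body),
        "retryable": status in RETRYABLE_STATUSES,
        "wait_seconds": wait_seconds,
    }
```

The `origin` value indicates whether a failure came from Requesty's router or from the upstream provider, which can help decide where to look when debugging.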

<Info>
  This endpoint is fully compatible with the OpenAI Audio Speech API. You can use the OpenAI SDK's `client.audio.speech.create()` method directly.
</Info>

<Tip>
  To go the other direction and turn audio into text, use the [Create Transcription endpoint](/api-reference/endpoint/audio-transcriptions-create).
</Tip>


## OpenAPI

````yaml POST /v1/audio/speech
openapi: 3.0.3
info:
  title: Requesty API
  description: Requesty API for AI model routing and key management
  version: 1.0.0
servers:
  - url: https://api-v2.requesty.ai
    description: Management API endpoint
  - url: https://router.requesty.ai
    description: Inference router endpoint
security:
  - BearerAuth: []
paths:
  /v1/audio/speech:
    servers:
      - url: https://router.requesty.ai
        description: Inference router endpoint
    post:
      summary: Create speech
      description: >-
        Synthesizes audio from input text using a text-to-speech model. By
        default the response is a binary audio stream in the requested format.
        When `stream_format` is `sse`, the response is a Server-Sent Events
        stream of `speech.audio.delta` and `speech.audio.done` events with
        base64-encoded audio chunks.
      operationId: createSpeech
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SpeechRequest'
      responses:
        '200':
          description: >-
            Audio bytes stream (when `stream_format` is `audio`) or Server-Sent
            Events stream (when `stream_format` is `sse`).
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
            text/event-stream:
              schema:
                type: string
                description: >-
                  Server-Sent Events stream of `speech.audio.delta` and
                  `speech.audio.done` events.
        '400':
          description: Bad request - malformed payload or invalid parameters.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized - missing or empty Authorization header.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '402':
          description: Payment required - organization balance exhausted.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden - invalid token or model not in access list.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not found - provider/model not supported.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '429':
          description: Rate limit exceeded. Retry after the Retry-After header value.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '500':
          description: Internal server error.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '502':
          description: Bad gateway - upstream provider returned an invalid response.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    SpeechRequest:
      type: object
      required:
        - model
        - input
        - voice
      properties:
        model:
          type: string
          description: >-
            The text-to-speech model to use, prefixed with the provider slug.
            Currently only OpenAI models are supported.
          example: openai/gpt-4o-mini-tts
        input:
          type: string
          description: >-
            The text to synthesize into speech. Maximum length is 4096
            characters.
          maxLength: 4096
          example: The quick brown fox jumped over the lazy dog.
        voice:
          type: string
          description: The voice to use when generating the audio.
          enum:
            - alloy
            - ash
            - ballad
            - coral
            - echo
            - fable
            - onyx
            - nova
            - sage
            - shimmer
            - verse
          example: alloy
        instructions:
          type: string
          description: >-
            Additional steering for the voice (tone, accent, pacing). Supported
            by `openai/gpt-4o-mini-tts` only. Ignored by `openai/tts-1` and
            `openai/tts-1-hd`.
          example: Speak in a warm, friendly tone.
        response_format:
          type: string
          description: The audio container format for the synthesized output.
          enum:
            - mp3
            - opus
            - aac
            - flac
            - wav
            - pcm
          default: mp3
          example: mp3
        speed:
          type: number
          format: float
          description: Playback speed of the generated audio. `1.0` is normal speed.
          minimum: 0.25
          maximum: 4
          default: 1
          example: 1
        stream_format:
          type: string
          description: >-
            Optional and not recommended for most clients. Omit this field to
            get the default response shape: raw audio bytes in the requested
            `response_format`. Set to `sse` only with `openai/gpt-4o-mini-tts`
            to receive a Server-Sent Events stream of `speech.audio.delta` and
            `speech.audio.done` events with base64-encoded audio chunks. The
            router rejects `sse` with `openai/tts-1` or `openai/tts-1-hd`, and
            rejects `audio` with `openai/gpt-4o-mini-tts`.
    ErrorResponse:
      type: object
      required:
        - error
      properties:
        error:
          type: object
          required:
            - origin
            - message
          properties:
            origin:
              type: string
              enum:
                - router
                - provider
              description: >-
                Whether the error originated from Requesty's router or an
                upstream provider.
            message:
              type: string
              description: Human-readable error description.
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      description: API key for authentication

````