Create transcription
POST /v1/audio/transcriptions


Transcribe audio into text using OpenAI’s speech-to-text models through Requesty’s routing.

Base URL

https://router.requesty.ai/v1/audio/transcriptions

Authentication

Include your Requesty API key in the request headers:
Authorization: Bearer YOUR_REQUESTY_API_KEY

Example Request

The endpoint accepts multipart/form-data. Send the audio as the file field and the model identifier as the model field.
curl https://router.requesty.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -F "model=openai/gpt-4o-transcribe" \
  -F "file=@./meeting.mp3"
Example response:
{
  "text": "Hello, this is a transcription of the audio.",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "output_tokens": 11,
    "total_tokens": 25,
    "input_token_details": {
      "audio_tokens": 14,
      "text_tokens": 0
    }
  }
}

OpenAI SDK

The endpoint is fully compatible with the OpenAI SDK. Just point the client at Requesty’s base URL:

Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="YOUR_REQUESTY_API_KEY",
)

with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="openai/gpt-4o-transcribe",
        file=audio,
    )

print(transcript.text)
TypeScript:

import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  baseURL: "https://router.requesty.ai/v1",
  apiKey: process.env.REQUESTY_API_KEY,
});

const transcript = await client.audio.transcriptions.create({
  model: "openai/gpt-4o-transcribe",
  file: fs.createReadStream("meeting.mp3"),
});

console.log(transcript.text);

Supported Models

Browse the full catalog on the Transcription model library. Today the available transcription models are all from OpenAI:
  • openai/gpt-4o-transcribe: highest accuracy, multilingual (token-based billing)
  • openai/gpt-4o-mini-transcribe: fast and cost-efficient (token-based billing)
  • openai/whisper-1: drop-in replacement for legacy Whisper (duration-based billing, per second of audio)
Date-pinned snapshots (for example openai/gpt-4o-mini-transcribe-2025-12-15) are also available when you need a stable model version.

Supported Audio Formats

The file field accepts the following formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. The maximum upload size per request is 32 MB. For longer recordings, split the audio into chunks and concatenate the resulting transcripts on your side.
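Since uploads cap at 32 MB, longer recordings need to be split before upload. A minimal sketch of the boundary math, assuming you cut on a fixed chunk length with a small overlap so words are less likely to be severed at a boundary (the helper name and defaults are illustrative, not part of the API):

```python
def chunk_spans(total_seconds: float, chunk_seconds: float = 600.0,
                overlap_seconds: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) second offsets that cover the full recording.

    A small overlap between consecutive chunks reduces the chance of
    cutting a word in half; deduplicate any repeated words when you
    concatenate the per-chunk transcripts.
    """
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap_seconds
    return spans
```

Each span can then be exported as its own audio file (for example with an audio library such as pydub) and sent as a separate request; join the returned text fields in order.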

Language Hint

Set language to the ISO 639-1 code of the spoken language to improve accuracy and latency. When omitted, the model auto-detects the language.
curl https://router.requesty.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -F "model=openai/gpt-4o-transcribe" \
  -F "language=fr" \
  -F "file=@./conference.m4a"

Response Format

The response is always a JSON object with the transcribed text and a usage block. The usage block has two possible shapes depending on the model:

Token usage (gpt-4o-transcribe, gpt-4o-mini-transcribe)

{
  "text": "Hello, world.",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "output_tokens": 11,
    "total_tokens": 25,
    "input_token_details": {
      "audio_tokens": 14,
      "text_tokens": 0
    }
  }
}

Duration usage (whisper-1)

{
  "text": "Hello, world.",
  "usage": {
    "type": "duration",
    "seconds": 4.2
  }
}
Use the type discriminator to decide how to render or aggregate usage on your side.
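Branching on that discriminator can be sketched as follows, using the two response shapes shown above (the helper name is illustrative):

```python
def summarize_usage(usage: dict) -> str:
    """Render the usage block of a transcription response as one line.

    The "type" field discriminates between the two documented shapes:
    "tokens" (gpt-4o-transcribe models) and "duration" (whisper-1).
    """
    if usage["type"] == "tokens":
        details = usage.get("input_token_details", {})
        return (f"{usage['total_tokens']} tokens "
                f"({details.get('audio_tokens', 0)} audio in, "
                f"{usage['output_tokens']} out)")
    if usage["type"] == "duration":
        return f"{usage['seconds']} seconds of audio"
    raise ValueError(f"unknown usage type: {usage['type']!r}")
```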

Pricing

Transcription models are priced either per token of input audio (for gpt-4o-transcribe and gpt-4o-mini-transcribe) or per second of input audio (for whisper-1). The exact rate per model is on the Transcription model library. Charges appear in your usage dashboard immediately after the request completes.
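If you want to aggregate charges client-side, the calculation follows directly from the usage block. The rates below are made up for illustration only; the real per-model rates live in the Transcription model library:

```python
# Hypothetical rates for illustration only. Look up the actual
# per-model rates in the Transcription model library before use.
RATE_PER_INPUT_TOKEN = 0.000006   # USD, assumed
RATE_PER_OUTPUT_TOKEN = 0.00001   # USD, assumed
RATE_PER_SECOND = 0.0001          # USD, assumed

def estimate_cost(usage: dict) -> float:
    """Estimate the charge for one request from its usage block."""
    if usage["type"] == "tokens":
        return (usage["input_tokens"] * RATE_PER_INPUT_TOKEN
                + usage["output_tokens"] * RATE_PER_OUTPUT_TOKEN)
    if usage["type"] == "duration":
        return usage["seconds"] * RATE_PER_SECOND
    raise ValueError(f"unknown usage type: {usage['type']!r}")
```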

Error Handling

The API returns standard HTTP status codes:
  • 200 Success
  • 400 Bad Request (missing file or model, unsupported audio format)
  • 401 Unauthorized (invalid API key)
  • 404 Model not found or not approved for your organization
  • 413 Payload Too Large (audio file exceeds 32 MB)
  • 429 Rate limited
  • 500 Internal Server Error
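Of these, 429 and 500 are transient and worth retrying, while the other 4xx errors will fail identically on a resend. A minimal sketch of that policy with exponential backoff (the wrapper and its signature are illustrative, not part of any SDK):

```python
import random
import time

# Status codes from the list above that are worth retrying; 400, 401,
# 404, and 413 indicate a problem with the request itself.
RETRYABLE = {429, 500}

def with_retries(send, max_attempts: int = 4, base_delay: float = 0.1):
    """Call send() -> (status, body) and retry transient failures.

    `send` is a stand-in for whatever HTTP call you make to the
    transcription endpoint (requests, httpx, the OpenAI SDK, ...).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"transcription failed with HTTP {status}")
        # Exponential backoff with jitter to avoid synchronized retries.
        time.sleep(base_delay * 2 ** attempt * (0.5 + random.random() / 2))
```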
This endpoint is fully compatible with the OpenAI Audio Transcriptions API. You can use the OpenAI SDK’s client.audio.transcriptions.create() method directly.
To go the other direction and turn text into audio, use the Create Speech endpoint.

Authorizations

Authorization (string, header, required)
API key for authentication.

Body (multipart/form-data)

file (file, required)
The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Maximum upload size is 32 MB.

model (string, required)
The speech-to-text model to use, prefixed with the provider slug. Currently only OpenAI models are supported.
Example: "openai/gpt-4o-transcribe"

language (string, optional)
The language of the input audio in ISO 639-1 format (for example, en, fr, ja). Supplying the language improves accuracy and latency. Auto-detected when omitted.

Response

Transcription result.

text (string, required)
The transcribed text.
Example: "Hello, world."

usage (object, required)
Usage stats for the transcription. The shape depends on how the model is billed: token-based (gpt-4o-transcribe, gpt-4o-mini-transcribe) or duration-based (whisper-1).

Last modified on May 2, 2026