Transcribes audio into text using a speech-to-text model. The audio file is sent as multipart/form-data.
Transcribe audio into text using OpenAI's speech-to-text models through Requesty's routing.
Requests use `multipart/form-data`. Send the audio as the `file` field and the model identifier as the `model` field.
| Model | Best for | Billing |
|---|---|---|
| openai/gpt-4o-transcribe | Highest accuracy, multilingual | Token-based |
| openai/gpt-4o-mini-transcribe | Fast and cost-efficient | Token-based |
| openai/whisper-1 | Drop-in replacement for legacy Whisper | Duration-based (per second of audio) |
Dated snapshots (for example, `openai/gpt-4o-mini-transcribe-2025-12-15`) are also available when you need a stable model version.
The `file` field accepts the following formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
The maximum upload size per request is 32 MB. For longer recordings, split the audio into chunks and concatenate the resulting transcripts on your side.
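As a rough illustration of the chunking advice, the sketch below shows a pre-flight size check and a helper that stitches per-chunk transcripts back together. The helper names are hypothetical, the limit is taken conservatively as 32,000,000 bytes (the docs do not say whether "MB" means 10^6 or 2^20 bytes), and the audio splitting itself is left to whatever tool you prefer.

```python
import os

# Assumption: treat "32 MB" conservatively as 32 * 10^6 bytes.
MAX_UPLOAD_BYTES = 32 * 1000 * 1000


def fits_in_one_request(path: str) -> bool:
    """Return True when the file is small enough to upload in one request."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES


def join_transcripts(parts: list[str]) -> str:
    """Concatenate per-chunk transcripts in order, skipping empty chunks."""
    return " ".join(p.strip() for p in parts if p.strip())
```

Joining with a single space is a simplification; if you split mid-sentence, words at chunk boundaries may be cut or duplicated, so splitting on silence tends to give cleaner results.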
Set `language` to the ISO 639-1 code of the spoken language to improve accuracy and latency. When omitted, the model auto-detects the language.
The response contains `text` and a `usage` block. The `usage` block has two possible shapes depending on the model:
- Token-based (`gpt-4o-transcribe`, `gpt-4o-mini-transcribe`)
- Duration-based (`whisper-1`)

Use the `type` discriminator to decide how to render or aggregate usage on your side.
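One way to aggregate usage across responses is to branch on the `type` discriminator, as in the sketch below. The concrete values and field names (`"tokens"` with `total_tokens`, `"duration"` with `seconds`) are assumptions modeled on OpenAI's transcription responses; check them against an actual response before relying on them.

```python
from collections import Counter


def aggregate_usage(usages: list[dict]) -> dict:
    """Sum token and duration usage separately, keyed by the `type` field."""
    totals: Counter = Counter()
    for usage in usages:
        kind = usage.get("type")
        if kind == "tokens":        # assumed shape for the gpt-4o models
            totals["tokens"] += usage.get("total_tokens", 0)
        elif kind == "duration":    # assumed shape for whisper-1
            totals["seconds"] += usage.get("seconds", 0)
        else:
            totals["unknown"] += 1  # count shapes this sketch does not recognize
    return dict(totals)
```

Keeping tokens and seconds in separate totals avoids mixing units that are billed at different rates.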
Requests are billed per token (for `gpt-4o-transcribe` and `gpt-4o-mini-transcribe`) or per second of input audio (for `whisper-1`). The exact rate per model is listed in the Transcription model library. Charges appear in your usage dashboard immediately after the request completes.
- 200 Success
- 400 Bad Request (missing file or model, unsupported audio format)
- 401 Unauthorized (invalid API key)
- 404 Model not found or not approved for your organization
- 413 Payload Too Large (audio file exceeds 32 MB)
- 429 Rate limited
- 500 Internal Server Error

If you use an OpenAI-compatible SDK, you can call the `client.audio.transcriptions.create()` method directly.

API key for authentication
The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Maximum upload size is 32 MB.
The speech-to-text model to use, prefixed with the provider slug. Currently only OpenAI models are supported.
"openai/gpt-4o-transcribe"
The language of the input audio in ISO 639-1 format (for example, en, fr, ja). Supplying the language improves accuracy and latency. Auto-detected when omitted.
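The status codes documented for this endpoint can be mapped to client-side actions along the following lines. This is a sketch of one possible policy, not documented behavior: treating 429 and 500 as retryable is a design assumption, and `classify_status` is a hypothetical helper name.

```python
# Assumption: transient failures worth retrying with backoff.
RETRYABLE = {429, 500}

# Errors that need a change to the request or account before retrying.
FIX_REQUEST = {
    400: "missing file or model, or unsupported audio format",
    401: "invalid API key",
    404: "model not found or not approved for your organization",
    413: "audio file exceeds 32 MB; split it into chunks",
}


def classify_status(status: int) -> str:
    """Map an HTTP status code to 'ok', 'retry', 'fix_request', or 'unexpected'."""
    if status == 200:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in FIX_REQUEST:
        return "fix_request"
    return "unexpected"
```

For example, a 413 classifies as `fix_request`, and `FIX_REQUEST[413]` gives the hint to split the audio rather than resend the same file.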