These tokens offer insight into the model’s reasoning process, providing a transparent view of its thought steps. Since Reasoning Tokens are considered output tokens, they are billed accordingly.To enable reasoning, specify reasoning_effort with one of the supported values in your API request.
Anthropic expects a specific number that sets the upper limit of thinking tokens. The limit must be less than the specified max tokens value.OpenAI models expect one of the following ‘effort’ values:
low
medium
high
Google Gemini expects a specific number when using Vertex AI, and supports OpenAI’s reasoning efforts via the Google AI Studio (their OpenAI-compatible API).Requesty introduces new ‘effort’ values: ‘max’, ‘min’, and ‘none’ to support more granular control over reasoning.
“none” or “min” are synonyms and work with all models. For reasoning models, it either disables reasoning or uses the minimal effort for it.
So, for example, “none” or “min”, would use 128 with Gemini 2.5 Pro, or 0 with Gemini 2.5 Flash.
If the client specifies a standard reasoning effort string, i.e. “low”/“medium”/“high”, Requesty forwards the same value to OpenAI.
If the client specifies the ‘max’ reasoning effort string, Requesty forwards the value ‘high’ to OpenAI.
If the client specifies ‘none’ or ‘min’ as the reasoning effort string, Requesty will use “low”, as this is the minimal amount of reasoning the models support.
If the client specifies a reasoning budget string (e.g. “10000”), Requesty converts it to an effort, based on the conversion table below.
If the client specifies a reasoning effort string (“low”/“medium”/“high”/“max”, “min”, or “none”), Requesty converts it to a budget, based on the conversion table below.
If the client specifies a reasoning budget string (e.g. “10000”), Requesty passes this value to Google. If the budget is larger than the model’s maximum output tokens, it will automatically be reduced to stay within that token limit.
Converstion table from effort to budget:
“min” / “none” / “low” -> 1024
“medium” -> 8192
“high” -> 16384
“max” -> max output tokens for model minus 1 (i.e. 63999 for Sonnet 3.7 or 4, 31999 for Opus 4)
If the client specifies a reasoning effort string (“low”/“medium”/“high”/“max”, “min”, or “none”), Requesty converts it to a budget, based on the conversion table below.
If the client specifies a reasoning budget string (e.g. “10000”), Requesty passes this value to Google. If the budget is larger than the model’s maximum output tokens, it will automatically be reduced to stay within that token limit.
Converstion table from effort to budget:
“min” / “none” -> 0 for Gemini Flash and Flash lite, 128 for Gemini Pro models