reasoning_effort
Control how much a model reasons by setting the reasoning_effort parameter with one of the supported values in your API request.
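As a minimal sketch, assuming the OpenAI Python SDK pointed at the Requesty router (the base URL, placeholder key, and model choice here are illustrative; adjust them to your setup):

```python
from openai import OpenAI

# Assumed Requesty router endpoint and a placeholder key; substitute your own values.
client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="<REQUESTY_API_KEY>",
)

response = client.chat.completions.create(
    model="openai/o3-mini",
    reasoning_effort="medium",  # one of the supported values described below
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)
print(response.choices[0].message.content)
```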
Notes
- OpenAI does NOT share the actual reasoning tokens. You will not see them in the response.
- Deepseek reasoning models enable reasoning automatically; you don't need to specify anything in the request to enable it.
- When using Deepseek or Anthropic models, the reasoning content in the response will be under "reasoning_content" (see the sketch after this list).
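A hedged sketch of reading that field, reusing the client from the example above (the exact attribute access for extra fields may vary by SDK version):

```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-0",
    reasoning_effort="medium",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)

message = response.choices[0].message
# "reasoning_content" is an extra (non-standard) field, so read it defensively.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", message.content)
```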
Reasoning effort values
Anthropic expects a specific number that sets the upper limit of thinking tokens. The limit must be less than the specified max tokens value. OpenAI models expect one of the following "effort" values (both styles are sketched in code after the list):
- low
- medium
- high
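A minimal sketch of the two styles (the model names are taken from the examples at the end of this page; the numeric budget is passed through extra_body here since it is not one of the SDK's documented literal values):

```python
# OpenAI-style: a named effort level.
openai_resp = client.chat.completions.create(
    model="openai/o3-mini",
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

# Anthropic-style: a numeric thinking-token budget passed as a string.
# The budget must stay below max_tokens.
anthropic_resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-0",
    max_tokens=16000,
    extra_body={"reasoning_effort": "10000"},  # budget of 10,000 thinking tokens
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
```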
"none" or "min" effort
"none" and "min" are synonyms and work with all models. For reasoning models, they either disable reasoning or use the minimal effort the model supports. For example, "none" or "min" would use a thinking budget of 128 with Gemini 2.5 Pro, or 0 with Gemini 2.5 Flash.
When using OpenAI via Requesty:
- If the client specifies a standard reasoning effort string, i.e. "low"/"medium"/"high", Requesty forwards the same value to OpenAI.
- If the client specifies the "max" reasoning effort string, Requesty forwards the value "high" to OpenAI.
- If the client specifies "none" or "min" as the reasoning effort string, Requesty will use "low", as this is the minimal amount of reasoning the models support.
- If the client specifies a reasoning budget string (e.g. "10000"), Requesty converts it to an effort based on the conversion table below (sketched in code after the table).
- 0-1024 -> "low"
- 1025-8192 -> "medium"
- 8193 or higher -> "high"
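A hedged sketch of that budget-to-effort mapping (the function name is illustrative, not part of any Requesty API):

```python
def budget_to_openai_effort(budget: int) -> str:
    """Map a numeric thinking-token budget to an OpenAI effort level
    (illustrative reimplementation of the table above)."""
    if budget <= 1024:
        return "low"
    if budget <= 8192:
        return "medium"
    return "high"

assert budget_to_openai_effort(10000) == "high"
```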
When using Anthropic via Requesty:
- If the client specifies a reasoning effort string ("low"/"medium"/"high"/"max", "min", or "none"), Requesty converts it to a budget based on the conversion table below (sketched in code after the table).
- If the client specifies a reasoning budget string (e.g. "10000"), Requesty passes this value to Anthropic. If the budget is larger than the model's maximum output tokens, it will automatically be reduced to stay within that token limit.
- "min" / "none" / "low" -> 1024
- "medium" -> 8192
- "high" -> 16384
- "max" -> max output tokens for the model minus 1 (i.e. 63999 for Sonnet 3.7 or 4, 31999 for Opus 4)
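A hedged sketch of that effort-to-budget mapping (the function and the max-output-token argument are illustrative):

```python
def effort_to_anthropic_budget(effort: str, max_output_tokens: int) -> int:
    """Map an effort string to an Anthropic thinking-token budget
    (illustrative reimplementation of the table above)."""
    table = {"min": 1024, "none": 1024, "low": 1024, "medium": 8192, "high": 16384}
    if effort == "max":
        # "max" uses the model's maximum output tokens minus 1.
        return max_output_tokens - 1
    return table[effort]

assert effort_to_anthropic_budget("max", 64000) == 63999  # e.g. Sonnet 3.7 or 4
```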
When using Vertex AI via Requesty:
- If the client specifies a reasoning effort string ("low"/"medium"/"high"/"max", "min", or "none"), Requesty converts it to a budget based on the conversion table below (sketched in code after the table).
- If the client specifies a reasoning budget string (e.g. "10000"), Requesty passes this value to Google. If the budget is larger than the model's maximum output tokens, it will automatically be reduced to stay within that token limit.
- "min" / "none" -> 0 for Gemini Flash and Flash Lite, 128 for Gemini Pro models
- "low" -> 1024
- "medium" -> 8192
- "high" -> 24576
- "max" -> max output tokens for the model
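A hedged sketch of the Vertex mapping, including the model-dependent handling of "min"/"none" (the model-name check and function are illustrative):

```python
def effort_to_gemini_budget(effort: str, model: str, max_output_tokens: int) -> int:
    """Map an effort string to a Gemini thinking-token budget
    (illustrative reimplementation of the table above)."""
    if effort in ("min", "none"):
        # Gemini Pro models use a 128-token floor instead of 0 (see the "none"/"min" section above).
        return 128 if "pro" in model else 0
    if effort == "max":
        return max_output_tokens
    table = {"low": 1024, "medium": 8192, "high": 24576}
    return table[effort]

assert effort_to_gemini_budget("none", "vertex/google/gemini-2.5-pro", 65536) == 128
```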
When using Google AI Studio via Requesty:
Same as using OpenAI. See above.
Reasoning code example
For both tests, you can use an OpenAI, Anthropic, or Gemini reasoning model, for example:
- "openai/o3-mini"
- "anthropic/claude-sonnet-4-0"
- "vertex/google/gemini-2.5-pro"
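As a hedged sketch of such a test, reusing the client from the first example, the same prompt can be run with reasoning disabled and with high effort (the prompt and model choice are arbitrary; non-standard effort values may need to go through extra_body on some SDK versions):

```python
MODEL = "anthropic/claude-sonnet-4-0"
PROMPT = (
    "A bat and a ball cost $1.10 together. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
)

for effort in ("none", "high"):
    resp = client.chat.completions.create(
        model=MODEL,
        reasoning_effort=effort,
        messages=[{"role": "user", "content": PROMPT}],
    )
    msg = resp.choices[0].message
    print(f"--- effort={effort} ---")
    # reasoning_content is only present for providers that return reasoning text.
    if getattr(msg, "reasoning_content", None):
        print("Reasoning:", msg.reasoning_content)
    print("Answer:", msg.content)
```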