# Create Chat Completion

> Create a chat completion using various AI models. Compatible with the OpenAI Chat Completions format.

<RequestExample>
  ```bash cURL theme={"dark"}
  curl https://router.requesty.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
    -d '{
      "model": "openai/gpt-4o-mini",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is an LLM gateway?"
        }
      ],
      "max_tokens": 1024,
      "temperature": 0.7
    }'
  ```

  ```python Python theme={"dark"}
  from openai import OpenAI

  client = OpenAI(
      api_key="YOUR_REQUESTY_API_KEY",
      base_url="https://router.requesty.ai/v1",
  )

  response = client.chat.completions.create(
      model="openai/gpt-4o-mini",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is an LLM gateway?"},
      ],
      max_tokens=1024,
      temperature=0.7,
  )

  print(response.choices[0].message.content)
  ```

  ```typescript TypeScript theme={"dark"}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.REQUESTY_API_KEY,
    baseURL: "https://router.requesty.ai/v1",
  });

  const response = await client.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What is an LLM gateway?" },
    ],
    max_tokens: 1024,
    temperature: 0.7,
  });

  console.log(response.choices[0].message.content);
  ```
</RequestExample>

<ResponseExample>
  ```json Response 200 theme={"dark"}
  {
    "id": "chatcmpl-abc123def456",
    "object": "chat.completion",
    "created": 1748200000,
    "model": "openai/gpt-4o-mini",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "An LLM gateway is a unified API layer that routes requests to multiple large language model providers. It normalizes different API formats, handles failover, load balancing, and provides centralized monitoring and cost tracking across providers like OpenAI, Anthropic, Google, and others."
        },
        "finish_reason": "stop"
      }
    ],
    "usage": {
      "prompt_tokens": 24,
      "completion_tokens": 52,
      "total_tokens": 76,
      "cost": 0.000038
    }
  }
  ```
</ResponseExample>
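
The `usage` object in the response above carries the token counts and the request cost. The snippet below is a minimal sketch of reading those fields from a parsed response body. The helper name `summarize_usage` and the `cost_usd` key are illustrative, not part of the API. The schema does not list `usage` as a required field, so the sketch tolerates its absence.

```python
from typing import Any


def summarize_usage(response: dict[str, Any]) -> dict[str, Any]:
    """Pull token counts and cost out of a parsed chat completion response."""
    usage = response.get("usage") or {}  # tolerate a missing `usage` object
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "cost_usd": usage.get("cost"),  # None when the field is absent
    }


# Trimmed copy of the example response shown above.
example = {
    "id": "chatcmpl-abc123def456",
    "model": "openai/gpt-4o-mini",
    "usage": {
        "prompt_tokens": 24,
        "completion_tokens": 52,
        "total_tokens": 76,
        "cost": 0.000038,
    },
}

summary = summarize_usage(example)
print(summary)
```

For streaming requests, the schema notes that `usage` (including `cost`) arrives in a final chunk only when `stream_options: {"include_usage": true}` is set, so a helper like this would be applied to that last chunk.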

## Web Search

Enable real-time web search by adding `{ "type": "web_search" }` to the `tools` array. Requesty translates this to each provider's native web search format automatically.

```json theme={"dark"}
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "messages": [
    { "role": "user", "content": "What is the latest news in London today?" }
  ],
  "tools": [{ "type": "web_search" }],
  "stream": true
}
```

Web search works with Anthropic, Vertex/Gemini, OpenAI, xAI, and Perplexity models. See the [Web Search guide](/features/web-search) for response format details and provider-specific behavior.
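
As a rough illustration, the request body above can be assembled with only the Python standard library. The helper name `build_web_search_request` is hypothetical. The sketch builds the request without sending it, and leaves out `stream` to keep response handling out of scope.

```python
import json
import urllib.request

ROUTER_URL = "https://router.requesty.ai/v1/chat/completions"


def build_web_search_request(api_key: str, model: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with web search enabled."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # The single tool entry that turns on web search.
        "tools": [{"type": "web_search"}],
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_web_search_request(
    "YOUR_REQUESTY_API_KEY",
    "anthropic/claude-sonnet-4-20250514",
    "What is the latest news in London today?",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. The OpenAI SDK clients shown at the top of this page should accept the same `tools` value.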

## PDF Support

Send PDFs using the `input_file` content type. You can provide the PDF as either base64-encoded data or a URL.

### Using Base64-Encoded PDF

```bash theme={"dark"}
curl https://router.requesty.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Summarize this PDF"
          },
          {
            "type": "input_file",
            "filename": "document.pdf",
            "file_data": "data:application/pdf;base64,<base64-encoded-pdf-data>"
          }
        ]
      }
    ]
  }'
```

### Using PDF URL

```bash theme={"dark"}
curl https://router.requesty.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_REQUESTY_API_KEY" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Summarize this PDF"
          },
          {
            "type": "input_file",
            "filename": "document.pdf",
            "file_data": "https://example.com/document.pdf"
          }
        ]
      }
    ]
  }'
```

### Parameters

* `type`: Must be `"input_file"`
* `filename`: The name of the PDF file (e.g., `"document.pdf"`)
* `file_data`: Either a base64 data URL (`data:application/pdf;base64,<base64-encoded-pdf-data>`) or a URL pointing to the PDF file
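
The two `file_data` forms can be produced with a few small helpers. This is a sketch only: the function names are illustrative, and the placeholder bytes stand in for a real PDF.

```python
import base64
from typing import Any


def pdf_part_from_bytes(pdf_bytes: bytes, filename: str) -> dict[str, str]:
    """Wrap raw PDF bytes as an `input_file` part using a base64 data URL."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "input_file",
        "filename": filename,
        "file_data": f"data:application/pdf;base64,{encoded}",
    }


def pdf_part_from_url(url: str, filename: str) -> dict[str, str]:
    """Reference a hosted PDF by URL instead of embedding it."""
    return {"type": "input_file", "filename": filename, "file_data": url}


def pdf_message(prompt: str, pdf_part: dict[str, str]) -> dict[str, Any]:
    """Combine a text prompt and a PDF part into one user message."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, pdf_part],
    }


# Placeholder bytes; in practice use pathlib.Path("document.pdf").read_bytes().
part = pdf_part_from_bytes(b"%PDF-1.4 placeholder", "document.pdf")
message = pdf_message("Summarize this PDF", part)
print(message["content"][1]["file_data"][:40])
```

The resulting `message` goes into the `messages` array of the request, as in the cURL examples above.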


## OpenAPI

````yaml POST /v1/chat/completions
openapi: 3.0.3
info:
  title: Requesty API
  description: Requesty API for AI model routing and key management
  version: 1.0.0
servers:
  - url: https://api-v2.requesty.ai
    description: Management API endpoint
  - url: https://router.requesty.ai
    description: Inference router endpoint
security:
  - BearerAuth: []
paths:
  /v1/chat/completions:
    servers:
      - url: https://router.requesty.ai
        description: Inference router endpoint
    post:
      summary: Create chat completion
      description: >-
        Create a chat completion using various AI models. Compatible with the
        OpenAI Chat Completions format.
      operationId: createChatCompletion
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatCompletionRequest'
            example:
              model: openai/gpt-4o-mini
              messages:
                - role: system
                  content: You are a helpful assistant.
                - role: user
                  content: What is an LLM gateway?
              max_tokens: 1024
              temperature: 0.7
      responses:
        '200':
          description: Chat completion response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ChatCompletionResponse'
              example:
                id: chatcmpl-abc123def456
                object: chat.completion
                created: 1748200000
                model: openai/gpt-4o-mini
                choices:
                  - index: 0
                    message:
                      role: assistant
                      content: >-
                        An LLM gateway is a unified API layer that routes
                        requests to multiple large language model providers. It
                        normalizes different API formats, handles failover, load
                        balancing, and provides centralized monitoring and cost
                        tracking across providers like OpenAI, Anthropic,
                        Google, and others.
                    finish_reason: stop
                usage:
                  prompt_tokens: 24
                  completion_tokens: 52
                  total_tokens: 76
                  cost: 0.000038
        '400':
          description: Bad request - malformed payload or invalid parameters.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized - missing or empty Authorization header.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '402':
          description: Payment required - organization balance exhausted.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden - invalid token or model not in access list.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not found - provider/model not supported.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '429':
          description: Rate limit exceeded. Wait for the interval given in the Retry-After header before retrying.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '500':
          description: Internal server error.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '502':
          description: Bad gateway - upstream provider returned an invalid response.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    ChatCompletionRequest:
      type: object
      required:
        - messages
      properties:
        model:
          type: string
          description: The model name. If omitted, defaults to openai/gpt-4o-mini.
          default: openai/gpt-4o-mini
          example: openai/gpt-4o-mini
        messages:
          type: array
          items:
            $ref: '#/components/schemas/Message'
          description: An array of message objects with role and content
        max_tokens:
          type: integer
          description: Maximum number of tokens to generate
        temperature:
          type: number
          description: Controls randomness of the output
        top_p:
          type: number
          description: Nucleus sampling threshold; limits sampling to the most probable tokens whose cumulative probability is top_p
        stream:
          type: boolean
          description: Enable Server-Sent Events (SSE) streaming responses
        tools:
          type: array
          items:
            $ref: '#/components/schemas/Tool'
          description: >-
            Available tools for the model. Supports `function` tools for custom
            function calling and `web_search` for real-time web search.
        tool_choice:
          type: string
          description: Specifies how tool calling should be handled
        response_format:
          type: object
          description: Requests structured output such as JSON (supported by some models only)
    ChatCompletionResponse:
      type: object
      required:
        - id
        - object
        - created
        - model
        - choices
      properties:
        id:
          type: string
          description: Unique identifier for the completion
        object:
          type: string
          description: Object type
        created:
          type: integer
          description: Unix timestamp (in seconds) of when the completion was created
        model:
          type: string
          description: Model used for completion
        usage:
          $ref: '#/components/schemas/Usage'
        choices:
          type: array
          items:
            $ref: '#/components/schemas/Choice'
    ErrorResponse:
      type: object
      required:
        - error
      properties:
        error:
          type: object
          required:
            - origin
            - message
          properties:
            origin:
              type: string
              enum:
                - router
                - provider
              description: >-
                Whether the error originated from Requesty's router or an
                upstream provider.
            message:
              type: string
              description: Human-readable error description.
    Message:
      type: object
      required:
        - role
        - content
      properties:
        role:
          type: string
          enum:
            - user
            - assistant
            - system
            - tool
          description: The role of the message sender
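        content:
          oneOf:
            - type: string
            - type: array
              items:
                type: object
          description: >-
            The content of the message, either a plain string or an array of
            content parts such as `text` and `input_file`.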
        name:
          type: string
          description: The name of the tool (for tool messages)
    Tool:
      type: object
      required:
        - type
      properties:
        type:
          type: string
          enum:
            - function
            - web_search
          description: >-
            The type of tool. Use `function` for custom function calling, or
            `web_search` to enable the model to search the web for real-time
            information. When `type=web_search`, Requesty translates the tool to
            the provider's native web search format (Anthropic, Vertex/Gemini,
            OpenAI, xAI, Perplexity).
        function:
          $ref: '#/components/schemas/Function'
          description: Required when `type=function`. The function definition.
    Usage:
      type: object
      properties:
        completion_tokens:
          type: integer
          format: int32
          description: Number of tokens in the generated completion.
        completion_tokens_details:
          $ref: '#/components/schemas/CompletionTokenDetails'
        prompt_tokens:
          type: integer
          format: int32
          description: Number of tokens in the prompt.
        prompt_tokens_details:
          $ref: '#/components/schemas/PromptTokenDetails'
        total_tokens:
          type: integer
          format: int32
          description: Total number of tokens used (prompt + completion).
        cost:
          type: number
          format: double
          description: >-
            Requesty's USD cost for this request. Returned by default on
            non-streaming responses. For streaming, pass `stream_options:
            {"include_usage": true}` to receive a final chunk with `usage`
            (including `cost`).
    Choice:
      type: object
      properties:
        index:
          type: integer
        message:
          $ref: '#/components/schemas/Message'
        finish_reason:
          type: string
    Function:
      type: object
      required:
        - name
        - description
        - parameters
      properties:
        name:
          type: string
          description: The name of the function
        description:
          type: string
          description: The description of the function
        parameters:
          type: object
          description: The parameters schema for the function
    CompletionTokenDetails:
      type: object
      properties:
        reasoning_tokens:
          type: integer
          format: int32
          description: Tokens generated for reasoning.
    PromptTokenDetails:
      type: object
      properties:
        cached_tokens:
          type: integer
          format: int32
          description: Cached tokens present in the prompt.
        caching_tokens:
          type: integer
          format: int32
          description: Tokens that were cached following this prompt.
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      description: API key for authentication

````
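
The error responses in the spec above share one shape: an `error` object with an `origin` (`router` or `provider`) and a `message`. The sketch below shows one way a client might report such errors and pick a wait time after a 429. The helper names are illustrative. Retrying only on 429, the 1-second fallback, and the assumption that `Retry-After` holds a number of seconds are choices made for this example, not documented behavior.

```python
from typing import Any, Mapping, Optional


def describe_error(body: Mapping[str, Any]) -> str:
    """Format an ErrorResponse body as '[origin] message'."""
    error = body.get("error") or {}
    origin = error.get("origin", "unknown")
    message = error.get("message", "no message")
    return f"[{origin}] {message}"


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    default: float = 1.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None when no retry is suggested.

    Assumes Retry-After is a number of seconds; anything else (including a
    missing header or an HTTP-date) falls back to `default`.
    """
    if status != 429:
        return None
    try:
        return max(float(headers.get("Retry-After", "")), 0.0)
    except ValueError:
        return default


print(describe_error({"error": {"origin": "provider", "message": "upstream timeout"}}))
print(retry_delay_seconds(429, {"Retry-After": "2"}))
print(retry_delay_seconds(500, {}))
```

The `headers` lookup here is case-sensitive because it uses a plain mapping. Most HTTP client libraries expose a case-insensitive header object, which would avoid that caveat.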