Streaming responses provide immediate feedback to users by delivering content token-by-token as it’s generated, dramatically improving perceived performance and user experience.

Overview

Requesty supports streaming responses from all major providers (OpenAI, Anthropic, Google, Mistral) using Server-Sent Events (SSE). Instead of waiting for the complete response, your applications can display content as it’s being generated.

Why Use Streaming?

Improved User Experience

Users see responses immediately, reducing perceived wait time by up to 80%

Better Engagement

Real-time content delivery keeps users engaged during longer responses

Reduced Timeouts

Avoid timeout issues on slow or complex requests

Progressive Display

Enable progressive UI updates as content becomes available

Implementation

Basic Streaming Setup

Enable streaming by setting the stream parameter to true in your request:
import openai

client = openai.OpenAI(
    api_key="your_requesty_api_key",
    base_url="https://router.requesty.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a poem about the stars."}],
    stream=True
)

# Process streaming response
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        print(content, end="", flush=True)

Advanced Streaming Patterns

The patterns below cover collecting the complete response, streaming function calls, and error handling.

Collecting Complete Response

Accumulate streaming chunks to build the full response:
collected_content = []

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        collected_content.append(content)

full_response = "".join(collected_content)
print(f"Complete response: {full_response}")

Streaming Features

Supported Capabilities

  • Text Generation: Standard chat completions
  • Function Calling: Streaming function calls and arguments
  • Tool Usage: Tool calls with streaming responses
  • Multi-turn Conversations: Streaming in conversation contexts
  • System Messages: Full prompt template support
  • Parameters: Temperature, max_tokens, and other standard parameters (see the example below)
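
For example, a streaming request can combine a system message, a multi-turn history, and sampling parameters exactly as a non-streaming request would. A minimal sketch, reusing the client from the basic example:

stream = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is streaming?"},
        {"role": "assistant", "content": "Streaming delivers tokens as they are generated."},
        {"role": "user", "content": "Why does that feel faster?"},
    ],
    temperature=0.7,
    max_tokens=200,
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)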

Provider Compatibility

All major providers support streaming through Requesty:
  • OpenAI: GPT-4, GPT-3.5, and all variants
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Haiku/Opus
  • Google: Gemini Pro, Gemini Flash
  • Mistral: All Mistral models
  • Meta: Llama models
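
Because Requesty routes on the model string, switching providers does not change the streaming code, only the model ID. A minimal sketch; the non-OpenAI model IDs below are illustrative, so check your Requesty model list for the exact names:

# Same streaming loop, different providers -- only the model string changes.
# The non-OpenAI model IDs are illustrative; use the IDs from your Requesty dashboard.
for model in ["openai/gpt-4", "anthropic/claude-3-5-sonnet", "mistral/mistral-large"]:
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        stream=True,
    )
    print(f"\n--- {model} ---")
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)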

Best Practices

User Experience

  • Display content immediately as it arrives
  • Use typing indicators or progress bars
  • Handle partial responses gracefully
  • Implement smooth scrolling for long content

Reliability

  • Implement connection retry logic
  • Gracefully handle stream interruptions
  • Provide fallback to non-streaming mode
  • Monitor stream health and performance

Performance

  • Use flush=True for immediate output display
  • Batch UI updates for better performance (see the buffering sketch after this list)
  • Implement efficient chunk processing
  • Consider client-side buffering strategies

Production Readiness

  • Implement proper error boundaries
  • Log streaming metrics and performance
  • Test with various network conditions
  • Plan for graceful degradation
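
As one example of batching, the sketch below buffers streamed deltas and pushes a UI update only every few hundred characters. The 200-character threshold and the flush_to_ui helper are stand-ins for your own UI code.

buffer = []
buffered_len = 0

def flush_to_ui(text):
    # Stand-in for whatever updates your interface (websocket send, re-render, etc.)
    print(text, end="", flush=True)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        buffer.append(delta)
        buffered_len += len(delta)
        if buffered_len >= 200:  # arbitrary threshold; tune for your UI
            flush_to_ui("".join(buffer))
            buffer, buffered_len = [], 0

# Flush whatever is left when the stream ends
if buffer:
    flush_to_ui("".join(buffer))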

Common Use Cases

Chat Applications

Real-time messaging with immediate response display

Content Generation

Progressive article, blog, or document creation

Code Generation

Live code generation with syntax highlighting

Data Analysis

Streaming analysis results and insights

Creative Writing

Story, poem, or creative content generation

Technical Documentation

Progressive documentation and explanation generation

Integration Examples

React Component

import { useState } from 'react';

function StreamingChat({ userInput }) {
  const [content, setContent] = useState('');

  const handleStream = async () => {
    setContent('');
    const response = await fetch('/api/chat/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: userInput })
    });

    // Read the streamed body chunk by chunk and append it to the displayed content
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      setContent(prev => prev + decoder.decode(value, { stream: true }));
    }
  };

  return (
    <div>
      <button onClick={handleStream}>Send</button>
      <div>{content}</div>
    </div>
  );
}
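
Python Backend Route

The component above assumes a backend route at /api/chat/stream that relays Requesty's stream to the browser. Below is a minimal sketch of such a route using Flask; the framework choice and route shape are assumptions, not requirements of Requesty.

from flask import Flask, Response, request, stream_with_context
import openai

app = Flask(__name__)

client = openai.OpenAI(
    api_key="your_requesty_api_key",
    base_url="https://router.requesty.ai/v1",
)

@app.route("/api/chat/stream", methods=["POST"])
def chat_stream():
    message = request.get_json()["message"]

    def generate():
        stream = client.chat.completions.create(
            model="openai/gpt-4",
            messages=[{"role": "user", "content": message}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta

    # Stream plain text chunks back to the browser as they arrive
    return Response(stream_with_context(generate()), mimetype="text/plain")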

Troubleshooting

Always implement proper error handling when using streaming responses, as network interruptions can cause incomplete responses.

Common Issues

  • Stream Interruption: Implement retry logic and graceful fallbacks (see the sketch below)
  • Partial Responses: Handle incomplete function calls or content
  • Performance: Optimize chunk processing for large responses
  • Browser Compatibility: Test streaming across different browsers and devices
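
For stream interruptions, one option is to retry the request without streaming so the user still receives a complete answer. A minimal sketch, reusing the client from the basic example:

import openai

def ask_with_fallback(client, messages, model="openai/gpt-4"):
    """Try a streaming request; if the stream fails, retry without streaming."""
    try:
        stream = client.chat.completions.create(model=model, messages=messages, stream=True)
        parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
                print(delta, end="", flush=True)
        return "".join(parts)
    except openai.OpenAIError:
        # Fall back to a standard request so the user still gets a complete answer
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content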