The router supports streaming responses from all providers (OpenAI, Anthropic, Mistral) using Server-Sent Events (SSE). Streaming allows you to receive and process the response token by token instead of waiting for the complete response.

How to Use Streaming

  • Enable streaming by setting stream=True in your request
  • The response will be a stream of chunks that you need to iterate over
  • Each chunk contains a delta of the response in the same format as the OpenAI API

Python Example with Streaming:

import os
import openai

ROUTER_API_KEY = os.environ["ROUTER_API_KEY"]

client = openai.OpenAI(
    api_key=ROUTER_API_KEY,  # the SDK sends this as the Authorization: Bearer header
    base_url="https://router.requesty.ai/v1",
)
response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a poem about the stars."}],
    stream=True
)
# Iterate over the stream and handle chunks
for chunk in response:
    if not chunk.choices:  # some chunks (e.g. a final usage chunk) carry no choices
        continue
    delta = chunk.choices[0].delta
    # Access content from the chunk (if present)
    if delta.content is not None:
        print(delta.content, end="", flush=True)  # print content as it arrives
    # Handle function calls in streaming (if present)
    if delta.function_call is not None:
        fc = delta.function_call
        if fc.name:
            print(f"\nFunction Call: {fc.name}")
        if fc.arguments:
            print(f"Arguments: {fc.arguments}")

Important Notes

  1. Content Access:
  • Always check that delta.content is not None before using it
  • Content arrives in small chunks that you may want to collect into a full response
  2. Function Calls:
  • Function calls are also streamed and come through the delta.function_call property
  • Check for both name and arguments, as they may arrive in separate chunks
  3. Error Handling:
  • Wrap streaming code in try/except to handle potential connection issues
  • The stream might end early if there is an error
  4. Best Practices:
  • Use flush=True when printing to see output immediately
  • Collect chunks if you need the complete response
  • For production, implement proper error handling and retry logic
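The error-handling and retry advice above can be sketched as a small wrapper. This is a minimal sketch, not part of the router's API: stream_with_retry and fake_stream are hypothetical names, and the fake stream stands in for a real client.chat.completions.create(..., stream=True) call.

```python
import time

def stream_with_retry(create_stream, max_retries=3, backoff=1.0):
    """Collect a streamed response, restarting the request on connection errors.

    create_stream is a zero-argument callable that opens a fresh stream --
    e.g. a lambda wrapping client.chat.completions.create(..., stream=True).
    A retry restarts the whole request, so partial output from a failed
    attempt is discarded rather than resumed.
    """
    for attempt in range(max_retries):
        collected = []
        try:
            for content in create_stream():
                collected.append(content)
            return "".join(collected)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(backoff * (2 ** attempt))  # exponential backoff

# Simulated stream that drops the connection on the first attempt:
calls = {"n": 0}
def fake_stream():
    calls["n"] += 1
    yield "Twinkle, "
    if calls["n"] == 1:
        raise ConnectionError("connection dropped mid-stream")
    yield "twinkle."

result = stream_with_retry(fake_stream, backoff=0)
print(result)  # Twinkle, twinkle.
```

Note that retrying restarts the request from scratch, so deduplicate or discard any partial output you already showed the user.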

Example: Collecting Complete Response

collected_messages = []
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        collected_messages.append(chunk.choices[0].delta.content)

full_response = "".join(collected_messages)

Supported Features in Streaming

  • Text completion streaming
  • Function calling streaming
  • Tool calls streaming
  • System messages
  • Temperature and other parameters
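Tool calls stream similarly to function calls: each chunk carries a delta.tool_calls list whose entries are keyed by index, with the name arriving once and the JSON arguments split across chunks. Below is a minimal sketch of reassembling them; accumulate_tool_calls is a hypothetical helper, and plain dicts stand in for the SDK's delta objects.

```python
def accumulate_tool_calls(chunks):
    """Merge streamed tool-call deltas into complete calls, keyed by index.

    chunks is an iterable of dicts mimicking the streaming delta format
    (a simplified stand-in for chunk.choices[0].delta.tool_calls).
    """
    calls = {}
    for delta in chunks:
        for tc in delta.get("tool_calls", []):
            slot = calls.setdefault(
                tc["index"], {"id": None, "name": None, "arguments": ""}
            )
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function", {})
            if fn.get("name"):
                slot["name"] = fn["name"]
            if fn.get("arguments"):
                slot["arguments"] += fn["arguments"]  # JSON arrives in fragments
    return [calls[i] for i in sorted(calls)]

# Simulated stream: the name arrives first, the arguments in two fragments
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
]
result = accumulate_tool_calls(deltas)
print(result[0]["name"], result[0]["arguments"])  # get_weather {"city": "Paris"}
```

Only parse the accumulated arguments string as JSON once the stream finishes, since intermediate fragments are not valid JSON on their own.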