# Streaming

Receive the response in a stream.
The router supports streaming responses from all providers (OpenAI, Anthropic, Mistral) using Server-Sent Events (SSE). Streaming allows you to receive and process the response token by token instead of waiting for the complete response.
## How to Use Streaming
- Enable streaming by setting `stream=True` in your request
- The response will be a stream of chunks that you need to iterate over
- Each chunk contains a delta of the response in the same format as the OpenAI API
### Python Example with Streaming
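A minimal sketch of the streaming loop. The processing logic matches the OpenAI chunk format described above; the `make_chunk` helper and the sample chunks are illustrative stand-ins for what an OpenAI-compatible client would yield from `client.chat.completions.create(..., stream=True)`.

```python
from types import SimpleNamespace

def print_stream(stream):
    """Print content deltas as they arrive and return the collected text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta
        # delta.content can be None (e.g. for role-only chunks), so check first
        if delta.content is not None:
            parts.append(delta.content)
            print(delta.content, end="", flush=True)
    print()
    return "".join(parts)

# In real use the stream comes from an OpenAI-compatible client, e.g.:
#   stream = client.chat.completions.create(model=..., messages=..., stream=True)
# The mock chunks below are only an assumption about the chunk shape.
def make_chunk(content):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])

chunks = [make_chunk(None), make_chunk("Hello"), make_chunk(", "), make_chunk("world!")]
full_text = print_stream(chunks)  # prints "Hello, world!"
```

Note the `flush=True` on `print`: without it, small deltas may sit in the output buffer instead of appearing immediately.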
## Important Notes
- **Content Access**
  - Always check that `delta.content` is not `None` before using it
  - Content arrives in small chunks that you may want to collect into a full response
- **Function Calls**
  - Function calls are also streamed and come through the `delta.function_call` property
  - Check for both `name` and `arguments`, as they may arrive in separate chunks
- **Error Handling**
  - Wrap streaming code in `try`/`except` to handle potential connection issues
  - The stream might end early if there are errors
- **Best Practices**
  - Use `flush=True` when printing to see output immediately
  - Consider collecting chunks if you need the complete response
  - For production, implement proper error handling and retry logic
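The function-call note above can be sketched as an accumulator: since `name` and `arguments` may arrive in separate chunks, concatenate each piece as it comes. The `make_chunk` helper and sample data are assumptions that mimic the OpenAI-style delta shape, not part of the router's API.

```python
from types import SimpleNamespace

def collect_function_call(stream):
    """Assemble a streamed function call; name and arguments may arrive separately."""
    name_parts, arg_parts = [], []
    for chunk in stream:
        fc = getattr(chunk.choices[0].delta, "function_call", None)
        if fc is None:
            continue
        if fc.name:        # typically present only in the first chunk
            name_parts.append(fc.name)
        if fc.arguments:   # the JSON arguments string trickles in piecewise
            arg_parts.append(fc.arguments)
    return "".join(name_parts), "".join(arg_parts)

# Mock chunks illustrating the assumed shape of streamed function-call deltas:
def make_chunk(name=None, arguments=None):
    fc = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(function_call=fc))])

chunks = [
    make_chunk(name="get_weather"),
    make_chunk(arguments='{"city": '),
    make_chunk(arguments='"Paris"}'),
]
name, arguments = collect_function_call(chunks)
# name == "get_weather", arguments == '{"city": "Paris"}'
```

Only parse `arguments` as JSON after the stream ends; intermediate fragments are usually not valid JSON on their own.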
## Example: Collecting Complete Response
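A sketch that combines the collection and error-handling advice above: accumulate deltas into one string and keep whatever arrived if the stream ends early. The `flaky_stream` generator is a hypothetical stand-in for a connection that drops mid-stream.

```python
from types import SimpleNamespace

def collect_response(stream):
    """Collect streamed content into one string, keeping partial output on errors."""
    parts = []
    try:
        for chunk in stream:
            delta = chunk.choices[0].delta
            if delta.content is not None:
                parts.append(delta.content)
    except Exception as exc:
        # The stream may end early on connection problems; keep what arrived so far.
        print(f"stream interrupted: {exc}")
    return "".join(parts)

# A mock stream that fails midway, to illustrate the early-exit behaviour:
def make_chunk(content):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])

def flaky_stream():
    yield make_chunk("partial ")
    yield make_chunk("answer")
    raise ConnectionError("connection dropped")

result = collect_response(flaky_stream())  # result == "partial answer"
```

In production you would typically retry on such failures rather than just logging them.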
## Supported Features in Streaming
- Text completion streaming
- Function calling streaming
- Tool calls streaming
- System messages
- Temperature and other parameters
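Tool-call streaming works like function-call streaming, except deltas arrive as a list and each carries an `index` identifying which call it belongs to, so fragments must be grouped by index. This is a sketch assuming the OpenAI-style `delta.tool_calls` shape; the mock helpers are illustrative.

```python
from types import SimpleNamespace

def collect_tool_calls(stream):
    """Assemble streamed tool calls, grouping fragments by each delta's `index`."""
    calls = {}
    for chunk in stream:
        for tc in chunk.choices[0].delta.tool_calls or []:
            entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function.name:
                entry["name"] += tc.function.name
            if tc.function.arguments:
                entry["arguments"] += tc.function.arguments
    return [calls[i] for i in sorted(calls)]

# Mock chunks illustrating the assumed shape of streamed tool-call deltas:
def make_call(index, name=None, arguments=None):
    fn = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(index=index, function=fn)

def make_chunk(*tool_calls):
    delta = SimpleNamespace(tool_calls=list(tool_calls))
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

chunks = [
    make_chunk(make_call(0, name="search")),
    make_chunk(make_call(0, arguments='{"q": "weather"}')),
]
tool_calls = collect_tool_calls(chunks)
# tool_calls == [{"name": "search", "arguments": '{"q": "weather"}'}]
```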