Streaming responses provide immediate feedback to users by delivering content token-by-token as it’s generated, dramatically improving perceived performance and user experience.

Overview

Requesty supports streaming responses from all major providers (OpenAI, Anthropic, Google, Mistral) using Server-Sent Events (SSE). Instead of waiting for the complete response, your applications can display content as it’s being generated.

Why Use Streaming?

Improved User Experience

Users see responses immediately, reducing perceived wait time by up to 80%

Better Engagement

Real-time content delivery keeps users engaged during longer responses

Reduced Timeouts

Avoid timeout issues on slow or complex requests

Progressive Display

Enable progressive UI updates as content becomes available

Implementation

Basic Streaming Setup

Enable streaming by setting the stream parameter to true in your request:
import openai

client = openai.OpenAI(
    api_key="your_requesty_api_key",
    base_url="https://router.requesty.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a poem about the stars."}],
    stream=True
)

# Process streaming response
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        print(content, end="", flush=True)

Advanced Streaming Patterns

The patterns below cover collecting the complete response, streaming function calls, and error handling.

Collecting Complete Response

Accumulate streaming chunks to build the full response:
collected_content = []

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        collected_content.append(content)

full_response = "".join(collected_content)
print(f"Complete response: {full_response}")

Streaming Features

Supported Capabilities

  • Text Generation: Standard chat completions
  • Function Calling: Streaming function calls and arguments
  • Tool Usage: Tool calls with streaming responses
  • Multi-turn Conversations: Streaming in conversation contexts
  • System Messages: Full prompt template support
  • Parameters: Temperature, max_tokens, and other standard parameters (see the example below)
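
For example, a streaming request can combine a system message, a multi-turn history, and sampling parameters exactly as a non-streaming request would. A minimal sketch, reusing the client from the basic example:

stream = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is streaming?"},
        {"role": "assistant", "content": "Streaming delivers tokens as they are generated."},
        {"role": "user", "content": "Why does that feel faster?"},
    ],
    temperature=0.7,
    max_tokens=200,
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)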

Provider Compatibility

All major providers support streaming through Requesty:
  • OpenAI: GPT-4, GPT-3.5, and all variants
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Haiku/Opus
  • Google: Gemini Pro, Gemini Flash
  • Mistral: All Mistral models
  • Meta: Llama models
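
Because Requesty routes on the model string, switching providers does not change the streaming code, only the model ID. A minimal sketch; the non-OpenAI model IDs below are illustrative, so check your Requesty model list for the exact names:

# Same streaming loop, different providers -- only the model string changes.
# The non-OpenAI model IDs are illustrative; use the IDs from your Requesty dashboard.
for model in ["openai/gpt-4", "anthropic/claude-3-5-sonnet", "mistral/mistral-large"]:
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        stream=True,
    )
    print(f"\n--- {model} ---")
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)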

Best Practices

User Experience

  • Display content immediately as it arrives
  • Use typing indicators or progress bars
  • Handle partial responses gracefully
  • Implement smooth scrolling for long content

Reliability

  • Implement connection retry logic
  • Gracefully handle stream interruptions
  • Provide fallback to non-streaming mode
  • Monitor stream health and performance

Performance

  • Use flush=True for immediate output display
  • Batch UI updates for better performance (see the buffering sketch after this list)
  • Implement efficient chunk processing
  • Consider client-side buffering strategies

Production Readiness

  • Implement proper error boundaries
  • Log streaming metrics and performance
  • Test with various network conditions
  • Plan for graceful degradation
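
As one example of batching, the sketch below buffers streamed deltas and pushes a UI update only every few hundred characters. The 200-character threshold and the flush_to_ui helper are stand-ins for your own UI code.

buffer = []
buffered_len = 0

def flush_to_ui(text):
    # Stand-in for whatever updates your interface (websocket send, re-render, etc.)
    print(text, end="", flush=True)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        buffer.append(delta)
        buffered_len += len(delta)
        if buffered_len >= 200:  # arbitrary threshold; tune for your UI
            flush_to_ui("".join(buffer))
            buffer, buffered_len = [], 0

# Flush whatever is left when the stream ends
if buffer:
    flush_to_ui("".join(buffer))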

Common Use Cases

Chat Applications

Real-time messaging with immediate response display

Content Generation

Progressive article, blog, or document creation

Code Generation

Live code generation with syntax highlighting

Data Analysis

Streaming analysis results and insights

Creative Writing

Story, poem, or creative content generation

Technical Documentation

Progressive documentation and explanation generation

Integration Examples

React Component

import { useState } from 'react';

function StreamingChat({ userInput }) {
  const [content, setContent] = useState('');

  const handleStream = async () => {
    setContent('');
    const response = await fetch('/api/chat/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: userInput })
    });

    // Read the streamed body chunk by chunk and append it to the displayed content
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      setContent(prev => prev + decoder.decode(value, { stream: true }));
    }
  };

  return (
    <div>
      <button onClick={handleStream}>Send</button>
      <div>{content}</div>
    </div>
  );
}
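
Python Backend Route

The component above assumes a backend route at /api/chat/stream that relays Requesty's stream to the browser. Below is a minimal sketch of such a route using Flask; the framework choice and route shape are assumptions, not requirements of Requesty.

from flask import Flask, Response, request, stream_with_context
import openai

app = Flask(__name__)

client = openai.OpenAI(
    api_key="your_requesty_api_key",
    base_url="https://router.requesty.ai/v1",
)

@app.route("/api/chat/stream", methods=["POST"])
def chat_stream():
    message = request.get_json()["message"]

    def generate():
        stream = client.chat.completions.create(
            model="openai/gpt-4",
            messages=[{"role": "user", "content": message}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta

    # Stream plain text chunks back to the browser as they arrive
    return Response(stream_with_context(generate()), mimetype="text/plain")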

Troubleshooting

Always implement proper error handling when using streaming responses, as network interruptions can cause incomplete responses.

Common Issues

  • Stream Interruption: Implement retry logic and graceful fallbacks (see the sketch below)
  • Partial Responses: Handle incomplete function calls or content
  • Performance: Optimize chunk processing for large responses
  • Browser Compatibility: Test streaming across different browsers and devices
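
For stream interruptions, one option is to retry the request without streaming so the user still receives a complete answer. A minimal sketch, reusing the client from the basic example:

import openai

def ask_with_fallback(client, messages, model="openai/gpt-4"):
    """Try a streaming request; if the stream fails, retry without streaming."""
    try:
        stream = client.chat.completions.create(model=model, messages=messages, stream=True)
        parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
                print(delta, end="", flush=True)
        return "".join(parts)
    except openai.OpenAIError:
        # Fall back to a standard request so the user still gets a complete answer
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content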