Hermes Agent is an open-source AI agent by Nous Research that runs across CLI, TUI, desktop, and messaging platforms. It features persistent memory, tool use, browser automation, and multi-session delegation. With the Requesty integration, you can:
  • Access 300+ models from OpenAI, Anthropic, Google, Mistral, and many other providers through one API key.
  • Get automatic prompt caching on Anthropic models, reducing cost significantly on multi-turn conversations.
  • Track and manage your spend in a single location.
  • Apply fallback policies, load balancing, and latency routing to keep your agent responsive.

Prerequisites

Configuration

The recommended setup uses the native Anthropic Messages format (api_mode: anthropic_messages), which enables automatic prompt caching on supported models. This is the optimal configuration for Hermes because its large system prompt and tool definitions benefit heavily from prefix caching across turns. Create or replace your global Hermes config:
cat > ~/.hermes/config.yaml << 'EOF'
model:
  default: "anthropic/claude-sonnet-4-5"
  provider: "requesty"
  default_headers:
    HTTP-Referer: "https://hermes-agent.nousresearch.com"
    X-Origin-Title: "Hermes"

custom_providers:
  - name: requesty
    base_url: "https://router.requesty.ai"
    api_key: "your_requesty_api_key"
    api_mode: anthropic_messages
EOF
Then start a session:
hermes
The api_mode: anthropic_messages setting tells Hermes to send requests in the native Anthropic Messages API format. This is what enables Requesty’s automatic prompt caching, which can reduce input costs by up to 90% on long conversations because the system prompt and tool definitions are cached between turns instead of being billed at the full rate each time.
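Because `cat >` truncates its target, the setup command above replaces any existing ~/.hermes/config.yaml. If you already have a config, you may want to keep a copy first. The helper below is a minimal sketch of one way to do that; the function name `backup_config` and the `.bak.<timestamp>` naming are illustrative choices, not part of Hermes.

```shell
#!/bin/sh
# backup_config FILE
# Hypothetical helper (not a Hermes command): copies FILE to
# FILE.bak.<timestamp> if FILE exists and prints the backup path.
# If FILE does not exist, it prints nothing and succeeds.
backup_config() {
  file="$1"
  if [ -f "$file" ]; then
    backup="$file.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves mode and timestamps of the original file
    cp -p "$file" "$backup" && printf '%s\n' "$backup"
  fi
}
```

Run `backup_config "$HOME/.hermes/config.yaml"` before the `cat >` command; to roll back, copy the printed backup path over the config.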

Why anthropic_messages matters

Hermes sends a large system prompt (often 20,000+ tokens including tool definitions) on every turn. With the OpenAI chat completions format, this entire prompt is re-processed from scratch each time and billed at the full input rate. With the native Anthropic Messages format, Requesty automatically applies cache control breakpoints so that subsequent turns in the same conversation reuse the cached prefix. You then pay the full input rate only for new content (user messages and tool results) plus the response, while the cached prefix is billed at the lower cache-read rate.
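To get a feel for the effect, the sketch below estimates the cost of the repeated prefix alone over a conversation. All numbers are assumptions for illustration: a hypothetical input price of $3 per million tokens, a 1.25x multiplier for the first (cache-write) turn, and a 0.1x multiplier for later (cache-read) turns. It also assumes the cache never expires between turns. Check current provider pricing before relying on these figures.

```shell
#!/bin/sh
# prefix_cost TURNS PREFIX_TOKENS PRICE_PER_MTOK
# Hypothetical estimator: prints "<uncached> <cached> <percent saved>"
# for the repeated prompt prefix only (new messages and output are ignored).
prefix_cost() {
  awk -v turns="$1" -v tokens="$2" -v price="$3" 'BEGIN {
    write_mult = 1.25   # ASSUMED cache-write multiplier (first turn)
    read_mult  = 0.10   # ASSUMED cache-read multiplier (later turns)
    per_turn = tokens * price / 1000000          # full-price cost of one prefix
    uncached = turns * per_turn                  # prefix re-billed every turn
    cached   = per_turn * write_mult + (turns - 1) * per_turn * read_mult
    printf "%.4f %.4f %.1f\n", uncached, cached, 100 * (1 - cached / uncached)
  }'
}

# 10 turns with a 20,000-token prefix at an assumed $3 per million input tokens
prefix_cost 10 20000 3
```

Under these assumptions the saving on the prefix grows with conversation length and approaches, but never reaches, 90%, because the first turn pays the cache-write premium.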

Model selection

You can use any model from the Model Library. Set the default in config.yaml or switch mid-session:
model:
  default: "anthropic/claude-sonnet-4-5"
Override per session with the --model flag:
hermes chat --model "openai/gpt-4o"
hermes chat --model "google/gemini-2.5-pro"
Alternatively, point Hermes at a Routing Policy instead of hard-coding a model. A policy is a named alias that Requesty resolves on its side, so you can swap the underlying model from the Routing Policies page without touching your config.
model:
  default: "policy/reliable-sonnet"
  provider: "requesty"
  default_headers:
    HTTP-Referer: "https://hermes-agent.nousresearch.com"
    X-Origin-Title: "Hermes"

custom_providers:
  - name: requesty
    base_url: "https://router.requesty.ai"
    api_key: "your_requesty_api_key"
    api_mode: anthropic_messages
Policy types that work well with Hermes:
  • Fallback Policy for reliability. If your primary model is down, Requesty retries with the next model in the chain.
  • Latency Routing for speed. Requesty picks whichever provider is currently fastest.
  • Load Balancing for gradual rollouts between models.

EU routing

To pin all traffic to the EU region:
custom_providers:
  - name: requesty
    base_url: "https://router.eu.requesty.ai"
    api_key: "your_requesty_api_key"
    api_mode: anthropic_messages
See EU Routing for details on the regional endpoint.
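If you switch between the default and EU endpoints regularly, a small generator can keep the two provider blocks consistent. This is only a sketch: `requesty_provider_block` is a hypothetical helper, and the only values it uses are the two base URLs and the placeholder key shown in this guide.

```shell
#!/bin/sh
# requesty_provider_block REGION
# Hypothetical helper: prints a custom_providers fragment for the given
# region. "eu" selects the EU endpoint; anything else selects the default.
requesty_provider_block() {
  case "$1" in
    eu) base_url="https://router.eu.requesty.ai" ;;
    *)  base_url="https://router.requesty.ai" ;;
  esac
  # Unquoted heredoc delimiter so $base_url is expanded
  cat << EOF
custom_providers:
  - name: requesty
    base_url: "$base_url"
    api_key: "your_requesty_api_key"
    api_mode: anthropic_messages
EOF
}
```

Calling `requesty_provider_block eu` prints the fragment to standard output, so you can review it before pasting it into ~/.hermes/config.yaml.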

Verifying the integration

Start Hermes and send any message:
hermes chat -q "What model are you?"
Then check the Requesty analytics dashboard to confirm the request was logged. You should see the request tagged with HTTP-Referer: https://hermes-agent.nousresearch.com in the request metadata.
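If the request does not show up, a quick look at the config file often finds the cause. The function below is a rough pre-flight sketch, assuming the config layout shown in this guide. It does plain text matching rather than YAML parsing, so treat a pass as a hint, not proof. The name `check_hermes_config` is hypothetical.

```shell
#!/bin/sh
# check_hermes_config FILE
# Hypothetical pre-flight check (text matching only, not a YAML parser).
# Prints one line per problem found and returns non-zero if any exist.
check_hermes_config() {
  file="$1"
  rc=0
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  # The model section should select the requesty provider
  grep -Eq '^[[:space:]]*provider:[[:space:]]*"?requesty"?[[:space:]]*$' "$file" \
    || { echo "provider is not requesty"; rc=1; }
  # Prompt caching depends on the native Anthropic Messages mode
  grep -Eq '^[[:space:]]*api_mode:[[:space:]]*anthropic_messages[[:space:]]*$' "$file" \
    || { echo "api_mode is not anthropic_messages"; rc=1; }
  # The placeholder key from this guide must be replaced with a real one
  if grep -q 'your_requesty_api_key' "$file"; then
    echo "api_key is still the placeholder"
    rc=1
  fi
  return "$rc"
}
```

For example, `check_hermes_config "$HOME/.hermes/config.yaml" && hermes chat -q "What model are you?"` only sends the test message when none of the checks fail.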

Troubleshooting

  • Verify your API key is correct and active. Open the API Keys page to confirm. Make sure the key in your ~/.hermes/config.yaml matches exactly, including any trailing = characters.
  • Confirm you are using api_mode: anthropic_messages in your config. The OpenAI chat completions format does not support automatic prompt caching. Also ensure you are using an Anthropic model (Claude family), as caching is provider-specific.
  • Check that the model ID format is correct (provider/model-name) and that the model is available in the Model Library. If your organization uses approved models, ensure the model is on the approved list.
  • Hermes defaults to a 120-second read timeout. For long-running requests on reasoning models, you can increase it in your provider config. See the Hermes configuration docs for timeout settings.
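A malformed model ID is easy to catch before it reaches the router. The sketch below checks only the shape described above (a non-empty provider prefix, a slash, and a non-empty name, with no spaces). It cannot tell whether the model exists in the Model Library or is on your approved list. `model_id_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# model_id_ok ID
# Hypothetical shape check for IDs such as "anthropic/claude-sonnet-4-5"
# or "policy/reliable-sonnet". Returns 0 if the shape looks right.
model_id_ok() {
  case "$1" in
    *" "*) return 1 ;;   # IDs should not contain spaces
    ?*/?*) return 0 ;;   # non-empty prefix, slash, non-empty name
    *)     return 1 ;;   # no slash, or an empty side
  esac
}
```

One way to use it is as a guard in a wrapper script: `model_id_ok "$MODEL" && hermes chat --model "$MODEL"`.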

Last modified on June 30, 2026