# Hermes Agent

> Connect Hermes Agent to Requesty for access to 300+ models, automatic prompt caching, cost tracking, and fallback routing.

[Hermes Agent](https://hermes-agent.nousresearch.com) is an open-source AI agent by Nous Research that runs across CLI, TUI, desktop, and messaging platforms. It features persistent memory, tool use, browser automation, and multi-session delegation.

Using the Requesty integration, you can:

* Access **300+ models** from OpenAI, Anthropic, Google, Mistral, and many other providers through one API key.
* Get **automatic prompt caching** on Anthropic models, reducing cost significantly on multi-turn conversations.
* Track and manage your spend in a single location.
* Apply [fallback policies](/features/fallback-policies), [load balancing](/features/load-balancing-policies), and [latency routing](/features/latency-routing) to keep your agent responsive.

## Prerequisites

* [Hermes Agent installed](https://hermes-agent.nousresearch.com/docs/getting-started/installation) (`curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash`).
* A Requesty API key from the [API Keys page](https://app.requesty.ai/api-keys).

## Configuration

The recommended setup uses the native Anthropic Messages format (`api_mode: anthropic_messages`), which enables automatic prompt caching on supported models. This is the optimal configuration for Hermes because its large system prompt and tool definitions benefit heavily from prefix caching across turns.

The following command writes your global Hermes config, overwriting any existing `~/.hermes/config.yaml`:

```bash theme={"dark"}
# Make sure the config directory exists before writing the file
mkdir -p ~/.hermes
cat > ~/.hermes/config.yaml << 'EOF'
model:
  default: "anthropic/claude-sonnet-4-5"
  provider: "requesty"
  default_headers:
    HTTP-Referer: "https://hermes-agent.nousresearch.com"
    X-Origin-Title: "Hermes"

custom_providers:
  - name: requesty
    base_url: "https://router.requesty.ai"
    api_key: "your_requesty_api_key"
    api_mode: anthropic_messages
EOF
```
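Because this command overwrites any existing `~/.hermes/config.yaml` and the file stores your API key in plain text, it may be worth backing up the old file first and tightening permissions afterwards. A minimal sketch, assuming bash; `backup_hermes_config` and `lock_hermes_config` are hypothetical helper names, not part of Hermes:

```bash
# Hypothetical helpers (not part of Hermes). Both take the config path as an
# optional argument and default to the global Hermes config.

# Copy an existing config to <path>.bak so it can be restored later.
backup_hermes_config() {
  local config="${1:-$HOME/.hermes/config.yaml}"
  if [ -f "$config" ]; then
    cp -p "$config" "$config.bak"
    echo "backed up to $config.bak"
  else
    echo "no existing config at $config"
  fi
}

# Make the config readable and writable by the current user only,
# since it contains the API key in plain text.
lock_hermes_config() {
  local config="${1:-$HOME/.hermes/config.yaml}"
  chmod 600 "$config"
}
```

Run `backup_hermes_config` before writing the new config and `lock_hermes_config` after it.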

Then start a session:

```bash theme={"dark"}
hermes
```

<Info>
  The `api_mode: anthropic_messages` setting tells Hermes to use the native Anthropic Messages API format. This is what enables Requesty's automatic prompt caching, which can reduce costs by up to 90% on long conversations by caching the system prompt and tool definitions between turns.
</Info>

### Why anthropic\_messages matters

Hermes sends a large system prompt (often 20,000+ tokens including tool definitions) on every turn. With the OpenAI chat completions format, the entire prompt is reprocessed from scratch each time. With the native Anthropic Messages format, Requesty automatically applies cache control breakpoints, so subsequent turns in the same conversation reuse the cached prefix: you pay the full input rate only for new messages, while the cached prefix is billed at a discounted cache-read rate.
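To get a feel for the effect, the arithmetic can be sketched in a few lines of shell. This is a rough estimate, not Requesty's billing logic. It assumes Anthropic's published cache multipliers (cache writes at 1.25x and cache reads at 0.1x the base input price), an illustrative base price of $3 per million input tokens, and that every turn after the first hits the cache. The function name `estimate_prefix_cost` is hypothetical:

```bash
# Rough input-cost estimate for the repeated prefix (system prompt + tools).
# Arguments: turns, prefix tokens, base input price in USD per million tokens.
# ASSUMPTIONS: cache writes bill at 1.25x and cache reads at 0.1x the base
# input price, and every turn after the first is a cache hit.
estimate_prefix_cost() {
  awk -v turns="$1" -v prefix="$2" -v price="$3" 'BEGIN {
    uncached = turns * prefix * price / 1e6
    cached   = (prefix * price * 1.25 + (turns - 1) * prefix * price * 0.1) / 1e6
    printf "uncached=%.4f cached=%.4f saved=%.1f%%\n", uncached, cached, (1 - cached / uncached) * 100
  }'
}

# 10 turns, 20,000-token prefix, assumed $3 per million input tokens
estimate_prefix_cost 10 20000 3
# → uncached=0.6000 cached=0.1290 saved=78.5%
```

Under these assumptions the first turn costs slightly more than an uncached request because of the cache write, and the saving on the prefix approaches 90% as the conversation gets longer. Growing conversation history and output tokens are not included.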

## Model selection

You can use any model from the [Model Library](https://app.requesty.ai/model-list). Set the default in `config.yaml`:

```yaml theme={"dark"}
model:
  default: "anthropic/claude-sonnet-4-5"
```

Override per session with the `--model` flag:

```bash theme={"dark"}
hermes chat --model "openai/gpt-4o"
hermes chat --model "google/gemini-2.5-pro"
```

Or use a [Routing Policy](#recommended-use-a-routing-policy) to change models without editing config.

## Recommended: use a Routing Policy

Instead of hard-coding a model, point Hermes at a **Routing Policy**. A policy is a named alias that resolves on the Requesty side, so you can swap the underlying model from the [Routing Policies](https://app.requesty.ai/routing-policies) page without touching your config.

```yaml theme={"dark"}
model:
  default: "policy/reliable-sonnet"
  provider: "requesty"
  default_headers:
    HTTP-Referer: "https://hermes-agent.nousresearch.com"
    X-Origin-Title: "Hermes"

custom_providers:
  - name: requesty
    base_url: "https://router.requesty.ai"
    api_key: "your_requesty_api_key"
    api_mode: anthropic_messages
```

Policy types that work well with Hermes:

* **[Fallback Policy](/features/fallback-policies)** for reliability. If your primary model is down, Requesty retries with the next model in the chain.
* **[Latency Routing](/features/latency-routing)** for speed. Requesty picks whichever provider is currently fastest.
* **[Load Balancing](/features/load-balancing-policies)** for gradual rollouts between models.

## EU routing

To pin all traffic to the EU region:

```yaml theme={"dark"}
custom_providers:
  - name: requesty
    base_url: "https://router.eu.requesty.ai"
    api_key: "your_requesty_api_key"
    api_mode: anthropic_messages
```

See [EU Routing](/features/eu-routing) for details on the regional endpoint.
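If you already have a working config, switching endpoints is a one-line substitution. A minimal sketch, assuming bash and `sed`; `use_eu_endpoint` is a hypothetical helper, not part of Hermes, and it does a plain text replacement rather than parsing the YAML:

```bash
# Hypothetical helper (not part of Hermes): rewrite the Requesty base_url in a
# Hermes config to the EU endpoint. The original file is kept as <path>.pre-eu.
use_eu_endpoint() {
  local config="${1:-$HOME/.hermes/config.yaml}"
  # Nothing to do if the config already points at the EU endpoint.
  if grep -q 'https://router\.eu\.requesty\.ai' "$config"; then
    return 0
  fi
  # -i with an attached suffix works with both GNU and BSD sed.
  sed -i.pre-eu 's#https://router\.requesty\.ai#https://router.eu.requesty.ai#' "$config"
}
```

Call it as `use_eu_endpoint` for the global config, or pass a path to target another file.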

## Verifying the integration

Start Hermes and send any message:

```bash theme={"dark"}
hermes chat -q "What model are you?"
```

Then check the [Requesty analytics dashboard](https://app.requesty.ai/analytics) to confirm the request was logged. You should see the request tagged with `HTTP-Referer: https://hermes-agent.nousresearch.com` in the request metadata.

## Troubleshooting

<AccordionGroup>
  <Accordion title="403 Invalid authorization token">
    Verify your API key is correct and active. Open the [API Keys page](https://app.requesty.ai/api-keys) to confirm. Make sure the key in your `~/.hermes/config.yaml` matches exactly, including any trailing `=` characters.
  </Accordion>

  <Accordion title="Caching not working (cached_tokens stays at 0)">
    Confirm you are using `api_mode: anthropic_messages` in your config. The OpenAI chat completions format does not support automatic prompt caching. Also make sure you are using an Anthropic model (Claude family), because caching is provider-specific.
  </Accordion>

  <Accordion title="Model not found">
    Check that the model ID format is correct (`provider/model-name`) and that the model is available in the [Model Library](https://app.requesty.ai/model-list). If your organization restricts usage to a list of approved models, make sure the model is on that list.
  </Accordion>

  <Accordion title="Connection timeout">
    Hermes defaults to a 120-second read timeout. For long-running requests on reasoning models, you can increase it in your provider config. See the [Hermes configuration docs](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) for timeout settings.
  </Accordion>
</AccordionGroup>
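Two of the checks above (the `api_mode` setting and a leftover placeholder key) can be scripted as a quick pre-flight test. A minimal sketch, assuming bash and `grep`; `check_hermes_config` is a hypothetical helper, not part of Hermes, and it matches text patterns rather than parsing the YAML:

```bash
# Hypothetical pre-flight check (not part of Hermes) for two common config
# mistakes. Pass a config path or rely on the global default.
check_hermes_config() {
  local config="${1:-$HOME/.hermes/config.yaml}"
  local status=0

  if [ ! -f "$config" ]; then
    echo "config not found: $config"
    return 1
  fi

  # Prompt caching depends on the native Anthropic Messages format.
  if ! grep -Eq '^[[:space:]]*api_mode:[[:space:]]*anthropic_messages' "$config"; then
    echo "api_mode is not anthropic_messages: prompt caching will not apply"
    status=1
  fi

  # The placeholder from the example config is not a valid key.
  if grep -q 'your_requesty_api_key' "$config"; then
    echo "api_key is still the placeholder value"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "config looks OK"
  fi
  return "$status"
}
```

The function returns a non-zero status when it finds a problem, so it can gate a launch, for example `check_hermes_config && hermes`.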

## References

* [Hermes Agent docs](https://hermes-agent.nousresearch.com/docs)
* [Hermes configuration guide](https://hermes-agent.nousresearch.com/docs/user-guide/configuration)
* [Hermes provider setup](https://hermes-agent.nousresearch.com/docs/integrations/providers)
* [Requesty API Keys](https://app.requesty.ai/api-keys)
* [Requesty Model Library](https://app.requesty.ai/model-list)
* [Requesty Auto-Caching](/features/auto-caching)
