Hermes Agent, from Nous Research, is the fastest-growing agent framework of 2026 — 95,600+ GitHub stars in seven weeks. A self-improving autonomous agent with persistent three-layer memory and 118 bundled skills. Run it on ARK and its loops collapse from thousands of tokens to tens per call.
Hermes Agent is an open-source autonomous AI agent built by Nous Research and released in February 2026. It is designed as a persistent assistant that learns from its interactions, retains memory across sessions, and can operate across multiple messaging platforms simultaneously.
v0.10 ships with 118 bundled skills, a three-layer memory system (working, episodic, long-term), FTS5 recall with LLM summarization, six messaging integrations (Telegram, Discord, Slack, WhatsApp, Signal, Email) plus a CLI, and a closed learning loop that creates reusable skills from experience.
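A minimal sketch of what FTS5-backed episodic recall can look like, using Python's built-in SQLite bindings. The table name, columns, and sample rows are illustrative only, not Hermes' actual schema; in Hermes the matched rows would then be passed to the LLM for summarization.

```python
import sqlite3

# Illustrative episodic-memory store; schema and data are hypothetical,
# not Hermes' real layout.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE episodes USING fts5(ts, content)")
db.executemany(
    "INSERT INTO episodes VALUES (?, ?)",
    [
        ("2026-02-01", "Booked flights to Berlin for the team offsite"),
        ("2026-02-03", "Summarized the quarterly infra cost report"),
        ("2026-02-05", "Filed a bug about Telegram webhook retries"),
    ],
)

def recall(query, k=2):
    # Full-text search ranked by FTS5's built-in BM25 ('ORDER BY rank').
    return db.execute(
        "SELECT ts, content FROM episodes WHERE episodes MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()

print(recall("Telegram"))
```

The agent would call something like `recall()` before each planning step, then fold the hits into working memory.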
It runs anywhere — a $5 VPS, a dedicated GPU instance, or serverless infrastructure like Daytona or Modal. Point it at ARK and every step of its planning, tool use, and summarization loop runs inside the EU, with agent-sized token bills.
Hermes' loops replay the same context on every step — planning, reflection, tool observation, summarization. ARK's session-level KV-cache persistence means that context is computed once per session, not once per turn.
Measured on real loops: ~4,150 prompt tokens per call stateless drops to ~46 per call stateful. Your Hermes bill scales with intent, not with context length.
TTFT drops from ~1.07s to ~0.14s in stateful mode — the difference between a sluggish agent and a responsive one on long loops.
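The economics above can be sketched with back-of-envelope arithmetic. The constants below are illustrative, loosely matching the ~4,150 vs ~46 tokens-per-call figures; they are not ARK billing parameters.

```python
# Illustrative numbers only, not ARK's actual billing model.
CONTEXT_TOKENS = 4_100  # system prompt + memory + tool schemas, replayed when stateless
DELTA_TOKENS = 46       # new tokens per step once the KV-cache is warm

def billed_tokens(steps, stateful):
    if stateful:
        # Context is prefilled once per session; each step adds only its delta.
        return CONTEXT_TOKENS + steps * DELTA_TOKENS
    # Stateless: the full context is replayed and re-billed on every step.
    return steps * (CONTEXT_TOKENS + DELTA_TOKENS)

for steps in (10, 100):
    print(f"{steps} steps: stateless={billed_tokens(steps, False):,} "
          f"stateful={billed_tokens(steps, True):,}")
```

Stateless cost grows linearly with context on every step; stateful cost pays the context once and then grows only with the deltas, which is why long loops benefit most.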
Persistent agents need a persistent backend. ARK's shard-redundant runtime keeps serving even when 99% of its hardware fails — your Hermes instance keeps its train of thought.
Hermes reads a standard config file. Set ARK as the inference provider, pick a reasoning-capable model (DeepSeek R1, QwQ-32B, or Llama 3.3 70B all work well), and enable stateful mode so the KV-cache persists across the agent's loop.
The memory, skills, and messaging layers all keep running on whatever infra you chose — only the inference calls change addresses.
Hermes docs →

```toml
[inference]
provider = "openai"
base_url = "https://api.ark-labs.cloud/api/v1"
api_key  = "${ARK_API_KEY}"
model    = "deepseek-r1"   # or qwq-32b, llama-3.3-70b-instruct
stateful = true            # keep the KV-cache warm

[memory]
layers = ["working", "episodic", "long_term"]

[channels]
enabled = ["telegram", "slack", "email"]
```

```shell
$ hermes up --config hermes.toml
```
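Under the hood, a stateful loop step is just an OpenAI-compatible chat-completion request against the `base_url` from the config. The sketch below builds one with the standard library; the `X-Ark-Session` header is a hypothetical stand-in for however ARK tags a session's KV-cache — check the ARK docs for the real mechanism.

```python
import json
import urllib.request

BASE_URL = "https://api.ark-labs.cloud/api/v1"

def build_request(messages, session_id, api_key="${ARK_API_KEY}"):
    # Builds (but does not send) an OpenAI-compatible chat-completion request.
    body = json.dumps({
        "model": "deepseek-r1",
        "messages": messages,
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Ark-Session": session_id,  # hypothetical session tag for stateful mode
        },
        method="POST",
    )

req = build_request(
    [{"role": "user", "content": "Plan today's research loop"}], "sess-42"
)
print(req.full_url)
```

Reusing the same session identifier across planning, tool-observation, and summarization calls is what lets the backend skip re-prefilling the shared context.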
Run multi-step research loops overnight. Hermes plans, searches, summarizes, and reports back in Slack — with ARK's stateful mode keeping token cost sane.
Hermes' closed learning loop turns resolved tickets into reusable skills. Pair it with ARK's persistent inference and the agent gets cheaper and faster over time.
Hermes in Signal, Telegram, and Email; ARK in the EU. A genuinely autonomous personal operator that respects your data boundaries.
Run Hermes on ARK — free credits on ARK Cloud, or self-hosted on ARK Core / Tailored for full sovereignty.