Ship a private knowledge workspace, an always-on personal agent, an enterprise-secure assistant, or a self-improving autonomous agent — all on ARK's OpenAI v1-compatible inference layer. Point, run, done. No wrappers, no lock-in, no data leaving the region.
Each stack below is optimized for a different kind of AI workload. Pick the one that matches the outcome you're trying to ship — the inference layer is shared.
Ingest your documents, your wiki, your contracts. Ask questions, get citations, never send a byte to a public model endpoint.
An assistant that lives in the channels your team already uses — WhatsApp, Slack, Telegram, Discord — and remembers context across sessions.
Sandboxed execution, policy-based guardrails, on-premises deployment. For regulated environments that need autonomy without losing control.
Each card is a full, production-grade stack you can deploy today. Click through for what it is, what it's best at, and how to wire it to ARK in under five minutes.
Config-first personal AI agent that runs on any OS and speaks in the channels you already use — WhatsApp, Slack, Telegram, Discord, iMessage, and 20+ more. You write a SOUL.md, run one command, it's live.
NVIDIA's enterprise wrapper around OpenClaw. Adds OpenShell sandboxing, policy-based guardrails, and lifecycle management — so autonomous agents run safely inside regulated environments.
The all-in-one private AI workspace. Ingest PDFs, Word docs, whole websites, and team wikis. Chunk, embed, and query them with full citations — multi-user, role-based, and MIT-licensed.
Nous Research's self-improving autonomous agent. Persistent three-layer memory, 118 bundled skills, a closed learning loop, and six messaging integrations out of the box — runs anywhere, from a $5 VPS to a GPU cluster.
Every stack below speaks OpenAI v1 and points at the same ARK gateway — so knowledge workspaces, personal agents, sandboxed autonomous agents, and self-improving agents all share the same inference, the same audit trail, the same region.
All four stacks speak OpenAI v1. Set your base URL to https://api.ark-labs.cloud/api/v1, drop in your ARK API key, and every chat, agent, RAG query, and tool call runs inside the EU, under your compliance regime, with zero retention by default.
Running ARK Core or Tailored on your own hardware? Same config — just swap the hostname for your internal gateway.
Read the API docs →

```bash
# Point any OpenAI-compatible client at ARK:
OPENAI_BASE_URL="https://api.ark-labs.cloud/api/v1"
OPENAI_API_KEY="ark_sk_live_...redacted..."

# Pick any frontier open-source model:
MODEL="llama-3.3-70b-instruct"
# or: deepseek-r1, qwen2.5-72b, mistral-nemo-instruct-2407,
#     gpt-oss-120b, ...

# That's it. Every stack below reads these three vars.
```
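If you'd rather see the same thing from application code, here is a minimal sketch using the official openai Python SDK. The prompt is illustrative, the key is the redacted placeholder from above, and (per the note on self-hosting) an ARK Core or Tailored deployment would swap base_url for your internal gateway:

```python
from openai import OpenAI

# Same three values as the env block above; nothing ARK-specific in client code.
client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",  # self-hosted? use your internal gateway
    api_key="ark_sk_live_...redacted...",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize our Q3 contract terms."}],
)
print(resp.choices[0].message.content)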
Agents and RAG loops hammer the inference layer with short, repetitive calls. ARK is engineered for exactly that traffic shape — and for the compliance constraints that come with running agents in regulated environments.
Every stack on this page works unchanged. Swap one URL, keep your SDK, keep your prompts.
Every token stays in the region. Nothing logged, nothing retained, nothing used for training by default.
Session-level KV-cache persistence cuts tokens per call by 98.9% and time-to-first-token by 87% for multi-step reasoning loops.
Dynamic shard redundancy means the platform keeps serving even when hardware fails. Agents don't care about your GPU weather.
Llama, Qwen, DeepSeek, Mistral, GPT-OSS, and more — benchmark-selected, always-on, served through the same API.
Add or remove GPUs at runtime, mix vendors, mix generations. No model reloads, no restarts, no cluster downtime.
No. All four speak OpenAI v1 natively. You set OPENAI_BASE_URL to https://api.ark-labs.cloud/api/v1 and pass your ARK API key — that's the integration. Every tool call, embedding request, and chat completion flows through ARK.
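The same holds for embedding traffic from a RAG stack: it goes through the standard OpenAI v1 endpoint, pointed at ARK. A minimal sketch, with a placeholder model id you'd replace with an embedding model from ARK's library:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ark_sk_live_...redacted...",
)

# Placeholder id; substitute an embedding model from the ARK model library.
emb = client.embeddings.create(
    model="your-embedding-model-id",
    input=["Clause 4.2: termination requires 90 days written notice."],
)
print(len(emb.data[0].embedding))  # dimensionality of the returned vector
```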
For most builders: ARK Cloud — managed, EU-hosted, free credits to start. If you need data sovereignty on your own hardware, run the same stacks against ARK Tailored or ARK Core. Same API surface, same SDK integrations, different deployment topology.
Yes. Because they all hit the same ARK gateway with the same API key, multiple stacks can share a single inference backend, a single audit trail, and a single credit pool. Use AnythingLLM as the knowledge layer and Hermes as the autonomous operator — both authenticated, both EU-resident, both governed by the same policies.
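As a sketch of what "one backend, many stacks" looks like from code: any two processes reading the same three environment variables from the quick-start block land on the same gateway, the same audit trail, and the same credit pool. The knowledge-layer/operator split below is illustrative:

```python
import os
from openai import OpenAI

# Both stacks read the same three vars set earlier: one gateway, one key.
client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)

# Illustrative split: a knowledge-layer query and an autonomous-operator step,
# both hitting the same EU-resident inference backend.
for role, prompt in [
    ("knowledge layer", "What does clause 4.2 say about termination?"),
    ("autonomous operator", "Draft the 90-day termination notice."),
]:
    resp = client.chat.completions.create(
        model=os.environ["MODEL"],
        messages=[{"role": "user", "content": prompt}],
    )
    print(role, "→", resp.choices[0].message.content[:80])
```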
Agent loops re-send the same conversation context on every step. ARK's session-level KV-cache persistence means only new tokens are re-computed — a 98.9% reduction in tokens per call and an 87% reduction in time-to-first-token. For customers running on their own hardware, that translates directly into higher session density on the same GPUs.
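To make that traffic shape concrete, here is a hedged sketch of a typical agent loop: the full message history is re-sent on every step, so without prefix reuse each call re-processes an ever-growing prompt. Step count and prompts are illustrative:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ark_sk_live_...redacted...",
)

messages = [{"role": "user", "content": "Plan the data-room audit."}]
for step in range(5):  # illustrative step count
    resp = client.chat.completions.create(
        model="llama-3.3-70b-instruct",
        messages=messages,  # the entire history is re-sent every step
    )
    reply = resp.choices[0].message
    # The shared prefix grows each iteration; with session-level KV-cache
    # persistence, only the new suffix is re-computed server-side.
    messages.append({"role": "assistant", "content": reply.content})
    messages.append({"role": "user", "content": f"Continue with step {step + 2}."})
```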
ARK serves 10+ curated frontier open-source models — Llama 3.3, Qwen 2.5, DeepSeek R1/V3, Mistral-Nemo, GPT-OSS, and embedding + guardrail models — all benchmark-selected. New releases are onboarded based on customer demand. See the model library.
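Assuming ARK also exposes the standard OpenAI v1 models endpoint (consistent with its v1 compatibility, though not stated outright on this page), you can discover what's live programmatically rather than hard-coding ids:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ark_sk_live_...redacted...",
)

# Assumption: ARK implements GET /models from the OpenAI v1 surface.
for model in client.models.list():
    print(model.id)
```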
Get free credits on ARK Cloud and wire your first solution into production — or talk to us about dedicated endpoints, custom models, and self-hosted deployment.