Engineering deep-dives, architecture notes, and the occasional rant from the team building sovereign inference infrastructure for regulated European enterprise. We write about the things we wish someone had written when we were building ARK.
Current inference stacks were built for one-shot chat. Agents carry the same ballooning context across hundreds of turns, and stateless APIs re-pay that tax on every call. Here is why stateful, isolated, fault-tolerant, sovereign inference is the substrate agentic AI actually needs.
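As a rough back-of-envelope (hypothetical turn counts and token sizes, not numbers from the post): when every call re-sends the full history, cumulative prompt tokens grow quadratically with the number of turns, while server-side state keeps the per-call cost roughly constant.

```python
# Illustrative only: a hypothetical agent running 200 turns of ~500 tokens each.
def stateless_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    # A stateless API re-sends turns 1..t on call t, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

def stateful_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    # With conversation state held server-side, each call sends only the new turn.
    return turns * tokens_per_turn

print(stateless_prompt_tokens(200, 500))  # 10,050,000 tokens re-processed
print(stateful_prompt_tokens(200, 500))   #    100,000 tokens processed once
```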
74% of enterprises plan agentic AI within two years. 85% need it customised. Only 21% have mature governance. The chat-era stack won't cut it.
Large context has always carried a tax: more tokens, more memory, more dollars. Here’s how optimized models, stateful processing, and affordable hardware make 128k+ context practical for teams that aren’t named “Anthropic” or “OpenAI.”
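To make the memory side of that tax concrete, here is a back-of-envelope KV-cache estimate for a hypothetical 70B-class model (80 layers, 8 grouped-query KV heads, head dimension 128, fp16). The configuration is assumed for illustration, not taken from the post.

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 80,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   dtype_bytes: int = 2) -> int:
    # K and V are each cached per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

# A single 128k-token context costs ~40 GiB of KV cache before the weights
# are even counted, which is why long context stresses memory, not just compute.
print(kv_cache_bytes(131_072) / 2**30)  # 40.0
```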
A stateless API is easy to ship and expensive to run. We walk through the engineering tradeoffs of putting conversation state on the GPU — what it saves, what it demands of session management, and where the 10× token reduction actually comes from.
A walkthrough of ARK’s re-sharding protocol, what happens when a compute node falls out of the ring mid-generation, and why “the whole group crashes” isn’t an acceptable failure mode in production.
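As a mental model only (a generic shard-reassignment sketch, not ARK's actual protocol), the failure path boils down to dropping the dead node from the ring and handing its shards to the survivors instead of aborting the whole group:

```python
from dataclasses import dataclass, field

@dataclass
class Ring:
    nodes: list[str]
    # shard index -> node currently holding it
    assignment: dict[int, str] = field(default_factory=dict)

    def assign_all(self, n_shards: int) -> None:
        # Round-robin placement of shards across healthy nodes.
        for shard in range(n_shards):
            self.assignment[shard] = self.nodes[shard % len(self.nodes)]

    def handle_failure(self, dead: str) -> list[int]:
        # Drop the failed node, hand its shards to survivors, and return
        # which shards must be rebuilt (e.g. KV state re-materialised).
        self.nodes.remove(dead)
        orphaned = [s for s, n in self.assignment.items() if n == dead]
        for i, shard in enumerate(orphaned):
            self.assignment[shard] = self.nodes[i % len(self.nodes)]
        return orphaned

ring = Ring(nodes=["node-a", "node-b", "node-c"])
ring.assign_all(n_shards=6)
print(ring.handle_failure("node-b"))  # shards that move instead of the group crashing
```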