Notes from the runtime.

Engineering deep-dives, architecture notes, and the occasional rant from the team building sovereign inference infrastructure for regulated European enterprises. We write about the things we wish someone had written when we were building ARK.

4 posts published · 4 topics tracked · Updated weekly

Latest

Showing 4 of 4 posts

Built for agents: the inference substrate agentic AI was waiting for.

74% of enterprises plan to deploy agentic AI within two years. 85% need it customized. Only 21% have mature governance. Here’s why stateful, isolated, fault-tolerant, sovereign inference is the substrate agentic AI actually needs — and why the chat-era stack won’t cut it.

Unlocking larger context windows for AI models — without breaking the bank.

Large context has always carried a tax: more tokens, more memory, more dollars. Here’s how optimized models, stateful processing, and affordable hardware make 128k+ context practical for teams that aren’t named “Anthropic” or “OpenAI.”

Stateful vs stateless LLMs: what GPU-resident context actually buys you.

A stateless API is easy to ship and expensive to run. We walk through the engineering tradeoffs of keeping conversation state resident on the GPU — what it saves, what it demands of session management, and where the 10× token reduction actually comes from.

Fault tolerance when a node drops mid-token: how ARK keeps 99% of sessions alive.

A walkthrough of ARK’s re-sharding protocol, what happens when a compute node falls out of the ring mid-generation, and why “the whole group crashes” isn’t an acceptable failure mode in production.