Engineering deep-dives, architecture notes, and the occasional rant from the team building sovereign inference infrastructure for regulated European enterprise. We write about the things we wish someone had written when we were building ARK.
Current inference stacks were built for one-shot chat. Agents carry the same ballooning context across hundreds of turns, and stateless APIs re-pay that tax on every call. Here is why stateful, isolated, fault-tolerant, sovereign inference is the substrate agentic AI actually needs.
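As a rough back-of-envelope (hypothetical turn counts and token sizes, not numbers from the post): when every call re-sends the full history, cumulative prompt tokens grow quadratically with the number of turns, while server-side state keeps the per-call cost roughly constant.

```python
# Illustrative only: a hypothetical agent running 200 turns of ~500 tokens each.
def stateless_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    # A stateless API re-sends turns 1..t on call t, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

def stateful_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    # With conversation state held server-side, each call sends only the new turn.
    return turns * tokens_per_turn

print(stateless_prompt_tokens(200, 500))  # 10,050,000 tokens re-processed
print(stateful_prompt_tokens(200, 500))   #    100,000 tokens processed once
```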
74% of enterprises plan agentic AI within two years. 85% need it customised. Only 21% have mature governance. The chat-era stack won't cut it.
Large context has always carried a tax: more tokens, more memory, more dollars. Here’s how optimized models, stateful processing, and affordable hardware make 128k+ context practical for teams that aren’t named “Anthropic” or “OpenAI.”
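To make the memory side of that tax concrete, here is a back-of-envelope KV-cache estimate for a hypothetical 70B-class model (80 layers, 8 grouped-query KV heads, head dimension 128, fp16). The configuration is assumed for illustration, not taken from the post.

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 80,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   dtype_bytes: int = 2) -> int:
    # K and V are each cached per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

# A single 128k-token context costs ~40 GiB of KV cache before the weights
# are even counted, which is why long context stresses memory, not just compute.
print(kv_cache_bytes(131_072) / 2**30)  # 40.0
```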
A stateless API is easy to ship and expensive to run. We walk through the engineering tradeoffs of putting conversation state on the GPU — what it saves, what it demands of session management, and where the 10× token reduction actually comes from.
A walkthrough of ARK’s re-sharding protocol, what happens when a compute node falls out of the ring mid-generation, and why “the whole group crashes” isn’t an acceptable failure mode in production.
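As a mental model only (a generic shard-reassignment sketch, not ARK's actual protocol), the failure path boils down to dropping the dead node from the ring and handing its shards to the survivors instead of aborting the whole group:

```python
from dataclasses import dataclass, field

@dataclass
class Ring:
    nodes: list[str]
    # shard index -> node currently holding it
    assignment: dict[int, str] = field(default_factory=dict)

    def assign_all(self, n_shards: int) -> None:
        # Round-robin placement of shards across healthy nodes.
        for shard in range(n_shards):
            self.assignment[shard] = self.nodes[shard % len(self.nodes)]

    def handle_failure(self, dead: str) -> list[int]:
        # Drop the failed node, hand its shards to survivors, and return
        # which shards must be rebuilt (e.g. KV state re-materialised).
        self.nodes.remove(dead)
        orphaned = [s for s, n in self.assignment.items() if n == dead]
        for i, shard in enumerate(orphaned):
            self.assignment[shard] = self.nodes[i % len(self.nodes)]
        return orphaned

ring = Ring(nodes=["node-a", "node-b", "node-c"])
ring.assign_all(n_shards=6)
print(ring.handle_failure("node-b"))  # shards that move instead of the group crashing
```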