Enterprise-grade inference, from frontier open models to governed production — OpenAI v1-compatible, EU-hosted, zero-retention by default. Lightning-fast performance. Transparent per-token pricing. No credit card. No data leaves the region.
Free credits included — sign up, connect your SDK, start building in minutes.
Deploy models like Llama, Qwen, DeepSeek, GPT-OSS, and Mistral on shared ARK endpoints with sub-second latency targets and best-effort 99% availability. Autoscaling and multi-region routing keep latency predictable at any scale — while every token stays inside the EU.
Try now →
Six building blocks, one platform, one API — designed so your team can move from prototype to regulated production without rewiring anything.
Run frontier open-source models on shared ARK endpoints for consistent sub-second performance. Scale seamlessly from prototype to production with autoscaling and a best-effort 99% availability target.
Transparent, predictable per-token pricing with inference credits. Cut cost and latency further with ARK's density gains from heterogeneous GPU support and session-level isolation — independently benchmarked for accuracy.
Choose from 10+ frontier open-source models — DeepSeek, GPT-OSS, Llama, Qwen, Mistral, and more. Serve text, code, vision, and reasoning models through one OpenAI v1 / Anthropic-compatible API.
Build agents faster with native function calling, structured JSON outputs, and built-in safety guardrails (a short sketch follows this list). Stateful inference cuts token usage by 98.9% for multi-step reasoning at scale.
Give product, ops, and support teams a built-in chat & embeddings Portal that speaks to the same governed EU-resident inference stack as your engineers — no separate SaaS, no data leaving the region.
Ship retrieval-augmented systems with high-performance embedding models and guardrail models served through the same API. Vector storage and managed fine-tuning are on the roadmap.
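The agent block above uses the standard OpenAI v1 request shape for tools, so existing agent tooling carries over unchanged. A minimal sketch; the get_invoice_status tool is a hypothetical example for illustration, not part of the ARK API:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ARK_API_KEY",
)

# Hypothetical tool definition -- replace with your own functions.
tools = [{
    "type": "function",
    "function": {
        "name": "get_invoice_status",
        "description": "Look up the current status of an invoice by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "What's the status of invoice INV-1042?"}],
    tools=tools,
)

# The model answers with a structured tool call instead of free text.
print(response.choices[0].message.tool_calls)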
Real-time API, high-throughput batch, and a built-in Portal — all on the same runtime, same API surface, same EU residency guarantee.
Serve text, code, vision, and reasoning models via a simple OpenAI v1 / Anthropic-compatible API. Best-effort 99% availability, sub-second p95, zero retention by default.
High-throughput asynchronous inference for large jobs. Predictable p95 even at peak, ideal for back-office scoring, document processing, and bulk embedding workloads (a batch sketch follows this group).
Give non-technical teams a governed chat, image, and document-embedding workspace that speaks to the same inference stack as your engineers — authenticated, tenant-isolated, EU-resident. Managed fine-tuning is on the roadmap.
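A minimal sketch of the batch flow, assuming ARK Cloud mirrors the OpenAI v1 batch surface (/v1/files and /v1/batches); check the API reference for the exact endpoints:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ARK_API_KEY",
)

# requests.jsonl holds one chat-completion request per line (OpenAI batch format).
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # results land in an output file when the job finishes
)
print(batch.id, batch.status)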
A curated catalog of frontier text, multimodal, and embedding models — expanded continuously based on benchmarking and customer demand.
LinkedIn for product news, X for release announcements, and Discord for technical conversations with ARK engineers and other builders.
Independently verified on open benchmarks and in production at regulated European enterprises — not slideware.
Stable latency even at peak load. Top-tier throughput on DeepSeek V3, Llama-3.3, and Qwen2.5 — independently benchmarked.
Handle 100M+ tokens per minute with best-effort 99% availability. Autoscaling and the ARK runtime's heterogeneous GPU support keep throughput consistent from prototype to global deployment.
Access 10+ frontier open-source models — LLMs, vision, reasoning, embeddings — expanded based on benchmarks and customer demand.
ARK Cloud speaks OpenAI v1. Change one base URL, keep your existing SDKs, and your code works unchanged — but now every request runs inside the EU, under your compliance regime, with zero retention by default.
Learn more about the API →

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ARK_API_KEY",
)

completion = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{
        "role": "user",
        "content": "What is the answer to all questions?",
    }],
)

print(completion.choices[0].message.content)
# → runs in the EU, zero retention, OpenAI-compatible.
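For interactive products, the same endpoint also streams tokens through the standard SDK interface. A minimal sketch reusing the client above; confirm per-model streaming support in the docs:

stream = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize GDPR Article 17."}],
    stream=True,  # tokens arrive as they are generated
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)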
Scale from shared access to reserved capacity with best-effort 99% availability, transparent per-token pricing, and volume discounts for production workloads. Dedicated endpoints with software-support SLAs are available via ARK Tailored.
Yes. ARK Cloud is built for large-scale, production-grade AI workloads. Shared and reserved endpoints deliver sub-second inference, best-effort 99% availability, and autoscaling throughput — ensuring consistent performance for workloads exceeding hundreds of millions of tokens per minute. Scale seamlessly from experimentation to global deployment, with no rate throttles and no GPU management. For software-support SLAs on customer-owned infrastructure, see ARK Tailored.
We regularly onboard new open-source releases, including Llama, GPT-OSS, Qwen, DeepSeek, Mistral, and Flux, based on customer demand and benchmarking. Enterprise users can also request model optimization or custom deployment support through our Solutions team.
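To check what is currently served, the standard model-listing call works against the same base URL (assuming ARK exposes the OpenAI-compatible /v1/models endpoint, as most v1-compatible providers do):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ARK_API_KEY",
)

# Enumerate the model IDs currently available on shared endpoints.
for model in client.models.list():
    print(model.id)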
Yes. Reserved-capacity endpoints provide session-level isolation, predictable latency, and dedicated compute capacity. For contractual SLAs beyond the best-effort 99% availability target on ARK Cloud, customers can move to ARK Tailored — a full ARK deployment on your own infrastructure, backed by a Software Support SLA. Contact our team to size your endpoint to your workload and compliance requirements.
Managed fine-tuning and custom-checkpoint deployment are on the ARK Cloud roadmap. Today, customers who need to run their own weights can do so on ARK Tailored or ARK Core, where the full ARK runtime deploys on their own infrastructure. Talk to us about timelines and early-access if fine-tuning on Cloud is on your critical path.
We operate in zero-retention mode by default — requests and outputs are never stored or reused for training. All data is processed in EU facilities, with strict data-residency guarantees and GDPR compliance. SOC 2 Type II and ISO 27001 certifications are in progress; see our Trust Center for the current status.
Starter tiers ship with generous default limits; enterprise removes the caps entirely — unlimited throughput, autoscaling to demand, and per-project tuning. Need more? Reach out and we'll size your endpoint to your real-world traffic profile.
ARK Cloud serves state-of-the-art embedding and guardrail models through the same OpenAI-compatible API as our chat and reasoning endpoints, so you can index, retrieve, and generate on one governed, EU-resident platform. Managed vector storage is on the roadmap; today, customers pair ARK embeddings with their own vector DB (PGVector, Qdrant, Weaviate, etc.).
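A minimal embedding sketch against that API; the model ID below is a placeholder (pick a real one from the catalog), and the resulting vectors go into whatever vector store you already run:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key="ARK_API_KEY",
)

# Placeholder model ID -- choose an actual embedding model from the ARK catalog.
resp = client.embeddings.create(
    model="your-embedding-model",
    input=[
        "GDPR Article 17 covers the right to erasure.",
        "Data subjects may request deletion of personal data.",
    ],
)

vectors = [d.embedding for d in resp.data]
# Index `vectors` in your own store (pgvector, Qdrant, Weaviate, ...);
# managed vector storage is on the ARK Cloud roadmap.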
Pricing is transparent per-token, with clear input/output separation and volume discounts as you scale. No hidden infrastructure costs, no idle GPU charges — pay only for what you serve.
Yes. ARK Cloud ships with a best-effort 99% availability target, zero-retention by default, and GDPR compliance. Enterprise customers get reserved capacity, a dedicated Slack/support channel, custom DPAs, SSO, RBAC, and unified billing. For contractual Software Support SLAs, regulated-industry deployments, and full data sovereignty, customers move to ARK Tailored on their own infrastructure. SOC 2 Type II and ISO 27001 are in progress — see the Trust Center.
Get free credits on ARK Cloud, or talk to our team about dedicated endpoints, custom models, and enterprise SLAs.