EU-hosted · Zero retention · AI Act ready

ARK Cloud. Sovereign inference at EU scale.

Enterprise-grade inference, from frontier open models to governed production — OpenAI v1-compatible, EU-hosted, and zero-retention by default. Lightning-fast performance. Transparent per-token pricing. No credit card. No data leaves the region.

Free credits included — sign up, connect your SDK, start building in minutes.

10+
Frontier
Open Models
99%
Best-effort
Availability
100%
EU Data
Residency
0
Retention
By Default

Run frontier open-source AI at EU-sovereign production speed.

Deploy models like Llama, Qwen, DeepSeek, GPT-OSS, and Mistral on shared ARK endpoints with sub-second latency and a best-effort 99% availability target. Autoscaling and multi-region routing keep latency predictable at any scale — while every token stays inside the EU.

Try now

Everything you need for governed, production-grade inference.

Six building blocks, one platform, one API — designed so your team can move from prototype to regulated production without rewiring anything.

Scalability without constraints

Run frontier open-source models on shared ARK endpoints for consistent sub-second performance. Scale seamlessly from prototype to production with autoscaling and a best-effort 99% availability target.

Optimized pricing for inference

Transparent, predictable per-token pricing with inference credits. Cut cost and latency further with ARK's density gains from heterogeneous GPU support and session-level isolation — independently benchmarked for accuracy.

Curated frontier models

Choose from 10+ frontier open-source models — DeepSeek, GPT-OSS, Llama, Qwen, Mistral, and more. Serve text, code, vision, and reasoning models through one OpenAI v1 / Anthropic-compatible API.

AI agent essentials

Build agents faster with native function calling, structured JSON outputs, and built-in safety guardrails. Stateful inference delivers 98.9% token reduction for multi-step reasoning at scale.
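As a sketch of what native function calling looks like over the OpenAI v1 surface, the request body below declares a single tool; the `get_weather` tool, its parameters, and the model id are illustrative assumptions, not part of the ARK catalog.

```python
import json

# Sketch of an OpenAI-v1 chat request that declares one tool.
payload = {
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, not an ARK built-in
            "description": "Look up current weather for a city.",
            "parameters": {  # JSON Schema for the tool's arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # the model decides whether to emit a tool call
}

body = json.dumps(payload)  # POST this to /api/v1/chat/completions
```

When the model opts to call the tool, the response carries a `tool_calls` entry with the function name and JSON arguments instead of plain text.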

Portal for non-technical users

Give product, ops, and support teams a built-in chat & embeddings Portal that speaks to the same governed EU-resident inference stack as your engineers — no separate SaaS, no data leaving the region.

RAG & embeddings building blocks

Ship retrieval-augmented systems with high-performance embedding models and guardrail models served through the same API. Vector storage and managed fine-tuning are on the roadmap.
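A minimal sketch of calling the embeddings endpoint and scoring results, assuming the standard OpenAI v1 `/embeddings` path and the hosted `BAAI/bge-multilingual-gemma2` model id; only `cosine` runs locally, and the HTTP call is shown but not executed here.

```python
import json
import math
import urllib.request

ARK_URL = "https://api.ark-labs.cloud/api/v1/embeddings"  # OpenAI-v1 path

def embed(texts, api_key, model="BAAI/bge-multilingual-gemma2"):
    """POST to the embeddings endpoint; returns one vector per input text."""
    req = urllib.request.Request(
        ARK_URL,
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return [d["embedding"] for d in json.load(resp)["data"]]

def cosine(a, b):
    """Similarity score used to rank retrieved chunks against a query."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

The same request shape works through the OpenAI SDK by pointing `base_url` at the ARK endpoint, as in the quickstart further down.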

Three ways to serve inference on ARK Cloud.

Real-time API, high-throughput batch, and a built-in Portal — all on the same runtime, same API surface, same EU residency guarantee.

Inference Service

Real-time API

Serve text, code, vision, and reasoning models via a simple OpenAI v1 / Anthropic-compatible API. Best-effort 99% availability, sub-second p95, zero retention by default.

Batch API

High-throughput async

High-throughput asynchronous inference for large jobs. Predictable p95 even at peak, ideal for back-office scoring, document processing, and bulk embedding workloads.
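Assuming the Batch API accepts the common OpenAI-style JSONL request file (one request object per line, with a `custom_id` for matching results to inputs), preparing a bulk job might look like this sketch; the `custom_id` scheme and model id are illustrative.

```python
import json

def batch_lines(prompts, model="llama-3.3-70b-instruct"):
    """Yield one JSONL line per request in the OpenAI batch-file format."""
    for i, prompt in enumerate(prompts):
        yield json.dumps({
            "custom_id": f"doc-{i}",  # illustrative id, echoed back in results
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        })

# Write the file, upload it, and submit the batch job.
lines = list(batch_lines(["Summarize invoice 1.", "Summarize invoice 2."]))
```

Results come back asynchronously as a matching JSONL file, keyed by `custom_id`, which suits back-office scoring and bulk embedding runs.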

Portal Service

Built-in chat & embeddings UI

Give non-technical teams a governed chat, image, and document-embedding workspace that speaks to the same inference stack as your engineers — authenticated, tenant-isolated, EU-resident. Managed fine-tuning is on the roadmap.

Top open-source models, ready to serve.

A curated catalog of frontier text, multimodal, and embedding models — expanded continuously based on benchmarking and customer demand.

Text and multimodal

DeepSeek R1 & V3 · REASONING
DeepSeek-R1-Distill-Llama-70B · DISTILL
Llama-3.3-70B-Instruct · TEXT
Mistral-Nemo-Instruct-2407 · TEXT
Qwen2.5-72B · TEXT
QwQ-32B · REASONING
Gemma-2-27b-it · TEXT
GPT-OSS 120B & 20B · TEXT
View all models
Embeddings and guardrails

BAAI/bge-en-icl · EMBED
BAAI/bge-multilingual-gemma2 · EMBED
intfloat/e5-mistral-7b-instruct · EMBED
meta-llama/Llama-Guard-3-8B · GUARD
Qwen/Qwen3-Embedding-8B · EMBED
View all models

Follow ARK Cloud for updates, benchmarks, and technical deep-dives.

LinkedIn for product news, X for release announcements, and Discord for technical conversations with ARK engineers and other builders.

Benchmark-backed performance and cost efficiency.

Independently verified on open benchmarks and in production at regulated European enterprises — not slideware.

Sub-second

Proven performance, verified benchmarks

Stable latency even at peak load. Top-tier throughput on DeepSeek V3, Llama-3.3, and Qwen2.5 — independently benchmarked.

100M+ tok/min

Scale without limits

Handle 100M+ tokens per minute with best-effort 99% availability. Autoscaling and the ARK runtime's heterogeneous GPU support keep throughput consistent from prototype to global deployment.

10+

Curated frontier models

Access 10+ frontier open-source models — LLMs, vision, reasoning, embeddings — expanded based on benchmarks and customer demand.

Familiar API at your fingertips.

ARK Cloud speaks OpenAI v1. Change one base URL, keep your existing SDKs, and your code works unchanged — but now every request runs inside the EU, under your compliance regime, with zero retention by default.

Learn more about the API
quickstart.py
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark-labs.cloud/api/v1",
    api_key=os.environ["ARK_API_KEY"],  # read the key from the environment
)

completion = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{
        "role": "user",
        "content": "What is the answer to all questions?",
    }],
)

print(completion.choices[0].message.content)
# → runs in the EU, zero retention, OpenAI-compatible.

ARK Cloud prices.

Scale from shared access to reserved capacity with best-effort 99% availability, transparent per-token pricing, and volume discounts for production workloads. Dedicated endpoints with software-support SLAs are available via ARK Tailored.

Questions and answers about ARK Cloud.

Can I use ARK Cloud for large production workloads?

Yes. ARK Cloud is built for large-scale, production-grade AI workloads. Shared and reserved endpoints deliver sub-second inference, best-effort 99% availability, and autoscaling throughput — consistent performance for workloads exceeding hundreds of millions of tokens per minute. Scale seamlessly from experimentation to global deployment, with no GPU management and rate limits sized to your traffic. For software-support SLAs on customer-owned infrastructure, see ARK Tailored.

I'd like to use another open-source model — what do I do?

We regularly onboard new open-source releases, including Llama, GPT-OSS, Qwen, DeepSeek, Mistral, and Flux, based on customer demand and benchmarking. Enterprise users can also request model optimization or custom deployment support through our Solutions team.

Can I get a dedicated instance?

Yes. Reserved-capacity endpoints provide session-level isolation, predictable latency, and dedicated compute capacity. For contractual SLAs beyond the best-effort 99% availability target on ARK Cloud, customers can move to ARK Tailored — a full ARK deployment on your own infrastructure, backed by a Software Support SLA. Contact our team to size your endpoint to your workload and compliance requirements.

Can I deploy custom or fine-tuned models?

Managed fine-tuning and custom-checkpoint deployment are on the ARK Cloud roadmap. Today, customers who need to run their own weights can do so on ARK Tailored or ARK Core, where the full ARK runtime deploys on their own infrastructure. Talk to us about timelines and early access if fine-tuning on Cloud is on your critical path.

How secure is ARK Cloud and where does my data go?

We operate in zero-retention mode by default — requests and outputs are never stored or reused for training. All data is processed in EU facilities, with strict data-residency guarantees and GDPR compliance. SOC 2 Type II and ISO 27001 certifications are in progress; see our Trust Center for the current status.

What are your rate limits, and can they be increased?

Starter tiers ship with generous default limits; enterprise plans remove caps entirely: unlimited throughput, autoscaling to demand, and per-project tuning. Need more? Reach out and we'll size your endpoint to your real-world traffic profile.

How can I build RAG applications on ARK Cloud?

ARK Cloud serves state-of-the-art embedding and guardrail models through the same OpenAI-compatible API as our chat and reasoning endpoints, so you can index, retrieve, and generate on one governed, EU-resident platform. Managed vector storage is on the roadmap; today, customers pair ARK embeddings with their own vector DB (PGVector, Qdrant, Weaviate, etc.).
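Until managed vector storage lands, retrieval is a nearest-neighbor ranking over stored vectors; the sketch below uses an in-memory list as a stand-in for PGVector/Qdrant/Weaviate, assuming unit-normalized embeddings so dot product equals cosine similarity.

```python
import heapq

def top_k(query_vec, index, k=3):
    """index: list of (doc_id, vector) pairs, an in-memory stand-in for a
    real vector DB. Returns the k doc_ids nearest to the query by dot
    product (equivalent to cosine similarity for unit-normalized vectors)."""
    scored = ((sum(q * v for q, v in zip(query_vec, vec)), doc_id)
              for doc_id, vec in index)
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

In a real pipeline the query vector and the indexed vectors would both come from the same ARK embedding model, and the retrieved chunks would be fed back into a chat completion for generation.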

How does your pricing compare to other providers?

Pricing is transparent per-token, with clear input/output separation and volume discounts as you scale. No hidden infrastructure costs, no idle GPU charges — pay only for what you serve.
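With separate input and output rates, per-request cost is simple arithmetic over the `usage` block of a completion; the prices below are placeholders, not ARK's actual rates.

```python
# Placeholder rates in EUR per 1M tokens; substitute the published prices.
INPUT_PRICE = 0.50   # hypothetical, per 1M input (prompt) tokens
OUTPUT_PRICE = 1.50  # hypothetical, per 1M output (completion) tokens

def request_cost(prompt_tokens, completion_tokens):
    """Cost of one request: input and output billed at separate rates."""
    return (prompt_tokens * INPUT_PRICE
            + completion_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. fed from completion.usage.prompt_tokens / .completion_tokens
cost = request_cost(1_200, 300)
```

Summing this over a billing period from the `usage` fields of each response gives a verifiable reconciliation against the invoice.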

Do you offer enterprise SLAs, dedicated support, and compliance options?

Yes. ARK Cloud ships with a best-effort 99% availability target, zero-retention by default, and GDPR compliance. Enterprise customers get reserved capacity, a dedicated Slack/support channel, custom DPAs, SSO, RBAC, and unified billing. For contractual Software Support SLAs, regulated-industry deployments, and full data sovereignty, customers move to ARK Tailored on their own infrastructure. SOC 2 Type II and ISO 27001 are in progress — see the Trust Center.

Start your sovereign inference journey.

Get free credits on ARK Cloud, or talk to our team about dedicated endpoints, custom models, and enterprise SLAs.