Every open model.
Your hardware. Your rules.

Start with the ARK-curated catalogue: frontier LLMs, vision, code, embeddings, image and speech models, ready to call through an OpenAI v1-compatible API. Need something else from Hugging Face? Search the library of 500,000+ open-source models and request deployment in one click.

10+ · Curated on ARK Cloud
500K+ · Hugging Face on request
6 · Modalities supported
1 · OpenAI v1 endpoint

Frontier open models, ready to call.

Same weights as upstream. Production-tuned for ARK's runtime — with stateful inference, session-level KV isolation, and EU data residency on Cloud.
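A minimal sketch of what "ready to call" looks like in practice, assuming a Python OpenAI client and placeholder values for the endpoint, key, and prompt (substitute your own):

```python
# Minimal sketch: an ARK-hosted model behind the OpenAI v1-compatible API.
# Base URL and key are placeholders; substitute your ARK Cloud endpoint and credentials.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-ark-endpoint>/v1",  # ARK's OpenAI-compatible gateway (placeholder)
    api_key="<ARK_API_KEY>",                    # an ARK key, not an OpenAI key
)

response = client.chat.completions.create(
    model="Llama-3.1-8B-Instruct",              # any curated model ID from the catalogue
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise stateful inference in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

The same client and call shape work for every text model in the catalogue; only the model ID changes.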

See full pricing →
Llama-3.1-8B-Instruct
Meta · Llama 3.1 license
TEXT

Compact, instruction-tuned LLM. Strong default for chat, summarisation and tool-calling at low cost-per-token.

Params: 8B · Context: 128K · Stateful: Yes · Cloud price: $0.29 / $0.29 per M
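A hedged sketch of the tool-calling path mentioned above, using the standard OpenAI tools schema; the endpoint, key, and the get_weather tool are illustrative placeholders, not part of the ARK catalogue:

```python
# Sketch: tool-calling via the OpenAI-compatible chat endpoint (placeholder endpoint and key).
from openai import OpenAI

client = OpenAI(base_url="https://<your-ark-endpoint>/v1", api_key="<ARK_API_KEY>")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Warsaw right now?"}],
    tools=tools,
)
# The model returns a tool call; your code executes it and sends the result back in a follow-up turn.
print(response.choices[0].message.tool_calls)
```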
Qwen3-32B
Alibaba · Apache-2.0
TEXT

Mid-sized reasoning model, strong on multilingual benchmarks. A good upgrade from 8B-class when quality matters more than throughput.

Params: 32B · Context: 128K · Stateful: Yes · Cloud price: $1.79 / $1.79 per M
Qwen3-Coder-30B-A3B-Instruct
Alibaba · Apache-2.0
CODE

Mixture-of-experts code model. Best-fit for IDE-integrated assistants and iterative debug loops — pairs naturally with stateful sessions.

Params: 30B (MoE) · Active: 3B · Stateful: Yes · Cloud price: $1.79 / $1.79 per M
Bielik-11B-v3.0-Instruct
SpeakLeash · Apache-2.0
TEXT

Polish-language LLM trained on the SpeakLeash corpus. The default for CEE deployments where Polish quality matters more than English throughput.

Params: 11B · Context: 32K · Stateful: Yes · Cloud price: $0.49 / $0.49 per M
Qwen2.5-VL-7B-Instruct
Alibaba · Apache-2.0
VISION

Vision-language model for OCR, document understanding, chart reasoning and visual grounding. Returns structured output through the standard chat API.

Params: 7B · Modality: Image + Text · Stateful: Yes · Cloud price: $0.29 / $0.29 per M
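A sketch of a single call through the standard chat API, with placeholder endpoint, key, and image URL:

```python
# Sketch: image + text in, structured text out, via the standard chat completions API.
from openai import OpenAI

client = OpenAI(base_url="https://<your-ark-endpoint>/v1", api_key="<ARK_API_KEY>")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the invoice number and total amount as JSON."},
            {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},  # placeholder image
        ],
    }],
)
print(response.choices[0].message.content)
```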
gpt-oss-20b
OpenAI · Apache-2.0
TEXT

OpenAI's open-weights release. Drop-in 20B reasoning model for teams who want OpenAI-style behaviour without an OpenAI account or its data terms.

Params: 20B · Context: 128K · Stateful: Yes · Cloud price: $0.49 / $0.49 per M
BAAI/bge-m3
BAAI · MIT
EMBED

Multilingual embedding model with dense, sparse, and multi-vector retrieval modes. The default for RAG, semantic search and hybrid retrieval.

Dim: 1024 · Languages: 100+ · Max input: 8192 tok · Cloud price: $0.01 per M tokens
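A sketch of the dense retrieval path through the OpenAI-compatible embeddings endpoint; the endpoint and key are placeholders, and the sparse and multi-vector modes are not shown:

```python
# Sketch: dense embeddings for RAG / semantic search (placeholder endpoint and key).
from openai import OpenAI

client = OpenAI(base_url="https://<your-ark-endpoint>/v1", api_key="<ARK_API_KEY>")

result = client.embeddings.create(
    model="BAAI/bge-m3",
    input=[
        "How do I request an invoice?",
        "Jak poprosić o fakturę?",  # bge-m3 is multilingual, so mixed-language corpora are fine
    ],
)
vectors = [item.embedding for item in result.data]
print(len(vectors), len(vectors[0]))  # expect: 2 vectors, 1024 dimensions each
```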
Stable Diffusion 3.5-Large
Stability AI · SAI Community
IMAGE

Text-to-image generation at production quality. Configurable steps, samplers and resolution. Returns images through the standard images endpoint.

Params: 8B · Output: up to 2MP · Stateful: N/A · Cloud price: $0.019 per image
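A sketch of one generation through the images endpoint; the endpoint, key, model string and requested size are placeholders, so check the exact model ID and supported resolutions in your console before copying them:

```python
# Sketch: text-to-image through the standard images endpoint (placeholder endpoint, key, and size).
import base64
from openai import OpenAI

client = OpenAI(base_url="https://<your-ark-endpoint>/v1", api_key="<ARK_API_KEY>")

image = client.images.generate(
    model="Stable Diffusion 3.5-Large",  # catalogue display name; use the exact model ID from your console
    prompt="Isometric illustration of a small data centre at dawn",
    size="1024x1024",                    # illustrative; consult the model card for supported resolutions
    response_format="b64_json",
)

with open("output.png", "wb") as f:
    f.write(base64.b64decode(image.data[0].b64_json))
```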
Whisper Large v3-Turbo
OpenAI · MIT
SPEECH

Multilingual speech-to-text with strong Polish, German, French and English performance. Best-fit for live meeting and call transcription pipelines.

Languages: 99 · Latency: Real-time · Stateful: N/A · Cloud price: $0.0144 per audio-hour
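A sketch of batch transcription through the OpenAI-compatible audio endpoint, with placeholder endpoint, key, model string, and file path:

```python
# Sketch: speech-to-text via the OpenAI-compatible transcription endpoint (placeholder endpoint and key).
from openai import OpenAI

client = OpenAI(base_url="https://<your-ark-endpoint>/v1", api_key="<ARK_API_KEY>")

with open("meeting.wav", "rb") as audio:  # placeholder file path
    transcript = client.audio.transcriptions.create(
        model="Whisper Large v3-Turbo",   # catalogue display name; use the exact model ID from your console
        file=audio,
        language="pl",                    # optional hint; Whisper also auto-detects the language
    )

print(transcript.text)
```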

Pricing shown is ARK Cloud, per million input/output tokens unless stated. ARK Tailored and ARK Core run any of these models on your hardware under a per-GPU license. See full pricing →

Don't see your model?
Search 500,000+ models on Hugging Face.

ARK uses the standard Hugging Face model format — any open-weights model is a deployment request away. Search the live HF index, request the one you need, and our team validates it on the ARK runtime.


Don't want to search?

From a model ID to a working endpoint.

  1. Pick or request a model

    Click Try on Cloud on a curated model, or Request deployment on any Hugging Face model. We capture the model ID, your tier, and the GPU profile that matters.

  2. We validate on the ARK runtime

    Compatibility check, quantisation strategy, sharding plan, benchmark on representative GPUs. You get a one-page report — expected throughput, VRAM footprint, context headroom.

  3. It goes live on your tier

    ARK Cloud: we promote it into the curated catalogue. ARK Tailored / Core: we ship the model entry into your Model Storage; your operator promotes it on the next rollout. No restart needed.

The honest answers.
Can I use any Hugging Face model on my ARK deployment?
If it's open-weights and ships in the standard Hugging Face format (with SafeTensors), yes — ARK is built around exactly that contract. Gated and licensed models require you to provide the access token at deployment time. Closed-weights APIs (GPT-4, Claude, Gemini) are not deployable on ARK by definition.
How long does a deployment request take?
For models in well-known families (Llama, Mistral, Qwen, Gemma, Phi, Whisper, Stable Diffusion, BGE), usually 2-3 working days end-to-end. For exotic architectures or freshly released models, allow up to 10 days; we'll give you a firm date in the validation step.
Are all models stateful-capable?
Stateful inference applies to autoregressive text and code models — that's where the KV-cache lives. Embeddings, image generation and speech models are stateless by construction. Stateful sessions on ARK reduce per-call prefill compute by ~98.9% on multi-turn workloads (~46 tokens/call vs ~4,150 stateless), cutting time-to-first-token (TTFT) by 87%.
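Those percentages follow directly from the per-call token counts quoted above; a quick illustrative check:

```python
# Back-of-envelope check of the prefill saving quoted above (illustrative, not a benchmark).
stateless_prefill = 4150   # ~tokens re-processed per call when the full history is resent
stateful_prefill = 46      # ~tokens processed per call when the session's KV-cache is reused
saving = 1 - stateful_prefill / stateless_prefill
print(f"prefill compute saved per call: {saving:.1%}")   # -> 98.9%
```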
Where do model weights physically live?
On ARK Cloud, in our EU-hosted Model Storage, behind the API Gateway. On ARK Tailored and ARK Core, in your Model Storage on your infrastructure — ARK never sees them. The same Hugging Face directory layout is used everywhere; weights are portable between deployments.
Can I deploy a fine-tuned variant of a curated model?
Yes. If your fine-tune ships as a Hugging Face directory or merged checkpoint, the deployment process is the same as for any other model. We don't host private weights on ARK Cloud today; that capability lives on Tailored or Core.
What about model licensing?
Each model carries its upstream license (Apache-2.0, MIT, Llama 3.x license, SAI Community license, etc.). ARK does not relicense weights — you operate under whichever terms the model author published. We surface the license on every card and in the deployment confirmation.

Pick a model. Pick how it runs.

Same weights, same OpenAI-compatible contract. The decision is just where the GPUs sit.